Programming Assignment 1--CDA 5155 (Fall 2014)

Assigned: Sept 9, 2014
Due: Sept 30, 2014

1. Purpose

This project is intended to help you understand in detail how a pipelined implementation works. You will write a cycle-accurate simulator for a pipelined implementation of the LC2, complete with data forwarding and branch prediction.

2. Requirements

This programming assignment requires the construction of a pipeline simulator, written in C or C++ (or another language with permission), for a simple instruction set defined below. Solutions to this assignment will include the source code for the simulator, any test programs used to verify correct program execution, and a writeup (of about 2 pages) describing how the test programs verify correct execution of any legal program in the simulator. Failure to provide the writeup, or failure to provide a complete test suite, will result in a lower grade, even if the simulator correctly executes all test programs.

3. LC2 Instruction-Set Architecture

For the CDA 5155 programming assignments, you will be using the LC2 (Little Computer 2014). The LC2 is very simple, but it is general enough to solve complex problems. For this project, you will only need to know the instruction set and instruction format of the LC2.

The LC2 is an 8-register, 32-bit computer. All addresses are word-addresses. The LC2 has 65536 words of memory. By assembly-language convention, register 0 will always contain the value 0.

There are 3 instruction formats (bit 0 is the least-significant bit). Bits 31-25 are unused for all instructions, and should always be 0.

R-type instructions (add, nand, mult):
    bits 24-22: opcode
    bits 21-19: reg A
    bits 18-16: reg B
    bits 15-3:  unused (should all be 0)
    bits 2-0:   destReg

I-type instructions (lw, sw, beq):
    bits 24-22: opcode
    bits 21-19: reg A
    bits 18-16: reg B
    bits 15-0:  offsetField (a 16-bit, 2's complement number with a range
                of -32768 to 32767)

O-type instructions (halt, noop):
    bits 24-22: opcode
    bits 21-0:  unused (should all be 0)

-------------------------------------------------------------------------------
Table 1: Description of Machine Instructions
-------------------------------------------------------------------------------
Assembly language       Opcode in binary        Action
name for instruction    (bits 24, 23, 22)
-------------------------------------------------------------------------------
add (R-type format)     000                     add contents of regA with
                                                contents of regB, store
                                                results in destReg.

nand (R-type format)    001                     nand contents of regA with
                                                contents of regB, store
                                                results in destReg.

lw (I-type format)      010                     load regB from memory. Memory
                                                address is formed by adding
                                                offsetField with the contents
                                                of regA.

sw (I-type format)      011                     store regB into memory. Memory
                                                address is formed by adding
                                                offsetField with the contents
                                                of regA.

beq (I-type format)     100                     if the contents of regA and
                                                regB are the same, then branch
                                                to the address
                                                PC+1+offsetField, where PC is
                                                the address of the beq
                                                instruction.

mult (R-type format)    101                     multiply contents of regA with
                                                contents of regB, store
                                                results in destReg.

halt (O-type format)    110                     increment the PC (as with all
                                                instructions), then halt the
                                                machine (let the simulator
                                                notice that the machine
                                                halted).

noop (O-type format)    111                     do nothing.
-------------------------------------------------------------------------------

4. LC2 Assembly Language and Assembler

You will be provided with an assembler that translates LC2 assembly code into machine code.
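Before turning to the assembler's input format, the bit layouts from Section 3 can be made concrete with a short sketch. The helper names below (lc2Opcode, lc2RegA, lc2RegB, lc2DestReg, lc2Offset) are hypothetical and are not part of the handout. The point of interest is lc2Offset: offsetField is a 16-bit 2's complement value, so it has to be sign-extended before it is added to a register or to PC+1.

```c
/* Hypothetical decode helpers for the LC2 instruction formats.
 * These names are illustrative only; they are not defined by the handout. */

/* bits 24-22: opcode (masking discards anything above bit 24) */
static int lc2Opcode(int instr)  { return (instr >> 22) & 0x7; }

/* bits 21-19: reg A */
static int lc2RegA(int instr)    { return (instr >> 19) & 0x7; }

/* bits 18-16: reg B */
static int lc2RegB(int instr)    { return (instr >> 16) & 0x7; }

/* bits 2-0: destReg (meaningful for R-type instructions only) */
static int lc2DestReg(int instr) { return instr & 0x7; }

/* bits 15-0: offsetField, sign-extended from 16 bits to a full int */
static int lc2Offset(int instr)
{
    int raw = instr & 0xFFFF;
    if (raw & 0x8000) {
        raw -= 0x10000;   /* bit 15 set: the value is negative */
    }
    return raw;
}
```

For instance, applying these helpers to the word 0x100fffd yields opcode 4 (beq), regA 0, regB 0, and an offset of -3 rather than 65533.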
The format for a line of assembly code is (<white> means a series of tabs and/or spaces):

label<white>instruction<white>field0<white>field1<white>field2<white>comments

The leftmost field on a line is the label field. Valid labels contain a maximum of 6 characters and can consist of letters and numbers (but must start with a letter). The label is optional (the white space following the label field is required). Labels make it much easier to write assembly-language programs, since otherwise you would need to modify all address fields each time you added a line to your assembly-language program!

After the optional label is white space. Then follows the instruction field, where the instruction can be any of the assembly-language instruction names listed in the above table. After more white space comes a series of fields. All fields are given as decimal numbers or labels. The number of fields depends on the instruction, and unused fields should be ignored (treat them like comments).

R-type instructions (add, nand, mult) require 3 fields: field0 is regA, field1 is regB, and field2 is destReg.

I-type instructions (lw, sw, beq) require 3 fields: field0 is regA, field1 is regB, and field2 is either a numeric value for offsetField or a symbolic address. Numeric offsetFields can be positive or negative; symbolic addresses are discussed below.

O-type instructions (noop and halt) require no fields.

Symbolic addresses refer to labels. For lw or sw instructions, the assembler should compute offsetField to be equal to the address of the label. This could be used with a zero base register to refer to the label, or could be used with a non-zero base register to index into an array starting at the label. For beq instructions, the assembler should translate the label into the numeric offsetField needed to branch to that label.

After the last used field comes more white space, then any comments. The comment field ends at the end of a line.
Comments are vital to creating understandable assembly-language programs, because the instructions themselves are rather cryptic.

In addition to LC2 instructions, an assembly-language program may contain directives for the assembler. The only assembler directive we will use is .fill (note the leading period). .fill tells the assembler to put a number into the place where the instruction would normally be stored. .fill instructions use one field, which can be either a numeric value or a symbolic address. For example, ".fill 32" puts the value 32 where the instruction would normally be stored. .fill with a symbolic address will store the address of the label. In the example below, ".fill start" will store the value 2, because the label "start" is at address 2.

The assembler makes two passes over the assembly-language program. In the first pass, it will calculate the address for every symbolic label, assuming that the first instruction is at address 0. In the second pass, it will generate a machine-language instruction (in decimal) for each line of assembly language. For example, here is an assembly-language program (that counts down from 5, stopping when it hits 0).

        lw      0       1       five    load reg1 with 5 (uses symbolic address)
        lw      1       2       3       load reg2 with -1 (uses numeric address)
start   add     1       2       1       decrement reg1
        beq     0       1       2       goto end of program when reg1==0
        beq     0       0       start   go back to the beginning of the loop
        noop
done    halt                            end of program
five    .fill   5
neg1    .fill   -1
stAddr  .fill   start                   will contain the address of start (2)

And here is the corresponding machine language:

(address 0): 8454151 (hex 0x810007)
(address 1): 9043971 (hex 0x8a0003)
(address 2): 655361 (hex 0xa0001)
(address 3): 16842754 (hex 0x1010002)
(address 4): 16842749 (hex 0x100fffd)
(address 5): 29360128 (hex 0x1c00000)
(address 6): 25165824 (hex 0x1800000)
(address 7): 5 (hex 0x5)
(address 8): -1 (hex 0xffffffff)
(address 9): 2 (hex 0x2)

Be sure you understand how the above assembly-language program got translated to machine language.

Since your programs will always start at address 0, the assembler will output only the contents, not the addresses.

8454151
9043971
655361
16842754
16842749
29360128
25165824
5
-1
2

When executing the assembler, the first command line argument is the file name where the assembly-language program is stored, and the second argument is the file name where the output (the machine code) is written. For example, with a program name of "assemble" and an assembly-language program in "program.as", the following would generate a machine-code file "program.mc":

    assemble program.as program.mc

5. LC2 Pipelined Implementation

For this project we will use the datapath from Appendix A of Patterson and Hennessy. Of course, since the MIPS and LC2 architectures are slightly different, we will have to make a few minor changes to the book's datapath.

1) Instead of a "4" input in the PC's adder, we will use a "1", since the LC2 is word-addressed instead of byte-addressed.

2) The instruction bit fields have to be modified to suit the LC2's instruction-set architecture.

3) The "shift left 2" component is not necessary, since both offsetField for branches and the PC use word-addressing.

The main difference between this assignment and the pipelining done in the book is that we will add a pipeline register AFTER the write-back stage (the WBEND pipeline register). This will be used to simplify data forwarding so that the register file does not have to do any internal forwarding.

To follow the pipelining done in the textbook as closely as possible, we will use the MIPS clocking scheme (e.g. register file and memory writes require the data to be present for the whole cycle).

5.1. Memory

Note in the typedef of stateType below that there are two memories: instrMem and dataMem. When the program starts, read the machine-code file into BOTH instrMem and dataMem (i.e. they'll have the same contents in the beginning). During execution, read instructions from instrMem and perform load/stores using dataMem. That is, instrMem will never change after the program starts, but dataMem will change. (In a real machine, these two memories would be an instruction and data cache, and they would be kept consistent.)

5.2. Pipeline Registers

To simplify the project and make the output formats uniform, you can use the following structures to hold pipeline register contents. Note that the instruction gets passed down the pipeline in its entirety.

#define NUMMEMORY 65536 /* maximum number of data words in memory */
#define NUMREGS 8 /* number of machine registers */

#define ADD 0
#define NAND 1
#define LW 2
#define SW 3
#define BEQ 4
#define MULT 5
#define HALT 6
#define NOOP 7

#define NOOPINSTRUCTION 0x1c00000

typedef struct IFIDStruct {
    int instr;
    int pcPlus1;
} IFIDType;

typedef struct IDEXStruct {
    int instr;
    int pcPlus1;
    int readRegA;
    int readRegB;
    int offset;
} IDEXType;

typedef struct EXMEMStruct {
    int instr;
    int branchTarget;
    int aluResult;
    int readRegB;
} EXMEMType;

typedef struct MEMWBStruct {
    int instr;
    int writeData;
} MEMWBType;

typedef struct WBENDStruct {
    int instr;
    int writeData;
} WBENDType;

typedef struct stateStruct {
    int pc;
    int instrMem[NUMMEMORY];
    int dataMem[NUMMEMORY];
    int reg[NUMREGS];
    int numMemory;
    IFIDType IFID;
    IDEXType IDEX;
    EXMEMType EXMEM;
    MEMWBType MEMWB;
    WBENDType WBEND;
    int cycles; /* number of cycles run so far */
} stateType;

6. Problem

6.1. Basic Structure

Your task is to write a cycle-accurate simulator for the LC2. At the start of the program, initialize the pc and all registers to zero. Initialize the instruction field in all pipeline registers to the noop instruction (0x1c00000).

run() will be a loop, where each iteration through the loop executes one cycle. At the beginning of the cycle, print the complete state of the machine (you may use the printState function at the end of this handout). In the body of the loop, you will figure out what the new state of the machine (memory, registers, pipeline registers) will be at the end of the cycle.

Conceptually all stages of the pipeline compute their new state simultaneously. Since statements execute sequentially in C rather than simultaneously, you will need two state variables: state and newState. state will be the state of the machine while the cycle is executing; newState will be the state of the machine at the end of the cycle.
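
The need for two state variables can be seen in a toy model. The sketch below is not part of the simulator; it uses a hypothetical two-latch structure (Latches) in which a value should take one cycle to move from the first latch to the second. Updating in place lets the second latch see the value written in the same cycle, while reading only from state and writing only to newState gives the intended one-cycle delay.

```c
/* Toy model (hypothetical, not from the handout): two latches in a row.
 * Each cycle, `first` captures the input and `second` captures the value
 * that `first` held at the START of the cycle. */
typedef struct {
    int first;
    int second;
} Latches;

/* Incorrect: in-place update. By the time `second` is assigned, `first`
 * has already been overwritten, so both latches end up with the input. */
static Latches stepInPlace(Latches state, int input)
{
    state.first = input;
    state.second = state.first;
    return state;
}

/* Correct: every read comes from `state`, every write goes to `newState`,
 * so the order of the two assignments no longer matters. */
static Latches stepBuffered(Latches state, int input)
{
    Latches newState = state;
    newState.first = input;
    newState.second = state.first;
    return newState;
}
```

Starting from first = 1, second = 2 with an input of 3, stepInPlace produces (3, 3) whereas stepBuffered produces (3, 1).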

Each stage of the pipeline will modify the newState variable using the current values in the state variable. E.g. in the ID stage, you will have a statement like

    newState.IDEX.instr = state.IFID.instr

(to transfer the instruction in the IFID register to the IDEX register).

In the body of the loop, you will use newState ONLY as the target of an assignment and you will use state ONLY as the source of an assignment (e.g. newState... = state...). state should never appear on the left-hand side of an assignment (except for array subscripts), and newState should never appear on the right-hand side of an assignment.

Your simulator must be pipelined. This means that the work of carrying out an instruction should be done in different stages of the pipeline as done in the textbook and the execution of multiple instructions should be overlapped. The ID stage should be the ONLY stage that reads the register file; the other stages must get the register values from a pipeline register. If it violates these criteria, your program will get a 0.

Here's the main loop in run(). Add to this code, but don't otherwise modify it (and leave the comments as is) so I can understand your program more easily.

    while (1) {

        printState(&state);

        /* check for halt */
        if (opcode(state.MEMWB.instr) == HALT) {
            printf("machine halted\n");
            printf("total of %d cycles executed\n", state.cycles);
            exit(0);
        }

        newState = state;
        newState.cycles++;

        /* --------------------- IF stage --------------------- */

        /* --------------------- ID stage --------------------- */

        /* --------------------- EX stage --------------------- */

        /* --------------------- MEM stage --------------------- */

        /* --------------------- WB stage --------------------- */

        state = newState; /* this is the last statement before end of the loop.
                             It marks the end of the cycle and updates the
                             current state with the values calculated in this
                             cycle */
    }

6.2. Halting

At what point does the pipelined computer know to halt?
It's incorrect to halt as soon as a halt instruction is fetched because if an earlier branch was actually taken, then the halt instruction could actually have been branched around. To solve this problem, halt the machine when a halt instruction reaches the MEMWB register. This ensures that previously executed instructions have completed, and it also ensures that the machine won't branch around this halt. This solution is shown above; note how the final printState call before the check for halt will print the final state of the machine.

6.3. Begin Your Implementation Assuming No Hazards

The easiest way to start is to first write your simulator so that it does not account for data or branch hazards. This will allow you to get started right away. Of course, the simulator will only be able to correctly run assembly-language programs that have no hazards. It is thus the responsibility of the assembly-language programmer to insert noop instructions so that there are no data or branch hazards. This means putting a number of noops in an assembly-language program after a branch and a number of noops in an assembly-language program before a dependent data operation (it's a good exercise to figure out the minimum number needed in each situation).

6.4. Accounting for Data Hazards

Modifying your first implementation to account for data and branch hazards will probably be the hardest part of this assignment.

Use data forwarding to resolve most data hazards. I.e. the ALU should be able to take its inputs from any pipeline register (instead of just the IDEX register). There is no need for forwarding within the register file. For this case of forwarding, you'll instead forward data from the WBEND pipeline register. Remember to take the most recent data (e.g. data in the EXMEM register gets priority over data in the MEMWB register). ONLY FORWARD DATA TO THE EX STAGE.
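
The priority rule above can be sketched as a single operand-selection function. This is a minimal sketch under stated assumptions, not a required design: the names ForwardSources, opcodeOf, destOf and forwardOperand are hypothetical, the structure simply gathers the fields of EXMEM, MEMWB and WBEND that forwarding needs, and it is assumed that an lw never sits in EXMEM while an instruction that depends on it is in EX (the stall case is handled separately).

```c
#define ADD  0
#define NAND 1
#define LW   2
#define MULT 5

/* Hypothetical bundle of the pipeline-register fields used for forwarding. */
typedef struct {
    int exmemInstr;  int exmemAluResult;   /* newest producer */
    int memwbInstr;  int memwbWriteData;
    int wbendInstr;  int wbendWriteData;   /* oldest producer */
} ForwardSources;

static int opcodeOf(int instr) { return (instr >> 22) & 0x7; }

/* Returns 1 and stores the destination register in *dest if instr writes
 * a register; returns 0 for sw, beq, halt and noop. */
static int destOf(int instr, int *dest)
{
    switch (opcodeOf(instr)) {
    case ADD: case NAND: case MULT:
        *dest = instr & 0x7;          /* destReg, bits 2-0 */
        return 1;
    case LW:
        *dest = (instr >> 16) & 0x7;  /* regB, bits 18-16 */
        return 1;
    default:
        return 0;
    }
}

/* Chooses the value the EX stage should use for register `reg`.
 * idexValue is what the ID stage read from the register file.
 * Sources are checked newest first: EXMEM, then MEMWB, then WBEND.
 * Assumption: EXMEM never holds an lw that writes `reg` at this point. */
static int forwardOperand(int reg, int idexValue, const ForwardSources *f)
{
    int dest;
    if (destOf(f->exmemInstr, &dest) && dest == reg) return f->exmemAluResult;
    if (destOf(f->memwbInstr, &dest) && dest == reg) return f->memwbWriteData;
    if (destOf(f->wbendInstr, &dest) && dest == reg) return f->wbendWriteData;
    return idexValue;   /* no newer producer: the register-file value is current */
}
```

In the EX stage such a function would be called once for regA and once for regB of the instruction in IDEX, reading only from state so that the state/newState discipline is preserved.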

You will need to stall for one type of data hazard: a lw followed by an instruction that uses the register being loaded.

6.5. Accounting for Control Hazards

You will implement a branch predictor that uses a 4-entry pattern history table of 2-bit state machines. The initial state will be weakly NOT-TAKEN. You will also implement a 4-entry branch target buffer (BTB) organized as a fully associative cache with FIFO replacement. The branch target buffer tag is the PC of the branch instruction, and the data portion is the target address of the branch the last time it was calculated. Entries are put into the BTB only when a branch is resolved and TAKEN. If you predict a branch to be taken but do not find an entry in the BTB, fetch (speculatively) from PC+1.

6.6. Output requirements

In addition to the output generated by the printState() function, you should generate some additional statistics to be printed at the end of execution. These statistics include:

CYCLES:   cycle count to complete the program (the cycle in which halt reaches the MEMWB register)
FETCHED:  number of instructions fetched (including instructions squashed because of branch misprediction)
RETIRED:  number of instructions completed
BRANCHES: number of branches executed (i.e., resolved)
MISPRED:  number of branches incorrectly predicted

7. Running Your Program

Your simulator should be run using the command format:

    simulate program.mc > output

8. Test Cases

An integral (and graded) part of writing your pipeline simulator will be to write a suite of test cases to validate any LC2 pipeline simulator. This is common practice in the real world--software companies maintain a suite of test cases for their programs and use this suite to check the program's correctness after a change. Writing a comprehensive suite of test cases will deepen your understanding of the project specification and your program, and it will help you a lot as you debug your program.
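
When writing test cases for control hazards, it can help to have a reference model of the predictor described under "Accounting for Control Hazards". The sketch below is one possible reading, not the required implementation. Several details are assumptions that the handout does not specify: the pattern history table is indexed by the low 2 bits of the PC, the 2-bit state machine is a saturating counter (0 and 1 predict not taken, 2 and 3 predict taken, 1 being weakly NOT-TAKEN), and a taken branch already present in the BTB has its target refreshed in place without consuming a FIFO slot. All names (Predictor, predictorInit, btbLookup, predictNextPc, predictorUpdate) are hypothetical.

```c
#define PHT_ENTRIES 4
#define BTB_ENTRIES 4

typedef struct {
    int pht[PHT_ENTRIES];        /* 2-bit counters, values 0..3 */
    int btbValid[BTB_ENTRIES];
    int btbTag[BTB_ENTRIES];     /* PC of the branch */
    int btbTarget[BTB_ENTRIES];  /* target computed the last time it was taken */
    int btbNext;                 /* FIFO replacement pointer */
} Predictor;

static void predictorInit(Predictor *p)
{
    int i;
    for (i = 0; i < PHT_ENTRIES; i++) p->pht[i] = 1;   /* weakly NOT-TAKEN */
    for (i = 0; i < BTB_ENTRIES; i++) p->btbValid[i] = 0;
    p->btbNext = 0;
}

/* Assumption: the PHT is indexed by the low bits of the PC. */
static int phtIndex(int pc) { return pc & (PHT_ENTRIES - 1); }

/* Fully associative lookup: returns 1 and sets *target on a hit. */
static int btbLookup(const Predictor *p, int pc, int *target)
{
    int i;
    for (i = 0; i < BTB_ENTRIES; i++) {
        if (p->btbValid[i] && p->btbTag[i] == pc) {
            *target = p->btbTarget[i];
            return 1;
        }
    }
    return 0;
}

/* Address to fetch after the instruction at pc. A taken prediction
 * without a BTB entry falls back to pc + 1, as the handout requires. */
static int predictNextPc(const Predictor *p, int pc)
{
    int target;
    if (p->pht[phtIndex(pc)] >= 2 && btbLookup(p, pc, &target)) return target;
    return pc + 1;
}

/* Called when a branch at pc is resolved. */
static void predictorUpdate(Predictor *p, int pc, int taken, int target)
{
    int i;
    int *ctr = &p->pht[phtIndex(pc)];
    if (taken) { if (*ctr < 3) (*ctr)++; }
    else       { if (*ctr > 0) (*ctr)--; }

    if (!taken) return;              /* only TAKEN branches enter the BTB */
    for (i = 0; i < BTB_ENTRIES; i++) {
        if (p->btbValid[i] && p->btbTag[i] == pc) {
            p->btbTarget[i] = target;   /* assumption: refresh in place */
            return;
        }
    }
    p->btbValid[p->btbNext] = 1;
    p->btbTag[p->btbNext] = pc;
    p->btbTarget[p->btbNext] = target;
    p->btbNext = (p->btbNext + 1) % BTB_ENTRIES;   /* FIFO advance */
}
```

A test program with five distinct taken branches exercises the FIFO replacement in a model like this, and a loop branch that alternates between taken and not taken exercises the counter transitions around the weak states.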

The test cases for this project will be short assembly-language programs that, after being assembled into machine code, serve as input to a simulator. You will submit your suite of test cases together with your simulator, and we will grade your test suite according to how thoroughly it exercises an LC2 pipeline simulator. Your test cases must include a comprehensive evaluation of both control and data hazard processing to receive full credit.

9. Writeup

Finally you will produce a document describing the overall operation of your simulator as well as a discussion of how each test case demonstrates correct operation of some portion of your pipeline implementation. This includes executing each instruction type, correctly forwarding all data hazards and correct operation of both the BTB and gshare predictor.

10. Turning in the Project

Use the unix tar command to create a file containing your simulator source, test programs and project writeup (in text, pdf, ps or word). Email the tar file to Martin Brown (mbrown@cs.fsu.edu) with the subject "CDA 5155 P1 Submission".

11. Program Fragment

Here's the code for printState and associated functions to help in understanding the instruction flow through the pipeline.

void printState(stateType *statePtr)
{
    int i;
    printf("\n@@@\nstate before cycle %d starts\n", statePtr->cycles);
    printf("\tpc %d\n", statePtr->pc);

    printf("\tdata memory:\n");
    for (i=0; i<statePtr->numMemory; i++) {
        printf("\t\tdataMem[ %d ] %d\n", i, statePtr->dataMem[i]);
    }
    printf("\tregisters:\n");
    for (i=0; i<NUMREGS; i++) {
        printf("\t\treg[ %d ] %d\n", i, statePtr->reg[i]);
    }
    printf("\tIFID:\n");
    printf("\t\tinstruction ");
    printInstruction(statePtr->IFID.instr);
    printf("\t\tpcPlus1 %d\n", statePtr->IFID.pcPlus1);
    printf("\tIDEX:\n");
    printf("\t\tinstruction ");
    printInstruction(statePtr->IDEX.instr);
    printf("\t\tpcPlus1 %d\n", statePtr->IDEX.pcPlus1);
    printf("\t\treadRegA %d\n", statePtr->IDEX.readRegA);
    printf("\t\treadRegB %d\n", statePtr->IDEX.readRegB);
    printf("\t\toffset %d\n", statePtr->IDEX.offset);
    printf("\tEXMEM:\n");
    printf("\t\tinstruction ");
    printInstruction(statePtr->EXMEM.instr);
    printf("\t\tbranchTarget %d\n", statePtr->EXMEM.branchTarget);
    printf("\t\taluResult %d\n", statePtr->EXMEM.aluResult);
    printf("\t\treadRegB %d\n", statePtr->EXMEM.readRegB);
    printf("\tMEMWB:\n");
    printf("\t\tinstruction ");
    printInstruction(statePtr->MEMWB.instr);
    printf("\t\twriteData %d\n", statePtr->MEMWB.writeData);
    printf("\tWBEND:\n");
    printf("\t\tinstruction ");
    printInstruction(statePtr->WBEND.instr);
    printf("\t\twriteData %d\n", statePtr->WBEND.writeData);
}

int field0(int instruction)
{
    return( (instruction>>19) & 0x7);
}

int field1(int instruction)
{
    return( (instruction>>16) & 0x7);
}

int field2(int instruction)
{
    return(instruction & 0xFFFF);
}

int opcode(int instruction)
{
    return(instruction>>22);
}

void printInstruction(int instr)
{
    char opcodeString[10];
    if (opcode(instr) == ADD) {
        strcpy(opcodeString, "add");
    } else if (opcode(instr) == NAND) {
        strcpy(opcodeString, "nand");
    } else if (opcode(instr) == LW) {
        strcpy(opcodeString, "lw");
    } else if (opcode(instr) == SW) {
        strcpy(opcodeString, "sw");
    } else if (opcode(instr) == BEQ) {
        strcpy(opcodeString, "beq");
    } else if (opcode(instr) == MULT) {
        strcpy(opcodeString, "mult");
    } else if (opcode(instr) == HALT) {
        strcpy(opcodeString, "halt");
    } else if (opcode(instr) == NOOP) {
        strcpy(opcodeString, "noop");
    } else {
        strcpy(opcodeString, "data");
    }

    printf("%s %d %d %d\n", opcodeString, field0(instr), field1(instr),
        field2(instr));
}

12. Sample Assembly-Language Program and Output

Here is a sample assembly-language program:

        lw      0       1       data1   $1= mem[data1]
        halt
data1   .fill   12345

and its corresponding output. Note especially how halt is done (the add 0 0 0 instructions after the halt are from memory locations after the halt, which were initialized to 0). Do you know where the add 0 0 12345 instruction came from?

memory[0]=8454146
memory[1]=25165824
memory[2]=12345
3 memory words
    instruction memory:
        instrMem[ 0 ] lw 0 1 2
        instrMem[ 1 ] halt 0 0 0
        instrMem[ 2 ] add 0 0 12345

@@@
state before cycle 0 starts
    pc 0
    data memory:
        dataMem[ 0 ] 8454146
        dataMem[ 1 ] 25165824
        dataMem[ 2 ] 12345
    registers:
        reg[ 0 ] 0
        reg[ 1 ] 0
        reg[ 2 ] 0
        reg[ 3 ] 0
        reg[ 4 ] 0
        reg[ 5 ] 0
        reg[ 6 ] 0
        reg[ 7 ] 0
    IFID:
        instruction noop 0 0 0
        pcPlus1 -12973480
    IDEX:
        instruction noop 0 0 0
        pcPlus1 0
        readRegA 6
        readRegB 1
        offset 0
    EXMEM:
        instruction noop 0 0 0
        branchTarget -12974332
        aluResult -14024712
        readRegB 12
    MEMWB:
        instruction noop 0 0 0
        writeData -14040720
    WBEND:
        instruction noop 0 0 0
        writeData -4262240

@@@
state before cycle 1 starts
    pc 1
    data memory:
        dataMem[ 0 ] 8454146
        dataMem[ 1 ] 25165824
        dataMem[ 2 ] 12345
    registers:
        reg[ 0 ] 0
        reg[ 1 ] 0
        reg[ 2 ] 0
        reg[ 3 ] 0
        reg[ 4 ] 0
        reg[ 5 ] 0
        reg[ 6 ] 0
        reg[ 7 ] 0
    IFID:
        instruction lw 0 1 2
        pcPlus1 1
    IDEX:
        instruction noop 0 0 0
        pcPlus1 -12973480
        readRegA 0
        readRegB 0
        offset 0
    EXMEM:
        instruction noop 0 0 0
        branchTarget 0
        aluResult -14024712
        readRegB 12
    MEMWB:
        instruction noop 0 0 0
        writeData -14040720
    WBEND:
        instruction noop 0 0 0
        writeData -14040720

@@@
state before cycle 2 starts
    pc 2
    data memory:
        dataMem[ 0 ] 8454146
        dataMem[ 1 ] 25165824
        dataMem[ 2 ] 12345
    registers:
        reg[ 0 ] 0
        reg[ 1 ] 0
        reg[ 2 ] 0
        reg[ 3 ] 0
        reg[ 4 ] 0
        reg[ 5 ] 0
        reg[ 6 ] 0
        reg[ 7 ] 0
    IFID:
        instruction halt 0 0 0
        pcPlus1 2
    IDEX:
        instruction lw 0 1 2
        pcPlus1 1
        readRegA 0
        readRegB 0
        offset 2
    EXMEM:
        instruction noop 0 0 0
        branchTarget -12973480
        aluResult -14024712
        readRegB 12
    MEMWB:
        instruction noop 0 0 0
        writeData -14040720
    WBEND:
        instruction noop 0 0 0
        writeData -14040720

@@@
state before cycle 3 starts
    pc 3
    data memory:
        dataMem[ 0 ] 8454146
        dataMem[ 1 ] 25165824
        dataMem[ 2 ] 12345
    registers:
        reg[ 0 ] 0
        reg[ 1 ] 0
        reg[ 2 ] 0
        reg[ 3 ] 0
        reg[ 4 ] 0
        reg[ 5 ] 0
        reg[ 6 ] 0
        reg[ 7 ] 0
    IFID:
        instruction add 0 0 12345
        pcPlus1 3
    IDEX:
        instruction halt 0 0 0
        pcPlus1 2
        readRegA 0
        readRegB 0
        offset 0
    EXMEM:
        instruction lw 0 1 2
        branchTarget 3
        aluResult 2
        readRegB 0
    MEMWB:
        instruction noop 0 0 0
        writeData -14040720
    WBEND:
        instruction noop 0 0 0
        writeData -14040720

@@@
state before cycle 4 starts
    pc 4
    data memory:
        dataMem[ 0 ] 8454146
        dataMem[ 1 ] 25165824
        dataMem[ 2 ] 12345
    registers:
        reg[ 0 ] 0
        reg[ 1 ] 0
        reg[ 2 ] 0
        reg[ 3 ] 0
        reg[ 4 ] 0
        reg[ 5 ] 0
        reg[ 6 ] 0
        reg[ 7 ] 0
    IFID:
        instruction add 0 0 0
        pcPlus1 4
    IDEX:
        instruction add 0 0 12345
        pcPlus1 3
        readRegA 0
        readRegB 0
        offset 12345
    EXMEM:
        instruction halt 0 0 0
        branchTarget 2
        aluResult 2
        readRegB 0
    MEMWB:
        instruction lw 0 1 2
        writeData 12345
    WBEND:
        instruction noop 0 0 0
        writeData -14040720

@@@
state before cycle 5 starts
    pc 5
    data memory:
        dataMem[ 0 ] 8454146
        dataMem[ 1 ] 25165824
        dataMem[ 2 ] 12345
    registers:
        reg[ 0 ] 0
        reg[ 1 ] 12345
        reg[ 2 ] 0
        reg[ 3 ] 0
        reg[ 4 ] 0
        reg[ 5 ] 0
        reg[ 6 ] 0
        reg[ 7 ] 0
    IFID:
        instruction add 0 0 0
        pcPlus1 5
    IDEX:
        instruction add 0 0 0
        pcPlus1 4
        readRegA 0
        readRegB 0
        offset 0
    EXMEM:
        instruction add 0 0 12345
        branchTarget 12348
        aluResult 0
        readRegB 0
    MEMWB:
        instruction halt 0 0 0
        writeData 12345
    WBEND:
        instruction lw 0 1 2
        writeData 12345
machine halted
CYCLES: 5
FETCHED: 2
RETIRED: 2
BRANCHES: 0
MISPRED: 0