Programming Assignment 1--CDA 5155 (Fall 2014)
Assigned: Sept 9, 2014
Due: Sept 30, 2014

1. Purpose

This project is intended to help you understand in detail how a pipelined
implementation works.  You will write a cycle-accurate simulator for a
pipelined implementation of the LC2, complete with data forwarding and
branch prediction.

2. Requirements

This programming assignment requires the construction of a pipeline simulator,
written in C or C++ (or another language with permission), for a simple
instruction set defined below. Solutions to this assignment will include the
source code for the simulator, any test programs used to verify correct program
execution, and a writeup (of about 2 pages) describing how the test programs
verify correct execution of any legal program in the simulator.  Failure to
provide the writeup, or failure to provide a complete test suite, will result
in a lower grade, even if the simulator correctly executes all test programs.

3. LC2 Instruction-Set Architecture

For the CDA 5155 programming assignments, you will be using the LC2
(Little Computer 2014). The LC2 is very simple, but it is general
enough to solve complex problems. For this project, you will only need to know
the instruction set and instruction format of the LC2.

The LC2 is an 8-register, 32-bit computer.  All addresses are
word-addresses.  The LC2 has 65536 words of memory.  By assembly-language
convention, register 0 will always contain the value 0.

There are 3 instruction formats (bit 0 is the least-significant bit).  Bits
31-25 are unused for all instructions, and should always be 0.

R-type instructions (add, nand, mult):
    bits 24-22: opcode
    bits 21-19: reg A
    bits 18-16: reg B
    bits 15-3:  unused (should all be 0)
    bits 2-0:   destReg

I-type instructions (lw, sw, beq):
    bits 24-22: opcode
    bits 21-19: reg A
    bits 18-16: reg B
    bits 15-0:  offsetField (a 16-bit, 2's complement number with a range of
		    -32768 to 32767)

O-type instructions (halt, noop):
    bits 24-22: opcode
    bits 21-0:  unused (should all be 0)
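
As a minimal sketch of how the bit fields above can be pulled out of a machine
word, the helpers below extract each field and sign-extend offsetField.  The
function names are illustrative and not part of this handout, and the sketch
assumes a legal instruction (bits 31-25 equal to 0).  Note that the field2
function in the program fragment of Section 11 returns the raw 16 bits without
sign extension.

```c
/* Hypothetical field-extraction helpers; names are illustrative only. */
int decodeOpcode(int instr)  { return (instr >> 22) & 0x7; }
int decodeRegA(int instr)    { return (instr >> 19) & 0x7; }
int decodeRegB(int instr)    { return (instr >> 16) & 0x7; }
int decodeDestReg(int instr) { return instr & 0x7; }

/* offsetField is a 16-bit 2's complement value: if bit 15 is set,
   subtract 2^16 to recover the negative value as a 32-bit int. */
int decodeOffset(int instr)
{
    int offset = instr & 0xFFFF;
    if (offset & 0x8000) {
        offset -= 0x10000;
    }
    return offset;
}
```

For the machine word 16842749 (hex 0x100fffd), these helpers give opcode 4
(beq), regA 0, regB 0 and an offset of -3.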

-------------------------------------------------------------------------------
Table 1: Description of Machine Instructions
-------------------------------------------------------------------------------
Assembly language 	Opcode in binary		Action
name for instruction	(bits 24, 23, 22)
-------------------------------------------------------------------------------
add (R-type format)	000 			add contents of regA with
						contents of regB, store
						results in destReg.

nand (R-type format)	001			nand contents of regA with
						contents of regB, store
						results in destReg.

lw (I-type format)	010			load regB from memory. Memory
						address is formed by adding
						offsetField with the contents of
						regA.

sw (I-type format)	011			store regB into memory. Memory
						address is formed by adding
						offsetField with the contents of
						regA.

beq (I-type format)	100			if the contents of regA and
						regB are the same, then branch
						to the address PC+1+offsetField,
						where PC is the address of the
						beq instruction.

mult (R-type format)	101 			multiply contents of regA with
						contents of regB, store
						results in destReg.

halt (O-type format)	110			increment the PC (as with all
						instructions), then halt the
						machine (let the simulator
						notice that the machine
						halted).

noop (O-type format)	111			do nothing.
-------------------------------------------------------------------------------
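
The three R-type actions in Table 1 can be sketched as a single helper.  The
name aluRType is hypothetical, and the handout does not say what happens on
overflow; this sketch simply assumes the result wraps to 32 bits.

```c
#define ADD 0
#define NAND 1
#define MULT 5

/* Hypothetical helper: result of an R-type operation on two register
   values.  Arithmetic is done on unsigned values so that overflow wraps
   instead of being undefined (an assumption, not a handout requirement). */
int aluRType(int op, int a, int b)
{
    switch (op) {
    case ADD:  return (int)((unsigned)a + (unsigned)b);
    case NAND: return ~(a & b);
    case MULT: return (int)((unsigned)a * (unsigned)b);
    default:   return 0;   /* not an R-type opcode */
    }
}
```

Note that nand of 0 with 0 yields -1 (all bits set), which is a common way to
build constants in this instruction set.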

4. LC2 Assembly Language and Assembler

You will be provided with an assembler that translates LC2 assembly code
into machine code.  The format for a line of assembly code is (<white> means
a series of tabs and/or spaces):

label<white>instruction<white>field0<white>field1<white>field2<white>comments

The leftmost field on a line is the label field.  Valid labels contain a
maximum of 6 characters and can consist of letters and numbers (but must start
with a letter). The label is optional (the white space following the label
field is required).  Labels make it much easier to write assembly-language
programs, since otherwise you would need to modify all address fields each time
you added a line to your assembly-language program!

After the optional label is white space.  Then follows the instruction field,
where the instruction can be any of the assembly-language instruction names
listed in the above table.  After more white space comes a series of fields.
All fields are given as decimal numbers or labels.  The number of fields
depends on the instruction, and unused fields should be ignored (treat them
like comments).

    R-type instructions (add, nand, mult) require 3 fields: field0 is regA,
    field1 is regB, and field2 is destReg.

    I-type instructions (lw, sw, beq) require 3 fields: field0 is regA, field1
    is regB, and field2 is either a numeric value for offsetField or a symbolic
    address.  Numeric offsetFields can be positive or negative; symbolic
    addresses are discussed below.

    O-type instructions (noop and halt) require no fields.

Symbolic addresses refer to labels.  For lw or sw instructions, the assembler
should compute offsetField to be equal to the address of the label.  This could
be used with a zero base register to refer to the label, or could be used with
a non-zero base register to index into an array starting at the label.  For beq
instructions, the assembler should translate the label into the numeric
offsetField needed to branch to that label.
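
The two rules for symbolic addresses can be sketched as follows.  The helper
name resolveOffset and its parameters are illustrative; the provided assembler
may be organized differently.

```c
/* Hypothetical helper: offsetField the assembler would emit for a label.
   isBranch is nonzero for beq and zero for lw/sw.
   pc is the address of the instruction that uses the label. */
int resolveOffset(int isBranch, int labelAddress, int pc)
{
    if (isBranch) {
        /* beq branches to PC+1+offsetField, so solve for offsetField */
        return labelAddress - (pc + 1);
    }
    /* lw/sw: offsetField is the label's absolute address */
    return labelAddress;
}
```

For instance, a beq at address 4 that targets a label at address 2 gets an
offsetField of -3, while a lw that names a label at address 7 gets 7.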

After the last used field comes more white space, then any comments.  The
comment field ends at the end of a line.  Comments are vital to creating
understandable assembly-language programs, because the instructions themselves
are rather cryptic.

In addition to LC2 instructions, an assembly-language program may contain
directives for the assembler. The only assembler directive we will use is .fill
(note the leading period). .fill tells the assembler to put a number into the
place where the instruction would normally be stored. .fill instructions use
one field, which can be either a numeric value or a symbolic address.  For
example, ".fill 32" puts the value 32 where the instruction would normally be
stored.  .fill with a symbolic address will store the address of the label.
In the example below, ".fill start" will store the value 2, because the label
"start" is at address 2.

The assembler makes two passes over the assembly-language program. In the
first pass, it will calculate the address for every symbolic label, assuming
that the first instruction is at address 0.  In the second pass, it will
generate a machine-language instruction (in decimal) for each line of assembly
language.  For example, here is an assembly-language program (that counts down
from 5, stopping when it hits 0).

	lw	0	1	five	load reg1 with 5 (uses symbolic address)
	lw	1	2	3	load reg2 with -1 (uses numeric address)
start	add	1	2	1	decrement reg1
	beq	0	1	2	goto end of program when reg1==0
	beq	0	0	start	go back to the beginning of the loop
	noop
done	halt				end of program
five	.fill	5
neg1	.fill	-1
stAddr	.fill	start			will contain the address of start (2)

And here is the corresponding machine language:

(address 0): 8454151 (hex 0x810007)
(address 1): 9043971 (hex 0x8a0003)
(address 2): 655361 (hex 0xa0001)
(address 3): 16842754 (hex 0x1010002)
(address 4): 16842749 (hex 0x100fffd)
(address 5): 29360128 (hex 0x1c00000)
(address 6): 25165824 (hex 0x1800000)
(address 7): 5 (hex 0x5)
(address 8): -1 (hex 0xffffffff)
(address 9): 2 (hex 0x2)

Be sure you understand how the above assembly-language program got translated
to machine language.
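
One way to check your understanding is to pack the fields by hand.  The
encoders below are a sketch with hypothetical names; they follow the bit
layouts given in Section 3.

```c
/* Hypothetical encoders; names are illustrative only. */
int encodeRType(int op, int regA, int regB, int destReg)
{
    return (op << 22) | (regA << 19) | (regB << 16) | destReg;
}

int encodeIType(int op, int regA, int regB, int offset)
{
    /* keep only the low 16 bits so negative offsets fit in bits 15-0 */
    return (op << 22) | (regA << 19) | (regB << 16) | (offset & 0xFFFF);
}

int encodeOType(int op)
{
    return op << 22;
}
```

With these, encodeIType(4, 0, 0, -3) gives 16842749, the word at address 4
above, and encodeRType(0, 1, 2, 1) gives 655361, the word at address 2.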

Since your programs will always start at address 0, the assembler will output
only the contents, not the addresses.

8454151
9043971
655361
16842754
16842749
29360128
25165824
5
-1
2

When executing the assembler, the first command line argument is
the file name where the assembly-language program is stored, and the second
argument is the file name where the output (the machine-code) is written.
For example, with a program name of "assemble" and an assembly-language program
in "program.as", the following would generate a machine-code file "program.mc":
    
    assemble program.as program.mc

5. LC2 Pipelined Implementation

For this project we will use the datapath from Appendix A of Patterson and
Hennessy.  Of course, since the MIPS and LC2 architectures are slightly
different, we will have to make a few minor changes to the book's datapath.

    1) Instead of a "4" input in the PC's adder, we will use a "1", since the
	LC2 is word-addressed instead of byte-addressed.
    2) The instruction bit fields have to be modified to suit the LC2's
	instruction-set architecture.
    3) The "shift left 2" component is not necessary, since both offsetField
	for branches and the PC use word-addressing.

The main difference between this assignment and the pipelining done in the
book is that we will add a pipeline register AFTER the write-back stage
(the WBEND pipeline register).  This will be used to simplify data forwarding
so that the register file does not have to do any internal forwarding.

To follow the pipelining done in the textbook as closely as possible, we will
use the MIPS clocking scheme (e.g. register file and memory writes require the
data to be present for the whole cycle).

5.1. Memory

Note in the typedef of stateType below that there are two memories: instrMem
and dataMem.  When the program starts, read the machine-code file into BOTH
instrMem and dataMem (i.e. they'll have the same contents in the beginning).
During execution, read instructions from instrMem and perform load/stores using
dataMem.  That is, instrMem will never change after the program starts, but
dataMem will change.  (In a real machine, these two memories would be an
instruction and data cache, and they would be kept consistent.)
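
A minimal sketch of the loading step is shown below.  It assumes the
machine-code file holds one decimal word per line, as in the assembler output
of Section 4.  The function name loadMemories and its error convention are
assumptions, not requirements.

```c
#include <stdio.h>

/* Hypothetical loader: reads one decimal word per line from an open file
   into both memories.  Returns the number of words read, or -1 on a
   malformed line or if the file holds more than maxWords words. */
int loadMemories(FILE *fp, int *instrMem, int *dataMem, int maxWords)
{
    char line[1000];
    int count = 0;
    int word;

    while (fgets(line, sizeof line, fp) != NULL) {
        if (count >= maxWords) {
            return -1;
        }
        if (sscanf(line, "%d", &word) != 1) {
            return -1;
        }
        instrMem[count] = word;   /* never modified after loading */
        dataMem[count] = word;    /* modified later by sw */
        count++;
    }
    return count;
}
```

The returned count would be a natural value for the numMemory field of
stateType, which printState uses to decide how many data words to print.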

5.2. Pipeline Registers

To simplify the project and make the output formats uniform, you can use the
following structures to hold pipeline register contents.
Note that the instruction gets passed down the pipeline in its entirety.

#define NUMMEMORY 65536 /* maximum number of data words in memory */
#define NUMREGS 8 /* number of machine registers */

#define ADD 0
#define NAND 1
#define LW 2
#define SW 3
#define BEQ 4
#define MULT 5
#define HALT 6
#define NOOP 7

#define NOOPINSTRUCTION 0x1c00000

typedef struct IFIDStruct {
    int instr;
    int pcPlus1;
} IFIDType;

typedef struct IDEXStruct {
    int instr;
    int pcPlus1;
    int readRegA;
    int readRegB;
    int offset;
} IDEXType;

typedef struct EXMEMStruct {
    int instr;
    int branchTarget;
    int aluResult;
    int readRegB;
} EXMEMType;

typedef struct MEMWBStruct {
    int instr;
    int writeData;
} MEMWBType;

typedef struct WBENDStruct {
    int instr;
    int writeData;
} WBENDType;

typedef struct stateStruct {
    int pc;
    int instrMem[NUMMEMORY];
    int dataMem[NUMMEMORY];
    int reg[NUMREGS];
    int numMemory;
    IFIDType IFID;
    IDEXType IDEX;
    EXMEMType EXMEM;
    MEMWBType MEMWB;
    WBENDType WBEND;
    int cycles; /* number of cycles run so far */
} stateType;

6. Problem

6.1. Basic Structure

Your task is to write a cycle-accurate simulator for the LC2.
At the start of the program, initialize the pc and all registers to zero.
Initialize the instruction field in all pipeline registers to the noop
instruction (0x1c00000).

run() will be a loop, where each iteration through the loop executes one cycle.
At the beginning of the cycle, print the complete state of the machine (you
may use the printState function at the end of this handout).  In the body of
the loop, you will figure out what the new state of the machine (memory,
registers, pipeline registers) will be at the end of the cycle. 
Conceptually all stages of the pipeline compute their new state
simultaneously.  Since statements execute sequentially in C rather than
simultaneously, you will need two state variables: state and newState.  state
will be the state of the machine while the cycle is executing; newState will be
the state of the machine at the end of the cycle.  Each stage of the pipeline
will modify the newState variable using the current values in the state
variable.  E.g. in the ID stage, you will have a statement like

    newState.IDEX.instr = state.IFID.instr (to transfer the instruction in
					the IFID register to the IDEX register)

In the body of the loop, you will use newState ONLY as the target of an
assignment and you will use state ONLY as the source of an assignment (e.g.
newState... = state...).  state should never appear on the left-hand side of an
assignment (except for array subscripts), and newState should never appear on
the right-hand side of an assignment.
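
The effect of this rule can be seen in a trimmed-down sketch.  The type
miniStateType and the function stepCycle are hypothetical and model only the
IF and ID stages; your simulator will use the full stateType from Section 5.2.

```c
#define NOOPINSTRUCTION 0x1c00000

/* Trimmed, hypothetical state: only the fields needed to show the idea. */
typedef struct {
    int pc;
    int instrMem[4];
    int ifidInstr;   /* stands in for IFID.instr */
    int idexInstr;   /* stands in for IDEX.instr */
} miniStateType;

/* Advance one cycle: every right-hand side reads state and every
   left-hand side writes newState, so the order of the stages in the
   code does not affect the result. */
miniStateType stepCycle(miniStateType state)
{
    miniStateType newState = state;

    /* IF stage */
    newState.ifidInstr = state.instrMem[state.pc];
    newState.pc = state.pc + 1;

    /* ID stage: sees the OLD IFID contents, not what IF just wrote */
    newState.idexInstr = state.ifidInstr;

    return newState;
}
```

After one call, the first instruction sits in the IFID field while the IDEX
field still holds the noop.  Only after a second call does the first
instruction appear in IDEX, which is the one-stage-per-cycle behavior a
pipeline needs.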

Your simulator must be pipelined.  This means that the work of carrying out an
instruction should be done in different stages of the pipeline as done in the
textbook and the execution of multiple instructions should be overlapped.  The
ID stage should be the ONLY stage that reads the register file; the other
stages must get the register values from a pipeline register.  If your program
violates these criteria, it will receive a 0.

Here's the main loop in run().  Add to this code, but don't otherwise modify it
(and leave the comments as is) so I can understand your program more easily.

    while (1) {

	printState(&state);

	/* check for halt */
	if (opcode(state.MEMWB.instr) == HALT) {
	    printf("machine halted\n");
	    printf("total of %d cycles executed\n", state.cycles);
	    exit(0);
	}

	newState = state;
	newState.cycles++;

	/* --------------------- IF stage --------------------- */

	/* --------------------- ID stage --------------------- */

	/* --------------------- EX stage --------------------- */

	/* --------------------- MEM stage --------------------- */

	/* --------------------- WB stage --------------------- */

	state = newState; /* this is the last statement before end of the loop.
			    It marks the end of the cycle and updates the
			    current state with the values calculated in this
			    cycle */
    }

6.2. Halting

At what point does the pipelined computer know to halt?  It's incorrect to halt
as soon as a halt instruction is fetched because if an earlier branch was
actually taken, then the halt instruction could actually have been branched
around.

To solve this problem, halt the machine when a halt instruction reaches the
MEMWB register.  This ensures that previously executed instructions have
completed, and it also ensures that the machine won't branch around this halt.
This solution is shown above; note how the final printState call before the
check for halt will print the final state of the machine.


6.3. Begin Your Implementation Assuming No Hazards

The easiest way to start is to first write your simulator so that it does not
account for data or branch hazards.  This will allow you to get started right
away.  Of course, the simulator will only be able to correctly run
assembly-language programs that have no hazards.  It is thus the responsibility
of the assembly-language programmer to insert noop instructions so that there
are no data or branch hazards.  This means putting a number of noops in an
assembly-language program after a branch and a number of noops in an
assembly-language program before a dependent data operation (it's a good
exercise to figure out the minimum number needed in each situation).

6.4. Accounting for Data Hazards

Modifying your first implementation to account for data and branch hazards will
probably be the hardest part of this assignment.

Use data forwarding to resolve most data hazards.  I.e. the ALU should be able
to take its inputs from any pipeline register (instead of just the IDEX
register).  There is no need for forwarding within the register file.  For
this case of forwarding, you'll instead forward data from the WBEND pipeline
register.  Remember to take the most recent data (e.g. data in the EXMEM
register gets priority over data in the MEMWB register).  ONLY FORWARD DATA
TO THE EX STAGE.
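
The priority rule can be sketched as a selection function for one ALU operand.
All names here are hypothetical.  The sketch assumes that add, nand and mult
write destReg, that lw writes regB, and that register 0 gets no special
treatment (it is zero only by convention).

```c
#define ADD 0
#define NAND 1
#define LW 2
#define MULT 5

static int opcodeOf(int instr) { return (instr >> 22) & 0x7; }

/* Register written by instr, or -1 if it writes none (hypothetical helper). */
static int destRegOf(int instr)
{
    switch (opcodeOf(instr)) {
    case ADD: case NAND: case MULT: return instr & 0x7;         /* destReg */
    case LW:                        return (instr >> 16) & 0x7; /* regB */
    default:                        return -1;
    }
}

/* Pick the value the EX stage should use for source register reg.
   idexValue is what ID read from the register file.  Candidates are
   checked newest first: EXMEM, then MEMWB, then WBEND. */
int forwardOperand(int reg, int idexValue,
                   int exmemInstr, int exmemAluResult,
                   int memwbInstr, int memwbWriteData,
                   int wbendInstr, int wbendWriteData)
{
    /* A lw in EXMEM has not read memory yet, so aluResult is only an
       address; that case is the load-use stall, not a forward. */
    if (opcodeOf(exmemInstr) != LW && destRegOf(exmemInstr) == reg) {
        return exmemAluResult;
    }
    if (destRegOf(memwbInstr) == reg) {
        return memwbWriteData;
    }
    if (destRegOf(wbendInstr) == reg) {
        return wbendWriteData;
    }
    return idexValue;
}
```

The same selection would presumably be applied to each register an instruction
reads in EX, including the regB value that a sw carries forward to the MEM
stage.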

You will need to stall for one type of data hazard: a lw followed by an
instruction that uses the register being loaded. 
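
A sketch of the detection is shown below, assuming the check is made while the
lw is in the IDEX register and the dependent instruction is in the IFID
register.  The helper names are hypothetical, and the sketch assumes add, nand,
mult, sw and beq read both regA and regB while lw reads only regA.

```c
#define ADD 0
#define NAND 1
#define LW 2
#define SW 3
#define BEQ 4
#define MULT 5

static int opcodeOf(int instr) { return (instr >> 22) & 0x7; }
static int regAOf(int instr)   { return (instr >> 19) & 0x7; }
static int regBOf(int instr)   { return (instr >> 16) & 0x7; }

/* Nonzero if the instruction in IFID reads the register that the lw
   in IDEX is about to load (hypothetical helper). */
int needsLoadUseStall(int ifidInstr, int idexInstr)
{
    int loaded;

    if (opcodeOf(idexInstr) != LW) {
        return 0;
    }
    loaded = regBOf(idexInstr);       /* lw writes regB */

    switch (opcodeOf(ifidInstr)) {
    case ADD: case NAND: case MULT: case SW: case BEQ:
        /* these read both regA and regB */
        return regAOf(ifidInstr) == loaded || regBOf(ifidInstr) == loaded;
    case LW:
        /* lw reads only regA (the base register) */
        return regAOf(ifidInstr) == loaded;
    default:
        return 0;                     /* halt and noop read nothing */
    }
}
```

The first two instructions of the example program in Section 4 (lw 0 1 five
followed by lw 1 2 3) form exactly this hazard, since the second lw uses
register 1 as its base.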

6.5. Accounting for Control Hazards

You will implement a branch predictor that uses a 4-entry pattern history
table, with each entry holding a 2-bit state machine. The initial state of each
entry will be weakly NOT-TAKEN. You will also implement a 4-entry branch target
buffer organized as a fully associative cache with FIFO replacement.
The branch target buffer tag is the PC of the branch instruction, and the data
portion is the target address of the branch the last time it was calculated.
Entries are put into the BTB only when a branch is resolved and TAKEN.
If you predict a branch to be taken but do not find an entry in the BTB,
fetch (speculatively) from PC+1.
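
The two structures can be sketched as follows.  This is a sketch under stated
assumptions, not the required design: the 2-bit state machine is taken to be a
saturating counter, a BTB insert that hits an existing tag refreshes the target
in place, and all names are hypothetical.  How the pattern history table is
indexed is not shown here.

```c
#define BTB_SIZE 4

/* 2-bit counter, assumed encoding: 0 strongly not-taken, 1 weakly
   not-taken (the initial state), 2 weakly taken, 3 strongly taken. */
int predictTaken(int counter) { return counter >= 2; }

int updateCounter(int counter, int taken)
{
    if (taken) {
        return counter < 3 ? counter + 1 : 3;
    }
    return counter > 0 ? counter - 1 : 0;
}

typedef struct {
    int valid[BTB_SIZE];
    int tag[BTB_SIZE];     /* PC of the branch */
    int target[BTB_SIZE];  /* last computed target */
    int next;              /* FIFO pointer: slot to replace next */
} btbType;

/* Returns 1 and stores the target if pc hits in the BTB, else 0. */
int btbLookup(const btbType *btb, int pc, int *target)
{
    int i;
    for (i = 0; i < BTB_SIZE; i++) {
        if (btb->valid[i] && btb->tag[i] == pc) {
            *target = btb->target[i];
            return 1;
        }
    }
    return 0;
}

/* Called when a branch resolves TAKEN.  A hit refreshes the target in
   place (an assumption); a miss replaces the oldest entry. */
void btbInsert(btbType *btb, int pc, int target)
{
    int i;
    for (i = 0; i < BTB_SIZE; i++) {
        if (btb->valid[i] && btb->tag[i] == pc) {
            btb->target[i] = target;
            return;
        }
    }
    btb->valid[btb->next] = 1;
    btb->tag[btb->next] = pc;
    btb->target[btb->next] = target;
    btb->next = (btb->next + 1) % BTB_SIZE;
}
```

Starting from the weakly NOT-TAKEN state, a single taken branch moves the
counter to weakly taken, so the next prediction for that entry is TAKEN.  After
five different taken branches are inserted into an empty BTB, the first one has
been evicted.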

6.6. Output requirements

In addition to the output generated by the printState() function, you should
generate some additional statistics to be printed at the end of execution. 
These statistics include:

CYCLES:   cycle time to complete program (cycle when halt reaches MEM stage)
FETCHED:  # of instructions fetched (including instructions squashed because
          of branch misprediction)
RETIRED:  # of instructions completed
BRANCHES: # of branches executed (i.e., resolved)
MISPRED:  # of branches incorrectly predicted

7. Running Your Program

Your simulator should be run using the command format:

	simulate program.mc > output

8. Test Cases

An integral (and graded) part of writing your pipeline simulator will be to
write a suite of test cases to validate any LC2 pipeline simulator.  This
is common practice in the real world--software companies maintain a suite of
test cases for their programs and use this suite to check the program's
correctness after a change.  Writing a comprehensive suite of test cases will
deepen your understanding of the project specification and your program, and
it will help you a lot as you debug your program.

The test cases for this project will be short assembly-language programs that,
after being assembled into machine code, serve as input to a simulator.  You
will submit your suite of test cases together with your simulator, and we will
grade your test suite according to how thoroughly it exercises an LC2
pipeline simulator. 

Your test cases must include a comprehensive evaluation of both control and
data hazard processing to receive full credit.

9. Writeup

Finally you will produce a document describing the overall operation of your
simulator as well as a discussion of how each test case demonstrates correct
operation of some portion of your pipeline implementation.  This includes
executing each instruction type, correctly forwarding all data hazards and
correct operation of both the BTB and gshare predictor.

10. Turning in the Project

Use the unix tar command to create a file containing your simulator source,
test programs and Project writeup (in text, pdf, ps or word). Email the
tar file to Martin Brown (mbrown@cs.fsu.edu) with the subject
"CDA 5155 P1 Submission".

11. Program Fragment

Here's the code for printState and associated functions to help in
understanding the instruction flow through the pipeline.

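#include <stdio.h>
#include <string.h>

void printInstruction(int instr); /* defined below, used by printState */

void
printState(stateType *statePtr)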
{
    int i;
    printf("\n@@@\nstate before cycle %d starts\n", statePtr->cycles);
    printf("\tpc %d\n", statePtr->pc);

    printf("\tdata memory:\n");
	for (i=0; i<statePtr->numMemory; i++) {
	    printf("\t\tdataMem[ %d ] %d\n", i, statePtr->dataMem[i]);
	}
    printf("\tregisters:\n");
	for (i=0; i<NUMREGS; i++) {
	    printf("\t\treg[ %d ] %d\n", i, statePtr->reg[i]);
	}
    printf("\tIFID:\n");
	printf("\t\tinstruction ");
	printInstruction(statePtr->IFID.instr);
	printf("\t\tpcPlus1 %d\n", statePtr->IFID.pcPlus1);
    printf("\tIDEX:\n");
	printf("\t\tinstruction ");
	printInstruction(statePtr->IDEX.instr);
	printf("\t\tpcPlus1 %d\n", statePtr->IDEX.pcPlus1);
	printf("\t\treadRegA %d\n", statePtr->IDEX.readRegA);
	printf("\t\treadRegB %d\n", statePtr->IDEX.readRegB);
	printf("\t\toffset %d\n", statePtr->IDEX.offset);
    printf("\tEXMEM:\n");
	printf("\t\tinstruction ");
	printInstruction(statePtr->EXMEM.instr);
	printf("\t\tbranchTarget %d\n", statePtr->EXMEM.branchTarget);
	printf("\t\taluResult %d\n", statePtr->EXMEM.aluResult);
	printf("\t\treadRegB %d\n", statePtr->EXMEM.readRegB);
    printf("\tMEMWB:\n");
	printf("\t\tinstruction ");
	printInstruction(statePtr->MEMWB.instr);
	printf("\t\twriteData %d\n", statePtr->MEMWB.writeData);
    printf("\tWBEND:\n");
	printf("\t\tinstruction ");
	printInstruction(statePtr->WBEND.instr);
	printf("\t\twriteData %d\n", statePtr->WBEND.writeData);
}

int
field0(int instruction)
{
    return( (instruction>>19) & 0x7);
}

int
field1(int instruction)
{
    return( (instruction>>16) & 0x7);
}

int
field2(int instruction)
{
    return(instruction & 0xFFFF);
}

int opcode(int instruction)
{
    return(instruction>>22);
}

void
printInstruction(int instr)
{
    char opcodeString[10];
    if (opcode(instr) == ADD) {
	strcpy(opcodeString, "add");
    } else if (opcode(instr) == NAND) {
	strcpy(opcodeString, "nand");
    } else if (opcode(instr) == LW) {
	strcpy(opcodeString, "lw");
    } else if (opcode(instr) == SW) {
	strcpy(opcodeString, "sw");
    } else if (opcode(instr) == BEQ) {
	strcpy(opcodeString, "beq");
    } else if (opcode(instr) == MULT) {
	strcpy(opcodeString, "mult");
    } else if (opcode(instr) == HALT) {
	strcpy(opcodeString, "halt");
    } else if (opcode(instr) == NOOP) {
	strcpy(opcodeString, "noop");
    } else {
	strcpy(opcodeString, "data");
    }

    printf("%s %d %d %d\n", opcodeString, field0(instr), field1(instr),
	field2(instr));
}

12. Sample Assembly-Language Program and Output

Here is a sample assembly-language program:

	lw	0	1	data1	$1= mem[data1]
	halt
data1	.fill	12345

and its corresponding output.  Note especially how halt is done (the add 0 0 0
instructions after the halt are from memory locations after the halt, which
were initialized to 0).  Do you know where the add 0 0 12345 instruction came
from?

memory[0]=8454146
memory[1]=25165824
memory[2]=12345
3 memory words
	instruction memory:
		instrMem[ 0 ] lw 0 1 2
		instrMem[ 1 ] halt 0 0 0
		instrMem[ 2 ] add 0 0 12345

@@@
state before cycle 0 starts
	pc 0
	data memory:
		dataMem[ 0 ] 8454146
		dataMem[ 1 ] 25165824
		dataMem[ 2 ] 12345
	registers:
		reg[ 0 ] 0
		reg[ 1 ] 0
		reg[ 2 ] 0
		reg[ 3 ] 0
		reg[ 4 ] 0
		reg[ 5 ] 0
		reg[ 6 ] 0
		reg[ 7 ] 0
	IFID:
		instruction noop 0 0 0
		pcPlus1 -12973480
	IDEX:
		instruction noop 0 0 0
		pcPlus1 0
		readRegA 6
		readRegB 1
		offset 0
	EXMEM:
		instruction noop 0 0 0
		branchTarget -12974332
		aluResult -14024712
		readRegB 12
	MEMWB:
		instruction noop 0 0 0
		writeData -14040720
	WBEND:
		instruction noop 0 0 0
		writeData -4262240

@@@
state before cycle 1 starts
	pc 1
	data memory:
		dataMem[ 0 ] 8454146
		dataMem[ 1 ] 25165824
		dataMem[ 2 ] 12345
	registers:
		reg[ 0 ] 0
		reg[ 1 ] 0
		reg[ 2 ] 0
		reg[ 3 ] 0
		reg[ 4 ] 0
		reg[ 5 ] 0
		reg[ 6 ] 0
		reg[ 7 ] 0
	IFID:
		instruction lw 0 1 2
		pcPlus1 1
	IDEX:
		instruction noop 0 0 0
		pcPlus1 -12973480
		readRegA 0
		readRegB 0
		offset 0
	EXMEM:
		instruction noop 0 0 0
		branchTarget 0
		aluResult -14024712
		readRegB 12
	MEMWB:
		instruction noop 0 0 0
		writeData -14040720
	WBEND:
		instruction noop 0 0 0
		writeData -14040720

@@@
state before cycle 2 starts
	pc 2
	data memory:
		dataMem[ 0 ] 8454146
		dataMem[ 1 ] 25165824
		dataMem[ 2 ] 12345
	registers:
		reg[ 0 ] 0
		reg[ 1 ] 0
		reg[ 2 ] 0
		reg[ 3 ] 0
		reg[ 4 ] 0
		reg[ 5 ] 0
		reg[ 6 ] 0
		reg[ 7 ] 0
	IFID:
		instruction halt 0 0 0
		pcPlus1 2
	IDEX:
		instruction lw 0 1 2
		pcPlus1 1
		readRegA 0
		readRegB 0
		offset 2
	EXMEM:
		instruction noop 0 0 0
		branchTarget -12973480
		aluResult -14024712
		readRegB 12
	MEMWB:
		instruction noop 0 0 0
		writeData -14040720
	WBEND:
		instruction noop 0 0 0
		writeData -14040720

@@@
state before cycle 3 starts
	pc 3
	data memory:
		dataMem[ 0 ] 8454146
		dataMem[ 1 ] 25165824
		dataMem[ 2 ] 12345
	registers:
		reg[ 0 ] 0
		reg[ 1 ] 0
		reg[ 2 ] 0
		reg[ 3 ] 0
		reg[ 4 ] 0
		reg[ 5 ] 0
		reg[ 6 ] 0
		reg[ 7 ] 0
	IFID:
		instruction add 0 0 12345
		pcPlus1 3
	IDEX:
		instruction halt 0 0 0
		pcPlus1 2
		readRegA 0
		readRegB 0
		offset 0
	EXMEM:
		instruction lw 0 1 2
		branchTarget 3
		aluResult 2
		readRegB 0
	MEMWB:
		instruction noop 0 0 0
		writeData -14040720
	WBEND:
		instruction noop 0 0 0
		writeData -14040720

@@@
state before cycle 4 starts
	pc 4
	data memory:
		dataMem[ 0 ] 8454146
		dataMem[ 1 ] 25165824
		dataMem[ 2 ] 12345
	registers:
		reg[ 0 ] 0
		reg[ 1 ] 0
		reg[ 2 ] 0
		reg[ 3 ] 0
		reg[ 4 ] 0
		reg[ 5 ] 0
		reg[ 6 ] 0
		reg[ 7 ] 0
	IFID:
		instruction add 0 0 0
		pcPlus1 4
	IDEX:
		instruction add 0 0 12345
		pcPlus1 3
		readRegA 0
		readRegB 0
		offset 12345
	EXMEM:
		instruction halt 0 0 0
		branchTarget 2
		aluResult 2
		readRegB 0
	MEMWB:
		instruction lw 0 1 2
		writeData 12345
	WBEND:
		instruction noop 0 0 0
		writeData -14040720

@@@
state before cycle 5 starts
	pc 5
	data memory:
		dataMem[ 0 ] 8454146
		dataMem[ 1 ] 25165824
		dataMem[ 2 ] 12345
	registers:
		reg[ 0 ] 0
		reg[ 1 ] 12345
		reg[ 2 ] 0
		reg[ 3 ] 0
		reg[ 4 ] 0
		reg[ 5 ] 0
		reg[ 6 ] 0
		reg[ 7 ] 0
	IFID:
		instruction add 0 0 0
		pcPlus1 5
	IDEX:
		instruction add 0 0 0
		pcPlus1 4
		readRegA 0
		readRegB 0
		offset 0
	EXMEM:
		instruction add 0 0 12345
		branchTarget 12348
		aluResult 0
		readRegB 0
	MEMWB:
		instruction halt 0 0 0
		writeData 12345
	WBEND:
		instruction lw 0 1 2
		writeData 12345
machine halted
CYCLES:   5
FETCHED:  2
RETIRED:  2
BRANCHES: 0
MISPRED:  0