Chapter 1

see Concepts Introduced in Chapter 5

In this chapter we will go over some of the details for the implementation of a single cycle and multiple cycle processors for a subset of the MIPS instructions. We will investigate the construction of the datapath and control (control being accomplished with both finite state machines and microprogramming). We will also go over how exceptions and interrupts are handled by a processor.

see Subset of MIPS Instruction Set

We will look at the implementation of a processor that implements a subset of the MIPS instruction set. Note that there are no typically long running instructions, such as integer multiply and divide or floating-point instructions. We just don't have time to go over everything.

see Figure 5.1: An abstract view of the implementation of the MIPS subset showing the major functional units and the major connections between them.

Several points should be observed:
  1. Separate instruction and data memory.
  2. PC is used to determine where to fetch the next instruction from the instruction memory.
  3. The register operands used by an instruction are specified by fields of that instruction.
  4. ALU is used to calculate an arithmetic result, compute a memory address, or perform a comparison for a branch.
  5. The result of an arithmetic/logical instruction is written back into the register file.
  6. Details about the implementation of a branch instruction are missing (no update of the PC).

see Logic Elements

There are two different types of logic elements we will be using:
  1. combinational - Outputs depend only on the current inputs. This includes the ALU; given same set of inputs, it always produces the same output because no internal storage.
  2. state - Has some internal storage, has at least two inputs (the data value to be written and the clock which controls when the data value is to be stored) and one output (the value written in the previous clock). State elements include registers and memory. State elements allow values to be stored across cycles.

see Types of Signals

Three general types of signals (clock is when to do something, control is whether or not we should do it, data is the actual data to be stored or manipulated). Note asserted does not mean high voltage. It means logically true.

see Figure 5.2: Combinational logic, state elements, and the clock are closely related.

The clocking methodology defines when signals can be read or written. This text assumes an edge triggered clocking methodology, which means that state elements update their internal storage only on a clock edge. Combinational logic has its inputs coming from a set of state elements (e.g. register) and its outputs written to a set of state elements (e.g. register). The clock is another signal to these elements that periodically changes between high and low signals. The clock cycle time or period is the amount of time taken from one edge to the same edge (e.g. rising edge to next rising edge). The clock cycle time is broken into periods when the clock is high and low (two phase is common). There is a certain small window of time during which the inputs to a state element must have a stable value. The clock edge makes it appear instantaneous, but it isn't.

see Figure 5.3: An edge-triggered methodology allows a state element to be read and written in the same clock cycle without creating a race that could lead to indeterminate data values.

So a state element can be read, its value sent through combinational logic, and the new value be written back to the state element in a single clock cycle. Also, data can be written at the end of the previous clock cycle into a state element and read at the beginning of the next cycle from the same state element.

see Implementing an Instruction Set

We will look at both implementing the datapath and control of a our subset of the MIPS processor. First, we will examine an implementation that allows each instruction to complete in a single long clock cycle. Thus, all instructions take the same length of time. This is inefficient because some instructions take longer than others. We will next look at multiple cycle instructions, which is more efficient. In the next chapter, we will examine overlapping different instructions in a multicycle implementation, which is called pipelining.

see Figure 5.4: Two state elements are needed to store and access instructions, and an adder is needed to compute the next instruction address.

  1. Memory is needed to store instructions. Today most machines have a separate fast memory areas for instructions and data. Note with instructions, no processor write access is required. Note that instructions are at some point written to instruction memory (e.g. load time).
  2. The PC (program counter) is a 32-bit register used to hold the address of the current instruction.
  3. The adder is used to increment the PC to the next instruction.

see Figure 5.5: A portion of the datapath used for fetching instructions and incrementing the program counter.

The value in the PC is presented to the memory system, which will fetch the instruction. The value in the PC along with a value of 4 is input to the adder so we can prepare to fetch the next instruction. A value of 4 is used since memory is byte addressable and each instruction is 4 bytes long.

see Figure 5.6: The two elements needed to implement R-format ALU operations are the register file and the ALU.

For the R-type instructions (add, sub, and, or, and slt) they all read two registers and update a register as a result. The registers are stored in a structure called a register file, which can be read or updated by specifying the appropriate register number. There are two read input ports since we could read two registers in one instruction and one write input port specifying the register number since at most one register will be updated. There are three inputs that are 5 bits wide, which allows one of 32 registers to be specified. There is also one input data port that will contain the data to be written. There are two output ports containing the values of the registers specified to be read. These three ports are 32 bits wide since they contain data. Values are always read, but a write may not always occur. Thus, we need a write control signal.

The ALU takes two 32-bit inputs and generates one 32-bit output. The control signal is 3-bits wide in this implementation to specify the appropriate operation to be performed. Note that though there are 9 different instructions, some ALU operations (e.g. add) will be used for several instructions. The zero output will be used on a beq instruction.

see Figure 5.7: The datapath for R-type instructions.

Note that the registers are specified from the fields of the instruction. The outputs of the register file are input to the ALU. The output of the ALU is input to the register file.

see Figure 5.8: The two units needed to implement loads and stores, in addition to the register file and ALU of Figure 5.6, are the data memory unit and the sign-extension unit.

Note loads read from memory and stores write to memory. So, we need both read and write signals (could read, write, or neither when instruction does not reference memory), but never both at once. Need a 32-bit address and data input ports (data for stores) and a 32-bit output port for loads. Also, need to sign extend a 16-bit offset value to 32 bits before it is input to the ALU. Thus, the offset can be positive or negative (often negative offset from the frame pointer).

see Figure 5.9: The datapath for a load or store that does a register access, followed by a memory address calculation, then a read or write from memory, and a write into the register file if the instruction is a load.

The lw and sw each compute an address by adding an offset to the value in a register. Thus, they need to reference the register file and the ALU to perform the addition. Note the value of one register is added to the sign extended offset value in the instruction. On a store the data from the other register is sent to memory. On a load the data read from memory is placed in the register file at the location specified in the second register.

see Figure 5.10: The datapath for a branch uses an ALU for evaluation of the branch condition and a separate adder for computing the branch target as the sum of the incremented PC and the sign-extended lower 16 bits of the instruction (the branch displacement), shifted left 2 bits.

The beq instruction will cause a transfer of control if the two specified registers have equivalent values. The target of the branch is stored in the instruction as a displacement from the instruction following the branch (the PC was already incremented at the point it is referenced). The displacement value is sign extended (to allow backward branches) and left shifted by 2 since all instructions are aligned on 4 byte boundaries. The ALU provides an output (the zero signal) indicating whether the comparison result was equality. Other control logic (a multiplexor) uses the ALU result to determine whether the branch target should be assigned to the PC.

The j instruction takes the lower 26 bits, left shifts by 2, to produce the new value to be assigned to the PC. The top 4 bits of the PC are unaffected. This still leaves 2**26 possible addresses of instructions in the text segment or over 67 million instructions.

We will first look at creating a datapath to process these instructions in a single long clock cycle. Certain elements will have to be duplicated, such as memory for both data and instructions (i.e. split caches). A functional unit will have to be duplicated since the signals continue to pass through the combinational logic during the clock cycle.

see Figure 5.14: How the ALU control bits are set depending on the ALUOp control bits and the different function codes for the R-type instruction.

Now we come to the real challenge, specifying the control. The ALU will perform one of 5 functions (specified by three control lines). The ALUOp is a 2-bit control field. A 00 indicates add for loads and stores, a 01 is a subtract for branches (to see if two registers are equal), and a 10 indicates to use the funct field. Note the XXXXXX indicates that these bits are ignored.

The output of the ALU control unit is a 3-bit field that is fed into the ALU to select the operation to be performed. Note that (lw, sw, and add) and (branch equal and subtract) results in the same ALU control input signals.

see Figure 5.15: The truth table for the three ALU control bits (called Operation).

Creating a truth table is a nice intermediate step in designing the logic for the ALU control unit. Note that many of the bit combinations are now ignored. For instance, a 1X indicates that if the first bit is a 1, then it does not matter what the second bit is.

Once the truth table is constructed, it can be automatically optimized (e.g. Karnaugh maps) and turned into gates. Optimization takes care of the X (don't cares).

see MIPS Instruction Formats

  1. rd is the dest register. funct is the ALU function. The shamt field is used for shifting (it doesn't make sense to shift more than 31 positions for a 32 bit value, so only need 5 bits).
  2. rt is the dest register for loads and the source register containing the value to be stored for stores. rs is the base address register for the displacement addressing mode. address is the displacement in bytes.
  3. rs and rt are the registers to be compared for equality. address is the displacement in instructions that will be added to the PC.
  4. Note that the most significant 4 PC bits are unchanged for a jump. The least significant two bits are always zero. The 26 bits in the middle come from the address field.
So, the opcode, 16-bit offset, and various registers to be read are always in the same place. This simplifies the decoding of an instruction.

see Figure 5.18: The effect of each of the seven control signals.

Note the RegDst signal identifies one of two registers to be written (rt for loads, rd for others) and the RegWrite indicates whether one of these registers should be written (no for beq instruction).

Six of these can be set based on the opcode alone. Only the PCsrc is different. It should be set if the instruction is a beq and Zero output of the ALU (used for the equality test) is asserted.

see Figure 5.20: The setting of the control lines is completely determined by the opcode fields of the instruction.

This table shows how each of the control lines should be set based on the type of instruction to be performed (i.e. the opcode). So the inputs are the opcode bits and the outputs are the control signals. The RegDst and MemtoReg have no effect for the sw and beq since the RegWrite signal is deasserted. Note the only control lines not determined by the type of instruction is the zero output of the ALU, which is used to determine whether or not a branch is taken, and the output of the ALU control, which uses the funct field. This truth table can be used to automatically construct a PLA (Programmable Logic Array), which consists of an array of AND gates followed by an array of OR gates. This approach implements a sum of products, where the and gates for a set of products and the or gates form the sum of those products.

see Figure 5.29: The simple control and datapath are extended to handle the jump instruction.

Click on the names of individual elements within the figure to have the script frame automatically scrolled to an explanation of that element.

Summary of figure. This figure shows the design of a simple control and datapath within a processor to support single cycle execution of nine MIPS instructions (lw, sw, add, sub, and, or, slt, beq, j). You should read the explanation in Sections 5.2 and 5.3 before attempting to understand this figure.

PC. The PC (program counter) is a 32-bit register used to hold the address of the current instruction.

Instruction memory. Memory is needed to store instructions. Note with instruction memory, no processor write access is shown. Instructions are at some point written to instruction memory (e.g. load time), which will be described in more detail in Chapter 7. The input to the instruction memory is the 32-bit address from the PC. The output is a 32-bit instruction at that address.

Adder with PC and 4 as input. This adder adds 4 to the value of the PC. Each instruction on the MIPS is 4 bytes and the machine is byte addressable. Unless the instruction is a taken transfer of control, the address of the next instruction will be 4 bytes after the beginning of the current instruction.

Shift left 2 with Instruction [25-0] as input. The format for the jump instruction (as shown in Figure 5.28) has a 26-bit address in the least significant bits of the instruction. This field contains a word address. It needs to be converted to a byte address, so we left shift the address by 2. This has the effect of placing zeroes in the two least significant bits. Remember that all MIPS instructions are 4 bytes long. They are also aligned on a 4 byte boundary. In other words, the address of each instruction is an integer multiple of 4, which means that the two least significant bits of these addresses are always zero.

0-1 Mux with Instruction [20-16] and Instruction [15-11] as inputs. This multiplexor indicates which register is to be written to the register file. For R-type instructions (add, sub, and, or, and slt) the rd field in bits [15-11] of the instruction indicates the register destination. For the lw instruction, the destination register is in the rt field, which is in bits [20-16]. The multiplexor is controlled by the RegDst signal, which uses the instruction opcode to determine which field should be selected.

Control. The input to the control unit is the 6-bit opcode field from the instruction. Almost every control signal is asserted or deasserted based on the opcode field only. These control signals are used to select the values that are output from each multiplexor. They are also used to indicate how the data memory is to be accessed.

Registers. The register file contains the 32 MIPS general-purpose (integer/address) registers. One can view the register file as an array of registers. An R-type instruction uses two register values as input and one register as output. Thus, there are two register read data ports and one register write data port. Each of these data ports are 32 bits in width since registers contain 32 bits each. Besides the write data port, there are also 3 other inputs to the register file. Two of these inputs specify which two registers to be read and one specifies the register to be written. Each of these inputs are 5 bits wide, which allows one of 32 registers to be specified. Two values are read from register file on every cycle, even though sometimes one or both of these values may not be used. However, we don't want a write into the register file to occur on some instructions (sw and beq). The RegWrite signal controls whether the data received in the write data port will be written into the Write register.

Sign Extend. Sign extension is required for all instructions using the I-format, which includes loads, stores, and branches. The value being extended is the bits in the address field (the least significant 16 bits of the instruction). The output of the sign extension will eventually be input to some type of adder, which needs two 32-bit values as input. For loads and stores, we can have a negative displacement from the register value. For instance, negative displacements are often used when accessing local values from a stack pointer. Backward branches, such as those found in loops, will need a negative displacement from the program counter.

0-1 Mux with Read data port 2 and sign extended value as inputs. This multiplexor controls which one of two values is the second input value to the ALU. For R-type instructions it will be the value from Read data port 2 of the register file. For I-type instructions it will be the value from the sign-extension unit.

Shift left 2 with sign-extended value as input. A branch instruction contains an offset in instructions, not bytes. This offset needs to be added to the PC+4 value, which is a byte address. Thus, we need to convert the instruction offset to a byte offset. This is accomplished by left shifting the value by 2 since each instruction is 4 bytes in length.

ALU Control. The ALU control unit decides which type of result will be output from the ALU. One input to the ALU control unit is the ALUOp, which is a 2-bit control signal indicating a 00 (add for loads and stores), a 01 (subtract for branches), and a 10 (use the funct field). Another input is the funct field. Remember that for R-type instructions, the opcode field is always zero and the funct field is used to determine the type of operation to perform. The output of the ALU control unit is a 3-bit field that is fed into the ALU to select the operation to be performed. While the number of bits used for the encoding of the ALUOp and ALU Control output signals is determined by the number of operations it controls, the encoding of these signals is really arbitrary.

ALU. The ALU is the arithmetic/logic unit. It is used to perform all operations for R-type instructions, performs addition for loads and stores, and performs a subtraction for branches. The two input values are the value of the register specified by the rs field and either the value of the register specified by the rt field or the sign-extended value. The result that is output from the ALU is selected by the ALU Control signal. Remember that the ALU always performs an AND, OR, ADD. The value selected for the ALU result from among these operations (and the LESS operation) depends on the type of instruction being executed. The other output is the Zero signal, which is asserted when the two input values are identical.

Adder with PC+4 and Shift Left 2 as inputs. This adder is used to calculated the target address of a branch. One input is the PC+4, which is used instead of just PC since PC+4 could be calculated in parallel with loading the instruction from the instruction memory. The other input is the sign-extended value left shifted by two, which represents the offset of the target address from the PC+4 address. Note that the value was sign extended instead of zero extended since the offset could be negative (branch could be backwards). The ALU is not used to perform this addition since it in parallel detects if the values of the two specified registers are equal.

AND gate with Branch and Zero signals as input. The PC should only be updated with the calculated branch target when the instruction was a branch (branch asserted) and the two registers specified in the beq instruction have the same value (zero asserted).

Data memory. Data memory is needed since all the data values cannot fit in a limited number of registers. The 32-bit address of the memory to be accessed is input to the data memory. If the instruction was a store, then the value to be written is obtained from the Write data input port. If the instruction is a load, then the value to be read is written to the Read data port. There are two signals (MemRead and MemWrite) that are input to the data memory rather than just a single 1-bit signal. Access to data memory is sometimes expensive and may cause exceptions. Thus, data memory should only be accessed on a load or a store. If neither signal is asserted, then the data memory is not accessed during that cycle.

0-1 Mux with PC+4 and adder output as inputs. The next instruction to be fetched can be the next sequential instruction (PC+4) or the branch target (PC+4 + sign-extended offset<<2). The branch target is only selected when the instruction is a beq and the two register values compared were found to be equal.

1-0 Mux with Jump address and 0-1 Mux output as inputs. This multiplexor chooses between the jump address and the output of the previous multiplexor. The jump address is calculated by concatenating the 28-bit address left shifted by 2 address field with the 4 most significant bits of the PC. These 4 MS bits of the PC are assumed to never change. So there can be at most 2^26 (over 67 million) instructions in a program, which is almost always enough. The value selected is controlled by the Jump signal, which is only asserted when executing a j instruction. Remember that unlike branch instructions, jump instructions unconditionally transfer control to its specified target. Note that this multiplexor could have been combined with the previous multiplexor.

1-0 Mux with Read data and ALU result as inputs. This multiplexor selects whether the Read data or the ALU result will be sent to the register file. The MemtoReg is asserted on a load (selects the Read data input) and is deasserted on an R-type instruction (selects the ALU result). If a sw, beq, or j instruction is executing, then it does not matter whether the MemtoReg signal is asserted or deasserted since the RegWrite signal will be deasserted.

see Advantages of a Multicycle Implementation

The main advantage is performance. Some instructions require fewer steps (cycles) than other instructions. For the single cycle implementation, the cycle time is determined by the longest instruction. If we were implementing long running instructions, such as multiplies, divides, or floating-point operations, then the cycle time would be significantly increased. So the total time executed by a program can be reduced.

see Figure 5.34 (top portion): The action caused by the setting of each control signal in Figure 5.33 on page 383.

The 1-bit control signals are used when 2 possible actions are needed.

see Figure 5.34 (bottom portion): The action caused by the setting of each control signal in Figure 5.33 on page 383.

The 2-bit control signals are used when 3 or 4 actions are required.

see Multicycle Implementation Steps of Execution

The execution of an instruction will take at most 5 steps (clock cycles). Branches and jumps complete in 3 steps, R-type and sw instructions complete in 4 steps, and lw instructions complete in 5 steps. Note that all actions within a single step are performed in parallel.

see F5.35: Summary of the steps taken to execute any instruction class.

Click in a table entry within the table to have the script frame automatically scrolled to an explanation of that step.

Summary of table. This table shows the actions taken at the different steps of each type of instruction. These actions are illustrated as effects on registers, memory, internal registers, etc. This figure illustrates that classes of instructions can require different number of cycles to complete.

Step 1. Step 1 always fetches an instruction from memory and stores the instruction in the instruction register. Step 1 also always increments the PC to point to the next sequential instruction. Sometimes the PC will be updated with a different value in a later cycle of the instruction's execution. However, at this point the processor does not know the type of instruction and the PC is incremented in case the next sequential address will be needed.

Step 2. Step 2 always reads the values of two source registers from the register file specified by the rs and rt fields and store these values in the internal registers A and B, respectively. Step 2 also always assigns to the ALUOut the sum of the PC and the sign-extended least significant 16 bits of the instruction register that is left shifted by 2. Thus, the ALUOut contains the branch target address if the instruction was a branch. Step 2 determines the type of instruction and and the remaining steps that are taken depend on the instruction type.

Step 3 for the R-type instructions. This step performs the arithmetic operation on the values contained in the internal A and B registers. The result is placed in the internal ALUOut register.

Step 3 for load or store instructions. This step sign-extends the 16 least significant bits of the instruction register, adds that result to the value in the internal A register and stores the result in the internal ALUOut register. This result is the address of the data to be dereferenced in memory.

Step 3 for branch instructions. This step compares the values in the A and B registers and assigns the value of the ALUOut register to the PC only if the two values are equal. Note that there would be an != comparison if the branch was a bne instead of a beq instruction.

Step 3 for jump instructions. This step left shifts by 2 the 26 least significant bits in the instruction register, concatenates it with the 4 most significant bits of the PC, and assigns the result to the PC.

Step 4 for the R-type instructions. This step assigns the value in the ALUOut internal register to register in the register file specified by the rd field, which is in bits [15-11] of the instruction register.

Step 4 for load instructions. This step loads the value from memory at the address specified in the internal ALUOut register and copies the value into the internal MDR register.

Step 4 for store instructions. This step assigns the value in the internal B register and stores the value into memory at the address specified in the internal ALUOut register.

Step 5 for load instructions. This step copies the value in the internal MDR register to the register in the register file specified by the rt field, which is in bits [20-16] of the instruction register.

see Multicycle Control

Now we can design the control for the multicycle implementation.

see Finite State Machine for Multicycle Control

This looks a lot like a finite automata. However, we also have to specify the output, since we are specifying more than a recognizer. Each state in the finite state machine takes a single clock cycle. Another name for this would be hardwired control.

see Figure C.7: The control unit for the MIPS will consist of some control logic and a register to hold the states.

The logic determines the signals to assert and the next state. Appendix C goes into more detail.

see Figure C.8: The logic equations for the control unit shown in a shorthand form.

The control signals to be asserted or deasserted depend only on what state the machine is in. The nextstate signals depend both on the current state and the opcode of the instruction.

see Microprogramming for Multicycle Control

Microprogramming used to be the dominant method of specifying control. We will discuss later why it is no longer being used in recently designed machines. The dispatch is like a case statement in a high level language. The control is like control at the macroinstruction level. Typically, this is accomplished by using the opcode of the macroinstruction as the target of a microinstruction address, which is a portion of a table with each entry containing the address of a microroutine to implement the microinstruction. The set of microinstructions implementing a macroinstruction is a microroutine. The set of all microinstructions is called a microprogram.

see Exceptions and Interrupts

The hardest part of implementing control is dealing with exceptions and interrupts. An exception or interrupt causes a change to supervisor mode to permit actions not allowed in user mode. Control transfers to a location in the O.S. code, so this mode is safe since the O.S. is considered safe for that mode. The address of the exception causing instruction is saved so it can be restarted later.

see Figure 5.48: The multicycle datapath with the addition needed to implement exceptions.

Click in a element within the figure to have the script frame automatically scrolled to an explanation of that element.

Summary of figure. This figure shows the design of a simple control and datapath within a processor to support multicycle execution of nine MIPS instructions (lw, sw, add, sub, and, or, slt, beq, j). You should read the explanation in Sections 5.4 before attempting to understand this figure. The main differences between the multicycle implementation and the single cycle implementation are (a) a single memory for both instructions and data, (b) the elimination of two adders since this work is now done by the ALU, (c) internal registers to save state about an instruction between cycles. In fact, the existence of temporary registers gives a clue where data is stored at the end of one cycle and is read at the beginning of the next cycle. The multicycle implementation requires more complicated control since different control signals need to be asserted in the different cycles of an instruction. More details about the control for a multicycle implementation is described in Figure 5.50.

PC. The PC (program counter) is a 32-bit register used to hold the address of the current instruction.

0-1 Mux with PC and ALUOut as inputs. This multiplexer selects between the PC and the output of the ALU for the address to access memory.

Memory. There is only a single memory in the multicycle implementation. One input to the memory unit is the 32-bit address from the multiplexor output. If the instruction was a store, then the MemWrite signal is asserted and the value to be written is obtained from the Write data input port. If the instruction is a load or the instruction needs to be fetched, then the MemRead signal is asserted and the value to be read is written to the MemData port.

OR gate. This gate is used to determine if the PC should be updated during this cycle. The possible reasons for updating the PC include incrementing its value by 4 to point to the next sequential instruction, performing an unconditional jump, or having a conditional branch be taken.

AND gate. This gate is used to determine if the PC should be updated due to a conditional branch being taken. A conditional branch is only take when both the PCWriteCond is asserted (due to the beq conditional branch instruction being executed in a specific cycle) and the Zero signal asserted (due to two registers having the same value).

Instruction register. The instruction register is an internal register that contains the bits of the instruction fetched from memory. It is only updated when the IRWrite signal is asserted.

Memory data register. The memory data register is an internal register that contains the data fetched from memory. There is no control signal into this unit, so the value in the register must be used on succeeding cycle or it will be overwritten.

Control. The input to the control unit is the 6-bit opcode field from the instruction. The criteria for asserting or deasserting every control signal is based on the opcode field. These control signals are used to select the values that are output from each multiplexor. What is not shown in this figure is that control signals are not asserted or deasserted based on the opcode field alone. The relative cycle for the instruction also plays an important role. How this can be accomplished is described in a later figure.

0-1 Mux with Instruction [20-16] and Instruction [15-11] as inputs. This multiplexor indicates which register is to be written to the register file. For R-type instructions (add, sub, and, or, and slt) the rd field in bits [15-11] of the instruction indicates the register destination. For the lw instruction, the destination register is in the rt field, which is in bits [20-16]. The multiplexor is controlled by the RegDst signal, which uses the instruction opcode to determine which field should be selected.

0-1 Mux with Memory data register and ALUOut as inputs. This multiplexor indicates which value is to be written to the register file. One input can come from the Memory data register, which obtained its value as the result of a load from memory. The other input is from the ALUOut, which is the result of an R-type (and, or, add, sub, or slt) instruction. This multiplexor is controlled by the MemtoReg signal.

Registers. The register file contains the 32 MIPS general-purpose (integer/address) registers. One can view the register file as an array of registers. An R-type instruction uses two register values as input and one register as output. Thus, there are two register read data ports and one register write data port. Each of these data ports are 32 bits in width since registers contain 32 bits each. Besides the write data port, there are also 3 other inputs to the register file. Two of these inputs specify which two registers to be read and one specifies the register to be written. Each of these inputs are 5 bits wide, which allows one of 32 registers to be specified. Two values are read from register file on every cycle, even though sometimes one or both of these values may not be used. However, we don't want a write into the register file to occur on some instructions (sw and beq). The RegWrite signal controls whether the data received in the write data port will be written into the Write register.

Sign Extend. Sign extension is required for all instructions using the I-format, which includes loads, stores, and branches. The value being extended is the bits in the address field (the least significant 16 bits of the instruction). The output of the sign extension will eventually be input to some type of adder, which needs two 32-bit values as input. For loads and stores, we can have a negative displacement from the register value. For instance, negative displacements are often used when accessing local values from a stack pointer. Backward branches, such as those found in loops, will need a negative displacement from the program counter.

Shift left 2 with sign-extended value as input. A branch instruction contains an offset in instructions, not bytes. This offset needs to be added to the PC+4 value, which is a byte address. Thus, we need to convert the instruction offset to a byte offset. This is accomplished by left shifting the value by 2 since each instruction is 4 bytes in length.

A. This is an internal register that contains the value of the register loaded from the register file that is referred to by the rs field. This register is used to keep this value between two consecutive clock cycles.

B. This is an internal register that contains the value of the register loaded from the register file that is referred to by the rt field. This register is used to keep this value between two consecutive clock cycles.

0-1 Mux with PC and adder output as inputs. The next instruction to be fetched can be the next sequential instruction (PC+4) or a branch target (PC+4 + sign-extended offset << 2). In either case the PC must at some point be incremented. Most instructions eventually perform some type of operation on the register specified by the rs field. The ALUSrcA signal decides if the first input to the ALU will come from the PC or the register file.

0-1-2-3 Mux. This multiplexor decides whether the input is from the second register value obtained from the register file (e.g. for R-type instructions), a constant of 4 to increment the PC to point to the next sequential instruction, the sign-extended value used as an offset from a register in a displacement address for loads and stores, or the left-shifted by 2 sign-extended value, which is used as an offset for branches. The ALUSrcB 2-bit signal is used to select from among these 4 possible inputs.

ALU Control. The ALU control unit decides which type of result will be output from the ALU. One input to the ALU control unit is the ALUOp, which is a 2-bit control signal indicating a 00 (add for loads and stores), a 01 (subtract for branches), and a 10 (use the funct field). Another input is the funct field. Remember that for R-type instructions, the opcode field is always zero and the funct field is used to determine the type of operation to perform. The output of the ALU control unit is a 3-bit field that is fed into the ALU to select the operation to be performed. While the number of bits used for the encoding of the ALUOp and ALU Control output signals is determined by the number of operations it controls, the encoding of these signals is really arbitrary.

ALU. The ALU is the arithmetic/logic unit. It is used to perform all operations for R-type instructions, performs addition for loads and stores, and performs a subtraction for branches. The two input values are the value of the register specified by the rs field and either the value of the register specified by the rt field or the sign-extended value. The result that is output from the ALU is selected by the ALU Control signal. Remember that the ALU always performs an AND, OR, ADD. The value selected for the ALU result from among these operations (and the LESS operation) depends on the type of instruction being executed. The other output is the Zero signal, which is asserted when the two input values are identical.

Shift left 2 with Instruction [25-0] as input. The format for the jump instruction (as shown in Figure 5.28) has a 26-bit address in the least significant bits of the instruction. This field contains a word address. It needs to be converted to a byte address, so we left shift the address by 2. This has the effect of placing zeroes in the two least significant bits. Remember that all MIPS instructions are 4 bytes long. They are also aligned on a 4 byte boundary. In other words, the address of each instruction is an integer multiple of 4, which means that the two least significant bits of these addresses are always zero.

ALUOut. This is an internal register to hold the output of the ALU between clock cycles.

0-1 Mux selecting an input for the Cause register. This multiplexor selects an input for the Cause internal register, which depends on the IntCause signal indicating the type of exception.

0-1-2-3 Mux selecting an input to be routed to the PC. The next instruction to be fetched can be the next sequential instruction (PC+4), the branch target (PC+4 + sign-extended_offset<<2), the jump target (PC[31-28] | jump_address <<2), or the address for the exception handling routine. The branch target is only selected when the instruction is a beq and the two register values compared were found to be equal. The jump address is calculated by concatenating the 28-bit address left shifted by 2 address field with the 4 most significant bits of the PC. These 4 MS bits of the PC are assumed to never change. So there can be at most 2^26 (over 67 million) instructions in a program, which is almost always enough. The value selected is controlled by the PCSource signal, which is a 2-bit signal that indicates one of 3 values. The exception handler address is set when an exception occurs.

EPC. This is an internal register that contains the address of the instruction that caused the exception. The reason this is needed in case the operating system can handle the exception without aborting the program and it needs to transfer control back to the program.

Cause. This is an internal register that contains a code indicating the cause of the exception. The exception handling code looks at the cause register to decide which portion of the exception handler should be used to handle the exception.

see Figure 5.50: This shows the finite state machine with the additions to handle exception detection.

Click in a state (bubble) within the figure to have the script frame automatically scrolled to an explanation of that state.

Summary of figure. This figure shows the representation for a finite state machine to control the datapath of the multicycle implementation (Figure 5.33). Each bubble represents a state. The arcs represent transitions between states. When there is more than one transition from a state, the transition is labeled to indicate on what condition that transition is made. Within each state are the signals that are asserted or deasserted (assigned a particular set of values for multibit signals). Note that all paths eventually lead back to the start state (state 0) to allow the next instruction to be executed.

State 0. This is the start state. The MemRead is asserted so a read can occur from memory. The ALUSrcA is deasserted so that the PC is input to the ALU so it can be incremented to contain the address of the next sequential instruction. The IorD is deasserted so the address used to access memory comes from the PC. The IRWrite is asserted so the instruction register will be updated with the value read from memory. The ALUSrcB is set so the second input to the ALU is the value 4 so the PC can be incremented to the next sequential instruction. The PCWrite is asserted so the PC will be updated with the address of the next sequential instruction. The PCSource is set so the value assigned to the PC is the immediate output of the ALU. The next state is always state 1 since the type of instruction has yet to be determined.

State 1. The ALUSrcA is set so the first input to the ALU comes from the PC. The ALUSrcB is set so the second input to the ALU comes from sign-extended, left-shifted by 2, 16-bit address field of the instruction. The ALUOp is set so the ALU performs an addition. Thus, the branch target address is calculated and the ALU result is stored in the ALUOut register. The next state is determined by the opcode field of the instruction.

State 2. This state is used for load and store instructions. The ALUSrcA signal is asserted so the first input to the ALU comes from the value of the internal A register. The ALUSrcB signal is set so that the second input to the ALU comes from the 16-bit sign-extended value. The ALUOp is set so an addition is performed by the ALU. The ALU Result is the address to access memory and is placed in the ALUOut internal register. The next state depends on whether the opcode indicates if the instruction was a load or a store.

State 3. This state is used for load instructions. The MemRead signal is asserted so a value can be loaded from memory. The IorD is asserted so the address to access memory comes from the ALUOut internal register. The value loaded from memory is stored in the MDR internal register.

State 4. This state is used for load instructions. The RegDst is deasserted so that the specified register to update comes from bits [20-16] (field rt) of the instruction register. The RegWrite signal is asserted so the register file will be updated. The MemtoReg signal is asserted so the value to be stored in the specified register comes from the MDR internal register.

State 5. This state is used for store instructions. The MemWrite signal is asserted so that memory will be updated. The IorD is asserted so the address to access memory comes from the ALUOut internal register. This value is stored in memory during this cycle.

State 6. This state is used for R-type instructions. The ALUSrcA signal is asserted so the first ALU input comes from the A internal register. The ALUSrcB signal is set so that the second ALU input comes from the B internal register. The ALUOp signal is set so that the operation performed by the ALU is determined by examining bits [5-0] (the funct field) of the instruction register. The ALU Result is stored in the ALUOut internal register.

State 7. This state is used for R-type instructions. The RegDst signal is asserted so that the register updated in the register file is specified by bits [15-11] (field rd) of the instruction register. The RegWrite signal is asserted so that the register file will be updated. The MemtoReg signal is deasserted so that the value to store in the register file comes from the ALUOut internal register.

State 8. This state is used for branch instructions. The ALUSrcA signal is asserted so that the first input to the ALU is from the A internal register. The ALUSrcB signal is set so that the second input to the ALU is from the B internal register. The ALUOp signal is set so that the ALU performs a subtraction and compares the two values to see if they are equal. The PCWriteCond signal is asserted so that the Zero signal be considered in which value to assign to the PC. The PCSource signal is set so that the value in the ALUOut internal register is the one that is routed to the PC. The PC will only be updated if the Zero gets asserted as a result of the ALU operation.

State 9. This state is used for jump instructions. The PCWrite signal is asserted since the PC needs to be updated. The PCSource signal is set so that the Jump address is routed to the PC. The PC gets updated with the jump target address.

State 10. This state is used when an undefined instruction is encountered. The IntCause signal is deasserted to indicate the cause of the exception. The CauseWrite signal is asserted so the Cause register will get updated. The ALUSrcA signal is deasserted so the first input to the ALU will come from the PC. The ALUSrcB is set so that 4 is the second ALU input. The ALUOp signal is set so that a subtraction occurs. Remember for the multicycle implementation, the PC has already been incremented by 4. So the processor needs to subtract 4 to get the address of the instruction that caused the exception. The EPCWrite signal is asserted so the EPC register will be updated. The EPC will hold the return address. The PCWrite signal is asserted to cause the PC to be updated. The PCSource signal is set so that the address for the exception handling routine will be routed to the PC.

State 11. This state is used when an arithmetic overflow occurs. The IntCause signal is asserted to indicate the cause of the exception. The CauseWrite signal is asserted so the Cause register will get updated. The ALUSrcA signal is deasserted so the first input to the ALU will come from the PC. The ALUSrcB is set so that 4 is the second ALU input. The ALUOp signal is set so that a subtraction occurs. Remember for the multicycle implementation, the PC has already been incremented by 4. So the processor needs to subtract 4 to get the address of the instruction that caused the exception. The EPCWrite signal is asserted so the EPC register will be updated. The EPC will hold the return address. The PCWrite signal is asserted to cause the PC to be updated. The PCSource signal is set so that the address for the exception handling routine will be routed to the PC.

see Fallacies and Pitfalls

  1. Pitfall: Microcode implementing a complex instruction may not be faster than a sequence using simpler instructions. Almost all machines used to be microcoded. Now few machines are designed today with control implemented via microprogramming. The introduction of caches no longer caused the microcode to have a fetch performance advantage (microcode and cache both have same fetch time). An optimizing compiler can exploit instructions generated at a similar level to microcode. Microcoded machines tend to be slower than hardwired machines. Also CAD programs can automate most of the process of control, so there is little advantage to using microcode. Will understand this more when we go over pipelining.
  2. Fallacy: If there is space in the control store, new instructions are free of cost. One has to consider the design cost for these new instructions. Also, upward compatibility of executables would make the manufacturer keep these instructions even when you need the extra space for other purposes.