Chapter 1
see Concepts Introduced in Chapter 5
In this chapter we will go over some of the details for the implementation
of a single cycle and multiple cycle processors for a subset of the MIPS
instructions.
We will investigate the construction of the datapath and control (control
being accomplished with both finite state machines and microprogramming).
We will also go over how exceptions and interrupts are handled by a
processor.
see Subset of MIPS Instruction Set
We will look at the implementation of a processor that implements a
subset of the MIPS instruction set.
Note that there are no typically long running instructions, such as
integer multiply and divide or floating-point instructions.
We just don't have time to go over everything.
see Figure 5.1: An abstract view of the implementation of the MIPS
subset showing the major functional units and the major connections
between them.
Several points should be observed:
-
Separate instruction and data memory.
-
PC is used to determine where to fetch the next instruction from the
instruction memory.
-
The register operands used by an instruction are specified by fields of
that instruction.
-
ALU is used to calculate an arithmetic result, compute a memory address,
or perform a comparison for a branch.
-
The result of an arithmetic/logical instruction is written back into
the register file.
-
Details about the implementation of a branch instruction are missing
(no update of the PC).
see Logic Elements
There are two different types of logic elements we will be using:
-
combinational - Outputs depend only on the current inputs.
This includes the ALU; given same set of inputs, it always produces
the same output because no internal storage.
-
state - Has some internal storage, has at least two inputs
(the data value to be written and the clock which controls when
the data value is to be stored) and one output (the value written
in the previous clock).
State elements include registers and memory.
State elements allow values to be stored across cycles.
see Types of Signals
Three general types of signals (clock is when to do something,
control is whether or not we should do it, data is the actual
data to be stored or manipulated).
Note asserted does not mean high voltage.
It means logically true.
see Figure 5.2: Combinational logic, state elements, and the clock
are closely related.
The clocking methodology defines when signals can be read or
written.
This text assumes an edge triggered clocking methodology,
which means that state elements update their internal storage only
on a clock edge.
Combinational logic has its inputs coming from a set of state
elements (e.g. register) and its outputs written to a set of state
elements (e.g. register).
The clock is another signal to these elements that periodically
changes between high and low signals.
The clock cycle time or period is the amount of time taken from
one edge to the same edge (e.g. rising edge to next rising edge).
The clock cycle time is broken into periods when the clock is
high and low (two phase is common).
There is a certain small window of time during which the inputs
to a state element must have a stable value.
The clock edge makes it appear instantaneous, but it isn't.
see Figure 5.3: An edge-triggered methodology allows a state
element to be read and written in the same clock cycle without
creating a race that could lead to indeterminate data values.
So a state element can be read, its value sent through combinational
logic, and the new value be written back to the state element in a
single clock cycle.
Also, data can be written at the end of the previous clock cycle into
a state element and read at the beginning of the next cycle from the
same state element.
see Implementing an Instruction Set
We will look at both implementing the datapath and control of a
our subset of the MIPS processor.
First, we will examine an implementation that allows each instruction to
complete in a single long clock cycle.
Thus, all instructions take the same length of time.
This is inefficient because some instructions take longer than others.
We will next look at multiple cycle instructions, which is more
efficient.
In the next chapter, we will examine overlapping different instructions
in a multicycle implementation, which is called pipelining.
see Figure 5.4: Two state elements are needed to store and access
instructions, and an adder is needed to compute the next instruction
address.
-
Memory is needed to store instructions.
Today most machines have a separate fast memory areas for instructions
and data.
Note with instructions, no processor write access is required.
Note that instructions are at some point written to instruction memory
(e.g. load time).
-
The PC (program counter) is a 32-bit register used to hold the address
of the current instruction.
-
The adder is used to increment the PC to the next instruction.
see Figure 5.5: A portion of the datapath used for fetching instructions
and incrementing the program counter.
The value in the PC is presented to the memory system, which will fetch
the instruction.
The value in the PC along with a value of 4 is input to the adder so we
can prepare to fetch the next instruction.
A value of 4 is used since memory is byte addressable and each instruction
is 4 bytes long.
see Figure 5.6: The two elements needed to implement R-format ALU
operations are the register file and the ALU.
For the R-type instructions (add
,
sub
, and
,
or
, and slt
)
they all read two registers and update a register as a result.
The registers are stored in a structure called a register file,
which can be read or updated by specifying the appropriate register
number.
There are two read input ports since we could read two registers in one
instruction and one write input port specifying the register number
since at most one register will be updated.
There are three inputs that are 5 bits wide, which allows one of 32 registers
to be specified.
There is also one input data port that will contain the data to be
written.
There are two output ports containing the values of the registers specified
to be read.
These three ports are 32 bits wide since they contain data.
Values are always read, but a write may not always occur.
Thus, we need a write control signal.
The ALU takes two 32-bit inputs and generates one 32-bit output.
The control signal is 3-bits wide in this implementation to specify the
appropriate operation to be performed.
Note that though there are 9 different instructions, some ALU operations
(e.g. add) will be used for several instructions.
The zero output will be used on a beq instruction.
see Figure 5.7: The datapath for R-type instructions.
Note that the registers are specified from the fields of the instruction.
The outputs of the register file are input to the ALU.
The output of the ALU is input to the register file.
see Figure 5.8: The two units needed to implement loads and stores,
in addition to the register file and ALU of Figure 5.6,
are the data memory unit and the sign-extension unit.
Note loads read from memory and stores write to memory.
So, we need both read and write signals (could read, write, or neither
when instruction does not reference memory), but never both at once.
Need a 32-bit address and data input ports (data for stores) and a 32-bit
output port for loads.
Also, need to sign extend a 16-bit offset value to 32 bits before it is
input to the ALU.
Thus, the offset can be positive or negative (often negative offset from
the frame pointer).
see Figure 5.9: The datapath for a load or store that does a register
access, followed by a memory address calculation, then a read or write
from memory, and a write into the register file if the instruction is a
load.
The lw
and sw
each compute an address by adding an offset to the value in a register.
Thus, they need to reference the register file and the ALU to perform
the addition.
Note the value of one register is added to the sign extended offset
value in the instruction.
On a store the data from the other register is sent to memory.
On a load the data read from memory is placed in the register file
at the location specified in the second register.
see Figure 5.10: The datapath for a branch uses an ALU for evaluation
of the branch condition and a separate adder for computing the branch
target as the sum of the incremented PC and the sign-extended lower 16
bits of the instruction (the branch displacement), shifted left 2 bits.
The beq
instruction will cause a transfer of
control if the two specified registers have equivalent values.
The target of the branch is stored in the instruction as a displacement
from the instruction following the branch (the PC was already incremented
at the point it is referenced).
The displacement value is sign extended (to allow backward branches) and
left shifted by 2 since all instructions are aligned on 4 byte boundaries.
The ALU provides an output (the zero signal) indicating whether the comparison
result was equality.
Other control logic (a multiplexor) uses the ALU result to determine whether
the branch target should be assigned to the PC.
The j
instruction takes the lower 26 bits, left
shifts by 2, to produce the new value to be assigned to the PC.
The top 4 bits of the PC are unaffected.
This still leaves 2**26 possible addresses of instructions in the text
segment or over 67 million instructions.
We will first look at creating a datapath to process these instructions
in a single long clock cycle.
Certain elements will have to be duplicated, such as memory for both
data and instructions (i.e. split caches).
A functional unit will have to be duplicated since the signals continue
to pass through the combinational logic during the clock cycle.
see Figure 5.14: How the ALU control bits are
set depending on the ALUOp control bits and the different function
codes for the R-type instruction.
Now we come to the real challenge, specifying the control.
The ALU will perform one of 5 functions (specified by three control
lines).
The ALUOp is a 2-bit control field.
A 00 indicates add for loads and stores, a 01 is a subtract for
branches (to see if two registers are equal), and a 10 indicates to
use the funct field.
Note the XXXXXX indicates that these bits are ignored.
The output of the ALU control unit is a 3-bit field that is fed into
the ALU to select the operation to be performed.
Note that (lw
, sw
,
and add) and (branch equal and subtract) results in the same ALU control input
signals.
see Figure 5.15: The truth table for the three ALU control bits
(called Operation).
Creating a truth table is a nice intermediate step in designing the
logic for the ALU control unit.
Note that many of the bit combinations are now ignored.
For instance, a 1X indicates that if the first bit is a 1, then it does
not matter what the second bit is.
Once the truth table is constructed, it can be automatically optimized
(e.g. Karnaugh maps) and turned into gates.
Optimization takes care of the X (don't cares).
see MIPS Instruction Formats
-
rd is the dest register.
funct is the ALU function.
The shamt field is used for shifting (it doesn't make sense to shift
more than 31 positions for a 32 bit value, so only need 5 bits).
-
rt is the dest register for loads and the source register containing
the value to be stored for stores.
rs is the base address register for the displacement addressing mode.
address is the displacement in bytes.
-
rs and rt are the registers to be compared for equality.
address is the displacement in instructions that will be added to the
PC.
-
Note that the most significant 4 PC bits are unchanged for a jump.
The least significant two bits are always zero.
The 26 bits in the middle come from the address field.
So, the opcode, 16-bit offset, and various registers to be read are
always in the same place.
This simplifies the decoding of an instruction.
see Figure 5.18: The effect of each of the seven control signals.
Note the RegDst signal identifies one of two registers to be written
(rt for loads, rd for others) and the RegWrite indicates whether one
of these registers should be written (no for beq
instruction).
Six of these can be set based on the opcode alone.
Only the PCsrc is different.
It should be set if the instruction is a beq
and
Zero output of the ALU (used for the equality test) is asserted.
see Figure 5.20: The setting of the control lines is completely
determined by the opcode fields of the instruction.
This table shows how each of the control lines should be set based
on the type of instruction to be performed (i.e. the opcode).
So the inputs are the opcode bits and the outputs are the control signals.
The RegDst and MemtoReg have no effect for the
sw
and beq
since the RegWrite signal is deasserted.
Note the only control lines not determined by the type of instruction
is the zero output of the ALU, which is used to determine whether or
not a branch is taken, and the output of the ALU control, which uses the
funct field.
This truth table can be used to automatically construct a PLA
(Programmable Logic Array), which consists of an array of AND gates
followed by an array of OR gates.
This approach implements a sum of products, where the
and
gates for a set of products and the
or
gates form the sum of those products.
see Figure 5.29: The simple control and datapath are extended to handle
the jump instruction.
Click on the names of individual elements within the figure to have the
script frame automatically scrolled to an explanation of that element.
Summary of figure.
This figure shows the design of a simple control and datapath within a
processor to support single cycle execution of nine MIPS instructions
(lw
, sw
,
add
, sub
,
and
, or
,
slt
, beq
,
j
).
You should read the explanation in Sections 5.2 and 5.3 before attempting
to understand this figure.
PC.
The PC (program counter) is a 32-bit register used to hold the address
of the current instruction.
Instruction memory.
Memory is needed to store instructions.
Note with instruction memory, no processor write access is shown.
Instructions are at some point written to instruction memory
(e.g. load time), which will be described in more detail in Chapter 7.
The input to the instruction memory is the 32-bit address from the PC.
The output is a 32-bit instruction at that address.
Adder with PC and 4 as input.
This adder adds 4 to the value of the PC.
Each instruction on the MIPS is 4 bytes and the machine is byte addressable.
Unless the instruction is a taken transfer of control, the address of the
next instruction will be 4 bytes after the beginning of the current
instruction.
Shift left 2 with Instruction [25-0] as input.
The format for the jump instruction (as shown in Figure 5.28) has
a 26-bit address in the least significant bits of the instruction.
This field contains a word address.
It needs to be converted to a byte address, so we left shift the address
by 2.
This has the effect of placing zeroes in the two least significant bits.
Remember that all MIPS instructions are 4 bytes long.
They are also aligned on a 4 byte boundary.
In other words, the address of each instruction is an integer multiple of
4, which means that the two least significant bits of these addresses are
always zero.
0-1 Mux with Instruction [20-16] and Instruction [15-11] as inputs.
This multiplexor indicates which register is to be written to the register
file.
For R-type instructions (add
,
sub
, and
,
or
, and slt
) the
rd field in bits [15-11] of the instruction indicates the register
destination.
For the lw
instruction, the destination register
is in the rt field, which is in bits [20-16].
The multiplexor is controlled by the RegDst signal, which uses the
instruction opcode to determine which field should be selected.
Control.
The input to the control unit is the 6-bit opcode field from the instruction.
Almost every control signal is asserted or deasserted based on the opcode
field only.
These control signals are used to select the values that are output from each
multiplexor.
They are also used to indicate how the data memory is to be accessed.
Registers.
The register file contains the 32 MIPS general-purpose (integer/address)
registers.
One can view the register file as an array of registers.
An R-type instruction uses two register values as input and one register
as output.
Thus, there are two register read data ports and one register write data port.
Each of these data ports are 32 bits in width since registers contain
32 bits each.
Besides the write data port, there are also 3 other inputs to the register
file.
Two of these inputs specify which two registers to be read and one specifies
the register to be written.
Each of these inputs are 5 bits wide, which allows one of 32 registers to
be specified.
Two values are read from register file on every cycle, even though sometimes
one or both of these values may not be used.
However, we don't want a write into the register file to occur on some
instructions (sw
and
beq
).
The RegWrite signal controls whether the data received in the write
data port will be written into the Write register.
Sign Extend.
Sign extension is required for all instructions using the I-format, which
includes loads, stores, and branches.
The value being extended is the bits in the address field (the
least significant 16 bits of the instruction).
The output of the sign extension will eventually be input to some type of
adder, which needs two 32-bit values as input.
For loads and stores, we can have a negative displacement from the register
value.
For instance, negative displacements are often used when accessing local
values from a stack pointer.
Backward branches, such as those found in loops, will need a negative
displacement from the program counter.
0-1 Mux with Read data port 2 and sign extended value as inputs.
This multiplexor controls which one of two values is the second input
value to the ALU.
For R-type instructions it will be the value from Read data port 2 of
the register file.
For I-type instructions it will be the value from the sign-extension
unit.
Shift left 2 with sign-extended value as input.
A branch instruction contains an offset in instructions, not bytes.
This offset needs to be added to the PC+4 value, which is a byte
address.
Thus, we need to convert the instruction offset to a byte offset.
This is accomplished by left shifting the value by 2 since each
instruction is 4 bytes in length.
ALU Control.
The ALU control unit decides which type of result will be output
from the ALU.
One input to the ALU control unit is the ALUOp, which is a 2-bit control
signal indicating a 00 (add for loads and stores), a 01 (subtract for
branches), and a 10 (use the funct field).
Another input is the funct field.
Remember that for R-type instructions, the opcode field is always
zero and the funct field is used to determine the type of operation
to perform.
The output of the ALU control unit is a 3-bit field that is fed into the
ALU to select the operation to be performed.
While the number of bits used for the encoding of the ALUOp and
ALU Control output signals is determined by the number of operations
it controls, the encoding of these signals is really arbitrary.
ALU.
The ALU is the arithmetic/logic unit.
It is used to perform all operations for R-type instructions,
performs addition for loads and stores, and performs a subtraction
for branches.
The two input values are the value of the register specified by the
rs field and either the value of the register specified by the
rt field or the sign-extended value.
The result that is output from the ALU is selected by the ALU Control
signal.
Remember that the ALU always performs an AND, OR, ADD.
The value selected for the ALU result from among these operations
(and the LESS operation) depends on the type of instruction being executed.
The other output is the Zero signal, which is asserted when the two
input values are identical.
Adder with PC+4 and Shift Left 2 as inputs.
This adder is used to calculated the target address of a branch.
One input is the PC+4, which is used instead of just PC since PC+4
could be calculated in parallel with loading the instruction from
the instruction memory.
The other input is the sign-extended value left shifted by two, which
represents the offset of the target address from the PC+4 address.
Note that the value was sign extended instead of zero extended since
the offset could be negative (branch could be backwards).
The ALU is not used to perform this addition since it in parallel
detects if the values of the two specified registers are equal.
AND gate with Branch and Zero signals as input.
The PC should only be updated with the calculated branch target when
the instruction was a branch (branch asserted) and the two registers
specified in the beq
instruction have the same
value (zero asserted).
Data memory.
Data memory is needed since all the data values cannot fit in a limited
number of registers.
The 32-bit address of the memory to be accessed is input to the data memory.
If the instruction was a store, then the value to be written is obtained
from the Write data input port.
If the instruction is a load, then the value to be read is written to the
Read data port.
There are two signals (MemRead and MemWrite) that are input
to the data memory rather than just a single 1-bit signal.
Access to data memory is sometimes expensive and may cause exceptions.
Thus, data memory should only be accessed on a load or a store.
If neither signal is asserted, then the data memory is not accessed during
that cycle.
0-1 Mux with PC+4 and adder output as inputs.
The next instruction to be fetched can be the next sequential instruction
(PC+4) or the branch target (PC+4 + sign-extended offset<<2).
The branch target is only selected when the instruction is a
beq
and the two register values compared were found to be equal.
1-0 Mux with Jump address and 0-1 Mux output as inputs.
This multiplexor chooses between the jump address and the output
of the previous multiplexor.
The jump address is calculated by concatenating the 28-bit address
left shifted by 2 address field with the 4 most significant bits of the PC.
These 4 MS bits of the PC are assumed to never change.
So there can be at most 2^26 (over 67 million) instructions in a program,
which is almost always enough.
The value selected is controlled by the Jump signal, which is only
asserted when executing a j
instruction.
Remember that unlike branch instructions, jump instructions unconditionally
transfer control to its specified target.
Note that this multiplexor could have been combined with the previous
multiplexor.
1-0 Mux with Read data and ALU result as inputs.
This multiplexor selects whether the Read data or the ALU result will
be sent to the register file.
The MemtoReg is asserted on a load (selects the Read data input)
and is deasserted on an R-type instruction (selects the ALU result).
If a sw
, beq
, or
j
instruction is executing,
then it does not matter whether the MemtoReg signal is asserted
or deasserted since the RegWrite signal will be deasserted.
see Advantages of a Multicycle Implementation
The main advantage is performance.
Some instructions require fewer steps (cycles) than other instructions.
For the single cycle implementation, the cycle time is determined by
the longest instruction.
If we were implementing long running instructions, such as multiplies,
divides, or floating-point operations, then the cycle time would be
significantly increased.
So the total time executed by a program can be reduced.
see Figure 5.34 (top portion): The action caused by the setting of each control
signal in Figure 5.33 on page 383.
The 1-bit control signals are used when 2 possible actions are needed.
see Figure 5.34 (bottom portion): The action caused by the setting of each
control signal in Figure 5.33 on page 383.
The 2-bit control signals are used when 3 or 4 actions are required.
see Multicycle Implementation Steps of Execution
The execution of an instruction will take at most 5 steps (clock cycles).
Branches and jumps complete in 3 steps, R-type and
sw
instructions complete in 4 steps, and lw
instructions complete in 5 steps.
Note that all actions within a single step are performed in parallel.
see F5.35: Summary of the steps taken to execute any instruction class.
Click in a table entry within the table to have the
script frame automatically scrolled to an explanation of that step.
Summary of table.
This table shows the actions taken at the different steps of each
type of instruction.
These actions are illustrated as effects on registers, memory,
internal registers, etc.
This figure illustrates that classes of instructions can
require different number of cycles to complete.
Step 1.
Step 1 always fetches an instruction from memory and stores the instruction
in the instruction register.
Step 1 also always increments the PC to point to the next
sequential instruction.
Sometimes the PC will be updated with a different value in a later
cycle of the instruction's execution.
However, at this point the processor does not know the type of instruction
and the PC is incremented in case the next sequential address will
be needed.
Step 2.
Step 2 always reads the values of two source registers from the register
file specified by the rs and rt fields and store these
values in the internal registers A and B, respectively.
Step 2 also always assigns to the ALUOut the sum of the
PC and the sign-extended least significant 16 bits of the
instruction register that is left shifted by 2.
Thus, the ALUOut contains the branch target address if the
instruction was a branch.
Step 2 determines the type of instruction and and the remaining steps that
are taken depend on the instruction type.
Step 3 for the R-type instructions.
This step performs the arithmetic operation on the values contained in
the internal A and B registers.
The result is placed in the internal ALUOut register.
Step 3 for load or store instructions.
This step sign-extends the 16 least significant bits of the instruction
register, adds that result to the value in the internal A register
and stores the result in the internal ALUOut register.
This result is the address of the data to be dereferenced in memory.
Step 3 for branch instructions.
This step compares the values in the A and B registers
and assigns the value of the ALUOut register to the PC
only if the two values are equal.
Note that there would be an !=
comparison if
the branch was a bne
instead of a
beq
instruction.
Step 3 for jump instructions.
This step left shifts by 2 the 26 least significant bits in the
instruction register, concatenates it with the 4 most significant
bits of the PC, and assigns the result to the PC.
Step 4 for the R-type instructions.
This step assigns the value in the ALUOut internal register
to register in the register file specified by the rd field,
which is in bits [15-11] of the instruction register.
Step 4 for load instructions.
This step loads the value from memory at the address specified in
the internal ALUOut register and copies the value into the
internal MDR register.
Step 4 for store instructions.
This step assigns the value in the internal B register and
stores the value into memory at the address specified in
the internal ALUOut register.
Step 5 for load instructions.
This step copies the value in the internal MDR register to the
register in the register file specified by the rt field,
which is in bits [20-16] of the instruction register.
see Multicycle Control
Now we can design the control for the multicycle implementation.
see Finite State Machine for Multicycle Control
This looks a lot like a finite automata.
However, we also have to specify the output, since we are specifying
more than a recognizer.
Each state in the finite state machine takes a single clock cycle.
Another name for this would be hardwired control.
see Figure C.7: The control unit for the MIPS will consist of some
control logic and a register to hold the states.
The logic determines the signals to assert and the next state.
Appendix C goes into more detail.
see Figure C.8: The logic equations for the control unit shown in a
shorthand form.
The control signals to be asserted or deasserted depend only on what state
the machine is in.
The nextstate signals depend both on the current state and the opcode of
the instruction.
see Microprogramming for Multicycle Control
Microprogramming used to be the dominant method of specifying control.
We will discuss later why it is no longer being used in recently designed
machines.
The dispatch is like a case statement in a high level language.
The control is like control at the macroinstruction level.
Typically, this is accomplished by using the opcode of the macroinstruction
as the target of a microinstruction address, which is a portion of a table
with each entry containing the address of a microroutine to implement the
microinstruction.
The set of microinstructions implementing a macroinstruction is a
microroutine.
The set of all microinstructions is called a microprogram.
see Exceptions and Interrupts
The hardest part of implementing control is dealing with exceptions
and interrupts.
An exception or interrupt causes a change to supervisor mode to permit
actions not allowed in user mode.
Control transfers to a location in the O.S. code, so this mode is safe
since the O.S. is considered safe for that mode.
The address of the exception causing instruction is saved so it can be
restarted later.
see Figure 5.48: The multicycle datapath with the addition needed to
implement exceptions.
Click in a element within the figure to have the
script frame automatically scrolled to an explanation of that element.
Summary of figure.
This figure shows the design of a simple control and datapath within a
processor to support multicycle execution of nine MIPS instructions
(lw
, sw
,
add
, sub
,
and
, or
,
slt
, beq
,
j
).
You should read the explanation in Sections 5.4 before attempting
to understand this figure.
The main differences between the multicycle implementation and the
single cycle implementation are (a) a single memory for both instructions
and data, (b) the elimination of two adders since this work is now done
by the ALU, (c) internal registers to save state about an instruction
between cycles.
In fact, the existence of temporary registers gives a clue where
data is stored at the end of one cycle and is read at the beginning of
the next cycle.
The multicycle implementation requires more complicated control since
different control signals need to be asserted in the different cycles
of an instruction.
More details about the control for a multicycle implementation is described
in Figure 5.50.
PC.
The PC (program counter) is a 32-bit register used to hold the address
of the current instruction.
0-1 Mux with PC and ALUOut as inputs.
This multiplexer selects between the PC and the output of the ALU for
the address to access memory.
Memory.
There is only a single memory in the multicycle implementation.
One input to the memory unit is the 32-bit address from the multiplexor
output.
If the instruction was a store, then the MemWrite signal is asserted
and the value to be written is obtained from the Write data input
port.
If the instruction is a load or the instruction needs to be fetched, then the
MemRead signal is asserted and the value to be read is written to the
MemData port.
OR gate.
This gate is used to determine if the PC should be updated during this cycle.
The possible reasons for updating the PC include incrementing its value by 4
to point to the next sequential instruction, performing an unconditional jump,
or having a conditional branch be taken.
AND gate.
This gate is used to determine if the PC should be updated due to a conditional
branch being taken.
A conditional branch is only take when both the PCWriteCond is
asserted (due to the beq
conditional branch
instruction being executed in a specific cycle) and the Zero signal
asserted (due to two registers having the same value).
Instruction register.
The instruction register is an internal register that contains the
bits of the instruction fetched from memory.
It is only updated when the IRWrite signal is asserted.
Memory data register.
The memory data register is an internal register that contains the
data fetched from memory.
There is no control signal into this unit, so the value in the register
must be used on succeeding cycle or it will be overwritten.
Control.
The input to the control unit is the 6-bit opcode field from the instruction.
The criteria for asserting or deasserting every control signal is based on
the opcode field.
These control signals are used to select the values that are output from each
multiplexor.
What is not shown in this figure is that control signals are not asserted
or deasserted based on the opcode field alone.
The relative cycle for the instruction also plays an important role.
How this can be accomplished is described in a later figure.
0-1 Mux with Instruction [20-16] and Instruction [15-11] as inputs.
This multiplexor indicates which register is to be written to the register
file.
For R-type instructions (add
,
sub
, and
,
or
,
and slt
) the rd
field in bits [15-11] of the instruction indicates the register destination.
For the lw
instruction, the destination register
is in the rt field, which is in bits [20-16].
The multiplexor is controlled by the RegDst signal, which uses the
instruction opcode to determine which field should be selected.
0-1 Mux with Memory data register and ALUOut as inputs.
This multiplexor indicates which value is to be written to the register
file.
One input can come from the Memory data register, which obtained
its value as the result of a load from memory.
The other input is from the ALUOut, which is the result of an
R-type (and
, or
,
add
, sub
, or
slt
) instruction.
This multiplexor is controlled by the MemtoReg signal.
Registers.
The register file contains the 32 MIPS general-purpose (integer/address)
registers.
One can view the register file as an array of registers.
An R-type instruction uses two register values as input and one register
as output.
Thus, there are two register read data ports and one register write data port.
Each of these data ports are 32 bits in width since registers contain
32 bits each.
Besides the write data port, there are also 3 other inputs to the register
file.
Two of these inputs specify which two registers to be read and one specifies
the register to be written.
Each of these inputs are 5 bits wide, which allows one of 32 registers to
be specified.
Two values are read from register file on every cycle, even though sometimes
one or both of these values may not be used.
However, we don't want a write into the register file to occur on some
instructions (sw
and
beq
).
The RegWrite signal controls whether the data received in the write
data port will be written into the Write register.
Sign Extend.
Sign extension is required for all instructions using the I-format, which
includes loads, stores, and branches.
The value being extended is the bits in the address field (the
least significant 16 bits of the instruction).
The output of the sign extension will eventually be input to some type of
adder, which needs two 32-bit values as input.
For loads and stores, we can have a negative displacement from the register
value.
For instance, negative displacements are often used when accessing local
values from a stack pointer.
Backward branches, such as those found in loops, will need a negative
displacement from the program counter.
Shift left 2 with sign-extended value as input.
A branch instruction contains an offset in instructions, not bytes.
This offset needs to be added to the PC+4 value, which is a byte
address.
Thus, we need to convert the instruction offset to a byte offset.
This is accomplished by left shifting the value by 2 since each
instruction is 4 bytes in length.
A.
This is an internal register that contains the value of the register
loaded from the register file that is referred to by the rs field.
This register is used to keep this value between two consecutive clock cycles.
B.
This is an internal register that contains the value of the register
loaded from the register file that is referred to by the rt field.
This register is used to keep this value between two consecutive clock cycles.
0-1 Mux with PC and adder output as inputs.
The next instruction to be fetched can be the next sequential instruction
(PC+4) or a branch target (PC+4 + sign-extended offset << 2).
In either case the PC must at some point be incremented.
Most instructions eventually perform some type of operation on the register
specified by the rs field.
The ALUSrcA signal decides if the first input to the ALU will
come from the PC or the register file.
0-1-2-3 Mux.
This multiplexor decides whether the input is from the second register value
obtained from the register file (e.g. for R-type instructions), a constant
of 4 to increment the PC to point to the next sequential instruction, the
sign-extended value used as an offset from a register in a displacement
address for loads and stores, or the left-shifted by 2 sign-extended value,
which is used as an offset for branches.
The ALUSrcB 2-bit signal is used to select from among these 4 possible
inputs.
ALU Control.
The ALU control unit decides which type of result will be output
from the ALU.
One input to the ALU control unit is the ALUOp, which is a 2-bit
control signal indicating a 00 (add for loads and stores), a 01 (subtract for
branches), and a 10 (use the funct field).
Another input is the funct field.
Remember that for R-type instructions, the opcode field is always
zero and the funct field is used to determine the type of operation
to perform.
The output of the ALU control unit is a 3-bit field that is fed into the
ALU to select the operation to be performed.
While the number of bits used for the encoding of the ALUOp and
ALU Control output signals is determined by the number of operations
it controls, the encoding of these signals is really arbitrary.
ALU.
The ALU is the arithmetic/logic unit.
It is used to perform all operations for R-type instructions,
performs addition for loads and stores, and performs a subtraction
for branches.
The two input values are the value of the register specified by the
rs field and either the value of the register specified by the
rt field or the sign-extended value.
The result that is output from the ALU is selected by the ALU Control
signal.
Remember that the ALU always performs an AND, OR, ADD.
The value selected for the ALU result from among these operations
(and the LESS operation) depends on the type of instruction being executed.
The other output is the Zero signal, which is asserted when the two
input values are identical.
Shift left 2 with Instruction [25-0] as input.
The format for the jump instruction (as shown in Figure 5.28) has
a 26-bit address in the least significant bits of the instruction.
This field contains a word address.
It needs to be converted to a byte address, so we left shift the address
by 2.
This has the effect of placing zeroes in the two least significant bits.
Remember that all MIPS instructions are 4 bytes long.
They are also aligned on a 4 byte boundary.
In other words, the address of each instruction is an integer multiple of
4, which means that the two least significant bits of these addresses are
always zero.
ALUOut.
This is an internal register to hold the output of the ALU between
clock cycles.
0-1 Mux selecting an input for the Cause register.
This multiplexor selects an input for the Cause internal register,
which depends on the IntCause signal indicating the type of exception.
0-1-2-3 Mux selecting an input to be routed to the PC.
The next instruction to be fetched can be the next sequential instruction
(PC+4), the branch target (PC+4 + sign-extended_offset<<2), the jump target
(PC[31-28] | jump_address <<2), or the address for the exception handling
routine.
The branch target is only selected when the instruction is a
beq
and the two register values compared were found to be equal.
The jump address is calculated by concatenating the 28-bit address
left shifted by 2 address field with the 4 most significant bits of the PC.
These 4 MS bits of the PC are assumed to never change.
So there can be at most 2^26 (over 67 million) instructions in a program,
which is almost always enough.
The value selected is controlled by the PCSource signal, which is a
2-bit signal that indicates one of 3 values.
The exception handler address is set when an exception occurs.
EPC.
This is an internal register that contains the address of the instruction
that caused the exception.
The reason this is needed in case the operating system can handle the
exception without aborting the program and it needs to transfer control
back to the program.
Cause.
This is an internal register that contains a code indicating the cause of
the exception.
The exception handling code looks at the cause register to decide which portion
of the exception handler should be used to handle the exception.
see Figure 5.50: This shows the finite state machine with the additions
to handle exception detection.
Click in a state (bubble) within the figure to have the
script frame automatically scrolled to an explanation of that state.
Summary of figure.
This figure shows the representation for a finite state machine to control
the datapath of the multicycle implementation (Figure 5.33).
Each bubble represents a state.
The arcs represent transitions between states.
When there is more than one transition from a state, the transition is
labeled to indicate on what condition that transition is made.
Within each state are the signals that are asserted or deasserted (assigned a
particular set of values for multibit signals).
Note that all paths eventually lead back to the start state (state 0) to
allow the next instruction to be executed.
State 0.
This is the start state.
The MemRead is asserted so a read can occur from memory.
The ALUSrcA is deasserted so that the PC is input to the
ALU so it can be incremented to contain the address of the next
sequential instruction.
The IorD is deasserted so the address used to access memory
comes from the PC.
The IRWrite is asserted so the instruction register will be updated
with the value read from memory.
The ALUSrcB is set so the second input to the ALU is the
value 4 so the PC can be incremented to the next sequential
instruction.
The PCWrite is asserted so the PC will be updated with the address
of the next sequential instruction.
The PCSource is set so the value assigned to the PC is the
immediate output of the ALU.
The next state is always state 1 since the type of instruction has yet to
be determined.
State 1.
The ALUSrcA is set so the first input to the ALU comes from
the PC.
The ALUSrcB is set so the second input to the ALU comes from
sign-extended, left-shifted by 2, 16-bit address field of the
instruction.
The ALUOp is set so the ALU performs an addition.
Thus, the branch target address is calculated and the ALU result
is stored in the ALUOut register.
The next state is determined by the opcode field of the instruction.
State 2.
This state is used for load and store instructions.
The ALUSrcA signal is asserted so the first input to the ALU
comes from the value of the internal A register.
The ALUSrcB signal is set so that the second input to the ALU
comes from the 16-bit sign-extended value.
The ALUOp is set so an addition is performed by the ALU.
The ALU Result is the address to access memory and is placed in the
ALUOut internal register.
The next state depends on whether the opcode indicates if the
instruction was a load or a store.
State 3.
This state is used for load instructions.
The MemRead signal is asserted so a value can be loaded from memory.
The IorD is asserted so the address to access memory comes from the
ALUOut internal register.
The value loaded from memory is stored in the MDR internal register.
State 4.
This state is used for load instructions.
The RegDst is deasserted so that the specified register to update
comes from bits [20-16] (field rt) of the instruction register.
The RegWrite signal is asserted so the register file will be updated.
The MemtoReg signal is asserted so the value to be stored in the
specified register comes from the MDR internal register.
State 5.
This state is used for store instructions.
The MemWrite signal is asserted so that memory will be updated.
The IorD is asserted so the address to access memory comes from the
ALUOut internal register.
This value is stored in memory during this cycle.
State 6.
This state is used for R-type instructions.
The ALUSrcA signal is asserted so the first ALU input comes
from the A internal register.
The ALUSrcB signal is set so that the second ALU input comes
from the B internal register.
The ALUOp signal is set so that the operation performed by the
ALU is determined by examining bits [5-0] (the funct field)
of the instruction register.
The ALU Result is stored in the ALUOut internal register.
State 7.
This state is used for R-type instructions.
The RegDst signal is asserted so that the register updated in the
register file is specified by bits [15-11] (field rd) of the
instruction register.
The RegWrite signal is asserted so that the register file will
be updated.
The MemtoReg signal is deasserted so that the value to store in the
register file comes from the ALUOut internal register.
State 8.
This state is used for branch instructions.
The ALUSrcA signal is asserted so that the first input to the
ALU is from the A internal register.
The ALUSrcB signal is set so that the second input to the ALU
is from the B internal register.
The ALUOp signal is set so that the ALU performs a
subtraction and compares the two values to see if they are equal.
The PCWriteCond signal is asserted so that the Zero signal
be considered in which value to assign to the PC.
The PCSource signal is set so that the value in the ALUOut
internal register is the one that is routed to the PC.
The PC will only be updated if the Zero gets asserted as a
result of the ALU operation.
State 9.
This state is used for jump instructions.
The PCWrite signal is asserted since the PC needs to be
updated.
The PCSource signal is set so that the Jump address is routed
to the PC.
The PC gets updated with the jump target address.
State 10.
This state is used when an undefined instruction is encountered.
The IntCause signal is deasserted to indicate the cause of the
exception.
The CauseWrite signal is asserted so the Cause register will
get updated.
The ALUSrcA signal is deasserted so the first input to the
ALU will come from the PC.
The ALUSrcB is set so that 4 is the second ALU input.
The ALUOp signal is set so that a subtraction occurs.
Remember for the multicycle implementation, the PC has already been
incremented by 4.
So the processor needs to subtract 4 to get the address of the instruction that
caused the exception.
The EPCWrite signal is asserted so the EPC register will be
updated.
The EPC will hold the return address.
The PCWrite signal is asserted to cause the PC to be updated.
The PCSource signal is set so that the address for the exception
handling routine will be routed to the PC.
State 11.
This state is used when an arithmetic overflow occurs.
The IntCause signal is asserted to indicate the cause of the exception.
The CauseWrite signal is asserted so the Cause register will
get updated.
The ALUSrcA signal is deasserted so the first input to the
ALU will come from the PC.
The ALUSrcB is set so that 4 is the second ALU input.
The ALUOp signal is set so that a subtraction occurs.
Remember for the multicycle implementation, the PC has already been
incremented by 4.
So the processor needs to subtract 4 to get the address of the instruction that
caused the exception.
The EPCWrite signal is asserted so the EPC register will be
updated.
The EPC will hold the return address.
The PCWrite signal is asserted to cause the PC to be updated.
The PCSource signal is set so that the address for the exception
handling
routine will be routed to the PC.
see Fallacies and Pitfalls
-
Pitfall: Microcode implementing a complex instruction may not be
faster than a sequence using simpler instructions.
Almost all machines used to be microcoded.
Now few machines are designed today with control implemented via
microprogramming.
The introduction of caches no longer caused the microcode to have a
fetch performance advantage (microcode and cache both have same fetch time).
An optimizing compiler can exploit instructions generated at a similar
level to microcode.
Microcoded machines tend to be slower than hardwired machines.
Also CAD programs can automate most of the process of control, so there
is little advantage to using microcode.
Will understand this more when we go over pipelining.
-
Fallacy: If there is space in the control store, new instructions
are free of cost.
One has to consider the design cost for these new instructions.
Also, upward compatibility of executables would make the manufacturer
keep these instructions even when you need the extra space for other
purposes.