HW: Single-cycle, Multi-cycle, and Pipelined
In a single-cycle datapath, instructions are executed one at a time, one
instruction per clock "tick". In other words, the clock cycle time is
set to the time it takes the slowest instruction to fully execute.
In both a multi-cycle datapath and a pipelined datapath, a clock "tick"
is the time an instruction spends in any given stage of the datapath.
Each stage has a latency time associated with it, which is the amount of
time required for the stage to do its job. An example set of stages,
with latencies, might be:
- IF: Instruction fetch (200ps)
- ID: Instruction decode and register file read (100ps)
- EX: Execution or address calculation (e.g., time spent in the ALU)
- MEM: Data memory access (225ps)
- WB: Write back to register file (100ps)
In a multi-cycle datapath, instructions are executed one at a time, but
only go through the stages that they need to. (E.g., R-format
instructions do not have to go through MEM, because they don't read or
write to memory.) In a pipelined architecture, as each instruction
moves on to its next stage, the following instruction moves into the
stage it just vacated, so multiple instructions are in flight at once.
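As an illustration (not part of the assignment), the sketch below shows how these quantities fall out of the stage latencies. The EX latency used here is only a placeholder, since the handout's EX figure belongs in its place, and the per-class stage lists assume the usual MIPS convention in which branches resolve in EX, stores skip WB, and R-format instructions skip MEM.

```python
# Illustration only: how clock cycle times and per-instruction times can be
# derived from stage latencies. The EX latency is a placeholder value, and the
# per-class stage lists are assumptions to be checked against the handout.

STAGE_LATENCY_PS = {
    "IF": 200,   # instruction fetch
    "ID": 100,   # instruction decode + register file read
    "EX": 200,   # placeholder: execution / address calculation
    "MEM": 225,  # data memory access
    "WB": 100,   # write back to register file
}

# Stages each instruction class actually needs. A multi-cycle datapath only
# spends ticks on these; a pipelined datapath walks every instruction through
# all five stages.
STAGES_USED = {
    "R-format": ["IF", "ID", "EX", "WB"],
    "lw":       ["IF", "ID", "EX", "MEM", "WB"],
    "sw":       ["IF", "ID", "EX", "MEM"],
    "branch":   ["IF", "ID", "EX"],
}

def single_cycle_clock_ps():
    # One tick must cover the slowest instruction end to end
    # (lw, which needs every stage).
    return max(sum(STAGE_LATENCY_PS[s] for s in used)
               for used in STAGES_USED.values())

def stage_clock_ps():
    # A multi-cycle or pipelined tick must cover the slowest single stage.
    return max(STAGE_LATENCY_PS.values())

def instruction_time_ps(kind):
    # Time one instruction of the given class spends in the datapath.
    return {
        "single-cycle": single_cycle_clock_ps(),
        "multi-cycle":  stage_clock_ps() * len(STAGES_USED[kind]),
        "pipelined":    stage_clock_ps() * len(STAGE_LATENCY_PS),
    }

if __name__ == "__main__":
    for kind in STAGES_USED:
        print(kind, instruction_time_ps(kind))
```

In short, the single-cycle clock is set by the slowest complete instruction, while the multi-cycle and pipelined clocks are set by the slowest single stage.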
Based on the information above, complete the following exercises:
- A stage's latency describes the minimum amount of time an
instruction would have to spend in that stage, but, depending
on the datapath type, an instruction might spend longer in
a stage than its latency requires. In which datapath(s)
does each instruction spend the minimum amount of time in
each stage? In which datapath(s) does each instruction
spend the same amount of time in each stage, regardless of
the stage's latency?
- Given the latencies described above, what would be the clock
cycle time for a single-cycle datapath? For a multi-cycle
datapath? For a pipelined datapath?
- Which two datapaths would have the same clock cycle time?
- How long would a sw instruction take in each of these datapaths?
- How long would a beq instruction take in each of these datapaths?
- In which datapath(s) do all instructions take the same amount of time?
- Does pipelining improve the execution time of individual
instructions, improve throughput, or both? (What is throughput?)
- In a pipelined datapath, why does it make sense to require all
instructions to go through all the stages of the datapath,
whether they are necessary or not? (E.g., R-format
instructions have to go through MEM, even though they don't
read or write to memory.)
- Which datapath(s) would become faster if the latency of one
of the shorter stages became shorter? Would that speed up all
instructions, or only some instructions?
- If we could split one stage of the pipelined datapath into
two new stages, each with half the latency of the original
stage, which stage would you split and what would be the new
clock cycle time of the processor?
- In calculating the throughput for programs, for which
datapath(s) would it be useful to know what percentage of
instructions are typical R-format instructions, what percentage
are branches, and what percentage are memory accesses (lw and sw)?
- Assume that the execution of a large program involves the
following percentages of different types of instructions:
  - "Regular" R-format instructions: 45%
  - Branch (bne) instructions: 20%
  - Memory read (lw) instructions: 20%
  - Memory write (sw) instructions: 15%
  - Other instructions (e.g., jumps, etc.): an insignificant amount
Assuming there are no stalls or hazards, how long should a
single-cycle datapath processor with the latencies specified
above take to execute 10000 instructions? What about a
multi-cycle datapath processor? What about a pipelined datapath
processor? (A rough sketch of this kind of calculation appears
after the exercise list.)
- Would you expect actual empirical test results to match
your calculations? Would you expect all three architecture
types (single-cycle, multi-cycle, pipelined) to be equally
similar or different, or would you expect bigger differences
with some architectures than others? Why or why not?
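For the instruction-mix exercise above, here is a rough sketch (not an answer key) of the kind of weighted calculation involved. It assumes no stalls or hazards, reuses a placeholder EX latency, and assumes per-class stage counts for the multi-cycle case; verify both against your own analysis before relying on any of the printed numbers.

```python
# Rough sketch of the instruction-mix calculation; not an answer key.
# Assumes no stalls or hazards. The EX latency is a placeholder and the
# per-class stage counts (used only for multi-cycle) are assumptions.

STAGE_LATENCY_PS = {"IF": 200, "ID": 100, "EX": 200, "MEM": 225, "WB": 100}
STAGES_PER_CLASS = {"R-format": 4, "branch": 3, "lw": 5, "sw": 4}
MIX = {"R-format": 0.45, "branch": 0.20, "lw": 0.20, "sw": 0.15}
N = 10_000

single_cycle_clock = sum(STAGE_LATENCY_PS.values())  # slowest instruction (lw) uses all stages
stage_clock = max(STAGE_LATENCY_PS.values())         # slowest single stage

# Single-cycle: every instruction costs one (long) tick, so the mix is irrelevant.
single_cycle_total = N * single_cycle_clock

# Multi-cycle: each class costs a different number of ticks, so the mix matters.
avg_ticks = sum(frac * STAGES_PER_CLASS[kind] for kind, frac in MIX.items())
multi_cycle_total = N * avg_ticks * stage_clock

# Pipelined: after the pipeline fills, roughly one instruction completes per tick.
pipelined_total = (N + len(STAGE_LATENCY_PS) - 1) * stage_clock

print(f"single-cycle: {single_cycle_total} ps")
print(f"multi-cycle:  {multi_cycle_total:.0f} ps")
print(f"pipelined:    {pipelined_total} ps")
```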