# **Controller**



## Control Logic "Truth Table" (incomplete)

| Inst[31:0] | BrEq | BrLT | PCSel | ImmSel | BrUn | ASel | BSel | ALUSel | MemRW | RegWEn | WBSel |
|------------|------|------|-------|--------|------|------|------|--------|-------|--------|-------|
| add        | *    | *    | +4    | -      | -    | Reg  | Reg  | Add    | Read  | 1      | ALU   |
| sub        | *    | *    | +4    | -      | -    | Reg  | Reg  | Sub    | Read  | 1      | ALU   |
| (R-R Op)   | *    | *    | +4    | -      | -    | Reg  | Reg  | (Op)   | Read  | 1      | ALU   |
| addi       | *    | *    | +4    | I      | -    | Reg  | Imm  | Add    | Read  | 1      | ALU   |
| lw         | *    | *    | +4    | I      | -    | Reg  | Imm  | Add    | Read  | 1      | Mem   |
| sw         | *    | *    | +4    | S      | -    | Reg  | Imm  | Add    | Write | 0      | -     |
| beq        | 0    | *    | +4    | B      | -    | PC   | Imm  | Add    | Read  | 0      | -     |
| beq        | 1    | *    | ALU   | B      | -    | PC   | Imm  | Add    | Read  | 0      | -     |
| bne        | 0    | *    | ALU   | B      | -    | PC   | Imm  | Add    | Read  | 0      | -     |
| bne        | 1    | *    | +4    | B      | -    | PC   | Imm  | Add    | Read  | 0      | -     |
| blt        | *    | 1    | ALU   | B      | 0    | PC   | Imm  | Add    | Read  | 0      | -     |
| bltu       | *    | 1    | ALU   | B      | 1    | PC   | Imm  | Add    | Read  | 0      | -     |
| jalr       | *    | *    | ALU   | I      | -    | Reg  | Imm  | Add    | Read  | 1      | PC+4  |
| jal        | *    | *    | ALU   | J      | -    | PC   | Imm  | Add    | Read  | 1      | PC+4  |
| auipc      | *    | *    | +4    | U      | -    | PC   | Imm  | Add    | Read  | 1      | ALU   |

Legend: `*` means "for all values"; `-` means "don't care, use any value".

Note: The instruction type is encoded using only 9 bits: inst[30], inst[14:12], and inst[6:2].
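As an illustration of the note above, the following minimal sketch extracts those 9 bits from a 32-bit instruction word. The function name `control_index` and the order in which the three fields are packed are assumptions made for this example, not something the slides specify.

```python
def control_index(inst: int) -> int:
    """Pack inst[30], inst[14:12] and inst[6:2] into one 9-bit value.

    The packing order (bit 30 on top, then funct3, then opcode bits)
    is an arbitrary choice for this sketch.
    """
    bit30 = (inst >> 30) & 0x1       # distinguishes e.g. add from sub
    funct3 = (inst >> 12) & 0x7      # inst[14:12]
    opcode_hi = (inst >> 2) & 0x1F   # inst[6:2]; inst[1:0] is always 0b11 for RV32I
    return (bit30 << 8) | (funct3 << 5) | opcode_hi


# Encodings of "add t0, t1, t2" and "sub t0, t1, t2"
ADD_T0_T1_T2 = 0x007302B3
SUB_T0_T1_T2 = 0x407302B3

add_index = control_index(ADD_T0_T1_T2)
sub_index = control_index(SUB_T0_T1_T2)
```

The two encodings differ only in inst[30], so their 9-bit indices differ only in the top bit.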





#### **Controller Realization Options**

- ROM (Read-Only Memory)
  - Regular structure made from transistors
  - Can be easily reprogrammed during the design process to
    - fix errors
    - add instructions
  - Popular when designing control logic manually
- Combinatorial logic
  - Today, chip designers often use logic synthesis tools to convert truth tables to networks of gates
  - One logic equation for each control signal (common sub-expressions are shared among control signal equations)
  - Can exploit output "don't cares" and input "for all values" to simplify the circuit
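To make the "logic equation per control signal" idea concrete, here is a minimal sketch of one such equation, PCSel, written from the rows of the truth table above. The function name `pc_sel` and the use of mnemonic strings instead of decoded instruction bits are simplifications for this example. The table is incomplete, so the behavior for not-taken `blt`/`bltu` (selecting +4) is an assumption.

```python
def pc_sel(inst: str, br_eq: int, br_lt: int) -> str:
    """Return "ALU" when the next PC comes from the ALU, else "+4"."""
    branch_taken = (
        (inst == "beq" and br_eq == 1)
        or (inst == "bne" and br_eq == 0)
        or (inst in ("blt", "bltu") and br_lt == 1)
    )
    # jal and jalr always take the ALU result, for all values of BrEq/BrLT
    jump = inst in ("jal", "jalr")
    return "ALU" if (branch_taken or jump) else "+4"
```

Every other instruction in the table falls through to `"+4"` regardless of BrEq and BrLT, which is the input "for all values" case that a synthesis tool can exploit.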



# **Instruction Timing**



How can we keep datapath resources (ALU, etc.) busy all the time?

## **Performance Measures**

$$\frac{time}{program} = \frac{instructions}{program} \cdot \frac{cycles}{instruction} \cdot \frac{time}{cycle}$$

#### Instructions per Program

Determined by

- Task specification
- Algorithm, e.g. O(N<sup>2</sup>) vs O(N)
- Programming language
- Compiler
- Instruction Set Architecture (ISA)

#### (Average) Clock cycles per Instruction

Determined by

- ISA (CISC versus RISC)
- Processor implementation (or microarchitecture)
  - E.g. for "our" single-cycle RISC-V design, CPI = 1
  - Pipelined processors, CPI > 1 (next lecture)
  - Superscalar processors, CPI < 1 (next lecture)

#### Time per Cycle (1/Frequency)

Determined by

- Processor microarchitecture (determines critical path through logic gates)
- Technology (e.g. 5nm versus 14nm)
- Supply voltage (lower voltage reduces transistor speed, but improves energy efficiency)

#### For some task (e.g. image compression) ...

|                | Processor A | Processor B |
|----------------|-------------|-------------|
| # Instructions | 1 Million   | 1.5 Million |
| Average CPI    | 2.5         | 1           |
| Clock rate f   | 2.5 GHz     | 2 GHz       |
| Execution time | 1 ms        | 0.75 ms     |

Processor B is faster for this task, despite executing more instructions and having a lower clock rate!
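The execution times in the table follow directly from the performance equation. The sketch below reproduces them; the function name `execution_time` is chosen for this example only.

```python
def execution_time(instructions: float, cpi: float, clock_hz: float) -> float:
    """time/program = instructions/program * cycles/instruction * time/cycle."""
    return instructions * cpi / clock_hz


time_a = execution_time(1_000_000, 2.5, 2.5e9)   # Processor A, in seconds
time_b = execution_time(1_500_000, 1.0, 2.0e9)   # Processor B, in seconds

print(f"A: {time_a * 1e3:.2f} ms, B: {time_b * 1e3:.2f} ms")
```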

$$\frac{energy}{program} = \frac{instructions}{program} \cdot \frac{energy}{instruction}$$

$$\frac{energy}{program} \propto \frac{instructions}{program} \cdot C V_{dd}^{2}$$

- $C$: "capacitance", depends on technology, microarchitecture, and circuit details
- $V_{dd}$: supply voltage, e.g. 1 V

Want to reduce capacitance and voltage to reduce energy per task.

#### **Energy Tradeoff Example**

For instance, a "next-generation" processor (Moore's law):

- Capacitance, $C$: reduced by 15 %
- Supply voltage, $V_{dd}$: reduced by 15 %
- Energy consumption: $(0.85\,C)(0.85\,V_{dd})^2 \approx 0.61\,E$, i.e. about a 39 % reduction

- Significantly improved energy efficiency thanks to
  - Moore's Law AND
  - Reduced supply voltage
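The arithmetic behind this example can be checked with a short sketch based on the proportionality energy ∝ C·V². The function name `relative_energy` is an assumption for this example.

```python
def relative_energy(c_scale: float, v_scale: float) -> float:
    """Energy relative to the old design, using energy proportional to C * V^2."""
    return c_scale * v_scale ** 2


ratio = relative_energy(0.85, 0.85)       # both C and V reduced by 15 %
reduction_percent = (1.0 - ratio) * 100.0

print(f"new energy = {ratio:.3f} E, reduction = {reduction_percent:.1f} %")
```

Because voltage enters squared, the same 15 % cut contributes more through the supply voltage than through the capacitance.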

# **Pipelining**



#### Latency

Time from entering college to graduation

- Serial: 4 years
- Pipelining: 4 years

#### Throughput

Average number of students graduating each year

- Serial: 1000
- Pipelining: 4000

#### Pipelining

- Increases throughput (4x in this example)
- But can never improve latency
  - sometimes worse (additional overhead)

#### Pipelining with RISC-V

| Phase             | Pictogram | t<sub>step</sub> Serial | t<sub>cycle</sub> Pipelined |
|-------------------|-----------|-------------------------|-----------------------------|
| Instruction Fetch | IM        | 200 ps                  | 200 ps                      |
| Reg Read          | Reg       | 100 ps                  | 200 ps                      |
| ALU               | ALU       | 200 ps                  | 200 ps                      |
| Memory            | DM        | 200 ps                  | 200 ps                      |
| Register Write    | Reg       | 100 ps                  | 200 ps                      |
| t<sub>instruction</sub> |     | 800 ps                  | 1000 ps                     |

Instruction sequence:

- `add t0, t1, t2`
- `or t3, t4, t5`
- `sll t6, t0, t3`
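The stage times above can be turned into a rough timing comparison for the three-instruction sequence. This is a minimal sketch that assumes an ideal five-stage pipeline with no stalls (hazards are ignored) and a clock period set by the slowest stage; the names `STAGE_PS`, `serial_time_ps`, and `pipelined_time_ps` are made up for this example.

```python
# Per-stage latency in picoseconds, taken from the serial column
STAGE_PS = {
    "Instruction Fetch": 200,
    "Reg Read": 100,
    "ALU": 200,
    "Memory": 200,
    "Register Write": 100,
}


def serial_time_ps(n_instructions: int) -> int:
    """Each instruction runs all stages back to back before the next starts."""
    return n_instructions * sum(STAGE_PS.values())


def pipelined_time_ps(n_instructions: int) -> int:
    """Ideal pipeline: the clock period is the slowest stage and one
    instruction completes per cycle once the pipeline is full."""
    cycle = max(STAGE_PS.values())
    return (len(STAGE_PS) + n_instructions - 1) * cycle


serial_total = serial_time_ps(3)        # add, or, sll
pipelined_total = pipelined_time_ps(3)
```

A single instruction takes longer in the pipelined design (1000 ps versus 800 ps), but for long instruction streams the time per instruction approaches one 200 ps cycle.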





# **Pipelining Datapath**

### Pipelining RISC-V RV32I Datapath







Control signals derived from instruction

- As in single-cycle implementation
- Information is stored in pipeline registers for use by later stages
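The idea that control information travels with the instruction can be sketched as follows. This is a toy model, not the lecture's datapath: the `Control` fields mirror a few columns of the truth table, and the three pipeline registers (called ID/EX, EX/MEM, and MEM/WB here) are an assumed naming for this example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Control:
    """A subset of the control signals decoded from one instruction."""
    alu_sel: str
    mem_rw: str
    reg_w_en: int
    wb_sel: str


# (ID/EX, EX/MEM, MEM/WB) pipeline registers; None models an empty slot
PipelineRegs = Tuple[Optional[Control], Optional[Control], Optional[Control]]


def step(regs: PipelineRegs, decoded: Optional[Control]) -> PipelineRegs:
    """Advance one clock: each control word moves one register down the pipe."""
    id_ex, ex_mem, _mem_wb = regs
    return (decoded, id_ex, ex_mem)


add_ctrl = Control(alu_sel="Add", mem_rw="Read", reg_w_en=1, wb_sel="ALU")

regs: PipelineRegs = (None, None, None)
regs = step(regs, add_ctrl)   # add is decoded; its control word enters ID/EX
regs = step(regs, None)       # control word moves to EX/MEM
regs = step(regs, None)       # control word reaches MEM/WB for write-back
```

After three clock edges the `reg_w_en` and `wb_sel` values decoded for `add` are available in the last pipeline register, exactly when the write-back stage needs them.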

