Lecture notes of EECS 151 Fall 2022 & Spring 2024 @ UC Berkeley by Prof. Shao & Prof. Wawrzynek
Author: Peiqi (Stefan) Tian
Notice: LLMs were used to generate some LaTeX code from slides and to explain some concepts.
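In chip design, non-recurring engineering (NRE) costs are one-time engineering expenses. They are incurred during chip design and development and are typically paid once rather than repeatedly. NRE costs cover everything from design through early manufacturing, including:
Design costs: architecture design, RTL coding, verification, etc.
Tool costs: licenses for EDA (electronic design automation) tools.
IP core costs: purchasing third-party intellectual property cores (e.g., processor cores, interface IP).
Prototype manufacturing costs: tape-out costs, including mask fabrication and trial production.
Testing costs: functional and performance testing of chip prototypes.
NRE costs are usually high, especially at advanced process nodes (e.g., 7 nm, 5 nm). Once they are paid, however, the per-unit cost of subsequent volume production drops significantly, so NRE is a major up-front investment in chip development.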
Moore's law: for a fixed chip area, transistor dimensions shrink, signal propagation time decreases, and clock frequency increases.
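Dennard scaling is a theory of transistor scaling proposed in 1974 by Robert H. Dennard and his team. It states that as transistors shrink, their power density stays constant: voltage and current scale down in proportion to the dimensions, so the power per unit area is unchanged. It broke down around the year 2000.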
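Dimension scaling:
Transistor dimensions (such as channel length and width) shrink proportionally, typically by a factor of $1/\kappa$ with $\kappa > 1$.
Voltage and current reduction:
The voltage $V$ and current $I$ also scale by $1/\kappa$.
Constant power density:
The power per transistor $P = VI$ scales by $1/\kappa^2$, as does the transistor area, so the power density is unchanged.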
Device or Circuit Parameter | Scaling Factor ($\kappa > 1$) |
---|---|
Device dimension | $1/\kappa$ |
Doping concentration | $\kappa$ |
Voltage | $1/\kappa$ |
Current | $1/\kappa$ |
Capacitance | $1/\kappa$ |
Delay time per circuit | $1/\kappa$ |
Power dissipation per circuit | $1/\kappa^2$ |
Power density | $1$ |
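Derivation
Assume the transistor dimensions (such as channel length and width) shrink proportionally by a factor of $1/\kappa$, so the area becomes $A' = A/\kappa^2$.
According to Dennard scaling, the voltage and current scale as $V' = V/\kappa$ and $I' = I/\kappa$.
The power of a single transistor is $P = VI$, so $P' = V'I' = P/\kappa^2$.
The power density is $P/A$.
The power density after scaling is $P'/A' = (P/\kappa^2)/(A/\kappa^2) = P/A$, which is unchanged.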
Digital Design: Given a functional description and performance, cost, & power constraints, create an implementation using a set of primitives.
Digital systems are implemented as an interconnection of combinational logic and state elements.
Pre-verified block designs and standard bus interfaces (or adapters) ease integration: they lower NRE costs and shorten TTM (time to market).
Brings together: standard cell blocks, custom analog blocks, processor cores, memory blocks, embedded FPGAs, …
Standardized on-chip buses (or hierarchical interconnect) permit “easy” integration of many blocks.
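An SoC (System on Chip) is an integrated circuit that integrates multiple electronic system components onto a single chip. It typically contains processors, memory, input/output interfaces, digital signal processors (DSPs), graphics processing units (GPUs), and other components, forming a complete system.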
HW design challenge
The HW systems we find useful are huge and the functions we implement are complex
Hardware is inherently parallel (the secret sauce for high performance). Correct design of parallel systems requires management of timing and synchronization.
The technology we use imposes physical constraints that affect cost, speed, and power.
Design Challenges are met by using layers of abstractions.
Specification (e.g., in plain text, RV spec)
 ↓
Model (e.g., in C/C++, QEMU)
 ↓
* Architecture (e.g., in-order/out-of-order, pipelined/single-cycle)
 ↓
* RTL Logic Design (e.g., in Verilog)
 ↓
* Physical design (schematic, layout; ASIC, FPGA)
 ↓
Manufactured part

Validation: Have we built the right thing? (Does the model implement the specification and meet the performance target?)
Verification: Have we built the thing right? (Is the logic/physical design correct?)
Full-custom:
Common in analog design
High NRE
ASIC:
Based around a set of pre-designed (and verified) cells
FPGA
Microprocessor
• Output a function only of the current inputs (no history).
• Truth-table representation of function. Output is explicitly specified for each input combination.
• In general, CL blocks have more than one output signal, in which case, the truth-table will have multiple output columns.
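Peer Instruction: The total number of possible truth tables with 4 inputs and 1 output is 65,536.
My solution:
Each truth table has $2^4 = 16$ rows. Considering the output column as the set of rows whose output is 1, every subset of the 16 rows gives a distinct truth table, so there are $2^{16} = 65{,}536$ tables.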
• Output is a function of both the current inputs and the state.
• State represents the memory.
• State is a function of previous inputs.
• In synchronous digital systems, state is updated on each clock tick.
Any synchronous digital circuit can be represented with:
• Combinational Logic Blocks (CL), plus
• State Elements (registers or memories)
• Clock orchestrates sequencing of CL operations
The mapping of a continuous variable onto a discrete binary variable is done by defining logic levels.
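Suppose the lowest voltage in the system is 0 V, also called ground or GND, and the highest voltage comes from the power supply and is called $V_{DD}$.
------|>o-------|>o-----
      Driver    Receiver
The driver produces a LOW (0) output in the range of 0 to $V_{OL}$ or a HIGH (1) output in the range of $V_{OH}$ to $V_{DD}$. If the receiver gets an input in the range of 0 to $V_{IL}$, it will consider the input to be LOW. If the receiver gets an input in the range of $V_{IH}$ to $V_{DD}$, it will consider the input to be HIGH. If, for some reason such as noise or faulty components, the receiver's input should fall in the forbidden zone between $V_{IL}$ and $V_{IH}$, the behavior of the gate is unpredictable.
The noise margin (NM) is the amount of noise that could be added to a worst-case output such that the signal can still be interpreted as a valid input: $NM_L = V_{IL} - V_{OL}$ and $NM_H = V_{OH} - V_{IH}$.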
A necessary property of any suitable technology for logic circuits is “Restoration” or “Regeneration”.
Circuits need:
to ignore noise and other non-idealities at their inputs, and
to generate "cleaned-up" signals at their outputs.
• The voltage transfer characteristic (VTC) describes the output voltage as a function of the input voltage.
• Logic levels are chosen at the points where the slope of the curve is -1, which maximizes the noise margins.
NRE (fixed cost)
Recurring costs (variable cost): the cost to manufacture, test, and package a unit, hence proportional to the product volume
Assuming the wafer is circular, and the die is square-shaped.
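One simple yield model assumes a uniform density of randomly occurring point defects as the cause of yield loss. If the wafer has a large number of chips and a large number of randomly distributed defects, the probability of finding $k$ defects on a given chip is approximated by Poisson's distribution:
$P_k = \dfrac{m^k e^{-m}}{k!}$, where $m$ is the average number of defects per chip.
If $D$ is the defect density (defects per unit area) and $A$ is the chip area, then $m = AD$.
Thus, the yield, which is the probability that a chip has no defects ($k = 0$), is:
$Y = P_0 = e^{-AD}$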
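A short worked example may make the Poisson yield model more concrete. It writes $A$ for the die area, $D$ for the defect density, and $Y$ for the yield (symbols assumed here), and the numbers are purely illustrative, not taken from the lecture:

```latex
\begin{align*}
Y &= e^{-AD} \\
A = 1\,\text{cm}^2,\; D = 0.1\,\text{cm}^{-2}
  &\;\Rightarrow\; Y = e^{-0.1} \approx 0.905 \\
A = 2\,\text{cm}^2,\; D = 0.1\,\text{cm}^{-2}
  &\;\Rightarrow\; Y = e^{-0.2} \approx 0.819
\end{align*}
```

Under this model, doubling the die area squares the yield, which is one reason large dies are disproportionately expensive per good unit.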
Throughput, e.g., FLOPS: peak and average.
Latency: average and tail, e.g., 99th-percentile CPU latency.
Propagation delay of a gate ($t_p$):
Defined as the time between the 50% transition points of the input and output waveforms.
High-to-low transition: $t_{pHL}$
Low-to-high transition: $t_{pLH}$
On the rising edge of the clock, the input `d` is sampled and transferred to the output `q`. At all other times, the input `d` is ignored.
Limitations: flip-flops cannot change their outputs instantaneously; time is needed to transfer the input internally.
Therefore, the input must be stable for a short amount of time before the rising edge and remain stable for a short amount of time after the edge.
These two windows are called the setup time and the hold time, respectively.
The setup time mainly prevents the flip-flop from sampling the input while it is still changing.
During these time windows, the input must not change.
Once the flip-flop captures the new input, it also takes a small amount of time to transfer the new value to the output. This delay is called the clk-to-q delay.
Find the critical path (a graph-theory problem).
Make it shorter.
Make sure the clock period satisfies $T \ge t_{clk\text{-}q} + t_{CL,\max} + t_{setup}$.
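As a rough illustration of how the critical path bounds the clock rate, the sketch below applies the standard setup-time constraint to made-up delay values. The symbols ($t_{clk\text{-}q}$, $t_{CL,\max}$, $t_{CL,\min}$, $t_{setup}$, $t_{hold}$) and all numbers are assumptions for this example only:

```latex
\begin{align*}
T &\ge t_{clk\text{-}q} + t_{CL,\max} + t_{setup} \\
  &= 0.1\,\text{ns} + 0.7\,\text{ns} + 0.2\,\text{ns} = 1.0\,\text{ns}
  \quad\Rightarrow\quad f_{\max} = \frac{1}{T_{\min}} = 1\,\text{GHz} \\
t_{hold} &\le t_{clk\text{-}q} + t_{CL,\min}
\end{align*}
```

The last line is the matching hold-time condition: it depends on the shortest combinational path rather than the longest, and it cannot be fixed by slowing the clock.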
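Energy (joules, J)
Power (watts, W)
Charging a capacitor $C$ to $V_{DD}$ draws energy $E = C V_{DD}^2$ from the supply.
The dynamic power is $P_{dyn} = \alpha C V_{DD}^2 f$, where $\alpha$ is the activity factor and $f$ is the clock frequency.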
😭 Silly me
See this website.
Simulation is the process of using software to emulate the behavior of a hardware design to verify its correctness.
Synthesis is the process of converting HDL code into a gate-level netlist, which consists of basic components like logic gates and flip-flops.
HDLs were originally invented for simulation.
Structural Verilog
List of sub-components and how they are connected
Behavioral Verilog
Describe what a component does, not how it does it
Result is only as good as the tools
A common approach is to use behavioral descriptions for “leaf cells” and structural descriptions to build the hierarchy.
In a Verilog "continuous assignment" (`assign lhs = rhs;`), the value of the signal on the right side is driven onto the wire on the left side. The assignment is "continuous" because it continues all the time, even if the right side's value changes. A continuous assignment is not a one-time event.
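A minimal sketch of a continuous assignment is shown here; the module and port names are hypothetical and chosen only for illustration:

```verilog
// Minimal sketch: a 2-input AND gate described with a continuous assignment.
// Module and port names are hypothetical.
module and_gate (
  input  wire a,
  input  wire b,
  output wire y
);
  // y is re-evaluated whenever a or b changes, like a wire driven by a gate.
  assign y = a & b;
endmodule
```

Because the assignment behaves like a permanently connected gate, the order of `assign` statements inside a module does not matter.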
`+`: Add
`-`: Subtract
`*`: Multiply
`/`: Divide
`%`: Modulus
`!`: Logical negation
`&&`: Logical AND
`||`: Logical OR
`>`: Greater than
`<`: Less than
`>=`: Greater than or equal
`<=`: Less than or equal
`==`: Equality
`!=`: Inequality
`~`: Bitwise negation
`&`: Bitwise AND
`|`: Bitwise OR
`^`: Bitwise XOR
Reduction operators also exist for AND, OR, and XOR; they use the same symbols as the bitwise operators.
`<<`: Shift left logical
`>>`: Shift right logical
`<<<`: Arithmetic left shift
`>>>`: Arithmetic right shift
`{}`: Concatenation
`{{}}`: Replication, e.g., for sign-extending a small number into a wider one
`[MSB:LSB]`: Indexing/Slicing
`( condition ) ? ( exp if condition is true ) : ( exp if condition is false )`
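The less obvious operators from the list above (reduction, replication, concatenation, slicing, and the conditional operator) can be sketched as follows. The module and signal names are hypothetical:

```verilog
// Illustrative sketch of several operators; all names are hypothetical.
module operator_demo (
  input  wire [7:0]  a,
  input  wire [3:0]  nibble,
  input  wire        sel,
  output wire        all_ones,  // reduction AND: 1 only if every bit of a is 1
  output wire        parity,    // reduction XOR: 1 if a has an odd number of 1s
  output wire [7:0]  sext,      // nibble sign-extended to 8 bits
  output wire [11:0] joined,    // concatenation of a and nibble
  output wire [3:0]  picked     // upper or lower half of a
);
  assign all_ones = &a;
  assign parity   = ^a;
  // Replicate the sign bit four times, then concatenate with the original value.
  assign sext     = {{4{nibble[3]}}, nibble};
  assign joined   = {a, nibble};
  // Conditional operator combined with slicing.
  assign picked   = sel ? a[7:4] : a[3:0];
endmodule
```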
A note on `wire` vs. `reg`: the left-hand side of an `assign` statement must be a net type (e.g., `wire`), while the left-hand side of a procedural assignment (in an `always` block) must be a variable type (e.g., `reg`). These types (`wire` vs. `reg`) have nothing to do with what hardware is synthesized; they are just syntax left over from Verilog's origins as a simulation language.
WARNING: `for` and `while` loops cannot always be mapped to hardware! These statements are valid Verilog (and can be simulated), but they are synthesizable only when the tool can unroll them, which requires loop bounds that are known at compile time.
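As a sketch of a loop that synthesis tools can generally handle, the module below uses a `for` loop whose bound is a parameter, so it can be fully unrolled into plain wiring. The module name and parameter are hypothetical:

```verilog
// Sketch: a for loop with a compile-time bound, which unrolls into wiring.
// Module and parameter names are hypothetical.
module bit_reverse #(parameter WIDTH = 8) (
  input  wire [WIDTH-1:0] in,
  output reg  [WIDTH-1:0] out
);
  integer i;
  always @(*) begin
    // WIDTH is a constant, so this loop describes WIDTH fixed connections.
    for (i = 0; i < WIDTH; i = i + 1)
      out[i] = in[WIDTH-1-i];
  end
endmodule
```

A loop whose iteration count depends on a run-time signal has no such fixed structure, which is the case the warning is about.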
A nested `if` structure leads to a “priority logic” structure, with different delays for different inputs. Use `case` instead.
Implicit nets are often a source of hard-to-detect bugs. In Verilog, net-type signals can be implicitly created by an `assign` statement or by attaching something undeclared to a module port. Implicit nets are always one-bit wires and cause bugs if you had intended to use a vector. Creation of implicit nets can be disabled using the `default_nettype none directive.
wire [2:0] a, c;      // Two vectors
assign a = 3'b101;    // a = 101
assign b = a;         // b = 1, an implicitly-created one-bit wire
assign c = b;         // c = 001 <-- bug
my_module i1 (d, e);  // d and e are implicitly one-bit wide if not declared.
                      // This could be a bug if the port was intended to be a vector.
Generate loops are used to iteratively instantiate modules. This is useful with parameters or when instantiating large numbers of the same module.
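wire [3:0] a, b;
genvar i;
// The first instance is written by hand; index 0 is assumed here,
// since the genvar i has no value outside the generate loop.
core first_one (1'b0, a[0], b[0]);
// Programmatically wire later instances
generate
  for (i = 1; i < 4; i = i + 1) begin : name_of_this_loop
    core generated_core (a[i], a[i-1], b[i]);
  end
endgenerate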
Verilog modules may include parameters in the module definition.
module adder #(parameter width = 32)
  (input [width-1:0] a, input [width-1:0] b, output [width:0] s);

  // The output is one bit wider than the inputs to hold the carry.
  assign s = a + b;

endmodule

module top();
  localparam adder1width = 64;
  localparam adder2width = 32;
  reg [adder1width-1:0] a, b;
  reg [adder2width-1:0] c, d;
  wire [adder1width:0] out1;
  wire [adder2width:0] out2;
  // Override the default width for each instance.
  adder #(.width(adder1width)) adder64 (.a(a), .b(b), .s(out1));
  adder #(.width(adder2width)) adder32 (.a(c), .b(d), .s(out2));
endmodule
Multidimensional Nets in Verilog
// Creates a net called <netname> and describes it as an array of (N + 1) elements,
// where each element is an (M + 1)-bit number.
reg [M:0] <netname> [N:0];
// A memory structure that has eight 32-bit elements
reg [31:0] fifo_ram [7:0];
fifo_ram[2]       // The full 3rd 32-bit element
fifo_ram[5][7:0]  // The lowest byte of the 6th 32-bit element
Clocked always blocks, `always @(posedge clk)`, create a blob of combinational logic just like combinational always blocks, but they also create a set of flip-flops (or "registers") at the output of the blob of combinational logic. Instead of the outputs of the blob of logic being visible immediately, they become visible only immediately after the next rising edge of `clk`.
Combinational circuits must have a value assigned to all outputs under all conditions. This usually means you always need else clauses or a default value assigned to the outputs.
A nested `if` structure leads to a “priority logic” structure with different delays for different inputs; the `case` version treats all inputs the same.
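The two points above (assign every output under all conditions, and `case` versus nested `if`) can be sketched with two small modules. The names and the specific functions are hypothetical, picked only to contrast the two styles:

```verilog
// Sketch: a case-based mux where all inputs are treated the same.
// All names are hypothetical.
module mux4_case (
  input  wire [1:0] sel,
  input  wire [3:0] d,
  output reg        y
);
  always @(*) begin
    y = 1'b0;               // default value, so y is assigned on every path
    case (sel)
      2'b00: y = d[0];
      2'b01: y = d[1];
      2'b10: y = d[2];
      2'b11: y = d[3];
    endcase
  end
endmodule

// Sketch: an if/else-if chain, which describes priority logic.
// req[2] wins over req[1], which wins over req[0].
module priority_if (
  input  wire [2:0] req,
  output reg  [1:0] grant
);
  always @(*) begin
    if      (req[2]) grant = 2'd3;
    else if (req[1]) grant = 2'd2;
    else if (req[0]) grant = 2'd1;
    else             grant = 2'd0;  // final else keeps the logic combinational
  end
endmodule
```

When priority really is the intended behavior, as in an arbiter, the `if` chain is the natural description; the `case` form fits when the selections are mutually exclusive.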
// Non-blocking assignments: a three-stage shift register
always @(posedge clk) begin
  q1  <= in;
  q2  <= q1;  // use old q1
  out <= q2;  // use old q2
end

// Blocking assignments: in propagates to out in a single cycle
always @(posedge clk) begin
  q1  = in;
  q2  = q1;   // use new q1
  out = q2;   // use new q2
end

// "old" means the value before the clock edge; "new" means the value after the most recent assignment
One drawback of the ripple-carry adder is that computing the carry-out (from the carry-in, in the worst case) is fairly slow, and the second-stage adder cannot begin computing its carry-out until the first-stage adder has finished.
This makes the adder slow.
One improvement is a carry-select adder. The first-stage adder is the same as before, but we duplicate the second-stage adder,
one assuming carry-in=0 and one assuming carry-in=1, then use a fast 2-to-1 multiplexer to select whichever result happened to be correct.
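A minimal sketch of the carry-select idea follows, assuming a 32-bit adder built from 16-bit halves. The module names, the widths, and the behavioral `add16` building block are assumptions made for illustration:

```verilog
// Hypothetical 16-bit adder used as the building block.
module add16 (
  input  wire [15:0] a, b,
  input  wire        cin,
  output wire [15:0] sum,
  output wire        cout
);
  assign {cout, sum} = a + b + cin;
endmodule

// Sketch of a 32-bit carry-select adder (names and widths are assumptions).
module carry_select32 (
  input  wire [31:0] a, b,
  output wire [31:0] sum,
  output wire        cout
);
  wire        c_lo;          // carry out of the lower half
  wire [15:0] hi0, hi1;      // upper sums for carry-in = 0 and carry-in = 1
  wire        cout0, cout1;

  // First stage: lower 16 bits.
  add16 lo      (.a(a[15:0]),  .b(b[15:0]),  .cin(1'b0), .sum(sum[15:0]), .cout(c_lo));
  // Second stage, duplicated: both candidates are computed in parallel.
  add16 hi_cin0 (.a(a[31:16]), .b(b[31:16]), .cin(1'b0), .sum(hi0), .cout(cout0));
  add16 hi_cin1 (.a(a[31:16]), .b(b[31:16]), .cin(1'b1), .sum(hi1), .cout(cout1));

  // 2-to-1 multiplexers pick the candidate that matches the real carry.
  assign sum[31:16] = c_lo ? hi1   : hi0;
  assign cout       = c_lo ? cout1 : cout0;
endmodule
```

The design trades area (one extra 16-bit adder plus the multiplexers) for a shorter critical path, since the upper half no longer waits for the lower carry before starting.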
To find the two's complement of a binary number: invert every bit, then add 1.
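Identities: $a + 0 = a$, $a \cdot 1 = a$
Idempotence: $a + a = a$, $a \cdot a = a$
Complements: $a + \overline{a} = 1$, $a \cdot \overline{a} = 0$
Commutative: $a + b = b + a$, $a \cdot b = b \cdot a$
Associative: $(a + b) + c = a + (b + c)$, $(a \cdot b) \cdot c = a \cdot (b \cdot c)$
Distributive: $a \cdot (b + c) = a \cdot b + a \cdot c$, $a + b \cdot c = (a + b) \cdot (a + c)$
Absorptive: $a + a \cdot b = a$, $a \cdot (a + b) = a$
Duality:
Swap AND $\leftrightarrow$ OR and 0 $\leftrightarrow$ 1
Leave literals unchanged
DeMorgan's Law: $\overline{a + b} = \overline{a} \cdot \overline{b}$, $\overline{a \cdot b} = \overline{a} + \overline{b}$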
Bubble Pushing
• Pushing a bubble from input through the gate
• Bubble comes out in the output
• The gate flips from AND to OR or vice versa.
Sum of Products (SOP)
Disjunctive normal form, minterm expansion.
Minterm: a product (AND) involving all inputs for the term to be 1.
Product of Sums (POS)
Conjunctive normal form, maxterm expansion.
Maxterm: a sum (OR) involving all inputs for the term to be 0.
Can obtain POSs from applying DeMorgan’s law to the SOPs of F (and vice versa)
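A small worked example may help connect the two forms. It uses the 2-input XOR function, chosen here only for illustration, and writes $\overline{A}$ for the complement of $A$:

```latex
\begin{align*}
F(A,B) &= A \oplus B \\
\text{SOP:}\quad F &= \overline{A}B + A\overline{B}
  && \text{(minterms for rows } 01, 10\text{)} \\
\text{POS:}\quad F &= (A + B)(\overline{A} + \overline{B})
  && \text{(maxterms for rows } 00, 11\text{)} \\
\overline{F} &= \overline{A}\,\overline{B} + AB
  \;\Rightarrow\; F = \overline{\overline{A}\,\overline{B} + AB}
  = (A + B)(\overline{A} + \overline{B})
\end{align*}
```

The last line is the DeMorgan route: write the SOP of the complement of $F$ from the rows where $F = 0$, then complement it to obtain the POS of $F$.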
Theorem: Any combinational logic function can be implemented as a network of logic gates.
The binary-reflected Gray code list for n bits can be generated recursively from the list for n − 1 bits by reflecting the list (i.e. listing the entries in reverse order), prefixing the entries in the original list with a binary 0, prefixing the entries in the reflected list with a binary 1, and then concatenating the original list with the reversed list.
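In hardware, the same binary-reflected code is usually produced with a closed-form conversion rather than by building the list recursively. The sketch below uses the well-known identity gray = bin XOR (bin >> 1); the module and parameter names are hypothetical:

```verilog
// Sketch: binary to binary-reflected Gray code conversion.
// Module and parameter names are hypothetical.
module bin2gray #(parameter N = 4) (
  input  wire [N-1:0] bin,
  output wire [N-1:0] gray
);
  // Each Gray bit is the XOR of a binary bit and its more significant neighbor;
  // the most significant bit passes through unchanged.
  assign gray = bin ^ (bin >> 1);
endmodule
```

For N = 2 this maps 00, 01, 10, 11 to 00, 01, 11, 10, which matches the list obtained by reflecting and prefixing the 1-bit list 0, 1.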
Each square differs from an adjacent square by a change in a single variable(Gray code).
Use the fewest circles necessary to cover all the 1’s.
All the squares in each circle must contain 1’s.
Each circle must span a rectangular block that is a power of 2 (i.e., 1, 2, or 4) squares in each direction.
Each circle should be as large as possible.
A circle may wrap around the edges of the K-map.
A 1 in a K-map may be circled multiple times if doing so allows fewer circles to be used.
Squares with a don't-care entry can be treated as 1 or 0, whichever simplifies the circuit more.
Using multiple levels (more than 2) will reduce the cost. Sometimes also delay. Sometimes a tradeoff between cost and delay.
NAND would be used in place of all ANDs and ORs.
No convenient hand methods exist for multi-level logic simplification:
CAD tools use sophisticated algorithms and heuristics.
These problems tend to be NP-complete
Humans and tools often exploit some special structure (example adder)
Can model the behavior of any sequential circuit.
The FSM follows exactly one edge per cycle.
Moore: outputs depend only on the current state. Both edges of the output follow the clock.
Mealy: outputs depend on the current state and the inputs. The output rises with the input's rising edge, asynchronously to the clock, and falls synchronously with the next clock edge. The output timing behavior of the Moore machine can be achieved in a Mealy machine by “registering” the Mealy output values.
States: nodes
Outputs: labeled in each node
Inputs: labeled on each arc
Current State | Inputs | Next State |
---|---|---|
Eat | Feeding | Eat |
Eat | Petting | Sleep |
Sleep | Feeding | Sleep |
Sleep | Petting | Annoyed |
Annoyed | Feeding | Eat |
Annoyed | Petting | Annoyed |
Encode each state
State | Encoding |
---|---|
Eat | 00 |
Sleep | 01 |
Annoyed | 10 |
Input | Encoding |
---|---|
Feeding | 0 |
Petting | 1 |
S1 (Current state[1]) | S0 (Current state[0]) | X (Input) | S1' (Next state[1]) | S0' (Next state[0]) |
---|---|---|---|---|
0 | 0 | 0 | 0 | 0 |
0 | 0 | 1 | 0 | 1 |
0 | 1 | 0 | 0 | 1 |
0 | 1 | 1 | 1 | 0 |
1 | 0 | 0 | 0 | 0 |
1 | 0 | 1 | 1 | 0 |
S1 | S0 | E(Eye open) | M (Mouth open) |
---|---|---|---|
0 | 0 | 1 | 1 |
0 | 1 | 0 | 0 |
1 | 0 | 1 | 0 |
States that are adjacent in the state diagram should preferably be given encodings that are close to each other.
An alternative approach is one-hot encoding, but it can cost more flip-flops.
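The transition, encoding, and output tables above can be turned into a small Moore machine. This is only a sketch: the module and port names, the synchronous reset, and the choice of Eat as the reset state are assumptions not stated in the notes:

```verilog
// Sketch of the Eat/Sleep/Annoyed FSM using the binary encoding from the tables.
// Names, the synchronous reset, and the reset state are assumptions.
module cat_fsm (
  input  wire clk,
  input  wire rst,         // synchronous reset (assumed)
  input  wire x,           // 0 = Feeding, 1 = Petting
  output wire eye_open,    // E
  output wire mouth_open   // M
);
  localparam EAT = 2'b00, SLEEP = 2'b01, ANNOYED = 2'b10;
  reg [1:0] state, next_state;

  // State register.
  always @(posedge clk) begin
    if (rst) state <= EAT;
    else     state <= next_state;
  end

  // Next-state logic, one row pair of the transition table per case item.
  always @(*) begin
    case (state)
      EAT:     next_state = x ? SLEEP   : EAT;
      SLEEP:   next_state = x ? ANNOYED : SLEEP;
      ANNOYED: next_state = x ? ANNOYED : EAT;
      default: next_state = EAT;  // unused encoding 11 (assumed recovery state)
    endcase
  end

  // Moore outputs depend only on the current state.
  assign eye_open   = (state == EAT) || (state == ANNOYED);
  assign mouth_open = (state == EAT);
endmodule
```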
/* State register */
// The clk connection is assumed; a register needs a clock to update.
REGISTER_R #(.N(2), .INIT(ZERO)) state (.q(ps), .d(ns), .rst(rst), .clk(clk));
/* Next state & output logic */
always @(*) begin
  case (ps)
    ZERO: begin
      out = 1'b0;
      if (in) ns = CHANGE;
      else    ns = ZERO;
    end
    CHANGE: begin
      out = 1'b1;
      if (in) ns = ONE;
      else    ns = ZERO;
    end
    ONE: begin
      out = 1'b0;
      if (in) ns = ONE;
      else    ns = ZERO;
    end
    default: begin
      out = 1'bx;
      ns  = ZERO;  // assumed recovery state for the unused encoding
    end
  endcase
end
Basic idea: Two-dimensional array of logic blocks and flip-flops with means for the user to configure.
A latch is a basic memory element used to store one bit of information.
When the control signal is active, the latch is transparent, meaning the output follows the input. When the control signal is inactive, the latch holds the last value.
Most FPGAs have SRAM-based (latch-based) programmability.
Based on PYNQ-Z1
Basic FPGA functional unit: implements both combinational and sequential logic. It includes:
LUT: implemented with latches/SRAM; a MUX selects the output value from the stored truth-table entries, with the input signals acting as the MUX's selection lines.
FF
MUX
A CLB element contains a pair of slices, and each slice is composed of four 6-input LUTs and eight storage elements. The CLBs are arranged in columns in the 7 series FPGAs.
Multiple LUTs can be combined with a MUX, using the more significant input bits as the selection signals, to implement functions with more inputs than a single LUT has.
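A behavioral sketch of this idea is given below, using 4-input LUTs for brevity rather than the 6-input LUTs of the actual device. The module names and the `INIT` parameter are hypothetical; on a real FPGA the table contents come from the bitstream, not from a Verilog parameter:

```verilog
// Behavioral sketch of a 4-input LUT: the inputs index a stored truth table.
// Names and the INIT parameter are hypothetical.
module lut4 #(parameter [15:0] INIT = 16'h0000) (
  input  wire [3:0] in,
  output wire       out
);
  wire [15:0] table_bits = INIT;   // stored truth-table entries
  assign out = table_bits[in];     // the inputs act as the MUX select lines
endmodule

// Two 4-input LUTs plus a 2-to-1 MUX implement any 5-input function.
module lut5_from_lut4 #(parameter [31:0] INIT = 32'h0000_0000) (
  input  wire [4:0] in,
  output wire       out
);
  wire lo, hi;
  lut4 #(.INIT(INIT[15:0]))  lut_lo (.in(in[3:0]), .out(lo));  // rows with in[4] = 0
  lut4 #(.INIT(INIT[31:16])) lut_hi (.in(in[3:0]), .out(hi));  // rows with in[4] = 1
  // The most significant input bit selects between the two halves.
  assign out = in[4] ? hi : lo;
endmodule
```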
SLICEL (logic): can only be used for logic.
SLICEM (memory, with additional memory elements): can also be used as memory or shift registers.
Each interconnection has a transistor switch. Each switch is controlled by 1-bit configuration register (bitstream).
The netlist is typically a 3-D graph; the tools have to figure out the optimal placement and wiring on the FPGA.
LOGIC
BRAM: Used for storing large amounts of data
DSPs: Signal processing
CLOCKING
IO
Serial I/O + PCI
See this video.
A silicon atom has four electrons in its outermost shell, which can form covalent bonds with four adjacent silicon atoms. Since these electrons are bound by covalent bonds and cannot move freely, pure silicon does not conduct electricity. This pure form of silicon is known as an intrinsic semiconductor.
When phosphorus (P) is doped into silicon, phosphorus has five electrons in its outermost shell. Four of these electrons form covalent bonds with silicon, while the extra electron becomes a free electron. Since the charge carriers are primarily free electrons, this doped semiconductor is known as an N-type semiconductor.
When boron (B) is doped into silicon, boron has three electrons in its outermost shell. When a boron atom replaces a silicon atom in the crystal lattice, it can only form three complete covalent bonds with the surrounding four silicon atoms, leaving the fourth bond with a vacancy due to the lack of an electron. This vacancy can easily attract electrons from neighboring atoms to fill it, hence it is referred to as a hole. Since the charge carriers are primarily holes, this doped semiconductor is known as a P-type semiconductor.
Origin of the Hole's Positive Charge
Principle of Electrical Neutrality: Each silicon atom contributes 4 valence electrons to its bonds, while a boron atom contributes only 3, so the bonds around the boron site are one electron short. This missing electron behaves like a net positive charge (+e) at that location.
Nature of Holes: A hole is the absence of an electron in a covalent bond. When a neighboring electron moves to fill this vacancy, it is equivalent to the hole moving in the opposite direction. Since electrons carry a negative charge (-e), the absence of an electron is equivalent to a positively charged carrier (+e).
By doping both ends of the same silicon wafer, one end forms an N-type semiconductor, and the other end forms a P-type semiconductor. Before contact, both N-type and P-type semiconductors are electrically neutral (number of electrons = number of protons). After contact, due to the difference in electron concentration, electrons spontaneously diffuse from the N-type semiconductor to the P-type semiconductor. This causes the N-type semiconductor to become positively charged due to the loss of electrons, while the P-type semiconductor becomes negatively charged due to the gain of electrons, resulting in an electric field directed from the N-type to the P-type near the contact surface.
Under the influence of this electric field, some electrons move back toward the N-type region, a motion known as drift. Eventually, the drift motion and the diffusion motion reach a dynamic equilibrium, and the region where this happens is called the PN junction.
When an external electric field directed from the P-side to the N-side is applied to the semiconductor (forward bias), the part of the PN junction in the N-region narrows as electrons are replenished, while the part in the P-region narrows as electrons are removed by the field, which increases the number of holes. Eventually, the PN junction disappears. Free electrons from the N-region pass through the holes in the P-region and eventually return to the power source. The PN junction conducts only when the external electric field exceeds the built-in electric field of the PN junction, with a typical threshold voltage of about 0.7 V for silicon.
When the power supply is reverse-connected, the remaining electrons in the N-region move toward the positive terminal of the power supply, while the holes in the P-region are filled by electrons from the negative terminal, resulting in a reduction of charge carriers and preventing conduction. As the internal electric field strengthens, a small number of electrons still undergo drift motion. When the external voltage increases further, avalanche breakdown occurs: the drifting electrons collide with covalent bonds, generating new holes and free electrons. Avalanche breakdown itself does not directly damage the semiconductor, but the resulting sharp increase in current produces significant heat, which can cause the semiconductor to burn out.
See this video.
As shown in the figure, a single-crystal silicon wafer is doped to form two N-type (red) regions and a P-type (yellow) region. At each interface between an N-type region and the P-type region, a PN junction is formed, with its built-in electric field directed from the N-region to the P-region.
A capacitor is added and connected to the positive and negative terminals of the power supply. The capacitor generates an electric field, under the influence of which a large number of electrons move upward, filling some of the holes. The remaining free electrons cause a portion of the P-type semiconductor to transform into an N-type semiconductor, enabling conduction. The region that transitions from P-type to N-type is called the N-channel.
Without the capacitor, one of the two PN junctions is always reverse biased (widened), so the device cannot conduct.
The built-in electric field of the PN junction cancels part of the gate voltage.
If both the pull-up and pull-down networks were ON simultaneously, a short circuit would exist between $V_{DD}$ and GND.
When
When
Triode mode or linear region
When $V_{GS} > V_{th}$ and $V_{DS} < V_{GS} - V_{th}$:
The transistor is turned on, and a channel has been created which allows current between the drain and the source. The MOSFET operates like a resistor, controlled by the gate voltage relative to both the source and drain voltages.
When $V_{GS} > V_{th}$ and $V_{DS} \ge V_{GS} - V_{th}$ (saturation or active mode):
The switch is turned on, and a channel has been created, which allows current between the drain and source. Since the drain voltage is higher than the source voltage, the electrons spread out, and conduction is not through a narrow channel but through a broader, two- or three-dimensional current distribution extending away from the interface and deeper in the substrate. The onset of this region is also known as pinch-off to indicate the lack of channel region near the drain. Although the channel does not extend the full length of the device, the electric field between the drain and the channel is very high, and conduction continues.
The drain current is now weakly dependent upon drain voltage and controlled primarily by the gate-source voltage.
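For reference, the qualitative description of the two conducting regions corresponds to the standard long-channel (square-law) model of an n-channel MOSFET. This model and its symbols ($\mu_n$ for electron mobility, $C_{ox}$ for gate-oxide capacitance per unit area, $W/L$ for the width-to-length ratio, $V_{th}$ for the threshold voltage) are not from the notes and are given here only as a first-order sketch:

```latex
\begin{align*}
\text{Triode:}\quad
  I_D &= \mu_n C_{ox}\frac{W}{L}
         \left[(V_{GS}-V_{th})\,V_{DS} - \frac{V_{DS}^2}{2}\right] \\
\text{Saturation:}\quad
  I_D &= \frac{\mu_n C_{ox}}{2}\,\frac{W}{L}\,(V_{GS}-V_{th})^2
\end{align*}
```

In this idealized model the saturation current does not depend on $V_{DS}$ at all. The two expressions agree at the boundary $V_{DS} = V_{GS} - V_{th}$.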