Lecture notes of EECS 151 Fall 2022 & Spring 2024 @ UC Berkeley by Prof. Shao & Prof. Wawrzynek

Author: Peiqi (Stefan) Tian

Notice: LLMs are used for generating some LaTeX code from slides and explaining some concepts.

LEC 01 Intro

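In chip design, non-recurring engineering (NRE) costs are one-time engineering expenses. They are incurred while the chip is designed and developed, and they are paid once rather than per unit. NRE covers everything from design through initial manufacturing, including:

  1. Design costs: architecture design, RTL coding, verification, and so on.

  2. Tool costs: licenses for EDA (electronic design automation) tools.

  3. IP costs: purchasing third-party intellectual property cores (processor cores, interface IP, and so on).

  4. Prototype manufacturing costs: tape-out costs, including mask making and trial production.

  5. Test costs: functional and performance testing of the chip prototypes.

NRE costs are usually high, especially at advanced process nodes (such as 7nm and 5nm). Once they are paid, however, the per-unit cost of volume production drops significantly. NRE is therefore a major up-front investment in chip development.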

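Moore's Law: for a fixed chip area, transistors keep shrinking, so more of them fit on the chip; signal propagation time also drops, which allows a higher clock frequency.

Dennard Scaling is a theory of transistor scaling proposed in 1974 by Robert H. Dennard and his team. It states that as transistors shrink, their power density stays constant: voltage and current scale down in proportion to the dimensions, so the power per unit area does not change. It broke down in the 2000s.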

image-20250224163853123

image-20250224163932997

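Main points

  1. Dimension scaling

    Transistor dimensions (such as channel length and width) shrink proportionally by a scaling factor $\kappa$ ($\kappa > 1$). Specifically:

    • If $\kappa = 1.5$, the transistor dimensions (such as channel length and width) shrink to

      $1/\kappa = 1/1.5 \approx 0.6667$ of their original size; in other words, to $2/3$ of the original.

  2. Voltage and current reduction

    Voltage $V$ and current $I$ scale down by the same factor:

    $$V \propto \frac{1}{\kappa}, \quad I \propto \frac{1}{\kappa} \tag{1}$$
  3. Constant power density

The power per transistor $P = V \times I$ scales by $1/\kappa^2$, but the number of transistors in the same area grows by a factor of $\kappa^2$, so the overall power density stays the same.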

| Device or Circuit Parameter | Scaling Factor |
|---|---|
| Device dimension $t_{ox}$, $L$, $W$ (oxide thickness, channel length, channel width) | $1/\kappa$ |
| Doping concentration $N_a$ | $\kappa$ |
| Voltage $V$ | $1/\kappa$ |
| Current $I$ | $1/\kappa$ |
| Capacitance $\varepsilon A / t$ | $1/\kappa$ |
| Delay time per circuit $VC/I$ | $1/\kappa$ |
| Power dissipation per circuit $VI$ | $1/\kappa^2$ |
| Power density $VI/A$ | $1$ |

 

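Derivation

Assume the transistor dimensions (such as channel length and width) shrink proportionally by a factor $\kappa$ ($\kappa > 1$). The area of a single transistor then shrinks from $A$ to $A'$:

$$A' = \frac{A}{\kappa^2} \tag{2}$$

According to Dennard Scaling, voltage $V$ and current $I$ shrink proportionally as well:

$$V' = \frac{V}{\kappa}, \quad I' = \frac{I}{\kappa} \tag{3}$$

The power of a single scaled transistor, $P'$, is

$$P' = V' \times I' = \frac{V \times I}{\kappa^2} = \frac{P}{\kappa^2} \tag{4}$$

Power density $D$ is the ratio of power to area. The original power density is:

$$D = \frac{P}{A} \tag{5}$$

The power density after scaling, $D'$, is:

$$D' = \frac{P'}{A'} = \frac{P/\kappa^2}{A/\kappa^2} = D \tag{6}$$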
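The derivation above can be checked numerically. The following is a minimal Python sketch; the function name `dennard_scale` and the starting values (1 V, 1 mA, unit area) are illustrative assumptions, not figures from the lecture.

```python
def dennard_scale(v, i, area, kappa):
    """Apply ideal Dennard scaling to one transistor.

    Voltage and current shrink by 1/kappa, area by 1/kappa**2.
    Returns (scaled power, scaled area, scaled power density).
    """
    v_scaled = v / kappa
    i_scaled = i / kappa
    area_scaled = area / kappa ** 2
    power_scaled = v_scaled * i_scaled
    return power_scaled, area_scaled, power_scaled / area_scaled


# Assumed starting point: 1 V, 1 mA, unit area (arbitrary units).
v, i, area = 1.0, 1e-3, 1.0
kappa = 1.5

p_scaled, a_scaled, d_scaled = dennard_scale(v, i, area, kappa)
d_original = (v * i) / area

print("power ratio  :", p_scaled / (v * i))     # should equal 1 / kappa**2
print("density ratio:", d_scaled / d_original)  # should equal 1
```

The power per transistor falls by $1/\kappa^2$ while the density ratio stays at 1, which is the content of equation (6).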

Digital Design: Given a functional description and performance, cost, & power constraints, create an implementation using a set of primitives.

Digital systems are implemented as an interconnection of combinational logic and state elements.

LEC 02 Design Abstraction

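An SoC (System on Chip) is an integrated circuit that combines the components of an electronic system on a single chip. It typically contains processors, memory, input/output interfaces, digital signal processors (DSP), graphics processing units (GPU), and similar components, which together form a complete system.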

Design Challenges are met by using layers of abstractions.

Implementing Digital Systems

Implementation Alternative

image-20250224171859773

Combinational Logic

• Output is a function only of the current inputs (no history).

• Truth-table representation of function. Output is explicitly specified for each input combination.

• In general, CL blocks have more than one output signal, in which case, the truth-table will have multiple output columns.

Peer Instruction: The total number of possible truth tables with 4 inputs and 1 output is 65,536.

My solution:

Each truth table has $2^4 = 16$ rows. Treat the rows as a set $S$ with $|S| = 16$. Each truth table corresponds to the subset of rows whose output is 1, so the number of possible truth tables is the size of the power set of $S$:

$$|\mathcal{P}(S)| = 2^{16} = 65{,}536 \tag{7}$$
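The counting argument can be confirmed by brute force. This is a small Python sketch; the helper names `count_truth_tables` and `enumerate_truth_tables` are made up for illustration.

```python
from itertools import product


def count_truth_tables(n_inputs):
    """Closed form: one output bit per row, 2**n_inputs rows."""
    rows = 2 ** n_inputs
    return 2 ** rows


def enumerate_truth_tables(n_inputs):
    """Yield every possible output column as a tuple of 0/1 values."""
    rows = 2 ** n_inputs
    return product((0, 1), repeat=rows)


# Brute-force count for 4 inputs, compared against the closed form.
brute_force_count = sum(1 for _ in enumerate_truth_tables(4))
print(brute_force_count, count_truth_tables(4))
```

Enumerating every output column is only practical for small input counts, since the number of tables grows as $2^{2^n}$.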

Sequential Logic

• Output is a function of both the current inputs and the state.

• State represents the memory.

• State is a function of previous inputs.

• In synchronous digital systems, state is updated on each clock tick.

Any synchronous digital circuit can be represented with:

• Combinational Logic Blocks (CL), plus

• State Elements (registers or memories)

• Clock orchestrates sequencing of CL operations

 

LEC 03 Metrics and Verilog I

Digital Abstraction

Suppose the lowest voltage in the system is 0 V, also called ground or GND. The highest voltage in the system comes from the power supply and is usually called $V_{DD}$.

inverter

The driver produces a LOW (0) output in the range $[0, V_{OL}]$ or a HIGH (1) output in the range $[V_{OH}, V_{DD}]$. If the receiver gets an input in the range $[0, V_{IL}]$, it will consider the input to be LOW. If the receiver gets an input in the range $[V_{IH}, V_{DD}]$, it will consider the input to be HIGH. If, for some reason such as noise or faulty components, the receiver’s input should fall in the forbidden zone $[V_{IL}, V_{IH}]$, the behavior of the gate is unpredictable.

image-20250114103231006

The noise margin (NM) is the amount of noise that could be added to a worst-case output such that the signal can still be interpreted as a valid input.

$$NM_L = V_{IL} - V_{OL}, \quad NM_H = V_{OH} - V_{IH} \tag{8}$$
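As a quick worked example of the noise margin definitions, here is a Python sketch. The voltage levels passed in are made-up illustrative numbers, not values from the lecture or from any datasheet.

```python
def noise_margins(v_ol, v_oh, v_il, v_ih):
    """Return (NM_L, NM_H) from the four logic-level thresholds."""
    nm_low = v_il - v_ol    # noise a LOW output tolerates before leaving [0, V_IL]
    nm_high = v_oh - v_ih   # noise a HIGH output tolerates before leaving [V_IH, V_DD]
    return nm_low, nm_high


# Assumed example levels in volts.
nm_low, nm_high = noise_margins(v_ol=0.4, v_oh=2.4, v_il=0.8, v_ih=2.0)
print("NM_L =", nm_low, "V")
print("NM_H =", nm_high, "V")
```

A negative margin from this function would mean the driver's worst-case output already falls inside the forbidden zone.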

A necessary property of any suitable technology for logic circuits is “Restoration” or “Regeneration”.

Circuits need:

Voltage Transfer Characteristic

• Describes the output voltage as a function of the input voltage.

• To choose the logic levels, take the points where the slope of the curve equals -1; this maximizes the noise margins.

image-20250114103906089

Cost

Assuming the wafer is circular, and the die is square-shaped.

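$$\text{Die yield} = \frac{\text{\# good chips per wafer}}{\text{Total \# chips per wafer}} \times 100\% \tag{9}$$
$$\text{Die cost} = \frac{\text{Wafer cost}}{\text{Dies per wafer} \times \text{Die yield}} \tag{10}$$
$$\text{Dies per wafer} = \frac{\text{wafer area}}{\text{die area}} - \frac{\pi \times \text{wafer diameter}}{\sqrt{2 \times \text{die area}}} \tag{11}$$
$$\text{(empirical formula) die yield} = \left(1 + \frac{\text{defects per unit area} \times \text{die area}}{\alpha}\right)^{-\alpha}, \quad \alpha \text{ is approximately } 3 \tag{12}$$
$$\text{variable cost} = \frac{\text{cost of die} + \text{cost of die test} + \text{cost of packaging}}{\text{final test yield}} \tag{13}$$
$$\text{cost per IC} = \text{variable cost per IC} + \frac{\text{fixed cost}}{\text{volume}} \tag{14}$$
$$\text{cost of die} = f(\text{die area})^4 \tag{15}$$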
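The cost formulas (10) to (12) chain together naturally. The Python sketch below is one way to evaluate them; the function names and the example numbers (wafer cost, diameter, defect density) are assumptions chosen only for illustration.

```python
import math


def dies_per_wafer(wafer_diameter, die_area):
    """Equation (11): gross dies on a circular wafer, minus edge loss."""
    wafer_area = math.pi * (wafer_diameter / 2) ** 2
    edge_loss = math.pi * wafer_diameter / math.sqrt(2 * die_area)
    return wafer_area / die_area - edge_loss


def die_yield(defects_per_area, die_area, alpha=3.0):
    """Equation (12): empirical yield model with alpha of about 3."""
    return (1 + defects_per_area * die_area / alpha) ** (-alpha)


def die_cost(wafer_cost, wafer_diameter, die_area, defects_per_area, alpha=3.0):
    """Equation (10): wafer cost spread over the good dies."""
    good_dies = dies_per_wafer(wafer_diameter, die_area) * die_yield(
        defects_per_area, die_area, alpha
    )
    return wafer_cost / good_dies


# Assumed example: 30 cm wafer, 0.5 defects/cm^2, wafer cost 5000 (any currency).
for area_cm2 in (0.5, 1.0, 2.0):
    cost = die_cost(5000.0, 30.0, area_cm2, 0.5)
    print(f"die area {area_cm2} cm^2 -> cost per good die {cost:.2f}")
```

Larger dies lose twice: fewer fit on the wafer and a smaller fraction of them work, which is why die cost grows much faster than linearly with area.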

 

One simple yield model assumes a uniform density of randomly occurring point defects as the cause of yield loss. If the wafer has a large number of chips ($N$) and a large number of randomly distributed defects ($n$), then the probability $P(k)$ that a given chip contains $k$ defects may be approximated by Poisson's distribution:

$$P(k) = \frac{e^{-m} m^k}{k!} \tag{16}$$

where $m = n/N$. The yield $Y$ is the probability that a chip has no defects ($k = 0$), so:

$$Y = e^{-m} \tag{17}$$

If $D$ is the chip defect density, then:

$$D = \frac{n}{NA} \tag{18}$$

where $A$ is the area of each chip. Since $m = n/N$, then $m$, which is the average number of defects per chip, is:

$$m = AD \tag{19}$$

Thus, the yield $Y$ is given by the Poisson Yield Model:

$$Y = e^{-AD} \tag{20}$$
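The Poisson model is short enough to evaluate directly. In this Python sketch the function names and the sample area and defect density are illustrative assumptions.

```python
import math


def poisson_defect_probability(k, m):
    """Equation (16): probability that a chip has exactly k defects."""
    return math.exp(-m) * m ** k / math.factorial(k)


def poisson_yield(die_area, defect_density):
    """Equation (20): probability that a chip has zero defects."""
    return math.exp(-die_area * defect_density)


# Assumed example: 1 cm^2 die, 0.5 defects per cm^2, so m = A * D = 0.5.
area, density = 1.0, 0.5
m = area * density
print("Y        =", poisson_yield(area, density))
print("P(k = 0) =", poisson_defect_probability(0, m))
```

Both printed values agree, since the yield is just the $k = 0$ term of the distribution.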

Performance

Digital Logic Delay

Propagation delay $t_p$ of a logic gate: how quickly its output responds to a change at its inputs.

It is measured between the 50% transition points of the input and output waveforms.

image-20250114105519623

High-to-low transition: $t_{pHL}$

Low-to-high transition: $t_{pLH}$

Edge-triggered D-type flip-flop

image-20250115081835568

On the rising edge of the clock, the input d is sampled and transferred to the output q. At all other times, the input d is ignored.

Therefore, the input d must be stable for some time before the rising edge and remain stable for a short time after it. These two intervals are called the setup time and the hold time. Together they form a window around the clock edge during which the input must not change, so that the flip-flop never samples d while it is in transition. Once the flip-flop captures the new input, it takes a small amount of time to transfer the new value to the output. This delay is called the clk-to-q delay.
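The setup and hold window can be expressed as a simple check on when d changes relative to a clock edge. The Python sketch below is only a model of the rule; the function name `violates_setup_or_hold` and all timing numbers are hypothetical.

```python
def violates_setup_or_hold(data_change_times, clock_edge, t_setup, t_hold):
    """Return True if d changes inside the forbidden window around the edge.

    The window runs from (clock_edge - t_setup) to (clock_edge + t_hold).
    All times share one unit, for example nanoseconds.
    """
    window_start = clock_edge - t_setup
    window_end = clock_edge + t_hold
    return any(window_start < t < window_end for t in data_change_times)


# Assumed numbers: rising edge at 10 ns, setup 0.1 ns, hold 0.05 ns.
safe_changes = [9.0, 10.5]   # d settles well before the edge, changes well after
risky_changes = [9.95]       # d changes 0.05 ns before the edge

print(violates_setup_or_hold(safe_changes, 10.0, 0.1, 0.05))
print(violates_setup_or_hold(risky_changes, 10.0, 0.1, 0.05))
```

A real static timing tool performs this kind of comparison for every flip-flop, using the delays of the logic that drives d.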

image-20250115081810038

Digital Logic Timing

Power

Energy (joules J)

Power (watts W)

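The energy drawn from the supply to charge a capacitor $C$ up to $V_{DD}$ is $C V_{DD}^2$. Let $f$ be the switching frequency and $\alpha$ the fraction of cycles in which the capacitor is charged and discharged (the activity factor).

The dynamic power is

$$P_{dynamic} = \alpha C V_{DD}^2 f \tag{21}$$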
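Equation (21) makes the quadratic dependence on supply voltage easy to see numerically. This is a Python sketch; the activity factor, capacitance, and frequency below are assumed round numbers, not measurements.

```python
def dynamic_power(alpha, capacitance, v_dd, frequency):
    """Equation (21): P = alpha * C * V_DD^2 * f, in watts for SI inputs."""
    return alpha * capacitance * v_dd ** 2 * frequency


# Assumed example: alpha = 0.1, C = 1 nF total switched capacitance, f = 1 GHz.
for v_dd in (1.0, 0.8, 0.5):
    power = dynamic_power(0.1, 1e-9, v_dd, 1e9)
    print(f"V_DD = {v_dd} V -> P_dynamic = {power:.4f} W")
```

Halving $V_{DD}$ cuts dynamic power to one quarter, which is why voltage scaling is such an effective power knob.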

image-20250114110750094

😭 Silly me

Verilog

See this website.

HDLs were originally invented for simulation.

LEC 04 Verilog II

In a Verilog "continuous assignment" (assign lhs = rhs;), the value of the signal on the right side is driven onto the wire on the left side. The assignment is "continuous" because the assignment continues all the time even if the right side's value changes. A continuous assignment is not a one-time event.

 

Operators

Arithmetic Operators

Logical Operators

Relational Operators

Equality Operators

Bitwise Operators

Shift Operators

Other Operators

A note on wire vs. reg: The left-hand side of an assign statement must be a net type (e.g., wire), while the left-hand side of a procedural assignment (in an always block) must be a variable type (e.g., reg). These types (wire vs. reg) have nothing to do with what hardware is synthesized; they are just syntax left over from Verilog's origins as a simulation language.

WARNING: for and while loops cannot always be mapped to hardware! These statements are valid Verilog (and can be simulated), but a synthesis tool may not be able to turn them into hardware.

Nested if structure leads to “priority logic” structure, with different delays for different inputs. Use case instead.

Implicit nets are often a source of hard-to-detect bugs. In Verilog, net-type signals can be implicitly created by an assign statement or by attaching something undeclared to a module port. Implicit nets are always one-bit wires and cause bugs if you had intended to use a vector. Creation of implicit nets can be disabled using the `default_nettype none directive.

Generate loops are used to iteratively instantiate modules. This is useful with parameters or when instantiating large numbers of the same module.

Verilog modules may include parameters in the module definition.

 

 

Sequential

Clocked always blocks, always @(posedge clk), create a blob of combinational logic just like combinational always blocks, but also create a set of flip-flops (or "registers") at the output of the blob of combinational logic. Instead of the outputs of the blob of logic being visible immediately, the outputs become visible only immediately after the next rising clock edge (posedge clk).

Combinational circuits must have a value assigned to all outputs under all conditions. This usually means you always need else clauses or a default value assigned to the outputs.

A nested if structure leads to a “priority logic” structure, with different delays for different inputs. The case version treats all inputs the same.

 

Image 1 Image 2

Adder

Ripple Carry Adder

One drawback of the ripple carry adder is that computing the carry-out (from the carry-in, in the worst case) is fairly slow, and the second-stage adder cannot begin computing its carry-out until the first-stage adder has finished. This makes the adder slow.

 

image-20250115075139868

Carry-select Adder

One improvement is a carry-select adder, shown below. The first-stage adder is the same as before, but we duplicate the second-stage adder, one copy assuming carry-in=0 and one assuming carry-in=1, then use a fast 2-to-1 multiplexer to select whichever result turns out to be correct.

image-20250115075229353
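The two adder structures can be modelled at the bit level to see that they compute the same sum. The following Python sketch is a behavioral model, not Verilog from the course; the helper names and the choice of splitting the operands in half are assumptions. Bits are stored least significant first.

```python
def full_adder(a, b, carry_in):
    """One-bit full adder: returns (sum bit, carry out)."""
    sum_bit = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return sum_bit, carry_out


def ripple_carry_add(a_bits, b_bits, carry_in=0):
    """Chain full adders; each stage waits for the previous carry."""
    sum_bits = []
    carry = carry_in
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        sum_bits.append(s)
    return sum_bits, carry


def carry_select_add(a_bits, b_bits, carry_in=0):
    """Compute the upper half twice (carry 0 and 1), then select."""
    half = len(a_bits) // 2
    low_sum, low_carry = ripple_carry_add(a_bits[:half], b_bits[:half], carry_in)
    high_if_0, cout_if_0 = ripple_carry_add(a_bits[half:], b_bits[half:], 0)
    high_if_1, cout_if_1 = ripple_carry_add(a_bits[half:], b_bits[half:], 1)
    # This conditional plays the role of the 2-to-1 multiplexer.
    if low_carry:
        return low_sum + high_if_1, cout_if_1
    return low_sum + high_if_0, cout_if_0


def to_bits(value, width):
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    return sum(bit << i for i, bit in enumerate(bits))


total_bits, carry_out = carry_select_add(to_bits(200, 8), to_bits(100, 8))
print(from_bits(total_bits), carry_out)
```

In hardware the two upper-half adders run in parallel with the lower half, so the critical path is roughly one half-width adder plus a multiplexer, at the cost of the duplicated logic.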

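Subtractor

To take the two's complement of a binary number, flip every 0 to 1 and every 1 to 0, then add 1 to the result. This rests on the following observation: the sum of a number and its bitwise inverse is always $111\ldots111_2$, which represents $-1$. Since $x + \bar{x} = -1$, it follows that $\bar{x} + 1 = -x$. (Here $\bar{x}$ denotes the bitwise inverse of $x$.)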
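The invert-and-add-one rule is what lets an adder double as a subtractor. Here is a Python sketch on fixed-width values; the function names and the 8-bit width are assumptions for illustration.

```python
def twos_complement_negate(x, width):
    """Return the width-bit pattern of -x: invert every bit, then add 1."""
    mask = (1 << width) - 1
    inverted = ~x & mask          # flip every bit within the width
    return (inverted + 1) & mask  # add 1, discarding any carry out


def subtract(a, b, width):
    """Compute a - b using only addition: a + (~b + 1)."""
    mask = (1 << width) - 1
    return (a + twos_complement_negate(b, width)) & mask


width = 8
x = 5
print(bin(x + (~x & 0xFF)))               # all ones, the pattern for -1
print(twos_complement_negate(x, width))   # bit pattern of -5, read as unsigned
print(subtract(7, 5, width))
```

In hardware the "+1" usually costs nothing extra: the inverted operand is fed to the adder and the adder's carry-in is set to 1.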

 

 

 

LEC 05 Combinational Logic

EDA Playground

image-20250115083351783

Laws of Boolean Algebra

 

Canonical Forms

LEC 06 CL and Finite State Machine

Theorem: Any combinational logic function can be implemented as a network of logic gates.

Kmap

image-20250127154753478

The binary-reflected Gray code list for n bits can be generated recursively from the list for n − 1 bits by reflecting the list (i.e. listing the entries in reverse order), prefixing the entries in the original list with a binary 0, prefixing the entries in the reflected list with a binary 1, and then concatenating the original list with the reversed list.
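The reflect-and-prefix construction translates almost word for word into a recursive function. This Python sketch uses strings of '0' and '1' for readability; the function names are illustrative.

```python
def gray_code(n_bits):
    """Return the binary-reflected Gray code list for n_bits as bit strings."""
    if n_bits == 0:
        return [""]
    previous = gray_code(n_bits - 1)
    original = ["0" + code for code in previous]             # prefix original list with 0
    reflected = ["1" + code for code in reversed(previous)]  # prefix reflected list with 1
    return original + reflected


def hamming_distance(a, b):
    """Number of bit positions in which two equal-length codes differ."""
    return sum(x != y for x, y in zip(a, b))


codes = gray_code(3)
print(codes)
print([hamming_distance(codes[i], codes[(i + 1) % len(codes)]) for i in range(len(codes))])
```

Every neighboring pair, including the wrap-around from the last code to the first, differs in exactly one bit, which is the property K-map row and column labels rely on.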

Each square differs from an adjacent square by a change in a single variable (Gray code).

$$Y = AB\overline{C} + ABC = AB(\overline{C} + C) = AB \tag{22}$$

image-20250117075049883

Squares with a don't-care entry can be treated as either 1 or 0, whichever simplifies the circuit.

Using multiple levels (more than 2) can reduce the cost, and sometimes also the delay. Sometimes there is a tradeoff between cost and delay.

NAND would be used in place of all ANDs and ORs.

image-20250120085653187

No convenient hand methods exist for multi-level logic simplification:

FSM

Can model behavior of any sequential circuit.

The FSM follows exactly one edge per cycle.

Moore: outputs depend only on the current state. Both edges of the output follow the clock.

Mealy: outputs depend on the current state and the inputs. The output rises with the rising edge of the input, asynchronously to the clock, and falls synchronously with the next clock edge. The output timing behavior of the Moore machine can be achieved in a Mealy machine by “registering” the Mealy output values.

image-20250121081017575

FSM State Transition Diagram:

image-20250117080545716

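| Current State | Input | Next State |
|---|---|---|
| Eat | Feeding | Eat |
| Eat | Petting | Sleep |
| Sleep | Feeding | Sleep |
| Sleep | Petting | Annoyed |
| Annoyed | Feeding | Eat |
| Annoyed | Petting | Annoyed |

Encode each state

| State | Encoding |
|---|---|
| Eat | 00 |
| Sleep | 01 |
| Annoyed | 10 |

| Input | Encoding |
|---|---|
| Feeding | 0 |
| Petting | 1 |

| $S_1$ (Current state[1]) | $S_0$ (Current state[0]) | $X$ (Input) | $S_1'$ (Next state[1]) | $S_0'$ (Next state[0]) |
|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 1 | 0 | 1 |
| 0 | 1 | 0 | 0 | 1 |
| 0 | 1 | 1 | 1 | 0 |
| 1 | 0 | 0 | 0 | 0 |
| 1 | 0 | 1 | 1 | 0 |

$$S_0' = \overline{S_1}\,\overline{S_0}X + \overline{S_1}S_0\overline{X} = \overline{S_1}(\overline{S_0}X + S_0\overline{X}) = \overline{S_1}(S_0 \oplus X) \tag{23}$$
$$S_1' = \overline{S_1}S_0X + S_1\overline{S_0}X = (S_1 \oplus S_0)X \tag{24}$$

| $S_1$ | $S_0$ | $E$ (Eye open) | $M$ (Mouth open) |
|---|---|---|---|
| 0 | 0 | 1 | 1 |
| 0 | 1 | 0 | 0 |
| 1 | 0 | 1 | 0 |

$$E = \overline{S_1}\,\overline{S_0} + S_1\overline{S_0} = \overline{S_0} \tag{25}$$
$$M = \overline{S_1}\,\overline{S_0} \tag{26}$$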
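The next-state and output equations can be checked against the symbolic transition table by simulation. This Python sketch encodes the table and the equations for the Eat/Sleep/Annoyed machine above; the dictionary and function names are illustrative choices.

```python
TRANSITIONS = {
    ("Eat", "Feeding"): "Eat",
    ("Eat", "Petting"): "Sleep",
    ("Sleep", "Feeding"): "Sleep",
    ("Sleep", "Petting"): "Annoyed",
    ("Annoyed", "Feeding"): "Eat",
    ("Annoyed", "Petting"): "Annoyed",
}
STATE_BITS = {"Eat": (0, 0), "Sleep": (0, 1), "Annoyed": (1, 0)}  # (S1, S0)
INPUT_BIT = {"Feeding": 0, "Petting": 1}
OUTPUTS = {"Eat": (1, 1), "Sleep": (0, 0), "Annoyed": (1, 0)}     # (E, M)


def next_state_bits(s1, s0, x):
    """Equations (23) and (24)."""
    s0_next = (1 - s1) & (s0 ^ x)
    s1_next = (s1 ^ s0) & x
    return s1_next, s0_next


def output_bits(s1, s0):
    """Equations (25) and (26)."""
    eye_open = 1 - s0
    mouth_open = (1 - s1) & (1 - s0)
    return eye_open, mouth_open


def equations_match_table():
    """Compare the Boolean equations with every row of the symbolic table."""
    for (state, action), target in TRANSITIONS.items():
        s1, s0 = STATE_BITS[state]
        if next_state_bits(s1, s0, INPUT_BIT[action]) != STATE_BITS[target]:
            return False
    return all(output_bits(*STATE_BITS[s]) == OUTPUTS[s] for s in STATE_BITS)


print(equations_match_table())
```

The unused encoding 11 is never reached from a valid state, so the equations are free to treat it as a don't-care.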

image-20250117082050587

State assignment

States that are close to each other (connected by transitions) should be given encodings that are close to each other.

An alternative approach is one-hot encoding, but it can cost more flip-flops.

 

Verilog Implementation

image-20250121081625864

LEC 11 FPGA

image-20250119110004813

Basic idea: Two-dimensional array of logic blocks and flip-flops with means for the user to configure.

A latch is a basic memory element used to store one bit of information.

Most FPGAs have SRAM-based (latch) programmability.

CLB

Based on PYNQ-Z1

image-20250119110837065

SLICES

A CLB element contains a pair of slices, and each slice is composed of four 6-input LUTs and eight storage elements. The CLBs are arranged in columns in the 7 series FPGAs.

LUTs can be combined with a MUX to implement functions with more inputs than a single LUT: the lower input bits address each LUT, and the most significant input bits drive the MUX select lines to choose which LUT's output is used.
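The idea of building a wider function from smaller LUTs plus a multiplexer can be modelled in a few lines. This Python sketch implements a 7-input parity function from two 6-input LUTs; parity is an arbitrary example function, and all names are hypothetical rather than taken from any vendor tool.

```python
from itertools import product


def make_lut(func, n_inputs):
    """Precompute the 2**n_inputs-entry truth table a LUT would store."""
    table = []
    for index in range(2 ** n_inputs):
        bits = [(index >> i) & 1 for i in range(n_inputs)]
        table.append(func(bits))
    return table


def lut_read(table, bits):
    """Look up the stored output; the input bits form the table address."""
    index = sum(bit << i for i, bit in enumerate(bits))
    return table[index]


def parity(bits):
    return sum(bits) % 2


# One LUT per value of the most significant input bit.
lut_when_msb_0 = make_lut(lambda low: parity(low + [0]), 6)
lut_when_msb_1 = make_lut(lambda low: parity(low + [1]), 6)


def parity7(bits):
    """7-input function: two 6-input LUTs selected by a MUX on bit 6."""
    low_bits, select = list(bits[:6]), bits[6]
    if select:  # the 2-to-1 MUX
        return lut_read(lut_when_msb_1, low_bits)
    return lut_read(lut_when_msb_0, low_bits)


mismatches = sum(parity7(bits) != parity(bits) for bits in product((0, 1), repeat=7))
print("mismatches:", mismatches)
```

Each LUT simply stores one half of the larger truth table, so the same construction works for any 7-input function, not just parity.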

Configurable Interconnect

Each interconnection point has a transistor switch. Each switch is controlled by a 1-bit configuration register, which is loaded from the bitstream.

image-20250119120233608

image-20250119112451769

image-20250119122032291

The netlist is typically a 3-D graph, so the tools have to figure out a good placement and routing for it on the FPGA.

Diverse Resources on FPGA

 

LEC 13 CMOS

PN Junction

See this video.

A silicon atom has four electrons in its outermost shell, which can form covalent bonds with four adjacent silicon atoms. Since these electrons are bound by covalent bonds and cannot move freely, pure silicon does not conduct electricity. This pure form of silicon is known as an intrinsic semiconductor.

When phosphorus (P) is doped into silicon, phosphorus has five electrons in its outermost shell. Four of these electrons form covalent bonds with silicon, while the extra electron becomes a free electron. Since the charge carriers are primarily free electrons, this doped semiconductor is known as an N-type semiconductor.

image-20250121082351252

When boron (B) is doped into silicon, boron has three electrons in its outermost shell. When a boron atom replaces a silicon atom in the crystal lattice, it can only form three complete covalent bonds with the surrounding four silicon atoms, leaving the fourth bond with a vacancy (i.e., a hole) due to the lack of an electron. This vacancy readily attracts an electron from a neighboring atom, which in turn leaves a vacancy behind, so the hole behaves like a mobile positive charge. Since the charge carriers are primarily holes, this doped semiconductor is known as a P-type semiconductor.

image-20250121082703026

By doping both ends of the same silicon wafer, one end forms an N-type semiconductor, and the other end forms a P-type semiconductor. Before contact, both N-type and P-type semiconductors are electrically neutral (number of outer electrons = number of protons). After contact, due to the difference in electron concentration, electrons spontaneously diffuse from the N-type semiconductor to the P-type semiconductor. This causes the N-type semiconductor to become positively charged due to the loss of electrons, while the P-type semiconductor becomes negatively charged due to the gain of electrons, resulting in an electric field directed from the N-type to the P-type near the contact surface.

Under the influence of this electric field, some electrons move back toward the N-type region, a motion known as drift. Eventually, drift and diffusion reach a dynamic equilibrium, and the region near the contact surface where this balance holds is called the PN junction.

image-20250121085832313

When an external electric field directed from the P-side to the N-side is applied (forward bias), the part of the PN junction in the N-region narrows as electrons are replenished, while the part in the P-region narrows as electrons are pulled away by the field, which creates more holes. Eventually, the PN junction disappears. Free electrons from the N-region pass through the holes in the P-region and eventually return to the power source. The PN junction conducts only when the external electric field exceeds its built-in electric field, which for silicon corresponds to a typical threshold voltage of about 0.7 V.

image-20250121090134888

When the power supply is connected in reverse, the remaining electrons in the N-region move toward the positive terminal of the power supply, while the holes in the P-region are filled by electrons from the negative terminal. The number of charge carriers drops, which prevents conduction. As the internal electric field strengthens, a small number of electrons still undergo drift motion. When the external voltage increases further, avalanche breakdown occurs: the drifting electrons collide with covalent bonds and generate new holes and free electrons. Avalanche breakdown does not in itself damage the semiconductor, but the sharp increase in current produces significant heat, which can burn the semiconductor out.

image-20250121090728455

MOSFET

See this video.

image-20250121092216064

image-20250121094419116

As shown in the figure, a single-crystal silicon wafer is doped to form two N-type (red) regions and one P-type (yellow) region. At each interface between an N-type region and the P-type region, a PN junction is formed, with its built-in electric field directed from the N-region to the P-region.

A capacitor is added and connected to the positive and negative terminals of the power supply. The capacitor generates an electric field, under the influence of which a large number of electrons move upward, filling some of the holes. The remaining free electrons cause a portion of the P-type semiconductor to transform into an N-type semiconductor, enabling conduction. The region that transitions from P-type to N-type is called the N-channel.

image-20250123085928020

Without the capacitor, one of the two PN junctions is always reverse-biased (widened), so the device cannot conduct.

image-20250121201007865

image-20250121201208545

This is because the built-in electric field of the PN junction cancels part of the gate voltage.

image-20250121201254101

image-20250121201527801

If both the pull-up and pull-down networks were ON simultaneously, a short circuit would exist between VDD and GND. The output of the gate might be in the forbidden zone and the transistors would consume large amounts of power, possibly enough to burn out. On the other hand, if both the pull-up and pull-down networks were OFF simultaneously, the output would be connected to neither VDD nor GND. We say that the output floats.
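The requirement that exactly one network conducts can be illustrated with a switch-level model. The Python sketch below uses a 2-input CMOS NAND gate as the example, which is an assumption made here for illustration (series nMOS pull-down, parallel pMOS pull-up); the function names are hypothetical.

```python
def nand_pull_down_on(a, b):
    """Two nMOS transistors in series: conducts to GND only if both inputs are 1."""
    return a == 1 and b == 1


def nand_pull_up_on(a, b):
    """Two pMOS transistors in parallel: conducts to VDD if either input is 0."""
    return a == 0 or b == 0


def cmos_output(pull_up_on, pull_down_on):
    """Resolve the output node from the state of the two networks."""
    if pull_up_on and pull_down_on:
        return "short"   # VDD connected to GND: large current, invalid level
    if pull_up_on:
        return 1
    if pull_down_on:
        return 0
    return "float"       # connected to neither rail


def nand_output(a, b):
    return cmos_output(nand_pull_up_on(a, b), nand_pull_down_on(a, b))


for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", nand_output(a, b))
```

Because the two networks are logical complements of each other, the "short" and "float" cases never occur for any input combination in this gate.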

How a CPU is made.

image-20250123162051794

MOS Transistor as a Resistive Switch

image-20250123162446502

$V_{GS}$ is the voltage between the Gate (G) and the Source (S) of a MOS transistor, defined as $V_{GS} = V_G - V_S$. It is a key parameter that determines whether the MOS transistor is in the ON or OFF state.

Thumb

image-20250123164409361