Introduction to VLSI CAD

Time: Spring 2021 (second semester of sophomore year)

lecture

More information is in lec/*.pdf.

Subject	Teacher
Introduction to VLSI CAD (超大型積體電路電腦輔助設計概論)	邱瀝毅

Report

More information is in doc/*.docx.

lab2
lab3
lab4
lab5
lab6
lab8

Environment

OS

CentOS 6

Software

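Name	Function
NC Verilog	Simulates the HDL design and generates waveforms
nWave in Verdi	Views `*.fsdb` waveforms
Superlint	Checks the code for rule violations to help debugging
Design Vision	Logic synthesis
HSPICE	Analog circuit simulation
Laker	Layout editor
Calibre	Layout verification (DRC, LVS, PEX)
MobaXterm	Supports X11, SFTP, SSH, and other protocols for connecting to the workstation remotely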

How to run

lab6 provides a makefile with the following targets.

Description	Command
Run RTL Convolution simulation	`make rtl0`
Run RTL Pooling simulation	`make rtl1`
Run RTL simulation	`make rtl_full`
Run post-synthesis simulation	`make syn_full`
Dump waveform (no array)	`make {rtlX, syn_full} FSDB=1`
Dump waveform (with array)	`make {rtlX, syn_full} FSDB=2`
Open nWave without file pollution	`make nWave`
Open Superlint without file pollution	`make superlint`
Open DesignVision without file pollution	`make dv`
Synthesize your RTL code	`make synthesize`
Check correctness of your file structure	`make check`
Compress your homework to tar format	`make tar`
Count the total lines of your code	`wc -l ./src/* ./include/*`

compile

ncverilog top_module.v

pre-simulate

ncverilog top_module_tb.v +define+FSDB +access+r

synthesis

open Design Vision

dv &

change hierarchy

current_design top

read design constraints file

source DC.sdc

Compile Design-> OK
generate report

report_timing
report_area
report_power

generate SDF file

write_sdf -version 2.1 -context verilog -load_delay net top_module_syn.sdf

post-simulate

ncverilog top_module_tb.v +define+FSDB+syn +access+r

Superlint
1. Open Superlint
```
jg -superlint
```
2. File -> TclScripts -> Source
3. Count the total number of lines
```
wc -l filename
```
check file hierarchy

sh check.sh

lab2

Encoder

4-to-2 priority encoder at gate level

Full Adder

Full adder at gate level

Ripple Carry Adder

5-bit add/sub ripple carry adder written hierarchically

Instantiate the FullAdder designed earlier in Lab2, including its source file:

include "File_Path/Filename"
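
The hierarchy above can be checked against a small software model. The following Python sketch is not the lab's Verilog; it assumes the usual add/sub construction, where a `sub` control bit inverts each bit of `b` and also feeds the first carry-in. The function names and the `sub` argument are illustrative only.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_add_sub(a, b, sub, width=5):
    """Add (sub=0) or subtract (sub=1) two width-bit values bit by bit."""
    carry = sub  # a carry-in of 1 completes the two's complement of b
    result = 0
    for i in range(width):
        a_bit = (a >> i) & 1
        b_bit = ((b >> i) & 1) ^ sub  # invert b when subtracting
        s, carry = full_adder(a_bit, b_bit, carry)
        result |= s << i
    return result, carry


print(ripple_add_sub(5, 3, 0))  # 5 + 3 as (result, carry_out)
print(ripple_add_sub(5, 3, 1))  # 5 - 3 as (result, carry_out)
```

Each loop iteration plays the role of one FullAdder instance, with `carry` as the wire between neighboring stages.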

lab3

Multiplexer

8-to-1 multiplexer with a testbench that exercises every select value and prints the results

Arithmetic Logic Unit

operations

alu_op	operation	description
01000	NOT	~src1
01001	NAND	~(src1&src2)
01010	MAX	max{src1, src2}
01011	MIN	min{src1, src2}
01100	ABS	|src|
01101	SLTS	(src1<src2)?1:0
01110	SLL	src1<<src2
01111	ROTL	src1 rotate left by "src2 bits"
10000	ASSU	unsigned(src1+src2)
10001	SRLU	unsigned(src1>>src2)

Port

signal	type	bits	description
alu_enable	input	1	0: disabled; 1: enabled
alu_op	input	5	opcode selecting which operation to execute
src1	input	32	ALU source 1
src2	input	32	ALU source 2
alu_out	output	32	ALU result
alu_overflow	output	1	0: no overflow; 1: overflow
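
A testbench for this ALU needs expected values, so a golden model is useful. The Python sketch below follows the opcode table, but several details are not stated in this README and are assumptions: MAX, MIN, and ABS treat operands as signed, ABS works on `src1`, shifts use only the low 5 bits of `src2`, `alu_overflow` reports the carry out of ASSU only, and a disabled ALU outputs zeros.

```python
MASK = 0xFFFFFFFF  # 32-bit data path


def to_signed(x):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    x &= MASK
    return x - (1 << 32) if x & 0x80000000 else x


def alu(alu_enable, alu_op, src1, src2):
    """Return (alu_out, alu_overflow) for one operation."""
    if not alu_enable:
        return 0, 0  # assumption: a disabled ALU outputs zeros
    src1 &= MASK
    src2 &= MASK
    shamt = src2 & 31  # assumption: only the low 5 bits select the shift
    overflow = 0
    if alu_op == 0b01000:    # NOT
        out = ~src1
    elif alu_op == 0b01001:  # NAND
        out = ~(src1 & src2)
    elif alu_op == 0b01010:  # MAX (signed compare assumed)
        out = max(to_signed(src1), to_signed(src2))
    elif alu_op == 0b01011:  # MIN (signed compare assumed)
        out = min(to_signed(src1), to_signed(src2))
    elif alu_op == 0b01100:  # ABS (of src1, assumed)
        out = abs(to_signed(src1))
    elif alu_op == 0b01101:  # SLTS
        out = 1 if to_signed(src1) < to_signed(src2) else 0
    elif alu_op == 0b01110:  # SLL
        out = src1 << shamt
    elif alu_op == 0b01111:  # ROTL
        out = (src1 << shamt) | (src1 >> (32 - shamt))
    elif alu_op == 0b10000:  # ASSU
        total = src1 + src2
        out = total
        overflow = total >> 32  # assumption: overflow flags the carry out
    elif alu_op == 0b10001:  # SRLU
        out = src1 >> shamt
    else:
        out = 0  # assumption: unlisted opcodes output zero
    return out & MASK, overflow


print(alu(1, 0b01111, 0x80000001, 1))  # rotate left by one bit
```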

Grayscale Conversion

conversion formula : y = 0.3125r + 0.5625g + 0.125b

input	output
24 bit RGB color values	8 bit grayscale values
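
The three coefficients are exact multiples of 1/16 (5/16, 9/16, and 2/16), so the formula needs no real multipliers, only small integer scaling and a 4-bit right shift. The Python sketch below shows that arithmetic. The packing of the 24-bit pixel (red in the top byte, blue in the bottom byte) and truncation instead of rounding are assumptions, not something this README states.

```python
def rgb_to_gray(pixel):
    """Convert a 24-bit RGB value to an 8-bit grayscale value."""
    r = (pixel >> 16) & 0xFF  # assumed packing: R in bits [23:16]
    g = (pixel >> 8) & 0xFF   # G in bits [15:8]
    b = pixel & 0xFF          # B in bits [7:0]
    # 0.3125 = 5/16, 0.5625 = 9/16, 0.125 = 2/16
    return (5 * r + 9 * g + 2 * b) >> 4


print(rgb_to_gray(0xFFFFFF))  # white stays at full scale
```

Because the weights sum to 16/16, the result always fits in 8 bits.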

lab4

Register File

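Simulates writing to, accessing, and reading from a 64x32 register file.

Vending Machine

The design is split into three phases.

Phase	Description
Phase0	The user inserts money, and the machine first stores the amount in `money_temp`.
Phase1	The user selects a beverage, and its price is subtracted from `money_temp`.
Phase2	The machine gives change (`change = money_temp`) and pulls `finish` high so the user knows the transaction is complete. This part is written as combinational logic, separate from the sequential logic.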

Convolution and activation function

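I had not taken any related course and had only watched some introductory videos on neural networks. Put plainly, though, this problem is an element-wise multiplication of two matrices. The hard part for me was that multiplying negative numbers requires sign extension first. My approach was:

1. Connect the individual inputs to arrays so they can be handled in one for loop. There are four input cases: `w_w` and `if_w` both 1, either one alone being 1, and both 0.
2. Use a for loop to process each array element individually.
3. Concatenate the result with zero bits up to 17 bits, then perform sign extension.
4. Multiply to obtain the result.
5. Apply the Rectified Linear Unit (ReLU) mapping. The main purpose of an activation function is to add non-linearity to a neural network model.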
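
The sign-extension and ReLU steps can be illustrated in software. This Python sketch is not the lab's RTL. It assumes 16-bit two's-complement operands and that the element-wise products are summed into one value before the activation; all function names are illustrative.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def relu(x):
    """Rectified Linear Unit: negative values map to 0."""
    return x if x > 0 else 0


def conv_relu(weights, features, bits=16):
    """Multiply element by element, accumulate, then apply ReLU."""
    acc = 0
    for w, f in zip(weights, features):
        acc += sign_extend(w, bits) * sign_extend(f, bits)
    return relu(acc)


# 0xFFFE is -2 and 0xFFFD is -3 in 16-bit two's complement
print(conv_relu([0xFFFE], [0xFFFD]))
```

Without the sign extension, `0xFFFE * 0xFFFD` would be treated as a product of two large positive numbers, which is the pitfall described above.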

lab5

Moore Machine

CurrentState	NS (din = 0)	NS (din = 1)	qout
S0 = 00	S2	S1	1
S1 = 01	S1	S0	0
S2 = 10	S3	S2	0
S3 = 11	S3	S1	1
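
The table can be exercised with a small simulator before writing the RTL. This Python sketch reads the four rows as states S0 to S3 in order and assumes the machine resets to S0; neither the reset state nor the helper name `run_moore` comes from the lab handout.

```python
# state: (next state if din = 0, next state if din = 1, qout)
MOORE = {
    "S0": ("S2", "S1", 1),
    "S1": ("S1", "S0", 0),
    "S2": ("S3", "S2", 0),
    "S3": ("S3", "S1", 1),
}


def run_moore(inputs, state="S0"):
    """Return the (state, qout) pairs visited, starting from the reset state."""
    trace = [(state, MOORE[state][2])]
    for din in inputs:
        state = MOORE[state][din]  # din selects column 0 or 1
        trace.append((state, MOORE[state][2]))
    return trace


print(run_moore([1, 0, 1, 1]))
```

In a Moore machine `qout` depends only on the current state, which is why it is stored per state and not per transition.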

Mealy Machine

Current State	Next state, output (din = 0)	Next state, output (din = 1)
S0 = 00	S1,0	S2,0
S1 = 01	S1,1	S2,0
S2 = 11	S2,0	S0,1
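
For comparison with the Moore machine, the Mealy table can be simulated the same way. In this Python sketch the output is attached to each transition, since it depends on both the current state and `din`. The reset state S0 and the name `run_mealy` are assumptions for illustration.

```python
# state: {din: (next state, output)}
MEALY = {
    "S0": {0: ("S1", 0), 1: ("S2", 0)},
    "S1": {0: ("S1", 1), 1: ("S2", 0)},
    "S2": {0: ("S2", 0), 1: ("S0", 1)},
}


def run_mealy(inputs, state="S0"):
    """Return (list of outputs, final state) for a sequence of din values."""
    outputs = []
    for din in inputs:
        state, out = MEALY[state][din]
        outputs.append(out)
    return outputs, state


print(run_mealy([0, 0, 1, 1, 0]))
```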

Memory

A 65536x24-bit random access memory
A 16384x24-bit read-only memory

MAC using Shift Register

port

signal	type	bits	description
clk	input	1	clock
rst	input	1	reset
clear	input	1	Set all registers to 0
w_w	input	1	Write weight enable. When w_w is high, write w_in.
if_w	input	1	Write input feature map enable. When if_w is high, write if_in.
w_in	input	16	Input weight data
if_in	input	16	Input feature map data
out	output	34	Output data

Shift register
A cascade of flip-flops in which the output of each flip-flop is connected to the input of the next.
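
As a rough software picture of this structure, the Python sketch below models one shift register for weights and one for the input feature map, then multiplies matching stages and sums the products. The depth of 3 stages is a hypothetical choice, as are the class and function names; the actual depth is defined by the lab handout, not this README.

```python
class ShiftRegister:
    """A chain of registers; each stage takes the previous stage's value."""

    def __init__(self, depth):
        self.regs = [0] * depth

    def clear(self):
        """Model of the `clear` input: set every stage to 0."""
        self.regs = [0] * len(self.regs)

    def shift(self, value):
        """Model of one enabled clock edge: the new value enters stage 0."""
        self.regs = [value] + self.regs[:-1]


def mac(weights, features):
    """Multiply matching stages and accumulate the products."""
    return sum(w * f for w, f in zip(weights.regs, features.regs))


w_sr = ShiftRegister(3)   # hypothetical depth
if_sr = ShiftRegister(3)
for w_in, if_in in [(1, 4), (2, 5), (3, 6)]:
    w_sr.shift(w_in)      # corresponds to a cycle with w_w high
    if_sr.shift(if_in)    # corresponds to a cycle with if_w high
print(mac(w_sr, if_sr))
```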

Grayscale Conversion System

spec
The system will be able to change RGB pictures to grayscale pictures
block diagram of system
function
1. reads pixel from the input memory.
2. compute new value of pixels
3. writes the new value pixel back to the output memory.
4. repeats the process step (1)-(3) until the last pixel of output memory is updated.
5. flags done when step (4) is completed
control signal

signal	function
`en_in_mem`	enable input memory
`in_mem_addr`	input memory address
`en_out_mem`	enable output memory
`out_mem_read`	output memory read enable
`out_mem_write`	output memory write enable
`out_mem_addr`	output memory address
`done`	Stop the process

state diagram
result

Original Image	Results

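Waveform: The first figure shows the waveform of the entire run. The second shows the very beginning: `rst = 1` initializes `in_mem_addr` and `out_mem_addr`, which then count up from 0. `en_in_mem` and `en_out_mem`/`out_mem_write` are pulled high alternately with `clk`, forming a loop between the input (`S_in_mem`) and output (`S_out_mem`) states. When `out_addr` reaches `32'd479999`, the whole 480000-pixel image has been processed, so `done = 1` and the FSM stays in the single state `S_done`. This matches the overall flow of the state diagram designed above.
SuperLint Coverage: 100% (no errors or warnings)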
Synthesis Report

Timing(slack)	Area(total cell area)	Power(total)
`5.49`	`3839.52`	`0.1058mW`

Waveform after Synthesis

lab6

spec

integrate all components that you have learned so far to form a simple convolution system.

block diagram of system

function

1. reads pixels from the IFM ROM into the convolution block, handling the padding problem.
2. computes the new pixel value.
3. writes the convolution result back to the CONV RAM.
4. repeats steps (1)-(3) until the last pixel of the CONV RAM is updated.
5. reads pixels from the CONV RAM into the pooling block.
6. computes the new pixel value.
7. writes the new pixel value back to the POOL RAM.
8. repeats steps (5)-(7) until the last pixel of the POOL RAM is updated.
9. flags done when step (8) is completed.
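
The two stages can be written as a compact reference model, which is handy for generating golden data on small test images. This Python sketch assumes zero padding at the borders, a weight map applied without flipping (the usual CNN convention), and 2x2 max pooling with stride 2. The pooling size is inferred from the RAM sizes (65536 convolution results, 16384 pooling results), not from the handout.

```python
def conv3x3(image, weights):
    """3x3 convolution with zero padding; output has the input's size."""
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            acc = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < h and 0 <= cc < w:  # outside: padded with 0
                        acc += image[rr][cc] * weights[dr + 1][dc + 1]
            out[r][c] = acc
    return out


def max_pool2x2(image):
    """2x2 max pooling with stride 2; halves both dimensions."""
    return [
        [max(image[r][c], image[r][c + 1], image[r + 1][c], image[r + 1][c + 1])
         for c in range(0, len(image[0]), 2)]
        for r in range(0, len(image), 2)
    ]


ones = [[1] * 4 for _ in range(4)]
box = [[1] * 3 for _ in range(3)]
print(max_pool2x2(conv3x3(ones, box)))
```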

control signal

signal	function
`ROM_IF_OE`	read data from `input feature map ROM`
`ROM_W_OE`	read data from `weight ROM`
`RAM_CONV_WE`	store the data to `CONV RAM`
`RAM_CONV_OE`	read data from `CONV RAM`
`RAM_POOL_WE`	store the data to `POOL RAM`
`RAM_POOL_OE`	read data from `POOL RAM`
`done`	stop the process

design rules

Convolve the penguin image with a 3x3 weight map.
Handle the boundary condition with padding.
Apply max pooling to the convolution result.
Synthesize your system.v with the following constraints:

Clock period	no more than 20 ns
Synthesized Verilog file	`system_syn.v`
Timing constraint file	`system_syn.sdf`

state diagram

Drawn by myself (Illustrator)
Generated by Verdi

How to handle the boundary condition

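READ_9
- Normal case
  pad_en is on in cycles 1, 4, and 7
- Boundary cases
  1. row == 18'd0: additionally on in cycles 2 and 3
  2. row == 18'd255: additionally on in cycles 8 and 9
READ_C
- Normal case
  pad_en is off in every cycle
- Boundary cases
  1. column == 18'd255: pad_en is on in cycles 1, 2, and 3
  2. row == 18'd0: on in cycle 1
  3. row == 18'd255: on in cycle 3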
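
These rules follow from one geometric test: a read is padded whenever its window position falls outside the 256x256 image. The Python sketch below derives the padded cycles that way. It assumes READ_9 walks the 3x3 window row by row starting at the top-left (cycle 1) with the window centered on column 0, and that READ_C reads the new right-hand column from top to bottom. The function names are illustrative.

```python
SIZE = 256  # the image is 256 x 256


def read9_pad_cycles(row, col=0, size=SIZE):
    """Cycles (1..9) of READ_9 in which pad_en would be high."""
    padded = []
    for k in range(9):
        rr = row + k // 3 - 1  # window row: above, same, below
        cc = col + k % 3 - 1   # window column: left, same, right
        if not (0 <= rr < size and 0 <= cc < size):
            padded.append(k + 1)
    return padded


def read9_addresses(row, col=0, size=SIZE):
    """ROM addresses read in the cycles of READ_9 that are not padded."""
    addresses = []
    for k in range(9):
        rr, cc = row + k // 3 - 1, col + k % 3 - 1
        if 0 <= rr < size and 0 <= cc < size:
            addresses.append(rr * size + cc)
    return addresses


def read_c_pad_cycles(row, col, size=SIZE):
    """Cycles (1..3) of READ_C in which pad_en would be high."""
    new_col = col + 1  # the column entering the window on the right
    return [k + 1 for k in range(3)
            if not (0 <= row + k - 1 < size and new_col < size)]


print(read9_pad_cycles(0))      # top-left corner of the image
print(read_c_pad_cycles(5, 255))  # right edge of the image
```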

simulation result

terminal

image

Original	Result

Waveform

cs[2:0]=READ_W

cs[2:0]=READ_9

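This state reads 9 values. Because each address must be presented one cycle early, `count[3:0]` counts from 0 to 9 as shown in the figure above, so READ_9 takes 10 cycles in total. `pad_en` is high in cycles 1, 2, 3, 4, and 7; the address does not matter in those cycles because the output is 0. In cycles 5, 6, 8, and 9 the addresses are 0, 1, 256, and 257, respectively. `ROM_IF_OE` is pulled high to read the original penguin data from the ROM, and `RAM_CONV_WE` is pulled high to store the convolution result in `RAM_CONV`.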

cs[2:0]=READ_C

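The behavior is the same as in `cs[2:0]=READ_9` above, except that only 3 values need to be read. As shown in the figure above, `count[3:0]` counts from 0 to 3, so the state takes 3 + 1 = 4 cycles. This is the common case, with READ_C and WRITE_C alternating.
When `column == 18'd255`, padding is high for all three reads; the relative position is then at the lower-right corner of the input feature map. Next the FSM jumps to READ_9, `row = row + 1`, and `column` resets to 0 and counts up from zero again. This cycle repeats.
When `address == 18'd65535`, the first stage (convolution) is complete and the FSM moves to the next state, READ_P.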

cs[2:0]=READ_P

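Again, the address must be presented one cycle early. When `pool_en` is high, writing into `Pooling.v` is allowed; when `pool_en` is low, my design holds the value in `Pooling.v`. `RAM_CONV_OE` is pulled high to read in the data that the convolution stage stored in `RAM_CONV`, and `RAM_POOL_WE` is pulled high to store the result in `RAM_POOL`.
When `column2 == 18'd254`, `row2 = row2 + 2` and the column resets to 0 and counts up from zero again. This cycle repeats.
When `address2 == 18'd16383`, the second stage (pooling) is complete. `DONE` is pulled high and the FSM stays in an infinite loop, which ends both stages of the RTL flow.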
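
The counters above imply a simple mapping from each pooling result to the four CONV RAM addresses it reads. The Python sketch below spells that mapping out, assuming a 2x2 window with stride 2 over a 256-wide image, as suggested by `row2` stepping by 2 and the final address 16383. The function name is illustrative.

```python
def pool_window_addresses(index, width=256):
    """CONV RAM addresses of the 2x2 window feeding pooling result `index`."""
    per_row = width // 2                # pooling results per output row
    row2 = (index // per_row) * 2       # top row of the window
    col2 = (index % per_row) * 2        # left column of the window
    base = row2 * width + col2
    return [base, base + 1, base + width, base + width + 1]


print(pool_window_addresses(0))
print(pool_window_addresses(16383))  # last pooling result
```

The last window ends at address 65535, the final convolution result, which matches the stage boundaries described above.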

SuperLint Coverage

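Coverage: 99% (2 errors in system.v). Every error that could be resolved has been resolved; the two remaining errors are in system.v.

Error code	Description
`INP_NO_USE`	`RAM_POOL_Q` is left unconnected. This line carries data from `RAM_POOL` to system, a function that this design does not use.
`RXT_XC_LDTH`	Presumably caused by the wiring of the `rst` signal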

Synthesis Report

Synthesizable clock period	Simulation time	Cell Area	Power
`10ns` (TA default)	`4275325ns`	`84011`	`1.3264mW`

Waveform after Synthesis

lab8

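Design inverter, NAND, and NOR circuits.

Successful compilation
Waveforms in WaveView

Circuit	Waveform check
`inverter`	The signal flips: 0 becomes 1 and 1 becomes 0
`NAND`	AND followed by NOT
`NOR`	OR followed by NOT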

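Reflections

The first half of this course was writing Verilog for digital circuit simulation and synthesis, drawing on digital logic design, computer organization, and basic use of a Unix-like environment. The second half was layout, drawing on Electronics I and II. Because of the local COVID-19 outbreak, however, the second half only got as far as lab9 and was nearly over once we had drawn the inverter, NAND, and NOR layouts, which was a bit of a pity. Still, the second semester of sophomore year carries a heavy workload, so this gave me breathing room to study Electronics and other subjects.
The more important or interesting circuits were:

The grayscale conversion system in part 5 of lab5
The simple convolution system in lab6, which is the final project

These taught me how to turn an algorithm into RTL code. The boundary conditions in lab6 were the main difficulty, and on top of that I found that the TA's testbench seems to delay the data read from the ROM by one cycle. All of this took me a lot of time to finish, but I learned a lot and got some sense of doing a design on my own.

In fact, much of this assignment was done for us by the TAs: the shell scripts and makefile for the Linux environment, the golden data generated by a high-level language together with the testbench verification, and the system blocks with the port wiring between them. What we students completed was the FSM implementation of the circuits inside the blocks.

After this course I feel I should improve my coding ability and my command of Linux. I hope to become a designer who truly understands the whole design flow: given a spec written in plain text by someone else, building everything from scratch, independently.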


HsuChiChen/ncku-intro-vlsi
