Time: Spring 2021 (second semester of sophomore year)
more info in lec/*.pdf
subject | teacher |
---|---|
Introduction to VLSI Computer-Aided Design (超大型積體電路電腦輔助設計概論) | 邱瀝毅 |
more info in doc/*.docx
- OS
CentOS v6
- Software
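Name | Function |
---|---|
NC Verilog | Simulates the HDL design as a real circuit and generates waveforms |
nWave in Verdi | Views waveform files (`*.fsdb`) |
Superlint | Checks for non-conforming code style to help debugging |
Design Vision | Logic synthesis |
HSPICE | Analog circuit simulation |
Laker | Layout editor |
Calibre | Layout verification (DRC, LVS, PEX) |
MobaXterm | Supports X11, SFTP, SSH and other protocols so you can connect to the workstation remotely |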
- Lab 6 provides a `makefile` with the following targets:
Description | Command |
---|---|
Run RTL Convolution simulation | make rtl0 |
Run RTL Pooling simulation | make rtl1 |
Run RTL simulation | make rtl_full |
Run post-synthesis simulation | make syn_full |
Dump waveform (no array) | make {rtlX, syn_full} FSDB=1 |
Dump waveform (with array) | make {rtlX, syn_full} FSDB=2 |
Open nWave without file pollution | make nWave |
Open Superlint without file pollution | make superlint |
Open DesignVision without file pollution | make dv |
Synthesize your RTL code | make synthesize |
Check correctness of your file structure | make check |
Compress your homework to tar format | make tar |
Count the total lines of your code | wc -l ./src/* ./include/* |
- compile
ncverilog top_module.v
- pre-simulate
ncverilog top_module_tb.v +define+FSDB +access+r
- synthesis
- open Design Vision
dv &
- change hierarchy
current_design top
- read design constraints file
source DC.sdc
Compile Design
->OK
- generate report
report_timing report_area report_power
- generate SDF file
write_sdf -version 2.1 -context verilog -load_delay net top_module_syn.sdf
- post-simulate
ncverilog top_module_tb.v +define+FSDB+syn +access+r
- Superlint
- open
jg -superlint
File
->TclScripts
->Source
- Count the number of total lines
wc -l filename
- check file hierarchy
sh check.sh
4-to-2 priority encoder in gate-level
5-bit add/sub ripple carry adder in hierarchical coding
- call the FullAdder we design in Lab2
`include "File_Path/Filename"
8-to-1 multiplexer and testbench that needs to test all selected inputs and print results
- operations
alu_op | operation | description |
---|---|---|
01000 | NOT | ~src1 |
01001 | NAND | ~(src1&src2) |
01010 | MAX | max{src1, src2} |
01011 | MIN | min{src1, src2} |
01100 | ABS | \|src1\| |
01101 | SLTS | (src1<src2)?1:0 |
01110 | SLL | src1<<src2 |
01111 | ROTL | src1 rotate left by "src2 bits" |
10000 | ADDU | unsigned(src1+src2) |
10001 | SRLU | unsigned(src1>>src2) |
- Port
signal | type | bits | description |
---|---|---|---|
alu_enable | input | 1 | 0: disabled; 1: enabled |
alu_op | input | 5 | opcode selecting which operation to execute |
src1 | input | 32 | ALU source 1 |
src2 | input | 32 | ALU source 2 |
alu_out | output | 32 | ALU result |
alu_overflow | output | 1 | 0: no overflow; 1: overflow |
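
The opcode table above can be cross-checked against a small software reference model. The following is a Python sketch, not the lab's Verilog. The names `alu` and `to_signed` are hypothetical, and several behaviours are assumptions the table does not state: `MAX`, `MIN`, `ABS` and `SLTS` treat operands as signed 32-bit values, shifts and rotates use only the low 5 bits of `src2`, and `alu_enable` and `alu_overflow` are not modeled.

```python
MASK = 0xFFFFFFFF  # 32-bit data path


def to_signed(x):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    x &= MASK
    return x - (1 << 32) if x & 0x80000000 else x


def alu(alu_op, src1, src2):
    """Return the 32-bit alu_out pattern for one opcode (overflow not modeled)."""
    src1 &= MASK
    src2 &= MASK
    sh = src2 & 31  # assumption: only the low 5 bits select the shift amount
    s1, s2 = to_signed(src1), to_signed(src2)
    ops = {
        0b01000: lambda: ~src1,                               # NOT
        0b01001: lambda: ~(src1 & src2),                      # NAND
        0b01010: lambda: max(s1, s2),                         # MAX (signed, assumed)
        0b01011: lambda: min(s1, s2),                         # MIN (signed, assumed)
        0b01100: lambda: abs(s1),                             # ABS of src1
        0b01101: lambda: 1 if s1 < s2 else 0,                 # SLTS
        0b01110: lambda: src1 << sh,                          # SLL
        0b01111: lambda: (src1 << sh) | (src1 >> (32 - sh)),  # ROTL
        0b10000: lambda: src1 + src2,                         # unsigned add
        0b10001: lambda: src1 >> sh,                          # SRLU
    }
    return ops[alu_op]() & MASK  # truncate to 32 bits
```

A model like this can generate expected values for a testbench, for example `alu(0b01111, 0x80000001, 1)` for the rotate case.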
conversion formula : y = 0.3125r + 0.5625g + 0.125b
input | output |
---|---|
24 bit RGB color values | 8 bit grayscale values |
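
All three coefficients in the formula are multiples of 1/16 (0.3125 = 5/16, 0.5625 = 9/16, 0.125 = 2/16), so the conversion needs only integer multiplies and a 4-bit right shift. The Python sketch below illustrates this. The function name `rgb_to_gray`, the byte order (red in the top byte), and truncation rather than rounding are assumptions, not details taken from the lab.

```python
def rgb_to_gray(pixel):
    """Convert one 24-bit RGB value to an 8-bit grayscale value."""
    r = (pixel >> 16) & 0xFF  # assumption: red is the most significant byte
    g = (pixel >> 8) & 0xFF
    b = pixel & 0xFF
    # y = 0.3125r + 0.5625g + 0.125b = (5r + 9g + 2b) / 16, truncated
    return (5 * r + 9 * g + 2 * b) >> 4
```

Because the coefficients sum to exactly 1, a white pixel `0xFFFFFF` maps to 255 and the result always fits in 8 bits.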
Simulates writing to, storing in, and reading from a `64x32` register file.
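It is divided into three phases.
Phase | Description |
---|---|
Phase0 | The user inserts money, and the machine first stores it in money_temp |
Phase1 | The user selects a beverage, and the beverage price is subtracted from money_temp |
Phase2 | Give change: change = money_temp, and pull finish high so the user knows the transaction is complete. This part is written as combinational logic and must be kept separate from the sequential circuit |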
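I have not taken any related courses and had only watched a few popular-science videos about neural networks. Put plainly, though, this problem just multiplies the corresponding elements of two matrices; the hard part for me was that multiplying negative numbers requires sign extension first. My approach was:
- Connect the individual inputs to arrays so a `for loop` can handle them all at once. There are four input cases: `w_w` and `if_w` both 1, only one of them 1, and both 0.
- Use a `for loop` to process each array element individually.
- Concatenate the result with 0 bits up to 17 bits, then do `sign extension`.
- Finally, multiply to get the result.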
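
The reason sign extension has to come before the multiply can be seen in a short Python sketch. It is an illustration only: the helper names `sign_extend` and `signed_mul16` are hypothetical, and the 16-bit to 32-bit widths are chosen for the example rather than copied from the lab design.

```python
def sign_extend(value, from_bits, to_bits):
    """Widen a two's-complement pattern by copying its sign bit upward."""
    value &= (1 << from_bits) - 1
    if value >> (from_bits - 1):  # sign bit set: fill the new upper bits with 1s
        value |= ((1 << to_bits) - 1) & ~((1 << from_bits) - 1)
    return value


def signed_mul16(a, b):
    """Multiply two 16-bit two's-complement patterns; return a 32-bit pattern."""
    wide_a = sign_extend(a, 16, 32)
    wide_b = sign_extend(b, 16, 32)
    return (wide_a * wide_b) & 0xFFFFFFFF  # keep the low 32 bits


# Without extension, 0xFFFF (-1) times 2 gives 0x0001FFFE, which is not -2.
naive = (0xFFFF * 2) & 0xFFFFFFFF
```

With extension, `signed_mul16(0xFFFF, 2)` yields `0xFFFFFFFE`, the 32-bit pattern for -2.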
Rectified Linear Unit (ReLU) function mapping (the rectifier; the main purpose of an activation function is to add nonlinearity to a neural network model)
CurrentState | NS (din = 0) | NS (din = 1) | qout |
---|---|---|---|
S0 = 00 | S2 | S1 | 1 |
S1 = 01 | S1 | S0 | 0 |
S2 = 10 | S3 | S2 | 0 |
S3 = 11 | S3 | S1 | 1 |
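
The table above describes a Moore machine: `qout` depends only on the current state. A minimal Python sketch of the same table follows, reading the four rows as states S0 to S3 with encodings 00, 01, 10 and 11. The function name `run_moore`, the reset state S0, and sampling `qout` before each transition are assumptions.

```python
# (next state if din = 0, next state if din = 1) for each current state
MOORE_NEXT = {
    "S0": ("S2", "S1"),
    "S1": ("S1", "S0"),
    "S2": ("S3", "S2"),
    "S3": ("S3", "S1"),
}
# Moore output: a function of the current state only
MOORE_QOUT = {"S0": 1, "S1": 0, "S2": 0, "S3": 1}


def run_moore(dins, state="S0"):
    """Step through the inputs; return the sampled outputs and final state."""
    outs = []
    for din in dins:
        outs.append(MOORE_QOUT[state])  # output seen during this cycle
        state = MOORE_NEXT[state][din]  # transition on the clock edge
    return outs, state
```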
Current State | Next State, output | |
---|---|---|
X | din = 0 | din = 1 |
S0 = 00 | S1,0 | S2,0 |
S1 = 01 | S1,1 | S2,0 |
S2 = 11 | S2,0 | S0,1 |
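
In this second table the output is listed next to each transition, so it is a Mealy machine: the output depends on both the current state and `din`. A Python sketch of the table, with the hypothetical name `run_mealy` and an assumed reset state of S0:

```python
# (current state, din) -> (next state, output)
MEALY_TABLE = {
    ("S0", 0): ("S1", 0), ("S0", 1): ("S2", 0),
    ("S1", 0): ("S1", 1), ("S1", 1): ("S2", 0),
    ("S2", 0): ("S2", 0), ("S2", 1): ("S0", 1),
}


def run_mealy(dins, state="S0"):
    """Step through the inputs; return the per-cycle outputs and final state."""
    outs = []
    for din in dins:
        state, out = MEALY_TABLE[(state, din)]  # output belongs to the transition
        outs.append(out)
    return outs, state
```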
- a `65536x24`-bit random access memory
- a `16384x24`-bit read-only memory
- port
signal | type | bits | description |
---|---|---|---|
clk | input | 1 | clock |
rst | input | 1 | reset |
clear | input | 1 | Set all registers to 0 |
w_w | input | 1 | Write weight enable. When w_w is high, write w_in. |
if_w | input | 1 | Write input feature map enable. When if_w is high, write if_in. |
w_in | input | 16 | Input weight data |
if_in | input | 16 | Input feature map data |
out | output | 34 | Output data |
- Shift register
A cascade of flip-flops: the output of each flip-flop is connected to the input of the next flip-flop.
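
The cascade behaviour can be sketched in Python as a list that shifts by one position per clock. The function name `shift_register`, the default of four stages, and the all-zero reset value are assumptions for illustration.

```python
def shift_register(bits_in, n_stages=4):
    """Clock bits through a chain of flip-flops; return the serial output per cycle."""
    stages = [0] * n_stages  # assumption: every flip-flop resets to 0
    serial_out = []
    for d in bits_in:
        # On each clock edge every stage takes the previous stage's old value.
        stages = [d] + stages[:-1]
        serial_out.append(stages[-1])  # output of the last flip-flop
    return serial_out
```

A single 1 fed into a 4-stage register appears at the output on the fourth clock edge.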
- spec
  The system converts RGB pictures to grayscale pictures.
- function
  1. reads a pixel from the input memory.
  2. computes the new value of the pixel.
  3. writes the new pixel value back to the output memory.
  4. repeats steps (1)-(3) until the last pixel of the output memory is updated.
  5. flags `done` when step (4) is completed.
- control signal

signal | function |
---|---|
en_in_mem | enable input memory |
in_mem_addr | input memory address |
en_out_mem | enable output memory |
out_mem_read | output memory read enable |
out_mem_write | output memory write enable |
out_mem_addr | output memory address |
done | stop the process |
Original Image | Results |
---|---|
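- Waveform: the first figure shows the waveform of the whole run. The second shows the very beginning: after `rst = 1`, `in_mem_addr` and `out_mem_addr` are initialized and count up from 0, while `en_in_mem` and `en_out_mem`/`out_mem_write` are pulled high alternately with `clk`, entering the loop between the read-in (`S_in_mem`) and read-out (`S_out_mem`) states. This continues until `out_addr` reaches `32'd479999`, meaning the whole 480000-pixel image has been processed; then `done = 1` and the FSM stays in the single state `S_done`, which roughly matches the state diagram designed above.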
Timing (slack) | Area (total cell area) | Power (total) |
---|---|---|
5.49 | 3839.52 | 0.1058 mW |
Integrate all the components you have learned so far to form a simple convolution system.
1. reads pixels from the `IFM ROM` into the convolution block, taking the padding problem into account.
2. computes the new value of the pixels.
3. writes the convolution result back to the `CONV RAM`.
4. repeats steps (1)-(3) until the last pixel of the `CONV RAM` is updated.
5. reads pixels from the `CONV RAM` into the pooling block.
6. computes the new value of the pixels.
7. writes the new pixel value back to the `POOL RAM`.
8. repeats steps (5)-(7) until the last pixel of the `POOL RAM` is updated.
9. flags `done` when step (8) is completed.
signal | function |
---|---|
ROM_IF_OE | read data from input feature map ROM |
ROM_W_OE | read data from weight ROM |
RAM_CONV_WE | store the data to CONV RAM |
RAM_CONV_OE | read data from CONV RAM |
RAM_POOL_WE | store the data to POOL RAM |
RAM_POOL_OE | read data from POOL RAM |
done | stop the process |
- Convolve the penguin image with a 3x3 weight map.
- Consider the boundary condition to handle the padding problem.
- Do maximum pooling to the convolution result.
- Synthesize your `system.v` with the following constraints:
Clock period | no more than 20 ns |
---|---|
Synthesized Verilog file | system_syn.v |
Timing constraint file | system_syn.sdf |
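- READ_9
  - General case: `pad_en` is on in cycles 1, 4 and 7
  - Boundary cases
    - `row == 18'd0`: additionally on in cycles 2 and 3
    - `row == 18'd255`: additionally on in cycles 8 and 9
- READ_C
  - General case: `pad_en` stays off
  - Boundary cases
    - `column == 18'd255`: `pad_en` is on in cycles 1, 2 and 3
    - `row == 18'd0`: on in cycle 1
    - `row == 18'd255`: on in cycle 3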
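
The `pad_en` cases listed above all amount to one rule: any window position that falls outside the feature map contributes 0. The Python sketch below applies that rule to a plain 3x3 convolution. It is a behavioural illustration, not the lab's cycle-by-cycle FSM; the function name `conv3x3_zero_pad` is hypothetical, and bias, activation and fixed-point truncation are left out.

```python
def conv3x3_zero_pad(img, weights):
    """3x3 convolution with zero padding; the output has the same size as img."""
    height, width = len(img), len(img[0])
    out = [[0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            acc = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    # pad_en: the window position lies outside the image
                    pad_en = not (0 <= rr < height and 0 <= cc < width)
                    pixel = 0 if pad_en else img[rr][cc]
                    acc += pixel * weights[dr + 1][dc + 1]
            out[r][c] = acc
    return out
```

On a 3x3 image of ones with all-ones weights, the corners sum 4 valid pixels, the edges 6, and the centre 9, which makes the effect of padding easy to check by hand.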
- terminal
- image
Original | Result |
---|---|
cs[2:0]=READ_W
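cs[2:0]=READ_9
Nine data items are read, but because every address must be supplied one cycle early, `count[3:0]` counts from 0 to 9 as in the figure above, so the `READ_9` state takes 10 cycles in total. In cycles 1, 2, 3, 4 and 7 `pad_en` is pulled high; the address does not matter then, because the output is 0. In cycles 5, 6, 8 and 9 the addresses are 0, 1, 256 and 257 respectively, as in the figure above. `ROM_IF_OE` is pulled high to read the original penguin data from the ROM, and `RAM_CONV_WE` is pulled high to write the convolution result into `RAM_CONV`.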
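cs[2:0]=READ_C
This behaves like `cs[2:0]=READ_9` above, except that only 3 data items need to be read. As in the figure above, `count[3:0]` counts from 0 to 3, so the state takes 3+1=4 cycles. Most of the time the flow follows this pattern, alternating between `READ_C` and `WRITE_C`.
When `column == 18'd255`, `padding` is pulled high for all three reads; at this point the relative position is at the bottom-right corner of the input feature map. The FSM then jumps to state `READ_9`, `row = row+1`, and column resets to 0 and counts up from zero again, and the cycle repeats.
When `address == 18'd65535`, the first stage, convolution, is complete and the FSM jumps to the next state, `READ_P`.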
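cs[2:0]=READ_P
Again the address must be supplied one cycle early. When `pool_en` is high, writing into `Pooling.v` is allowed; when `pool_en` is low, my design holds the value in `Pooling.v`. `RAM_CONV_OE` is pulled high to read in the data that the convolution stage stored in `RAM_CONV`, and `RAM_POOL_WE` is pulled high to write the result into `RAM_POOL`.
When `column2 == 18'd254`, `row2 = row2+2` and the column resets to 0 and counts up from zero again, and the cycle repeats.
When `address2 == 18'd16383`, the second stage, `Pooling`, is complete; `DONE` is pulled high and the FSM stays in an infinite loop, which ends the whole two-stage execution flow of the RTL code.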
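
The counters above (65536 convolution results, 16384 pooling results, `row2` advancing by 2) are consistent with 2x2 maximum pooling at stride 2 on a 256x256 feature map. That window size is inferred from the counts, not stated in the text. A Python sketch under that assumption, with the hypothetical name `max_pool_2x2` and even dimensions assumed:

```python
def max_pool_2x2(fm):
    """2x2 max pooling with stride 2 (assumes even height and width)."""
    return [
        [
            max(fm[r][c], fm[r][c + 1], fm[r + 1][c], fm[r + 1][c + 1])
            for c in range(0, len(fm[0]), 2)  # column steps by 2, last start is width - 2
        ]
        for r in range(0, len(fm), 2)  # row steps by 2, like row2 = row2 + 2
    ]
```

For a 256x256 input this produces 128x128 = 16384 values, so the last output address is 16383.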
Coverage: 99% (2 errors in `system.v`)
Every error I could fix has been fixed; the remaining two are in `system.v`.
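Error code | Description |
---|---|
INP_NO_USE | `RAM_POOL_Q` is unconnected, because this wire carries data from `RAM_POOL` to the system and that function is not used in this design |
RXT_XC_LDTH | Presumably caused by the wiring of the `rst` signal |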
Synthesizable clock period | Simulation time | Cell Area | Power |
---|---|---|---|
10 ns (TA default) | 4275325 ns | 84011 | 1.3264 mW |
Design inverter, NAND, and NOR circuits.
Circuit | Waveform verification |
---|---|
inverter | The signal goes from 0 to 1 and from 1 to 0 |
NAND | AND followed by NOT |
NOR | OR followed by NOT |
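The first half of this course was about writing `Verilog` for digital circuit simulation and synthesis; the background it draws on is digital logic design, computer organization, and basic use of a `unix-like` environment. The second half was `layout`, which draws on Electronics I and II. However, because the COVID-19 outbreak was spreading locally, the second half only got as far as lab9, and the course was practically over once we had drawn the `layout` of the inverter, NAND, and NOR. That is a bit of a pity, but the second semester of sophomore year has a heavy workload, so it also gave me breathing room to study Electronics and other subjects.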
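The more important or interesting circuits were
- the `grayscale conversion system` in part 5 of lab5
- the `simple convolution system` of lab6, which is also the final project

They taught me how to turn an algorithm into `RTL code`. The boundary conditions in lab6 were the main difficulty, and on top of that I found that the TA's `testbench` seems to delay the data read from the `ROM` by 1 cycle. These things took me a lot of time to finish, but I also learned a lot and got a little of the feeling of designing something myself.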
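In fact, much of this assignment was done for us by the TAs: the `shell script` and `makefile` for the Linux environment, the `golden data` generated by a high-level language, the `testbench` verification, and the system's `block`s together with the `port` wiring between them. What we students completed was the FSM implementation of the circuits inside the blocks.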
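After this course I feel I should sharpen my coding skills and my command of Linux. I hope to become a designer who truly understands the whole design flow: given a `spec` that someone else has written in plain text, I build everything from scratch on my own, with the sense of independence that brings.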