- Try registering LAs to avoid hold violations.
- Put hmax and vmax into gpout select options, or gpout alts?
Results for TEA:
- Time: 16:30
- GDS size: 26MB (26821532)
- Details from `du -h results/final/gds/*.gds && du -b results/final/gds/*.gds && olsum full-summary top_raybox_zero_fsm --run`:
- WNS: 0
- TNS: 0
- Dimensions: 1024x1023
- CoreArea_um^2: 1,049,066.3232
- Util: 30.98%
- Antennas (`reports/signoff/46-antenna_violators.rpt`):
- 1: 101%
- `reports/signoff/37-sta-rcx_nom/multi_corner_sta.checks.rpt`:
- Fanout: 1586 vios, most ~8/4, but some up to 46/4 (on clkbufs)!
- Fastest max: 17.29
- Fastest slew: 0
- Fastest cap: 0
- Slowest max: -22.57
- Slowest slew: 19 vios, worst 5.46/3.00
- Slowest cap: 0
- Typical max: 8.27
- Typical slew: 8 vios, worst 3.13/3.00
- Typical cap: 0
Results for UPW:
- Time: 3:26
- GDS size: 28MB (28511532)
- Details from `du -h results/final/gds/*.gds && du -b results/final/gds/*.gds && olsum full-summary user_project_wrapper --run`:
- WNS: -10.73
- TNS: -1666.14
- Antennas (`reports/signoff/30-antenna_violators.rpt`):
- 22 pin, 22 net
- Worst is 343%
- 12 exceed 210% -- NOTE: MOST are on IOs
- `reports/signoff/20-sta-rcx_nom/multi_corner_sta.checks.rpt`:
- Fastest max: -2.53
- Fastest slew: 8, worst is 6.16/2.60
- Fastest cap: 4, worst is 0.58/0.23
- Slowest max: -41.42
- Slowest slew: 21, worst is 15.82/3.0
- Slowest cap: 4, worst is 0.58/0.25
- Typical max: -10.58
- Typical slew: 8, worst is 9.29/3.0
- Typical cap: 4, worst is 0.58/0.24
- `reports/signoff/20-sta-rcx_nom/multi_corner_sta.min.rpt` (HOLD):
- Typical min: -2.86 -- HOLD VIO
- NOTE: Hold violations at UPW stage seem to be confined to LAs:
- At Typical, there is exactly 1 for each of the 13 LAs.
- They all range from -2.57 to -2.86.
- Does this imply something? e.g. that the actual registering of
- Can we do a test using very sparse placement of the LA/reg/gpout mux network that proves that this issue will occur even without the rest of the design logic?
- Is it possible to make these existing vios much WORSE, to see if we can better understand what influences them? e.g. make the reg chain much longer?
- What if we try registering LAs on the negedge of clk instead of pos?
- Is the violation where they get registered or is it downstream?
- Is the problem partly to do with wire lengths/placement/capacitance?
- Can the issue be fixed with deliberate delay lines? They'd probably have to be pretty long delays! But what about a simple inverter at each end?
Hold violations list at UPW (last is MET), sorted by LA index (not by slack):
Startpoint: la_data_in[0] Endpoint: top_raybox_zero_fsm/_27285_ -2.57 slack (VIOLATED)
Startpoint: la_data_in[1] Endpoint: top_raybox_zero_fsm/_27261_ -2.86 slack (VIOLATED)
Startpoint: la_data_in[2] Endpoint: top_raybox_zero_fsm/_27263_ -2.84 slack (VIOLATED)
Startpoint: la_data_in[3] Endpoint: top_raybox_zero_fsm/_27265_ -2.79 slack (VIOLATED)
Startpoint: la_data_in[4] Endpoint: top_raybox_zero_fsm/_27267_ -2.84 slack (VIOLATED)
Startpoint: la_data_in[5] Endpoint: top_raybox_zero_fsm/_27269_ -2.80 slack (VIOLATED)
Startpoint: la_data_in[6] Endpoint: top_raybox_zero_fsm/_27271_ -2.78 slack (VIOLATED)
Startpoint: la_data_in[7] Endpoint: top_raybox_zero_fsm/_27273_ -2.64 slack (VIOLATED)
Startpoint: la_data_in[8] Endpoint: top_raybox_zero_fsm/_27275_ -2.79 slack (VIOLATED)
Startpoint: la_data_in[9] Endpoint: top_raybox_zero_fsm/_27277_ -2.73 slack (VIOLATED)
Startpoint: la_data_in[10] Endpoint: top_raybox_zero_fsm/_27279_ -2.83 slack (VIOLATED)
Startpoint: la_data_in[11] Endpoint: top_raybox_zero_fsm/_27281_ -2.79 slack (VIOLATED)
Startpoint: la_data_in[12] Endpoint: top_raybox_zero_fsm/_27283_ -2.81 slack (VIOLATED)
Startpoint: wb_clk_i Endpoint: top_raybox_zero_fsm/_27257_ 0.13 slack (MET)
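As a quick cross-check of the list above (one violation per LA, slacks between -2.57 and -2.86), here is a minimal Python sketch that parses path summaries in that one-line shape. The regex and the names `HOLD_LIST` and `parse_paths` are illustrative only; they are not part of `olsum` or OpenLane.

```python
import re

# One-line path summaries, copied from the list above.
HOLD_LIST = """\
Startpoint: la_data_in[0] Endpoint: top_raybox_zero_fsm/_27285_ -2.57 slack (VIOLATED)
Startpoint: la_data_in[1] Endpoint: top_raybox_zero_fsm/_27261_ -2.86 slack (VIOLATED)
Startpoint: la_data_in[2] Endpoint: top_raybox_zero_fsm/_27263_ -2.84 slack (VIOLATED)
Startpoint: la_data_in[3] Endpoint: top_raybox_zero_fsm/_27265_ -2.79 slack (VIOLATED)
Startpoint: la_data_in[4] Endpoint: top_raybox_zero_fsm/_27267_ -2.84 slack (VIOLATED)
Startpoint: la_data_in[5] Endpoint: top_raybox_zero_fsm/_27269_ -2.80 slack (VIOLATED)
Startpoint: la_data_in[6] Endpoint: top_raybox_zero_fsm/_27271_ -2.78 slack (VIOLATED)
Startpoint: la_data_in[7] Endpoint: top_raybox_zero_fsm/_27273_ -2.64 slack (VIOLATED)
Startpoint: la_data_in[8] Endpoint: top_raybox_zero_fsm/_27275_ -2.79 slack (VIOLATED)
Startpoint: la_data_in[9] Endpoint: top_raybox_zero_fsm/_27277_ -2.73 slack (VIOLATED)
Startpoint: la_data_in[10] Endpoint: top_raybox_zero_fsm/_27279_ -2.83 slack (VIOLATED)
Startpoint: la_data_in[11] Endpoint: top_raybox_zero_fsm/_27281_ -2.79 slack (VIOLATED)
Startpoint: la_data_in[12] Endpoint: top_raybox_zero_fsm/_27283_ -2.81 slack (VIOLATED)
Startpoint: wb_clk_i Endpoint: top_raybox_zero_fsm/_27257_ 0.13 slack (MET)
"""

# Assumed line shape: Startpoint, Endpoint, slack value, then the status in brackets.
LINE_RE = re.compile(
    r"Startpoint:\s+(?P<start>\S+)\s+Endpoint:\s+(?P<end>\S+)\s+"
    r"(?P<slack>-?\d+\.\d+)\s+slack\s+\((?P<status>VIOLATED|MET)\)"
)

def parse_paths(text):
    """Return (startpoint, endpoint, slack, status) tuples for each summary line."""
    return [
        (m["start"], m["end"], float(m["slack"]), m["status"])
        for m in LINE_RE.finditer(text)
    ]

paths = parse_paths(HOLD_LIST)
vios = [p for p in paths if p[3] == "VIOLATED"]
la_vios = [p for p in vios if p[0].startswith("la_data_in[")]
slacks = [p[2] for p in la_vios]

print(f"{len(vios)} vios, {len(la_vios)} of them on LAs")
print(f"slack range: {max(slacks):.2f} to {min(slacks):.2f}")
# Sorted by slack (worst first) instead of by LA index:
for start, end, slack, _ in sorted(la_vios, key=lambda p: p[2]):
    print(f"{slack:6.2f}  {start} -> {end}")
```

Sorting by slack rather than LA index makes it easier to see whether the worst paths cluster on particular LA pins.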
- Quick test to make reg chain for LAs much deeper: Go from 2 to 32. What effect will it have on hold violations?
Results for TEA:
- Time: 17:23
- GDS size: 27MB (28126050)
- Details from `du -h results/final/gds/*.gds && du -b results/final/gds/*.gds && olsum full-summary top_raybox_zero_fsm --run`:
- WNS: 0
- TNS: 0
- Dimensions: 1064x1062
- CoreArea_um^2: 1,130,308.48
- Util: 30.96%
- Antennas (`reports/signoff/46-antenna_violators.rpt`):
- None
- `reports/signoff/37-sta-rcx_nom/multi_corner_sta.checks.rpt`:
- Fanout: 1717 vios, most ~8/4, but some up to 32/4 (on clkbufs)!
- Fastest max: 17.09
- Fastest slew: 0
- Fastest cap: 0
- Slowest max: -21.86
- Slowest slew: 8 vios, worst 5.47/3.00
- Slowest cap: 0
- Typical max: 8.49
- Typical slew: 8 vios, worst 3.12XX/3.00
- Typical cap: 0
Results for UPW:
- Time: 3:26
- GDS size: 28MB (28511532)
- Details from `du -h results/final/gds/*.gds && du -b results/final/gds/*.gds && olsum full-summary user_project_wrapper --run`:
- WNS: -10.73 -- UNCHANGED from Test 7.
- TNS: -1666.14 -- UNCHANGED from Test 7.
- Antennas (`reports/signoff/30-antenna_violators.rpt`):
- 22
- Worst is 343%
- 12 exceed 210XX% -- Basically same as above.
- `reports/signoff/20-sta-rcx_nom/multi_corner_sta.checks.rpt`:
- Fastest max: -2.53
- Fastest slew: 8, worst is 6.16/2.60
- Fastest cap: 4, worst is 0.58/0.23
- Slowest max: -41.42
- Slowest slew: 21, worst is 15.82/3.0
- Slowest cap: 4, worst is 0.58/0.25
- Typical max: -10.58
- Typical slew: 8, worst is 9.29/3.0
- Typical cap: 4, worst is 0.58/0.24
- All same as Test 7
- `reports/signoff/20-sta-rcx_nom/multi_corner_sta.min.rpt` (HOLD):
- Typical min: -2.86XX -- HOLD VIO
- NOTE: Hold violations at UPW stage seem to be confined to LAs:
- At Typical, there is exactly 1 for each of the 13 LAs.
- They all range from -2.57XX to -2.86XX.
UPW hold violations are unchanged from Test 7:
Startpoint: la_data_in[0] Endpoint: top_raybox_zero_fsm/_27285_ -2.57 slack (VIOLATED)
Startpoint: la_data_in[1] Endpoint: top_raybox_zero_fsm/_27261_ -2.86 slack (VIOLATED)
Startpoint: la_data_in[2] Endpoint: top_raybox_zero_fsm/_27263_ -2.84 slack (VIOLATED)
Startpoint: la_data_in[3] Endpoint: top_raybox_zero_fsm/_27265_ -2.79 slack (VIOLATED)
Startpoint: la_data_in[4] Endpoint: top_raybox_zero_fsm/_27267_ -2.84 slack (VIOLATED)
Startpoint: la_data_in[5] Endpoint: top_raybox_zero_fsm/_27269_ -2.80 slack (VIOLATED)
Startpoint: la_data_in[6] Endpoint: top_raybox_zero_fsm/_27271_ -2.78 slack (VIOLATED)
Startpoint: la_data_in[7] Endpoint: top_raybox_zero_fsm/_27273_ -2.64 slack (VIOLATED)
Startpoint: la_data_in[8] Endpoint: top_raybox_zero_fsm/_27275_ -2.79 slack (VIOLATED)
Startpoint: la_data_in[9] Endpoint: top_raybox_zero_fsm/_27277_ -2.73 slack (VIOLATED)
Startpoint: la_data_in[10] Endpoint: top_raybox_zero_fsm/_27279_ -2.83 slack (VIOLATED)
Startpoint: la_data_in[11] Endpoint: top_raybox_zero_fsm/_27281_ -2.79 slack (VIOLATED)
Startpoint: la_data_in[12] Endpoint: top_raybox_zero_fsm/_27283_ -2.81 slack (VIOLATED)
Startpoint: wb_clk_i Endpoint: top_raybox_zero_fsm/_27257_ 0.13 slack (MET)
- See what happens if we punch TEA's hold slack margins both up to 1.5
- Kept reg chain at 32.
Results for TEA:
- Time: 53:52
- GDS size: 33MB (34179512)
- Details from `du -h results/final/gds/*.gds && du -b results/final/gds/*.gds && olsum full-summary top_raybox_zero_fsm --run`:
- WNS: 0
- TNS: 0
- Dimensions: 1064x1062
- CoreArea_um^2: 1,130,308.48 unchanged
- Util: 30.96%
- Antennas (`reports/signoff/46-antenna_violators.rpt`):
- 4; 128% is the worst.
- `reports/signoff/37-sta-rcx_nom/multi_corner_sta.checks.rpt`:
- Fanout: 1809 vios, most ~8/4, but some up to 32/4 (on clkbufs)!
- Fastest max: 14.86
- Fastest slew: 98 vios, worst 6.22/2.6
- Fastest cap: 26 vios, worst 0.34/0.13
- Slowest max: -22.45
- Slowest slew: 513 vios, worst 18.00/3.00
- Slowest cap: 23 vios, worst 0.34/0.12
- Typical max: 8.11
- Typical slew: 208 vios, worst 9.97/3.00
- Typical cap: 26 vios, worst 0.34/0.12
Results for UPW:
- Time: 4:00
- GDS size: 35MB (35847188)
- Details from `du -h results/final/gds/*.gds && du -b results/final/gds/*.gds && olsum full-summary user_project_wrapper --run`:
- WNS: -10.89
- TNS: -2684.68
- Antennas (`reports/signoff/30-antenna_violators.rpt`):
- 16
- Worst is 337%
- 7 exceed 210%
- `reports/signoff/20-sta-rcx_nom/multi_corner_sta.checks.rpt`:
- Fastest max: -4.83
- Fastest slew: 106, worst is 6.22/2.60
- Fastest cap: 30, worst is 0.47/0.23
- Slowest max: -41.30
- Slowest slew: 514, worst is 18.00/3.0
- Slowest cap: 27, worst is 0.47/0.25
- Typical max: -10.74
- Typical slew: 208, worst is 9.97/3.0
- Typical cap: 30, worst is 0.47/0.24
- `reports/signoff/20-sta-rcx_nom/multi_corner_sta.min.rpt` (HOLD):
- Typical min: -3.82 -- HOLD VIO
- NOTE: Hold violations at UPW stage seem to be confined to LAs:
- At Typical, there is exactly 1 for each of the 13 LAs.
- They all range from -3.33 to -3.82.
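Interesting: We went from 0.2/0.1 to 1.5/1.5 hold slack margins. It sort of looks like that made the LA hold slacks worse by about 0.7 on average (mean went from roughly -2.77 to -3.49), while the first MET path got better (0.13 to 0.48)...?
It also took way longer to run (TEA: 53:52 vs. 17:23)!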
Startpoint: la_data_in[0] Endpoint: top_raybox_zero_fsm/_27687_ -3.40 slack (VIOLATED)
Startpoint: la_data_in[1] Endpoint: top_raybox_zero_fsm/_27303_ -3.39 slack (VIOLATED)
Startpoint: la_data_in[2] Endpoint: top_raybox_zero_fsm/_27335_ -3.44 slack (VIOLATED)
Startpoint: la_data_in[3] Endpoint: top_raybox_zero_fsm/_27367_ -3.51 slack (VIOLATED)
Startpoint: la_data_in[4] Endpoint: top_raybox_zero_fsm/_27399_ -3.82 slack (VIOLATED)
Startpoint: la_data_in[5] Endpoint: top_raybox_zero_fsm/_27431_ -3.47 slack (VIOLATED)
Startpoint: la_data_in[6] Endpoint: top_raybox_zero_fsm/_27463_ -3.33 slack (VIOLATED)
Startpoint: la_data_in[7] Endpoint: top_raybox_zero_fsm/_27495_ -3.49 slack (VIOLATED)
Startpoint: la_data_in[8] Endpoint: top_raybox_zero_fsm/_27527_ -3.67 slack (VIOLATED)
Startpoint: la_data_in[9] Endpoint: top_raybox_zero_fsm/_27559_ -3.51 slack (VIOLATED)
Startpoint: la_data_in[10] Endpoint: top_raybox_zero_fsm/_27591_ -3.36 slack (VIOLATED)
Startpoint: la_data_in[11] Endpoint: top_raybox_zero_fsm/_27623_ -3.37 slack (VIOLATED)
Startpoint: la_data_in[12] Endpoint: top_raybox_zero_fsm/_27655_ -3.64 slack (VIOLATED)
Startpoint: wb_clk_i Endpoint: top_raybox_zero_fsm/_27300_ 0.48 slack (MET)
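To put a number on the "about 0.7 worse" impression, here is a small Python sketch that compares the LA hold slacks per LA index between the earlier runs (0.2/0.1 margins) and this one (1.5/1.5 margins). The values are copied from the two lists in these notes; the variable names are just for illustration. The endpoint names differ between the runs, so the comparison is by startpoint (LA index) only.

```python
# LA hold slacks indexed by LA number (0..12), copied from the lists in these notes.
before = [-2.57, -2.86, -2.84, -2.79, -2.84, -2.80, -2.78,
          -2.64, -2.79, -2.73, -2.83, -2.79, -2.81]  # 0.2/0.1 hold slack margins
after = [-3.40, -3.39, -3.44, -3.51, -3.82, -3.47, -3.33,
         -3.49, -3.67, -3.51, -3.36, -3.37, -3.64]   # 1.5/1.5 hold slack margins

# Per-LA change in slack; negative means the violation got worse.
deltas = [round(a - b, 2) for a, b in zip(after, before)]
mean_delta = sum(deltas) / len(deltas)
worst_la = min(range(len(deltas)), key=deltas.__getitem__)

for la, d in enumerate(deltas):
    print(f"la_data_in[{la}]: {before[la]:.2f} -> {after[la]:.2f} ({d:+.2f})")
print(f"mean shift: {mean_delta:+.2f}")
print(f"largest shift: la_data_in[{worst_la}] ({deltas[worst_la]:+.2f})")
```

Every LA moves in the same direction, which suggests the larger margins shifted the whole LA input network rather than a few unlucky paths.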
This does seem to have gone crazy with the routing on the left:
Compare test-8:
- Remove hold slack margin tweaks (inc. PL/GLB buffers).
- Bring reg chain depth from 32 back down to 2, and clock it on negedge.
FAILED! Tiny hold violations in TEA, one in particular: `i_clk` startpoint, viol is -0.03. Maybe we could correct this with hold slack margin tweaks again?
- Put hold slack margin tweaks/buffers back in, but back to 0.2/0.1.
FAILED! No change to hold violations in TEA.
- Use these config adjustments:
"PL_RESIZER_HOLD_SLACK_MARGIN": 0.4, "GLB_RESIZER_HOLD_SLACK_MARGIN": 0.4, "PL_RESIZER_HOLD_MAX_BUFFER_PERCENT": 70, "GLB_RESIZER_HOLD_MAX_BUFFER_PERCENT": 70,
Succeeded in TEA, failed in UPW, but now it's just setup vios. Could be fixed by dropping UPW clock?
- Remove hold slack margins again.
- Remove reg on LA (conditional precompilation).
- Fix up SDC stuff:
- Name ports according to what our design actually has.
- Use `set_multicycle_path` on LA-driven ports.
- Can optionally use `"SYNTH_AUTONAME": true` to help trace vios.
Results: TBC
TEA failed with setup vios, UPW did not run. Cause seems to relate to using the reset condition just to halt regs in the `tex_sclk` block, and we can probably live without that (or give it a multicycle constraint).
- Remove reset check condition from `posedge tex_sclk` block.
- Remember to fix up Verilator!
Results: TBC
TEA: Worst setup slack is 7.80, so we could probably run our clock faster than 44ns (44 - 7.80 = 36.2ns would be the rough limit if nothing else changed).
UPW: 617 hold vios, again all on LAs. Ranging up to -2.02. 1x LA[3], 1x LA[8], all the others are on LA[0] (reset).
- Speed up clock: CLOCK_PERIOD of 40ns
- See what effect it has on slew, fanout, caps, hold vios in UPW.
- Increase FP_CORE_UTIL and PL_TARGET_DENSITY
- See what effect it has on slew, fanout, caps, hold vios in UPW.
- Clock from user_clock2 instead of wb_clk_i
- SDC: Relax constraints on LA-based pins. Do we want `-hold 2` or `-hold 1`? Put this in macro SDC, or UPW SDC? If we do the latter, then it could affect other designs...?
- Should we do set_input_delay on our i_reset_alt?
- Try the following to exclude timing analysis for a specific port (MAY BE INVALID; instead try: https://sureshofficial.blogspot.com/2018/06/setfalsepath-vs-setdisabletiming.html):
`set_disable_port_timing -port_name <port_name> -disable`
Ideas:
- set_false_path
- user_clock2 for the design
- user_clock2 just for LA regs
- Halve the clocking of the LA regs?
- SDC relax LAs -- besides 'reset', they don't actually change... they just get set to a constant value.
- SDC, look into: "Multicycle Paths: Specifies exceptions where certain paths are allowed multiple cycles for setup and hold times."
- Use edge detector on LAs??
- Use WB instead of LA
NOTE: We don't strictly need a reliable signal on gpout_sels, just on LA-driven reset. Can we tell SDC not to care?
NOTE: Maybe registering reset_alt on negedge is bad because all the rest of the design relies on it being sync'd with clk?
NOTE: If we HAD to, we could just get rid of all the LA interface.
SYNTH_AUTONAME -- Add a synthesis step to generate names for instances. This results in instance names that can be very long, but may be more useful than the internal names that are six digit numbers.
Enabled = 1, Disabled = 0
(Default: 0)
SYNTH_TIMING_DERATE?
SYNTH_SIZING
SYNTH_SHARE_RESOURCES
SYNTH_ABC_LEGACY_REFACTOR
SYNTH_ABC_LEGACY_REWRITE
IO_PCT
FP_IO_MIN_DISTANCE: Try 5?
PL_SKIP_INITIAL_PLACEMENT
PL_RESIZER_MAX_WIRE_LENGTH
PL_RESIZER_MAX_SLEW_MARGIN (def 20)
PL_RESIZER_MAX_CAP_MARGIN (def 20)
PL_RESIZER_HOLD_SLACK_MARGIN (def 0.1)
PL_RESIZER_SETUP_SLACK_MARGIN (def 0.05)
PL_RESIZER_HOLD_MAX_BUFFER_PERCENT
PL_RESIZER_SETUP_MAX_BUFFER_PERCENT
PL_RESIZER_ALLOW_SETUP_VIOS
PL_WIRELENGTH_COEF (def 0.25) -- reducing will try to make shorter wires by initial cell place?
CTS_SINK_CLUSTERING_SIZE (def 25)
CTS_SINK_CLUSTERING_MAX_DIAMETER (def 50)
CTS_CLK_MAX_WIRE_LENGTH (def 0)
GLB_RESIZER_MAX_WIRE_LENGTH (def 0)
GLB_RESIZER_MAX_SLEW_MARGIN (def 10)
GLB_RESIZER_MAX_CAP_MARGIN (def 10)
GLB_RESIZER_ALLOW_SETUP_VIOS
GRT_ANT_MARGIN (def 10)
- Change pin ordering
- Change clock speed: Decrease TEA period from 44 to
- Try using user_clock2 (inc. SDC `usr_*` stuff) -- maybe user_clock2 can be used specifically for clocking LA reg, and keep wb_clk_i for main design?
- Fix (or at least decide on) user_defines
- FSM wrapper for SHMUL
- Try 40ns CLOCK_PERIOD for TEA
- Test on FPGA too
- Run a matrix of tests. Assuming TEA takes 15:30 and UPW takes 3:30, it's 19 mins for each permutation. That's:
- 19 runs in 6:01 hours.
- 25 runs in 7:55 hours. Example: 3x FP_CORE_UTIL, 2x hold margins, 2x setup margins, 2x PL_TARGET_DENSITY = 24
- 32 runs in 10:08 hours. Example: 4x FP_CORE_UTIL, 4x PL_TARGET_DENSITY, 2x allow setup vios?
- I think there's a parameter to allow larger global placement for better timing...?
- If necessary, just get rid of the LA stuff
- Try reducing density
- Are there any SDC constraints we can put in to help things?
- Move macro placement
- Write a runner/aggregator for TEA/UPW run results. Beware disk space.
- Check this Verilator lint:
%Warning-UNUSEDSIGNAL: /home/zerotoasic/anton/algofoogle-multi-caravel/openlane/top_raybox_zero_fsm/../../verilog/rtl/raybox-zero-fsm/src/rtl/row_render.v:11:21: Bits of signal are not used: 'texu'[5] : ... In instance top_raybox_zero_fsm.rbzero.row_render 11 | input wire [5:0] texu, | ^~~~
- Get rid of unused states/params (inc. RCP_RDX) in wall_tracer
- Find out about SoC user interrupts, masking, and default behaviour
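The run-matrix estimate in the list above (19 minutes per TEA+UPW permutation) can be sketched in Python as follows. The sweep values below are hypothetical placeholders: the notes only give the counts (3x, 2x, 2x, 2x), not the actual settings to try, and mapping "hold margins" and "setup margins" to the `PL_RESIZER_*` variables is also an assumption.

```python
from itertools import product

# Hypothetical sweep values; only the number of options per variable comes from the notes.
matrix = {
    "FP_CORE_UTIL": [30, 35, 40],
    "PL_RESIZER_HOLD_SLACK_MARGIN": [0.1, 0.2],
    "PL_RESIZER_SETUP_SLACK_MARGIN": [0.05, 0.1],
    "PL_TARGET_DENSITY": [0.35, 0.45],
}
MINUTES_PER_RUN = 19  # roughly 15:30 for TEA plus 3:30 for UPW

def permutations(sweep):
    """Expand a {variable: [values]} dict into one config dict per combination."""
    keys = list(sweep)
    return [dict(zip(keys, values)) for values in product(*sweep.values())]

def hhmm(minutes):
    """Format a number of minutes as H:MM."""
    return f"{minutes // 60}:{minutes % 60:02d}"

runs = permutations(matrix)
print(f"{len(runs)} runs in {hhmm(len(runs) * MINUTES_PER_RUN)} hours")
for n in (19, 25, 32):
    print(f"{n} runs in {hhmm(n * MINUTES_PER_RUN)} hours")
```

Each dict in `runs` could be merged over a base config by a runner script; how the runner invokes the flow and where it stores results is left open here.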
- List each Typical Corner hold vio in UPW:
NOTE: Includes all Typical Corner hold paths, not just vios.
`RPT=reports/signoff/20-sta-rcx_nom/multi_corner_sta.min.rpt; fgrep 'Typical Corner' -A 999999 $RPT | egrep 'Startpoint|Endpoint|slack \(' | less`
- Count hold vios (replace `wc -l` with `sort` to just get a list of the violated Startpoints): `fgrep 'Typical Corner' -A 999999 $RPT | egrep 'Startpoint|Endpoint|slack \(' | fgrep 'VIOLATED' -B2 | fgrep Startpoint | wc -l`
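For the runner/aggregator idea, the same count could be done in Python. This is a minimal sketch that assumes the report has `Startpoint:` lines followed later by a `slack (VIOLATED)` or `slack (MET)` line for each path, which is what the grep patterns above rely on. `SAMPLE` is a stripped-down stand-in for the real report (values taken from these notes), and `violated_startpoints` is a made-up helper name.

```python
from collections import Counter

def violated_startpoints(report_text, corner="Typical Corner"):
    """Count violated paths per Startpoint, from the first `corner` marker onwards."""
    idx = report_text.find(corner)
    if idx < 0:
        return Counter()
    counts = Counter()
    start = None
    for line in report_text[idx:].splitlines():
        line = line.strip()
        if line.startswith("Startpoint:"):
            start = line.split()[1]  # first token after "Startpoint:"
        elif "slack (VIOLATED)" in line and start is not None:
            counts[start] += 1
            start = None
        elif "slack (MET)" in line:
            start = None
    return counts

# Stand-in report text; the real file has many more lines per path.
SAMPLE = """\
Typical Corner
Startpoint: la_data_in[1]
Endpoint: top_raybox_zero_fsm/_27261_
  -2.86   slack (VIOLATED)
Startpoint: la_data_in[0]
Endpoint: top_raybox_zero_fsm/_27285_
  -2.57   slack (VIOLATED)
Startpoint: wb_clk_i
Endpoint: top_raybox_zero_fsm/_27257_
   0.13   slack (MET)
"""

counts = violated_startpoints(SAMPLE)
print(sum(counts.values()), "hold vios")
for start, n in sorted(counts.items()):
    print(n, start)
```

Like the shell version, this reads everything after the first `Typical Corner` line, so any corners that follow it in the file would be counted too.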