Can we optimise yesterday's work on test07b, or does Xilinx ISE already do a great job of optimising for us?
Fitter tells us:
Macrocells      Product Terms     Function Block    Registers       Pins
Used/Tot        Used/Tot          Inps Used/Tot     Used/Tot        Used/Tot
26 /72 ( 36%)   99 /360 ( 27%)    42 /216 ( 19%)    26 /72 ( 36%)   8 /52 ( 15%)
(Full fitter report for test07b...)
cpldfit: version P.20131013 Xilinx Inc.
Fitter Report
Design Name: test07b Date: 6-21-2020, 1:13PM
Device Used: XC9572XL-7-VQ64
Fitting Status: Successful
************************* Mapped Resource Summary **************************
Macrocells      Product Terms     Function Block    Registers       Pins
Used/Tot        Used/Tot          Inps Used/Tot     Used/Tot        Used/Tot
26 /72 ( 36%)   99 /360 ( 27%)    42 /216 ( 19%)    26 /72 ( 36%)   8 /52 ( 15%)
** Function Block Resources **
Function  Mcells    FB Inps   Pterms    IO
Block     Used/Tot  Used/Tot  Used/Tot  Used/Tot
FB1       16/18     21/54     51/90     1/13
FB2       1/18      0/54      0/90      0/13
FB3       0/18      0/54      0/90      1/14
FB4       9/18      21/54     48/90     6/12
          -----     -----     -----     -----
          26/72     42/216    99/360    8/52
* - Resource is exhausted
** Global Control Resources **
Signal 'CLK50' mapped onto global clock net GCK3.
Global output enable net(s) unused.
Global set/reset net(s) unused.
** Pin Resources **
Signal Type      Required  Mapped  |  Pin Type   Used  Total
Input         :  2         2       |  I/O     :  7     46
Output        :  5         5       |  GCK/IO  :  1     3
Bidirectional :  0         0       |  GTS/IO  :  0     2
GCK           :  1         1       |  GSR/IO  :  0     1
GTS           :  0         0       |
GSR           :  0         0       |
                 ----      ----
Total            8         8
Since the HSYNC timing is based on multiples of 16, let's see if that will help...
I tried a bunch of different ways to write "divide-by-N" DFF-style logic instead of the naïve comparisons with fixed numbers, and they all seemed to use more resources.
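For illustration only, here is a minimal sketch of what one "divide-by-N" arrangement could look like. It is not the code that was actually tried; the module name `hcount_split` and the signal `sub` are made up for this example, and the zero start-up values are an assumption. Because 1600 = 50 × 32, a divide-by-32 stage feeding a divide-by-50 stage steps through the same sequence as a single 0..1599 counter:

```verilog
// Hypothetical example: divide-by-32 stage cascaded into a divide-by-50 stage.
module hcount_split (
  input  wire       CLK50,
  output reg  [4:0] sub,    // half-pixel position within the chunk, 0..31
  output reg  [5:0] chunk   // 16-pixel chunk index, 0..49
);
  // ASSUMPTION: both stages start at zero (needed for simulation).
  initial begin
    sub   = 5'd0;
    chunk = 6'd0;
  end

  always @(posedge CLK50) begin
    sub <= sub + 5'd1;        // 5-bit counter wraps 31 -> 0 by itself
    if (sub == 5'd31)         // last half-pixel of a chunk: advance stage two
      chunk <= (chunk == 6'd49) ? 6'd0 : chunk + 6'd1;
  end
endmodule
```

Here `{chunk, sub}` carries the same value as an 11-bit 0..1599 counter would, so the only comparison left is a 6-bit one against 49. Whether the fitter actually does better with this form is a separate question.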
I ended up doing the main pixel (per line) counter like this:
// Count "half-pixels" (because of 50MHz clock instead of 25MHz) per line:
reg [10:0] hcount; // 0..1599
always @(posedge CLK50) hcount <= hcount==1599 ? 0 : hcount+1;
I can then divide hcount by 32 to get an index to a 16-pixel chunk of the line (32 half-pixels make 16 pixels):
// Get 16-pixel "chunk" index from hcount:
wire [5:0] chunk;
assign chunk = hcount[10:5];
Note that the whole line is 800 "pixels" and hence 50 "chunks".
Then I tried a few different ways to generate HSYNC:
1. This method is relatively efficient on resources, but it does not clock HSYNC through a register, so the output might be unstable:
   // Generate HSYNC from chunk:
   assign hsync = ~(chunk>=41 && chunk<=46);
2. This uses a register to help make it more stable:
   reg hsync_latch;
   always @(posedge CLK50) hsync_latch <= ~(chunk>=41 && chunk<=46);
   assign hsync = hsync_latch;
   ...the whole design uses 12 macrocells (MC), 12 registers (Reg), 11 function block inputs (FBI), and 19 product terms (PT).
3. This slightly different approach only changes the register as the system enters chunk 41 (where it asserts HSYNC by pulling it low), or as it enters chunk 47 (where it releases HSYNC by pulling it high):
   reg hsync_latch;
   always @(posedge CLK50) begin
     if (chunk==41) hsync_latch <= 0; // Assert HSYNC.
     if (chunk==47) hsync_latch <= 1; // Release HSYNC.
   end
   assign hsync = hsync_latch;
   ...and the whole design uses 12 MC, 12 Reg, 22 FBI, and 18 PT.
4. Interestingly, this variation on no. 3 above seems to be the best:
   reg hsync_latch;
   always @(posedge CLK50) begin
     if (chunk[5] & (&chunk[3:0])) hsync_latch <= 1; // Release HSYNC.
     else if (chunk[5] & chunk[3] & chunk[0]) hsync_latch <= 0; // Assert HSYNC.
   end
   assign hsync = hsync_latch;
   ...with the whole design using 12 MC, 12 Reg, 11 FBI, and 18 PT.
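To sanity-check that the three registered HSYNC variants above really produce the same waveform, a small self-checking testbench along these lines could be run in any Verilog simulator. This is a simulation-only sketch: the module name and the register names `hsync_cmp`, `hsync_edge` and `hsync_bits` are made up so that all three variants can sit side by side, and starting `hcount` at 0 is an assumption.

```verilog
`timescale 1ns/1ps
// Hypothetical testbench: compares the three registered HSYNC variants.
module hsync_variants_tb;
  reg CLK50 = 0;
  always #10 CLK50 = ~CLK50;            // 20 ns period = 50 MHz

  // Half-pixel counter and chunk index, as in the text.
  reg [10:0] hcount = 0;                // ASSUMPTION: starts at 0
  always @(posedge CLK50)
    hcount <= (hcount == 1599) ? 11'd0 : hcount + 11'd1;
  wire [5:0] chunk = hcount[10:5];

  // Registered range comparison.
  reg hsync_cmp;
  always @(posedge CLK50) hsync_cmp <= ~(chunk >= 41 && chunk <= 46);

  // Set/clear on entering chunk 41 / chunk 47.
  reg hsync_edge;
  always @(posedge CLK50) begin
    if (chunk == 41) hsync_edge <= 0;
    if (chunk == 47) hsync_edge <= 1;
  end

  // Set/clear using partial bit-pattern decodes.
  reg hsync_bits;
  always @(posedge CLK50) begin
    if (chunk[5] & (&chunk[3:0]))            hsync_bits <= 1;
    else if (chunk[5] & chunk[3] & chunk[0]) hsync_bits <= 0;
  end

  integer cycles = 0;
  integer mismatches = 0;
  // Sample on the falling edge, away from the register updates.
  always @(negedge CLK50) begin
    cycles = cycles + 1;
    // Skip the first line: the set/clear registers are unknown until
    // they have seen chunk 41 or chunk 47 once.
    if (cycles > 1600 &&
        (hsync_edge !== hsync_cmp || hsync_bits !== hsync_cmp))
      mismatches = mismatches + 1;
    if (cycles == 3 * 1600) begin
      if (mismatches == 0) $display("PASS: all three variants agree");
      else                 $display("FAIL: %0d mismatching cycles", mismatches);
      $finish;
    end
  end
endmodule
```

The partial decode in the last variant also matches chunks 43 and 45 (and would match 63 for release), but those hits only re-assert a value the register already holds, and 63 is never reached with 50 chunks per line.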
I tried using a similar approach for VSYNC:
reg vsync_latch;
always @(posedge CLK50)
if ((&vcount[8:5]) & vcount[3] & vcount[2]) vsync_latch <= 1; // Release VSYNC if we hit line 492.
else if ((&vcount[8:5]) & vcount[3] & vcount[1]) vsync_latch <= 0; // Assert VSYNC if we hit line 490.
assign vsync = vsync_latch;
...but it used an extra Product Term and a lot more Function Block Inputs than just doing it like this:
// Drive VSYNC based on line counter:
reg vsync_latch;
always @(posedge CLK50) vsync_latch <= ~(vcount>=480+10 && vcount<480+10+2);
assign vsync = vsync_latch;
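One thing worth checking about the bit-pattern VSYNC version further up is that neither of its terms looks at vcount[4]. The sketch below is simulation-only and assumes vcount is a 10-bit counter running 0..524 (the usual 525-line frame for 640×480; its declaration isn't shown in this section). The module and wire names are made up. It simply lists the lines on which each branch would fire:

```verilog
// Hypothetical check of the bit-pattern VSYNC decode over one frame.
module vsync_decode_check;
  reg [9:0] vcount;   // ASSUMPTION: 10-bit line counter, 0..524

  // Same terms as the bit-pattern version; release has priority.
  wire release_hit = (&vcount[8:5]) & vcount[3] & vcount[2];
  wire assert_hit  = (&vcount[8:5]) & vcount[3] & vcount[1] & ~release_hit;

  integer line;
  initial begin
    for (line = 0; line < 525; line = line + 1) begin
      vcount = line[9:0];
      #1;             // let the combinational wires settle
      if (release_hit) $display("line %0d: release VSYNC", line);
      if (assert_hit)  $display("line %0d: assert VSYNC", line);
    end
    $finish;
  end
endmodule
```

Under that assumption the assert branch fires on lines 490, 491, 506 and 507, and the release branch on 492..495 and 508..511. So besides the intended pulse on lines 490 and 491, the bit-pattern version would pull VSYNC low a second time on lines 506 and 507, which the comparison version does not do. Qualifying both terms with ~vcount[4] should remove the second pulse, though probably at the cost of yet more function block inputs.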
Notes on the chunk numbers:
- Chunk 41: 41×16 = 640+16, which is the combined active video area and front porch.
- Chunk 47: 47×16 = 640+16+96, which runs us up to the start of the back porch.
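The same arithmetic can be written out as parameters rather than magic numbers. This is just a sketch with made-up names; the 48-pixel back porch is not stated above but follows from the 800-pixel line total (800 − 640 − 16 − 96).

```verilog
// Hypothetical module: derives the chunk numbers from the line timing.
module hsync_chunk_params;
  // Horizontal timing in pixels.
  localparam H_ACTIVE = 640;
  localparam H_FRONT  = 16;
  localparam H_SYNC   = 96;
  localparam H_BACK   = 48;   // inferred from the 800-pixel line total
  localparam CHUNK    = 16;   // pixels per chunk

  localparam SYNC_START_CHUNK = (H_ACTIVE + H_FRONT) / CHUNK;
  localparam SYNC_END_CHUNK   = (H_ACTIVE + H_FRONT + H_SYNC) / CHUNK;
  localparam CHUNKS_PER_LINE  = (H_ACTIVE + H_FRONT + H_SYNC + H_BACK) / CHUNK;

  initial begin
    $display("assert at chunk %0d, release at chunk %0d, %0d chunks per line",
             SYNC_START_CHUNK, SYNC_END_CHUNK, CHUNKS_PER_LINE);
    $finish;
  end
endmodule
```

Run in a simulator, this prints chunk 41 for assert, chunk 47 for release, and 50 chunks per line, matching the numbers used in the HSYNC logic.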
I'm putting test07c on ice for now.