Can we optimise yesterday's work on test07b, or does Xilinx ISE already do a great job of optimising for us?
Fitter tells us:
Macrocells     Product Terms    Function Block   Registers      Pins
Used/Tot       Used/Tot         Inps Used/Tot    Used/Tot       Used/Tot
26 /72  ( 36%) 99  /360  ( 27%) 42 /216 ( 19%)   26 /72  ( 36%) 8  /52  ( 15%)
(Full fitter report for test07b...)
cpldfit: version P.20131013 Xilinx Inc.
Fitter Report
Design Name: test07b Date: 6-21-2020, 1:13PM
Device Used: XC9572XL-7-VQ64
Fitting Status: Successful
************************* Mapped Resource Summary **************************
Macrocells     Product Terms    Function Block   Registers      Pins
Used/Tot       Used/Tot         Inps Used/Tot    Used/Tot       Used/Tot
26 /72  ( 36%) 99  /360  ( 27%) 42 /216 ( 19%)   26 /72  ( 36%) 8  /52  ( 15%)
** Function Block Resources **
Function    Mcells      FB Inps     Pterms      IO
Block       Used/Tot    Used/Tot    Used/Tot    Used/Tot
FB1          16/18       21/54       51/90       1/13
FB2           1/18        0/54        0/90       0/13
FB3           0/18        0/54        0/90       1/14
FB4           9/18       21/54       48/90       6/12
             -----       -----       -----      -----
             26/72       42/216      99/360      8/52
* - Resource is exhausted
** Global Control Resources **
Signal 'CLK50' mapped onto global clock net GCK3.
Global output enable net(s) unused.
Global set/reset net(s) unused.
** Pin Resources **
Signal Type    Required     Mapped  |  Pin Type            Used    Total
------------------------------------|------------------------------------
Input         :    2           2    |  I/O              :     7      46
Output        :    5           5    |  GCK/IO           :     1       3
Bidirectional :    0           0    |  GTS/IO           :     0       2
GCK           :    1           1    |  GSR/IO           :     0       1
GTS           :    0           0    |
GSR           :    0           0    |
                 ----        ----
        Total      8           8
Since the HSYNC timing is based on multiples of 16, let's see if that will help...
I tried a bunch of different ways to write "divide-by-N" DFF-style logic instead of the naïve comparisons with fixed numbers, and they all seemed to use more resources.
I ended up doing the main pixel (per line) counter like this:
// Count "half-pixels" (because of 50MHz clock instead of 25MHz) per line:
reg [10:0] hcount; // 0..1599
always @(posedge CLK50) hcount <= hcount==1599 ? 0 : hcount+1;
I can then divide hcount by 32 (32 half-pixels = 16 pixels) to get the index of a 16-pixel chunk of the line:
// Get 16-pixel "chunk" index from hcount:
wire [5:0] chunk;
assign chunk = hcount[10:5];
Note that the whole line is 800 "pixels" and hence 50 "chunks".
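As a sanity check on the counter and the chunk index, a small self-checking testbench along these lines could be run in a simulator. This is only a sketch: tb_chunk, errors and n are names made up for illustration and are not part of test07b, and the counter is simply copied from above with explicit widths.

```verilog
`timescale 1ns/1ps
// Hypothetical testbench (not part of test07b): checks that chunk == hcount/32
// and that chunk never goes above 49 over two full lines.
module tb_chunk;
  reg CLK50 = 0;
  always #10 CLK50 = ~CLK50;          // 20 ns period = 50 MHz

  // Same counter and chunk decode as in the design:
  reg [10:0] hcount = 0;              // 0..1599 half-pixels per line
  always @(posedge CLK50)
    hcount <= (hcount == 1599) ? 11'd0 : hcount + 11'd1;
  wire [5:0] chunk = hcount[10:5];    // hcount / 32

  integer errors = 0;
  integer n;
  initial begin
    for (n = 0; n < 3200; n = n + 1) begin
      @(negedge CLK50);               // sample away from the active edge
      // 1599/32 = 49 in integer division, so 49 is the last chunk.
      if (chunk !== hcount / 32 || chunk > 49) errors = errors + 1;
    end
    if (errors == 0) $display("PASS");
    else             $display("FAIL: %0d mismatches", errors);
    $finish;
  end
endmodule
```

Sampling on the falling edge keeps the check clear of the non-blocking update that happens on the rising edge.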
Then I tried a few different ways to generate HSYNC:
- This method is relatively efficient on resources, but HSYNC is driven straight from combinational logic rather than from a register, so it might glitch:
  // Generate HSYNC from chunk:
  assign hsync = ~(chunk>=41 && chunk<=46);
- This uses a register to help make it more stable:
  reg hsync_latch;
  always @(posedge CLK50) hsync_latch <= ~(chunk>=41 && chunk<=46);
  assign hsync = hsync_latch;
  ...the whole design uses 12 macrocells (MC), 12 registers (Reg), 11 function block inputs (FBI), and 19 product terms (PT).
- This slightly different approach only changes the hsync_latch register as the system enters chunk 41 (where it asserts HSYNC by pulling it low), or as it enters chunk 47 (where it releases HSYNC by pulling it high):
  reg hsync_latch;
  always @(posedge CLK50)
  begin
    if (chunk==41) hsync_latch <= 0; // Assert HSYNC.
    if (chunk==47) hsync_latch <= 1; // Release HSYNC.
  end
  assign hsync = hsync_latch;
  ...and the whole design uses 12 MC, 12 Reg, 22 FBI, and 18 PT.
- Interestingly, this variation on the third method above seems to be the best:
  reg hsync_latch;
  always @(posedge CLK50)
  begin
    // Chunk 47 = 6'b101111:
    if (chunk[5] & (&chunk[3:0])) hsync_latch <= 1; // Release HSYNC.
    // Chunk 41 = 6'b101001 (also matches 43 and 45, which just re-assert):
    else if (chunk[5] & chunk[3] & chunk[0]) hsync_latch <= 0; // Assert HSYNC.
  end
  assign hsync = hsync_latch;
  ...with the whole design using 12 MC, 12 Reg, 11 FBI, and 18 PT.
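Since the bit-pattern version matches more chunk values than just 41 and 47, it seems worth checking that it still produces the same waveform as the registered comparison. The testbench below is a rough sketch of how that could be done; tb_hsync, hsync_cmp and hsync_bits are hypothetical names used only here, and the two always blocks are copies of the second and fourth methods above.

```verilog
`timescale 1ns/1ps
// Hypothetical testbench (not part of test07b): compares the registered
// range comparison with the bit-pattern set/reset version, clock by clock.
module tb_hsync;
  reg CLK50 = 0;
  always #10 CLK50 = ~CLK50;          // 50 MHz

  reg [10:0] hcount = 0;
  always @(posedge CLK50)
    hcount <= (hcount == 1599) ? 11'd0 : hcount + 11'd1;
  wire [5:0] chunk = hcount[10:5];

  // Second method: registered range comparison.
  reg hsync_cmp;
  always @(posedge CLK50) hsync_cmp <= ~(chunk >= 41 && chunk <= 46);

  // Fourth method: set/reset on bit patterns.
  reg hsync_bits;
  always @(posedge CLK50) begin
    if (chunk[5] & (&chunk[3:0]))            hsync_bits <= 1; // chunk 47
    else if (chunk[5] & chunk[3] & chunk[0]) hsync_bits <= 0; // chunk 41, 43, 45
  end

  integer errors = 0;
  integer n;
  initial begin
    // Skip the first line: hsync_bits has no reset, so it is unknown (x)
    // until chunk 41 or chunk 47 has been seen once.
    repeat (1600) @(negedge CLK50);
    for (n = 0; n < 3200; n = n + 1) begin
      @(negedge CLK50);
      if (hsync_cmp !== hsync_bits) errors = errors + 1;
    end
    if (errors == 0) $display("PASS: both versions agree");
    else             $display("FAIL: %0d mismatches", errors);
    $finish;
  end
endmodule
```

The extra matches at chunks 43 and 45 only re-assert a signal that is already low, and chunk never goes above 49, so the higher values that would also match the patterns are never reached.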
I tried using a similar approach for VSYNC:
reg vsync_latch;
always @(posedge CLK50)
begin
if ((&vcount[8:5]) & vcount[3] & vcount[2]) vsync_latch <= 1; // Release VSYNC if we hit line 492.
else if ((&vcount[8:5]) & vcount[3] & vcount[1]) vsync_latch <= 0; // Assert VSYNC if we hit line 490.
end
assign vsync = vsync_latch;
...but it used an extra Product Term and a lot more Function Block Inputs than just doing it like this:
// Drive VSYNC based on line counter:
reg vsync_latch;
always @(posedge CLK50) vsync_latch <= ~(vcount>=480+10 && vcount<480+10+2);
assign vsync = vsync_latch;
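One thing to watch with the bit-pattern VSYNC version is that neither term tests vcount[4]. Assuming vcount is 10 bits wide and runs from 0 to 524 (525 lines per frame, which is not stated above), working the patterns through by hand suggests the assert term would also match lines 506 and 507, and the release term lines 508 to 511, which would give a second, unwanted pulse. The sketch below just lists every line that matches either term so that can be confirmed; tb_vsync_decode, assert_hit and release_hit are hypothetical names.

```verilog
`timescale 1ns/1ps
// Hypothetical testbench (not part of test07b): lists the line numbers that
// trigger each branch of the bit-pattern VSYNC logic.
module tb_vsync_decode;
  // Assumption: vcount is 10 bits wide and counts 0..524.
  reg [9:0] vcount;

  // Same terms as the bit-pattern version; the "else if" is modelled by
  // masking the assert term whenever the release term is true.
  wire release_hit = (&vcount[8:5]) & vcount[3] & vcount[2];
  wire assert_hit  = (&vcount[8:5]) & vcount[3] & vcount[1] & ~release_hit;

  integer line;
  initial begin
    for (line = 0; line < 525; line = line + 1) begin
      vcount = line;                  // truncates to 10 bits, fine for 0..524
      #1;                             // let the wires settle
      if (assert_hit)  $display("line %0d: assert VSYNC", line);
      if (release_hit) $display("line %0d: release VSYNC", line);
    end
    $finish;
  end
endmodule
```

Adding a ~vcount[4] factor to both terms would restrict the matches to lines 490 to 495, at the cost of one more input per term, which makes the plain comparison look even more attractive.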
A note on the chunk numbers:
- Chunk 41: 41×16 = 656 = 640+16 pixels, which is the combined active video area and front porch, so the sync pulse starts here.
- Chunk 47: 47×16 = 752 = 640+16+96 pixels, which adds the 96-pixel sync pulse and runs us up to the start of the back porch.
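The same arithmetic can be written as localparams so that the chunk numbers are derived rather than typed in by hand. This is just a sketch: the parameter names are invented for illustration, and the 48-pixel back porch is not stated above but follows from the 800-pixel line (800 - 640 - 16 - 96).

```verilog
// Hypothetical module (not part of test07b): derives the HSYNC chunk
// numbers from the horizontal timing, in whole pixels.
module tb_chunk_numbers;
  localparam H_ACTIVE     = 640;  // visible pixels
  localparam H_FRONT      = 16;   // front porch
  localparam H_SYNC       = 96;   // sync pulse width
  localparam H_BACK       = 48;   // back porch (assumed: 800 - 640 - 16 - 96)
  localparam CHUNK_PIXELS = 16;   // pixels per chunk

  localparam SYNC_START_CHUNK = (H_ACTIVE + H_FRONT) / CHUNK_PIXELS;           // 656/16
  localparam SYNC_END_CHUNK   = (H_ACTIVE + H_FRONT + H_SYNC) / CHUNK_PIXELS;  // 752/16
  localparam CHUNKS_PER_LINE  =
      (H_ACTIVE + H_FRONT + H_SYNC + H_BACK) / CHUNK_PIXELS;                   // 800/16

  initial begin
    // Prints: assert at chunk 41, release at chunk 47, 50 chunks per line
    $display("assert at chunk %0d, release at chunk %0d, %0d chunks per line",
             SYNC_START_CHUNK, SYNC_END_CHUNK, CHUNKS_PER_LINE);
    $finish;
  end
endmodule
```

This only works out so neatly because every horizontal timing figure is a multiple of 16 pixels.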
I'm putting test07c on ice for now.