23 Jun 2020

Can we optimise yesterday's work on test07b, or does Xilinx ISE already do a great job of optimising for us?

Fitter tells us:

Macrocells     Product Terms    Function Block   Registers      Pins           
Used/Tot       Used/Tot         Inps Used/Tot    Used/Tot       Used/Tot       
26 /72  ( 36%) 99  /360  ( 27%) 42 /216 ( 19%)   26 /72  ( 36%) 8  /52  ( 15%)

(Full fitter report for test07b...)

cpldfit:  version P.20131013                        Xilinx Inc.
                                  Fitter Report
Design Name: test07b                             Date:  6-21-2020,  1:13PM
Device Used: XC9572XL-7-VQ64
Fitting Status: Successful

*************************  Mapped Resource Summary  **************************

Macrocells     Product Terms    Function Block   Registers      Pins           
Used/Tot       Used/Tot         Inps Used/Tot    Used/Tot       Used/Tot       
26 /72  ( 36%) 99  /360  ( 27%) 42 /216 ( 19%)   26 /72  ( 36%) 8  /52  ( 15%)

** Function Block Resources **

Function    Mcells      FB Inps     Pterms      IO          
Block       Used/Tot    Used/Tot    Used/Tot    Used/Tot    
FB1          16/18       21/54       51/90       1/13
FB2           1/18        0/54        0/90       0/13
FB3           0/18        0/54        0/90       1/14
FB4           9/18       21/54       48/90       6/12
             -----       -----       -----      -----    
             26/72       42/216      99/360      8/52 

* - Resource is exhausted

** Global Control Resources **

Signal 'CLK50' mapped onto global clock net GCK3.
Global output enable net(s) unused.
Global set/reset net(s) unused.

** Pin Resources **

Signal Type    Required     Mapped  |  Pin Type            Used    Total 
------------------------------------|------------------------------------
Input         :    2           2    |  I/O              :     7      46
Output        :    5           5    |  GCK/IO           :     1       3
Bidirectional :    0           0    |  GTS/IO           :     0       2
GCK           :    1           1    |  GSR/IO           :     0       1
GTS           :    0           0    |
GSR           :    0           0    |
                 ----        ----
        Total      8           8

Since the HSYNC timing is based on multiples of 16, let's see if that will help...

I tried a bunch of different ways to write "divide-by-N" DFF-style logic instead of the naïve comparisons with fixed numbers, and they all made things seemingly use more resources.

I ended up doing the main pixel (per line) counter like this:

// Count "half-pixels" (because of 50MHz clock instead of 25MHz) per line:
reg [10:0] hcount; // 0..1599
always @(posedge CLK50) hcount <= hcount==1599 ? 0 : hcount+1;

I then can divide hcount by 32 to get an index to a 16-pixel chunk of the line:

// Get 16-pixel "chunk" index from hcount:
wire [5:0] chunk;
assign chunk = hcount[10:5];

Note that the whole line is 800 "pixels" and hence 50 "chunks".

Then I tried a few different ways to generate HSYNC:

This method is relatively efficient on resources, but is not clocking HSYNC as a register, so it might be unstable:
```
// Generate HSYNC from chunk:
assign hsync = ~(chunk>=41 && chunk<=46);
```

This uses a register to help make it more stable:

reg hsync_latch;
always @(posedge CLK50) hsync_latch <= ~(chunk>=41 && chunk<=46);
assign hsync = hsync_latch;

...the whole design uses 12 MC, 12 Reg, 11 FBI, and 19 PT.

This slightly different approach only changes the hsync_latch register as the system enters chunk 41 (where it asserts HSYNC by pulling it low), or as it enters chunk 47 (where it releases HSYNC by pulling it high):
```
reg hsync_latch;
always @(posedge CLK50)
begin
  if (chunk==41) hsync_latch <= 0; // Assert HSYNC.
  if (chunk==47) hsync_latch <= 1; // Release HSYNC.
end
assign hsync = hsync_latch;
```
...and the whole design uses 12 MC, 12 Reg, 22 FBI, and 18 PT.

Interestingly, this variation on no. 3 above seems to be the best:

reg hsync_latch;
always @(posedge CLK50)
begin
if (chunk[5] & (&chunk[3:0]))             hsync_latch <= 1; // Release HSYNC.
else if (chunk[5] & chunk[3] & chunk[0])  hsync_latch <= 0; // Assert HSYNC.
end
assign hsync = hsync_latch;

...with the whole design using 12 MC, 12 Reg, 11 FBI, and 18 PT.

I tried using a similar approach for VSYNC:

reg vsync_latch;
always @(posedge CLK50)
begin
       if ((&vcount[8:5]) & vcount[3] & vcount[2]) vsync_latch <= 1; // Release VSYNC if we hit line 492.
  else if ((&vcount[8:5]) & vcount[3] & vcount[1]) vsync_latch <= 0; // Assert VSYNC if we hit line 490.
end
assign vsync = vsync_latch;

...but it used an extra Product Term and a lot more Function Block Inputs than just doing it like this:

// Drive VSYNC based on line counter:
reg vsync_latch;
always @(posedge CLK50) vsync_latch <= ~(vcount>=480+10 && vcount<480+10+2);
assign vsync = vsync_latch;

Note, re chunk numbers;

Chunk 41: 41×16 = 640+16, which is the combined active video area and front porch.
Chunk 47: 47×16 = 640+16+96, which runs us up to the start of the back porch.

I'm putting test07c on ice for now.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

0035-2020-06-23.md

0035-2020-06-23.md

23 Jun 2020

Files

0035-2020-06-23.md

Latest commit

History

0035-2020-06-23.md

File metadata and controls

23 Jun 2020