23 Jun 2020

Can we optimise yesterday's work on test07b, or does Xilinx ISE already do a great job of optimising for us?

Fitter tells us:

Macrocells     Product Terms    Function Block   Registers      Pins           
Used/Tot       Used/Tot         Inps Used/Tot    Used/Tot       Used/Tot       
26 /72  ( 36%) 99  /360  ( 27%) 42 /216 ( 19%)   26 /72  ( 36%) 8  /52  ( 15%)
(Full fitter report for test07b...)
cpldfit:  version P.20131013                        Xilinx Inc.
                                  Fitter Report
Design Name: test07b                             Date:  6-21-2020,  1:13PM
Device Used: XC9572XL-7-VQ64
Fitting Status: Successful

*************************  Mapped Resource Summary  **************************

Macrocells     Product Terms    Function Block   Registers      Pins           
Used/Tot       Used/Tot         Inps Used/Tot    Used/Tot       Used/Tot       
26 /72  ( 36%) 99  /360  ( 27%) 42 /216 ( 19%)   26 /72  ( 36%) 8  /52  ( 15%)

** Function Block Resources **

Function    Mcells      FB Inps     Pterms      IO          
Block       Used/Tot    Used/Tot    Used/Tot    Used/Tot    
FB1          16/18       21/54       51/90       1/13
FB2           1/18        0/54        0/90       0/13
FB3           0/18        0/54        0/90       1/14
FB4           9/18       21/54       48/90       6/12
             -----       -----       -----      -----    
             26/72       42/216      99/360      8/52 

* - Resource is exhausted

** Global Control Resources **

Signal 'CLK50' mapped onto global clock net GCK3.
Global output enable net(s) unused.
Global set/reset net(s) unused.

** Pin Resources **

Signal Type    Required     Mapped  |  Pin Type            Used    Total 
------------------------------------|------------------------------------
Input         :    2           2    |  I/O              :     7      46
Output        :    5           5    |  GCK/IO           :     1       3
Bidirectional :    0           0    |  GTS/IO           :     0       2
GCK           :    1           1    |  GSR/IO           :     0       1
GTS           :    0           0    |
GSR           :    0           0    |
                 ----        ----
        Total      8           8

Since the horizontal timing intervals (640 active, 16 front porch, 96 sync, 48 back porch) are all multiples of 16 pixels, let's see if that helps...

I tried a bunch of different ways to write "divide-by-N" DFF-style logic instead of the naïve comparisons with fixed numbers, and they all seemed to use more resources.

I ended up doing the main pixel (per line) counter like this:

// Count "half-pixels" (because of 50MHz clock instead of 25MHz) per line:
reg [10:0] hcount; // 0..1599
always @(posedge CLK50) hcount <= hcount==1599 ? 0 : hcount+1;

I can then divide hcount by 32 (i.e. drop its low 5 bits) to get the index of a 16-pixel chunk of the line, since 32 half-pixels make 16 pixels:

// Get 16-pixel "chunk" index from hcount:
wire [5:0] chunk;
assign chunk = hcount[10:5];

Note that the whole line is 800 "pixels" and hence 50 "chunks".
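
As a quick sanity check of that arithmetic, here is a small Python model. It is not part of the CPLD design, and the function name `chunk_of` and the constant are illustrative assumptions. It mirrors the `hcount[10:5]` slice with a right shift and confirms that a line splits into 50 equal chunks:

```python
# Python model of the Verilog slice `chunk = hcount[10:5]`.
HALF_PIXELS_PER_LINE = 1600  # hcount runs 0..1599 on the 50MHz clock


def chunk_of(hcount):
    """Drop the low 5 bits: 32 half-pixels = 16 pixels per chunk."""
    return (hcount >> 5) & 0x3F  # keep 6 bits, like hcount[10:5]


chunks = [chunk_of(h) for h in range(HALF_PIXELS_PER_LINE)]

print("first/last chunk:", min(chunks), max(chunks))
print("distinct chunks: ", len(set(chunks)))
print("clocks per chunk:", chunks.count(0))
```

Because 1600 is an exact multiple of 32, the last chunk is a full one, so no partial chunk needs special handling at the end of the line.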

Then I tried a few different ways to generate HSYNC:

  1. This method is relatively efficient on resources, but it drives HSYNC straight from combinational logic rather than from a register, so the output might glitch:
    // Generate HSYNC from chunk:
    assign hsync = ~(chunk>=41 && chunk<=46);
  2. This uses a register to help make it more stable:
    reg hsync_latch;
    always @(posedge CLK50) hsync_latch <= ~(chunk>=41 && chunk<=46);
    assign hsync = hsync_latch;
    ...the whole design uses 12 MC (macrocells), 12 Reg (registers), 11 FBI (function block inputs), and 19 PT (product terms).
  3. This slightly different approach only changes the hsync_latch register as the system enters chunk 41 (where it asserts HSYNC by pulling it low), or as it enters chunk 47 (where it releases HSYNC by pulling it high):
    reg hsync_latch;
    always @(posedge CLK50)
    begin
      if (chunk==41) hsync_latch <= 0; // Assert HSYNC.
      if (chunk==47) hsync_latch <= 1; // Release HSYNC.
    end
    assign hsync = hsync_latch;
    ...and the whole design uses 12 MC, 12 Reg, 22 FBI, and 18 PT.
  4. Interestingly, this variation on no. 3 above seems to be the best:
    reg hsync_latch;
    always @(posedge CLK50)
    begin
      // chunk only runs 0..49, so these partial decodes are enough:
      if (chunk[5] & (&chunk[3:0]))             hsync_latch <= 1; // Release HSYNC (chunk 47).
      else if (chunk[5] & chunk[3] & chunk[0])  hsync_latch <= 0; // Assert HSYNC (chunks 41, 43, 45).
    end
    assign hsync = hsync_latch;
    ...with the whole design using 12 MC, 12 Reg, 11 FBI, and 18 PT.
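
To convince myself that the registered variants (2, 3 and 4) really produce the same waveform, the following Python sketch steps all three once per CLK50 edge and compares the register values. It is only a cycle-level model under stated assumptions: the helper names (`bit`, `simulate`) are made up for illustration, the registers are assumed to start at 0, and the first line is skipped so that the start-up state does not matter. It says nothing about resource usage, which only the fitter can report.

```python
# Cycle-level Python model of HSYNC options 2, 3 and 4.
HALF_PIXELS_PER_LINE = 1600


def bit(value, n):
    """Return bit n of value, like value[n] in Verilog."""
    return (value >> n) & 1


def simulate(lines=3):
    """Return (opt2, opt3, opt4) register values per clock, skipping line 0."""
    r2 = r3 = r4 = 0  # ASSUMPTION: arbitrary start-up state
    trace = []
    for clock in range(HALF_PIXELS_PER_LINE * lines):
        chunk = (clock % HALF_PIXELS_PER_LINE) >> 5  # hcount[10:5]

        # Option 2: registered range compare.
        n2 = 0 if 41 <= chunk <= 46 else 1

        # Option 3: change only when entering chunk 41 or chunk 47.
        n3 = r3
        if chunk == 41:
            n3 = 0
        if chunk == 47:
            n3 = 1

        # Option 4: partial bit decodes, release taking priority.
        n4 = r4
        if bit(chunk, 5) and all(bit(chunk, i) for i in range(4)):
            n4 = 1
        elif bit(chunk, 5) and bit(chunk, 3) and bit(chunk, 0):
            n4 = 0

        r2, r3, r4 = n2, n3, n4  # all registers update on the same edge
        if clock >= HALF_PIXELS_PER_LINE:
            trace.append((r2, r3, r4))
    return trace


trace = simulate()
print("all three agree:", all(a == b == c for a, b, c in trace))
print("low clocks per line:", sum(1 for a, _, _ in trace[:1600] if a == 0))
```

The low time per line should come out as 6 chunks of 32 clocks, i.e. the 96-pixel sync pulse.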

I tried using a similar approach for VSYNC:

reg vsync_latch;
always @(posedge CLK50)
begin
       if ((&vcount[8:5]) & vcount[3] & vcount[2]) vsync_latch <= 1; // Release VSYNC if we hit line 492.
  else if ((&vcount[8:5]) & vcount[3] & vcount[1]) vsync_latch <= 0; // Assert VSYNC if we hit line 490.
end
assign vsync = vsync_latch;

...but it used an extra Product Term and a lot more Function Block Inputs than just doing it like this:

// Drive VSYNC based on line counter:
reg vsync_latch;
always @(posedge CLK50) vsync_latch <= ~(vcount>=480+10 && vcount<480+10+2);
assign vsync = vsync_latch;
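
One thing worth checking about the bit-pattern version: `chunk` never gets past 49, but the line counter has plenty of reachable values beyond 492, so the partial decodes may match more lines than intended. The Python sketch below enumerates this at line granularity (ignoring the one-clock register delay). The counter itself is not shown in this entry, so the range of `vcount` (0..524, the usual 525-line frame) is an assumption, and the helper names are illustrative.

```python
# Python model of the two VSYNC variants, evaluated once per line.
LINES_PER_FRAME = 525  # ASSUMPTION: vcount counts lines 0..524


def bit(value, n):
    """Return bit n of value, like value[n] in Verilog."""
    return (value >> n) & 1


def release(v):
    # (&vcount[8:5]) & vcount[3] & vcount[2]
    return all(bit(v, i) for i in (8, 7, 6, 5)) and bit(v, 3) == 1 and bit(v, 2) == 1


def asserts(v):
    # (&vcount[8:5]) & vcount[3] & vcount[1]
    return all(bit(v, i) for i in (8, 7, 6, 5)) and bit(v, 3) == 1 and bit(v, 1) == 1


def low_lines_decode(frames=2):
    """Lines on which the bit-pattern version holds VSYNC low (last frame)."""
    state = 1
    low = []
    for _ in range(frames):  # extra frame so the start-up state washes out
        low = []
        for v in range(LINES_PER_FRAME):
            if release(v):
                state = 1
            elif asserts(v):
                state = 0
            if state == 0:
                low.append(v)
    return low


decode_low = low_lines_decode()
compare_low = [v for v in range(LINES_PER_FRAME) if 490 <= v < 492]
print("decode :", decode_low)
print("compare:", compare_low)
```

If the 0..524 assumption holds, the decode also matches lines 506 and 507 (the same bit pattern with `vcount[4]` set), which would produce a second VSYNC pulse per frame. Making the two versions equivalent would then need an extra `~vcount[4]` term in the assert condition, at the cost of yet another input, so the plain comparison looks like the better choice on both counts.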

A note on the chunk numbers:

  • Chunk 41: 41×16 = 656 = 640+16, i.e. the end of the combined active video area and front porch, which is where the sync pulse starts.
  • Chunk 47: 47×16 = 752 = 640+16+96, i.e. the end of the sync pulse, which runs us up to the start of the back porch.
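
The same boundaries can be derived from the timing numbers instead of being worked out by hand. This is just an illustrative Python sketch: the constant names are my own, and the 48-pixel back porch is inferred from the 800-pixel line total rather than stated above.

```python
# Derive the HSYNC chunk boundaries from the horizontal timing, in pixels.
H_ACTIVE = 640
H_FRONT_PORCH = 16
H_SYNC = 96
H_BACK_PORCH = 48  # inferred: 800 - 640 - 16 - 96
CHUNK = 16         # pixels per chunk

sync_start = H_ACTIVE + H_FRONT_PORCH    # first pixel of the sync pulse
sync_end = sync_start + H_SYNC           # first pixel of the back porch
line_total = sync_end + H_BACK_PORCH     # pixels per full line

for name, pixels in [("sync start", sync_start),
                     ("sync end", sync_end),
                     ("line total", line_total)]:
    # Every boundary must land exactly on a chunk edge for this scheme to work.
    assert pixels % CHUNK == 0, name
    print(f"{name}: pixel {pixels} = chunk {pixels // CHUNK}")
```

This only works because every interval is a multiple of 16 pixels; a mode with, say, an 8-pixel front porch would need a finer chunk size.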

I'm putting test07c on ice for now.