Add Ethernet core #8

t-wallet · 2024-08-28T14:09:01Z

This PR contains an Ethernet II + IPv4 + UDP core.

Still TODO:

Create top-level Mac.hs and IP.hs modules
Test haddock, improve documentation and hide documentation of internal modules
Clean up tests and fix generation of data in Icmp.hs and IPPacketizers.hs
Add multi-entry ARP table: it exists, but suffers from timing issues
Strip FCS
Add detailed examples from https://github.com/GiPHouse/qbaylogic-clash-based-macipudp-stack-spring24/pull/
Upstream partitionS to Clash.Protocols.Df: Add variants of Df functions that handle function arguments wrapped in signals clash-protocols#107

Co-authored-by: Jasmijn Bookelmann <bookelmannjasmijn@gmail.com>
Co-authored-by: Cato van Ojen <Baublesaurus@users.noreply.github.com>
Co-authored-by: MatthijsMu <93450301+MatthijsMu@users.noreply.github.com>
Co-authored-by: Jasper Laumen <96011116+JLaumen@users.noreply.github.com>
Co-authored-by: Mart Koster <30956441+Akribes@users.noreply.github.com>
Co-authored-by: Bryan Rinders <bryan__ajax@hotmail.com>
Co-authored-by: Daan Weessies <daan.weessies@gmail.com>
Co-authored-by: Rowan Goemans <goemansrowan@gmail.com>

rowanG077 · 2024-08-28T16:12:43Z

src/Clash/Cores/Ethernet/Arp/ArpManager.hs

+     , (Maybe ArpResponse, (Maybe IPv4Address, Df.Data ArpLite)))
+-- User issues a lookup request. We don't have a timeout, because the ARP table should
+-- always respond within a reasonable time frame. If not, there is a bug in the ARP table.
+arpManagerT AwaitLookup{..} (Just lookupIPv4, arpResponseIn, Ack readyIn, _) =


I don't see why this _awaitTransmission is necessary. If you are in the AwaitLookup state and you receive a Just lookupIPv4, why can't you just forward that directly to the ARP table?

Don't we already forward the IP address unconditionally? _awaitTransmission is needed to keep driving the same ARP request to the transmitter while the transmitter asserts backpressure.
It makes no difference for a single-entry ARP table, which has a latency of one clock cycle; that is why the old version worked in hardware. But with a multi-entry table the latency can be longer, and that is where the notorious bug that stopped the multi-entry stack from working came from.

I.e. it sent an ARP request to the transmitter, the transmitter did nothing with it because it wasn't ready, but we had already moved to the waiting state.
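To illustrate why the extra state matters, here is a minimal pure-Haskell model of the request handshake. It is not the actual arpManagerT: the state names, the use of Int for an IPv4 address and the omission of response handling are all simplifying assumptions. The request stays on the output until the transmitter is ready, and only then does the manager move on to waiting for a response:

```haskell
import Data.List (mapAccumL)

-- Hypothetical model of the ARP manager's request handshake.
-- Not the code in ArpManager.hs; names are illustrative only.
data ManagerState
  = AwaitLookup
  | AwaitTransmission Int -- IPv4 address (as Int) whose ARP request is pending
  | AwaitResponse Int -- request was accepted; response handling not modelled
  deriving (Eq, Show)

-- | One clock cycle. Inputs: optional lookup request, transmitter ready.
-- Outputs: next state and the ARP request driven towards the transmitter.
step :: ManagerState -> (Maybe Int, Bool) -> (ManagerState, Maybe Int)
step AwaitLookup (Just ip, txReady)
  | txReady = (AwaitResponse ip, Just ip)
  | otherwise = (AwaitTransmission ip, Just ip)
step AwaitLookup (Nothing, _) = (AwaitLookup, Nothing)
step (AwaitTransmission ip) (_, txReady)
  -- Keep driving the same request until the transmitter accepts it.
  | txReady = (AwaitResponse ip, Just ip)
  | otherwise = (AwaitTransmission ip, Just ip)
step st@(AwaitResponse _) _ = (st, Nothing)

-- | Run the model over a list of per-cycle inputs.
simulate :: ManagerState -> [(Maybe Int, Bool)] -> [Maybe Int]
simulate s0 = snd . mapAccumL step s0

main :: IO ()
main = print (simulate AwaitLookup [(Just 7, False), (Nothing, False), (Nothing, True), (Nothing, True)])
```

Without the AwaitTransmission state, the first cycle would go straight to AwaitResponse and the request that the transmitter never accepted would be lost, which is the failure mode described above.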

src/Clash/Cores/Ethernet/Arp/ArpTable.hs

src/Clash/Cores/Ethernet/Examples/ArpStack.hs

rowanG077 · 2024-08-28T16:16:52Z

src/Clash/Cores/Ethernet/Examples/FullUdpStack.hs

+  -- ^ My MAC Address
+  -> Signal dom (IPv4Address, IPv4Address)
+  -- ^ My IP address and the subnet
+  -> Circuit (PacketStream dom dataWidth (IPv4Address, UdpHeaderLite)) (PacketStream dom dataWidth (IPv4Address, UdpHeaderLite))


I'm no longer sure about providing the UDP circuit as a parameter, because it's very limiting in what you can then do with it.

In GiPHouse/qbaylogic-clash-based-macipudp-stack-spring24#155 I have rewritten it so you have more flexibility. I would like your and others' opinions on this.

I like

Circuit (PacketStream dom dataWidth (IPv4Address, UdpHeaderLite), PacketStream domEthRx 1 ()) (PacketStream dom dataWidth (IPv4Address, UdpHeaderLite), PacketStream domEthTx 1 ())

as a type for the UDP core. It clearly conveys that outgoing packets are either derived from incoming packets (e.g. ARP, UDP echo) or come from elsewhere (e.g. DHCP discovery), and that not all incoming packets are replied to.

The current version seems to assume that all incoming packets need to be replied to, which is obviously not the case for anything but echoes. So, let's change it!

In GiPHouse/qbaylogic-clash-based-macipudp-stack-spring24@868c010#diff-d4db0745d4f48da89a4b824cae23aa6a8d92447fa14b8f93220f4d6c538c5ce7 I did basically that for someone who wanted to use clash ethernet as a core.

Changed. The only thing I added is that you really need a FIFO after the IP depacketizer.

A UDP echo stack can now be expressed as:

    udpEcho = circuit $ \phyIn -> do
      phyIn' <- unsafeRgmiiRxC ... -< phyIn
      udpOut' <- mapMeta (\(ip, hdr) -> (ip, hdr{_udplDstPort = _udplSrcPort hdr, _udplSrcPort = _udplDstPort hdr})) -< udpOut
      (udpOut, toTxPhy) <- fullStackC ... -< (udpOut', phyIn')
      rgmiiTxC ... -< toTxPhy
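The mapMeta step above only swaps the UDP ports and leaves the IPv4 address untouched. As a standalone sketch of that metadata transformation, with a simplified, hypothetical stand-in for UdpHeaderLite (the real type in the core may have more fields) and a hypothetical tuple type for IPv4Address:

```haskell
import Data.Word (Word16)

-- Hypothetical simplified stand-in for UdpHeaderLite; only the two
-- port fields used in the echo example are modelled.
data UdpHeaderLite = UdpHeaderLite
  { _udplSrcPort :: Word16
  , _udplDstPort :: Word16
  }
  deriving (Eq, Show)

-- Hypothetical simplification of the core's IPv4Address type.
type IPv4Address = (Int, Int, Int, Int)

-- | Turn the metadata of a received packet into reply metadata:
-- keep the IPv4 address and swap source and destination ports.
echoMeta :: (IPv4Address, UdpHeaderLite) -> (IPv4Address, UdpHeaderLite)
echoMeta (ip, hdr) =
  (ip, hdr{_udplDstPort = _udplSrcPort hdr, _udplSrcPort = _udplDstPort hdr})

main :: IO ()
main = print (echoMeta ((192, 168, 1, 10), UdpHeaderLite 5000 7))
```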

src/Clash/Cores/Ethernet/Examples/FullUdpStack.hs

src/Clash/Cores/Ethernet/IP/IPPacketizers.hs

src/Clash/Cores/Ethernet/Icmp.hs

rowanG077 · 2024-08-28T16:34:11Z

src/Clash/Cores/Ethernet/Mac/Preamble.hs

+  (KnownNat dataWidth) =>
+  (1 <= dataWidth) =>
+  Circuit (PacketStream dom dataWidth ()) (PacketStream dom dataWidth ())
+preambleStripperC =


I'm no longer sure that an incoming Ethernet packet will always have the full preamble, since it is partially used for clock recovery. @DigitalBrains1 Do you know this?

If that is true, then we may want to investigate stripping the preamble in the Ethernet domain. It should be simple enough not to hurt timing much, and I suspect that even putting registers between the components will be cheaper than doing it for a generic dataWidth.

I'm fairly sure a receiver can discard the preamble; not sure whether in whole or if there is a minimum number of bit times it needs to pass on for valid reception. I've quickly browsed through 802.3 and found the following things:

802.3-2018 section 4.2.5 Preamble generation

In a LAN implementation, most of the Physical Layer components are allowed to provide valid output some number of bit times after being presented valid input signals. Thus it is necessary for a preamble to be sent before the start of data, to allow the PLS circuitry to reach its steady state.

I think this says a receiver is allowed to only start outputting the preamble some number of bit times after the preamble starts, and the preamble is there to absorb the difference.

4.1.2.1.2 Reception without contention

The Physical Layer passes subsequent bits up to the MAC sublayer, where the leading bits are discarded, up to and including the end of the preamble and Start Frame Delimiter.

Not really conclusive, but I feel it leaves the door wide open to say the number of leading bits is inconsequential.

In section 4.2.9 Frame reception, IEEE 802.3 provides pseudo-Pascal programs to show what a working implementation of a CSMA/CD Media Access sublayer could look like. In its BitReceiver process, which produces received frames, it strips off the preamble by invoking another procedure; the call is documented:

PhysicalSignalDecap; {Skip idle and extension, strip off preamble and sfd}

The procedure itself is only described:

procedure PhysicalSignalDecap; begin {Receive one bit at a time from physical medium until a valid sfd is detected, discard bits and return} end; {PhysicalSignalDecap}

Nowhere does it say anything about a check on the length of the preamble in the receiver.
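For intuition, here is a minimal list-based sketch of what PhysicalSignalDecap describes: bits are consumed one at a time until a complete SFD has been seen, with no check on how many preamble bits came before it. Bits are listed in transmission order, where the preamble alternates 1010... and the SFD is 10101011. The function name is borrowed from the pseudo-code, but the implementation is only an illustration:

```haskell
import Data.List (isPrefixOf, tails)

-- | The SFD in transmission order.
sfdBits :: [Int]
sfdBits = [1, 0, 1, 0, 1, 0, 1, 1]

-- | Skip bits until a complete SFD has been received and return the rest.
-- Nothing if no SFD is found. The preamble length is never checked, so a
-- preamble that lost some of its leading bits is accepted just the same.
physicalSignalDecap :: [Int] -> Maybe [Int]
physicalSignalDecap bits =
  case [drop (length sfdBits) rest | rest <- tails bits, sfdBits `isPrefixOf` rest] of
    (frame : _) -> Just frame
    [] -> Nothing

main :: IO ()
main = print (physicalSignalDecap ([1, 0, 1, 0, 1, 0] ++ sfdBits ++ [0, 0, 1]))
```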

Without referring to any standard, it also just makes sense. The preamble is, among other things, meant for synchronisation. In the first part of the preamble, you haven't synchronised yet, so you can't output it because it is still gibberish in some sense.

I wrote two new versions of the preamble stripper that allow for a variable-length preamble in bytes. I believe it is reasonable to assume that the SFD is byte-aligned.

The first version just forwards fragments upon encountering the SFD:

    data PreambleStripperState
      = ValidateSfd
      | Forward
      deriving (Generic, NFDataX, Show, ShowX)

    preambleStripperC ::
      forall dom.
      HiddenClockResetEnable dom =>
      Circuit (PacketStream dom 1 ()) (PacketStream dom 1 ())
    preambleStripperC = fromSignals (mealyB go ValidateSfd)
     where
      go ValidateSfd (Just PacketStreamM2S{..}, _) = (nextSt, (PacketStreamS2M True, Nothing))
       where
        nextSt
          | isNothing _last && head _data == 0xD5 = Forward
          | otherwise = ValidateSfd
      go Forward (Just transferIn, bwdIn) = (nextSt, (bwdIn, Just transferIn))
       where
        nextSt
          | isJust (_last transferIn) && _ready bwdIn = ValidateSfd
          | otherwise = Forward
      go st (Nothing, _) = (st, (PacketStreamS2M True, Nothing))

This version happily accepts packets with a longer preamble than normal, just like the procedure described in the comment above, and hopes that faulty packets are caught by the CRC check.
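A pure model of this lenient version on whole packets might look as follows. It is a simplification (a list of bytes per packet instead of PacketStream transfers) and the name is hypothetical. Note that it also shows the flip side of leniency: the first 0xD5 anywhere in the packet is treated as the SFD.

```haskell
import Data.Word (Word8)

-- | Lenient preamble stripping on a whole packet: discard bytes up to and
-- including the first 0xD5 (SFD) and forward everything after it.
-- Packets without an SFD, or whose SFD is the final byte, produce nothing,
-- mirroring the Clash version, which stays in ValidateSfd in those cases.
stripLenient :: [Word8] -> Maybe [Word8]
stripLenient bytes =
  case dropWhile (/= 0xD5) bytes of
    (_sfd : rest@(_ : _)) -> Just rest
    _ -> Nothing

main :: IO ()
main = print (stripLenient [0x55, 0x55, 0xD5, 1, 2, 3])
```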

Or we can go with a version that's a little stricter (but also slightly more expensive):

    data PreambleStripperState
      = ValidateSfd {_counter :: Index 8}
      | Forward
      | Drop
      deriving (Generic, NFDataX, Show, ShowX)

    preambleStripperC ::
      forall dom.
      HiddenClockResetEnable dom =>
      Circuit (PacketStream dom 1 ()) (PacketStream dom 1 ())
    preambleStripperC = fromSignals (mealyB go (ValidateSfd 0))
     where
      go ValidateSfd{..} (Just PacketStreamM2S{..}, _) = (nextSt, (PacketStreamS2M True, Nothing))
       where
        nextSt
          | isNothing _last && head _data == 0xD5 = Forward
          | isNothing _last && _counter == maxBound = Drop
          | otherwise = ValidateSfd (satSucc SatWrap _counter)
      go Forward (Just transferIn, bwdIn) = (nextSt, (bwdIn, Just transferIn))
       where
        nextSt
          | isJust (_last transferIn) && _ready bwdIn = ValidateSfd 0
          | otherwise = Forward
      go Drop (Just transferIn, _) = (nextSt, (PacketStreamS2M True, Nothing))
       where
        nextSt
          | isJust (_last transferIn) = ValidateSfd 0
          | otherwise = Drop
      go st (Nothing, _) = (st, (PacketStreamS2M True, Nothing))

It drops packets if the SFD is not detected within the first 8 bytes.
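The stricter variant can be modelled the same way. Again this is a hypothetical whole-packet simplification, not the Clash component: the SFD must occur at byte index 0 to 7 and must not be the final byte, otherwise the packet is dropped.

```haskell
import Data.List (elemIndex)
import Data.Word (Word8)

-- | Strict preamble stripping on a whole packet: the SFD (0xD5) must appear
-- within the first 8 bytes and must be followed by at least one byte;
-- otherwise the packet is dropped (Nothing).
stripStrict :: [Word8] -> Maybe [Word8]
stripStrict bytes =
  case elemIndex 0xD5 (take 8 bytes) of
    Just i ->
      case drop (i + 1) bytes of
        [] -> Nothing
        rest -> Just rest
    Nothing -> Nothing

main :: IO ()
main = print (stripStrict (replicate 7 0x55 ++ [0xD5, 1, 2]))
```

Compared with the lenient model, the only difference is the bounded search window, which corresponds to the Index 8 counter and the extra Drop state.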

I personally prefer the latter, what do you guys think?

I prefer the former. I don't think the extra circuitry adds a useful enough feature.

Also, in the latter, it seems to me you can replace satSucc SatWrap _counter by just succ _counter. It always starts at 0 and never progresses beyond maxBound.

[edit] I would not oppose the latter if that's what you want anyway [/edit]

How would you sensibly handle a non byte-aligned SFD with a streaming protocol that only allows you to enable full bytes? We cannot say that some bits are invalid. This is not possible with AXI Stream either.

However, we might also commit a version that requires byte alignment now, and create an issue noting this requirement and asking anyone who discovers they have reception issues to please inform us with details about their setup.

I'm not convinced this is a good solution, but personally I'm willing to accept this. I can't speak for other people.

[edit]
To clarify, when I posted this reply, GitHub did not show me the previous message, so I didn't ignore it, I was unaware of it.
[/edit]

I took a look at what our forefathers (liteeth) did. They do two things:

They make the preamble optional (enabled or disabled), interestingly as a package deal with the CRC: https://github.com/enjoy-digital/liteeth/blob/master/liteeth/mac/core.py#L215.

They check the last n bits of the preamble + SFD, where n is the data width (https://github.com/enjoy-digital/liteeth/blob/master/liteeth/mac/preamble.py#L77).
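A byte-granular sketch of the second idea, assuming the standard 7 × 0x55 preamble followed by 0xD5 and a data width of n bytes per beat. The function names are hypothetical and this is not liteeth's code; it only illustrates comparing the final beat against the last n bytes of preamble + SFD.

```haskell
import Data.Word (Word8)

-- | The standard preamble (7 x 0x55) followed by the SFD (0xD5), in byte order.
preambleAndSfd :: [Word8]
preambleAndSfd = replicate 7 0x55 ++ [0xD5]

-- | The beat expected to complete the preamble + SFD for a data width of
-- n bytes (1 <= n <= 8): the last n bytes of the sequence.
expectedLastBeat :: Int -> [Word8]
expectedLastBeat n = drop (length preambleAndSfd - n) preambleAndSfd

-- | Only the beat that ends with the SFD is compared; earlier preamble beats
-- are not inspected, so losing whole beats of preamble goes unnoticed.
isSfdBeat :: Int -> [Word8] -> Bool
isSfdBeat n beat = beat == expectedLastBeat n

main :: IO ()
main = print (map expectedLastBeat [1, 2, 4])
```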

Also, by googling I did find a PHY that could return a non-byte-aligned, shortened preamble + SFD: https://e2e.ti.com/support/interface-group/interface/f/interface-forum/1233993/dp83848c-preamble-sfd-length-is-shortage. Personally I think this should be made a PHY-specific problem. It's trivial to detect an SFD when you know you get n bits every cycle, but hard once you have already adapted it to a byte PacketStream with completely unknown alignment.

Considering the current state, I would personally keep the SFD check byte-aligned for now; we only have an RGMII PHY anyway. Let's add a note that explains the above story, and once the next PHY is actually implemented, we fix the story for that PHY.

Of note: in the past I ran the RGMII PHY of the older Colorlight board in 100 Mbit mode using liteeth without issues. That means the SFD was byte-aligned there. Revision 8.2 of the PCB has a PHY that does not support 100 Mbit.

How would you sensibly handle a non byte-aligned SFD with a streaming protocol that only allows you to enable full bytes?

I didn't know which place in the stack you were talking about. MII is a nibble-wide protocol, and the nibble is somewhat of a common unit in Ethernet. I thought you meant whether the nibbles of the SFD were byte-aligned, i.e., that the first nibble of the SFD is always an even number of cycles after the first cycle in which RX_DV is asserted (0 cycles, i.e., the first nibble is the SFD, or 2, or ...).

It's a pity I misunderstood you. I spent a fair amount of time wading through 802.3 to check your assumption; turns out my assumption was the faulty one.

If I understand Rowan correctly, I agree that it should be byte-aligned before you turn it into a PacketStream. It doesn't seem to make sense to not do this.

Doesn't RGMII strictly define byte alignment? I.e., they either say the lower nibble is on the falling clock edge and the upper nibble is on the rising edge, or they say the exact reverse but still fully specify it? I'm not going to dive into another standard right now...

If I understand Rowan correctly, I agree that it should be byte-aligned before you turn it into a PacketStream. It doesn't seem to make sense to not do this.

Yes this is essentially what I meant. If you wanted a generic component you'd want something like:

    sfdDetector :: Signal dom (Maybe (BitVector n)) -> Signal dom (Maybe (BitVector n))

It eats the input until an SFD is detected and then forwards. This is similar to the original component, except that it does not yet operate on a byte-aligned packet stream. You also don't need to care about backpressure at this stage.
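To make the proposed sfdDetector concrete, here is a pure sketch over a list of cycles, each carrying an optional n-bit word, for the MII case (n = 4). It assumes nibbles arrive least-significant nibble first, so the preamble shows up as a run of 0x5 nibbles and the SFD as 0x5 followed by 0xD. The model is hypothetical, has no backpressure, and resets whenever the input goes idle (Nothing):

```haskell
import Data.List (mapAccumL)
import Data.Word (Word8)

data DetState = Hunting | Forwarding
  deriving (Eq, Show)

-- | One cycle: consume preamble nibbles until the SFD's upper nibble (0xD)
-- is seen, then forward every following nibble until the input goes idle.
detStep :: DetState -> Maybe Word8 -> (DetState, Maybe Word8)
detStep _ Nothing = (Hunting, Nothing)
detStep Hunting (Just nib)
  | nib == 0xD = (Forwarding, Nothing)
  | otherwise = (Hunting, Nothing)
detStep Forwarding (Just nib) = (Forwarding, Just nib)

-- | List-based stand-in for a Signal-level detector.
sfdDetector :: [Maybe Word8] -> [Maybe Word8]
sfdDetector = snd . mapAccumL detStep Hunting

main :: IO ()
main = print (sfdDetector (map Just [5, 5, 5, 0xD, 0xA, 0x3] ++ [Nothing]))
```

Because it works per cycle on n-bit words, the number of preamble nibbles that reach it does not matter, and byte alignment is only established afterwards, from the nibble following the SFD.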

Doesn't RGMII strictly define byte alignment? I.e., they either say the lower nibble is on the falling clock edge and the upper nibble is on the rising edge, or they say the exact reverse but still fully specify it? I'm not going to dive into another standard right now...

Yes RGMII is a nice case. You get a full byte every cycle. I just checked and it specifies:

Multiplexing of data and control information is done by taking advantage of both edges of the reference clocks and sending the lower 4 bits on the rising edge and the upper 4 bits on the falling edge. Control signals can be multiplexed into a single clock cycle using the same technique.

and

This interface can be used to implement the 10/100 Mbps Ethernet Media Independent Interface (MII) by reducing the clock rate to 25MHz for 100Mbps operation and 2.5MHz for 10Mbps. The TXC will always be generated by the MAC and RXC will be generated by the PHY. During packet reception, the RXC may be stretched on either the positive or negative pulse to accommodate the transition from the free running clock to a data-synchronous clock domain. When the speed of the PHY changes, a similar stretching of the positive or negative pulses is allowed. No glitching of the clocks are allowed during speed transitions.
This interface will operate at 10 and 100Mbps speeds exactly the same way it does at Gigabit speed with the exception that the data may be duplicated on the falling edge of the appropriate clock. The MAC will hold TX_CTL low until it has ensured that it is operating at the same speed as the PHY.

So if I interpret it correctly, it will always transmit full bytes, even in 10/100 Mbps mode.
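A small sketch of reassembling a byte from RGMII's two nibbles per clock cycle, following the quoted rule (lower 4 bits on the rising edge, upper 4 bits on the falling edge). The function name is hypothetical, and only the gigabit case is shown; how the duplicated 10/100 data should be handled is not modelled here and would need checking against a specific PHY.

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Word (Word8)

-- | Combine the nibble sampled on the rising edge (bits 3..0) with the
-- nibble sampled on the falling edge (bits 7..4) into one byte.
rgmiiByte :: Word8 -> Word8 -> Word8
rgmiiByte rising falling = ((falling .&. 0x0F) `shiftL` 4) .|. (rising .&. 0x0F)

main :: IO ()
main = print (rgmiiByte 0x5 0xD)
```

Since both halves of a byte belong to the same clock cycle, the byte boundary is fixed by the interface itself, which is why the SFD arrives as a single 0xD5 byte.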

test/Test/Cores/Ethernet.hs

It was only used to be able to test the ARP stack in hardware when we did not have IP yet.

Major documentation was missing for the ICMP module, most importantly the fact that the checksum adjustment breaks for very specific input packets (which are unlikely to occur in practice).
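A later commit refers to RFC 1624 in the ICMP docs, which describes updating the Internet checksum incrementally instead of recomputing it. As a rough sketch of that arithmetic only (not the ICMP module's code, and not a reproduction of the edge case it documents): turning an echo request into an echo reply changes the type byte from 8 to 0, so the first 16-bit word changes from 0x0800 to 0x0000, and equation 3 of RFC 1624, HC' = ~(~HC + ~m + m'), adjusts the checksum accordingly.

```haskell
import Data.Bits (complement, shiftR, (.&.))
import Data.Word (Word16, Word32)

-- | Ones' complement addition of two 16-bit words (end-around carry).
onesAdd :: Word16 -> Word16 -> Word16
onesAdd a b =
  let s = fromIntegral a + fromIntegral b :: Word32
   in fromIntegral ((s .&. 0xFFFF) + (s `shiftR` 16))

-- | Full Internet checksum: complement of the ones' complement sum.
checksum :: [Word16] -> Word16
checksum = complement . foldr onesAdd 0

-- | Incremental update after one word changes from m to m'
-- (RFC 1624, equation 3): HC' = ~(~HC + ~m + m').
updateChecksum :: Word16 -> Word16 -> Word16 -> Word16
updateChecksum hc m m' =
  complement ((complement hc `onesAdd` complement m) `onesAdd` m')

main :: IO ()
main = print (updateChecksum (checksum [0x0800, 0x0001]) 0x0800 0x0000)
```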

Mostly includes improvements in packet generation, and reuse of models and generators. Aside from that, most test files have also been formatted with fourmolu.

src/Clash/Cores/Ethernet/Arp.hs

Now also recognizes packets of which some of the preamble is missing. The SFD is still required to be byte-aligned.

Also removed some superfluous constraints and made the timer of the ARP manager less granular. This caused the manual test to fail, because its domain is too slow. The manual test was not any better than a hardware test anyway, so I removed it. Once support for ReqResp tests arrives from clash-protocols, we can add a proper test.

t-wallet · 2024-09-21T14:20:14Z

ARP is all done and open for further review. The only thing missing is aborting upon receiving backpressure from the ARP receiver. The problem with that is that the receiver currently does not run at full throughput, because that saves resources, so it asserts backpressure even while accepting valid packets. I could generalize abortOnBackpressureC to only abort after seeing n cycles of backpressure in order to fix that.
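A minimal sketch of the proposed generalisation: count consecutive cycles in which data is offered but not accepted, and abort once the count reaches a threshold. The names and the exact abort semantics are hypothetical; abortOnBackpressureC itself is not modelled.

```haskell
import Data.List (mapAccumL)

-- | One cycle. Inputs: whether the sender offers data and whether the
-- receiver is ready. State: consecutive cycles of backpressure so far.
-- Output: True when the transfer should be aborted.
abortStep :: Int -> Int -> (Bool, Bool) -> (Int, Bool)
abortStep threshold stalled (valid, ready)
  | not valid || ready = (0, False) -- no backpressure this cycle: reset
  | stalled + 1 >= threshold = (0, True) -- too many stalled cycles: abort
  | otherwise = (stalled + 1, False)

-- | Run the model over a list of per-cycle (valid, ready) inputs.
abortAfter :: Int -> [(Bool, Bool)] -> [Bool]
abortAfter threshold = snd . mapAccumL (abortStep threshold) 0

main :: IO ()
main = print (abortAfter 3 [(True, False), (True, False), (True, False), (True, True)])
```

With a threshold of 1 this degenerates to aborting on any backpressure; a larger threshold tolerates a receiver that deliberately stalls for a few cycles per transfer.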

Co-authored-by: Rowan Goemans <goemansrowan@gmail.com> See the discussion in #8 (comment) for more details.

See the discussion in #8 (comment) for more details. Co-authored-by: Rowan Goemans <goemansrowan@gmail.com>

t-wallet · 2024-09-30T14:29:17Z

Split IPv4 checksum improvements to #14.

t-wallet and others added 2 commits August 28, 2024 16:00

Add top-level MAC module and improve its documentation

9cf12fd

rowanG077 reviewed Aug 28, 2024


t-wallet added 3 commits August 30, 2024 11:28

Remove ARP stack example

b89f52e


Document ICMP module

669673f


Significantly overhaul the Ethernet tests

a73282a


DigitalBrains1 reviewed Sep 3, 2024

src/Clash/Cores/Ethernet/Arp.hs (outdated)

t-wallet added 9 commits September 9, 2024 11:07

Refer to RFC 1624 in ICMP docs

29d774d

Change the stripping of the preamble

a3c4a6f


Add preamble test to unittests

60cd05b

Simplify fcs validator and strip fcs

270e582

Add detailed MAC TX example

44626b8

Upstream partitionS

3b031f2

Add top level IPv4 module, move InternetChecksum out of IP module

52a3f2d

100% documentation coverage, removed unqualified identifiers

039aa9e

Arp: fix timing issues and improve documentation

69f8944

t-wallet force-pushed the ethernet branch from 236942a to 69f8944 September 20, 2024 11:42

t-wallet added 2 commits September 21, 2024 13:30

ARP documentation improvements

815b19c

EthernetStream: update outdated documentation, improve readability

3acdaff

DigitalBrains1 force-pushed the main branch from f75ceaa to 477189d September 23, 2024 15:08

t-wallet added a commit that referenced this pull request Sep 27, 2024

Make full UDP stack more flexible

9075466


Make full UDP stack example more flexible

2658307


t-wallet force-pushed the ethernet branch from 9075466 to 2658307 September 27, 2024 12:00

t-wallet added 2 commits September 27, 2024 15:01

Udp: add port swapping, improve docs

c70dc3b

Remove unused internet checksum functions: see new PR

4b5be3f

t-wallet mentioned this pull request Sep 30, 2024

Improve IPv4 checksum verification/generation #14

Draft
