
eflatency: Optionally echo the packet in the pong reply and support VLAN tags #238

Open · wants to merge 3 commits into master

Conversation

osresearch

This patch adds the option of having the pong node copy the contents of the ping message into the reply, which adds a little more realism to the eflatency test since it requires the receiver to read the contents of the message, not just receive the notice that a message has arrived.

Additionally, it adds the option of 802.1q VLAN tagging for eflatency tests that traverse switches, making it possible to benchmark those switches as well.

It also cleans up some of the logic by replacing magic sizes with sizeof() on the various ethernet headers.

@osresearch osresearch requested a review from a team as a code owner August 7, 2024 11:16
@jfeather-amd (Contributor) left a comment

Thanks for the contribution, @osresearch!

This review just covers things I found by inspection, and I plan on doing some testing in the coming days, but overall I quite like these changes! I do have one concern regarding performance, which is perhaps a non-issue; I will confer with other members of the team about it. There are also quite a few style nit-picks, so please feel free to disregard those if you want; I can apply them during the merge process if you would rather focus on the code itself.

src/tests/ef_vi/eflatency.c (outdated; resolved)
switch( EF_EVENT_TYPE(evs[i]) ) {
while(vi->i < vi->n_ev)
{
const ef_event * const ev = &vi->evs[vi->i++];
Contributor

This certainly makes it a lot easier to parse what's going on! The previous dance of incrementing vi->i early when returning, rather than continuing to process the remaining events, was quite obtuse.

if (cfg_validate && cfg_payload_len > 0)
{
const uint8_t rx_pattern = rx_vi->rx_pkt[HEADER_SIZE];
if (pattern != rx_pattern)
Contributor

I wonder if, when validating, it's worth checking the whole packet using memcmp() for example, or whether a custom loop would suffice to give more detail about the first octet that differs.

Author

The first version of the patch (as you noted above) only set the first byte... I can add a loop to check the rest of them.

@osresearch (Author)

Thanks for the feedback on the patch. I'll make the style corrections and push an updated version.

@jfeather-amd (Contributor) left a comment

Thanks for addressing my comments so quickly! I'm still looking at some tests for this, and have kicked off a test run to go overnight.

Comment on lines 515 to 526
// sfc driver
vi->rx_pkt = pkt_bufs[EF_EVENT_RX_RQ_ID(*ev)]->dma_buf;
return;
case EF_EVENT_TYPE_RX_REF:
// efct driver
vi->rx_pkt = efct_vi_rxpkt_get(&vi->vi, ev->rx_ref.pkt_id);
Contributor

Ah, good question! Because handle_rx_ref() calls efct_vi_rxpkt_release(), our app's reference to this has been released. The data may still be valid (e.g., if something else still has a reference to this), but I don't believe it's safe to use at this point.

Indeed, checking my own knowledge here against the user guide:

> Once released, the packet identifier and any pointers to the packet data must be considered invalid.

for( i = 0; i < cfg_iter; ++i ) {
for( i = 0; i < cfg_iter; ++i, pattern++ ) {
memset(&tx_pkt[HEADER_SIZE], pattern, cfg_payload_len);
checksum_udp_pkt(tx_pkt);
Contributor

> The memset() and checksum_udp_pkt() calls are outside of the timing loop for the ping process

This looks like it's only half true, although I hadn't noticed the nuance before! We call gettimeofday(&start, NULL); above the loop, and internally (per iteration) call uint64_t start = ci_frc64_get();, so I would expect the "full" measurement to see an increase, but perhaps the per-iteration one won't. I would definitely want to verify this behaviour before accepting it, as numbers changing for whatever reason can be quite a nasty surprise to end users!

const uint8_t rx_pattern = rx_vi->rx_pkt[HEADER_SIZE];
if( pattern != rx_pattern )
fprintf(stderr, "expected pong %02x got %02x\n", pattern, rx_pattern);
}
Contributor

After you pointed out that the memory operations were outside of the timing loop, I spotted that this bit of code isn't. I would be interested to see whether your 150ns time increase changes if this is moved after uint64_t stop = ci_frc64_get();

@jfeather-amd (Contributor)

Hi @osresearch, sorry for the delay in getting back to you on this! I just finished looking into performance testing this patch and found that there seems to be a significant enough regression that I am hesitant to merge this in its current state. I would like to think for a while longer about how to progress this PR, as I do think this would be a nice change to have! Some options to consider are:

  • Making a similar change to a different ef_vi test app
  • Locking this behind a build option
  • Duplicating eflatency to have eflatency_memops (for example) which incorporates this change

Although I haven't thought for long enough to decide which one of these would be most appropriate.

@osresearch (Author)

Thanks for doing the performance testing on the patch, @jfeather-amd . Can you describe where the slowdowns seem to be? In the non-vlan, non-echo, non-validating case (the default), my latency deltas were in the noise on the X2 and X3 cards, so I'm very curious about your methodology so that I can replicate the results for my future testing.

@osresearch (Author) commented Aug 27, 2024

I've re-run tests on the X3 cards with better isolation, pinning the eflatency task to a single CPU; the results show no change in the min, 50%, 95% and 99% numbers, although there is an unexpected increase in the mean of about 50ns. This is caused by the unconditional memset() and checksum_udp_pkt() on the send side, which occur outside of the ci_frc64_get() timing loop and which I had assumed would not affect the timing. Adding if(cfg_validating)... around the packet rewriting removes this effect.

However, this performance regression appears to be an issue with the way mean is computed -- it is the total time for all packets (delta between the two gettimeofday() calls), not the mean of the measured times (rdtsc ticks). I wonder if the mean should be computed as the average of the actual times instead. It is unexpected to me that the first column of results doesn't match the data used for the other columns. I've submitted #240 to compute the mean from the timings array instead of the wall clock time.
