
iperf_udp: implement UDP connect retry mechanism #1144

Open
olichtne wants to merge 2 commits into master

Conversation

olichtne
Contributor

This implements a simple retry mechanism for the iperf_udp_connect function: a fixed-iteration loop around a select() call that resends the connect datagram whenever the wait for a reply times out.

This is mostly a suggestion for a fix; there's probably a better way to implement some parts of it. For example, the number of iterations should probably be either a named constant or a new CLI argument, and the retry mechanism could also be optionally disabled. The timeout value for the select() call could likewise be configurable, since a 1-second wait may not always be a good idea depending on the specific network being measured.

I'm also not 100% sure whether this could have side effects for the test initialization state transition on the slave side, in case the connect datagram is not in fact lost and instead the reply is delayed or lost.

In my internal testing on our servers, this resolves the issues we hit when running multiple iperf instances to simulate the stress of many network applications on the network stack.
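
For reference, a minimal standalone sketch of the idea, assuming an already connect()ed UDP socket and using the 1-second timeout and 5 attempts from the commit message below. The constants and the function name are made up here; this is not the actual patch:

/*
 * Sketch of the retry idea: send the 4-byte "connection init" datagram over
 * a connected UDP socket, wait up to 1 second for a reply with select(), and
 * retry up to 5 times before giving up. MAX_RETRIES, RETRY_TIMEOUT_SEC and
 * udp_connect_with_retry() are invented names, not iperf3 code.
 */
#include <errno.h>
#include <stdint.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>

#define MAX_RETRIES       5
#define RETRY_TIMEOUT_SEC 1

/* Returns 0 once the server's reply arrives, -1 otherwise. */
static int udp_connect_with_retry(int sock)
{
    uint32_t init = 0x36373839;   /* placeholder for the real init value */
    uint32_t reply;
    int attempt;

    for (attempt = 0; attempt < MAX_RETRIES; attempt++) {
        fd_set read_set;
        struct timeval timeout;
        int result;

        /* (re)send the "connection init" datagram */
        if (send(sock, &init, sizeof(init), 0) < 0)
            return -1;

        FD_ZERO(&read_set);
        FD_SET(sock, &read_set);
        timeout.tv_sec  = RETRY_TIMEOUT_SEC;
        timeout.tv_usec = 0;      /* must be initialized, see the review below */

        result = select(sock + 1, &read_set, NULL, NULL, &timeout);
        if (result < 0 && errno != EINTR)
            return -1;            /* select() itself failed */
        if (result > 0 && FD_ISSET(sock, &read_set) &&
            recv(sock, &reply, sizeof(reply), 0) >= 0)
            return 0;             /* got the server's reply */

        /* timed out (or interrupted): loop and resend the init datagram */
    }
    return -1;                    /* no reply after MAX_RETRIES attempts */
}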

Sometimes the single UDP datagram sent as a form of "connection init"
gets lost on the way. A common example is testing over wifi, or more
specifically in our case testing many parallel udp connections.

When this happens the recv() call on the client times out after 30
seconds and the test is unsuccessful.

This commit implements a very basic retry mechanism for this. Using
select() calls with a timeout of 1 second we attempt to send the
"connection init" datagram up to 5 times and expect a reply from the
server.

If this fails, the test fails earlier, reporting issues with initializing
the connection. Otherwise the test proceeds as before.

Signed-off-by: Ondrej Lichtner <olichtne@redhat.com>
@jtluka
Contributor

jtluka commented Aug 20, 2021

@olichtne, could you please update this pull request with a fix for the uninitialized timeout.tv_usec:

@@ -575,6 +575,7 @@ iperf_udp_connect(struct iperf_test *test)
         struct timeval timeout;
 
         timeout.tv_sec = 1;
+        timeout.tv_usec = 0;
 
         memcpy(&read_set, &test->read_set, sizeof(fd_set));
         result = select(test->max_fd + 1, &read_set, NULL, NULL, &timeout);

Without this patch I get the following error on aarch64 (select() rejects a timeout whose uninitialized tv_usec happens to hold an out-of-range value):

iperf3: error - select failed: Invalid argument

Suggested-by: Jan Tluka <jtluka@redhat.com>
Signed-off-by: Ondrej Lichtner <olichtne@redhat.com>
@talatb left a comment

We had the same issue with multiple processes; we tried this PR and it works with a small modification that is addressed in the comments.

@olichtne
Contributor Author

olichtne commented Feb 1, 2022

that sounds a little suspicious, i'm not sure how the +i would help. can you describe your situation in a bit more detail, or how the +i is supposed to help?

@secahtah

This may not have been merged yet, but thank you for posting it, it saved my bacon today.

I have a suggestion: rather than hardcoding the number of retry attempts to five, set a default value and add a command-line flag that can override it.

Thanks!

@secahtah

On second thought, there is more to this. While testing with this today I discovered that the iperf3 UDP client just silently finishes the test even though there are 0 packets sent. Maybe more is required.

@olichtne
Contributor Author

olichtne commented Sep 1, 2022

yeah, for our internal use, we keep using the fix from this PR that we have in a separate branch.

Since this didn't get more of a response from the maintainers of this project, I didn't really look much deeper into this yet, as I don't want to mess up any internals accidentally...

I do agree though on the cli argument being more comfortable for a real solution...

Not sure what you mean by your case of the UDP client silently finishing with 0 packets sent; I think that may need more information/reproducer data to understand, and it may be a completely new issue.

@davidBar-On
Contributor

I'm also not 100% sure whether this could have side effects for the test initialization state transition on the slave side, in case the connect datagram is not in fact lost and instead the reply is delayed or lost.

I like the approach suggested by this PR, but I think that the above is the main issue with it, and that the stream creation protocol should be changed to make this suggestion feasible for general use.

If I understand the iperf_udp_accept() code correctly, when the server receives the 4-byte "datagram to the UDP stream to let the server know we're here", it uses the current control socket for the UDP stream and creates a new control socket. If the 4-byte datagram is received by the server but the response is lost, then two of the possible scenarios are (a rough sketch of this flow follows the scenarios below):

  1. If the test is using only one stream (the default -P 1), then the resent 4-byte datagram will be received by the server as a test datagram, and the server will not send a reply to it, so the test will not work.
  2. If more than one stream is used, e.g. -P 2, then the retry datagram will be received by the server as the initiation of the second stream, but its response will be received by the client as the ack for the first stream. Trying to initiate the second stream will then fail, since from the server's point of view both streams were already created.
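
To make the two scenarios above easier to follow, here is a rough sketch of the accept flow as described in this comment; the struct and function names are invented and this is not the actual iperf_udp_accept() code:

/*
 * Rough sketch of the accept flow described above; all names are invented.
 */
#include <stdint.h>
#include <sys/socket.h>

struct udp_test {
    int accept_sock;   /* socket the next 4-byte init datagram arrives on */
};

/* Stub standing in for "creates a new control socket" in the text above. */
static int new_accept_socket(void)
{
    return -1;
}

static int udp_accept_sketch(struct udp_test *t)
{
    uint32_t init;
    int s = t->accept_sock;

    /* The 4-byte "we're here" datagram arrives on the current socket ... */
    if (recv(s, &init, sizeof(init), 0) < 0)
        return -1;

    /* ... the server replies, that socket becomes the stream's data socket,
     * and a new socket is created to accept the next stream. */
    if (send(s, &init, sizeof(init), 0) < 0)
        return -1;
    t->accept_sock = new_accept_socket();

    /*
     * If the reply just sent is lost and the client retransmits its init
     * datagram, the retransmission is read either from the data socket
     * (counted as test traffic, scenario 1) or from the new accept socket
     * (taken as the next stream's init, scenario 2, in which case its reply
     * is misread by the client as the ack for the first stream).
     */
    return s;
}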

The suggested approach probably works well when only the upload interface (client to server) is loaded, so the chance of losing the server's reply is low.

Note that a less elegant solution to the issue may be to run the client in a shell-script loop.

@olichtne
Contributor Author

olichtne commented Sep 5, 2022

right, reading the explanation basically confirms that i don't entirely understand how the initiation protocol works, which is why i didn't push for this harder.

But now, after reading this explanation, I am kind of questioning whether this initiation protocol should even be in place this way for UDP.

Thinking about it from an outside view, what I would expect a UDP perf measurement tool to do is:

  1. the client creates a control connection and communicates the opening of individual "streams" on different udp ports to the server
  2. the server responds, confirming or rejecting this, but still via the control connection
  3. on success the client simply starts sending the measurement PDUs to the server via the individual udp sockets (roughly sketched right after this list)
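
Purely to illustrate that idea, a sketch of what the client side could look like; the wire format and all names are invented, and this is not how iperf3 currently works:

/*
 * Illustration of the alternative flow proposed above: stream setup is
 * negotiated entirely over the TCP control connection, and the UDP sockets
 * carry only measurement traffic, so there is no UDP-level handshake to
 * lose or retry. A real implementation would serialize the request properly
 * instead of sending the raw struct.
 */
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>

struct stream_request {
    uint8_t  opcode;     /* hypothetical OPEN_UDP_STREAM message */
    uint16_t udp_port;   /* UDP port this stream will use */
};

/* Open nstreams UDP streams by announcing them on the control connection. */
static int open_udp_streams(int ctrl_sock, const uint16_t *ports, int nstreams)
{
    int i;

    for (i = 0; i < nstreams; i++) {
        struct stream_request req = { 1, ports[i] };
        uint8_t ack;

        /* 1. announce the stream on the (reliable) control connection */
        if (send(ctrl_sock, &req, sizeof(req), 0) != (ssize_t)sizeof(req))
            return -1;

        /* 2. the server confirms or rejects, also on the control connection */
        if (recv(ctrl_sock, &ack, sizeof(ack), 0) != (ssize_t)sizeof(ack) || ack != 0)
            return -1;
    }
    /*
     * 3. on success the caller simply starts sending measurement PDUs on the
     *    per-stream UDP sockets; if they never reach the server, the result
     *    is just "receiver throughput: 0", reported over the control channel.
     */
    return 0;
}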

The point of this is that the UDP "connections" are not actual connections, and they're not established as part of the UDP protocol. You directly use them to measure performance from the start.

In a situation where the udp packets don't arrive at the server, you simply report:

  • generator throughput: X mbps
  • receiver throughput: 0 mbps

And close the connections on both sides via the control connection.

In my eyes this situation is completely valid. It would also mean that, since there is no "connection" creation, nothing has to be retried.

@davidBar-On
Contributor

I have submitted PR #1402, which should be an enhancement of this PR. Although my system does not have the issues solved by this PR, I hope it is sufficiently tested. Of course, it will help if the new PR is tested on systems that do have these issues.
