ofp_send / ofp_sendto poor performance #252
Comments
Hi,
But the main use case for OFP is packet processing, not packet generation: receive a packet, process it, send it. If a pure packet generator is your use case, then maybe odp_generator from odp_dpdk will be more useful.
Worker threads can receive packets (they have access to the interface RX queues) and send packets (access to the TX queues). When you run ofp_send/ofp_sendto in a loop in a control thread, the traffic is not going through the workers: it is just one thread (yours) using one TX queue.
Usually, when planning to use OFP, you have to start by filling in a list:
So, tell us more... Merci,
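For reference, this is roughly how OFP's bundled example apps start the worker threads that own the RX/TX queues, so that the workers rather than a single control thread drive the traffic. A minimal sketch only: it assumes the ODP linux-helper thread API (odph_odpthreads_create) and OFP's default_event_dispatcher()/ofp_eth_vlan_processing() pair as used in the examples; header paths and names vary between ODP/OFP versions.

```c
#include <string.h>
#include <odp_api.h>
#include <odp/helper/odph_api.h>   /* older ODP releases: odp/helper/linux.h */
#include "ofp.h"

static void start_workers(odp_instance_t instance, int num_workers)
{
	odph_odpthread_t thread_tbl[32];
	odph_odpthread_params_t thr_params;
	odp_cpumask_t cpumask;

	/* Pick worker cores (e.g. 2 and 3), leaving low cores for control. */
	odp_cpumask_default_worker(&cpumask, num_workers);

	memset(&thr_params, 0, sizeof(thr_params));
	thr_params.start    = default_event_dispatcher;  /* per-core RX/event loop */
	thr_params.arg      = ofp_eth_vlan_processing;   /* OFP input processing   */
	thr_params.thr_type = ODP_THREAD_WORKER;
	thr_params.instance = instance;

	/* One dispatcher per worker core; each polls its own queues. */
	odph_odpthreads_create(thread_tbl, &cpumask, &thr_params);
}
```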
Hi Bogdan, thank you for the detailed response.

I was able to improve performance 5x by switching the NIC (a Mellanox card) away from the igb_uio driver, since DPDK supports it directly via ibverbs and the mlx poll mode driver. However, performance is still well under what we were hoping to achieve.

I made sure the send thread was not sharing cores with the worker threads; they were bound to different cores. Initially the control thread was on CPU 0 (I think the default core for control threads), but I moved it to CPU 1 since a lot of Linux processes use CPU 0 for handling interrupts and other things. That turned out to be negligible: it didn't matter whether I ran on CPU 1 or 0. The worker threads I put on cores 2 and 3, but it doesn't seem to matter which cores they run on (the test system has 16 cores). Since I was only transmitting, I wasn't sure how useful more than one worker thread would be.

I think I understand the purpose of the hook, but I was hoping to avoid it, as it seems to be a sort of global hook for the port. I could have several UDP sockets open, sending to different destinations, but all flows would go through the same hook, so I wouldn't know which packet belonged to which flow without doing some packet parsing. It seems like there is added complexity there.

Anyway, if you have some benchmarks, especially with the socket API, it would be very useful to know what is possible or what one might expect to achieve.
Hi, You can try this:
Hooks are points where you can access a packet as it is processed by OFP: you can inspect the packet, or take ownership of it and do whatever you want with it. There are many more optimizations possible (e.g. using multiple TX queues, one per active core, configured without the multithread-safe option), but the points above should already improve performance.

Btw, if the DUT and the packet destination are in the same network, you can add a direct route:

route add 192.168.200.20/32 gw 192.168.200.20 dev fp1

Merci,
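To make the hook idea concrete (and to address the flow-identification concern above), here is a rough sketch of a hook callback that peeks at the UDP header to tell flows apart before letting OFP continue. The callback signature, the OFP_PKT_CONTINUE/OFP_PKT_PROCESSED return codes and the struct ofp_udphdr field names are taken from OFP's BSD-derived headers and may differ between versions; registering the callback (via the pkt_hook[] table in the global init parameters) is omitted here.

```c
#include <odp_api.h>
#include "ofp.h"

static enum ofp_return_code my_udp_hook(odp_packet_t pkt, void *arg)
{
	uint32_t l4_len = 0;
	struct ofp_udphdr *udp;
	uint16_t dport;

	(void)arg;

	/* Only look at UDP packets; let everything else pass through. */
	if (!odp_packet_has_udp(pkt))
		return OFP_PKT_CONTINUE;

	udp = (struct ofp_udphdr *)odp_packet_l4_ptr(pkt, &l4_len);
	if (udp == NULL || l4_len < sizeof(*udp))
		return OFP_PKT_CONTINUE;

	/* Distinguish flows by destination port (per-flow accounting,
	 * steering, etc. would go here). */
	dport = odp_be_to_cpu_16(udp->uh_dport);
	(void)dport;

	/* Return OFP_PKT_PROCESSED instead if the hook takes ownership
	 * of the packet and handles it itself. */
	return OFP_PKT_CONTINUE;
}
```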
So I made a setup as described above.

With OFP_PKT_TX_BURST_SIZE == 1 I am getting:

With OFP_PKT_TX_BURST_SIZE == 16 I am getting:

And this is with the regular socket API (ofp_sendto())... and without multiple TX queues, etc.
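The gap between the two burst settings is mostly per-call overhead: with a larger burst, the pktio/driver cost is paid once per batch rather than once per packet, which is effectively what raising OFP_PKT_TX_BURST_SIZE does inside OFP's TX path. Illustration only (not OFP internals), using the standard ODP burst-TX call odp_pktout_send(); the pktout queue and packet table are assumed to be set up elsewhere.

```c
#include <odp_api.h>

/* Transmit up to `num` pre-built packets in as few driver calls as possible. */
static int send_burst(odp_pktout_queue_t pktout, odp_packet_t pkts[], int num)
{
	int sent = 0;

	while (sent < num) {
		/* One call can enqueue the whole remaining batch. */
		int n = odp_pktout_send(pktout, &pkts[sent], num - sent);

		if (n < 0) {  /* transmit error: drop what is left */
			odp_packet_free_multi(&pkts[sent], num - sent);
			return sent;
		}
		sent += n;
	}
	return sent;
}
```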
Hi, my first idea would be for you to test the udpecho and udp_fwd_socket examples, to check whether the numbers are still low. We have tested these before and saw better performance than what you reported. You can also set OFP_PKT_TX_BURST_SIZE to a higher value, such as 16, in case of line-rate traffic, and see if the numbers improve. BR, /Iulia
With OFP sitting on top of odp_dpdk, the performance of ofp_send/ofp_sendto is pretty poor (UDP). In a while loop running nothing but ofp_send, performance caps out at about 110 Kpps.
This while loop runs in its own thread; it was not spawned with the ODP thread API, but it did run the ODP/OFP local thread init so it could use the OFP fastpath APIs. The ODP/OFP setup has two dispatch threads running on their own cores.
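For clarity, the setup described is roughly the following. This is a simplified sketch, not the reporter's actual code: it assumes OFP's BSD-style socket API (ofp_socket/ofp_sendto), the struct ofp_sockaddr_in layout from OFP's headers, and a hypothetical destination port; error handling is minimal.

```c
#include <string.h>
#include <odp_api.h>
#include "ofp.h"

static void send_loop(odp_instance_t instance)
{
	struct ofp_sockaddr_in dst;
	char payload[64] = { 0 };
	int fd;

	/* Per-thread init so this externally spawned thread may call OFP APIs. */
	if (odp_init_local(instance, ODP_THREAD_CONTROL) || ofp_init_local())
		return;

	fd = ofp_socket(OFP_AF_INET, OFP_SOCK_DGRAM, OFP_IPPROTO_UDP);
	if (fd < 0)
		return;

	memset(&dst, 0, sizeof(dst));
	dst.sin_len = sizeof(dst);                            /* BSD-style field */
	dst.sin_family = OFP_AF_INET;
	dst.sin_port = odp_cpu_to_be_16(5001);                /* example port */
	dst.sin_addr.s_addr = odp_cpu_to_be_32(0xc0a8c814);   /* 192.168.200.20 */

	/* The tight loop that caps out around 110 Kpps in the report above. */
	for (;;)
		ofp_sendto(fd, payload, sizeof(payload), 0,
			   (struct ofp_sockaddr *)&dst, sizeof(dst));
}
```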
Is the ofp_send/ofp_sendto family of APIs not supposed to be part of the fast path, i.e. is it on the slow path? Are the pktio interfaces the only ones meant to be fast? Just curious what I might be doing wrong.
Using vanilla DPDK on the same NIC, 4-5 Mpps is achievable with little effort or tuning.
Why not use plain DPDK? I was hoping to make use of the OFP networking stack's capabilities: I'd rather not have to populate layer 2/3/4 headers and perform ARP resolution myself, etc.