[202405][Infra][add/remove topo] Improve vm_topology performance (#16230) #16288

Merged
merged 1 commit into from
Jan 6, 2025

@lolyu lolyu commented Jan 2, 2025

Description of PR

Summary:
Fixes # (issue)

Type of change

  • Bug fix
  • Testbed and Framework(new/improvement)
  • Test case(new/improvement)

Back port request

  • 202012
  • 202205
  • 202305
  • 202311
  • 202405

Approach

What is the motivation for this PR?

Cherry-pick #16230 into 202405

Signed-off-by: Longxiang Lyu <lolv@microsoft.com>

How did you do it?

How did you verify/test it?

Any platform specific information?

Supported testbed topology if it's a new test case?

Documentation

@mssonicbld
Collaborator

/azp run


Azure Pipelines successfully started running 1 pipeline(s).

…#16230)

What is the motivation for this PR?
vm_topology builds up the testbed connections (veth links, ovs bridges, etc) on the test server by running Linux commands, which involves a lot of waiting for I/O operations.

Run time of the vm_topology script for restart-ptf:
real    18m50.615s
user    0m0.009s
sys     0m0.099s

Because the work is I/O bound, the vm_topology run time can be greatly reduced by using a thread pool to parallelize the I/O operations.
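
A minimal sketch of why a thread pool helps with I/O-bound work, not code from this PR. `time.sleep` stands in for a Linux command that mostly waits; `fake_io_command`, the task count, and the pool size are all hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io_command(port_index):
    """Hypothetical stand-in for one I/O-bound Linux command."""
    time.sleep(0.05)  # the thread only waits here, so other threads can run
    return port_index


def run_serial(count):
    """Run the commands one after another."""
    return [fake_io_command(i) for i in range(count)]


def run_pooled(count, max_workers):
    """Run the commands on a thread pool; map() keeps the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fake_io_command, range(count)))


def timed(func, *args):
    """Return (result, elapsed seconds) for func(*args)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    _, serial_time = timed(run_serial, 20)
    _, pooled_time = timed(run_pooled, 20, 10)
    print(f"serial: {serial_time:.2f}s, pooled: {pooled_time:.2f}s")
```

Since the waits overlap, the pooled run should take roughly the serial time divided by the number of workers, until something other than waiting becomes the bottleneck.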

How did you do it?
Introduce a thread pool to vm_topology to run the time-consuming functions in parallel.
* restart-ptf on dualtor-120 vm_topology profile statistics:

Top three functions by total run time:

| function name | total run time |
| --- | --- |
| add_host_ports | 1040s |
| bind_fp_ports | 96.3s |
| init | 16.7s |

* remove-topo on dualtor-120 vm_topology profile statistics:

Top three functions by total run time:

| function name | total run time |
| --- | --- |
| remove_host_ports | 165s |
| unbind_fp_ports | 40.6s |
| remove_injected_fp_ports_from_docker | 3.3s |

Based on the statistics above, use the thread pool to run the following functions, which take most of the time, in parallel:

* add_host_ports
* remove_host_ports
* bind_fp_ports
* unbind_fp_ports

Two new classes are introduced to support this feature:

* class VMTopologyWorker: a worker class that runs work in either single thread mode or thread pool mode.
* class ThreadBufferHandler: a logging handler that buffers the logs from each task submitted to the VMTopologyWorker and flushes them when the task ends. This ensures vm_topology logs are grouped by task, so logs from different tasks are not mixed together.
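
The two ideas above can be sketched as follows. This is an illustration under assumptions, not the classes from this PR: `SketchWorker`, `BufferingHandler`, `ListHandler`, `add_port`, and `demo` are hypothetical names, and the real VMTopologyWorker and ThreadBufferHandler may be structured differently.

```python
import logging
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class BufferingHandler(logging.Handler):
    """Hypothetical handler: buffers records per thread until flush_task()."""

    def __init__(self, target):
        super().__init__()
        self.target = target                  # handler that finally gets the records
        self.buffers = {}                     # thread id -> buffered records
        self.buffers_lock = threading.Lock()

    def emit(self, record):
        with self.buffers_lock:
            self.buffers.setdefault(threading.get_ident(), []).append(record)

    def flush_task(self):
        """Pass the calling thread's records to the target as one group."""
        with self.buffers_lock:               # held so groups do not interleave
            for record in self.buffers.pop(threading.get_ident(), []):
                self.target.handle(record)


class SketchWorker:
    """Hypothetical worker: runs tasks inline or on a thread pool."""

    def __init__(self, log_handler, max_workers=0):
        self.log_handler = log_handler
        self.pool = ThreadPoolExecutor(max_workers) if max_workers > 0 else None
        self.futures = []

    def _run(self, func, *args):
        try:
            return func(*args)
        finally:
            self.log_handler.flush_task()     # flush when the task ends

    def submit(self, func, *args):
        if self.pool is None:
            self._run(func, *args)            # single thread mode
        else:
            self.futures.append(self.pool.submit(self._run, func, *args))

    def shutdown(self):
        """Wait for all tasks, re-raising the first failure, then stop the pool."""
        for future in self.futures:
            future.result()
        self.futures = []
        if self.pool is not None:
            self.pool.shutdown()


class ListHandler(logging.Handler):
    """Collects messages in arrival order so the grouping can be inspected."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


def demo(max_workers):
    sink = ListHandler()
    handler = BufferingHandler(sink)
    logger = logging.getLogger("sketch.%d" % max_workers)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.addHandler(handler)

    def add_port(name):                       # hypothetical two-step task
        logger.info("%s: create link", name)
        time.sleep(0.01)
        logger.info("%s: attach to bridge", name)

    worker = SketchWorker(handler, max_workers)
    for i in range(6):
        worker.submit(add_port, "port%d" % i)
    worker.shutdown()
    logger.removeHandler(handler)
    return sink.messages


if __name__ == "__main__":
    print("\n".join(demo(max_workers=3)))
```

In this sketch the buffer is keyed by thread id, which works because a pool thread runs one task at a time. The order of the groups can vary between runs, but the two lines of each task stay adjacent.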

How did you verify/test it?

This PR was tested on a dualtor-120 testbed, with a thread pool of 13 worker threads.

| operation | vm_topology run time without this PR | vm_topology run time with this PR |
| --- | --- | --- |
| remove-topo | 3m19.786s | 1m18.430s |
| restart-ptf | 18m50.615s | 3m58.963s |

* restart-ptf vm_topology profile statistics with this PR:

| function name | total run time without this PR | total run time with this PR |
| --- | --- | --- |
| add_host_ports | 1040s | 169s |
| bind_fp_ports | 96.3s | 39.3s |

* remove-topo vm_topology profile statistics with this PR:

| function name | total run time without this PR | total run time with this PR |
| --- | --- | --- |
| remove_host_ports | 165s | 68.8s |
| unbind_fp_ports | 40.6s | 8.4s |
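
The speedup implied by the measurements above can be worked out with a short script. This is only arithmetic on the reported numbers; `to_seconds` and `speedups` are hypothetical helpers, and the unitless 8.4 for unbind_fp_ports is assumed to be in seconds.

```python
import re


def to_seconds(text):
    """Convert strings like '18m50.615s', '96.3s' or '8.4' to seconds."""
    match = re.fullmatch(r"(?:(\d+)m)?(\d+(?:\.\d+)?)s?", text)
    if match is None:
        raise ValueError("unrecognized duration: %r" % text)
    minutes = int(match.group(1) or 0)
    return minutes * 60 + float(match.group(2))


# (without this PR, with this PR), copied from the tables above
MEASUREMENTS = {
    "remove-topo (wall clock)": ("3m19.786s", "1m18.430s"),
    "restart-ptf (wall clock)": ("18m50.615s", "3m58.963s"),
    "add_host_ports": ("1040s", "169s"),
    "bind_fp_ports": ("96.3s", "39.3s"),
    "remove_host_ports": ("165s", "68.8s"),
    "unbind_fp_ports": ("40.6s", "8.4"),  # unit assumed to be seconds
}


def speedups(measurements):
    """Map each entry to before/after as a speedup factor."""
    return {
        name: to_seconds(before) / to_seconds(after)
        for name, (before, after) in measurements.items()
    }


if __name__ == "__main__":
    for name, ratio in speedups(MEASUREMENTS).items():
        print(f"{name}: {ratio:.1f}x")
```

By this calculation restart-ptf finishes about 4.7 times faster and remove-topo about 2.5 times faster, both well below the 13x that the worker count alone would allow.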

Signed-off-by: Longxiang Lyu <lolv@microsoft.com>
@lolyu lolyu force-pushed the 202405_improve_vm_topology branch from bc3eba8 to 2fc4487 Compare January 2, 2025 03:38
@mssonicbld
Collaborator

/azp run


Azure Pipelines successfully started running 1 pipeline(s).

@lolyu lolyu changed the title from "[Infra][add/remove topo] Improve vm_topology performance (#16230)" to "[202405][Infra][add/remove topo] Improve vm_topology performance (#16230)" Jan 2, 2025
@lolyu lolyu requested a review from wangxin January 2, 2025 03:40
@wangxin wangxin merged commit 2bc8e20 into sonic-net:202405 Jan 6, 2025
13 checks passed