Adding a fixture to set scheduler to slower speeds and revert it back. #15718

Merged: 10 commits, Dec 23, 2024

Conversation

rraghav-cisco
Contributor

Description of PR

Summary:
Fixes the flakiness of the DWRR test case. The PR adds a new fixture that slows down the scheduler without changing the underlying algorithm, which allows the DWRR test to pass consistently.
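
For readers unfamiliar with the set-then-revert pattern such a fixture relies on, here is a minimal, generic sketch. It is not the code from this PR: `FakeScheduler` and `slowed_scheduler` are hypothetical names, and the scheduler is modeled as a plain object holding a rate in bits per second. The same body could be wrapped in a pytest yield-fixture.

```python
from contextlib import contextmanager


class FakeScheduler:
    """Hypothetical stand-in for a port scheduler; only stores a rate in bits/s."""

    def __init__(self, rate_bps):
        self.rate_bps = rate_bps


@contextmanager
def slowed_scheduler(scheduler, slow_rate_bps):
    """Lower the scheduler rate for the duration of a block, then restore it."""
    original = scheduler.rate_bps
    scheduler.rate_bps = slow_rate_bps
    try:
        yield scheduler
    finally:
        # Runs even if the body raises, so the original rate is always restored.
        scheduler.rate_bps = original


sched = FakeScheduler(rate_bps=100 * 10**9)
with slowed_scheduler(sched, slow_rate_bps=5 * 10**9):
    rate_during_test = sched.rate_bps
rate_after_test = sched.rate_bps
```

The `try`/`finally` is the important part: a failing test body should not leave the port at the slow rate for later tests.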

Type of change

  • Bug fix
  • Testbed and Framework(new/improvement)
  • Test case(new/improvement)

Back port request

  • 202012
  • 202205
  • 202305
  • 202311
  • 202405

Approach

What is the motivation for this PR?

How did you do it?

How did you verify/test it?

Any platform specific information?

Supported testbed topology if it's a new test case?

Documentation

@rraghav-cisco
Contributor Author

Result from my run:

=========================================================================================================================== PASSES ===========================================================================================================================
______________________________________________________________________________________________________ TestQosSai.testQosSaiDwrr[single_dut_multi_asic] ______________________________________________________________________________________________________
---------------------------------------------------- generated xml file: /run_logs/dwrr/2024-11-24-04-39-38/1/L-400g-LL-masic-gb/qos/test_qos_sai.py::TestQosSai::testQosSaiDwrr_2024-11-24-04-39-38.xml -----------------------------------------------------
INFO:root:Can not get Allure report URL. Please check logs
------------------------------------------------------------------------------------------------------------------- live log sessionfinish -------------------------------------------------------------------------------------------------------------------
04:56:32 __init__.pytest_terminal_summary         L0067 INFO   | Can not get Allure report URL. Please check logs
================================================================================================================== short test summary info ===================================================================================================================
PASSED qos/test_qos_sai.py::TestQosSai::testQosSaiDwrr[single_dut_multi_asic]
========================================================================================================= 1 passed, 1 warning in 1012.38s (0:16:52) ==========================================================================================================
sonic@202405-qos-sonic-mgmt-prod:/data/tests$

@rraghav-cisco
Contributor Author

Finally found a way to repeat tests:

====================================================================================================================== warnings summary ======================================================================================================================
../../usr/local/lib/python3.8/dist-packages/paramiko/transport.py:236
  /usr/local/lib/python3.8/dist-packages/paramiko/transport.py:236: CryptographyDeprecationWarning: Blowfish has been deprecated
    "class": algorithms.Blowfish,

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================================================================================================================== PASSES ===========================================================================================================================
___________________________________________________________________________________________________ TestQosSai.testQosSaiDwrr[single_dut_multi_asic-1-10] ____________________________________________________________________________________________________
___________________________________________________________________________________________________ TestQosSai.testQosSaiDwrr[single_dut_multi_asic-2-10] ____________________________________________________________________________________________________
___________________________________________________________________________________________________ TestQosSai.testQosSaiDwrr[single_dut_multi_asic-3-10] ____________________________________________________________________________________________________
___________________________________________________________________________________________________ TestQosSai.testQosSaiDwrr[single_dut_multi_asic-4-10] ____________________________________________________________________________________________________
___________________________________________________________________________________________________ TestQosSai.testQosSaiDwrr[single_dut_multi_asic-5-10] ____________________________________________________________________________________________________
___________________________________________________________________________________________________ TestQosSai.testQosSaiDwrr[single_dut_multi_asic-6-10] ____________________________________________________________________________________________________
___________________________________________________________________________________________________ TestQosSai.testQosSaiDwrr[single_dut_multi_asic-7-10] ____________________________________________________________________________________________________
___________________________________________________________________________________________________ TestQosSai.testQosSaiDwrr[single_dut_multi_asic-8-10] ____________________________________________________________________________________________________
___________________________________________________________________________________________________ TestQosSai.testQosSaiDwrr[single_dut_multi_asic-9-10] ____________________________________________________________________________________________________
___________________________________________________________________________________________________ TestQosSai.testQosSaiDwrr[single_dut_multi_asic-10-10] ___________________________________________________________________________________________________
---------------------------------------------------- generated xml file: /run_logs/dwrr/2024-11-24-19-05-56/1/L-400g-LL-masic-gb/qos/test_qos_sai.py::TestQosSai::testQosSaiDwrr_2024-11-24-19-05-56.xml -----------------------------------------------------
INFO:root:Can not get Allure report URL. Please check logs
------------------------------------------------------------------------------------------------------------------- live log sessionfinish -------------------------------------------------------------------------------------------------------------------
19:35:30 __init__.pytest_terminal_summary         L0067 INFO   | Can not get Allure report URL. Please check logs
================================================================================================================== short test summary info ===================================================================================================================
PASSED qos/test_qos_sai.py::TestQosSai::testQosSaiDwrr[single_dut_multi_asic-1-10]
PASSED qos/test_qos_sai.py::TestQosSai::testQosSaiDwrr[single_dut_multi_asic-2-10]
PASSED qos/test_qos_sai.py::TestQosSai::testQosSaiDwrr[single_dut_multi_asic-3-10]
PASSED qos/test_qos_sai.py::TestQosSai::testQosSaiDwrr[single_dut_multi_asic-4-10]
PASSED qos/test_qos_sai.py::TestQosSai::testQosSaiDwrr[single_dut_multi_asic-5-10]
PASSED qos/test_qos_sai.py::TestQosSai::testQosSaiDwrr[single_dut_multi_asic-6-10]
PASSED qos/test_qos_sai.py::TestQosSai::testQosSaiDwrr[single_dut_multi_asic-7-10]
PASSED qos/test_qos_sai.py::TestQosSai::testQosSaiDwrr[single_dut_multi_asic-8-10]
PASSED qos/test_qos_sai.py::TestQosSai::testQosSaiDwrr[single_dut_multi_asic-9-10]
PASSED qos/test_qos_sai.py::TestQosSai::testQosSaiDwrr[single_dut_multi_asic-10-10]
========================================================================================================= 10 passed, 1 warning in 1773.04s (0:29:33) =========================================================================================================
sonic@202405-qos-sonic-mgmt-prod:/data/tests$ 

@rraghav-cisco
Contributor Author

rraghav-cisco commented Nov 26, 2024

100G/S - 100G/S, mAsic:

INFO:root:Can not get Allure report URL. Please check logs
------------------------------------------------------------------------------------------------------------------- live log sessionfinish -------------------------------------------------------------------------------------------------------------------
01:56:02 __init__.pytest_terminal_summary         L0067 INFO   | Can not get Allure report URL. Please check logs
================================================================================================================== short test summary info ===================================================================================================================
PASSED qos/test_qos_sai.py::TestQosSai::testQosSaiDwrr[single_dut_multi_asic-1-10]
PASSED qos/test_qos_sai.py::TestQosSai::testQosSaiDwrrWeightChange[single_dut_multi_asic-1-10]
PASSED qos/test_qos_sai.py::TestQosSai::testQosSaiDwrr[single_dut_multi_asic-2-10]
PASSED qos/test_qos_sai.py::TestQosSai::testQosSaiDwrrWeightChange[single_dut_multi_asic-2-10]
PASSED qos/test_qos_sai.py::TestQosSai::testQosSaiDwrr[single_dut_multi_asic-3-10]
PASSED qos/test_qos_sai.py::TestQosSai::testQosSaiDwrrWeightChange[single_dut_multi_asic-3-10]
PASSED qos/test_qos_sai.py::TestQosSai::testQosSaiDwrr[single_dut_multi_asic-4-10]
PASSED qos/test_qos_sai.py::TestQosSai::testQosSaiDwrrWeightChange[single_dut_multi_asic-4-10]
PASSED qos/test_qos_sai.py::TestQosSai::testQosSaiDwrr[single_dut_multi_asic-5-10]
PASSED qos/test_qos_sai.py::TestQosSai::testQosSaiDwrrWeightChange[single_dut_multi_asic-5-10]
PASSED qos/test_qos_sai.py::TestQosSai::testQosSaiDwrr[single_dut_multi_asic-6-10]
PASSED qos/test_qos_sai.py::TestQosSai::testQosSaiDwrrWeightChange[single_dut_multi_asic-6-10]
PASSED qos/test_qos_sai.py::TestQosSai::testQosSaiDwrr[single_dut_multi_asic-7-10]
PASSED qos/test_qos_sai.py::TestQosSai::testQosSaiDwrrWeightChange[single_dut_multi_asic-7-10]
PASSED qos/test_qos_sai.py::TestQosSai::testQosSaiDwrr[single_dut_multi_asic-8-10]
PASSED qos/test_qos_sai.py::TestQosSai::testQosSaiDwrrWeightChange[single_dut_multi_asic-8-10]
PASSED qos/test_qos_sai.py::TestQosSai::testQosSaiDwrr[single_dut_multi_asic-9-10]
PASSED qos/test_qos_sai.py::TestQosSai::testQosSaiDwrrWeightChange[single_dut_multi_asic-9-10]
PASSED qos/test_qos_sai.py::TestQosSai::testQosSaiDwrr[single_dut_multi_asic-10-10]
PASSED qos/test_qos_sai.py::TestQosSai::testQosSaiDwrrWeightChange[single_dut_multi_asic-10-10]
ERROR qos/test_qos_sai.py::TestQosSai::testQosSaiDwrr[single_dut_multi_asic-1-10] - Failed: Processes "['analyze_logs--<MultiAsicSonicHost xx37-lc7>']" failed with exit code "1"
ERROR qos/test_qos_sai.py::TestQosSai::testQosSaiDwrrWeightChange[single_dut_multi_asic-1-10] - Failed: Processes "['analyze_logs--<MultiAsicSonicHost xx37-lc7>']" failed with exit code "1"
ERROR qos/test_qos_sai.py::TestQosSai::testQosSaiDwrr[single_dut_multi_asic-2-10] - Failed: Processes "['analyze_logs--<MultiAsicSonicHost xx37-lc7>']" failed with exit code "1"
ERROR qos/test_qos_sai.py::TestQosSai::testQosSaiDwrrWeightChange[single_dut_multi_asic-2-10] - Failed: Processes "['analyze_logs--<MultiAsicSonicHost xx37-lc7>']" failed with exit code "1"
ERROR qos/test_qos_sai.py::TestQosSai::testQosSaiDwrr[single_dut_multi_asic-3-10] - Failed: Processes "['analyze_logs--<MultiAsicSonicHost xx37-lc7>']" failed with exit code "1"
ERROR qos/test_qos_sai.py::TestQosSai::testQosSaiDwrrWeightChange[single_dut_multi_asic-3-10] - Failed: Processes "['analyze_logs--<MultiAsicSonicHost xx37-lc7>']" failed with exit code "1"
ERROR qos/test_qos_sai.py::TestQosSai::testQosSaiDwrr[single_dut_multi_asic-4-10] - Failed: Processes "['analyze_logs--<MultiAsicSonicHost xx37-lc7>']" failed with exit code "1"
ERROR qos/test_qos_sai.py::TestQosSai::testQosSaiDwrrWeightChange[single_dut_multi_asic-4-10] - Failed: Processes "['analyze_logs--<MultiAsicSonicHost xx37-lc7>']" failed with exit code "1"
ERROR qos/test_qos_sai.py::TestQosSai::testQosSaiDwrr[single_dut_multi_asic-5-10] - Failed: Processes "['analyze_logs--<MultiAsicSonicHost xx37-lc7>']" failed with exit code "1"
ERROR qos/test_qos_sai.py::TestQosSai::testQosSaiDwrrWeightChange[single_dut_multi_asic-5-10] - Failed: Processes "['analyze_logs--<MultiAsicSonicHost xx37-lc7>']" failed with exit code "1"
ERROR qos/test_qos_sai.py::TestQosSai::testQosSaiDwrr[single_dut_multi_asic-6-10] - Failed: Processes "['analyze_logs--<MultiAsicSonicHost xx37-lc7>']" failed with exit code "1"
ERROR qos/test_qos_sai.py::TestQosSai::testQosSaiDwrrWeightChange[single_dut_multi_asic-6-10] - Failed: Processes "['analyze_logs--<MultiAsicSonicHost xx37-lc7>']" failed with exit code "1"
ERROR qos/test_qos_sai.py::TestQosSai::testQosSaiDwrr[single_dut_multi_asic-7-10] - Failed: Processes "['analyze_logs--<MultiAsicSonicHost xx37-lc7>']" failed with exit code "1"
ERROR qos/test_qos_sai.py::TestQosSai::testQosSaiDwrrWeightChange[single_dut_multi_asic-7-10] - Failed: Processes "['analyze_logs--<MultiAsicSonicHost xx37-lc7>']" failed with exit code "1"
ERROR qos/test_qos_sai.py::TestQosSai::testQosSaiDwrr[single_dut_multi_asic-8-10] - Failed: Processes "['analyze_logs--<MultiAsicSonicHost xx37-lc7>']" failed with exit code "1"
ERROR qos/test_qos_sai.py::TestQosSai::testQosSaiDwrrWeightChange[single_dut_multi_asic-8-10] - Failed: Processes "['analyze_logs--<MultiAsicSonicHost xx37-lc7>']" failed with exit code "1"
ERROR qos/test_qos_sai.py::TestQosSai::testQosSaiDwrr[single_dut_multi_asic-9-10] - Failed: Processes "['analyze_logs--<MultiAsicSonicHost xx37-lc7>']" failed with exit code "1"
ERROR qos/test_qos_sai.py::TestQosSai::testQosSaiDwrrWeightChange[single_dut_multi_asic-9-10] - Failed: Processes "['analyze_logs--<MultiAsicSonicHost xx37-lc7>']" failed with exit code "1"
ERROR qos/test_qos_sai.py::TestQosSai::testQosSaiDwrr[single_dut_multi_asic-10-10] - Failed: Processes "['analyze_logs--<MultiAsicSonicHost xx37-lc7>']" failed with exit code "1"
ERROR qos/test_qos_sai.py::TestQosSai::testQosSaiDwrrWeightChange[single_dut_multi_asic-10-10] - Failed: Processes "['analyze_logs--<MultiAsicSonicHost xx37-lc7>']" failed with exit code "1"
=========================================================================================== 20 passed, 470 deselected, 1 warning, 20 errors in 2973.01s (0:49:33) ============================================================================================
sonic@202405-qos-sonic-mgmt-prod:/data/tests$ 

@rraghav-cisco
Contributor Author

100G/S - 100G/S - sAsic:

---------------------------------------------------------- generated xml file: /run_logs/dwrr-cir/2024-11-26-02-31-41/1/O-100g-SS-gb-sasic/qos/test_qos_sai.py::TestQosSai_2024-11-26-02-31-41.xml -----------------------------------------------------------
INFO:root:Can not get Allure report URL. Please check logs
------------------------------------------------------------------------------------------------------------------- live log sessionfinish -------------------------------------------------------------------------------------------------------------------
03:05:14 __init__.pytest_terminal_summary         L0067 INFO   | Can not get Allure report URL. Please check logs
================================================================================================================== short test summary info ===================================================================================================================
PASSED qos/test_qos_sai.py::TestQosSai::testQosSaiDwrr[single_asic-1-10]
PASSED qos/test_qos_sai.py::TestQosSai::testQosSaiDwrrWeightChange[single_asic-1-10]
PASSED qos/test_qos_sai.py::TestQosSai::testQosSaiDwrr[single_asic-2-10]
PASSED qos/test_qos_sai.py::TestQosSai::testQosSaiDwrrWeightChange[single_asic-2-10]
PASSED qos/test_qos_sai.py::TestQosSai::testQosSaiDwrr[single_asic-3-10]
PASSED qos/test_qos_sai.py::TestQosSai::testQosSaiDwrrWeightChange[single_asic-3-10]
PASSED qos/test_qos_sai.py::TestQosSai::testQosSaiDwrr[single_asic-4-10]
PASSED qos/test_qos_sai.py::TestQosSai::testQosSaiDwrrWeightChange[single_asic-4-10]
PASSED qos/test_qos_sai.py::TestQosSai::testQosSaiDwrr[single_asic-5-10]
PASSED qos/test_qos_sai.py::TestQosSai::testQosSaiDwrrWeightChange[single_asic-5-10]
PASSED qos/test_qos_sai.py::TestQosSai::testQosSaiDwrr[single_asic-6-10]
PASSED qos/test_qos_sai.py::TestQosSai::testQosSaiDwrrWeightChange[single_asic-6-10]
PASSED qos/test_qos_sai.py::TestQosSai::testQosSaiDwrr[single_asic-7-10]
PASSED qos/test_qos_sai.py::TestQosSai::testQosSaiDwrrWeightChange[single_asic-7-10]
PASSED qos/test_qos_sai.py::TestQosSai::testQosSaiDwrr[single_asic-8-10]
PASSED qos/test_qos_sai.py::TestQosSai::testQosSaiDwrrWeightChange[single_asic-8-10]
PASSED qos/test_qos_sai.py::TestQosSai::testQosSaiDwrr[single_asic-9-10]
PASSED qos/test_qos_sai.py::TestQosSai::testQosSaiDwrrWeightChange[single_asic-9-10]
PASSED qos/test_qos_sai.py::TestQosSai::testQosSaiDwrr[single_asic-10-10]
PASSED qos/test_qos_sai.py::TestQosSai::testQosSaiDwrrWeightChange[single_asic-10-10]
================================================================================================= 20 passed, 470 deselected, 1 warning in 2011.91s (0:33:31) =================================================================================================
sonic@202405-qos-sonic-mgmt-prod:/data/tests$ 

@rraghav-cisco rraghav-cisco marked this pull request as ready for review December 4, 2024 18:19
@rraghav-cisco
Contributor Author

Final run result:

========================================================================================== 10 failed, 130 passed, 3290 deselected, 1 warning in 15756.72s (4:22:36) ==========================================================================================
sonic@202405-qos-sonic-mgmt-prod:/data/tests$ 

I got this with the options "-e --count=10 -k testQosSaiDwrr". The 10 failures come from the long-long single-asic run, which is not a use case in qos-sai.
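
As a side note on why both testQosSaiDwrr and testQosSaiDwrrWeightChange show up in the runs above: `-k` selects tests by substring match, and the `-1-10` ... `-10-10` suffixes appear to come from the `--count=10` repetition. The sketch below is a simplified model of that selection and naming, not pytest's implementation; `select_and_repeat` and the name `testUnrelated` are hypothetical.

```python
def select_and_repeat(test_names, keyword, count, topo="single_asic"):
    """Model `-k keyword` (case-insensitive substring match) plus `--count=count`."""
    selected = [name for name in test_names if keyword.lower() in name.lower()]
    # One id per (repetition, test), with an "i-count" suffix after the topology.
    return [
        f"{name}[{topo}-{i}-{count}]"
        for i in range(1, count + 1)
        for name in selected
    ]


# "testUnrelated" is a made-up name that the keyword should not match.
names = ["testQosSaiDwrr", "testQosSaiDwrrWeightChange", "testUnrelated"]
ids = select_and_repeat(names, "testQosSaiDwrr", count=10)
```

With two matching tests and a count of 10, this yields 20 ids, which lines up with the "20 passed" summaries in the earlier single-topology runs.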

@sdszhang , @abdosi : FYI.

@rraghav-cisco
Contributor Author

I am not sure what the error in the prechecks means. @auspham , please help.


# Set scheduler back to original speed.
self.copy_and_run_set_cir_script_cisco_8000(
    dut=dst_dut, port=dst_port, asic=dst_index, speed=int(1.1*port_speed))
Contributor

Why do we need to use 1.1*port_speed?

@mssonicbld
Collaborator

/azp run

Azure Pipelines successfully started running 1 pipeline(s).

@zhixzhu
Contributor

zhixzhu commented Dec 16, 2024

@rraghav-cisco tx enable/disable impacts the scheduler rate. The sequence of steps needs to be adjusted as below:
1. tx disable; this sets the credit CIR to 0
2. send packets
3. set_credit_cir to 5 Gbps
4. check received packets
5. tx enable; this sets the credit CIR back to its original value.
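
The ordering above can be sketched against a toy model. This is only an illustration under stated assumptions: `FakePort` and `run_sequence` are hypothetical, and the rule that queued packets drain only while the credit CIR is non-zero is a simplification, not the actual behavior of the DUT or the PTF test.

```python
class FakePort:
    """Hypothetical port model that tracks the credit CIR and records call order."""

    def __init__(self, original_cir_bps):
        self.original_cir_bps = original_cir_bps
        self.credit_cir_bps = original_cir_bps
        self.queued = 0
        self.log = []

    def tx_disable(self):
        self.credit_cir_bps = 0  # step 1: tx disable sets credit CIR to 0
        self.log.append("tx_disable")

    def send_packets(self, count):
        self.queued += count  # step 2: packets pile up while tx is disabled
        self.log.append("send_packets")

    def set_credit_cir(self, bps):
        self.credit_cir_bps = bps  # step 3: slow the scheduler
        self.log.append("set_credit_cir")

    def check_received(self):
        # Assumption: queued packets drain only while the CIR is non-zero.
        self.log.append("check_received")
        received = self.queued if self.credit_cir_bps > 0 else 0
        self.queued -= received
        return received

    def tx_enable(self):
        self.credit_cir_bps = self.original_cir_bps  # step 5: restore original CIR
        self.log.append("tx_enable")


def run_sequence(port, packets, slow_cir_bps=5 * 10**9):
    """Run steps 1 to 5 in order; tx_enable runs even if the check raises."""
    port.tx_disable()
    port.send_packets(packets)
    port.set_credit_cir(slow_cir_bps)
    try:
        return port.check_received()
    finally:
        port.tx_enable()


port = FakePort(original_cir_bps=100 * 10**9)
received = run_sequence(port, packets=1000)
```

Setting the CIR after tx disable matters in this model because tx disable itself overwrites the CIR; doing it the other way round would lose the 5 Gbps setting.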

@mssonicbld
Collaborator

/azp run

Azure Pipelines successfully started running 1 pipeline(s).

# Set scheduler to 5 Gbps.
self.copy_and_run_set_cir_script_cisco_8000(
    dut=dst_dut,
    port=intf,
Contributor

@sdszhang sdszhang Dec 20, 2024

Got this error:

        for intf in interfaces:
            # Set scheduler to 5 Gbps.
>           self.copy_and_run_set_cir_script_cisco_8000(
                dut=dst_dut,
                port=intf,
                asic=dst_index,
                speed=5 * 1000 * 1000 * 1000)
E           TypeError: copy_and_run_set_cir_script_cisco_8000() got an unexpected keyword argument 'port'

Contributor Author

@sdszhang , I apologize. I missed the update to this function in my previous commit. I have fixed it now. Thanks.

@mssonicbld
Collaborator

/azp run

Azure Pipelines successfully started running 1 pipeline(s).

@yejianquan yejianquan merged commit 09f7dc1 into sonic-net:master Dec 23, 2024
17 checks passed
@mssonicbld
Collaborator

@rraghav-cisco PR conflicts with 202405 branch

@yejianquan
Collaborator

@rraghav-cisco
Please resolve the conflict.
Also, I'm a little curious: where do we set the scheduler back to its original value?

@rraghav-cisco
Contributor Author

@rraghav-cisco Please resolve the conflict. Also, I'm a little curious: where do we set the scheduler back to its original value?

@yejianquan : I will resolve the conflict. The set-back happens with tx_enable inside the PTF script. The function sai_thrift_port_tx_enable() in tests/saitests/py3/sai_qos_tests.py, which runs in PTF, reverts the scheduler to its original state.

@rraghav-cisco
Contributor Author

rraghav-cisco commented Dec 23, 2024

@mssonicbld : PR for 202405: #16199

mssonicbld pushed a commit to mssonicbld/sonic-mgmt that referenced this pull request Jan 13, 2025
sonic-net#15718)

Description of PR
Summary:
Fixes the flakiness of the DWRR test case. The PR adds a new fixture that slows down the scheduler without changing the underlying algorithm, which allows the DWRR test to pass consistently.

co-authorized by: jianquanye@microsoft.com
@mssonicbld
Collaborator

Cherry-pick PR to 202411: #16479

@sdszhang
Contributor

202405 manual cherry pick PR #16199, merged.

mssonicbld pushed a commit that referenced this pull request Jan 13, 2025
#15718)

Description of PR
Summary:
Fixes the flakiness of the DWRR test case. The PR adds a new fixture that slows down the scheduler without changing the underlying algorithm, which allows the DWRR test to pass consistently.

co-authorized by: jianquanye@microsoft.com