Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve starting ipmi_sim program #9378

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

maximenoel8
Copy link
Contributor

@maximenoel8 maximenoel8 commented Oct 17, 2024

Context

If one of the IPMI Power management features is failing, it will keep ipmi_sim process running. The next feature will try to start again this ipmi_sim process. In this situation, the command ipmi_sim -n < /dev/null > /dev/null & will be executed and wait for the previous process to stop ( nothing will stop the process ). The command will not return any code and the ssh connection will die. This situation will not be detected by the ssh timeout and the testsuite will stay stuck forever at this stage.

What does this PR change?

Use nohup command to start the service. This command will return immediately a result even if the process is already running.
Improve the IPMI mocking cleanup. From my test, pkill fake_ipmi_host.sh || : was not killing successfully fake_ipmi_host. ( detected by rerunning several time the same features. )

GUI diff

No difference.

  • DONE

Documentation

  • No documentation needed: only internal and user invisible changes

  • DONE

Test coverage

ℹ️ If a major new functionality is added, it is strongly recommended that tests for the new functionality are added to the Cucumber test suite

  • No tests: already covered

  • DONE

Links

Port:
Issue:

  • DONE

Changelogs

Make sure the changelogs entries you are adding are compliant with https://github.com/uyuni-project/uyuni/wiki/Contributing#changelogs and https://github.com/uyuni-project/uyuni/wiki/Contributing#uyuni-projectuyuni-repository

If you don't need a changelog check, please mark this checkbox:

  • No changelog needed

If you uncheck the checkbox after the PR is created, you will need to re-run changelog_test (see below)

Re-run a test

If you need to re-run a test, please mark the related checkbox, it will be unchecked automatically once it has re-run:

  • Re-run test "changelog_test"
  • Re-run test "backend_unittests_pgsql"
  • Re-run test "java_lint_checkstyle"
  • Re-run test "java_pgsql_tests"
  • Re-run test "ruby_rubocop"
  • Re-run test "schema_migration_test_pgsql"
  • Re-run test "susemanager_unittests"
  • Re-run test "javascript_lint"
  • Re-run test "spacecmd_unittests"

Before you merge

Check How to branch and merge properly!

Improve killing fake_ipmi_host.sh
Copy link
Contributor

👋 Hello! Thanks for contributing to our project.
Acceptance tests will take some time (aprox. 1h), please be patient ☕
You can see the progress at the end of this page and at https://github.com/uyuni-project/uyuni/pull/9378/checks
Once tests finish, if they fail, you can check 👀 the cucumber report. See the link at the output of the action.
You can also check the artifacts section, which contains the logs at https://github.com/uyuni-project/uyuni/pull/9378/checks.

If you are unsure the failing tests are related to your code, you can check the "reference jobs". These are jobs that run on a scheduled time with code from master. If they fail for the same reason as your build, it means the tests or the infrastructure are broken. If they do not fail, but yours do, it means it is related to your code.

Reference tests:

KNOWN ISSUES

Sometimes the build can fail when pulling new jar files from download.opensuse.org . This is a known limitation. Given this happens rarely, when it does, all you need to do is rerun the test. Sorry for the inconvenience.

For more tips on troubleshooting, see the troubleshooting guide.

Happy hacking!
⚠️ You should not merge if acceptance tests fail to pass. ⚠️

end

When(/^the server stops mocking an IPMI host$/) do
get_target('server').run('pkill ipmi_sim')
get_target('server').run('pkill fake_ipmi_host.sh || :')
get_target('server').run("ps aux | grep [f]ake_ipmi_host.sh | awk '{print $2}' | xargs kill")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add here verbose: true, so we can keep track of having duplicated processes.

@@ -580,12 +580,12 @@
raise ScriptError, 'File injection failed' unless success
end
server.run('chmod +x /etc/ipmi/fake_ipmi_host.sh', verbose: true, check_errors: true)
server.run('ipmi_sim -n < /dev/null > /dev/null &', verbose: true, check_errors: true)
server.run('nohup ipmi_sim -n > /var/log/ipmi_sim.log 2>&1 &', verbose: true, check_errors: true)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe before starting a new one, we can have an assert/pre-condition where we assure there is no other process already running.
If we have it, we might want to skip the start with a warning message, or we might want to just fail the step.

end

When(/^the server stops mocking an IPMI host$/) do
get_target('server').run('pkill ipmi_sim')
get_target('server').run('pkill fake_ipmi_host.sh || :')
get_target('server').run("ps aux | grep [f]ake_ipmi_host.sh | awk '{print $2}' | xargs kill")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would stick with the previous version, there are two issues with it:

  1. Certain implementations of ps might lead to a different output, thus it may require extra handling in the future.
  2. || : is kind of a safety mechanism, if grep won't find this process it may lead to errors.
  3. It involves more moving components: ps, grep, awk, and xargs, kill whereas there you just have pkill.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree in your points @szachovy but @maximenoel8 said that pkill was not killing successfully, and we need to address it somehow.

Copy link
Contributor Author

@maximenoel8 maximenoel8 Oct 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you look in uyuni podman server you will see the pkill is not working:

uyuni-master-podman-srv:~ #  ps aux | grep fake_ipmi_host.sh
root     10582  0.0  0.0   4236  3120 ?        S    Oct16   0:03 /bin/bash /etc/ipmi/fake_ipmi_host.sh
root     11787  0.0  0.0   9156  2336 pts/0    S+   09:48   0:00 grep --color=auto fake_ipmi_host.sh
root     13152  0.0  0.0   4236  3104 ?        S    Oct16   0:03 /bin/bash /etc/ipmi/fake_ipmi_host.sh

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

--full is missing:

uyuni-master-podman-srv:~ # ps aux | grep fake_ipmi_host.sh
root     10582  0.0  0.0   4236  3120 ?        S    Oct16   0:03 /bin/bash /etc/ipmi/fake_ipmi_host.sh
root     13152  0.0  0.0   4236  3104 ?        S    Oct16   0:03 /bin/bash /etc/ipmi/fake_ipmi_host.sh
root     13807  0.0  0.0   8780  2328 pts/1    S+   10:03   0:00 grep --color=auto fake_ipmi_host.sh
uyuni-master-podman-srv:~ # pkill --full fake_ipmi_host.sh
uyuni-master-podman-srv:~ # ps aux | grep fake_ipmi_host.sh
root     13877  0.0  0.0   8780   816 pts/1    S+   10:03   0:00 grep --color=auto fake_ipmi_host.sh

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, it seems the proper fix for that killing, indeed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants