-
Notifications
You must be signed in to change notification settings - Fork 180
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Testsuite: test restart of postgres #6539
Conversation
7e7fdc5
to
697fd02
Compare
This PR is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 10 days. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
approving, but please apply your suggested change (remove the Wait)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When Eric's suggestion is done, approved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small nitpicks regarding the wording and the PR needs a rebase.
Rebase done. |
The scenario is definitely too fast: Feature: Restart the PostgreSQL service
Scenario: Restart the PostgreSQL database service # features/secondary/srv_restart_postgres.feature:6
This scenario ran at: 2023-10-25 15:45:58 +0200
Given I am authorized for the "Admin" section # features/step_definitions/navigation_steps.rb:425
systemctl restart postgresql.service returned status code = 0.
Output:
''
When I restart the "postgresql.service" service on "server" # features/step_definitions/command_steps.rb:656
And I wait until "postgresql" service is active on "server" # features/step_definitions/command_steps.rb:240
Initializing a twopence node for 'sle_minion'.
Host 'sle_minion' is alive with determined hostname dominik-min-sles and FQDN dominik-min-sles.mgr.suse.de
Node: dominik-min-sles, OS Version: 15-SP4, Family: sles
And I am on the Systems overview page of this "sle_minion" # features/step_definitions/navigation_steps.rb:437
And I follow "States" in the content area # features/step_definitions/navigation_steps.rb:326
And I click on "Apply Highstate" # features/step_definitions/navigation_steps.rb:283
Then I should see a "Applying the highstate has been scheduled." text # features/step_definitions/navigation_steps.rb:613
Timeout after 90 seconds (Timeout.timeout): I still see text 'Apply highstate scheduled by admin'
And I wait until event "Apply highstate scheduled by admin" is completed # features/step_definitions/common_steps.rb:135
execution expired (Timeout::Error)
./features/step_definitions/navigation_steps.rb:75:in `block (2 levels) in <top (required)>'
./features/support/commonlib.rb:94:in `block in repeat_until_timeout'
./features/support/commonlib.rb:83:in `repeat_until_timeout'
./features/step_definitions/navigation_steps.rb:72:in `/^I wait at most (\d+) seconds until I do not see "([^"]*)" text, refreshing the page$/'
./features/step_definitions/common_steps.rb:142:in `/^I wait (\d+) seconds until the event is picked up and (\d+) seconds until the event "([^"]*)" is completed$/'
./features/step_definitions/common_steps.rb:159:in `/^I wait at most (\d+) seconds until event "([^"]*)" is completed$/'
./features/step_definitions/common_steps.rb:136:in `/^I wait until event "([^"]*)" is completed$/'
features/secondary/srv_restart_postgres.feature:14:in `I wait until event "Apply highstate scheduled by admin" is completed'
=> /var/log/rhn/rhn_web_ui.log |
👋 Hello! Thanks for contributing to our project. If you are unsure the failing tests are related to your code, you can check the "reference jobs". These are jobs that run on a scheduled time with code from master. If they fail for the same reason as your build, it means the tests or the infrastructure are broken. If they do not fail, but yours do, it means it is related to your code. Reference tests: KNOWN ISSUES Sometimes the build can fail when pulling new jar files from download.opensuse.org . This is a known limitation. Given this happens rarely, when it does, all you need to do is rerun the test. Sorry for the inconvenience. For more tips on troubleshooting, see the troubleshooting guide. Happy hacking! |
7554195
to
9e03797
Compare
I created a step for checking the taskomatic log as Oscar suggested. Scenario: Restart the PostgreSQL database service
Given I am authorized for the "Admin" section
When I restart the "postgresql.service" service on "server"
And I wait until "postgresql" service is active on "server"
And I am on the Systems overview page of this "sle_minion"
And I follow "States" in the content area
And I click on "Apply Highstate"
Then I should see a "Applying the highstate has been scheduled." text
And I wait until I see "MinionActionExecutor" in file "/var/log/rhn/rhn_taskomatic_daemon.log" on "server" # <------ new
And I wait until event "Apply highstate scheduled by admin" is completed When(/^I wait until I see "([^"]*)" in file "([^"]*)" on "([^"]*)"$/) do |text, file, host|
node = get_target(host)
repeat_until_timeout(message: "Entry #{text} in file #{file} on #{host} not found") do
_output, code = node.run("tail -n 10 #{file} | grep '#{text}' ", check_errors: false, verbose: true)
break if code.zero?
end
end (...)
tail -n 10 /var/log/rhn/rhn_taskomatic_daemon.log | grep 'MinionActionExecutor' returned status code = 1.
Output:
''
And I wait until I see "MinionActionExecutor" in file "/var/log/rhn/rhn_taskomatic_daemon.log" on "server" # features/step_definitions/command_steps.rb:1626
Timeout after 250 seconds (repeat_until_timeout): Entry MinionActionExecutor in file /var/log/rhn/rhn_taskomatic_daemon.log on server not found (RuntimeError)
./features/support/commonlib.rb:100:in `block in repeat_until_timeout'
./features/support/commonlib.rb:83:in `repeat_until_timeout'
./features/step_definitions/command_steps.rb:1628:in `/^I wait until I see "([^"]*)" in file "([^"]*)" on "([^"]*)"$/'
features/secondary/srv_restart_postgres.feature:15:in `I wait until I see "MinionActionExecutor" in file "/var/log/rhn/rhn_taskomatic_daemon.log" on "server"'
And I wait until event "Apply highstate scheduled by admin" is completed # features/step_definitions/common_steps.rb:135
=> /var/log/rhn/rhn_web_ui.log However, when we do not wait a certain amount of time before we execute the high state on the minion, this event will not get picked up, and therefore I do not see what I am looking for in the taskomatic log file. So when applying highstate while taskomatik is not fully reconnected, it does not trigger the high state and leaves it in the pending queue. Any suggestions or comments, @srbarrios? |
If we leave the timeout in, the scenario succeeds: dominik-ctl:~/spacewalk/testsuite # cucumber features/secondary/srv_restart_postgres.feature
(..)
Scenario: Restart the PostgreSQL database service # features/secondary/srv_restart_postgres.feature:6
This scenario ran at: 2023-10-26 15:12:58 +0200
Given I am authorized for the "Admin" section # features/step_definitions/navigation_steps.rb:425
systemctl restart postgresql.service returned status code = 0.
Output:
''
When I restart the "postgresql.service" service on "server" # features/step_definitions/command_steps.rb:656
And I wait until "postgresql" service is active on "server" # features/step_definitions/command_steps.rb:240
And I wait for "60" seconds # features/step_definitions/common_steps.rb:19
Initializing a twopence node for 'sle_minion'.
Host 'sle_minion' is alive with determined hostname dominik-min-sles and FQDN dominik-min-sles.mgr.suse.de
Node: dominik-min-sles, OS Version: 15-SP4, Family: sles
And I am on the Systems overview page of this "sle_minion" # features/step_definitions/navigation_steps.rb:437
And I follow "States" in the content area # features/step_definitions/navigation_steps.rb:326
And I click on "Apply Highstate" # features/step_definitions/navigation_steps.rb:283
Then I should see a "Applying the highstate has been scheduled." text # features/step_definitions/navigation_steps.rb:613
tail -n 10 /var/log/rhn/rhn_taskomatic_daemon.log | grep 'MinionActionExecutor' returned status code = 0.
Output:
'2023-10-26 15:14:14,090 [DefaultQuartzScheduler_Worker-6] INFO com.redhat.rhn.taskomatic.task.MinionActionExecutor - Executing action: 149
'
And I wait until I see "MinionActionExecutor" in file "/var/log/rhn/rhn_taskomatic_daemon.log" on "server" # features/step_definitions/command_steps.rb:1626
And I wait until event "Apply highstate scheduled by admin" is completed # features/step_definitions/common_steps.rb:135
This scenario took: 114 seconds
1 scenario (1 passed)
10 steps (10 passed)
1m54.133s |
Sorry took me a while to jump here.
I tried to identify the text we need to look for by navigating into our product code, looking for tasko, quartz and postgresql words, but I couldn't find it. I would say we even need to add a new log to the Java side in order to properly monitor this reconnection from the database. But... on python code, I found this debug log entry that can help with this monitoring: My guess is that's printed on salt-master logs |
dominik-srv:/ # tail /var/log/salt/master
2023-10-26 14:14:00,525 [salt.loaded.ext.engines.mgr_events:147 ][ERROR ][27280] salt.loaded.ext.engines.mgr_events: SSL connection has been closed unexpectedly
2023-10-26 14:14:00,525 [salt.loaded.ext.engines.mgr_events:151 ][ERROR ][27280] salt.loaded.ext.engines.mgr_events: Error commiting: connection already closed
2023-10-26 14:14:00,526 [salt.loaded.ext.engines.mgr_events:188 ][ERROR ][27280] salt.loaded.ext.engines.mgr_events: Diconnected from database. Trying to reconnect...
2023-10-26 14:14:00,773 [salt.loaded.ext.pillar.suma_minion:150 ][ERROR ][27293] Error on getting database pillar, trying again: ('SSL connection has been closed unexpectedly\n',)
2023-10-26 15:14:14,467 [salt.loaded.ext.engines.mgr_events:147 ][ERROR ][27280] salt.loaded.ext.engines.mgr_events: SSL connection has been closed unexpectedly
2023-10-26 15:14:14,467 [salt.loaded.ext.engines.mgr_events:151 ][ERROR ][27280] salt.loaded.ext.engines.mgr_events: Error commiting: connection already closed
2023-10-26 15:14:14,469 [salt.loaded.ext.engines.mgr_events:188 ][ERROR ][27280] salt.loaded.ext.engines.mgr_events: Diconnected from database. Trying to reconnect...
2023-10-26 15:14:14,717 [salt.loaded.ext.pillar.suma_minion:150 ][ERROR ][27285] Error on getting database pillar, trying again: ('SSL connection has been closed unexpectedly\n',) We concluded in Slack that there is no easy solution for this right now, so I added the initial waiting time from Vladimir back but also introduced a step to check for the triggered high state event in the Taskomatik log file. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved. We couldn't find something better, so we will need to have this sleep :(
We need to disable this test for now, according to mc. It did cause too much issues in 4.3 where it already was merged. I will open a bug report and adjust the PR to comment the test out in the |
@raulillo82 This can also be merged at some point to align the branches with 4.3, too. It is not exactly a test suite fix, though. I did rebase the branch again. |
See https://bugzilla.suse.com/show_bug.cgi?id=1218947 Signed-off-by: Dominik Gedon <dominik.gedon@suse.com>
Rebase done. Once the tests are finished, I will merge the PR. |
What does this PR change?
This test restarts postgres service and verifies, that it is handled correctly in java and salt - applying highstate must work.
Links
This should catch the bugs discussed in https://github.com/SUSE/spacewalk/issues/20143
Changelogs
Make sure the changelogs entries you are adding are compliant with https://github.com/uyuni-project/uyuni/wiki/Contributing#changelogs and https://github.com/uyuni-project/uyuni/wiki/Contributing#uyuni-projectuyuni-repository
If you don't need a changelog check, please mark this checkbox:
If you uncheck the checkbox after the PR is created, you will need to re-run
changelog_test
(see below)Re-run a test
If you need to re-run a test, please mark the related checkbox, it will be unchecked automatically once it has re-run: