Fix random test failures in github #939
Conversation
We see many failures when starting a podman-based environment on github. Add verbose logging to get more info about the failures.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
We used a 64 bit random number, which should be enough, but we see random failures in github with this error:

    time="2023-06-22T15:04:59Z" level=warning msg="Error validating CNI config file /etc/cni/net.d/test-77836bdde14be1d8-cluster.conflist: [plugin bridge does not support config version \"1.0.0\" plugin portmap does not support config version \"1.0.0\" plugin firewall does not support config version \"1.0.0\" plugin tuning does not support config version \"1.0.0\"]"
    Error: volume with name test-77836bdde14be1d8-cluster already exists: volume already exists

This is probably not the reason for "volume already exists", but let's make sure it cannot be the problem by using a 128 bit random prefix.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
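A minimal sketch of what such a 128 bit prefix could look like (the helper name is hypothetical; drenv's actual code may differ):

```python
import secrets

def random_suffix(bits=128):
    # Return a random hex string; 128 bits -> 32 hex characters.
    # Hypothetical helper, not drenv's actual implementation.
    return secrets.token_hex(bits // 8)

# e.g. "test-<32 hex chars>-cluster" instead of the old 16-char (64 bit) form
name = f"test-{random_suffix()}-cluster"
```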
To report a bug in minikube we need to use the --alsologtostderr option, so enable it when using debug mode. With this option minikube logs a huge amount of debug output to stderr, but we will see the logs only if the command fails, in the commands.Error traceback.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
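A sketch of the idea, assuming a generic helper (names and structure are illustrative, not drenv's actual code): stderr is captured and surfaced only when the command fails.

```python
import subprocess

def run(command, *args, debug=False):
    # Build the command; in debug mode ask the tool to mirror its log to stderr.
    cmd = [command, *args]
    if debug:
        cmd.append("--alsologtostderr")
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        # Only on failure do we surface the (possibly huge) captured stderr.
        raise RuntimeError(f"{cmd} failed:\n{result.stderr}")
    return result.stdout

# Usage (illustrative):
#   run("minikube", "start", "--profile", "cluster", debug=True)
```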
Log the minikube commands in debug level to make it easier to understand failures on remote machines and verify changes in minikube commands.

Example of the new logs:

    $ drenv start envs/test.yaml -v
    2023-06-22 19:25:07,827 INFO [test] Starting environment
    2023-06-22 19:25:07,920 INFO [cluster] Starting minikube cluster
    2023-06-22 19:25:07,920 DEBUG [cluster] Running ['minikube', 'start', '--profile', 'cluster', '--driver', 'podman', '--container-runtime', 'cri-o', '--disk-size', '20g', '--nodes', '1', '--cni', 'auto', '--cpus', '2', '--memory', '2g', '--alsologtostderr']
    ...

    $ drenv delete envs/test.yaml -v
    2023-06-22 19:26:05,932 INFO [test] Deleting environment
    2023-06-22 19:26:05,933 INFO [cluster] Deleting cluster
    2023-06-22 19:26:05,933 DEBUG [cluster] Running ['minikube', 'delete', '--profile', 'cluster']
    ...

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
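The logging pattern above could be produced by something like this sketch (the helper and logger names are assumptions, not drenv's actual code):

```python
import logging

def start_cluster_cmd(profile, driver="docker"):
    # Hypothetical: build the minikube command and log it at DEBUG level,
    # using a per-cluster logger so the profile name shows up as [profile].
    cmd = ["minikube", "start", "--profile", profile, "--driver", driver]
    logging.getLogger(profile).debug("Running %s", cmd)
    return cmd

logging.basicConfig(
    level=logging.DEBUG,
    format="%(asctime)s %(levelname)s [%(name)s] %(message)s",
)
cmd = start_cluster_cmd("cluster")
```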
Force-pushed from 97a3c38 to d7ff7cb
Yes, this change is good but I don't think a clash of the random number was responsible for the error message. Very likely that the test created the volume twice.
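The reviewer's intuition can be checked with a quick birthday-bound estimate (illustrative numbers; the approximation p ≈ n(n−1)/2^(k+1) holds when p is small):

```python
def collision_probability(n, bits):
    # Birthday-bound approximation: probability that any two of n random
    # `bits`-bit identifiers collide, valid when the result is small.
    return n * (n - 1) / 2 ** (bits + 1)

# Even after 10,000 test runs, a 64 bit ID collision is essentially impossible.
p64 = collision_probability(10_000, 64)
p128 = collision_probability(10_000, 128)
```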
@@ -37,21 +37,23 @@ environment.
  [Install clusteradm CLI tool](https://open-cluster-management.io/getting-started/installation/start-the-control-plane/#install-clusteradm-cli-tool)
  for the details. Version 0.5.0 or later is required.

-1. Install `podman`
+1. Install `docker`
I think this is a pretty big change and is claiming incompatibility with podman. I suggest that you update the commit message with example issues that you have seen with podman. If possible, also link to one or more runs on github actions where the failures have occurred.
podman is considered experimental in minikube:
https://minikube.sigs.k8s.io/docs/drivers/podman/
We used it because it was a way to get rid of docker, and it seemed to work, but
since we added it, it has been removed from all environments. We kept it only in
the test env used for running the tests locally and in github. Now that it does
not work in github there is no reason to use it.
We can consider using podman again when it is officially supported and works
for us in github.
If you look at the latest github actions runs this week, you will find that all
of them failed in the drenv tests in the same way:
- https://github.com/RamenDR/ramen/actions/runs/5348065862
- https://github.com/RamenDR/ramen/actions/runs/5347722247
- https://github.com/RamenDR/ramen/actions/runs/5347705377
- https://github.com/RamenDR/ramen/actions/runs/5347594713
- https://github.com/RamenDR/ramen/actions/runs/5347589593
- More: https://github.com/RamenDR/ramen/actions?query=is%3Afailure
@nirs If you are confident about the changes, then we don't need any change in the code; but please provide references to the failures so that podman can be enabled for drenv tests later by verifying that the issues are resolved.
The podman driver is flaky locally and consistently failing in github actions now. Switching to docker fixes the issues in github. Developers need to install docker to run drenv tests locally. I tried to use the podman-docker package but it does not emulate docker well enough for minikube.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
All the changes are useful regardless of replacing podman with docker for the drenv …
Provided in the other comment.
No chance that the volume was created by the test, the only place calling drenv …

Even if we assume that pytest is buggy and this is called more than once, starting …

I think this is an issue with the podman driver, maybe only on Ubuntu since locally …

The most important thing now is to unbreak our CI - once the CI is green again we have …

I filed a minikube bug for this issue, hopefully they will have some ideas how to …
Recently we have been seeing random failures in drenv tests when starting the VM. Try to debug
and avoid the failures.