Check status of all the core pods for microshift #4009
base: main
Conversation
Force-pushed from 92394d6 to 5907b3b
pkg/crc/cluster/cluster.go (Outdated)
func podRunningForNamespace(ocConfig oc.Config, namespace string) bool {
    stdin, stderr, err := ocConfig.WithFailFast().RunOcCommand("get", "pods", "-n", namespace, "--field-selector=status.phase!=Running")
Why !=Running? I would have expected =Running?
@cfergeau If I use ==Running, that will show all the running pods. What we want are the pods which are not in the Running phase in that specific namespace, so we can reiterate in the retry function.
$ kubectl get pod -n kube-system --field-selector=status.phase!=Running
NAME READY STATUS RESTARTS AGE
csi-snapshot-controller-85cc4fd76b-xznzw 1/1 Pending 0 45h
if !podRunningForNamespace(ocConfig, namespace) {
    logging.Debugf("Pods in %s namespace are not running", namespace)
    return &errors.RetriableError{Err: fmt.Errorf("pods in %s namespace are not running", namespace)}
}
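The diff only shows the helper's first line; a minimal sketch of how the rest of the body could look, assuming an empty result from the status.phase!=Running selector means every pod in the namespace is running (the error handling and strings usage below are illustrative, not the actual PR code):

```go
// Sketch only: oc.Config and logging are the packages already used in the diff;
// assumes "strings" is imported.
func podRunningForNamespace(ocConfig oc.Config, namespace string) bool {
	// Any pod matched by this selector is *not* in the Running phase.
	stdout, stderr, err := ocConfig.WithFailFast().RunOcCommand(
		"get", "pods", "-n", namespace, "--field-selector=status.phase!=Running")
	if err != nil {
		logging.Debugf("Failed to list pods in %s namespace: %v: %s", namespace, err, stderr)
		return false
	}
	// Empty output means no pod in the namespace is outside the Running phase.
	return strings.TrimSpace(stdout) == ""
}
```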
Fwiw, this is a bit wasteful as we'll try the same namespaces again and again even if we already found running pods. Maybe this can be done with a map? Map keys are the namespaces; iterate over the keys, and when there are running pods in a namespace, remove it from the map. A rough sketch of that idea follows.
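A sketch of that suggestion, only to illustrate the shape (the namespaces slice and closure name are hypothetical):

```go
// remaining tracks the namespaces that have not yet been seen with all pods running.
remaining := map[string]struct{}{}
for _, ns := range namespaces {
	remaining[ns] = struct{}{}
}

waitForPods := func() error {
	for ns := range remaining {
		if !podRunningForNamespace(ocConfig, ns) {
			return &errors.RetriableError{Err: fmt.Errorf("pods in %s namespace are not running", ns)}
		}
		// All pods in this namespace were running once; skip it on later retries.
		delete(remaining, ns)
	}
	return nil
}
```

Deleting the key once a namespace looks healthy means later retries only re-check the namespaces that have not yet passed.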
Yes, but it has a bit of benefit in case some pod goes into a reconciliation state (like in one iteration it is running but in the second it is in pending state). It is not a foolproof solution (in the k8s context it is never going to be) but should be good for initial feedback on whether the core pods are running.
In the OpenShift case, once the oc get co check succeeds once, we retry 2 more times and we only decide the cluster is good when the oc get co check succeeds 3 times in a row. If you want to handle "in one iteration it is running but in second it is in pending state" it would be nice to have a consistent approach.
Yes, in the case of OpenShift we can iterate over all the clusteroperators at once because those are not namespace-specific resources. Here we are not able to have a single call which provides us all the pod statuses in the core namespaces, otherwise I would've used the same logic. So now we iterate namespace by namespace and check the pod status.
I'm not questioning the way the iterations are done, I was reacting to "it has a bit of benefit in case some pod goes to reconciliation state (like in one iteration it is running but in second it is in pending state)".
For an OpenShift cluster, we roughly iterate over an isClusterReady() function until it returns true. Once it returns true, we still run it 2 more times in case the cluster was ready but in a transient/reconciliation state.
If reconciliation is something you want to try to handle better, I would use the same approach as for OpenShift for consistency: the cluster is not ready before isClusterReady() has succeeded 3 times in a row.
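The "succeeds N times in a row" pattern could be wrapped around any readiness check roughly like this; this is a sketch of the pattern, not the actual OpenShift code path in crc:

```go
// requireConsecutive only reports success after check has passed n times in a
// row; any failure resets the streak, so a transient "ready" state doesn't count.
func requireConsecutive(n int, check func() error) func() error {
	successes := 0
	return func() error {
		if err := check(); err != nil {
			successes = 0
			return err
		}
		successes++
		if successes < n {
			return &errors.RetriableError{Err: fmt.Errorf("%d/%d consecutive successful checks", successes, n)}
		}
		return nil
	}
}
```

It could then be plugged into the existing retry call as errors.Retry(ctx, 2*time.Minute, requireConsecutive(3, waitForPods), 2*time.Second).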
Force-pushed from 5907b3b to 217216c
    return errors.Retry(ctx, 2*time.Minute, waitForPods, 2*time.Second)
}

func podRunningForNamespace(ocConfig oc.Config, namespace string) bool {
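The waitForPods closure itself is not visible in this hunk; presumably it loops over the configured core namespaces, roughly like the following sketch (the surrounding function name and the namespaces parameter are assumptions for illustration):

```go
func waitForCorePodsRunning(ctx context.Context, ocConfig oc.Config, namespaces []string) error {
	waitForPods := func() error {
		for _, namespace := range namespaces {
			if !podRunningForNamespace(ocConfig, namespace) {
				logging.Debugf("Pods in %s namespace are not running", namespace)
				return &errors.RetriableError{Err: fmt.Errorf("pods in %s namespace are not running", namespace)}
			}
		}
		return nil
	}
	return errors.Retry(ctx, 2*time.Minute, waitForPods, 2*time.Second)
}
```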
allPodsRunning(ocConfig oc.Config, namespace string) bool or checkAllPodsRunning is more descriptive/accurate.
It should be in the namespace context, so checkAllPodsRunningInNamespace or allPodsRunningForNamespace?
There is a namespace argument, and we don't have a non-namespace function this could be confused with, so I don't think it's really useful to mention Namespace in the function name. It's more something for an API doc comment if you think it's important to inform API users that it will only iterate over a single namespace.
if !podRunningForNamespace(ocConfig, namespace) {
    logging.Debugf("Pods in %s namespace are not running", namespace)
    return &errors.RetriableError{Err: fmt.Errorf("pods in %s namespace are not running", namespace)}
}
In the OpenShift case, once the oc get co check succeeds once, we retry 2 more times and we only decide the cluster is good when the oc get co check succeeds 3 times in a row. If you want to handle "in one iteration it is running but in second it is in pending state" it would be nice to have a consistent approach.
This is not microshift specific, as it might also be able to solve issues with the readiness of the OCP and OKD presets.
What was the reasoning behind this label, as the fixed issue #3852 is merely an enhancement?
/hold
In the past we observed that having kube API access doesn't mean all the required service pods are running and the cluster is working as expected. This PR adds a list of core namespaces for the microshift preset and makes sure all the pods in those namespaces are running before letting the user know the cluster is ready to consume.
Fixes: Issue #3852