
Debugging warnings in pss workflow #2866

Merged
79 commits merged into kubeflow:master on Sep 27, 2024

Conversation

biswajit-9776
Contributor

Pull Request Template for Kubeflow manifests Issues

✏️ A brief description of the changes

I pushed a PR just to view the workflows here in CI/CD.

📦 List any dependencies that are required for this change

My PR depends on #

🐛 If this PR is related to an issue, please put the link to the issue here.

The following issues are related, because ...

✅ Contributor checklist


You can join the CNCF Slack and access our meetings at the Kubeflow Community website. Our channel on the CNCF Slack is #kubeflow-platform.

@juliusvonkohout
Member

juliusvonkohout commented Sep 4, 2024

Looks good; now you just need to create patches to fix these:

Run ./tests/gh-actions/enable_restricted_PSS.sh
...
Patching the PSS-restricted labels for namespace oauth2-proxy...
Warning: existing pods in namespace "oauth2-proxy" violate the new PodSecurity enforce level "restricted:latest"
Warning: oauth2-proxy-5bcc78bdc9-7f24j: allowPrivilegeEscalation != false, unrestricted capabilities, runAsNonRoot != true, seccompProfile
namespace/oauth2-proxy patched
Patching the PSS-restricted labels for namespace kubeflow...
Warning: existing pods in namespace "kubeflow" violate the new PodSecurity enforce level "restricted:latest"
namespace/kubeflow patched
Warning: cache-server-5dbfbbdc95-bk6rl (and 12 other pods): unrestricted capabilities, runAsNonRoot != true, runAsUser=0, seccompProfile
Warning: cache-server-64cbf5d9f6-6h8nh (and 9 other pods): allowPrivilegeEscalation != false, unrestricted capabilities, runAsNonRoot != true, runAsUser=0, seccompProfile
Warning: kubeflow-pipelines-profile-controller-7b7b8f44f7-ktpxd (and 1 other pod): allowPrivilegeEscalation != false, unrestricted capabilities, runAsNonRoot != true, seccompProfile
Warning: metacontroller-0: seccompProfile
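
For context, the "Patching the PSS-restricted labels" lines correspond to applying the standard Pod Security Standards labels to each namespace. A sketch of the equivalent command (the --overwrite flag is an assumption about how the script behaves on re-runs):

    # Enforce the restricted PSS profile; the warnings above are emitted by this step
    kubectl label namespace oauth2-proxy \
      pod-security.kubernetes.io/enforce=restricted \
      pod-security.kubernetes.io/enforce-version=latest \
      --overwrite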

@biswajit-9776
Contributor Author

- name: Apply patches to clear warnings
  run: |
    DIRECTORY="contrib/security/PSS/patches"
    for file in "$DIRECTORY"/*.yaml; do
      echo "Patching file: $file"
      KIND=$(kubectl get -f "$file" -o jsonpath='{.kind}')
      NAME=$(kubectl get -f "$file" -o jsonpath='{.metadata.name}')
      NAMESPACE=$(kubectl get -f "$file" -o jsonpath='{.metadata.namespace}')
      # Apply the patch only if the target resource already exists
      if kubectl get "$KIND" "$NAME" -n "$NAMESPACE" &> /dev/null; then
        kubectl patch "$KIND" "$NAME" -n "$NAMESPACE" --patch-file "$file"
      fi
    done

This job was added to our workflow to fix the warnings. It does so locally, but I don't see why it doesn't seem to work here.
Let me restart some of the deployments to see if that helps.
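
For illustration, a patch file under contrib/security/PSS/patches would presumably add exactly the fields named in the warnings. A minimal sketch (the Deployment and container names here are assumptions, not taken from the PR):

    # Hypothetical patch file: adds the securityContext fields the restricted profile checks
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: oauth2-proxy
      namespace: oauth2-proxy
    spec:
      template:
        spec:
          securityContext:
            runAsNonRoot: true
            seccompProfile:
              type: RuntimeDefault
          containers:
            - name: oauth2-proxy
              securityContext:
                allowPrivilegeEscalation: false
                capabilities:
                  drop:
                    - ALL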

@juliusvonkohout
Member

I also see for example

which should not be the case. It seems to patch the replicas.

@juliusvonkohout
Member

juliusvonkohout commented Sep 4, 2024

Maybe you need to wait for all deployments to restart their pods after the patches and become ready. But they should restart automatically.

@biswajit-9776
Contributor Author

biswajit-9776 commented Sep 4, 2024

I also see for example

which should not be the case. It seems to patch the replicas.

Oh, this line isn't needed, so I'll remove it.

@biswajit-9776
Contributor Author

biswajit-9776 commented Sep 4, 2024

@juliusvonkohout the kubeflow pods fail to restart successfully, even after 10 minutes. Any suggestions?

@juliusvonkohout
Member

Maybe some of them are failing due to the security context. Maybe they are in a CrashLoopBackOff.

@juliusvonkohout
Member

Maybe you should add a readiness wait to your patch loop and test the resources one by one that way.
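
A sketch of that suggestion, extending the loop from the earlier snippet (the 300s timeout is an assumption, and kubectl rollout status only applies to kinds that have rollouts, such as Deployments):

    if kubectl get "$KIND" "$NAME" -n "$NAMESPACE" &> /dev/null; then
      kubectl patch "$KIND" "$NAME" -n "$NAMESPACE" --patch-file "$file"
      # Block until the rollout triggered by the patch is ready before patching the next resource
      kubectl rollout status "$KIND/$NAME" -n "$NAMESPACE" --timeout=300s
    fi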

@biswajit-9776
Contributor Author

@juliusvonkohout Hey, how much wait time should I set for each pod? Apparently 180s isn't enough.

@biswajit-9776
Contributor Author

biswajit-9776 commented Sep 15, 2024

Even though istio-cni uses init containers with proper securityContext attributes, it misses the seccompProfile attribute, which may lead to the following warnings as of now:

Run ./tests/gh-actions/enable_restricted_PSS.sh
Patching the PSS-restricted labels for namespace istio-system...
namespace/istio-system patched
Patching the PSS-restricted labels for namespace auth...
namespace/auth patched
Patching the PSS-restricted labels for namespace cert-manager...
namespace/cert-manager patched
Patching the PSS-restricted labels for namespace oauth2-proxy...
namespace/oauth2-proxy patched
Patching the PSS-restricted labels for namespace kubeflow...
Warning: existing pods in namespace "kubeflow" violate the new PodSecurity enforce level "restricted:latest"
Warning: cache-server-6c656d7988-v4nzx (and 12 other pods): seccompProfile
Warning: metadata-grpc-deployment-8496ffb98b-wxc6p (and 1 other pod): allowPrivilegeEscalation != false, unrestricted capabilities, runAsNonRoot != true, seccompProfile
namespace/kubeflow patched

The warnings for the baseline label are gone as of now.
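
For reference, the attribute the injected init container is missing would look roughly like this (istio-validation is the name istio-cni typically injects; treat it as an assumption):

    initContainers:
      - name: istio-validation
        securityContext:
          seccompProfile:
            type: RuntimeDefault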

@juliusvonkohout
Member

In the Istio init-container configmap (or similar) in the istio-system namespace you might be able to set the seccomp profile.

@juliusvonkohout
Member

You can also try to set the securityContext at the pod level instead of the container level.
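
A pod-level securityContext is inherited by regular and init containers alike unless a container overrides it, so a sketch of that variant would be:

    spec:
      template:
        spec:
          securityContext:
            # pod level: applies to init containers that don't set their own seccompProfile
            seccompProfile:
              type: RuntimeDefault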

@biswajit-9776
Contributor Author

Would the patches targeting deployments' containers be unnecessary, then? Since just the initContainer is failing the seccompProfile check...

@biswajit-9776
Contributor Author

Hey @juliusvonkohout, are there any patches to configmap templates in the source code that I could use as a reference? A simple addition of seccompProfile to the sidecar template of the configmap replaces all the other data with my patched attribute, which I don't want.

@juliusvonkohout
Member

Would the patches targeting deployments' containers be unnecessary, then? Since just the initContainer is failing the seccompProfile check...

You have to test that.

@juliusvonkohout
Member

juliusvonkohout commented Sep 15, 2024

Hey @juliusvonkohout, are there any patches to configmap templates in the source code that I could use as a reference? A simple addition of seccompProfile to the sidecar template of the configmap replaces all the other data with my patched attribute, which I don't want.

I would have to check that. In the worst case you have to patch it with sed or similar.

@biswajit-9776
Contributor Author

biswajit-9776 commented Sep 16, 2024

Okay, so I have used an ugly sed command for the time being, @juliusvonkohout. It does fix the istio containers failing the labels.
Only metadata-grpc-deployment and, I believe, ml-pipeline are currently producing warnings even after the patches, which I also see locally: they go into CrashLoopBackOff, and their previously healthy, non-terminated pods generate the warnings.
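
For the record, patching a rendered configmap with sed generally follows this shape (the match pattern and indentation here are made up for illustration and are not the exact command from the PR):

    kubectl get cm istio-sidecar-injector -n istio-system -o yaml > temporary_patch.yaml
    # Append a seccompProfile right after the existing runAsNonRoot field in the template text
    sed -i 's/runAsNonRoot: true/runAsNonRoot: true\n        seccompProfile:\n          type: RuntimeDefault/g' temporary_patch.yaml
    kubectl apply -f temporary_patch.yaml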

capabilities:
  drop:
    - ALL
initContainers:
Member

Please remove all the istio initcontainer patches from all deployment patches. They are injected at runtime and changed by the general istio configuration you modified with sed.

Contributor Author

The initContainers have been removed in commit e40311f.

fi
done
sleep 10
# - name: Create dynamic user namespace and check for PSS labels present
Member

juliusvonkohout commented Sep 16, 2024

Please remove the debugging parts so that we can merge soon. Let's just finish these three comments and merge. This PR is getting too big and we should continue in new ones.

Contributor Author

The debugging parts, along with the patch for dynamic namespaces, have also been removed in commit e40311f. I believe that can be done in a new PR to keep things simple.


- name: Configure istio init container with seccompProfile attribute
  run: |
    kubectl get cm istio-sidecar-injector -n istio-system -o yaml > temporary_patch.yaml
Member

This looks good :-) According to the workflow results it seems to work, but please verify it manually.

Contributor Author

Yes, I did run it manually and opened the configmap to confirm the changes are in place as we want. Also, I have removed the comments, which I can later add back as commits for the dynamic namespace.
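
A quick manual check along those lines (a sketch):

    # Confirm the seccompProfile now appears in the rendered sidecar-injection template
    kubectl get cm istio-sidecar-injector -n istio-system -o yaml | grep -A 2 seccompProfile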

Contributor Author

biswajit-9776 commented Sep 16, 2024

The ml-pipeline pod on my local system does seem to fail with CrashLoopBackOff when I open it up. Maybe I need to increase my fs.inotify.max_user_{instances,watches}.
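
If inotify exhaustion is the cause, raising the limits on the host usually clears it. The values below are the ones commonly recommended for kind-based local clusters, not taken from this PR:

    sudo sysctl fs.inotify.max_user_instances=1280
    sudo sysctl fs.inotify.max_user_watches=655360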

@lampajr
Member

lampajr left a comment

@biswajit-9776 tests look much better now; the only warning I see at the moment is:

Warning: ml-pipeline-bf9f88745-qzmm9: allowPrivilegeEscalation != false, unrestricted capabilities, runAsNonRoot != true, seccompProfile

@biswajit-9776
Contributor Author

@biswajit-9776 tests look much better now; the only warning I see at the moment is:

Warning: ml-pipeline-bf9f88745-qzmm9: allowPrivilegeEscalation != false, unrestricted capabilities, runAsNonRoot != true, seccompProfile

Yes @lampajr, it looks the same on my local system. Looking at the pod logs, though, it isn't failing the PSS checks; it's failing to start up again.

@juliusvonkohout
Member

/lgtm
/approve

Let's continue in follow-up PRs.


[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: juliusvonkohout

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@google-oss-prow google-oss-prow bot merged commit e3b2aa4 into kubeflow:master Sep 27, 2024
7 checks passed