Could not resolve Linux kernel version on GKE 1.25.* + GPU Operator version: 23.3.1 #526
Comments
Our tests for cluster-api-provider-azure are also facing a very similar issue running the 5.15.0-1035-azure kernel on Ubuntu 22.04 with GPU Operator v23.3.1. The last time the same test passed was about a week ago, on 8 May.
@xcheng85 Kernel headers, modules and modules-extras for GKE kernel versions 5.15.0.1027, 5.15.0.1028 and 5.15.0.1030 have now been restored to the archive. An exception to archive pruning/deletion has also been made for GKE going forward. See https://lists.ubuntu.com/archives/ubuntu-devel/2023-May/042571.html for context.
Thanks @philroche for the info. I tried to deploy GPU Operator 23.3.1 on a GKE cluster (using Ubuntu nodes with containerd) and I get this state:
Logs from the driver container:
Is it also planned to restore the kernel image package?
@francisguillier Re-publication of linux-image-* is now in progress too. See https://launchpad.net/ubuntu/+source/linux-signed-gke/+publishinghistory
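For anyone checking whether the archive has what their node needs, here is a minimal sketch that derives the package names discussed above from a kernel release string such as the output of `uname -r`. It assumes the usual Ubuntu naming pattern (`linux-headers-<release>`, `linux-image-<release>`, `linux-modules-<release>`, `linux-modules-extra-<release>`); the helper names `parse_kernel_release` and `expected_packages` are hypothetical and are not part of the GPU Operator. Which of these packages the driver container actually requires is not stated in this thread, so treat the list as a checklist, not a specification.

```python
import re

# Assumed release format: <base>-<abi>-<flavour>, e.g. "5.15.0-1028-gke".
KERNEL_RE = re.compile(r"^(\d+\.\d+\.\d+)-(\d+)-([a-z0-9]+)$")


def parse_kernel_release(release):
    """Split a kernel release string into (base version, ABI number, flavour)."""
    match = KERNEL_RE.match(release.strip())
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    base, abi, flavour = match.groups()
    return base, int(abi), flavour


def expected_packages(release):
    """Return the Ubuntu package names assumed to correspond to this kernel."""
    release = release.strip()
    parse_kernel_release(release)  # validate the format before building names
    prefixes = (
        "linux-headers",
        "linux-image",
        "linux-modules",
        "linux-modules-extra",
    )
    return [f"{prefix}-{release}" for prefix in prefixes]


if __name__ == "__main__":
    for name in expected_packages("5.15.0-1028-gke"):
        print(name)
```

Each printed name can then be looked up on the node (for example with `apt-cache policy <name>`) to see whether the archive currently offers it.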
@philroche |
Yes. There are plans to restore the Azure packages too. I will update here once complete.
@philroche I confirm everything works fine now on GKE (Kernel is 5.15.0-1028-gke)
Re-publication of the Azure kernel packages noted here has now started. See https://launchpad.net/ubuntu/+source/linux-azure/+publishinghistory and https://launchpad.net/ubuntu/+source/linux-signed-azure/+publishinghistory
Closing this as packages were made available again.
i2c_core and ipmi_msghandler loaded on the nodes?
(kubectl describe clusterpolicies --all-namespaces) Yes
Issue or feature description
Could not resolve Linux kernel version in daemonset pod.
Steps to reproduce the issue
https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/google-gke.html
Information to attach (optional if deemed irrelevant)
Node kernel version: 5.15.0-1028-gke on Ubuntu 22.04 node.
GPU Operator version: 23.3.1
https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/google-gke.html only has instructions for CentOS.
Thank you very much.