
v1.15.1 still panics in GetMaintenanceResultFromJob #8562

Open
christophlehmann opened this issue Dec 29, 2024 · 8 comments
Assignees: reasonerjt
Labels: Needs info (Waiting for information), target/1.15.2

@christophlehmann

What steps did you take and what happened:

I upgraded to v1.15.1 and use the kopia uploader. The Velero pod restarts due to a panic:

time="2024-12-29T16:36:43Z" level=info msg="Running maintenance on backup repository" backupRepo=velero/p-project-default-kopia-6bmb6 logSource="pkg/controller/backup_repository_controller.go:309"
time="2024-12-29T16:36:43Z" level=info msg="Start to maintenance repo" BSL name=default logSource="pkg/repository/manager/manager.go:216" repo UID=1183e359-8bb9-4240-af20-ae3a64f09756 repo name=p-project-default-kopia-6bmb6 repo type=kopia
time="2024-12-29T16:36:48Z" level=info msg="Observed a panic in reconciler: runtime error: invalid memory address or nil pointer dereference" BackupRepository="{\"name\":\"p-project-default-kopia-6bmb6\",\"namespace\":\"velero\"}" controller=backuprepository controllerGroup=velero.io controllerKind=BackupRepository logSource="/go/pkg/mod/github.com/bombsimon/logrusr/v3@v3.0.0/logrusr.go:108" name=p-project-default-kopia-6bmb6 namespace=velero reconcileID="\"48e60053-612f-4651-ba3c-bdf4b53a6b12\""
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x1843d63]

goroutine 1439 [running]:
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile.func1()
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.2/pkg/internal/controller/controller.go:116 +0x1e5
panic({0x284a160?, 0x478c5a0?})
	/usr/local/go/src/runtime/panic.go:770 +0x132
github.com/vmware-tanzu/velero/pkg/repository.GetMaintenanceResultFromJob({0x3198a80, 0xc00040f200}, 0xc0043e4a08)
	/go/src/github.com/vmware-tanzu/velero/pkg/repository/maintenance.go:120 +0x1a3
github.com/vmware-tanzu/velero/pkg/repository/manager.(*manager).PruneRepo(0xc0004b8000, 0xc00413e000)
	/go/src/github.com/vmware-tanzu/velero/pkg/repository/manager/manager.go:249 +0xb4b
github.com/vmware-tanzu/velero/pkg/controller.(*BackupRepoReconciler).runMaintenanceIfDue(0xc000533110, {0x3182ca0, 0xc00412b6e0}, 0xc00413e000, {0x31a4ae0, 0xc0041093b0})
	/go/src/github.com/vmware-tanzu/velero/pkg/controller/backup_repository_controller.go:315 +0x142
github.com/vmware-tanzu/velero/pkg/controller.(*BackupRepoReconciler).Reconcile(0xc000533110, {0x3182ca0, 0xc00412b6e0}, {{{0xc001cf3e76?, 0x0?}, {0xc0028741b0?, 0xc00412b6e0?}}})
	/go/src/github.com/vmware-tanzu/velero/pkg/controller/backup_repository_controller.go:208 +0x606
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0x318aee0?, {0x3182ca0?, 0xc00412b6e0?}, {{{0xc001cf3e76?, 0xb?}, {0xc0028741b0?, 0x0?}}})
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.2/pkg/internal/controller/controller.go:119 +0xb7
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc00013f0e0, {0x3182cd8, 0xc000957220}, {0x29ad600, 0xc000ba0400})
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.2/pkg/internal/controller/controller.go:316 +0x3bc
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc00013f0e0, {0x3182cd8, 0xc000957220})
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.2/pkg/internal/controller/controller.go:266 +0x1be
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.2/pkg/internal/controller/controller.go:227 +0x79
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2 in goroutine 1066
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.2/pkg/internal/controller/controller.go:223 +0x50c

Anything else you would like to add:

Environment:

  • Velero version (use velero version):
velero version
Client:
	Version: v1.15.0
	Git commit: none
Server:
	Version: v1.15.1
  • Velero features (use velero client config get features):
features: <NOT SET>
  • Kubernetes version (use kubectl version):
v1.22.8

Vote on this issue!

This is an invitation to the Velero community to vote on issues; you can see the project's top voted issues listed here.
Use the "reaction smiley face" up to the right of this comment to vote.

  • 👍 for "I would like to see this bug fixed as soon as possible"
  • 👎 for "There are more important bugs to focus on right now"
@Whil0m

Whil0m commented Dec 30, 2024

I got the same panic in v1.14.0

@reasonerjt
Contributor

reasonerjt commented Dec 30, 2024

@christophlehmann
Could you check the status of the pods created by the maintenance job?

@Whil0m
what's the k8s version you are using?
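One way to pull those statuses, assuming Velero is installed in the velero namespace (the job name below is just a placeholder), is via the job-name label that the Job controller adds to its pods:

# list the repository maintenance jobs
kubectl -n velero get jobs

# dump the status of the pods created by one of them
# (replace <maintenance-job-name> with a name from the previous output)
kubectl -n velero get pods -l job-name=<maintenance-job-name> -o yaml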

reasonerjt added the "Needs info" (Waiting for information) label on Dec 30, 2024
@Whil0m

Whil0m commented Dec 30, 2024

> @christophlehmann Could you check the status of the pods created by the maintenance job?
>
> @Whil0m what's the k8s version you are using?

v1.20.7

@blackpiglet
Contributor

There is a similar fix in the main branch: #8271.
Please check whether this is related to your scenario.
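For context, the failure mode that kind of fix guards against looks roughly like the sketch below. This is a hypothetical helper for illustration only, not the actual Velero code: dereferencing the terminated container state without a nil check panics whenever the maintenance pod's container has not terminated (or reports no container statuses at all), while the guarded version returns an error instead.

package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// readMaintenanceResult illustrates the nil-pointer pattern behind this
// panic; it is not the actual Velero implementation.
func readMaintenanceResult(pod *corev1.Pod) (string, error) {
	// The unguarded form panics when State.Terminated is nil:
	//   return pod.Status.ContainerStatuses[0].State.Terminated.Message, nil

	if len(pod.Status.ContainerStatuses) == 0 {
		return "", fmt.Errorf("pod %s has no container statuses", pod.Name)
	}

	terminated := pod.Status.ContainerStatuses[0].State.Terminated
	if terminated == nil {
		return "", fmt.Errorf("container in pod %s has not terminated", pod.Name)
	}

	return terminated.Message, nil
}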

@Whil0m

Whil0m commented Dec 30, 2024

These two don't seem to be the same thing. Here is my error message:

panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x227dce3]

goroutine 1410 [running]:
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile.func1()
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.2/pkg/internal/controller/controller.go:116 +0x1e5
panic({0x281c120?, 0x4727010?})
	/usr/local/go/src/runtime/panic.go:770 +0x132
github.com/vmware-tanzu/velero/pkg/repository.getMaintenanceResultFromJob({0x315aea0, 0xc000629950}, 0xc001b9b908)
	/go/src/github.com/vmware-tanzu/velero/pkg/repository/maintenance.go:238 +0x1a3
github.com/vmware-tanzu/velero/pkg/repository.(*manager).PruneRepo(0xc00082f550, 0xc004413500)
	/go/src/github.com/vmware-tanzu/velero/pkg/repository/manager.go:234 +0xacc
github.com/vmware-tanzu/velero/pkg/controller.(*BackupRepoReconciler).runMaintenanceIfDue(0xc000e1b2c0, {0x31450e0, 0xc00425b740}, 0xc004413500, {0x3166de0, 0xc000635a40})
	/go/src/github.com/vmware-tanzu/velero/pkg/controller/backup_repository_controller.go:299 +0x142
github.com/vmware-tanzu/velero/pkg/controller.(*BackupRepoReconciler).Reconcile(0xc000e1b2c0, {0x31450e0, 0xc00425b740}, {{{0xc001a8cde6?, 0x0?}, {0xc001a90540?, 0xc00425b740?}}})
	/go/src/github.com/vmware-tanzu/velero/pkg/controller/backup_repository_controller.go:201 +0x606
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0x314d298?, {0x31450e0?, 0xc00425b740?}, {{{0xc001a8cde6?, 0xb?}, {0xc001a90540?, 0x0?}}})
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.2/pkg/internal/controller/controller.go:119 +0xb7
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc000a3c960, {0x3145118, 0xc0008754a0}, {0x297d680, 0xc0042af5a0})
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.2/pkg/internal/controller/controller.go:316 +0x3bc
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc000a3c960, {0x3145118, 0xc0008754a0})
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.2/pkg/internal/controller/controller.go:266 +0x1be
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.2/pkg/internal/controller/controller.go:227 +0x79
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2 in goroutine 1146
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.2/pkg/internal/controller/controller.go:223 +0x50c

@christophlehmann
Author

This is from a successful job:

status:
  phase: Succeeded
  containerStatuses:
    - name: velero-repo-maintenance-container
      state:
        terminated:
          exitCode: 0
          reason: Completed
          startedAt: '2025-01-01T19:11:52Z'
          finishedAt: '2025-01-01T19:11:53Z'
          containerID: >-
            docker://2fe2b0b865137a518f89418ca69f7569e7a1c8fe61e73eb499517482c81a2985
      lastState: {}
      ready: false
      restartCount: 0
      image: velero/velero:v1.15.1
      imageID: >-
        docker-pullable://velero/velero@sha256:1b9d838c6e3a45b2ca96b85a55f4e8c57961750b4faee29fe3ea37cf5ba87d1d
      containerID: >-
        docker://2fe2b0b865137a518f89418ca69f7569e7a1c8fe61e73eb499517482c81a2985
      started: false

A failed job's status looks like this:

status:
  phase: Failed
  containerStatuses:
    - name: velero-repo-maintenance-container
      state:
        terminated:
          exitCode: 1
          reason: Error
          message: 'An error occurred: <nil>'
          startedAt: '2025-01-01T19:29:39Z'
          finishedAt: '2025-01-01T19:32:36Z'
          containerID: >-
            docker://a66aca167e06ed02384b0bcae6017b773cffbea88c870f03e0cc1bab9c30e804
      lastState: {}
      ready: false
      restartCount: 0
      image: sha256:d5820f3a50c653f4ca36102c9d58882d9066caa2a7b4db8271c366e3b36122c5
      imageID: >-
        docker-pullable://velero/velero@sha256:ea2fda72889fb9ff8d0bca3c35a2ef09126f2bbaa14d39791ae14d889fdce0d6
      containerID: >-
        docker://a66aca167e06ed02384b0bcae6017b773cffbea88c870f03e0cc1bab9c30e804
      started: false

I will test the main branch now and report back whether it helps (velero/velero:main@sha256:ea2fda72889fb9ff8d0bca3c35a2ef09126f2bbaa14d39791ae14d889fdce0d6).
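For anyone who wants to run the same test: one way to switch the server image, assuming the default install with a Deployment named velero (and a container named velero) in the velero namespace, is:

# point the server Deployment at the main-branch image and wait for the rollout
kubectl -n velero set image deployment/velero velero=velero/velero:main
kubectl -n velero rollout status deployment/velero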

@reasonerjt
Contributor

Thanks @christophlehmann

I have a PR to cherry-pick the fix into v1.15.2.

Please let us know the result of your verification.

reasonerjt self-assigned this on Jan 2, 2025
@christophlehmann
Author

@reasonerjt Thanks! The main branch works as expected, with no restarts or panics 👍
