With DataMover restored PV is Empty #7189

Closed
hofq opened this issue Dec 7, 2023 · 21 comments

@hofq

hofq commented Dec 7, 2023

What steps did you take and what happened:

  • I made a backup of an application for testing, using this schedule:
schedule.yaml
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: prod-30d-eu-central-1
  namespace: velero
  annotations:
spec: 
  paused: false
  schedule: "0 3 * * *" # once per day at 3 AM
  useOwnerReferencesInBackup: false
  template:
    ttl: 730h # 30 days
    datamover: velero
    snapshotMoveData: true
    volumeSnapshotLocations:
    - csi-gp3
    storageLocation: default
  • I tried restoring it using the following command:
    velero create restore --from-backup <name of backup> --include-namespaces <namespace name>
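
For anyone reproducing this, the restore's progress and per-resource results can be inspected with the standard CLI commands (the backup/restore names below are placeholders):

velero restore describe <restore-name> --details
velero restore logs <restore-name>
velero backup describe <name-of-backup> --details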

What did you expect to happen:

  • After everything finished successfully, I expected the data to be in the volume; that was not the case. The volume was just empty.

The following information will help us better understand what's going on:

If you are using velero v1.7.0+:
Please use velero debug --backup <backupname> --restore <restorename> to generate the support bundle and attach it to this issue. For more options, refer to velero debug --help.

bundle-2023-12-07-18-18-49.tar.gz

Anything else you would like to add:

  • The DataUpload and DataDownload looked good. The file size was realistic and both showed no errors.
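
A quick way to re-check those data mover CRs, assuming Velero is installed in the velero namespace:

kubectl -n velero get datauploads,datadownloads
kubectl -n velero describe datadownload <datadownload-name>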

Environment:

  • Velero version (use velero version): v1.12.2 (also tried v1.12.1 and v1.12.2-rc2)
  • Velero features (use velero client config get features):
  • Kubernetes version (use kubectl version): Client: v1.28.4
  • Kubernetes installer & version: v1.27.7-eks-4f4795d
  • Cloud provider or hardware configuration: EKS with CSI Plugin
  • OS (e.g. from /etc/os-release): Amazon Linux 2

Vote on this issue!

This is an invitation to the Velero community to vote on issues; you can see the project's top-voted issues listed here.
Use the "reaction smiley face" at the top right of this comment to vote.

  • 👍 for "I would like to see this bug fixed as soon as possible"
  • 👎 for "There are more important bugs to focus on right now"
@Lyndon-Li
Contributor

Lyndon-Li commented Dec 8, 2023

Is the volume you are checking the one for the PVC test-pv-claim?
Could you check the directory below on your node (it should be on the same node where the pod is running) and share what you see there?
/var/lib/kubelet/pods/<restored pod's UID>/volumes/kubernetes.io~csi/<restored PVC's UID>/mount
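
A sketch for locating that directory; the segment under kubernetes.io~csi is normally the PV name (pvc-<uid>), and the names below are placeholders:

POD_UID=$(kubectl -n <namespace> get pod <pod-name> -o jsonpath='{.metadata.uid}')
PV_NAME=$(kubectl -n <namespace> get pvc <pvc-name> -o jsonpath='{.spec.volumeName}')
ls -la /var/lib/kubelet/pods/${POD_UID}/volumes/kubernetes.io~csi/${PV_NAME}/mount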

@hofq
Author

hofq commented Dec 8, 2023

/var/lib/kubelet/pods/afe222fb-aad9-4d28-9e79-7be40b5504ff
[root@node afe222fb-aad9-4d28-9e79-7be40b5504ff]# tree ./
./
├── containers
│   └── nginx
│       └── b46faa53
├── etc-hosts
├── plugins
│   └── kubernetes.io~empty-dir
│       └── wrapped_kube-api-access-lwl8s
│           └── ready
└── volumes
    ├── kubernetes.io~csi
    │   └── pvc-ab225afb-ac33-4399-bd8d-22c42fdc4126
    │       ├── mount
    │       │   └── lost+found
    │       └── vol_data.json
    └── kubernetes.io~projected
        └── kube-api-access-lwl8s
            ├── ca.crt -> ..data/ca.crt
            ├── namespace -> ..data/namespace
            └── token -> ..data/token

I cannot see a mount directory on the node the pod runs on, so it does not look like it.

@Lyndon-Li
Contributor

@hofq Please do the check after the restore, when you see the problem. Don't remove the restored pod.

@hofq
Author

hofq commented Dec 8, 2023

The pod was not removed, but I will re-trigger the restore.

@Lyndon-Li
Contributor

If the pod was not removed, you should be able to see /var/lib/kubelet/pods/<restored pod's UID>/volumes/kubernetes.io~csi; if so, please share the dir tree you see there.

@hofq
Author

hofq commented Dec 8, 2023

./
└── pvc-8630c694-199e-4a53-aa13-e10530b8cdf9
    ├── mount
    │   └── lost+found
    └── vol_data.json

This is the tree directly after the restore.

@Lyndon-Li
Contributor

What files are in the volume? Did you create them manually or are they owned by any application?

@Lyndon-Li
Contributor

I see you were backing up the default namespace; can you move the workload to a dedicated namespace and test the backup/restore?
This is not a general problem, so let's first make the case simple and see what happens in your env step by step.

@hofq
Author

hofq commented Dec 11, 2023

Okay, so I will back up the namespace prod-backup-test using my schedule "prod-30d-eu-central-1", which backs up the namespace in question and a few others. After that I will restore the namespace prod-backup-test into prod-backup-test2.

prod-backup-test only carries the pod with the volume and PVC. prod-backup-test2 does not exist before the restore.

Here are the commands I ran:

velero create backup --from-schedule prod-30d-eu-central-1
velero create restore --from-backup prod-30d-eu-central-1-20231207135032 --include-namespaces prod-backup-test --namespace-mappings prod-backup-test:prod-backup-test2

After checking the volume, it was empty again.

The schedule in question is this one:

apiVersion: velero.io/v1
kind: Schedule
metadata:
  annotations:
  labels:
    app: prod-backup-schedules
    app.kubernetes.io/instance: prod-backup-schedules
    type: backup
  name: prod-30d-eu-central-1
  namespace: velero
spec:
  paused: false
  schedule: 0 3 * * *
  template:
    datamover: velero
    includedNamespaces:
    - kube-system
    - monitoring
    - grafana
    - logging
    - prod-backup-test
    snapshotMoveData: true
    storageLocation: default
    ttl: 730h
    volumeSnapshotLocations:
    - csi-gp3
  useOwnerReferencesInBackup: false
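One way to confirm whether a restored volume actually has data, assuming the pod mounts it at a known path (pod name and mount path below are placeholders):

kubectl -n prod-backup-test exec <pod-name> -- ls -la <mount-path>
kubectl -n prod-backup-test2 exec <pod-name> -- ls -la <mount-path>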

@Lyndon-Li
Contributor

@hofq Not sure if this works for you: please find me in the velero-user Slack channel, and we can have a live session to troubleshoot the problem.

@Lyndon-Li
Contributor

@hofq
I have made a fix; please help verify it using the velero/velero:main image from the Velero main branch.
Note: if you have used velero/velero:main previously, remember to set the imagePullPolicy to Always for the Velero server and node-agent pods.
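
A sketch of switching the images and pull policy, assuming a default 1.12-style install (deployment velero and daemonset node-agent in the velero namespace, containers named after their resources):

kubectl -n velero set image deployment/velero velero=velero/velero:main
kubectl -n velero set image daemonset/node-agent node-agent=velero/velero:main
kubectl -n velero patch deployment velero --type=json \
  -p='[{"op":"replace","path":"/spec/template/spec/containers/0/imagePullPolicy","value":"Always"}]'
kubectl -n velero patch daemonset node-agent --type=json \
  -p='[{"op":"replace","path":"/spec/template/spec/containers/0/imagePullPolicy","value":"Always"}]'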

@hofq
Author

hofq commented Dec 12, 2023

On the main tag I have the issue that the restore won't even do anything:

velero create restore --from-backup test10 --namespace-mappings prod-backup-test:prod-backup-test10 --wait
Restore request "test10-20231212095232" submitted successfully.
Waiting for restore to complete. You may safely press ctrl-c to stop waiting - your restore will continue in the background.
....................W1212 10:02:12.237005    9459 reflector.go:347] k8s.io/client-go@v0.25.6/tools/cache/reflector.go:169: watch of *v1.Restore ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
.....................I1212 10:02:33.200924    9459 trace.go:205] Trace[561098295]: "Reflector ListAndWatch" name:k8s.io/client-go@v0.25.6/tools/cache/reflector.go:169 (12-Dec-2023 10:02:13.578) (total time: 19621ms):
Trace[561098295]: ---"Objects listed" error:<nil> 19621ms (10:02:33.200)
Trace[561098295]: [19.621754666s] [19.621754666s] END
....................

The restore just runs forever without anything happening in the cluster. Even the namespace was not created.

@Lyndon-Li
Contributor

Did you see the restore CR created and being processed?

@hofq
Author

hofq commented Dec 12, 2023

It is there, but no DataDownload was initiated.

@Lyndon-Li
Contributor

Please help collect a Velero log bundle; we will troubleshoot further.

@hofq
Author

hofq commented Dec 12, 2023

@Lyndon-Li
Contributor

Probably I need to make the fix in the 1.12 branch. The CRDs have changed in 1.13 (the main image), so your client doesn't match them.
Alternatively, you can download Velero's main branch code and use the make local command to compile a client. Not sure if you can do this; I will change the 1.12 branch anyway, but it takes some time.

@hofq
Author

hofq commented Dec 12, 2023

Thank you

@Lyndon-Li
Contributor

@hofq
Fix in 1.12 is ready; please use the image velero/velero:release-1.12-dev for the Velero server pod and node-agent pods.

Note: if you have used velero/velero:release-1.12-dev previously, remember to set the imagePullPolicy to Always for the Velero server and node-agent pods.

@hofq
Author

hofq commented Dec 13, 2023

I can confirm a successful restore!

Thanks for your help!

@Lyndon-Li
Contributor

Lyndon-Li commented Dec 14, 2023

This problem is similar to #7027 and will happen in all EKS environments that use AWS IAM roles for service accounts (IRSA).
Under this configuration, a volume called aws-iam-token is injected as the first volume of every pod, including the backup/restore pods created by the Velero exposer for data movement. However, Velero always assumes the first volume is the backup/restore volume; as a result, the Velero data mover reads/writes data to the wrong volume.
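
A quick way to see that ordering on one of the data mover pods while a backup/restore is running (pod name is a placeholder; with IRSA enabled, aws-iam-token would be printed first):

kubectl -n velero get pod <data-mover-pod-name> -o jsonpath='{range .spec.volumes[*]}{.name}{"\n"}{end}'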
