gzip: invalid header - persistence.decode function #8515

Open
amayacitta opened this issue Dec 16, 2024 · 5 comments

What steps did you take and what happened:

Hi guys, we did a clean install of Velero 1.15.0 with the command below. The back-end storage is Cohesity, and as far as we're aware the Cohesity implementation has no publicUrl.

velero install \
    --provider aws \
    --plugins velero/velero-plugin-for-aws:v1.11.0 \
    --bucket Tanzu-Dev-Test-TKG-Backups \
    --backup-location-config region=cohesity,s3ForcePathStyle="true",s3Url=https://s3.maskedexample.com:3000 \
    --secret-file velero/credentials-velero \
    --use-volume-snapshots=false \
    --namespace infra-velero \
    --cacert velero/subcacert.crt

The logs report the backup location as valid, and so does velero backup-location get.

The S3 bucket contains the files, which we can see with the AWS client or an S3 browser client, but downloading them via the Velero client does not seem to work. We are aware of s3ForcePathStyle=true and have it set. We are also aware of publicUrl, but this doesn't exist for the Cohesity S3 setup.

time="2024-12-16T21:13:53Z" level=info msg="BackupStorageLocations is valid, marking as available" backup-storage-location=infra-velero/default controller=backup-storage-location logSource="pkg/controller/backup_storage_location_controller.go:127"

We then run a simple backup:
velero backup create test-backup --include-namespaces=developer-ext-amayacitta-gill --exclude-cluster-scoped-resources=true --snapshot-volumes=false

What did you expect to happen:
We expected the backup to complete. Instead, the error below appears in the debug log. A larger sample is pasted further down, and the debug bundle is attached.

time="2024-12-16T21:13:53Z" level=error msg="Error getting backup item operations" backup=infra-velero/test-backup controller=backup-finalizer error="gzip: invalid header" error.file="/go/src/github.com/vmware-tanzu/velero/pkg/persistence/object_store.go:416" error.function=github.com/vmware-tanzu/velero/pkg/persistence.decode logSource="pkg/controller/backup_finalizer_controller.go:155"


time="2024-12-16T21:13:53Z" level=error msg="Reconciler error" Backup="{\"name\":\"test-backup\",\"namespace\":\"infra-velero\"}" controller=backup controllerGroup=velero.io controllerKind=Backup error="gzip: invalid header" error.file="/go/src/github.com/vmware-tanzu/velero/pkg/persistence/object_store.go:416" error.function=github.com/vmware-tanzu/velero/pkg/persistence.decode logSource="/go/pkg/mod/github.com/bombsimon/logrusr/v3@v3.0.0/logrusr.go:123" name=test-backup namespace=infra-velero reconcileID="\"183f5531-8938-477e-85e7-3cb58485ddd7\""
time="2024-12-16T21:13:53Z" level=debug msg="Getting Backup" backup=infra-velero/test-backup controller=backup-finalizer logSource="pkg/controller/backup_finalizer_controller.go:95"
time="2024-12-16T21:13:53Z" level=debug msg="looking for plugin in registry" backup=infra-velero/test-backup controller=backup-finalizer kind=ObjectStore logSource="pkg/plugin/clientmgmt/manager.go:141" name=velero.io/aws
time="2024-12-16T21:13:53Z" level=debug msg="creating new restartable plugin process" backup=infra-velero/test-backup command=/plugins/velero-plugin-for-aws controller=backup-finalizer kind=ObjectStore logSource="pkg/plugin/clientmgmt/manager.go:156" name=velero.io/aws
time="2024-12-16T21:13:53Z" level=debug msg="starting plugin" args="[/plugins/velero-plugin-for-aws --features= --uploader-type=kopia --log-level debug]" backup=infra-velero/test-backup cmd=/plugins/velero-plugin-for-aws controller=backup-finalizer logSource="pkg/plugin/clientmgmt/process/logrus_adapter.go:75" path=/plugins/velero-plugin-for-aws
time="2024-12-16T21:13:53Z" level=debug msg="plugin started" backup=infra-velero/test-backup cmd=/plugins/velero-plugin-for-aws controller=backup-finalizer logSource="pkg/plugin/clientmgmt/process/logrus_adapter.go:75" path=/plugins/velero-plugin-for-aws pid=52
time="2024-12-16T21:13:53Z" level=debug msg="waiting for RPC address" backup=infra-velero/test-backup cmd=/plugins/velero-plugin-for-aws controller=backup-finalizer logSource="pkg/plugin/clientmgmt/process/logrus_adapter.go:75" plugin=/plugins/velero-plugin-for-aws
time="2024-12-16T21:13:53Z" level=debug msg="Setting log level to DEBUG" backup=infra-velero/test-backup cmd=/plugins/velero-plugin-for-aws controller=backup-finalizer logSource="/go/pkg/mod/github.com/vmware-tanzu/velero@v1.15.0/pkg/plugin/framework/server.go:269" pluginName=velero-plugin-for-aws
time="2024-12-16T21:13:53Z" level=debug msg="plugin address" address=/tmp/plugin948232077 backup=infra-velero/test-backup cmd=/plugins/velero-plugin-for-aws controller=backup-finalizer logSource="pkg/plugin/clientmgmt/process/logrus_adapter.go:75" network=unix pluginName=velero-plugin-for-aws
time="2024-12-16T21:13:53Z" level=debug msg="using plugin" backup=infra-velero/test-backup cmd=/plugins/velero-plugin-for-aws controller=backup-finalizer logSource="pkg/plugin/clientmgmt/process/logrus_adapter.go:75" version=2
time="2024-12-16T21:13:53Z" level=debug msg="waiting for stdio data" backup=infra-velero/test-backup cmd=/plugins/velero-plugin-for-aws controller=backup-finalizer logSource="pkg/plugin/clientmgmt/process/logrus_adapter.go:75" pluginName=stdio
time="2024-12-16T21:13:53Z" level=debug msg="Checking if object exists" backup=infra-velero/test-backup bucket=Tanzu-Dev-Test-TKG-Backups cmd=/plugins/velero-plugin-for-aws controller=backup-finalizer key=backups/test-backup/test-backup-itemoperations.json.gz logSource="/go/src/velero-plugin-for-aws/velero-plugin-for-aws/object_store.go:303" pluginName=velero-plugin-for-aws
time="2024-12-16T21:13:53Z" level=debug msg="Object exists" backup=infra-velero/test-backup bucket=Tanzu-Dev-Test-TKG-Backups cmd=/plugins/velero-plugin-for-aws controller=backup-finalizer key=backups/test-backup/test-backup-itemoperations.json.gz logSource="/go/src/velero-plugin-for-aws/velero-plugin-for-aws/object_store.go:319" pluginName=velero-plugin-for-aws
time="2024-12-16T21:13:53Z" level=error msg="Error getting backup item operations" backup=infra-velero/test-backup controller=backup-finalizer error="gzip: invalid header" error.file="/go/src/github.com/vmware-tanzu/velero/pkg/persistence/object_store.go:416" error.function=github.com/vmware-tanzu/velero/pkg/persistence.decode logSource="pkg/controller/backup_finalizer_controller.go:155"

The following information will help us better understand what's going on:

bundle-2024-12-16-21-36-35.tar.gz

Environment:

  • Cohesity 7.1.1
  • Velero 1.15.0
  • No additional features
  • Kubernetes: v1.29.4+vmware.3-fips.1
  • Tanzu TKGS cluster on latest vSphere 8
  • VMware Photon OS/Linux
@reasonerjt
Contributor

reasonerjt commented Dec 18, 2024

@amayacitta

It looks like you're a VMware (by Broadcom) customer, so I suggest you contact VMware (by Broadcom) GSS for official support.

As for the specific problem you are seeing, I feel it MAY be a compatibility issue with Cohesity, or a network issue, that causes the .gz file not to be served correctly from Cohesity to Velero via the S3 API. Could you test Velero with another storage service such as MinIO or AWS S3?

Have you used Velero on the same storage before? If so, please provide the versions of Velero and the plugin. velero-plugin-for-aws v1.10.x and later releases use the AWS S3 SDK v2, and we have observed compatibility issues with some storage services because their "S3-compatible" APIs behave differently from AWS S3.

For more details, see: https://github.com/vmware-tanzu/velero-plugin-for-aws

reasonerjt added the "Needs info" (Waiting for information) label Dec 18, 2024
reasonerjt self-assigned this Dec 18, 2024
@amayacitta
Author

amayacitta commented Dec 18, 2024

Hi @reasonerjt - We are indeed a commercial customer and are in the process of logging a ticket. We're also logging one with Cohesity to see if there is anything they can do. I'll keep this issue updated so the fix is visible here.

For comparison, we also use Cohesity for Grafana Loki, and that works with the following connection details.

  s3: s3://xxxx:xxxx@cohesity/Tanzu-Dev-Test-TKG-Loki-Chunk
  s3forcepathstyle: false
  endpoint: https://s3.maskedexample.com:3000/

S3 on Cohesity also works for everything else we've thrown at it; only Velero has an issue. Is there anything that can be done to the Go code below to make it work?

error.file="/go/src/github.com/vmware-tanzu/velero/pkg/persistence/object_store.go:416" error.function=github.com/vmware-tanzu/velero/pkg/persistence.decode logSource="pkg/controller/backup_finalizer_controller.go:155"

I can download the files from Cohesity with the AWS CLI as well as with a vanilla S3 browser client, so it seems to be an issue unique to the Velero client. This is the first time we're trying Velero on Cohesity.

I'll try again with s3ForcePathStyle="false" as we do with Loki.

velero install \
    --provider aws \
    --plugins velero/velero-plugin-for-aws:v1.11.0 \
    --bucket Tanzu-Dev-Test-TKG-Backups \
    --backup-location-config region=cohesity,s3ForcePathStyle="false",s3Url=https://s3.maskedexample.com:3000/ \
    --secret-file velero/credentials-velero \
    --use-volume-snapshots=false \
    --namespace infra-velero \
    --cacert velero/subcacert.crt

@amayacitta
Author

amayacitta commented Dec 19, 2024

Having retried with region=cohesity,s3ForcePathStyle="false", the debug logs still show:

time="2024-12-19T13:20:17Z" level=error msg="Reconciler error" Backup="{\"name\":\"test-backup-3\",\"namespace\":\"infra-velero\"}" controller=backup controllerGroup=velero.io controllerKind=Backup error="gzip: invalid header" error.file="/go/src/github.com/vmware-tanzu/velero/pkg/persistence/object_store.go:416" error.function=github.com/vmware-tanzu/velero/pkg/persistence.decode logSource="/go/pkg/mod/github.com/bombsimon/logrusr/v3@v3.0.0/logrusr.go:123" name=test-backup-3 namespace=infra-velero reconcileID="\"ed5215f7-2350-42c2-aaca-d51481ecbe0a\""

However, velero backup describe reports the following:

  Backup Volumes:
    <error getting backup volume info: request failed: <?xml version="1.0" encoding="UTF-8"?>
  <Error>
  <Code>InvalidArgument</Code>
  <Message>Future time.</Message>
  </Error>

We are waiting for a response on the commercial ticket. Cohesity came back and see no correlated error on their end; however, I've asked for debug logging to be enabled so we can tail the logs live while we run a backup. I'll report back.

Has anyone seen the above describe error before? Again, it looks like Velero can't read from the Cohesity S3 bucket, even though it puts all the files in the bucket without a problem. It's just the GET that fails.

@reasonerjt
Contributor

reasonerjt commented Dec 20, 2024

@amayacitta

It MAY be due to compatibility issues between Cohesity S3 and the AWS S3 SDK. The fact that Cohesity doesn't print any error in its log doesn't mean there's no compatibility issue.

Loki may be using a different SDK, or may not involve gzip decoding, so you don't see the error there.

The code in velero plugin to fetch the .gz file from storage is here:
https://github.com/vmware-tanzu/velero-plugin-for-aws/blob/93949ba42c19976313dfb832251fca1c7d00d2f2/velero-plugin-for-aws/object_store.go#L330

When Velero then tries to read and decode the content, the error occurs at this call:

gzr, err := gzip.NewReader(jsongzReader)
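To make the failure mode concrete, here is a minimal sketch of how a decode step built around that call behaves. It is not the Velero source: encodeJSONGz and decodeJSONGz are hypothetical helper names, and the XML body is only an illustrative stand-in for whatever non-gzip bytes the storage might return.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"encoding/json"
	"fmt"
)

// encodeJSONGz marshals v to JSON and gzip-compresses it, producing the kind
// of payload a *.json.gz object is expected to contain.
func encodeJSONGz(v any) ([]byte, error) {
	var buf bytes.Buffer
	gzw := gzip.NewWriter(&buf)
	if err := json.NewEncoder(gzw).Encode(v); err != nil {
		return nil, err
	}
	// Close flushes the compressed data and writes the gzip trailer.
	if err := gzw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// decodeJSONGz is a hypothetical stand-in for the decode step: open a gzip
// reader over the raw bytes, then JSON-decode the decompressed stream.
func decodeJSONGz(data []byte, into any) error {
	gzr, err := gzip.NewReader(bytes.NewReader(data))
	if err != nil {
		// gzip.ErrHeader when the leading bytes are not a gzip header.
		return err
	}
	defer gzr.Close()
	return json.NewDecoder(gzr).Decode(into)
}

func main() {
	good, err := encodeJSONGz(map[string]string{"name": "test-backup"})
	if err != nil {
		panic(err)
	}
	var out map[string]string
	fmt.Println("valid payload:", decodeJSONGz(good, &out), out)

	// Illustrative non-gzip body (assumption: e.g. an XML error document).
	bad := []byte(`<?xml version="1.0" encoding="UTF-8"?><Error></Error>`)
	fmt.Println("non-gzip payload:", decodeJSONGz(bad, &out))
}
```

The second call fails inside gzip.NewReader with the same "gzip: invalid header" message seen in the logs, which suggests the bytes reaching the decoder do not start with a gzip header, whatever the reason.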

You could write some code on your side to fetch the same object and try to decode it, or ask a Cohesity developer to verify.
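One way to run that check, sketched under assumptions: the object (for example backups/test-backup/test-backup-itemoperations.json.gz) has already been downloaded to a local file with whichever client works, and the program below only inspects that file. The names hasGzipMagic and tryGunzip are made up for this sketch. It prints the leading bytes so a non-gzip body can be identified by eye.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
	"os"
)

// hasGzipMagic reports whether data begins with the gzip signature from
// RFC 1952: ID1=0x1f, ID2=0x8b, followed by CM=0x08 (deflate).
func hasGzipMagic(data []byte) bool {
	return len(data) >= 3 && data[0] == 0x1f && data[1] == 0x8b && data[2] == 0x08
}

// tryGunzip attempts a full decompression and returns the decompressed size.
func tryGunzip(data []byte) (int, error) {
	gzr, err := gzip.NewReader(bytes.NewReader(data))
	if err != nil {
		return 0, err
	}
	defer gzr.Close()
	n, err := io.Copy(io.Discard, gzr)
	return int(n), err
}

func main() {
	if len(os.Args) != 2 {
		fmt.Println("usage: gzcheck <path-to-downloaded-object>")
		return
	}
	data, err := os.ReadFile(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, "read failed:", err)
		os.Exit(1)
	}

	// Show at most the first 16 bytes, both as hex and as a quoted string.
	head := data
	if len(head) > 16 {
		head = head[:16]
	}
	fmt.Printf("size=%d first bytes=% x (%q)\n", len(data), head, head)
	fmt.Println("gzip magic present:", hasGzipMagic(data))

	if n, err := tryGunzip(data); err != nil {
		fmt.Println("decode failed:", err)
	} else {
		fmt.Println("decoded bytes:", n)
	}
}
```

If the locally downloaded file decodes cleanly while Velero still fails, the difference is more likely in how the plugin's SDK requests or receives the object than in the stored bytes themselves.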

Could you also test Velero with another storage service such as MinIO or AWS S3, if possible, to narrow down the problem?

@amayacitta
Author

@reasonerjt

Thanks for the updates, we will escalate with Cohesity to investigate a little deeper. Broadcom will also look to see if there is anything they can do on their end.

I'll keep you updated. Have a good Christmas as I doubt much will happen before ;-)
