This repository has been archived by the owner on Sep 11, 2024. It is now read-only.

feat: Add support for setting object metadata Content-Encoding #359

Conversation

jclarysse
Contributor

Users who want to use the GCS capability to decompress gzip objects server-side when accessing them through the Storage API requested that the fixed metadata field Content-Encoding (default: null) become configurable, so that its value can be set (e.g. to gzip) when the connector uploads a new file to the bucket.
https://cloud.google.com/storage/docs/metadata#content-encoding

@jclarysse jclarysse requested review from a team as code owners April 3, 2024 12:16
@jclarysse
Contributor Author

I wasn't able to run my integration test locally, and it failed here. I'll go back to my local setup and try to fix it.

@jclarysse jclarysse marked this pull request as draft April 4, 2024 07:17
@jclarysse
Contributor Author

@jjaakola-aiven reported that the integration test passed on his local machine.

@jclarysse jclarysse force-pushed the jclarysse/config-gcs-object-encoding branch 2 times, most recently from b3750e2 to db14bad Compare June 3, 2024 07:25
@jclarysse
Contributor Author

The expected behaviour is that, for compressed blobs with metadata Content-Encoding=gzip, a downloaded object arrives uncompressed. This can easily be verified using the GCP sample code downloadFile.js.

Since the GCS connector previously only had tests based on object read, I had to add some boilerplate code to make reading from a download possible.

The new test contentEncodingAwareDownload() passes when using parameters compression=none and content-encoding=none. Unfortunately, it fails to decode the required fields when using parameters compression=gzip and content-encoding=gzip, as the bytes do not seem to be uncompressed.
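
To make the expectation concrete, here is a minimal local sketch in plain Java. It does not call GCS or the connector; `simulatedDownload` is a hypothetical stand-in that only models the behaviour described above (a stored gzip payload is returned decompressed when the object's Content-Encoding is gzip, and returned as stored otherwise).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class TranscodingSketch {

    /** Gzip-compresses the given bytes (what gets stored when compression=gzip). */
    public static byte[] gzip(byte[] plain) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(plain);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    /** Decompresses a gzip payload back to the original bytes. */
    public static byte[] gunzip(byte[] compressed) {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return gz.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /**
     * Hypothetical local model of a download, NOT a GCS API call:
     * a payload stored with Content-Encoding=gzip comes back decompressed,
     * anything else comes back byte-for-byte as stored.
     */
    public static byte[] simulatedDownload(byte[] stored, String contentEncoding) {
        return "gzip".equals(contentEncoding) ? gunzip(stored) : stored;
    }

    public static void main(String[] args) {
        byte[] plain = "some,record,fields\n".getBytes(StandardCharsets.UTF_8);
        byte[] stored = gzip(plain);

        // With the metadata set, the caller sees the original bytes.
        System.out.println(Arrays.equals(plain, simulatedDownload(stored, "gzip")));
        // Without it, the caller receives the compressed bytes unchanged.
        System.out.println(Arrays.equals(stored, simulatedDownload(stored, null)));
    }
}
```

The second case is the one that matters for the test: if the metadata is missing or ignored, the reader gets gzip bytes where it expects plain content.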

java.lang.IllegalArgumentException: Illegal base64 character 1f

I wonder if this is a limitation of Testcontainers' DatastoreEmulator.
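
For context on the exception: every gzip stream starts with the magic bytes 0x1f 0x8b, so a Base64 decoder handed raw gzip data rejects the very first byte. The sketch below reproduces that symptom with the JDK only. It assumes the test Base64-decodes fields of the downloaded object; the class name and the sample value are made up for illustration and are not taken from the connector's test code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipBase64Sketch {

    /** Gzip-compresses the given bytes. */
    public static byte[] gzip(byte[] plain) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(plain);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    /** Decompresses a gzip payload. */
    public static byte[] gunzip(byte[] compressed) {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return gz.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // A Base64-encoded field, then gzip-compressed as with compression=gzip.
        byte[] field = Base64.getEncoder().encode("hello".getBytes(StandardCharsets.UTF_8));
        byte[] compressed = gzip(field);

        // Print the two leading bytes of the compressed payload (the gzip magic number).
        System.out.printf("leading bytes: %02x %02x%n", compressed[0] & 0xff, compressed[1] & 0xff);

        // Decoding the still-compressed bytes fails on the first byte.
        try {
            Base64.getDecoder().decode(compressed);
        } catch (IllegalArgumentException e) {
            System.out.println("decoding compressed bytes failed: " + e.getMessage());
        }

        // Decompressing first makes the field decodable again.
        byte[] decoded = Base64.getDecoder().decode(gunzip(compressed));
        System.out.println(new String(decoded, StandardCharsets.UTF_8));
    }
}
```

So the error is consistent with the download path returning the object still gzip-compressed; it does not by itself show whether the emulator or the test setup is responsible.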

@jclarysse jclarysse force-pushed the jclarysse/config-gcs-object-encoding branch 5 times, most recently from 6784edb to 420c2fa Compare June 3, 2024 13:00
@jclarysse
Contributor Author

@jjaakola-aiven Thanks for your help with fixing the test so that both compression and encoding work as expected. I pushed again using your patch. Please review.

@jjaakola-aiven jjaakola-aiven marked this pull request as ready for review June 4, 2024 10:36
@jclarysse jclarysse force-pushed the jclarysse/config-gcs-object-encoding branch from 420c2fa to d93f29c Compare June 6, 2024 13:00
@jjaakola-aiven jjaakola-aiven merged commit a303f93 into Aiven-Open:main Jun 6, 2024
4 checks passed