Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix broken MD5 hash cache. #1313

Closed
wants to merge 2 commits into from
Closed

Conversation

Melraidin
Copy link
Contributor

This change fixes issues causing no matches to occur for existing objects at S3 in some cases. This issue caused all versions of a package in a repo to be uploaded again to S3.

Lookups in the cache of file MD5s were failing as the key in the map had its prefix stripped while the key used for the lookup did not have the prefix stripped. This change stops stripping the prefix when building the cache.

Retrieving object MD5s from object metadata (as opposed to the etag that comes with the object list) were failing as the key in the metadata map is now "md5" (all lowercase) vs. "Md5" that was used pre-v2 AWS SDK.

Checklist

  • unit-test added (if change is algorithm)
  • functional test added/updated (if change is functional)
  • man page updated (if applicable)
  • bash completion updated (if applicable)
  • documentation updated
  • author name in AUTHORS

@neolynx neolynx self-assigned this Jul 14, 2024
@neolynx neolynx added the fix tests Tests are failing label Jul 15, 2024
@neolynx
Copy link
Member

neolynx commented Jul 15, 2024

I wonder why the prefix was added on one side only, and what it's purpose is. If the prefix somehow makes sense, maybe we should add it in the missing part ?

The tests are now complaining about the missing prefix, so they should be fixed (no more prefix) if we decide to drop the prefix...

@neolynx
Copy link
Member

neolynx commented Aug 13, 2024

the following system tests fail:

  • t06_publish: S3Publish2Test t06_publish.s3: publish to S3: publish update removed some packages
  • t06_publish: S3Publish3Test t06_publish.s3: publish to S3: publish switch - removed some packages
  • t06_publish: S3Publish5Test t06_publish.s3: publish to S3: publish drop - component cleanup
  • t06_publish: S3Publish6Test t06_publish.s3: publish to S3: publish update removed some packages with SSE AES256

in order to test, create a aws.creds file in the git root (don't add it to git):

export AWS_ACCESS_KEY_ID=123452
export AWS_SECRET_ACCESS_KEY=asdfghrew

and run: make docker-system-tests
(after creating the dev docker with make docker-build-aptly-dev)

@neolynx
Copy link
Member

neolynx commented Aug 13, 2024

the following unit tests fail:

  • FAIL: public_test.go:130: PublishedStorageSuite.TestFilelist
  • FAIL: public_test.go:153: PublishedStorageSuite.TestFilelistPlusWorkaround
  • FAIL: public_test.go:218: PublishedStorageSuite.TestRemoveDirs
  • FAIL: public_test.go:234: PublishedStorageSuite.TestRemoveDirsPlusWorkaround

run unit tests locally with: make docker-unit-tests

@neolynx
Copy link
Member

neolynx commented Sep 26, 2024

I rebased the PR on master...

could you maybe help fixing the tests ?

thanks in advance !

Kevin Martin added 2 commits October 1, 2024 14:37
This change fixes issues causing no matches to occur for existing
objects at S3 in some cases. This issue caused all versions of a
package in a repo to be uploaded again to S3.

Lookups in the cache of file MD5s were failing as the key in the map
had its prefix stripped while the key used for the lookup did not have
the prefix stripped. This change stops stripping the prefix when
building the cache.

Retrieving object MD5s from object metadata (as opposed to the etag
that comes with the object list) were failing as the key in the
metadata map is now "md5" (all lowercase) vs. "Md5" that was used
pre-v2 AWS SDK.
@neolynx
Copy link
Member

neolynx commented Oct 2, 2024

I was looking at the failing S3Publish2Test. Not removing the prefix when adding the path in internalFilelist results in files not removed from S3. So there must be another place where the a prefix is removed, and a comparison later fails.

the md5 field seems to be only used for encrypted S3, where I could not test it...

if you are still interested to contribute and get your issue fixed, it would be great if you could provide a failing example of what you try to fix...

@neolynx
Copy link
Member

neolynx commented Oct 2, 2024

looking into the Md5 in metadata, this is not a AWS field, as it seems, but our own data, which is uploaded 20 lines below that, so I think this is correct...

@neolynx
Copy link
Member

neolynx commented Oct 2, 2024

closing, too many tests failing, not clear what needs to be fixed.

please reopen with more details...

@neolynx neolynx closed this Oct 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants