
Reproducible compression #5

Open
karel-brinda opened this issue Oct 29, 2024 · 3 comments

Comments

@karel-brinda

Hello,

I think GeCo3 is fantastic work, but its applicability beyond experimental compression studies is currently zero because decompression is not reproducible. This is a real pity, because it's a great method!

Is there any chance of supporting a reproducible compression mode?

Ideally, the program should detect when decompression may be incorrect and fail in that case (unless there is some mechanism, such as an MD5 checksum, that guarantees the result is the same despite a slightly different instruction set).
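A minimal sketch of the checksum idea, assuming the MD5 digest of the original sequence is computed at compression time and stored alongside the compressed file. This is not an existing GeCo3 feature; the function names `md5_digest` and `verify_roundtrip` are hypothetical and only illustrate how a decompressor could refuse to return a mismatching result.

```python
import hashlib


def md5_digest(data: bytes) -> str:
    """Return the hex MD5 digest of a byte string (recorded at compression time)."""
    return hashlib.md5(data).hexdigest()


def verify_roundtrip(original_digest: str, decompressed: bytes) -> None:
    """Raise ValueError if the decompressed bytes do not match the stored digest.

    Hypothetical check: the digest is assumed to be shipped with the
    compressed file, so a decompressor running on a machine with a
    different instruction set can detect a divergent result and fail.
    """
    actual = md5_digest(decompressed)
    if actual != original_digest:
        raise ValueError(
            f"decompression mismatch: expected {original_digest}, got {actual}"
        )


if __name__ == "__main__":
    seq = b"ACGTACGTTTGACA"
    stored = md5_digest(seq)            # computed when compressing
    verify_roundtrip(stored, seq)       # identical output: passes silently
    try:
        verify_roundtrip(stored, b"ACGTACGTTTGACC")  # one base differs
    except ValueError as err:
        print(err)
```

Such a check only detects a divergent decompression; it does not make the arithmetic itself reproducible across platforms.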

@karel-brinda
Author

For instance, we're currently using it to compress Masked Superstrings of $k$-mer sets, specifically the superstring of the input $k$-mers, i.e., a perfect use case for GeCo3.

However, we can't share such compressed superstrings with anyone, as they likely wouldn't be decompressed correctly, so in the end we have to resort to xz2.

@pratas
Member

pratas commented Oct 29, 2024

Hi Karel,

Thank you for your kind words.
If the goal is to compress such a sequence in FASTA format without a reference, then I recommend JARVIS3.
Is that the goal?

Cheers

@karel-brinda
Author

In this use case, we always have a single sequence: the superstring.

Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants