how to deal with huge fastq files #229

Open
zhangzhen opened this issue Jul 23, 2024 · 1 comment
@zhangzhen
Nextflow adopts the scatter-gather method to process huge FASTQ files: first, split one huge FASTQ file into multiple smaller FASTQ files; then submit a job for each smaller file to the batch system; finally, merge the individual results into the sample-level result.
What is the pypiperic (pypiper-idiomatic) way to do that?
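For reference, the scatter and gather steps themselves are straightforward in plain Python, independent of any workflow manager. A minimal sketch, assuming uncompressed FASTQ with the standard 4 lines per record (function names here are illustrative, not part of pypiper):

```python
import itertools
from pathlib import Path

def split_fastq(path, records_per_chunk, out_dir):
    """Scatter: split one FASTQ into chunk files holding at most
    `records_per_chunk` records (4 lines per record)."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    chunks = []
    with open(path) as fh:
        for idx in itertools.count():
            # Take the next block of whole records for this chunk.
            lines = list(itertools.islice(fh, records_per_chunk * 4))
            if not lines:
                break
            chunk = out_dir / f"chunk_{idx:04d}.fastq"
            chunk.write_text("".join(lines))
            chunks.append(chunk)
    return chunks

def merge_results(chunk_outputs, merged_path):
    """Gather: concatenate per-chunk result files into one output."""
    with open(merged_path, "w") as out:
        for p in chunk_outputs:
            out.write(Path(p).read_text())
```

In practice you would run the per-chunk processing step on each file returned by `split_fastq` (that is the part a batch system or workflow manager handles), then call `merge_results` on the per-chunk outputs.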

@vreuter
Member

vreuter commented Jul 23, 2024

Hi @zhangzhen, pypiper wasn't really designed for partitioning and parallelism, but rather to be applied to something that's already partitioned/chunked, either naturally (e.g., biological samples) or artificially (e.g., splitting the FASTQ arbitrarily). pepkit/looper would be how you'd normally do this sort of thing (submitting a single pypiper pipeline over multiple pieces of data). @donaldcampbelljr or @nsheff may have more recent information, though, as I've not worked in depth on the project in a while.
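To illustrate the looper approach described above: once the FASTQ has been split, each chunk can be listed as its own row in a PEP sample table, and looper then submits one pipeline job per row. A hypothetical sample table sketch (the `chunk_fastq` column name and paths are illustrative, not a fixed pepkit convention):

```csv
sample_name,chunk_fastq
sampleA_chunk_0000,chunks/chunk_0000.fastq
sampleA_chunk_0001,chunks/chunk_0001.fastq
sampleA_chunk_0002,chunks/chunk_0002.fastq
```

The gather step (merging per-chunk outputs back into one sample-level result) would still be a separate step run after all per-chunk jobs finish.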
