how to deal with huge fastq files #229

Open
zhangzhen opened this issue Jul 23, 2024 · 1 comment
@zhangzhen
Nextflow adopts the scatter-gather method to process huge FASTQ files: first, split one huge FASTQ file into multiple smaller FASTQ files; then submit a job for each smaller file to the batch system; finally, merge the individual results into the sample-level result.
What is the pypiperic (pypiper-idiomatic) way to do that?
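For reference, the scatter and gather steps themselves are straightforward in plain Python, independent of any workflow manager. A minimal sketch, assuming uncompressed FASTQ with the standard 4 lines per record (function names here are illustrative, not part of pypiper):

```python
import itertools
from pathlib import Path

def split_fastq(path, records_per_chunk, out_dir):
    """Scatter: split one FASTQ into chunk files holding at most
    `records_per_chunk` records (4 lines per record)."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    chunks = []
    with open(path) as fh:
        for idx in itertools.count():
            # Take the next block of whole records for this chunk.
            lines = list(itertools.islice(fh, records_per_chunk * 4))
            if not lines:
                break
            chunk = out_dir / f"chunk_{idx:04d}.fastq"
            chunk.write_text("".join(lines))
            chunks.append(chunk)
    return chunks

def merge_results(chunk_outputs, merged_path):
    """Gather: concatenate per-chunk result files into one output."""
    with open(merged_path, "w") as out:
        for p in chunk_outputs:
            out.write(Path(p).read_text())
```

In practice you would run the per-chunk processing step on each file returned by `split_fastq` (that is the part a batch system or workflow manager handles), then call `merge_results` on the per-chunk outputs.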

@vreuter
Member

vreuter commented Jul 23, 2024

Hi @zhangzhen, pypiper wasn't really designed for partitioning and parallelism, but rather to be applied to something that's already partitioned/chunked, either naturally (e.g., biological samples) or artificially (e.g., splitting the FASTQ arbitrarily). pepkit/looper would be how you'd normally do this sort of thing (submitting a single pypiper pipeline over multiple pieces of data). @donaldcampbelljr or @nsheff may have more recent information, though, as I've not worked in depth on the project in a while.
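To illustrate the looper approach described above: once the FASTQ has been split, each chunk can be listed as its own row in a PEP sample table, and looper then submits one pipeline job per row. A hypothetical sample table sketch (the `chunk_fastq` column name and paths are illustrative, not a fixed pepkit convention):

```csv
sample_name,chunk_fastq
sampleA_chunk_0000,chunks/chunk_0000.fastq
sampleA_chunk_0001,chunks/chunk_0001.fastq
sampleA_chunk_0002,chunks/chunk_0002.fastq
```

The gather step (merging per-chunk outputs back into one sample-level result) would still be a separate step run after all per-chunk jobs finish.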
