Samtools Customview Fails with Unavoidable Segmentation Faults #199

ieres-amgen · 2023-05-31T18:29:53Z

Description of the bug

The "SAMTOOLS_CUSTOMVIEW" process fails on some files due to a segmentation fault, with a core dump, suggesting a memory issue. On my most recent run I allocated 768 GB RAM with 96 CPUs to the task, and it still failed. The input file is fairly large but nothing absurd for a FASTQ, ~15 GB. On slack Harshil suggested that the command may crash due to a lot of piping and offered some ideas to resolve:

Splitting up that command to not use as many pipes
Picard CollectMultipleMetrics to get the fragment length distribution
Converting from BED -> BAM and then scraping, or using a simple pysam parsing script.

Thanks!

Command used and terminal output

No response

Relevant files

No response

System information

No response

chris-cheshire · 2023-06-15T15:25:11Z

Hi @ieres-amgen, how technically minded are you? Are you able to test/implement any of these as a feature?

ieres-amgen · 2023-06-21T21:20:03Z

Hi @chris-cheshire, I am not particularly fluent with nextflow, but I did try splitting the pipe up to ascertain the cause of the break, and found that the samtools view bit completes successfully, whereas a core dump/seg fault is encountered that breaks the pipe when passed to sort -T '.'. I will try splitting this command up further (to essentially use no piping) and see if that can resolve. If it still doesn't, I think I would feel more confident trying to implement an nf module of Picard CollectMultipleMetrics to get what's needed here, but would certainly be willing to try the third option if you think it'd be easier to implement or can provide some guidance.

ieres-amgen · 2023-06-30T21:28:52Z

I found that I was able to resolve by replacing the current process for collecting fragment lengths with the below implementation of Picard CollectInsertSizeMetrics. I note a custom process label for more memory, the setting of avail_mem, and the use of a ${prefix}.tmp file and tail -n+12 on that file to ensure the formatting of this module's ${prefix}.txt file will be correct for the FRAG_LEN_HIST downstream task (otherwise calc_frag_hist.py will encounter a ParserError):

process SAMTOOLS_CUSTOMVIEW {
    tag "$meta.id"
    label 'process_samtoolscustomview'

    conda (params.enable_conda ? "bioconda::picard=2.27.4" : null)
    container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
        'https://depot.galaxyproject.org/singularity/picard:2.27.4--hdfd78af_0' :
        'quay.io/biocontainers/picard:2.27.4--hdfd78af_0' }"

    input:
    tuple val(meta), path(bam), path(bai)

    output:
    tuple val(meta), path("*.txt"), emit: tsv
    tuple val(meta), path("*.pdf"), emit: histogram
    path  "versions.yml"          , emit: versions

    when:
    task.ext.when == null || task.ext.when

    script:
    def prefix   = task.ext.suffix ? "${meta.id}${task.ext.suffix}" : "${meta.id}"
	
	def avail_mem = 500
    """
	picard \\
        -Xmx${avail_mem}g \\
        CollectInsertSizeMetrics \\
        --INPUT $bam \\
        --OUTPUT ${prefix}.tmp \\
        --Histogram_FILE ${prefix}.pdf
		
	tail -n+12 ${prefix}.tmp > ${prefix}.txt
		
    cat <<-END_VERSIONS > versions.yml
    "${task.process}":
        picard: \$(echo \$(picard CollectInsertSizeMetrics --version 2>&1) | grep -o 'Version:.*' | cut -f2- -d:)
    END_VERSIONS
    """
	
	
	stub:
    def prefix = task.ext.prefix ?: "${meta.id}"
    """
    touch ${prefix}.pdf
    touch ${prefix}.txt
    cat <<-END_VERSIONS > versions.yml
    "${task.process}":
        picard: \$(echo \$(picard CollectInsertSizeMetrics --version 2>&1) | grep -o 'Version:.*' | cut -f2- -d:)
    END_VERSIONS
    """
}

I am not sure if there is an option in Picard to exclude the various header outputs in lines 1-11 of the OUTPUT file, or if a newer version of the software may provide a different format. But this resolved my issues. Thanks!

-Ittai

chris-cheshire · 2023-07-04T10:47:03Z

thank you for your investigation! I will try and roll this in as a feature for the next release. It won't be 3.2 though I am afraid, as this is already about to ship.

ieres-amgen added the bug Something isn't working label May 31, 2023

chris-cheshire added the enhancement Improvement for existing functionality label Jun 15, 2023

ieres-amgen closed this as completed Jun 30, 2023

chris-cheshire reopened this Jul 4, 2023

chris-cheshire self-assigned this Jul 4, 2023

chris-cheshire added the WIP Work in progress label Jul 4, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Samtools Customview Fails with Unavoidable Segmentation Faults #199

Samtools Customview Fails with Unavoidable Segmentation Faults #199

ieres-amgen commented May 31, 2023

chris-cheshire commented Jun 15, 2023

ieres-amgen commented Jun 21, 2023

ieres-amgen commented Jun 30, 2023 •

edited

Loading

chris-cheshire commented Jul 4, 2023

Samtools Customview Fails with Unavoidable Segmentation Faults #199

Samtools Customview Fails with Unavoidable Segmentation Faults #199

Comments

ieres-amgen commented May 31, 2023

Description of the bug

Command used and terminal output

Relevant files

System information

chris-cheshire commented Jun 15, 2023

ieres-amgen commented Jun 21, 2023

ieres-amgen commented Jun 30, 2023 • edited Loading

chris-cheshire commented Jul 4, 2023

ieres-amgen commented Jun 30, 2023 •

edited

Loading