This section evaluates trimming accuracy across different read properties: adapter presence or absence, base error rate, and adapter length. For this purpose, Atria integrates a benchmarking toolkit for read simulation and trimming analysis.
The details are described in the Atria paper.
- Atria v3.0.0
- AdapterRemoval v2.3.1
- Skewer v0.2.2
- Fastp v0.21.0
- Ktrim v1.2.1
- Atropos v1.1.29
- SeqPurge v2012_12
- Trim Galore v0.6.5
- Trimmomatic v0.39
- Cutadapt v2.8 (#3)
Twenty-one million read pairs were simulated with a uniform read length (100 bp) and with different error profiles, adapter lengths, and original insert sizes.
The baseline error profile comprises a 0.1% substitution rate, a 0.001% insertion rate, and a 0.001% deletion rate, inspired by an Illumina error profile analysis. Error profiles of 1x, 2x, 3x, 4x, and 5x the baseline were used, together with even insert sizes from 66 to 120 bp.
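The error profiles and insert sizes described above can be written out as a small sketch. This is only an illustration of the parameter grid, not Atria's simulation code; the names `BASELINE`, `scaled_profile`, `profiles`, and `insert_sizes` are hypothetical.

```python
# Baseline per-base error rates, converted from the percentages in the text.
# 0.1% = 0.001, 0.001% = 0.00001. All names here are illustrative only.
BASELINE = {
    "substitution": 0.001,
    "insertion": 0.00001,
    "deletion": 0.00001,
}


def scaled_profile(multiplier):
    """Return the baseline error profile multiplied by an integer factor."""
    return {kind: rate * multiplier for kind, rate in BASELINE.items()}


# The five error profiles: 1x through 5x the baseline.
profiles = {f"{m}x": scaled_profile(m) for m in range(1, 6)}

# Even original insert sizes from 66 to 120 bp, inclusive.
insert_sizes = list(range(66, 121, 2))

if __name__ == "__main__":
    for name, profile in profiles.items():
        print(name, profile)
    print(len(insert_sizes), "insert sizes")
```

Under these assumptions, the grid has five error profiles and 28 insert sizes.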
In this way, reads with the smallest insert size contain full-length adapters. Reads with original insert sizes of 66-98 bp contain adapters, and reads with original insert sizes of 100-120 bp are free from adapter contamination, except for a few reads with a 100 bp insert size that contain indels. For each combination of conditions, 30 thousand read pairs were simulated to reduce random error.
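The relationship between insert size and adapter contamination can be checked with a short calculation. The sketch below assumes that a read longer than its insert runs into the adapter for the remaining bases and ignores indels; the function and variable names are hypothetical. The last lines also back out how many adapter lengths the totals imply, which is an inference from the numbers in this section rather than something stated in it.

```python
READ_LENGTH = 100  # uniform simulated read length in bp


def adapter_bases_in_read(insert_size, read_length=READ_LENGTH):
    """Number of read bases that fall past the insert (indels ignored)."""
    return max(0, read_length - insert_size)


insert_sizes = range(66, 121, 2)

# Insert sizes whose reads contain adapter sequence under this model.
with_adapter = [s for s in insert_sizes if adapter_bases_in_read(s) > 0]
without_adapter = [s for s in insert_sizes if adapter_bases_in_read(s) == 0]

# Consistency check on the totals given in the text.
TOTAL_PAIRS = 21_000_000
PAIRS_PER_CONDITION = 30_000
N_ERROR_PROFILES = 5

n_conditions = TOTAL_PAIRS // PAIRS_PER_CONDITION
# Inferred, not stated in this section: adapter lengths per grid cell.
implied_adapter_lengths = n_conditions // (N_ERROR_PROFILES * len(insert_sizes))

if __name__ == "__main__":
    print("with adapter:", with_adapter[0], "to", with_adapter[-1])
    print("without adapter:", without_adapter[0], "to", without_adapter[-1])
    print("conditions:", n_conditions)
    print("implied adapter lengths:", implied_adapter_lengths)
```

With a 66 bp insert, 34 read bases lie past the insert, which leaves room for a full-length adapter.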
Figure 1 Adapter trimming accuracy on adapter presence and absence, different base errors, and adapter lengths (Interactive plots can be downloaded here)
A1, B1, and C1 show statistics for reads with adapter contamination, while A2, B2, and C2 show statistics for reads without adapters.
A1 and A2 show the accumulated rates of accurate trim, one bp over trim, one bp under trim, multiple bp over trim, and multiple bp under trim.
B1 and B2 show the trimming accuracy on different error profiles.
C1 and C2 show the trimming accuracy on different adapter lengths.
Ktrim threw an error when processing the simulated FASTQ files. Its accuracy was benchmarked using other methods in the Atria paper.
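The five trim categories shown in panels A1 and A2 can be illustrated with a minimal classifier. This is a sketch, not the analysis code in Atria's toolkit: it assumes that "over trim" means the trimmer removed more bases than it should have (the trimmed read is shorter than expected) and "under trim" means it removed fewer. The names `classify_trim` and `trim_rates` are hypothetical.

```python
from collections import Counter


def classify_trim(expected_length, observed_length):
    """Classify one trimmed read by comparing its length with the truth.

    Assumption: a positive difference (expected minus observed) means too
    many bases were removed, i.e. an over-trim.
    """
    diff = expected_length - observed_length
    if diff == 0:
        return "accurate"
    if diff == 1:
        return "1 bp over-trim"
    if diff == -1:
        return "1 bp under-trim"
    return "multiple bp over-trim" if diff > 1 else "multiple bp under-trim"


def trim_rates(length_pairs):
    """Return the fraction of reads in each category.

    length_pairs is an iterable of (expected_length, observed_length).
    """
    counts = Counter(classify_trim(e, o) for e, o in length_pairs)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {category: n / total for category, n in counts.items()}


if __name__ == "__main__":
    # Made-up example lengths, for illustration only.
    example = [(66, 66), (66, 65), (66, 67), (66, 60), (66, 100)]
    print(trim_rates(example))
```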
Figure 2 Benchmark of adapter-trimming speed for uncompressed and compressed files on different threading options (Interactive plots can be downloaded here)
The simulated paired-end data with a 100 bp read length was trimmed in both uncompressed and compressed formats using up to 32 threads. Speed is the ratio of the number of bases to the elapsed time (wall time). SeqPurge does not support uncompressed output, so it is not shown in the uncompressed benchmark. When trimming compressed data, the speed of AdapterRemoval, Skewer, Fastp, Atropos, and Trimmomatic remained constant as the number of threads increased from 4 to 32, so we benchmarked those trimmers using only 1, 2, and 4 threads. Ktrim does not support compressed output, so it is not shown in the compressed benchmark.
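The speed metric defined above (bases divided by wall time) can be sketched as follows. The helper names and the example numbers are hypothetical and are not measurements from the benchmark.

```python
def total_bases(n_read_pairs, read_length):
    """Bases in a paired-end dataset: two reads per pair."""
    return 2 * n_read_pairs * read_length


def trimming_speed(n_bases, wall_seconds):
    """Speed as defined in the text: bases processed per second of wall time."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return n_bases / wall_seconds


if __name__ == "__main__":
    # Illustrative numbers only: 1 million pairs of 100 bp reads in 50 s.
    bases = total_bases(1_000_000, 100)
    print(f"{trimming_speed(bases, 50.0):.3e} bases/s")
```

Wall time, rather than CPU time, is the relevant denominator here because the comparison is across different thread counts.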