ONT_ARTIC_Summaries

To install:

Clone this repository:

git clone https://github.com/LooseLab/ONT_ARTIC_Summaries

Then:

cd ONT_ARTIC_Summaries
conda env create -f environment.yml

To activate:

conda activate ONT_ARTIC_Summaries

To run:

python simple_stats.py -o output.file -c NOTTS sequencing_summary_file1.txt sequencing_summary_file2.txt

You can optionally use wildcards to find sequencing summary files:

python simple_stats.py -o output.file -c NOTTS sequencing_summary*.txt

Note: if the output file already exists, data will be appended to it.
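Because output is appended, running the script twice over the same inputs into the same output file may leave repeated rows. The following is a minimal sketch, not part of simple_stats.py, for removing exact duplicate lines from an output file while keeping the first occurrence of each. The function names `drop_duplicate_lines` and `dedupe_file` are hypothetical, and the sketch makes no assumption about whether the script rewrites the header on each append: a repeated header line would simply be dropped like any other duplicate.

```python
def drop_duplicate_lines(lines):
    """Return lines with exact duplicates removed, keeping first occurrences in order."""
    seen = set()
    unique = []
    for line in lines:
        if line not in seen:
            seen.add(line)
            unique.append(line)
    return unique


def dedupe_file(path):
    """Rewrite the file at `path` without duplicate lines; return how many were removed."""
    with open(path, newline="") as handle:
        lines = handle.read().splitlines()
    unique = drop_duplicate_lines(lines)
    with open(path, "w", newline="") as handle:
        handle.writelines(line + "\n" for line in unique)
    return len(lines) - len(unique)
```

Calling `dedupe_file("output.file")` rewrites the file in place, so keep a copy of the original if you are unsure.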

Help:

python simple_stats.py -h
usage: simple_stats.py [-h] [-o FILE] [-c STRING] FILE [FILE ...]

positional arguments:
  FILE                  One or many input files. These must be ONT sequencing
                        summary files as called by MinKNOW or Guppy with
                        barcoding options set. Reads demultiplexed with
                        PoreChop will not work. The files can be optionally
                        compressed.

optional arguments:
  -h, --help            show this help message and exit
  -o FILE, --output FILE
                        File name to write out the summary stats.
  -c STRING, --centre STRING
                        Sequencing Center Generating this Data.

Expected output is a TSV file:

sequencing_centre       run_id  experiment_id   sample_id       flowcell_id     run_time        pore_count      barcode_count   passes_filtering        barcode_arrangement     yield   read_count      min_length      max_length      mean_length     median_length   std_length
NOTTS   d00b486adc2ea7266fc2d4549d293d6fb22638f1        CV      CV094_24_M1     FAN43036        17069.68        1345    25      False   barcode05       189458  358     254     926     529.2122905027933       500.5   95.17517191957664
NOTTS   d00b486adc2ea7266fc2d4549d293d6fb22638f1        CV      CV094_24_M1     FAN43036        17069.68        1345    25      False   barcode06       230839  448     225     1555    515.265625      502.0   87.33024300710545
...
NOTTS   d00b486adc2ea7266fc2d4549d293d6fb22638f1        CV      CV094_24_M1     FAN43036        17069.68        1345    25      True    barcode05       42963045        83875   164     1870    512.2270640834575       507.0   57.61899716876888
NOTTS   d00b486adc2ea7266fc2d4549d293d6fb22638f1        CV      CV094_24_M1     FAN43036        17069.68        1345    25      True    barcode06       50445170        98387   187     1666    512.7219043166272       509.0   47.506330979971004
...

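The file contains one row per barcode for each pass/fail dataset, so every barcode appears once with passes_filtering set to False and once with it set to True. A run with 96 barcodes will generate 194 rows.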
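To check an output file against the expected row count, the rows can be tallied by their passes_filtering value. This is a minimal sketch using only the column names shown in the example output above; the function name `count_pass_fail_rows` is hypothetical. It skips any line that repeats the header, in case appending has written the header more than once (an assumption, not confirmed behaviour of the script).

```python
import csv
from collections import Counter


def count_pass_fail_rows(path):
    """Count data rows in the summary TSV, grouped by the passes_filtering value."""
    counts = Counter()
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        for row in reader:
            # Guard against a header line repeated further down the file.
            if row["barcode_arrangement"] == "barcode_arrangement":
                continue
            counts[row["passes_filtering"]] += 1
    return counts
```

For example, `count_pass_fail_rows("output.file")` returns a `Counter` keyed by the strings `"True"` and `"False"`; the sum of its values is the total number of data rows.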

These data will be used to analyse yield and performance of various ARTIC protocols. The script anticipates sequencing summary files from MinKNOW/Guppy running with barcoding enabled. It will skip any files that do not contain barcoded data.
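If you want to check in advance which files are likely to be skipped, one approach is to look for a barcode column in each file's header. The sketch below is not taken from simple_stats.py and may not match how the script decides. It assumes the summary files are tab-separated, that barcoded files carry a `barcode_arrangement` column (the name used in the output above), and that compressed files are gzip. The function names are hypothetical.

```python
import gzip


def open_maybe_gzip(path):
    """Open a text file, transparently handling gzip (detected by its magic bytes)."""
    with open(path, "rb") as probe:
        magic = probe.read(2)
    if magic == b"\x1f\x8b":
        return gzip.open(path, "rt")
    return open(path, "rt")


def has_barcode_column(path, column="barcode_arrangement"):
    """Return True if the first line of the file lists `column` as a tab-separated field."""
    with open_maybe_gzip(path) as handle:
        header = handle.readline().rstrip("\r\n").split("\t")
    return column in header
```

A file for which `has_barcode_column(path)` returns False has no barcode column in its header and would not be expected to contribute rows.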

Once you have collated the data from your centre, please either send it to Matt Loose or upload it to the shared Google Doc.
