ONT Running Scripts

This is a repository of scripts associated with 'running' / 'analysing data of' the Oxford Nanopore MinION.

I have written a total of six scripts (here to be praised and criticized).

Provided below are details and examples of usage of the scripts.

fast5-transfer-realtime.py

Throughput from the MinION is rapidly increasing, so it is often important to move the data off to a server before doing any subsequent analysis.
This script is to be run from a computer that can see both the server and the laptop that is running the MinION (often this is the laptop itself).

The script keeps searching for fast5 files and stops only once no new fast5 files have been found for 800 seconds. This timeout can be adjusted with the watch option. For a high quality run, the script will therefore keep running for the entire 48 hours.
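
The polling behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the function names, the `handle` callback and the `poll_interval` parameter are assumptions made for the example, and only the 800 second default comes from the description.

```python
import os
import time


def find_new_fast5(reads_directory, seen):
    """Return fast5 paths under reads_directory that are not yet in `seen`."""
    new_files = []
    for root, _dirs, files in os.walk(reads_directory):
        for name in files:
            path = os.path.join(root, name)
            if name.endswith(".fast5") and path not in seen:
                new_files.append(path)
    return sorted(new_files)


def watch_for_fast5(reads_directory, handle, watch=800, poll_interval=5):
    """Call handle(path) for each new fast5 file.

    Stops once no new file has appeared for `watch` seconds.
    `handle` and `poll_interval` are hypothetical names for this sketch.
    """
    seen = set()
    last_found = time.monotonic()
    while time.monotonic() - last_found < watch:
        new_files = find_new_fast5(reads_directory, seen)
        for path in new_files:
            handle(path)  # e.g. copy the file to the server
            seen.add(path)
        if new_files:
            last_found = time.monotonic()  # reset the idle timer
        else:
            time.sleep(poll_interval)
    return seen
```

Resetting the timer whenever a file is found is what lets the loop survive a full run: it only exits after a quiet period longer than the watch value.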

Dependencies:

If you're on Windows, I would recommend using Cygwin to run these commands.

Example

fast5-transfer-realtime.py --run_name e_coli_R9 --reads_directory C:/data/reads --server_directory Z:/

Output

A folder called YYYY_MM_DD_<RUN_NAME> is created within the server directory. Subsequent scripts often refer to this folder as the 'run_directory'. Inside this directory another folder called dump is created; this is where the fast5 files are placed.
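
As a rough sketch of the layout described above (not the script's actual code), the directory names could be built like this. The function name and the `today` parameter are assumptions for illustration.

```python
import datetime
import os


def make_run_directory(server_directory, run_name, today=None):
    """Create <server_directory>/YYYY_MM_DD_<run_name>/dump and return both paths."""
    if today is None:
        today = datetime.date.today()
    folder_name = "{}_{}".format(today.strftime("%Y_%m_%d"), run_name)
    run_directory = os.path.join(server_directory, folder_name)
    dump_directory = os.path.join(run_directory, "dump")
    # exist_ok lets the script be restarted on the same day without failing
    os.makedirs(dump_directory, exist_ok=True)
    return run_directory, dump_directory
```

With a run started on 13 September 2016 and the run name e_coli_R9, this gives a run_directory named 2016_09_13_e_coli_R9, matching the working directory used in the later examples.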

Future options

I hope to add ssh and ftp options to this command in the near future.

Nanonet-realtime

Depending on your default file permission settings, the files in the dump directory may not be writable. Although it is possible to tinker with these settings, it is a safe option to keep these files there as a back-up. Nanonet-realtime copies these files into a folder called 'reads', which is also a sub-directory of the run_directory. Before doing so, it places the reads in a tmp directory and performs a 1D nanonet analysis on them, exporting the result to a fasta file. The fasta file is placed in the fasta folder (a sub-directory of the run_directory or working_directory) with the following naming convention: <RUN_NAME>_1D_<posix_time>.fa
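
The fasta naming convention can be illustrated with a small helper. This is a sketch only: the function name is hypothetical, and it assumes the time field is POSIX time in whole seconds, which the description does not state.

```python
import os
import time


def fasta_path(working_directory, run_name, posix_time=None):
    """Build <working_directory>/fasta/<run_name>_1D_<posix_time>.fa."""
    if posix_time is None:
        posix_time = time.time()
    # Assumption: whole seconds since the epoch are used in the file name
    file_name = "{}_1D_{}.fa".format(run_name, int(posix_time))
    return os.path.join(working_directory, "fasta", file_name)
```

Because each batch gets its own timestamp, repeated batches produce separate fasta files rather than overwriting one another.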

If you are already in the run_directory when executing this script, the only argument you need is --run_name.

Dependencies

Nanonet (from ONT)

Examples

nanonet-realtime.py --run_name e_coli_R9 --working_directory /data/2016_09_13_e_coli_R9

Output

Multiple fasta files are created within the fasta directory. Files are moved to the reads directory.

Future options

Yet to think of anything...

Metrichor-cli-wrapper.py

The metrichor-cli can be very intimidating. Before running this script I would recommend running metrichor-cli-configuredependencies.sh. Even if you have run the configure dependencies script in the past, you will still need to run the metrichor-cli-setpaths.sh script every time you log into the server. You may add this script to your .bash_profile; however, I cannot guarantee that it won't then interfere with other programs you run in the future.

The metrichor-cli-wrapper requires two arguments: the run_directory (or working_directory) and the type of workflow you wish to run. I haven't included all of the workflows, but typing metrichor-cli --list will show all the workflow numbers, and you can update the dictionary within the script accordingly.

A log file will be exported to <run_directory>_log. If you do not specify the reads directory, it will assume the reads exist in <run_directory>_reads.

The reads will be returned into reads/downloads. You cannot change this.

Dependencies

Open up the configure-dependencies script to view what you will need. I had great difficulty installing the hdf5 modules without root permissions and have done my best to replicate what I did to get it to work. You will also need gcc version 4.9 or greater.

Example

metrichor-cli-wrapper.py --working_directory /data/2016_09_13_e_coli_R9 --workflow 2D_Basecalling

Future options

Diagnosing bugs within the metrichor code or installation scripts.

Onecodex-realtime.py

Onecodex is an online metagenomic profiler that uses a k-mer based exact alignment tool (maybe Kraken?).
This script uses the Onecodex search tool to obtain tax_ids for a sample. The script takes the fasta files within the fasta directory (generated by nanonet) and uploads them to onecodex. The output is a tab-delimited file: read_name <tab> tax_id. Because a stringent alignment is required and 1D fasta files are inaccurate, the alignment rate is still quite poor. As of September 2016, onecodex does not place any limits on using their search tool for research purposes.
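
Once the tab-delimited output exists, a quick way to summarise it is to count how many reads were assigned to each tax_id. The sketch below is not part of the repository; it assumes the file has no header row and exactly the two columns described above, and the function name is hypothetical.

```python
import collections
import csv


def count_tax_ids(lines):
    """Count reads per tax_id from lines of the form read_name<TAB>tax_id."""
    counts = collections.Counter()
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        _read_name, tax_id = row[0], row[1]
        counts[tax_id] += 1
    return counts
```

The function accepts any iterable of lines, so an open file handle can be passed directly, and `counts.most_common(10)` then lists the ten most frequent tax_ids.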

Dependencies

Python libraries: Biopython, requests and json.

Example

onecodex-realtime.py --run_name outbreak_sputum --run_directory /2019_09_13_pandemics