From 775134b6fef6bc96c2b53b8d9274cc9661fbc167 Mon Sep 17 00:00:00 2001
From: Ulthran
Date: Fri, 18 Oct 2024 14:40:52 -0400
Subject: [PATCH 1/3] Add LTP files to .gitignore

---
 .gitignore | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/.gitignore b/.gitignore
index 21c3332..7551ea0 100644
--- a/.gitignore
+++ b/.gitignore
@@ -1,3 +1,7 @@
+# LTP refs
+LTP_*.csv
+LTP_*.fasta
+
 # Vsearch databases
 *.udb
 
From d2fc6974125d127ee392d10f1d1c0406f06bea20 Mon Sep 17 00:00:00 2001
From: Ulthran
Date: Fri, 18 Oct 2024 14:57:54 -0400
Subject: [PATCH 2/3] Add usage sections

---
 README.md | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/README.md b/README.md
index 1187299..fd25a7a 100644
--- a/README.md
+++ b/README.md
@@ -94,6 +94,20 @@ that the output directory will be in the same directory as `my_sequences.fasta`.
 Please see the output of `unassign --help` for a list of the available
 options.
 
+### Trim ragged
+
+
+
+### Count mismatches
+
+
+
+### Percent ID ANI sample
+
+
+
+Should there also be a command and section for prepare_strain_data?
+
 ## Contributing
 
 We welcome ideas from our users about how to improve this
From 287828d39b64dd55f5142a64c806f6a3d1bc0a26 Mon Sep 17 00:00:00 2001
From: Ulthran
Date: Fri, 18 Oct 2024 16:02:45 -0400
Subject: [PATCH 3/3] Add trimragged docs

---
 README.md | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/README.md b/README.md
index fd25a7a..fdb58ff 100644
--- a/README.md
+++ b/README.md
@@ -96,7 +96,24 @@ options.
 
 ### Trim ragged
 
+The `trimragged` program takes in a query sequence to search and trim and an input fasta file (or it can read from stdin):
+```bash
+trimragged AGAGTTTGATCCTGGCTCAG --input_file my_sequences.fasta
+```
+
+Trimragged is included to extract different regions from the full length 16S rRNA gene. The purpose of this auxiliary software is to account for the full length 16S rRNA sequences where only a part of the primer is present in the sequence.
This can result from low quality at the beginning or end of a sequence, caused by limitations of sequencing platforms.
+
+The software operates in three steps: 1) matching the full length of the primer, 2) matching a partial primer, 3) aligning reads to other sequences with a known primer location. The primer sequence to search for and trim is required. Only one primer is accepted at a time, so the user needs to run the software twice, once with each primer sequence.
+
+Step 1: The software first searches for the full-length primer sequence. If mismatches are allowed, the software expands the primer into a list of all possible mutated sequences and searches for each. Once a hit is found, the start and end indices are stored as a `PrimerMatch` object.
+
+Step 2: If the `min_partial` argument is greater than 0, the software then searches the remaining sequences for partial matches of the primer. It builds a list of candidate partial primers by removing nucleotides from the beginning of the primer sequence until the minimum length specified by `min_partial` is reached, then searches for each candidate. Once a hit is found, the start and end indices are stored as a `PrimerMatch` object.
+
+Step 3: The last step builds a database from the sequences whose primers were identified in the previous two steps. The remaining reads are then aligned against this database of sequences with known primer locations using vsearch. Once a hit is found, the positions of the primers are estimated by extending the aligned region.
+
+Please see the output of `trimragged --help` for a list of the available
+options.
 
 ### Count mismatches