A GitHub Action to annotate problematic sequences in GenBank files.
dna-annotate is a GitHub Action that takes three inputs: the path of an input directory, a regex pattern that selects GenBank files (or any other file name pattern of interest) within it, and a directory where the output will be written. The action uses these inputs to annotate the problematic parts of each matching sequence.
Currently, dna-annotate attempts to find and annotate:
- Repetitions greater than 10 base pairs in length
- Hairpins
- Homopolymers
- Most common restriction binding sites
If you have an idea for a feature that would make this action better, please feel free to create an issue.
Every argument is required.
Option | Description | Default |
---|---|---|
input-dir | Directory from which the input GenBank files are read | input |
input-pattern | Regex used to filter files in the input directory | .*\.\(gb\|gbk\) |
output-dir | Directory to which the annotated GenBank files are written | output |
input-dir is the path of the directory containing the GenBank files to read and annotate. You can use this parameter to set up different pipelines for different folders, so a project can be divided into folders that each go through a different process. By default the action uses input as the input directory.
Default: input
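As a minimal sketch of the per-folder setup described above, two steps can point the action at different directories. The folder names `parts`, `plasmids`, `annotated/parts`, and `annotated/plasmids` are hypothetical placeholders chosen for illustration; only the three input names and the default pattern come from this README.

```yaml
# Sketch only: the directory names below are assumptions, not part of the action.
steps:
  - name: annotate parts
    uses: Open-Science-Global/dna-annotate@v0.6.1
    with:
      input-dir: parts                      # hypothetical folder
      input-pattern: '.*\.\(gb\|gbk\)'      # default pattern from this README
      output-dir: annotated/parts           # hypothetical folder

  - name: annotate plasmids
    uses: Open-Science-Global/dna-annotate@v0.6.1
    with:
      input-dir: plasmids                   # hypothetical folder
      input-pattern: '.*\.\(gb\|gbk\)'
      output-dir: annotated/plasmids        # hypothetical folder
```

The pattern is single-quoted so that YAML passes the backslashes through unchanged.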
input-pattern is a regex pattern, using re2 syntax, that filters files within input-dir. Even inside a given input directory, you can therefore select a specific file or group of files for the current job. By default the action matches files with GenBank extensions (.gb or .gbk).
Example: Match only BBF10k-prefixed files, freegene 10k gene project parts.
Default: .*\.\(gb\|gbk\)
output-dir is the path of the directory where the annotated sequences are written as GenBank files. By default the action uses output as the output directory.
Default: output
Basic:
    - name: dna-annotate
      uses: Open-Science-Global/dna-annotate@v0.6.1
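For context, the basic step above might sit in a complete workflow like the sketch below, with all three inputs written out using the defaults from the options table. The workflow name, the `push` trigger, the `ubuntu-latest` runner, and the checkout step are assumptions made for illustration; check action.yml for what the action actually requires.

```yaml
# Hypothetical workflow file, for example .github/workflows/annotate.yml
name: annotate-sequences        # assumed name
on: [push]                      # assumed trigger

jobs:
  annotate:
    runs-on: ubuntu-latest      # assumed runner
    steps:
      # Check out the repository so the input directory is available.
      - uses: actions/checkout@v4

      - name: dna-annotate
        uses: Open-Science-Global/dna-annotate@v0.6.1
        with:
          input-dir: input
          input-pattern: '.*\.\(gb\|gbk\)'
          output-dir: output
```

The annotated files land in the output directory inside the runner's workspace. Persisting them (committing them back or uploading them as artifacts) would need an extra step that is not shown here.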
See action.yml for a comprehensive list of all the options.
See Friendzymes Cookbook for further examples and sample data.