A place to store custom and forked scripts used for genomic analysis; the list grows slowly as new needs come up.
A reusable script that wraps the steps provided by ALLMAPS to identify and split chimeric contigs.
Sorts and indexes a BAM file and removes unmapped reads. Pass the number of threads as the second argument to run multithreaded.
It took me forever to get blasr/sparc installed and running correctly for hybrid genome assemblies. After finally getting it to work, I vowed never to deal with it again, so this script makes the tweaks needed to get `sparc_split_and_run.sh` working correctly from your `$PATH`. Deprecated since I added PRs to the DBG2OLC repo.
Simple isolation of contigs below a specified sequence coverage threshold. Typically used on the `genome.file` output from dDocent's FreeBayes step when FreeBayes crashes from excessive memory use because the de novo assembly has too many contigs. The output is usually fed into faSomeRecords to "prune" low-coverage contigs from the de novo assembly.
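A minimal Python sketch of the filtering step, assuming (hypothetically) a tab-separated input with the contig name in the first column and its coverage in the second; the real `genome.file` layout and this script's exact cutoff rule may differ. The function name `contigs_below` is illustrative. It emits one contig name per line, which is the list-file format faSomeRecords reads.

```python
from typing import Iterable, List


def contigs_below(lines: Iterable[str], threshold: float) -> List[str]:
    """Return names of contigs whose coverage is strictly below `threshold`.

    Hypothetical input format: one contig per line, "name<TAB>coverage".
    """
    low = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, cov = line.split("\t")[:2]
        if float(cov) < threshold:
            low.append(name)
    return low


if __name__ == "__main__":
    example = ["contig_1\t120", "contig_2\t3", "contig_3\t8.5"]
    # One name per line, suitable as a record list for pruning
    print("\n".join(contigs_below(example, 10)))
```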
Simple wrapper for SAMtools that counts the total number of reads and the number of mapped reads in BAM files.
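The mapped/total distinction comes down to the SAM FLAG field: a record is unmapped when bit `0x4` is set. With SAMtools this corresponds to `samtools view -c` (all records) versus `samtools view -c -F 4` (records without the unmapped bit), though whether this wrapper uses exactly those calls is an assumption. The sketch below applies the same logic to SAM-formatted text; `count_reads` is a hypothetical name.

```python
from typing import Iterable, Tuple

UNMAPPED = 0x4  # SAM FLAG bit: segment unmapped


def count_reads(sam_lines: Iterable[str]) -> Tuple[int, int]:
    """Count (total, mapped) alignment records in SAM-formatted text.

    Header lines (starting with '@') are skipped. Like `samtools view -c`,
    this counts records, so secondary/supplementary alignments are included.
    """
    total = mapped = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])  # FLAG is the 2nd SAM column
        total += 1
        if not flag & UNMAPPED:
            mapped += 1
    return total, mapped
```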
Takes an input file of strings (such as 6 bp indices) and performs an all-vs-all comparison to count the mismatches between each pair of indices. Outputs an HTML heatmap and a text file of the pairwise comparisons.
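The core comparison is a position-by-position mismatch count (Hamming distance) over every pair of indices. A minimal sketch, assuming all indices have equal length and that self-comparisons can be skipped; the HTML heatmap is omitted, and `mismatches`/`all_vs_all` are illustrative names.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def mismatches(a: str, b: str) -> int:
    """Count position-by-position mismatches (Hamming distance)."""
    if len(a) != len(b):
        raise ValueError(f"length mismatch: {a!r} vs {b!r}")
    return sum(x != y for x, y in zip(a, b))


def all_vs_all(indices: List[str]) -> Dict[Tuple[str, str], int]:
    """Mismatch count for every unordered pair of indices."""
    return {(a, b): mismatches(a, b) for a, b in combinations(indices, 2)}


if __name__ == "__main__":
    idx = ["ACGTAC", "ACGTTC", "TTGCAA"]
    # Tab-separated pairwise table, similar in spirit to the text output
    for (a, b), n in all_vs_all(idx).items():
        print(f"{a}\t{b}\t{n}")
```

Pairs with few mismatches (for example, 1) are the ones most at risk of being confused after sequencing errors.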
Iteratively performs the first steps of the Jellyfish k-mer counting method.
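Assuming "the first steps" means counting k-mers (`jellyfish count`, here with canonical k-mers as with its `-C` option) and building a frequency histogram (`jellyfish histo`), the pure-Python sketch below shows what those two outputs contain for small inputs. It is a conceptual illustration, not Jellyfish's hash-based implementation, and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable

_COMP = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(_COMP)[::-1]
    return min(kmer, rc)


def count_kmers(seqs: Iterable[str], k: int) -> Counter:
    """Count canonical k-mers across all sequences."""
    counts: Counter = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGT"):  # skip k-mers containing N etc.
                counts[canonical(kmer)] += 1
    return counts


def histogram(counts: Counter) -> Dict[int, int]:
    """Map multiplicity -> number of distinct k-mers seen that many times."""
    return dict(sorted(Counter(counts.values()).items()))
```

Running `count_kmers` and `histogram` in a loop over several values of `k` mirrors, at toy scale, the kind of iteration the script automates.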
For those times you forget the command to export (and strip the prefix from) your current conda environment to a YAML file. Use `condadeps` to list only the manually (explicitly) installed programs.
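`conda env export` writes a machine-specific top-level `prefix:` line; a common shell one-liner is `conda env export | grep -v "^prefix: " > environment.yml`, and conda's own `--from-history` flag limits the export to explicitly requested packages. Whether this script uses either form is not stated, so the Python sketch below is only an illustration of the prefix stripping; `strip_prefix` and `export_env` are hypothetical names, and `export_env` assumes `conda` is on your `PATH`.

```python
import subprocess


def strip_prefix(yaml_text: str) -> str:
    """Drop the machine-specific top-level `prefix:` line from an exported env."""
    kept = [ln for ln in yaml_text.splitlines() if not ln.startswith("prefix:")]
    return "\n".join(kept) + "\n"


def export_env(path: str = "environment.yml") -> None:
    """Export the active conda environment to `path` without its prefix line."""
    # Requires conda on PATH; check=True raises if the export fails.
    out = subprocess.run(
        ["conda", "env", "export"], capture_output=True, text=True, check=True
    ).stdout
    with open(path, "w") as fh:
        fh.write(strip_prefix(out))
```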
A convenience wrapper to run fastStructure analyses for K values from `1` to `k`, then summarize all the marginal likelihoods into a single file.
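Each run of fastStructure's `structure.py` takes one value via `-K`, so the wrapper presumably loops over K and then collects one number per run. The summarizing step can be sketched as below, assuming (unverified against any particular fastStructure version) that each run's log contains a line like `Marginal Likelihood = <value>`. The regex, `marginal_likelihood`, and `summarize` are illustrative, not the script's actual code.

```python
import re
from typing import Dict, Optional

# Assumed log line shape; check it against your fastStructure output.
_ML = re.compile(r"Marginal Likelihood\s*=\s*(-?[0-9.]+(?:[eE][-+]?[0-9]+)?)")


def marginal_likelihood(log_text: str) -> Optional[float]:
    """Return the last marginal likelihood reported in a log, if any."""
    hits = _ML.findall(log_text)
    return float(hits[-1]) if hits else None


def summarize(logs: Dict[int, str]) -> str:
    """Tab-separated K / marginal-likelihood table, sorted by K."""
    rows = ["K\tmarginal_likelihood"]
    for k in sorted(logs):
        ml = marginal_likelihood(logs[k])
        rows.append(f"{k}\t{'NA' if ml is None else ml}")
    return "\n".join(rows) + "\n"
```

Reporting `NA` instead of raising keeps one failed K from hiding the results of the others.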
Parallelized unzipping of .gz files from one directory into another. Can process an entire directory, or only files whose names contain a specific string, such as `lobster`, `_R1_`, `britneyspears`, etc.
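A minimal Python sketch of the same idea: select `*.gz` files whose names contain a substring (an empty string selects all) and decompress them concurrently into a destination directory. It uses a thread pool for simplicity; the original script's parallelization tool and argument order are not stated, and `gunzip_dir` is a hypothetical name.

```python
import gzip
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import List


def _gunzip(src: Path, dest_dir: Path) -> Path:
    """Decompress one .gz file into dest_dir, dropping the .gz suffix."""
    out = dest_dir / src.name[:-3]
    with gzip.open(src, "rb") as fin, open(out, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return out


def gunzip_dir(src_dir: str, dest_dir: str, pattern: str = "",
               workers: int = 4) -> List[Path]:
    """Decompress every *.gz in src_dir whose name contains `pattern`."""
    src, dest = Path(src_dir), Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    files = sorted(p for p in src.glob("*.gz") if pattern in p.name)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda p: _gunzip(p, dest), files))
```

For example, `gunzip_dir("raw", "unzipped", "_R1_")` would decompress only the read-1 files; a process pool may scale better for many large files.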
Returns the reverse, complement, or reverse-complement of DNA bases in a text file.
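The three operations reduce to string reversal and a base-for-base translation table. A minimal sketch, assuming only A, C, G, T, and N (either case) need complementing; other characters, such as IUPAC ambiguity codes, pass through unchanged here, which may differ from the script. The function names are illustrative.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse(seq: str) -> str:
    """Reverse the sequence without complementing."""
    return seq[::-1]


def complement(seq: str) -> str:
    """Complement each base, preserving case."""
    return seq.translate(_COMPLEMENT)


def reverse_complement(seq: str) -> str:
    """Complement, then reverse: the sequence of the opposite strand 5'->3'."""
    return complement(seq)[::-1]
```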
Converts PacBio sequences from BAM to FASTA/FASTQ. A wrapper for `bam2fastx`.