altemose/chm13_hsat

Assembly_HSat2and3_v2.pl

by Nicolas Altemose 2021

  • purpose: to annotate HSat2 and HSat3 arrays in a set of input DNA sequences
  • inputs: a fasta file containing DNA sequences, and two provided text files containing HSat2 and HSat3 specific 24-mers
  • output: a BED9 file listing all contiguous regions in the reference likely to be HSat2 or HSat3, along with their strand orientation
  • note: merges all adjacent regions within 5 kb (same strand, same type)
  • HSat2_kmers.txt and HSat3_kmers.txt must be in the working directory from which this script is run
  • the kmers in these input files were defined using HSat2/3 HuRef reads identified in Altemose et al. PLoS Comp Bio 2014
  • usage: perl Assembly_HSat2and3_v2.pl /path/to/reference.fasta
  • runtime and memory usage: annotates the full CHM13 assembly in about 2 minutes on a MacBook Pro, with a memory footprint of about 1 GB
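
The merging rule in the note above (adjacent regions within 5 kb, same strand, same type) can be illustrated with a short Python sketch. This is not the script's Perl code. The record layout, the function name `merge_adjacent`, the inclusive 5000 bp threshold, and the choice to ignore intervening regions of a different type or strand are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical record layout: (chrom, start, end, sat_type, strand)
Region = Tuple[str, int, int, str, str]


def merge_adjacent(regions: List[Region], max_gap: int = 5000) -> List[Region]:
    """Merge regions on the same chromosome, of the same type and strand,
    whose gap is at most max_gap bases (assumed inclusive)."""
    merged: List[Region] = []
    # Sort so regions sharing chrom/type/strand are consecutive, ordered by start.
    ordered = sorted(regions, key=lambda r: (r[0], r[3], r[4], r[1]))
    for chrom, start, end, sat, strand in ordered:
        if merged:
            p_chrom, p_start, p_end, p_sat, p_strand = merged[-1]
            same_group = (p_chrom, p_sat, p_strand) == (chrom, sat, strand)
            if same_group and start - p_end <= max_gap:
                # Extend the previous region instead of starting a new one.
                merged[-1] = (p_chrom, p_start, max(p_end, end), p_sat, p_strand)
                continue
        merged.append((chrom, start, end, sat, strand))
    # Return in genomic order, as a BED file would typically be sorted.
    return sorted(merged, key=lambda r: (r[0], r[1]))


if __name__ == "__main__":
    example = [
        ("chr1", 0, 1000, "HSat2", "+"),
        ("chr1", 3000, 4000, "HSat2", "+"),    # 2 kb gap: merged with the first
        ("chr1", 10000, 11000, "HSat2", "+"),  # 6 kb gap: kept separate
    ]
    for region in merge_adjacent(example):
        print(*region, sep="\t")
```

Sorting by chromosome, type, and strand before start position keeps the merge to a single pass; the actual script may order or group its regions differently.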