HealthHackAu2016/pirates

pirates

Each sequence read by the sequencer has the following main components, in order:

  • a 16-letter non-random ID (UID)
  • roughly 250 letters of the genetic sequence itself

Additionally, the sequencer generates quality information for each character of each component. We call all of this information together a read.
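As an illustration of the read layout described above, here is a minimal Python sketch that splits one raw read into its UID and sequence parts, each with its own quality scores. The names `Read` and `split_read` are hypothetical and not taken from this repository, and the Phred+33 ASCII quality encoding (as used in common FASTQ files) is an assumption, since the README does not state the input format.

```python
from dataclasses import dataclass
from typing import List

UID_LENGTH = 16  # the ID occupies the first 16 letters of each read


@dataclass
class Read:
    """One read: UID, genetic sequence, and per-character quality scores."""
    uid: str
    uid_quality: List[int]
    sequence: str
    sequence_quality: List[int]


def split_read(bases: str, quality: str, phred_offset: int = 33) -> Read:
    """Split raw bases and a quality string into a Read.

    Assumption: quality is ASCII-encoded with the given offset
    (Phred+33), one quality character per base.
    """
    if len(bases) != len(quality):
        raise ValueError("bases and quality must have the same length")
    if len(bases) < UID_LENGTH:
        raise ValueError("read is shorter than the UID")
    scores = [ord(ch) - phred_offset for ch in quality]
    return Read(
        uid=bases[:UID_LENGTH],
        uid_quality=scores[:UID_LENGTH],
        sequence=bases[UID_LENGTH:],
        sequence_quality=scores[UID_LENGTH:],
    )
```

For example, `split_read("A" * 16 + "CGT", "I" * 16 + "!#5")` would yield a read whose `uid` is sixteen `A`s and whose `sequence` is `"CGT"`.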

Our algorithm processes these reads (UIDs, genetic sequences, and quality information) to remove errors introduced by the sequencer. Errors can occur in any component of a read. We start by matching the IDs of the reads to form groups (clusters) of reads that share an ID. When two reads have the same ID, we form a consensus from their sequences: we compare the new sequence to the current reference sequence for that group and, at each position, take whichever character has the higher quality. This leaves two kinds of reads: consensus reads, each summarizing many other reads, and singleton reads, whose ID matched no other read. Finally, we compare the singletons to the consensus groups using a methodology similar to the one above.
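The grouping and consensus steps can be sketched in Python as follows. This is a simplified illustration, not the repository's implementation: the function names are hypothetical, IDs are matched exactly (the real algorithm may tolerate errors in the ID), reads are assumed to be aligned position by position, ties keep the reference character, and when lengths differ the tail of the longer read is kept unchanged. The final step of comparing singletons to consensus groups is not shown.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (sequence, per-character quality scores)
SeqQual = Tuple[str, List[int]]


def merge_pair(ref: SeqQual, new: SeqQual) -> SeqQual:
    """At each position keep the character with the higher quality."""
    ref_seq, ref_qual = ref
    new_seq, new_qual = new
    chars, quals = [], []
    for rc, rq, nc, nq in zip(ref_seq, ref_qual, new_seq, new_qual):
        if nq > rq:  # ties keep the reference character (assumption)
            chars.append(nc)
            quals.append(nq)
        else:
            chars.append(rc)
            quals.append(rq)
    # Assumption: keep the unmatched tail of the longer read as-is.
    shared = len(chars)
    tail_seq, tail_qual = (ref_seq, ref_qual) if len(ref_seq) >= len(new_seq) else (new_seq, new_qual)
    return "".join(chars) + tail_seq[shared:], quals + list(tail_qual[shared:])


def build_consensus(
    reads: Iterable[Tuple[str, str, List[int]]]
) -> Tuple[Dict[str, SeqQual], Dict[str, SeqQual]]:
    """Group (uid, sequence, quality) reads by exact UID.

    Returns (consensus reads, singleton reads), both keyed by UID.
    """
    groups: Dict[str, List[SeqQual]] = defaultdict(list)
    for uid, seq, qual in reads:
        groups[uid].append((seq, qual))

    consensus: Dict[str, SeqQual] = {}
    singletons: Dict[str, SeqQual] = {}
    for uid, members in groups.items():
        if len(members) == 1:
            singletons[uid] = members[0]
            continue
        reference = members[0]  # first read seeds the reference
        for member in members[1:]:
            reference = merge_pair(reference, member)
        consensus[uid] = reference
    return consensus, singletons
```

Seeding the reference with the first read in each group keeps the sketch short; a different choice (for example, the read with the highest mean quality) would fit the description equally well.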
