pirates

Sequences read by a processor have the following main components, in order:

16-letter non-random ID
Approximately 250 letters of the genetic sequence itself

Additionally, the sequencer generates quality information for each character of each component. We call all of this information together a read.
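As a rough illustration of this layout, the sketch below splits one raw read into its ID and sequence parts. The names `Read`, `split_read` and `UID_LENGTH` are hypothetical and not taken from this repository, and it assumes quality information arrives as one integer score per letter.

```python
from dataclasses import dataclass
from typing import List

UID_LENGTH = 16  # the ID is described as 16 letters long


@dataclass
class Read:
    """One read: ID, genetic sequence, and a quality score per letter (assumed layout)."""
    uid: str
    sequence: str
    uid_quality: List[int]
    sequence_quality: List[int]


def split_read(letters: str, qualities: List[int]) -> Read:
    """Split raw letters and their scores into the ID part and the sequence part."""
    if len(letters) != len(qualities):
        raise ValueError("need exactly one quality score per letter")
    if len(letters) < UID_LENGTH:
        raise ValueError("read is shorter than the 16-letter ID")
    return Read(
        uid=letters[:UID_LENGTH],
        sequence=letters[UID_LENGTH:],
        uid_quality=qualities[:UID_LENGTH],
        sequence_quality=qualities[UID_LENGTH:],
    )
```

Keeping the ID scores separate from the sequence scores reflects the point that errors, and therefore quality, apply to every component of a read.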

Our algorithm processes these reads (UIDs, genetic sequence, quality information) to remove errors generated by the sequencer. Errors can occur in any component of a read. We start by matching the IDs of the reads to form groups (clusters) of reads that share the same ID. If two reads have the same ID, we form a consensus from their sequences: we compare the new sequence to the reference sequence and, at each position, take the character with the higher quality from either sequence. This leaves us with two kinds of reads: consensus reads, each built by summarising many other reads, and singleton reads. We then compare the singletons to the consensus groups we have created, using a methodology similar to the one above.
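The grouping and consensus steps could look roughly like the sketch below. It is a minimal illustration, not the code in this repository: the function names are hypothetical, reads are plain `(uid, sequence, qualities)` tuples, IDs are matched exactly, ties keep the reference character, and positions present in only one sequence are copied from it. The final step of comparing singletons to consensus groups is not covered.

```python
from collections import defaultdict
from itertools import zip_longest
from typing import Dict, Iterable, List, Tuple

Scored = Tuple[str, List[int]]   # (sequence, per-letter quality scores)
Read = Tuple[str, str, List[int]]  # (uid, sequence, per-letter quality scores)


def merge(ref_seq: str, ref_qual: List[int],
          new_seq: str, new_qual: List[int]) -> Scored:
    """At each position keep the letter with the higher quality score."""
    letters, quals = [], []
    for ref, new in zip_longest(zip(ref_seq, ref_qual), zip(new_seq, new_qual)):
        # Assumption: ties, and positions missing from the new read, keep the reference.
        if new is None or (ref is not None and ref[1] >= new[1]):
            best = ref
        else:
            best = new
        letters.append(best[0])
        quals.append(best[1])
    return "".join(letters), quals


def build_consensus(reads: Iterable[Read]) -> Tuple[Dict[str, Scored], Dict[str, Scored]]:
    """Group reads by exact ID; return (consensus reads, singleton reads)."""
    groups: Dict[str, List[Scored]] = defaultdict(list)
    for uid, seq, qual in reads:
        groups[uid].append((seq, qual))

    consensus: Dict[str, Scored] = {}
    singletons: Dict[str, Scored] = {}
    for uid, members in groups.items():
        seq, qual = members[0]  # the first read acts as the running reference
        for new_seq, new_qual in members[1:]:
            seq, qual = merge(seq, qual, new_seq, new_qual)
        target = singletons if len(members) == 1 else consensus
        target[uid] = (seq, qual)
    return consensus, singletons
```

For example, calling `build_consensus` on two reads with ID `ID1` and one read with ID `ID2` returns a consensus entry for `ID1` and leaves `ID2` among the singletons.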

HealthHackAu2016/pirates
