Deletions file
When BLAST aligns a sample against the allele schema, the match can be shorter than the schema allele. This happens when some nucleotides are not present in the sample, so BLAST finds deletions in the sample when looking for the match.
The information about this scenario is written to a file called deletions.tsv.
This file records each case in which a deletion occurred in a sample. It is a tab-separated file with the following heading:
| Core Gene | Sample Name | Deletion item | Allele | Contig | Bitscore | Query length | Contig length | New sequence length | Mismatch | gaps | Contig start | Contig end | New sequence |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
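As a minimal sketch of how such a file could be read, the snippet below loads the rows with Python's standard `csv` module and checks that the heading is present. The function name `read_deletions` and the constant `EXPECTED_COLUMNS` are hypothetical, and the exact spelling of the column names in a real deletions.tsv is assumed to match the heading listed above.

```python
import csv

# Column names copied from the heading above. Their exact spelling in a real
# deletions.tsv is an assumption.
EXPECTED_COLUMNS = [
    "Core Gene", "Sample Name", "Deletion item", "Allele", "Contig",
    "Bitscore", "Query length", "Contig length", "New sequence length",
    "Mismatch", "gaps", "Contig start", "Contig end", "New sequence",
]


def read_deletions(handle):
    """Return the rows of an open tab-separated deletions file as dicts.

    Values are kept as strings; no numeric conversion is attempted.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    # Accessing fieldnames reads the heading line (None for an empty file).
    present = reader.fieldnames or []
    missing = [name for name in EXPECTED_COLUMNS if name not in present]
    if missing:
        raise ValueError(f"deletions file is missing columns: {missing}")
    return list(reader)
```

It could be used as `with open("deletions.tsv", newline="") as fh: rows = read_deletions(fh)`, after which each row is accessible by column name, for example `rows[0]["Deletion item"]`.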
Core Gene is the name of the gene in the Schema.
Sample Name is the name of the sample file.
Deletion item describes the impact of the deletion. A deletion affects the translation to protein: the protein from the sample can be shorter than the one in the schema (it is then named ASM) or, on the contrary, longer (ALM).
The same protein may also be generated by another sample. To keep track of this, proteins are named with the name of the core gene plus a sequential number. The content of this field looks like this:
ASM_DELETE_lmo0078_0.
ASM indicates that the protein is shorter than in the schema: there has been a deletion in the gene from the sample. lmo0078_0 shows that the core gene name is lmo0078, and the "0" means that this is the first time this protein has been identified for this gene. If another sample file contains the same protein sequence for this core gene, it gets the same number "0". If, on the contrary, the translated protein is different, it gets "1" to show that they are not the same.
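The naming scheme described above can be illustrated with a short sketch. This is not the tool's actual implementation: `ProteinNumbering`, `build_deletion_item`, `parse_deletion_item` and `DeletionItem` are hypothetical names, and the sketch assumes that the middle part of the identifier is always `DELETE` and that the sequential number is the part after the last underscore.

```python
from typing import NamedTuple


class DeletionItem(NamedTuple):
    length_label: str    # "ASM" (shorter than schema) or "ALM" (longer)
    core_gene: str       # e.g. "lmo0078"
    protein_number: int  # sequential number of the protein for this gene


class ProteinNumbering:
    """Give each distinct protein of a core gene a sequential number."""

    def __init__(self):
        self._seen = {}  # core gene -> {protein sequence: number}

    def number_for(self, core_gene, protein):
        proteins = self._seen.setdefault(core_gene, {})
        # A protein seen before keeps its number; a new one gets the next.
        return proteins.setdefault(protein, len(proteins))


def build_deletion_item(length_label, core_gene, number):
    return f"{length_label}_DELETE_{core_gene}_{number}"


def parse_deletion_item(value):
    """Split an identifier such as 'ASM_DELETE_lmo0078_0' into its parts."""
    label, event, rest = value.strip().split("_", 2)
    # Split from the right in case a core gene name contains underscores
    # (an assumption; the example gene name does not).
    core_gene, _, number = rest.rpartition("_")
    if (label not in ("ASM", "ALM") or event != "DELETE"
            or not core_gene or not number.isdigit()):
        raise ValueError(f"unexpected deletion item: {value!r}")
    return DeletionItem(label, core_gene, int(number))
```

With this sketch, two samples that yield the same translated protein for lmo0078 both receive number 0, while a different protein for the same gene receives 1, and numbering starts again at 0 for every other core gene.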
Allele. The allele number in the core gene schema that BLAST identified as the best match.
Contig. The name of the contig in the sample.
Bitscore. The bitscore reported by BLAST for the match.
Query length. The length of the core gene allele.
Contig length. The length of the match in the sample.
New sequence length. It is the new length that