masking sites in target genome after alignment #303

Open
JeffWeinell opened this issue Jun 4, 2024 · 2 comments

Comments

@JeffWeinell

I have an alignment of 58 snake genomes stored as a HAL file and generated using Progressive Cactus. For each genome in the alignment, I have a BED file specifying site positions in the ungapped genome that I want to be hard-masked (with Ns) in an updated alignment.

The example below illustrates what I am trying to do.

Input files that I have:

(1) An alignment (portrayed here as an alignment block with dummy data for simplicity).

genome1.seqABC  CATAATT----CACCACTCGCACCAGGACGAAAAACGTATTCTTgctgacgcgtttcttatt
genome2.seqXYZ  cataattcaTCCACCACTCGCAccagGACGAAAAACGT------gctgacgcgtttcttatt

(2) BED file (dummy data) with regions of ungapped genome2 that I want to be hard-masked in the updated alignment.

seqXYZ	0	9
seqXYZ	22	26

Desired updated alignment

After hard-masking the target genome sites in the BED file, the updated alignment includes unmasked, soft-masked, and hard-masked sites:

genome1.seqABC  CATAATT----CACCACTCGCACCAGGACGAAAAACGTATTCTTgctgacgcgtttcttatt
genome2.seqXYZ  NNNNNNNNNTCCACCACTCGCANNNNGACGAAAAACGT------gctgacgcgtttcttatt
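For reference, the mapping between BED coordinates and alignment columns in this example can be sketched in Python as follows. This only illustrates the desired behavior on a single aligned row and is not an existing tool; `mask_aligned_row` is a hypothetical helper name, and the sketch assumes the row starts at ungapped position 0 of seqXYZ and lies on the forward strand.

```python
def mask_aligned_row(text, start, intervals):
    """Hard-mask bases of one aligned row (hypothetical helper).

    text      -- aligned sequence, with '-' marking gap columns
    start     -- 0-based ungapped coordinate of the first base in `text`
    intervals -- iterable of (begin, end) half-open BED intervals
    """
    intervals = sorted(intervals)
    out = []
    pos = start  # ungapped coordinate of the next base
    i = 0        # first interval that may still overlap `pos`
    for ch in text:
        if ch == "-":
            # Gap columns do not consume a genome coordinate.
            out.append(ch)
            continue
        # Skip intervals that end at or before the current base.
        while i < len(intervals) and intervals[i][1] <= pos:
            i += 1
        if i < len(intervals) and intervals[i][0] <= pos:
            out.append("N")  # inside a BED interval: hard-mask
        else:
            out.append(ch)   # outside: keep the base and its case
        pos += 1
    return "".join(out)


row = "cataattcaTCCACCACTCGCAccagGACGAAAAACGT------gctgacgcgtttcttatt"
masked = mask_aligned_row(row, 0, [(0, 9), (22, 26)])
print(masked)
```

Soft-masked (lowercase) bases outside the BED intervals are left untouched, and gap characters never advance the genome coordinate.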

I would greatly appreciate any help with how to solve this problem!

-Jeff

@glennhickey
Collaborator

I don't think HAL has any tools that allow you to modify the sequences. Your best bet is probably to export to MAF then do the masking with your own script. The taffy python API can parse MAF files and may be helpful for this.
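A minimal sketch of what such a script might look like is below. It is plain Python and does not use the taffy API. It assumes the standard MAF `s` line layout (`s src start size strand srcSize text`), source names of the form `genome.sequence`, and BED coordinates on the forward strand. The function names `read_bed`, `mask_row`, and `mask_maf_line` are hypothetical, as are the `srcSize` values in the example.

```python
def read_bed(lines):
    """Collect half-open (begin, end) intervals per sequence name."""
    bed = {}
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, begin, end = line.split()[:3]
        bed.setdefault(chrom, []).append((int(begin), int(end)))
    return bed


def mask_row(text, start, intervals):
    """Replace bases whose ungapped coordinate falls in an interval with N."""
    out, pos = [], start
    for ch in text:
        if ch != "-":  # gaps do not consume a coordinate
            if any(b <= pos < e for b, e in intervals):
                ch = "N"
            pos += 1
        out.append(ch)
    return "".join(out)


def mask_maf_line(line, genome, bed):
    """Return `line` with the rows of `genome` hard-masked; others unchanged."""
    fields = line.split()
    if len(fields) != 7 or fields[0] != "s":
        return line
    src, strand, text = fields[1], fields[4], fields[6]
    start, src_size = int(fields[2]), int(fields[5])
    prefix = genome + "."
    if not src.startswith(prefix):
        return line
    intervals = bed.get(src[len(prefix):], [])
    if strand == "-":
        # MAF minus-strand starts are relative to the reverse complement,
        # so flip the forward-strand BED intervals into that frame.
        intervals = [(src_size - e, src_size - b) for b, e in intervals]
    stripped = line.rstrip()
    tail = line[len(stripped):]
    # Swap only the text field so the original column spacing is kept.
    return stripped[:-len(text)] + mask_row(text, start, intervals) + tail


bed = read_bed(["seqXYZ\t0\t9\n", "seqXYZ\t22\t26\n"])
line = ("s genome2.seqXYZ 0 56 + 100 "
        "cataattcaTCCACCACTCGCAccagGACGAAAAACGT------gctgacgcgtttcttatt\n")
print(mask_maf_line(line, "genome2", bed), end="")
```

The per-base `any(...)` scan keeps the sketch short but is slow when a sequence has many intervals; for whole genomes a sorted interval list with a moving pointer or a bisect lookup would likely be needed.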

@JeffWeinell
Author

Thanks!

I also have the alignment as a MAF file (converted using cactus-hal2maf), but I ran into the same problem as when starting with the HAL file: no obvious tool for the job. The programs taffy, maf_parse (implemented in PHAST), and MafFilter seemed promising, but as far as I can tell they won't do what I need either.

I can't be the only person that has needed to do this. If I come across a solution elsewhere, I'll share it here.

Thanks again,
-Jeff
