hal2maf into vcf #285

Open
tzaquin opened this issue Oct 23, 2023 · 3 comments

Comments

@tzaquin

tzaquin commented Oct 23, 2023

Hello,
I used cactus-progressive to create a pangenome from multiple species. Now that I want to analyze the variation between samples, I find myself a bit stuck.
I ran cactus-hal2maf with an ancestor as the reference. Next, I tried to get a VCF file using maf2vcf, but I'm getting the following error:

ERROR: Couldn't find a header line (must start with Hugo_Symbol, Chromosome, or Tumor_Sample_Barcode): cactus.maf

This is the MAF file:

##maf version=1 scoring=N/A

a
s       Anc3.Anc3refChr0        0       81      +       2129    G----------ctaaccctaaccct--aaccctaaccctaaccc-taaccccaaaccctaac-cctaccccaaaccctaacctaaaccctaaccc
s       Thunnus_orientalis.scaffold1    0       82      +       34771766        G----------ctaaccctaaccct--aaccctaaccctaaccc-taacacctaaccctaacgcctacccaaaaccctaacctaaaccctaaccc
s       Thunnus_thynnus.scaffold1       0       94      +       34640894        TACCCTAAACCCaaaccctaaccctcaaaccctaaccctaacccctaaccccaaaccctaac-cctacccctacccctaacccaaaccctaaccc

a
s       Anc3.Anc3refChr0        81      408     +       2129    taaccctaa-ccctaaccctaaccctaaccctaaccctaaccctaaccctaaccctaaccctaaccctaaccctaccccaaacccta-ccctaaccctaaccctaacccgaacccaac---cctaaccc-taaccctaacccgaaccctaacccta-ccctaacccaaaccctacccctaaccctaccctaaccctaaccctaaccctaaccctaaccctaaccctaaccc-taaccctaa-ccctaaccctaaccctacccctaaccctaaccc-taaccctaaccctaaccctaaccc------------tatccctaaccctaaccctaaccc--aaccctaacccaaccctaaccctaaccctaaccctaaccctaaccctaaccctca---------ccctaaccctaaccctaaccctaaccctaaccctaaccc
s       Thunnus_orientalis.scaffold1    82      409     +       34771766        taaccctaa-cccgaaccctaacccgaaccctaaccctaaccctaaccctaaccctaaccctaaccctaaccctaacccgaacccta-cccgaaccctaaccctaacccgaacccaac---cctaaccc-taaccctaacccgaaccctaaccctaaccctaaccctaaccctaaccctaacccgaacctaaccctaaccctaaccctaaccctaaccctaaccctaaccc-taaccctaa-ccctaaccctaaccctaaccctaaccctaaccc-taaccctaaccctaaccctaaccc------------tatccctaaccctaaccctaccct--aaccctaacctaaccctaaccctaaccctaaccctaaccctaaccctaaccctaa---------ccctaaccctaaccctaaccctaaccctaaccctaaccc

I would much appreciate any help. Maybe you could recommend a different method for getting a VCF file?
Thank you.

@glennhickey
Collaborator

The MAF format output by cactus-hal2maf is described here, where there is no mention of Hugo_Symbol etc. I guess you will need to consult with the authors or documentation of maf2vcf and add in the appropriate header yourself.
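
The column names in the error message (Hugo_Symbol, Tumor_Sample_Barcode) belong to the tab-separated Mutation Annotation Format, which shares the .maf extension with the Multiple Alignment Format written by cactus-hal2maf but is otherwise a different format. Below is a minimal Python sketch for checking which of the two a file is before handing it to a converter. The function name `sniff_maf_flavor` is hypothetical, and the header prefixes are simply the ones quoted in the error message above.

```python
# Prefixes that the maf2vcf error message says a header line must start with.
MUTATION_MAF_HEADER_STARTS = ("Hugo_Symbol", "Chromosome", "Tumor_Sample_Barcode")


def sniff_maf_flavor(lines):
    """Guess the kind of .maf file from an iterable of text lines.

    Returns "alignment" for a Multiple Alignment Format file,
    "mutation" for a Mutation Annotation Format file, else "unknown".
    This is a heuristic sketch, not a validator.
    """
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith("##maf"):
            return "alignment"  # header line of an alignment MAF
        if line.startswith("#"):
            continue  # any other comment line
        if line.startswith(MUTATION_MAF_HEADER_STARTS):
            return "mutation"  # tab-separated column header
        if line.split()[0] in ("a", "s"):
            return "alignment"  # alignment block without a ##maf header
        return "unknown"
    return "unknown"
```

An open file handle is an iterable of lines, so `sniff_maf_flavor(open("cactus.maf"))` would work as-is. For the file in this issue, which begins with `##maf version=1`, it returns `"alignment"`.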

If you want a VCF directly from Cactus, you need to use the pangenome pipeline, but that only works for samples from the same or very closely related species (I'd also argue the VCF format itself is mostly suited to this type of data).
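
As an illustration of reading the alignment MAF directly rather than through maf2vcf, here is a rough Python sketch that walks the `a`/`s` blocks and lists single-base differences against the first row of each block. It is not a VCF writer and not part of Cactus. The function names are hypothetical, and it assumes the first `s` row is the reference and lies on the `+` strand, as in the excerpt above. Gap columns (indels) are skipped.

```python
def parse_maf_blocks(lines):
    """Yield each alignment block as a list of (src, start, strand, text)."""
    block = []
    for line in lines:
        fields = line.split()
        if not fields or fields[0] == "a":
            # A blank line or a new "a" line closes the current block.
            if block:
                yield block
                block = []
            continue
        if fields[0] == "s":
            # s <src> <start> <size> <strand> <srcSize> <text>
            _, src, start, _size, strand, _src_size, text = fields[:7]
            block.append((src, int(start), strand, text))
    if block:
        yield block


def substitutions(block):
    """Yield (ref_src, ref_pos, ref_base, other_src, other_base) mismatches.

    ref_pos is 0-based, counted from the reference row's start field.
    Assumption: block[0] is the reference row and is on the + strand.
    """
    ref_src, ref_start, _strand, ref_text = block[0]
    for other_src, _start, _other_strand, other_text in block[1:]:
        ref_pos = ref_start
        for ref_base, other_base in zip(ref_text, other_text):
            if ref_base == "-":
                continue  # insertion relative to the reference: no ref position
            if other_base != "-" and ref_base.upper() != other_base.upper():
                yield (ref_src, ref_pos, ref_base.upper(),
                       other_src, other_base.upper())
            ref_pos += 1
```

Comparison is case-insensitive because lowercase letters in the MAF text usually mark soft-masked repeats rather than different bases. Turning these tuples into a valid VCF would still require deciding on a reference genome, sample columns, and how to represent indels, which this sketch does not attempt.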

@tzaquin
Author

tzaquin commented Oct 24, 2023

Thank you for your response.
When you say "closely related species", what might the distance cutoff be? For example, I'm working on species from the same genus that diverged around 3MYA.

Thank you

@glennhickey
Collaborator

I'd say 3MYA is definitely worth trying out with the pangenome pipeline.
