-
A common question for people starting to use
-
Usually, the best way to infer ancestral alleles is to consider the allele present in one or (preferably) more outgroup species. This, of course, requires that the sequences in the outgroup(s) can be aligned with your data set. There are several published methods for doing this (see "Software", below). Note that the trivial approach of taking the REF allele as the ancestral state is likely to prove inaccurate in many cases, as is the shortcut of taking the most frequent allelic state: rough simulations of typical demographic scenarios show that the most frequent allele is ancestral about 80-90% of the time, for sample sizes of hundreds or thousands.

Sometimes a VCF will already have an ancestral allele defined for each site, in the AA field.
However, in our experience, the AA field is not always trustworthy: if the VCF is an old one, the ancestral allele may have been inferred using rather simplistic algorithms.

In some cases, other researchers may provide ancestral alleles in a separate file. For example, the Ensembl project uses a simple multi-species algorithm to create ancestral allele FASTA files for a limited set of species, including humans.

In many cases, however, you may need to infer the ancestral alleles yourself. Even if the ancestral allele states already exist for your species, running the analysis yourself gives you more control over the inference process, and you can choose to use more recent approaches.

Software for inferring ancestral alleles

Peter Keightley's
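As a rough check on the 80-90% figure quoted earlier for the most frequent allele, the expected proportion can be worked out for the simplest textbook case. This is a minimal sketch that assumes a neutral, constant-size population, where the expected number of sites carrying i derived copies in a sample of n chromosomes is proportional to 1/i. It is not the simulation referred to in the post, and the function name is made up for illustration.

```python
def major_allele_ancestral_fraction(n: int) -> float:
    """Expected fraction of variable sites at which the most frequent
    allele is ancestral, for a sample of n chromosomes.

    Assumption: standard neutral site frequency spectrum, i.e. the weight
    of sites with i derived copies (1 <= i <= n - 1) is proportional to 1 / i.
    """
    if n < 2:
        raise ValueError("need at least two sampled chromosomes")
    total = 0.0
    derived_is_major = 0.0
    for i in range(1, n):
        weight = 1.0 / i
        total += weight
        if 2 * i > n:
            # The derived allele is strictly the more frequent one.
            derived_is_major += weight
        elif 2 * i == n:
            # Exact tie: assume the "major" allele is picked at random.
            derived_is_major += 0.5 * weight
    return 1.0 - derived_is_major / total


for n in (100, 1000):
    print(n, round(major_allele_ancestral_fraction(n), 3))
```

Under these assumptions the function gives about 0.87 for n = 100 and about 0.91 for n = 1000, broadly in line with the figure above. Real demography, selection, and genotyping error will shift these values.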
-
We've been looking at different methods for ancestral allele inference, and they all start from an alignment with outgroup species. What would be the best way to deal with non-aligned regions, where the ancestral allele can't be inferred: leaving them as missing, using the major allele as a proxy, or something else?
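To make the two options named in the question concrete, here is a small sketch. It uses a plain majority vote across outgroups, which is a stand-in and not one of the published methods. The function name, the `major_allele` parameter, and the returned labels are all hypothetical. It does not say which option is better; it only shows how each choice could be recorded so that proxy calls stay distinguishable from outgroup-based ones.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple

VALID_BASES = {"A", "C", "G", "T"}


def infer_ancestral(
    outgroup_alleles: Sequence[Optional[str]],
    major_allele: Optional[str] = None,
) -> Tuple[Optional[str], str]:
    """Return (allele, source) for one site.

    outgroup_alleles holds one entry per outgroup; None or any non-ACGT
    symbol (e.g. "-" or "N") means the outgroup is not aligned here.
    major_allele is an optional fallback (hypothetical parameter): pass the
    most frequent sample allele to use it as a proxy, or None to leave
    unresolved sites as missing.
    """
    observed = [a.upper() for a in outgroup_alleles
                if a is not None and a.upper() in VALID_BASES]
    if observed:
        ranked = Counter(observed).most_common()
        top_is_unique = len(ranked) == 1 or ranked[0][1] > ranked[1][1]
        if top_is_unique:
            return ranked[0][0], "outgroup"
    # No aligned outgroup, or the outgroups disagree with no clear winner.
    if major_allele is not None:
        return major_allele.upper(), "major_allele_proxy"
    return None, "missing"


print(infer_ancestral(["A", "a", "G"]))            # outgroups agree on A
print(infer_ancestral([None, "-"]))                # unaligned, left missing
print(infer_ancestral([None], major_allele="C"))   # unaligned, proxy used
```

Keeping the source label alongside each call makes it possible to compare results with and without the proxy sites later on.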
-
Ensembl has ancestral sequences inferred by Ortheus for some species: https://m.ensembl.org/info/genome/compara/ancestral_sequences.html
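A minimal sketch of looking up a site in such an ancestral FASTA file follows. It assumes the convention, as I understand it from the READMEs that accompany these files, that uppercase bases are high-confidence calls, lowercase bases are low-confidence calls, and `N`, `-` and `.` mark positions with no usable ancestral state. Please check this against the README for the release you download. The demo record and the function names are made up.

```python
import io
from typing import Optional, TextIO, Tuple


def read_fasta_sequence(handle: TextIO) -> str:
    """Concatenate the sequence lines of a single-record FASTA file."""
    return "".join(line.strip() for line in handle if not line.startswith(">"))


def ancestral_state(sequence: str, pos: int) -> Tuple[Optional[str], str]:
    """Return (allele, confidence) at a 1-based position, as used in VCF."""
    if not 1 <= pos <= len(sequence):
        raise IndexError(f"position {pos} is outside the sequence")
    ch = sequence[pos - 1]
    if ch in "ACGT":
        return ch, "high"
    if ch in "acgt":
        return ch.upper(), "low"
    # Assumed convention: N, '-', '.' (and anything else) carry no call.
    return None, "missing"


# Made-up record, standing in for one chromosome of an ancestral FASTA file.
demo = io.StringIO(">ANCESTOR_demo\nACgt\n.N-A\n")
seq = read_fasta_sequence(demo)
for pos in (1, 3, 5, 8):
    print(pos, ancestral_state(seq, pos))
```

For whole chromosomes, an indexed reader would avoid loading the full sequence into memory; the in-memory version here is only meant to show the lookup and the handling of low-confidence and missing positions.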