This repository contains the datasets and scripts used in the following paper:
- Y. Tabatabaee, C. Zhang, T. Warnow, S. Mirarab, Phylogenomic branch length estimation using quartets, Bioinformatics, Volume 39, Issue Supplement_1, June 2023, Pages i185–i193, https://doi.org/10.1093/bioinformatics/btad221
For the experiments in this study, we generated a new quartet dataset and regenerated species trees with substitution-unit (SU) branch lengths for previously published datasets from Zhang et al. (2018) and Mai et al. (2017). We also analyzed the mammalian biological dataset from Song et al. (2012).
All datasets can be accessed from this Google Drive link.
Quartet simulations
- The raw dataset is available in quartet_simulations.zip and includes species trees, true gene trees, and other SimPhy outputs.
- Species trees with SU branch lengths are also available in this repository in simulated-data/quartets.
- Results and intermediate data from the experiments in the paper are available in quartets_results.tar.gz.
30-taxon MVRoot ILS simulations
- The original dataset is from Mai et al. (2017) and is available at https://uym2.github.io/MinVar-Rooting/.
- Species trees with SU branch lengths are available in MVRoot_SU.tar.gz and also here in simulated-data/MVroot.
- Results and intermediate data from the experiments in the paper are available in MVRoot_results.tar.xz.
101-taxon ASTRALIII ILS simulations
- The original dataset is from Zhang et al. (2018) and is available at https://gitlab.com/esayyari/ASTRALIII/.
- Species trees with SU branch lengths are available in ASTRALIII_SU.tar.gz and also in simulated-data/ASTRALIII.
- Results and intermediate data from the experiments in the paper are available in ASTRALIII_SU_results.tar.xz.
The preprocessed mammalian dataset (in which genes with mismatching names are removed) is available here and includes an estimated ASTRAL species tree, gene trees and alignments. The files generated for our analysis are available in biological-mammalian.
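The archives listed above come in three formats (.zip, .tar.gz and .tar.xz). The following is a minimal sketch for unpacking them after download, not part of the paper's scripts. It assumes the archives have been saved locally under the file names given in this README; the helper `unpack_available` and the directory names `downloads` and `data` are hypothetical and chosen only for illustration.

```python
import shutil
from pathlib import Path

# Archive names as listed in this README.
ARCHIVES = [
    "quartet_simulations.zip",
    "quartets_results.tar.gz",
    "MVRoot_SU.tar.gz",
    "MVRoot_results.tar.xz",
    "ASTRALIII_SU.tar.gz",
    "ASTRALIII_SU_results.tar.xz",
]


def unpack_available(download_dir, output_dir, names=ARCHIVES):
    """Unpack every listed archive found in download_dir.

    Each archive goes into its own subfolder of output_dir.
    Returns the names of the archives that were unpacked.
    """
    download_dir, output_dir = Path(download_dir), Path(output_dir)
    unpacked = []
    for name in names:
        archive = download_dir / name
        if not archive.is_file():
            continue  # skip archives that have not been downloaded
        # Drop the archive suffix (.zip, .tar.gz, .tar.xz) to name the target folder.
        target = output_dir / name.split(".")[0]
        target.mkdir(parents=True, exist_ok=True)
        # The archive format is inferred from the file extension.
        shutil.unpack_archive(str(archive), str(target))
        unpacked.append(name)
    return unpacked


if __name__ == "__main__":
    # Hypothetical local layout: archives in ./downloads, output in ./data.
    for name in unpack_available("downloads", "data"):
        print("unpacked", name)
```

Archives that are missing from the download directory are skipped, so the same call works whether all of the data or only one dataset has been downloaded.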