- pathogen-cluster-mutations: Add this new top-level command to allow users to create a table of mutations that appear in clusters or other previously defined genetic groups (#36)
- pathogen-embed: Use "simplex" encoding by default for PCA (#35)
- pathogen-cluster: Change default minimum number of samples for a cluster from 5 to 10 (#35)
- pathogen-cluster: Add `--distance-matrix` input argument to support HDBSCAN clustering of genetic distances from `pathogen-distance` (#33)
- Do not internally sort embedding inputs by sequence name (#32)
- Let scikit-learn automatically pick SVD algorithm to use for PCA instead of hardcoding the "full" solver (#31)
- admin: Publish to PyPI with GitHub Actions (#30)
- Use inferred types for external embedding parameters (#29)
- Add alternate encodings of nucleotide sequences for PCA in `pathogen-embed` (#23)
- Remove seaborn as a dependency in favor of base matplotlib (#13)
- Set default learning rate for t-SNE to "auto" such that the learning rate scales with the sample size (#12)
- Add support for multiple alignment and/or distance matrix inputs to `pathogen-embed` (#19)
- Add optional output from `pathogen-embed` that produces the boxplot figure of Euclidean by genetic distance (#14)
- Display default parameters for subcommands of pathogen-embed (#12)
- Fix t-SNE keyword argument error associated with recent versions of scikit-learn (#6)
- Pass random seed argument from the command line to PCA and MDS implementations (#6)
- Fix MDS stress output
- Add stress value dataframe to MDS arguments to relay fitness of the embedding
- Create separate commands for embedding, clustering, and distance matrix creation (pathogen-embed, pathogen-cluster, pathogen-distance) (#2)
- Migrate source code and documentation from cartography to pathogen-embed
- Only calculate the distance matrix for an alignment if it isn't available already (194fd74)
- Source embedding params from cluster data (8c26898)
- Initialize t-SNE embeddings with PCA instead of a random initialization (89fc458)
- Avoid re-reading alignment for PCA and t-SNE (5fb2bbb)
- Fix the Hamming distance calculation so that it works on lowercase as well as uppercase FASTA files (reported in a GitHub issue)
- First version of embed-pathogen.