This repository contains the code for the Genomic MSA Transformer project. The project trains a multiple sequence alignment (MSA) transformer on DNA sequences in an unsupervised fashion. A classifier then uses the transformer's embeddings to classify operons within the genomes of various organisms.
For more details on the project, see our paper, "Learning Genome Architecture Using MSA Transformers".
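As a rough illustration of the input an MSA transformer works on, the sketch below turns a small set of aligned DNA sequences into a matrix of token ids, with one row per sequence and one column per alignment position. The vocabulary and the function name `tokenize_msa` are hypothetical and chosen only for this example; they are not taken from this repository's code, which may tokenize differently.

```python
from typing import Dict, List

# Hypothetical vocabulary for illustration only; the repository's actual
# token set may differ.
VOCAB: Dict[str, int] = {"-": 0, "A": 1, "C": 2, "G": 3, "T": 4, "N": 5}


def tokenize_msa(aligned: List[str]) -> List[List[int]]:
    """Convert aligned DNA sequences into a rows-by-columns matrix of token ids."""
    if not aligned:
        raise ValueError("alignment is empty")
    width = len(aligned[0])
    if any(len(seq) != width for seq in aligned):
        raise ValueError("all sequences in an alignment must have the same length")
    unknown = VOCAB["N"]
    # Characters outside the vocabulary (e.g. IUPAC ambiguity codes) map to "N".
    return [[VOCAB.get(base, unknown) for base in seq.upper()] for seq in aligned]


if __name__ == "__main__":
    msa = ["ACG-T", "AC-TT", "ACGNT"]
    for row in tokenize_msa(msa):
        print(row)
```

Each row keeps the gap character as its own token, so column positions stay aligned across sequences. That shared column structure is what distinguishes MSA input from a batch of unrelated sequences.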
To install the dependencies, run:
pip install -r requirements.txt
To train the model, run:
python train.py
To test the model, run:
python test.py
The model achieved 90% accuracy on the test set for operon classification.