Code accompanying the preprint: RNA genome conservation and secondary structure in SARS-CoV-2 and SARS-related viruses https://doi.org/10.1101/2020.03.27.012906
Top-level Python scripts run the conservation and secondary structure analyses:
- conservation.py: finds conserved intervals in SARS-related virus and SARS-CoV-2 sequences
- unstructured.py: finds unstructured intervals and conserved unstructured intervals
- rnaz_analysis.py: analyzes the RNAz screen data and compiles conserved structured intervals
- alifoldz_analysis.py: prepares alignment windows for R-scape and alifoldz analysis, and compares alifoldz hits with those from RNAz
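To make the idea of a "conserved interval" concrete, here is a minimal sketch that scans a toy alignment for runs of columns where every sequence carries the same non-gap base. It is not the method used by conservation.py, whose criteria are not described in this README. The function name `conserved_intervals`, the `min_len` parameter, and the toy sequences are all hypothetical and chosen only for illustration.

```python
def conserved_intervals(aligned_seqs, min_len=3):
    """Return half-open (start, end) column ranges where all sequences agree.

    Assumption for this sketch: a column counts as conserved only if every
    sequence has the same base there and none has a gap ('-').
    """
    if not aligned_seqs:
        return []
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("aligned sequences must have equal length")

    intervals = []
    run_start = None  # first column of the current conserved run, if any
    for col in range(length):
        column = {seq[col].upper() for seq in aligned_seqs}
        conserved = len(column) == 1 and "-" not in column
        if conserved and run_start is None:
            run_start = col
        elif not conserved and run_start is not None:
            # The run ended at the previous column; keep it if long enough.
            if col - run_start >= min_len:
                intervals.append((run_start, col))
            run_start = None
    # Close a run that extends to the end of the alignment.
    if run_start is not None and length - run_start >= min_len:
        intervals.append((run_start, length))
    return intervals


# Toy alignment (hypothetical, not taken from the alignments folder).
toy_alignment = [
    "AUGGCUAACG-UUAGC",
    "AUGGCUCACG-UUAGC",
    "AUGGCUAACGAUUAGC",
]
print(conserved_intervals(toy_alignment, min_len=4))
```

Intervals are reported as 0-based, half-open column ranges, so they can be used directly as Python slices on the aligned sequences.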
The alignments folder includes starting alignments of SARS-related and SARS-CoV-2 sequences.
The rnaz_data folder includes output from a genome-wide RNAz screen on SARS-related viruses.
The alifoldz folder includes output from alifoldz analysis.
The rscape folder includes output from rscape analysis.
The scanfold_data folder includes ScanFold output from Andrews et al., bioRxiv 2020.
The example_results folder includes example output files from the top-level python scripts, which should be reproduced by running the scripts.
Python packages (install with pip install -r requirements.txt):
- scipy
- numpy
- biopython
External Daslab dependencies:
- arnie
- CONTRAfold 2.0, used for secondary structure calculations
External packages:
- R-scape v1.4.0