This repository contains all data related to the paper "Data mining of DNA sequences submitted by Peruvian institutions to public genetic databases". For more information about this work, please contact:
- Pedro Romero, pedro.romero@upch.pe (corresponding author)
- Camila Castillo-Vilcahuaman, camila.castillo.v@upch.pe (owner of the repository)
- bold_data_species.txt: This file contains the BOLD data used in this study. It uses a `.tabular` format.
- PATRIC_genome.csv: This file contains the PATRIC data used in this study. It uses a `.csv` format.
- used_queries_per_inst_nucleotide.txt: This file explains the query words used in the Nucleotide data.
- list: This file contains all query words used, in a practical list for use in `for` loops.
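As a rough illustration of how the `list` file could be consumed in a loop, here is a minimal bash sketch. It assumes the file holds one query per line, which is not confirmed by this README. It uses a `while read` loop instead of a plain `for` loop so that multi-word queries are not split on spaces. The stand-in file and the variable names (`query_file`, `query`, `count`) are hypothetical and only there to make the sketch self-contained.

```bash
#!/usr/bin/env bash
# Assumption: the query file holds one query word or phrase per line.
# A small stand-in file is created here so the sketch runs on its own;
# in the repository, point query_file at the real "list" file instead.
query_file="$(mktemp)"
printf '%s\n' "example query one" "example query two" > "$query_file"

count=0
# IFS= keeps leading/trailing spaces; read -r keeps backslashes literal.
# The "|| [ -n "$query" ]" part also handles a last line with no newline.
while IFS= read -r query || [ -n "$query" ]; do
    [ -z "$query" ] && continue   # skip blank lines
    count=$((count + 1))
    echo "query ${count}: ${query}"
done < "$query_file"

# Remove the stand-in file (do not do this with the real "list" file).
rm -f "$query_file"
```

To try this against the repository data, replace the `mktemp` and `printf` lines with `query_file="list"` and drop the final `rm`.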
Scripts used in this study can be run using Jupyter notebooks with a bash kernel on Binder.
The BinderBash setup was taken from https://github.com/gjbex/BinderBash.
Additionally, we submitted:
- mining_peru_sequence_DB.pdf (ENGLISH)
- mining_peru_secuencias_DB.pdf (SPANISH)