reymonera/mining_peru_sequence_DB

Data mining of DNA sequences submitted by Peruvian institutions to public genetic databases

This repository contains all data related to the paper "Data mining of DNA sequences submitted by Peruvian institutions to public genetic databases". For more information about this work, please contact:

  1. Pedro Romero, pedro.romero@upch.pe (corresponding author)
  2. Camila Castillo-Vilcahuaman, camila.castillo.v@upch.pe (owner of the repository)

Files in this repository

  1. bold_data_species.txt: The BOLD data used in this study, in .tabular format.
  2. PATRIC_genome.csv: The PATRIC data used in this study, in CSV format.
  3. used_queries_per_inst_nucleotide.txt: An explanation of the query words used to retrieve the Nucleotide data.
  4. list: All query words as a plain list, convenient for iterating over in for loops.
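
As an illustration of how the list file might be consumed in a loop, here is a minimal bash sketch. It assumes the file holds one query word or phrase per line, which is not confirmed here; the function name run_queries is hypothetical and is not taken from the study's scripts. The loop only prints each query, so the actual search command would replace the printf line.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: iterate over query words stored one per line.
# The one-term-per-line layout of the "list" file is an assumption.

run_queries() {
    local query_file="$1"
    local query
    # "|| [ -n "$query" ]" also handles a last line without a trailing newline.
    while IFS= read -r query || [ -n "$query" ]; do
        # Skip empty lines.
        [ -z "$query" ] && continue
        # Placeholder action: replace with the actual search command.
        printf 'query: %s\n' "$query"
    done < "$query_file"
}

# Run against the repository's list file only if it is present.
if [ -f list ]; then
    run_queries list
fi
```

Reading with while and read rather than a for loop over $(cat list) keeps multi-word queries, such as institution names containing spaces, intact as single items.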

Scripts used

Scripts used in this study can be run on Binder using Jupyter notebooks with a bash kernel.

The bash kernel is provided by BinderBash: https://github.com/gjbex/BinderBash.

Additionally, we submitted:

  1. mining_peru_sequence_DB.pdf (ENGLISH)
  2. mining_peru_secuencias_DB.pdf (SPANISH)
