Marco Cacciabue, Melina Obregón, Axel N. Fenoglio
voRtex is an R package created to help analyze large groups of foot-and-mouth disease RNA sequences.
This package contains a collection of functions designed to manage and analyze sample data efficiently. It handles VCF files, BED files, FASTA files, DNAStringSet objects, and more.
VCFToDataFrame.R creates a data frame from a VCF file, containing the allele position, allele frequency, and depth of coverage.
file <- system.file("extdata", "variant_file.vcf", package = "voRtex", mustWork = TRUE)
vcf_data <- VariantAnnotation::readVcf(file)
vcfdataframe <- VCFToDataFrame(vcf_data)
vcfdataframe
With compute_coverage.R we can create a data frame containing the average position and coverage of a sequence, using a .bed file and a window size of our choosing.
FilePath <- system.file("extdata", "SRR12664421_full_coverage.bed",
package = "voRtex", mustWork = TRUE)
data <- read.table(FilePath, col.names = c("reference", "startpos", "endpos", "coverage"))
data_processed <- compute_coverage(data, 50, TRUE)
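To make the idea of window averaging concrete, here is a minimal base R sketch. The helper `window_average` and the toy table are hypothetical and are not part of voRtex; the sketch assumes one row per base and fixed, non-overlapping windows, which may differ from what compute_coverage actually does.

```r
# Hypothetical helper: average position and coverage over fixed-size windows.
# This is NOT voRtex's compute_coverage; it only illustrates the idea.
window_average <- function(df, window_size) {
  # Assign each row to a block of `window_size` consecutive rows
  window_id <- (seq_len(nrow(df)) - 1) %/% window_size
  data.frame(
    position = as.numeric(tapply(df$startpos, window_id, mean)),
    coverage = as.numeric(tapply(df$coverage, window_id, mean))
  )
}

# Toy per-base coverage table using the same column names as above
toy <- data.frame(
  reference = "ref",
  startpos  = 0:9,
  endpos    = 1:10,
  coverage  = c(10, 20, 30, 40, 50, 60, 70, 80, 90, 100)
)

toy_windows <- window_average(toy, window_size = 5)
print(toy_windows)
```

A larger window gives a smoother, shorter table, at the cost of hiding short drops in coverage.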
Then, with ggplot_heatmap.R, we can create a heatmap from the data frame produced by compute_coverage.
color <- c("#D53E4F","#F46D43","#FDAE61","#FEE08B","#E6F598","#ABDDA4","#66C2A5","#3288BD")
Rplot<-ggplot_heatmap(inputdata=data_processed,
color_pal = color)
Rplot
This results in the following heatmap:
[Figure: SRR12664421_Heatmap]
With NreadFilter we take the coverage of a sequence, read from a .bed file into a data frame, and filter its rows based on a filter value. The function returns a list containing the filtered data frame and the percentage of bases that passed the filter.
FilePath <- system.file("extdata", "SRR12664421_full_coverage.bed",
package = "voRtex", mustWork = TRUE)
OGDataFrame <- read.table(FilePath,
col.names = c("reference","startpos","endpos","nreads"))
salida <- NreadFilter(OGDataFrame, 5000)
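The filtering step described above can be sketched in base R as follows. The helper `filter_by_reads`, its list element names, and the toy data are hypothetical; the sketch assumes that each row is one base and that a row passes when its read count is greater than or equal to the filter value, which may not match NreadFilter exactly.

```r
# Hypothetical helper mirroring the described behaviour; not the package's code.
filter_by_reads <- function(df, cutoff) {
  # Keep rows whose read count reaches the cutoff (assumed comparison: >=)
  kept <- df[df$nreads >= cutoff, , drop = FALSE]
  list(
    filtered       = kept,
    percent_passed = 100 * nrow(kept) / nrow(df)
  )
}

# Toy coverage table using the same column names as above
toy <- data.frame(
  reference = "ref",
  startpos  = 0:4,
  endpos    = 1:5,
  nreads    = c(100, 6000, 7500, 4999, 5000)
)

result <- filter_by_reads(toy, 5000)
print(result$percent_passed)
```

For the real output, `str(salida)` shows the structure of the list that NreadFilter returns, including the actual element names.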
With NCountFilter we take a DNAStringSet object and filter it according to the number of “N” (bases that could not be read) we choose as a threshold.
FilePath <- system.file("extdata",
"renamed_all.fasta",
package = "voRtex",
mustWork = TRUE)
DNASequence <- Biostrings::readDNAStringSet(FilePath)
NCountFilter(DNASequence,1000)
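As an illustration of filtering by “N” count, here is a small base R sketch that works on plain character strings instead of a DNAStringSet. The helpers `count_n` and `keep_by_n` are hypothetical, and the sketch assumes that sequences with at most `max_n` “N” characters are kept; NCountFilter may apply its threshold differently.

```r
# Hypothetical helpers; not part of voRtex.
# Count the number of "N" characters in each sequence.
count_n <- function(seqs) {
  hits <- gregexpr("N", seqs, fixed = TRUE)
  lengths(regmatches(seqs, hits))
}

# Keep sequences with at most `max_n` unread bases (assumed rule).
keep_by_n <- function(seqs, max_n) {
  seqs[count_n(seqs) <= max_n]
}

# Toy sequences standing in for the contents of a FASTA file
toy_seqs <- c(seq1 = "ACGTNNACGT", seq2 = "NNNNNACGTN", seq3 = "ACGTACGTAC")

n_counts <- count_n(toy_seqs)
kept <- keep_by_n(toy_seqs, max_n = 2)
print(names(kept))
```

On an actual DNAStringSet, `Biostrings::letterFrequency(DNASequence, "N")` gives the per-sequence “N” counts, which is a quick way to choose a sensible threshold before filtering.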