Version 1.0.8 #31

Merged
merged 48 commits into from
Feb 23, 2024
1f6114c
Migrate functions that create gbk and bam derivatives to `parseData.R…
alephnull7 Feb 16, 2024
a0475d4
merger of `parseSource` and `PACVr.parseSource()`
alephnull7 Feb 16, 2024
5e1891a
Move `coverage` mutation to `coverage` creation
alephnull7 Feb 16, 2024
6546966
Unified `gbkData` object for PACVR analysis
alephnull7 Feb 16, 2024
4849f53
New `analysisSpecs` property
alephnull7 Feb 16, 2024
80c97e5
Update `PACVr.verboseInformation()` to use unified `gbkData`
alephnull7 Feb 16, 2024
5d09e18
Refactor of `PACVr.visualizeWithRCircos()` and `visualizeWithRCircos()`
alephnull7 Feb 16, 2024
ac7d22a
Update `checkIREquality` to directly use `gbkSeq`
alephnull7 Feb 16, 2024
436f073
Derive lengths from `gbkSeq`
alephnull7 Feb 16, 2024
35e555e
Update `fillDataFrame()` to directly use `gbkLengths`
alephnull7 Feb 16, 2024
74b38f5
Remove depreciated `isRealRegions()`
alephnull7 Feb 16, 2024
455fe98
Add scaled depth stat to `getCovDepth()`; add optional `removeSmall` …
alephnull7 Feb 16, 2024
10489a0
Resolve "Undefined global variables" check
alephnull7 Feb 16, 2024
cf67bbe
Feature with multiple qualifications of the same name fix
alephnull7 Feb 17, 2024
2989a7d
Updated `read.gb2DF()` testing to reflect unified `analysisSpecs`
alephnull7 Feb 17, 2024
514a6df
Remove spaces from source as `quadripRegions`
alephnull7 Feb 17, 2024
da71150
Refactoring of `getCovSummaries()` and addition of genome summary for…
alephnull7 Feb 18, 2024
79fb678
Refactor creation of regions coverage summary
alephnull7 Feb 20, 2024
26fa620
Modify `updateCovDataField()` to use `covData` fields for `length` cr…
alephnull7 Feb 20, 2024
c78221e
Only consider general qualification duplication case
alephnull7 Feb 20, 2024
f37d9b8
Less general qualification duplication match
alephnull7 Feb 20, 2024
6e54961
Remove depreciated parameter from `combineDupQuals()`
alephnull7 Feb 20, 2024
5a6b649
Correct file check for `getGbkRaw()`
alephnull7 Feb 20, 2024
93c82f4
Additional check on `gbkFile`
alephnull7 Feb 20, 2024
f1e036d
Include `windowSize` in `analysisSpecs`
alephnull7 Feb 20, 2024
71937c5
Unification of parameters in `plotSpecs`
alephnull7 Feb 20, 2024
8f0b2e3
Updated parameter name in `PACVr.calcCoverage()`
alephnull7 Feb 20, 2024
7424cf3
Updated call of `PACVr.calcCoverage()`
alephnull7 Feb 20, 2024
da92809
Updated call of `PACVr.verboseInformation()`
alephnull7 Feb 20, 2024
cdcbbea
Updated call of `PACVr.verboseInformation()`
alephnull7 Feb 20, 2024
0863972
Creation of `output` field in `getPlotSpecs()`
alephnull7 Feb 20, 2024
7c948d1
Support for PNG output
alephnull7 Feb 21, 2024
3c1bd38
Log as fatal on unsuccessful run
alephnull7 Feb 21, 2024
b7035c8
Enhanced handling for `output` parameter
alephnull7 Feb 22, 2024
7798468
Single parse of `GenomicAlignments::coverage()`; Allow `seqnames` to …
alephnull7 Feb 22, 2024
2c621dc
Qualifier `note` not required for standard coverage analysis
alephnull7 Feb 22, 2024
6b47796
In `PACVr_run_parallel.R`, print size of multiprocess tasks
alephnull7 Feb 22, 2024
e255c6a
Suppress `read.gb` messages
alephnull7 Feb 22, 2024
7ebb55c
Source as regions fallback when `FilterByKeywords()` returns empty
alephnull7 Feb 22, 2024
63564f4
Updated testing
alephnull7 Feb 22, 2024
01ee80e
Inclusion of `png()` use in `NAMESPACE`
alephnull7 Feb 22, 2024
ffe9616
Updated package-wide imports/exports
alephnull7 Feb 22, 2024
fec4957
Version 1.0.8
alephnull7 Feb 22, 2024
cf2fbd6
Updated documentation for release
alephnull7 Feb 22, 2024
9ff9db1
dos2unix on both `R/*` and `tests/` push
alephnull7 Feb 22, 2024
f738317
No `R-CMD-check` on pull request; complications when new push is part…
alephnull7 Feb 22, 2024
1e3dc0b
Retry version 1.0.8
alephnull7 Feb 22, 2024
44573fe
Updated documentation for release
alephnull7 Feb 22, 2024
5 changes: 0 additions & 5 deletions .github/workflows/R-CMD-check.yaml
@@ -5,11 +5,6 @@ on:
workflows: [document]
types:
- completed
push:
branches: [main, master]
paths: ["tests/**"]
pull_request:
branches: [main, master]

name: R-CMD-check

7 changes: 5 additions & 2 deletions .github/workflows/dos2unix.yaml
@@ -2,7 +2,7 @@
# Need help debugging build failures? Start at https://github.com/r-lib/actions#where-to-find-help
on:
push:
paths: ["R/**"]
paths: ["R/**", "tests/**"]

name: dos2unix

@@ -18,13 +18,16 @@ jobs:
run: sudo apt-get install dos2unix

- name: Run dos2unix
run: find ./R -type f -name "*.R" -exec dos2unix {} \;
run: |
find ./R -type f -name "*.R" -exec dos2unix {} \;
find ./tests/testthat -type f -name "*.R" -exec dos2unix {} \;

- name: Commit and push changes
run: |
git config --local user.name "$GITHUB_ACTOR"
git config --local user.email "$GITHUB_ACTOR@users.noreply.github.com"
git add ./R/*.R
git add ./tests/testthat/*.R
git commit -m "Conversion of R files from CRLF to LF" || echo "No changes to commit"
git pull --ff-only
git push origin
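
Reviewer note: the extended conversion step can be tried locally before relying on the workflow. The sketch below is an illustration under stated assumptions, not the workflow itself: it builds a hypothetical scratch tree, uses `tr -d '\r'` as a stand-in for `dos2unix` (which may not be installed locally, and which is more careful about what it strips), and walks `R/` and `tests/testthat/` the same way the two `find` commands do. The names `workdir`, `to_lf`, and the example file names are made up for the demonstration.

```shell
#!/bin/sh
# Hypothetical scratch tree that mimics the repository layout.
workdir="$(mktemp -d)"
mkdir -p "$workdir/R" "$workdir/tests/testthat"
printf 'x <- 1\r\ny <- 2\r\n' > "$workdir/R/example.R"
printf 'expect_true(TRUE)\r\n' > "$workdir/tests/testthat/test-example.R"

# Stand-in for dos2unix (an assumption, not the workflow's tool):
# strips every carriage return from the file given as $1.
to_lf() {
  tr -d '\r' < "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# Same traversal as the workflow step: one pass per directory.
for dir in "$workdir/R" "$workdir/tests/testthat"; do
  find "$dir" -type f -name "*.R" | while IFS= read -r f; do
    to_lf "$f"
  done
done

# Count the files that still contain a carriage return.
cr="$(printf '\r')"
remaining="$(find "$workdir" -type f -exec grep -l "$cr" {} + | wc -l | tr -d ' ')"
echo "files with CR left: $remaining"
```

Keeping one `find` pass per directory mirrors the workflow and makes it easy to see which tree a conversion came from; a single `find ./R ./tests/testthat ...` call would also work.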
8 changes: 8 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,13 @@
CHANGELOG
---------
#### Version 1.0.8 (2024.02.22)
* Handling of GenBank features with multiple qualifiers of the same name
* Coverage summaries added to `verbose` output files
* Parameters `syntenyLineType` and `regionsCheck` for `PACVr.complete()` have been combined into `IRCheck`
* Support for PNG `output`
* Samples without `note` can be used for standard coverage analysis
* Sample name in BAM file can match either `VERSION` or `ACCESSION` of GenBank file for `verbose` analysis
* Analysis continues without regions when IR presence test unsuccessful

#### Version 1.0.7 (2024.02.01)
* More robust parsing of feature sequence locations using INSDC standards
7 changes: 4 additions & 3 deletions DESCRIPTION
@@ -1,6 +1,6 @@
Package: PACVr
Version: 1.0.7
Date: 2024-02-01
Version: 1.0.8
Date: 2024-02-22
Title: Plastome Assembly Coverage Visualization
Authors@R: c(person("Gregory", "Smith", role=c("ctb")),
person("Nils", "Jenke", role=c("ctb")),
@@ -19,7 +19,8 @@ Imports:
RCircos (>= 1.2.0),
grDevices,
stats,
utils
utils,
tidyr
Description: Visualizes the coverage depth of a complete plastid genome as well as the equality of its inverted repeat regions in relation to the circular, quadripartite genome structure and the location of individual genes. For more information, please see Gruenstaeudl and Jenke (2020) <doi:10.1186/s12859-020-3475-0>.
License: BSD 3-clause License + file LICENSE
RoxygenNote: 7.3.1
1 change: 1 addition & 0 deletions NAMESPACE
@@ -7,6 +7,7 @@ importFrom(dplyr,"%>%")
importFrom(grDevices,colors)
importFrom(grDevices,dev.off)
importFrom(grDevices,pdf)
importFrom(grDevices,png)
importFrom(stats,aggregate)
importFrom(stats,cov)
importFrom(stats,sd)
5 changes: 2 additions & 3 deletions R/IRoperations.R
@@ -1,10 +1,9 @@
#!/usr/bin/env RScript
#contributors=c("Gregory Smith", "Nils Jenke", "Michael Gruenstaeudl")
#email="m_gruenstaeudl@fhsu.edu"
#version="2024.02.01.1736"
#version="2024.02.22.2236"

checkIREquality <- function(gbkData, regions, dir, sample_name) {
gbkSeq <- read.gbSeq(gbkData)
checkIREquality <- function(gbkSeq, regions, dir, sample_name) {
if ("IRb" %in% regions[, 4] && "IRa" %in% regions[, 4]) {
repeatB <- as.numeric(regions[which(regions[, 4] == "IRb"), 2:3])
repeatA <-
5 changes: 3 additions & 2 deletions R/PACVr-package.R
@@ -2,16 +2,17 @@
"_PACKAGE"

## usethis namespace: start
#' @export RCircos.Env
#' @importFrom dplyr %>%
#' @importFrom grDevices colors
#' @importFrom grDevices dev.off
#' @importFrom grDevices pdf
#' @importFrom grDevices png
#' @importFrom RCircos RCircos.Env
#' @importFrom stats aggregate
#' @importFrom stats cov
#' @importFrom stats sd
#' @importFrom utils write.csv
#' @importFrom utils write.table
#' @importFrom RCircos RCircos.Env
#' @export RCircos.Env
## usethis namespace: end
NULL
208 changes: 51 additions & 157 deletions R/PACVr.R

I appreciate the general reduction in the number of variables (e.g., `gbkData`, `analysisSpecs`, and `plotSpecs` now contain various sub-variables, so it is sufficient to pass only these along instead of creating new variables).

@@ -1,7 +1,7 @@
#!/usr/bin/env RScript
#contributors=c("Gregory Smith", "Nils Jenke", "Michael Gruenstaeudl")
#email="m_gruenstaeudl@fhsu.edu"
#version="2024.02.01.1736"
#version="2024.02.22.2236"

PACVr.read.gb <- function(gbkFile) {
gbkRaw <- getGbkRaw(gbkFile)
@@ -12,121 +12,50 @@ PACVr.read.gb <- function(gbkFile) {
return(gbkData)
}

PACVr.parseName <- function (gbkData) {
return(read.gbSampleName(gbkData))
}

PACVr.parseQuadripRegions <- function (gbkData, gbkDataDF) {
raw_quadripRegions <- ParseQuadripartiteStructure(gbkDataDF)
quadripRegions <- fillDataFrame(gbkData, raw_quadripRegions)
return(quadripRegions)
}

PACVr.parseSource <- function(gbkDataDF) {
return(parseSource(gbkDataDF))
}

PACVr.parseGenes <- function (gbkDataDF) {
# This function parses the genes of a GenBank file
logger::log_info('Parsing the different genes')
genes <- ExtractAllGenes(gbkDataDF)
return(genes)
}

PACVr.calcCoverage <-
function (bamFile, windowSize=250) {
logger::log_info('Calculating the sequencing coverage')
coverage <- CovCalc(bamFile, windowSize)
return(coverage)
}

PACVr.generateIRGeneData <- function(genes, quadripRegions,
syntenyLineType) {
# Parse GenBank file
if ("IRb" %in% quadripRegions[, 4] &&
"IRa" %in% quadripRegions[, 4]) {
linkData <- GenerateIRSynteny(genes, syntenyLineType)
return(linkData)
}
return(-1)
}

PACVr.verboseInformation <- function(gbkData,
bamFile,
genes,
quadripRegions,
coverageRaw,
analysisSpecs,
output) {
sampleName <- PACVr.parseName(gbkData)
verbosePath <- getVerbosePath(sampleName, output)
printCovStats(bamFile,
genes,
quadripRegions,
plotSpecs) {
sampleName <- gbkData$sampleName
verbosePath <- getVerbosePath(sampleName,
plotSpecs)
printCovStats(coverageRaw,
gbkData$genes,
gbkData$quadripRegions,
sampleName,
analysisSpecs,
verbosePath)
if (!is.null(analysisSpecs$syntenyLineType)) {
checkIREquality(gbkData,
quadripRegions,
if (analysisSpecs$isSyntenyLine) {
checkIREquality(gbkData$seq,
gbkData$quadripRegions,
verbosePath,
sampleName)
}
logger::log_info('Verbose output saved in `{verbosePath}`')
}

PACVr.visualizeWithRCircos <- function(gbkData,
genes,
quadripRegions,
coverage,
windowSize,
logScale,
threshold,
relative,
linkData,
syntenyLineType,
textSize) {
# Step 1. Generate plot title
plotTitle <- read.gbPlotTitle(gbkData)
# Step 2. Visualize
analysisSpecs,
plotSpecs) {
logger::log_info('Generating a visualization of the sequencing coverage')
isOutput <- plotSpecs$isOutput

if (isOutput) {
createVizFile(plotSpecs)
}

visualizeWithRCircos(
plotTitle,
genes,
quadripRegions,
gbkData,
coverage,
windowSize,
threshold,
logScale,
relative,
linkData,
syntenyLineType,
textSize
analysisSpecs,
plotSpecs
)
}

PACVr.quadripRegions <- function(gbkData,
gbkDataDF,
isIRCheck) {
if (isIRCheck) {
logger::log_info('Parsing the quadripartite genome structure')
quadripRegions <- PACVr.parseQuadripRegions(gbkData,
gbkDataDF)
} else {
quadripRegions <- PACVr.parseSource(gbkDataDF)
}
return(quadripRegions)
}

PACVr.linkData <- function(genes,
quadripRegions,
syntenyLineType) {
linkData <- NULL
if (!is.null(syntenyLineType)) {
logger::log_info('Inferring the IR regions and the genes within the IRs')
linkData <- PACVr.generateIRGeneData(genes,
quadripRegions,
syntenyLineType)
if (isOutput) {
dev.off()
logger::log_info('Visualization saved as `{plotSpecs$output}`')
}
return(linkData)
}

#' @title Execute the complete pipeline of \pkg{PACVr}
@@ -191,79 +120,44 @@ PACVr.complete <- function(gbkFile,
verbose=FALSE,
output=NA) {
######################################################################
gbkData <- PACVr.read.gb(gbkFile)
analysisSpecs <- getAnalysisSpecs(IRCheck)
gbkDataDF <- read.gb2DF(gbkData,
analysisSpecs)
if (is.null(gbkDataDF)) {
logger::log_error(paste("No usable data to perform specified analysis"))
return(NULL)
read.gbData <- PACVr.read.gb(gbkFile)
analysisSpecs <- getAnalysisSpecs(IRCheck,
windowSize)
gbkData <- PACVr.gbkData(read.gbData,
analysisSpecs)
rm(read.gbData)
gc()
if (is.null(gbkData)) {
logger::log_fatal('Unsuccessful.')
return(-1)
}

###################################
quadripRegions <- PACVr.quadripRegions(gbkData,
gbkDataDF,
analysisSpecs$isIRCheck)

###################################
genes <- PACVr.parseGenes(gbkDataDF)
plotSpecs <- getPlotSpecs(logScale,
threshold,
relative,
textSize,
output)

###################################
coverage <- PACVr.calcCoverage(bamFile,
windowSize)

###################################
linkData <- PACVr.linkData(genes,
quadripRegions,
analysisSpecs$syntenyLineType)
analysisSpecs$windowSize,
plotSpecs$logScale)

###################################
if (verbose) {
PACVr.verboseInformation(gbkData,
bamFile,
genes,
quadripRegions,
coverage$raw,
analysisSpecs,
output)
plotSpecs)
}

###################################
if (!is.na(output)) {
logger::log_info('Generating a visualization of the sequencing coverage')
pdf(output, width=10, height=10)
PACVr.visualizeWithRCircos(
gbkData,
genes,
quadripRegions,
coverage,
windowSize,
threshold,
logScale,
relative,
linkData,
IRCheck,
textSize
)
dev.off()
logger::log_info('Visualization (including coverage) saved as `{output}`')
} else {
logger::log_info('No coverage data inferred; generating empty visualization')
PACVr.visualizeWithRCircos(
gbkData,
genes,
quadripRegions,
coverage,
windowSize,
threshold,
logScale,
relative,
linkData,
IRCheck,
textSize
)
dev.off()
logger::log_info('Visualization (excluding coverage) saved as `{output}`')
}
PACVr.visualizeWithRCircos(gbkData,
coverage$plot,
analysisSpecs,
plotSpecs)

######################################################################
logger::log_success('Done.')
######################################################################
2 changes: 1 addition & 1 deletion R/customizedRCircos.R
@@ -1,7 +1,7 @@
#!/usr/bin/env RScript
#contributors=c("Gregory Smith", "Nils Jenke", "Michael Gruenstaeudl")
#email="m_gruenstaeudl@fhsu.edu"
#version="2024.02.01.1736"
#version="2024.02.22.2236"


# The following R functions were taken from the R package RCircos and then modified.