Version 1.0.8 #31

alephnull7 · 2024-02-17T21:36:41Z

The first main change in the package is continued refactoring, specifically treating related data as a composite object where possible. As a result, all GenBank data used within PACVr is now contained in a single object; both versions of the coverage data (the output of GenomicAlignments::coverage() and its derivative from CovCalc()) are fields of the coverage object; and the input parameters related to plotting are contained in plotSpecs. I am applying some object-oriented principles, such as validating the PACVr.complete() parameters in both getAnalysisSpecs() and getPlotSpecs(). However, the use cases of these objects are fairly simple, partly because only one instance of each exists during the execution of PACVr.complete(), so I have not yet defined them in one of R's object-oriented systems such as S3.
When running some test data, specifically an annotated GenBank file for NC_009143, I noticed that a feature can have multiple qualifiers of the same name and that this case was not handled. I originally considered combining these values into a list. Because some of the package's filtering operations rely on regex matching, the values are instead concatenated into a single character string.
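A minimal sketch of the concatenation idea follows. It is not the package's `combineDupQuals()`; the helper name `collapseDupQualifiers`, the `"; "` separator, and the example qualifier values are assumptions made for illustration.

```r
# Hypothetical helper: collapse qualifiers that share a name into one string.
# `quals` is a named character vector holding one feature's qualifiers.
collapseDupQualifiers <- function(quals, sep = "; ") {
  qualNames <- names(quals)
  # unique() keeps the first-seen order of qualifier names;
  # vapply() names its result after the character values it iterates over
  vapply(
    unique(qualNames),
    function(n) paste(quals[qualNames == n], collapse = sep),
    character(1)
  )
}

# Illustrative feature with a duplicated "note" qualifier
feature <- c(gene = "rps12", note = "trans-spliced", note = "exon 1")
collapsed <- collapseDupQualifiers(feature)
print(collapsed)

# A single string can still be filtered with a regex
hasTransSpliced <- grepl("trans-spliced", collapsed[["note"]])
```

Keeping each qualifier as one character string means regex-based filters can be applied to a plain character column without unpacking list elements first.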
Experience with other test data led to two further changes: note qualifiers are no longer required for standard coverage analysis, and for verbose analysis the sample name in the BAM file may match either the VERSION or the ACCESSION of the GenBank file.
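The relaxed name matching can be sketched as below. The function name `matchesGbkRecord` is hypothetical, and the version suffix in the example is illustrative only (GenBank VERSION values take the form accession plus a numeric suffix).

```r
# Hypothetical check: accept a BAM sequence name that equals either
# the ACCESSION or the VERSION identifier of the GenBank record.
matchesGbkRecord <- function(seqname, accession, version) {
  seqname %in% c(accession, version)
}

# Illustrative identifiers; the ".1" suffix is an assumed example value
byVersion   <- matchesGbkRecord("NC_009143.1", "NC_009143", "NC_009143.1")
byAccession <- matchesGbkRecord("NC_009143",   "NC_009143", "NC_009143.1")
noMatch     <- matchesGbkRecord("chr1",        "NC_009143", "NC_009143.1")
```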
When the IR presence test finds no matching features, this no longer raises an error. Instead, the failure is reported, and execution continues with the unpartitioned source as the single region to analyze.
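A rough sketch of this fallback is shown below. The function name `getRegionsOrSource`, the column names, and the search pattern are assumptions for illustration and do not reproduce the package's own filtering code.

```r
# Hypothetical fallback: if no feature matches the IR pattern, report it and
# return the whole source as a single region instead of stopping with an error.
getRegionsOrSource <- function(features, sourceLength,
                               pattern = "inverted repeat") {
  isHit <- grepl(pattern, features$note, ignore.case = TRUE)
  hits <- features[isHit, , drop = FALSE]
  if (nrow(hits) == 0) {
    message("No IR features found; analyzing the unpartitioned source")
    return(data.frame(region = "source", start = 1L, end = sourceLength))
  }
  data.frame(region = hits$note, start = hits$start, end = hits$end)
}

# Illustrative feature table without any IR annotation
features <- data.frame(note = "photosystem II protein",
                       start = 100L, end = 500L)
regions <- getRegionsOrSource(features, sourceLength = 150000L)
```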
Validation of the output parameter is now more thorough, and PNG file support has been added.
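One way such a check could look is sketched below. The function name `resolveOutputFormat` and the assumption that exactly `pdf` and `png` are accepted are illustrative and not taken from the package.

```r
# Hypothetical validation: derive the output format from the file extension
# and reject anything outside an assumed set of supported formats.
resolveOutputFormat <- function(output, supported = c("pdf", "png")) {
  if (!is.character(output) || length(output) != 1 ||
      is.na(output) || !nzchar(output)) {
    stop("`output` must be a single non-empty file path")
  }
  ext <- tolower(tools::file_ext(output))
  if (!ext %in% supported) {
    stop("unsupported output format: '", ext, "'")
  }
  ext
}

pngFormat <- resolveOutputFormat("coverage_plot.PNG")
svgResult <- tryCatch(resolveOutputFormat("coverage_plot.svg"),
                      error = function(e) e)
```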
Within getCovDepth(), lowCoverage in the summary table has been renamed lowCovWin_abs, and a new statistic, lowCovWin_relToRegionLen, has been added; it corresponds to lowCovWin_abs/regionLen. In the verbose output, <sample_name>_coverage.summary.regions.tsv has an additional row, Complete_genome, whose regionLen and lowCovWin_abs are the sums of the per-region values. Note that the latter is just the sum of the low coverage counts obtained when each region is considered separately, not the low coverage count obtained when considering the entire source. This aligns with the indicated requirements, but I wanted to mention it since the two counts typically differ. Since evenness is based upon the raw coverage data for sequences, this depth statistic for Complete_genome does use the coverage mean of the entire source in the calculation.
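The arithmetic behind the new column and the Complete_genome row can be sketched in base R as follows. The region names and all numbers are made up for illustration, the `region` column name is hypothetical, and computing the relative value for Complete_genome as the ratio of the two sums is an assumption.

```r
# Made-up per-region values, for illustration only
regionStats <- data.frame(
  region        = c("LSC", "IRb", "SSC", "IRa"),
  regionLen     = c(86000, 26000, 18000, 26000),
  lowCovWin_abs = c(12, 3, 5, 3)
)
# Relative statistic: low-coverage windows per base of region length
regionStats$lowCovWin_relToRegionLen <-
  regionStats$lowCovWin_abs / regionStats$regionLen

# Complete_genome row: sums of the per-region values, not a recount of
# low-coverage windows over the entire source
genomeRow <- data.frame(
  region        = "Complete_genome",
  regionLen     = sum(regionStats$regionLen),
  lowCovWin_abs = sum(regionStats$lowCovWin_abs)
)
# Assumption: relative value taken as the ratio of the two sums
genomeRow$lowCovWin_relToRegionLen <-
  genomeRow$lowCovWin_abs / genomeRow$regionLen

summaryTable <- rbind(regionStats, genomeRow)
print(summaryTable)
```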

michaelgruenstaeudl · 2024-02-22T19:17:04Z

R/PACVr.R

I appreciate the general reduction in the number of variables (e.g., "gbkData", "analysisSpecs", and "plotSpecs" now contain various sub-variables, making it sufficient to pass only these along instead of creating new variables).

michaelgruenstaeudl · 2024-02-22T19:19:32Z

R/parseData.R

Implementing the function PACVr.gbkData() is a great idea, especially the fact that the initially read data is cleaned up quickly after parsing (e.g., via gc()) to avoid excessive memory usage.

michaelgruenstaeudl · 2024-02-22T23:31:18Z

R/verboseInformation.R

I like this straightforward implementation of the calculation of the number of low coverage windows per region relative to the region size. dplyr for the win!

michaelgruenstaeudl · 2024-02-22T23:39:53Z

R/read.gb2PACVr.R

Thank you for implementing "combineDupQuals()". I had not even thought of the relevance of such a function, although I have not fully understood what it does.

michaelgruenstaeudl · 2024-02-23T00:10:38Z

R/verboseInformation.R

The sense behind "removeSmall" is as follows: if a region/gene/etc. were smaller than the window size specified by the user (default: 250 bp), then a coverage calculation may not make sense. Hence, the sizeThreshold should not be hard-coded but should take the value of windowSize if removeSmall=TRUE.
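The requested behavior could be sketched as follows. The function name `filterSmallRegions` and the `length` column are hypothetical; only `removeSmall`, `windowSize`, `sizeThreshold`, and the 250 bp default come from the comment above.

```r
# Hypothetical filter: when removeSmall is TRUE, the size threshold is the
# user-specified window size rather than a hard-coded value.
filterSmallRegions <- function(regions, windowSize = 250, removeSmall = TRUE) {
  if (!removeSmall) {
    return(regions)
  }
  sizeThreshold <- windowSize
  # Drop regions that are smaller than one window
  regions[regions$length >= sizeThreshold, , drop = FALSE]
}

# Illustrative regions with made-up lengths
regions <- data.frame(name = c("geneA", "geneB", "geneC"),
                      length = c(1200, 90, 250))
kept    <- filterSmallRegions(regions, windowSize = 250)
keptAll <- filterSmallRegions(regions, removeSmall = FALSE)
```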

michaelgruenstaeudl · 2024-02-23T00:18:29Z

R/customizedRead.gb.R

Good improvements to handle cases when no input file specified or input file is empty.

michaelgruenstaeudl · 2024-02-23T02:00:11Z

R/visualizeWithRCircos.R

Yes, adding support for png-formatted output is helpful.

alephnull7 added 22 commits February 16, 2024 13:09

Migrate functions that create gbk and bam derivatives to `parseData.R`; removal of depreciated `PACVr.parseName()`; merger of `parseSource` and `PACVr.parseSource()`

1f6114c

merger of parseSource and PACVr.parseSource()

a0475d4

Move coverage mutation to coverage creation

5e1891a

Unified gbkData object for PACVR analysis

6546966

New analysisSpecs property

4849f53

Update PACVr.verboseInformation() to use unified gbkData

80c97e5

Refactor of PACVr.visualizeWithRCircos() and visualizeWithRCircos()

5d09e18

Update checkIREquality to directly use gbkSeq

ac7d22a

Derive lengths from gbkSeq

436f073

Update fillDataFrame() to directly use gbkLengths

35e555e

Remove depreciated isRealRegions()

74b38f5

Add scaled depth stat to getCovDepth(); add optional removeSmall option

455fe98

Resolve "Undefined global variables" check

10489a0

Feature with multiple qualifications of the same name fix

cf67bbe

Updated read.gb2DF() testing to reflect unified analysisSpecs

2989a7d

Remove spaces from source as quadripRegions

514a6df

Refactoring of getCovSummaries() and addition of genome summary for `regions`

da71150

Refactor creation of regions coverage summary

79fb678

Modify updateCovDataField() to use covData fields for length creation; update junction naming

26fa620

Only consider general qualification duplication case

c78221e

Less general qualification duplication match

f37d9b8

Remove depreciated parameter from combineDupQuals()

6e54961

alephnull7 marked this pull request as draft February 20, 2024 21:18

alephnull7 added 7 commits February 20, 2024 13:27

Correct file check for getGbkRaw()

5a6b649

Additional check on gbkFile

93c82f4

Include windowSize in analysisSpecs

f1e036d

Unification of parameters in plotSpecs

71937c5

Updated parameter name in PACVr.calcCoverage()

8f0b2e3

Updated call of PACVr.calcCoverage()

7424cf3

Updated call of PACVr.verboseInformation()

da92809

alephnull7 and others added 19 commits February 20, 2024 15:31

Updated call of PACVr.verboseInformation()

cdcbbea

Creation of output field in getPlotSpecs()

0863972

Support for PNG output

7c948d1

Log as fatal on unsuccessful run

3c1bd38

Enhanced handling for output parameter

b7035c8

Single parse of GenomicAlignments::coverage(); Allow seqnames to be either `VERSION` or `ACCESSION`

7798468

Qualifier note not required for standard coverage analysis

2c621dc

In PACVr_run_parallel.R, print size of multiprocess tasks

6b47796

Suppress read.gb messages

e255c6a

Source as regions fallback when FilterByKeywords() returns empty

7ebb55c

Updated testing

63564f4

Inclusion of png() use in NAMESPACE

01ee80e

Updated package-wide imports/exports

ffe9616

Version 1.0.8

fec4957

Updated documentation for release

cf2fbd6

dos2unix on both R/* and tests/ push

9ff9db1

No R-CMD-check on pull request; complications when new push is part of open pull request

f738317

Retry version 1.0.8

1e3dc0b

Updated documentation for release

44573fe

alephnull7 changed the title ~~Refactored GenBank data flow; Refactored PACVr.visualizeWithRCircos(); Fix features with duplicate qualifier names; Expanded coverage summaries~~ Version 1.0.8 Feb 22, 2024

alephnull7 marked this pull request as ready for review February 22, 2024 23:20

michaelgruenstaeudl approved these changes Feb 23, 2024

michaelgruenstaeudl merged commit 1dba485 into michaelgruenstaeudl:master Feb 23, 2024
