-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Version 1.0.8 #31
Version 1.0.8 #31
Conversation
…`; removal of depreciated `PACVr.parseName()`; merger of `parseSource` and `PACVr.parseSource()`
…eation; update junction naming
…be either `VERSION` or `ACCESSION`
… of open pull request
PACVr.visualizeWithRCircos()
; Fix features with duplicate qualifier names; Expanded coverage summariesThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I appreciate the general reduction of the number of variables (e.g., "gbkData", "analysisSpecs", and "plotSpecs" now contains various sub-variables, making it sufficient to only pass only these along instead of creating new variables)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implementing the function PACVr.gbkData() is a great idea, especially the fact that the initially read data is cleaned quickly after parsing to avoid excessive memory usage (e.g., gc()
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this straightforward implementation of the calculation of the number of low coverage windows per region relative to the region size. dplyr for the win!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for implementing "combineDupQuals()". I had not even thought of the relevance of such a function, although I have not fully understood what it does.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The sense behind "removeSmall" is as follows: if a region/gene/etc. were smaller than the window size specified by the user (default: 250 bp), then a coverage calculation may not make sense. Hence, the sizeThreshold should not be hard-coded but should take the value of windowSize if removeSmall=TRUE.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good improvements to handle cases when no input file specified or input file is empty.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, adding support for png-formatted output is helpful.
GenomicAlignments::coverage()
and the derivative fromCovCalc()
) are fields of thecoverage
object, and input parameters related to plotting are contained inplotSpecs
. I am applying some object-oriented principles, such as in the validation ofPACVr.complete()
parameters in bothgetAnalysisSpecs()
andgetPlotSpecs()
. However, the use cases of these objects are fairly simple, partially due to only one instance of each existing during the execution ofPACVr.complete()
, so I have not yet defined them in one of R's object-oriented systems like S3.note
qualifiers being unnecessary for the execution of standard coverage analysis, and allowing the sample name in BAM file to match eitherVERSION
orACCESSION
of GenBank file forverbose
analysis.output
parameter is performed, and PNG file support has been added.getCovDepth()
,lowCoverage
in the summary table has been renamedlowCovWin_abs
, and a statistic corresponding tolowCovWin_abs
/regionLen
namedlowCovWin_relToRegionLen
has been added. An additional row for <sample_name>_coverage.summary.regions.tsv has been added to the verbose output,Complete_genome
, with coverage stats corresponding to the sums forregionLen
andlowCovWin_abs
. Note that the latter item is just a sum of the low coverage counts when each region is considered separately, not the low coverage count when considering the entire source. This aligns with the indicated requirements, but I wanted to mention this since the two sums typically differ. Sinceevenness
is based upon the raw coverage data for sequences, this depth statistic forComplete_genome
does use the coverage mean of the entire source in the calculation.