Design Issues Summary

Raphael Sonabend edited this page May 2, 2019 · 5 revisions

Wrappers in R6

Issue: How should wrappers be implemented given that they are not directly supported?

Conclusion: Conventional implementation of wrappers is possible by hardcoding in the wrapper which functions to access from the wrapped model and which to access from the wrapper itself. This is most easily done with a GenericWrapper abstract class.
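
As a rough illustration of the idea, the sketch below shows one way a GenericWrapper could forward calls to a wrapped model while concrete wrappers override what they need. It is a minimal sketch, not the package's implementation: the Normal and ShiftWrapper classes, the wrappedModel() accessor and the abstract-class check are all assumptions made for this example.

```r
library(R6)

# Hypothetical minimal model to wrap (an assumption for illustration).
Normal <- R6Class("Normal",
  public = list(
    mu = NULL,
    sigma = NULL,
    initialize = function(mu = 0, sigma = 1) {
      self$mu <- mu
      self$sigma <- sigma
    },
    pdf = function(x) dnorm(x, self$mu, self$sigma),
    cdf = function(x) pnorm(x, self$mu, self$sigma)
  )
)

# Abstract wrapper: stores the wrapped model and hardcodes which calls
# are forwarded to it.
GenericWrapper <- R6Class("GenericWrapper",
  public = list(
    initialize = function(model) {
      # Refuse direct construction so the class behaves as abstract.
      if (identical(class(self)[1], "GenericWrapper")) {
        stop("GenericWrapper is abstract and cannot be constructed directly.")
      }
      private$.model <- model
    },
    wrappedModel = function() private$.model,
    # Forwarded unchanged to the wrapped model.
    pdf = function(x) private$.model$pdf(x),
    cdf = function(x) private$.model$cdf(x)
  ),
  private = list(.model = NULL)
)

# Concrete wrapper: overrides pdf/cdf to shift the wrapped model by a constant.
ShiftWrapper <- R6Class("ShiftWrapper",
  inherit = GenericWrapper,
  public = list(
    shift = NULL,
    initialize = function(model, shift) {
      super$initialize(model)
      self$shift <- shift
    },
    pdf = function(x) super$pdf(x - self$shift),
    cdf = function(x) super$cdf(x - self$shift)
  )
)

shifted <- ShiftWrapper$new(Normal$new(), shift = 2)
shifted$pdf(2)          # density of the wrapped standard Normal at 0
shifted$wrappedModel()  # the original model remains accessible
```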

Decorators in R6

Issue: How should decorators be implemented given that they are not directly supported?

Conclusion: A workaround has been found that is similar to wrappers, except that it occurs during object construction. See the following tests for more details.
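
One possible shape for such a workaround is sketched below: the constructor accepts a list of decorators and attaches their methods to the object being built. This is only a sketch under stated assumptions. The Exponential class, the SurvivalDecorator list, the bindTo() helper and the use of lock_objects = FALSE are all choices made for this example, not the workaround referred to in the tests.

```r
library(R6)

# Hypothetical decorator: a named list of functions that take the
# decorated object as their first argument.
SurvivalDecorator <- list(
  survival = function(self, x) 1 - self$cdf(x)
)

# Helper (assumed): fix the object as the first argument of a decorator method.
bindTo <- function(f, obj) {
  force(f)
  force(obj)
  function(...) f(obj, ...)
}

Exponential <- R6Class("Exponential",
  lock_objects = FALSE,  # allow members to be added during construction
  public = list(
    rate = NULL,
    initialize = function(rate = 1, decorators = list()) {
      self$rate <- rate
      # Decoration happens here, at construction, not by wrapping afterwards.
      for (dec in decorators) {
        for (methodName in names(dec)) {
          assign(methodName, bindTo(dec[[methodName]], self), envir = self)
        }
      }
    },
    cdf = function(x) pexp(x, self$rate)
  )
)

plain <- Exponential$new(rate = 2)
decorated <- Exponential$new(rate = 2, decorators = list(SurvivalDecorator))
decorated$survival(1)  # available only on the decorated object
```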

Sets

Issue: Should we use the 'sets' library in R for symbolic representation of types etc. of distributions?

Conclusion: No. Whilst this library is good for calculations, it doesn't include things like Cartesian products and has a poor print/summary interface. Hence we will make our own simple placeholder class for now.

getters/setters/convention

Issue: How to make use of OOP conventions with get/set in R6?

Conclusion: Make all variables private with accessor functions but give these functions the same name as the private variables (i.e. without the 'get' prefix). Give all properties and traits their own accessor for easier S3 use.
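
A minimal sketch of this convention follows. R6 requires names to be unique across the public and private lists, so the sketch assumes a leading dot on the private variable; the Distribution class and its fields are hypothetical.

```r
library(R6)

Distribution <- R6Class("Distribution",
  public = list(
    initialize = function(name, support) {
      private$.name <- name
      private$.support <- support
    },
    # Accessors are named after the variable, with no 'get' prefix.
    name = function() private$.name,
    support = function() private$.support
  ),
  # Leading dot (an assumption): R6 does not allow a public and a private
  # member to share exactly the same name.
  private = list(.name = NULL, .support = NULL)
)

d <- Distribution$new(name = "Normal", support = "Reals")
d$name()
d$support()
```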

Multiple Inheritance

Issue: Not supported in R6 without messy workarounds

Conclusion: Remove classes for VariateForm (uni/multi/matrix) and ValueSupport (cont/disc). Use patterns not involving inheritance.

Method/Variable Naming

Issue: How to name methods and variables

Conclusion: Use informative names for R6 variables/methods and copy these for S3. Avoid masking where possible, but make use of S3 generics to help the user experience. For example, to find the expectation of a distribution D: D$expectation(), expectation(D), mean(D). This will be revisited once the final list of methods and variables is drawn up.
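
The three call styles can be sketched as follows. The Binomial class here is a hypothetical stand-in; the point is that a new S3 generic copies the R6 method name, while a method for the existing base generic mean() adds dispatch without masking anything.

```r
library(R6)

# Hypothetical distribution class for illustration.
Binomial <- R6Class("Binomial",
  public = list(
    size = NULL,
    prob = NULL,
    initialize = function(size, prob) {
      self$size <- size
      self$prob <- prob
    },
    expectation = function() self$size * self$prob
  )
)

# New S3 generic that copies the R6 method name.
expectation <- function(object, ...) UseMethod("expectation")
expectation.Binomial <- function(object, ...) object$expectation()

# Method for the existing base generic: adds dispatch, masks nothing.
mean.Binomial <- function(x, ...) x$expectation()

D <- Binomial$new(size = 10, prob = 0.3)
D$expectation()
expectation(D)
mean(D)
```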

Traceability

Issue: How best to trace the automatic generation of functions so that the user can understand at any point how an automatically generated function was derived.

Conclusion: This is important to a technical user but less so to a non-technical user. It is solved in two ways: first, through a generic wrapper that allows a call to the original distributions; second, through dynamic commenting within functions that outlines the most important features of the new function. Where and when we use these comments will be revisited in implementation.

Debugging

Issue: How to make use of input validation and internal checks.

Conclusion: Use 'checkmate' style functions with three options for validation: check, assert and test. Of these, assert throws an error that stops the code, check returns an error string without stopping the code, and test returns TRUE or FALSE. All three are implemented and the user can select which to use. By default we use check, so that long argument chains are not broken but informative errors are still produced.
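
The following base R sketch mirrors the checkmate convention for a single hypothetical validator. The function names and messages are assumptions for illustration; the check variant returns TRUE or an error string, the test variant returns a logical, and the assert variant stops on failure.

```r
# check: TRUE on success, otherwise an error string (does not stop the code).
checkProbability <- function(x) {
  if (!is.numeric(x)) {
    return("Must be numeric")
  }
  if (anyNA(x) || any(x < 0 | x > 1)) {
    return("Must lie in [0, 1]")
  }
  TRUE
}

# test: TRUE or FALSE, never an error.
testProbability <- function(x) isTRUE(checkProbability(x))

# assert: stops with the check message on failure, returns x invisibly otherwise.
assertProbability <- function(x) {
  res <- checkProbability(x)
  if (!isTRUE(res)) {
    stop(res, call. = FALSE)
  }
  invisible(x)
}

checkProbability(0.5)  # passes
checkProbability(2)    # returns the error string, code continues
testProbability("a")   # logical result only
```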

Plottable Structure

Issue: Enable a plot method that returns a list structure that can be passed to ggplot or plot

Conclusion: The essential data to keep are a range of inputs to the distribution, the outputs of whichever function is being plotted, model quantiles (where appropriate) and, separately, a list of graphical parameters that can be used by ggplot or plot but can also be overridden by the user.
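
A possible layout for such a structure is sketched below. The plottable() function, the element names (data, quantiles, par) and the default graphical parameters are all assumptions; the distribution is represented as a plain list of functions to keep the example self-contained.

```r
# Build the list structure for a distribution given as a list of functions.
plottable <- function(dist, fun = c("pdf", "cdf"), from, to, n = 101,
                      probs = c(0.25, 0.5, 0.75), ...) {
  fun <- match.arg(fun)
  x <- seq(from, to, length.out = n)
  defaults <- list(type = "l", xlab = "x", ylab = fun)
  list(
    # Inputs and the outputs of the function being plotted.
    data = data.frame(x = x, y = dist[[fun]](x)),
    # Model quantiles, where the distribution provides them.
    quantiles = if (is.null(dist$quantile)) NULL else dist$quantile(probs),
    # Graphical parameters: user-supplied values override the defaults.
    par = utils::modifyList(defaults, list(...))
  )
}

stdNormal <- list(pdf = dnorm, cdf = pnorm, quantile = qnorm)
p <- plottable(stdNormal, "pdf", from = -3, to = 3,
               col = "blue", ylab = "density")

# The structure can then be handed to base graphics, for example:
# do.call(plot, c(list(x = p$data$x, y = p$data$y), p$par))
```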

Exotic Functions

Issue: How to separate 'core' functions from 'exotic' functions, i.e. functions that a non-technical user may desire vs. more complex functions

Conclusion: Use the Decorator design pattern to separate different types of exotic functions, e.g. Measures, Survival, Statistical Theory, but make all of them available through S3 dispatch. We need to confirm this is possible in R6 (i.e. that we can have dynamic initialization).

Global vs Local Options

Issue: Should options for functions and auto-generation be set via global options or locally (as parameters)?

Conclusion: Wherever possible, all options should be set locally. When overloading arithmetic operators (e.g. '+'), or where we want to reduce the number of arguments, we set defaults in the R6 method, which is then called by the overloaded operator or by dispatch.
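
A minimal sketch of this arrangement: the option lives as a local parameter of the R6 method with a default, and the overloaded operator, which cannot take extra arguments, simply calls that method. The Normal class, the add() method and its 'independent' option are hypothetical.

```r
library(R6)

Normal <- R6Class("Normal",
  public = list(
    mu = NULL,
    var = NULL,
    initialize = function(mu = 0, var = 1) {
      self$mu <- mu
      self$var <- var
    },
    # The option is a local parameter with its default set here,
    # not a global option.
    add = function(other, independent = TRUE) {
      if (!independent) {
        stop("Only the independent case is sketched here.")
      }
      Normal$new(self$mu + other$mu, self$var + other$var)
    }
  )
)

# The overloaded operator relies on the defaults of the R6 method.
`+.Normal` <- function(e1, e2) e1$add(e2)

z <- Normal$new(1, 2) + Normal$new(3, 4)
z$mu
z$var
```

A user who needs a non-default option calls the R6 method directly, e.g. x$add(y, independent = ...), while the operator covers the common case.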

Domain Specification

Issue: How should the domain/scientific type of the variable be identified?

Conclusion: Make an S3 object called 'Set' in which mathematical sets can be defined and easily parsed. Use this to have a 'ScientificType' as a trait and a 'distrDomain' and 'support' as properties. These include 'gaps'. Also provide checkmate-style assert/test/checkDomain functions.
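
A placeholder 'Set' along these lines might look like the sketch below. The constructor arguments, the printed notation and the checkDomain() message are assumptions made for this example; only the check variant is shown, with test and assert variants expected to wrap it in the usual checkmate style.

```r
# Constructor for a hypothetical S3 'Set': an interval with optional gaps.
Set <- function(lower, upper, type = c("continuous", "discrete"),
                gaps = numeric(0)) {
  type <- match.arg(type)
  stopifnot(is.numeric(lower), is.numeric(upper), lower <= upper)
  structure(
    list(lower = lower, upper = upper, type = type, gaps = gaps),
    class = "Set"
  )
}

# Symbolic representation used by print().
format.Set <- function(x, ...) {
  out <- if (x$type == "continuous") {
    paste0("[", x$lower, ", ", x$upper, "]")
  } else {
    paste0("{", x$lower, ",...,", x$upper, "}")
  }
  if (length(x$gaps) > 0) {
    out <- paste0(out, " \\ {", paste(x$gaps, collapse = ", "), "}")
  }
  out
}

print.Set <- function(x, ...) {
  cat(format(x), "\n", sep = "")
  invisible(x)
}

# check variant: TRUE if every value lies in the set, else an error string.
checkDomain <- function(value, set) {
  ok <- value >= set$lower & value <= set$upper & !(value %in% set$gaps)
  if (set$type == "discrete") {
    ok <- ok & value == round(value)
  }
  if (all(ok)) TRUE else "Value lies outside the domain"
}

posReals <- Set(0, Inf, gaps = 0)
print(posReals)
checkDomain(1.5, posReals)
checkDomain(0, posReals)
```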

Type Hierarchy

Issue: For the OO implementation, should the hierarchy follow ValueSupport -> VariateForm, or the reverse?

Conclusion: Use multiple inheritance so that aspects of any combination can be utilised. Need to check multiple inheritance compatibility with R6.

Lists/Dictionaries/Classes

Issue: For certain variables in distributions, are lists, dictionaries or classes preferred?

Conclusion: When a variable is purely informative, for example when it simply lists information, we use dictionaries, as they allow for method chaining. If dispatch on the variable is required, or more sophisticated print/summary information is needed, we use S3 classes.

Abstraction of Distribution and ProbFamily

Issue: From the distr package, how do we separate out the properties of Distribution and ProbFamily?

Conclusion: These are kept as separate classes. ProbFamily contains Distribution (so properties in ProbFamily can't be added as a decorator to Distribution). ProbFamily objects are constructed in a similar way to Distributions, so that parameterization is undertaken at construction. ProbFamily focuses on information such as L2Deriv and FisherInfo.

Automated generation of p/d/q/r

Issue: If a user does not supply all of p/d/q/r, how should the remaining functions be generated?

Conclusion: Functions are not generated by default, but only when a particular wrapper is called to do so. A particular method can then be selected and, where applicable, warnings will be returned indicating the accuracy of the method.
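
To illustrate, the sketch below derives the missing functions from a pdf alone, using numerical integration for the cdf, root finding for the quantile and inverse transform sampling for random draws. The imputeFromPdf() helper, its arguments and the chosen numerical methods are assumptions standing in for the wrapper described above; they are one option among several.

```r
# Hypothetical helper standing in for the generating wrapper.
# 'lower' is the lower limit of the support; 'searchInterval' must bracket
# every quantile that will be requested.
imputeFromPdf <- function(pdf, lower, searchInterval) {
  warning("cdf, quantile and rand are derived numerically from the pdf; ",
          "accuracy depends on the integrate() and uniroot() tolerances.",
          call. = FALSE)
  # cdf by numerical integration of the pdf.
  cdf <- function(x) {
    vapply(x, function(xi) {
      if (xi <= lower) return(0)
      integrate(pdf, lower, xi)$value
    }, numeric(1))
  }
  # quantile by numerically inverting the cdf.
  quantile <- function(p) {
    vapply(p, function(pr) {
      uniroot(function(x) cdf(x) - pr, interval = searchInterval,
              tol = 1e-8)$root
    }, numeric(1))
  }
  # random draws by inverse transform sampling.
  rand <- function(n) quantile(runif(n))
  list(pdf = pdf, cdf = cdf, quantile = quantile, rand = rand)
}

# Only called on request; emits the accuracy warning once.
expo <- imputeFromPdf(dexp, lower = 0, searchInterval = c(0, 50))
expo$cdf(1)
expo$quantile(0.5)
```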

Summary method

Issue: How to define a summary method for distributions?

Conclusion: See Issue #6 for the design. This will now also include a line of the form "Distribution X is parameterized with the following parameters".

distr functions to carry forward

Issue: Which methods from distr are carried forward to the upgrade?

Conclusion: All core methods for analysing distributions are brought forward, as are those for common distribution properties. We do not carry forward functions specifically for robust statistics.

Estimate/fit method

Issue: Should we include estimation methods as part of the core distr6 package?

Conclusion: No, but we should have the flexibility to allow for this in the future. This includes separating 'settable' and 'fittable' parameters, i.e. parameterization vs. tuning.

L2Deriv and FisherInfo

Issue: How do we bring forward these functions from distr?

Conclusion: The abstraction of Distribution and ProbFamily into separate classes allows for this naturally.

Properties vs Traits

Issue: How do we define a distribution in terms of properties and traits?

Conclusion: Properties are object variables set in construction, traits are class variables set in definition. See Issue #2 for the full list.
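
The distinction can be sketched as below, assuming a hypothetical Binomial class. The trait names echo ValueSupport and VariateForm from elsewhere in this page; the 'symmetry' property is an assumption chosen because it depends on the constructed object.

```r
library(R6)

Binomial <- R6Class("Binomial",
  public = list(
    initialize = function(size, prob) {
      # Properties: object variables, set in construction.
      private$.properties <- list(
        support = c(0, size),
        symmetry = if (prob == 0.5) "symmetric" else "asymmetric"
      )
    },
    traits = function() private$.traits,
    properties = function() private$.properties
  ),
  private = list(
    # Traits: class variables, set in the class definition.
    .traits = list(valueSupport = "discrete", variateForm = "univariate"),
    .properties = NULL
  )
)

fair <- Binomial$new(size = 10, prob = 0.5)
biased <- Binomial$new(size = 10, prob = 0.2)
fair$traits()                # the same for every Binomial
fair$properties()$symmetry   # depends on the parameters
biased$properties()$symmetry
```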

Implementation of joint, truncated/huberized distributions

Issue: How to implement joint, conditional, truncated/huberized distributions.

Conclusion: Use wrappers for all of these, so we have access to the original models and can easily overload functions as required.
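
As an example of the truncated case, the sketch below wraps a model and overloads pdf and cdf with the standard truncation formulas on (lower, upper], while keeping the original model accessible. The TruncatedDistribution class and its member names are hypothetical, and the wrapped model is represented as a plain list of functions for brevity.

```r
library(R6)

TruncatedDistribution <- R6Class("TruncatedDistribution",
  public = list(
    initialize = function(model, lower, upper) {
      stopifnot(lower < upper)
      private$.model <- model
      private$.lower <- lower
      private$.upper <- upper
    },
    # The original model stays accessible.
    wrappedModel = function() private$.model,
    # Overloaded pdf: renormalised inside (lower, upper], zero outside.
    pdf = function(x) {
      inside <- x > private$.lower & x <= private$.upper
      ifelse(inside, private$.model$pdf(x) / private$mass(), 0)
    },
    # Overloaded cdf: clamp x to the truncation limits, then rescale.
    cdf = function(x) {
      clamped <- pmin(pmax(x, private$.lower), private$.upper)
      (private$.model$cdf(clamped) - private$.model$cdf(private$.lower)) /
        private$mass()
    }
  ),
  private = list(
    .model = NULL,
    .lower = NULL,
    .upper = NULL,
    # Probability mass of the wrapped model between the limits.
    mass = function() {
      private$.model$cdf(private$.upper) - private$.model$cdf(private$.lower)
    }
  )
)

stdNormal <- list(pdf = dnorm, cdf = pnorm)  # stand-in for a wrapped model
truncated <- TruncatedDistribution$new(stdNormal, lower = -1, upper = 1)
truncated$pdf(c(-2, 0, 2))
truncated$cdf(c(-1, 0, 1))
```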

Parameterization

Issue: How do we separate different parameterizations of a distribution?

Conclusion: Whilst we initially discussed using the Visitor pattern, we quickly realised this would lead to a lot of extra work: implementing each parameterization, then building different parameter interfaces and changing function definitions. The conclusion was to use parameter sets with a 'settable' column. At construction, the parameters supplied by the user define which are 'settable' and therefore which parameterization is used. All other, non-settable parameters are calculated automatically, so function definitions don't need to be adapted for each parameterization.
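
A small sketch of such a parameter set, assuming an Exponential distribution that can be parameterized by either rate or scale. The function names and the data frame layout are assumptions for illustration; the pdf always reads 'rate', whichever parameter the user supplied.

```r
# Build the parameter set; the supplied parameter becomes the settable one.
makeExpParams <- function(rate = NULL, scale = NULL) {
  if (is.null(rate) == is.null(scale)) {
    stop("Supply exactly one of 'rate' or 'scale'.")
  }
  settable <- if (!is.null(rate)) "rate" else "scale"
  if (is.null(rate)) {
    rate <- 1 / scale
  }
  data.frame(
    id = c("rate", "scale"),
    value = c(rate, 1 / rate),  # the non-settable value is computed
    settable = c("rate", "scale") == settable,
    stringsAsFactors = FALSE
  )
}

# Only the settable parameter may be updated; the other is recalculated.
setParameterValue <- function(params, id, value) {
  if (!isTRUE(params$settable[params$id == id])) {
    stop("Parameter '", id, "' is not settable in this parameterization.")
  }
  do.call(makeExpParams, stats::setNames(list(value), id))
}

# One function definition serves every parameterization.
expPdf <- function(x, params) {
  dexp(x, rate = params$value[params$id == "rate"])
}

byScale <- makeExpParams(scale = 2)
byScale
expPdf(1, byScale)
updated <- setParameterValue(byScale, "scale", 4)
```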