Design Issues Summary
Issue: How should wrappers be implemented given that they are not directly supported?
Conclusion: Wrappers can be implemented conventionally by hardcoding, in the wrapper, which functions are accessed from the wrapped model and which from the wrapper itself. This is easiest with a GenericWrapper abstract class.
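A minimal sketch of this approach, assuming the R6 package. ToyDistribution, the wrappedModels() accessor and ShiftWrapper are hypothetical illustrations, not the package's API. The abstract class refuses direct construction and keeps the wrapped models reachable. The concrete wrapper hardcodes the methods it rewrites.

```r
library(R6)

# Hypothetical stand-in for a distribution class (not the package API).
ToyDistribution <- R6Class("ToyDistribution",
  public = list(
    name = NULL,
    initialize = function(name, pdf, cdf, mean) {
      self$name <- name
      private$.pdf <- pdf
      private$.cdf <- cdf
      private$.mean <- mean
    },
    pdf = function(x) private$.pdf(x),
    cdf = function(x) private$.cdf(x),
    expectation = function() private$.mean
  ),
  private = list(.pdf = NULL, .cdf = NULL, .mean = NULL)
)

# Abstract wrapper: stores the wrapped models so they always stay accessible.
GenericWrapper <- R6Class("GenericWrapper",
  public = list(
    initialize = function(models) {
      # Refuse direct construction; only subclasses may be instantiated.
      if (identical(class(self)[1], "GenericWrapper")) {
        stop("GenericWrapper is abstract; construct a subclass instead")
      }
      private$.wrappedModels <- models
    },
    wrappedModels = function(name = NULL) {
      if (is.null(name)) private$.wrappedModels else private$.wrappedModels[[name]]
    }
  ),
  private = list(.wrappedModels = NULL)
)

# Concrete (hypothetical) wrapper that shifts a distribution by a constant. It
# hardcodes which methods it rewrites (pdf, cdf, expectation); anything else is
# reached through wrappedModels().
ShiftWrapper <- R6Class("ShiftWrapper",
  inherit = GenericWrapper,
  public = list(
    initialize = function(dist, shift) {
      private$.shift <- shift
      super$initialize(list(base = dist))
    },
    pdf = function(x) self$wrappedModels("base")$pdf(x - private$.shift),
    cdf = function(x) self$wrappedModels("base")$cdf(x - private$.shift),
    expectation = function() self$wrappedModels("base")$expectation() + private$.shift
  ),
  private = list(.shift = 0)
)

U <- ToyDistribution$new("Uniform(0, 1)", pdf = dunif, cdf = punif, mean = 0.5)
S <- ShiftWrapper$new(U, shift = 2)
S$expectation()               # 2.5
S$cdf(2.5)                    # 0.5
S$wrappedModels("base")$name  # "Uniform(0, 1)"
```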
Issue: How should decorators be implemented given that they are not directly supported?
Conclusion: A workaround has been found that is similar to wrappers, except that it takes effect during object construction. See the following tests for more details.
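The tests referred to are not reproduced here. The following is only one possible way to decorate during construction, assuming the R6 package. In this sketch a decorator is a named list of functions that take the object as their first argument. The constructor binds each function to the new object and attaches it as a method, which needs lock_objects = FALSE. SurvivalDecorator, its methods and the decorators argument are hypothetical names.

```r
library(R6)

# Hypothetical decorator: functions whose first argument is the decorated object.
SurvivalDecorator <- list(
  survival = function(object, x) 1 - object$cdf(x),
  cumHazard = function(object, x) -log(1 - object$cdf(x))
)

ToyDistribution <- R6Class("ToyDistribution",
  lock_objects = FALSE,  # permit members to be added after the object exists
  public = list(
    initialize = function(cdf, decorators = list()) {
      private$.cdf <- cdf
      # Decoration happens here, in construction: every decorator function is
      # bound to this object and attached to it as a new method.
      for (decorator in decorators) {
        for (fname in names(decorator)) {
          method <- local({
            f <- decorator[[fname]]
            obj <- self
            function(...) f(obj, ...)
          })
          assign(fname, method, envir = self)
        }
      }
    },
    cdf = function(x) private$.cdf(x)
  ),
  private = list(.cdf = NULL)
)

E <- ToyDistribution$new(cdf = pexp, decorators = list(SurvivalDecorator))
E$survival(1)   # approximately exp(-1)
E$cumHazard(2)  # approximately 2
```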
Issue: Should we use the 'sets' library in R for symbolic representation of types etc. of distributions?
Conclusion: No. Whilst this library is good for calculations, it doesn't include things like Cartesian products and has a poor print/summary interface. Hence we will make our own simple placeholder class for now.
Issue: How to make use of OOP conventions with get/set in R6?
Conclusion: Make all variables private with accessor functions but give these functions the same name as the private variables (i.e. without the 'get' prefix). Give all properties and traits their own accessor for easier S3 use.
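A small sketch of the accessor convention, assuming the R6 package. R6 requires member names to be unique across the public and private lists, so this sketch gives the private fields a leading dot. Each public accessor carries the bare name, and the S3 function for a property becomes a one-liner. The class and the support() generic are hypothetical.

```r
library(R6)

ToyDistribution <- R6Class("ToyDistribution",
  public = list(
    initialize = function(name, support) {
      private$.name <- name
      private$.support <- support
    },
    # Accessors are named after the private variables, with no "get" prefix.
    name = function() private$.name,
    support = function() private$.support
  ),
  # The leading dot keeps these names distinct from the public accessors.
  private = list(.name = NULL, .support = NULL)
)

# Because each property has its own accessor, the S3 version simply delegates.
support <- function(object) UseMethod("support")
support.ToyDistribution <- function(object) object$support()

D <- ToyDistribution$new("Binomial", support = 0:10)
D$name()     # "Binomial"
support(D)   # 0 1 2 ... 10
```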
Issue: Multiple inheritance is not supported in R6 without messy workarounds.
Conclusion: Remove classes for VariateForm (uni/multi/matrix) and ValueSupport (cont/disc). Use patterns not involving inheritance.
Issue: How to name methods and variables
Conclusion: Use informative names for R6 variables and methods, and copy these for S3. Avoid masking where possible, but make use of S3 generics to help the user experience. For example, to find the expectation of a distribution D, any of D$expectation(), expectation(D) or mean(D) can be used. This will be revisited once the final list of methods and variables is drawn up.
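A sketch of how the three calls in the example can coexist, assuming the R6 package. ToyNormal is a hypothetical class. The generic expectation() is new and reuses the R6 method's name. mean(D) works by adding a method for base R's existing mean() generic, so nothing is masked.

```r
library(R6)

ToyNormal <- R6Class("ToyNormal",
  public = list(
    initialize = function(mean = 0, sd = 1) {
      private$.mean <- mean
      private$.sd <- sd
    },
    expectation = function() private$.mean
  ),
  private = list(.mean = NULL, .sd = NULL)
)

# New S3 generic with the same informative name as the R6 method.
expectation <- function(object, ...) UseMethod("expectation")
expectation.ToyNormal <- function(object, ...) object$expectation()

# Method for the existing base generic mean(); base::mean is not masked.
mean.ToyNormal <- function(x, ...) x$expectation()

D <- ToyNormal$new(mean = 3, sd = 2)
D$expectation()  # 3
expectation(D)   # 3
mean(D)          # 3
```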
Issue: How best to trace the automatic generation of functions so that the user can understand at any point how an automatically generated function was derived.
Conclusion: This is important to a technical user but less so to a non-technical user. It is solved in two ways: first, by a generic wrapper that allows calls to the original distributions, and second, by dynamic commenting within functions that outlines the most important features of the new function. Where and when we use these comments will be revisited during implementation.
Issue: How to make use of input validation and internal checks.
Conclusion: Use 'checkmate'-style functions with three options for validation: check, assert and test. An assert throws an error that stops the code, a check returns an informative error string without stopping the code, and a test returns TRUE or FALSE. All three are implemented and the user can select which to use. By default we use check, so that long argument chains are not broken but informative errors are still produced.
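A base-R sketch of the check/assert/test trio for one hypothetical validation, a vector of probabilities. The naming follows checkmate's check*/assert*/test* convention, but checkProb, assertProb and testProb are illustrative names. Only the check function holds the validation logic, and the other two are derived from it.

```r
# check: returns TRUE, or an informative error string; never stops the code.
checkProb <- function(x) {
  if (!is.numeric(x)) return("Must be numeric")
  if (anyNA(x)) return("Must not contain missing values")
  if (any(x < 0 | x > 1)) return("All values must lie in [0, 1]")
  TRUE
}

# assert: stops with the check's message on failure, else returns x invisibly.
assertProb <- function(x) {
  res <- checkProb(x)
  if (!isTRUE(res)) stop(res, call. = FALSE)
  invisible(x)
}

# test: returns a single logical.
testProb <- function(x) isTRUE(checkProb(x))

checkProb(0.3)   # TRUE
checkProb(1.2)   # "All values must lie in [0, 1]"
testProb(1.2)    # FALSE
```

Calling assertProb(1.2) would stop with the same message that checkProb(1.2) returns.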
Issue: How to enable a plot method that returns a list structure that can be passed to ggplot or plot.
Conclusion: The essential data to keep are a range of inputs to the distribution, the outputs of the function being plotted, model quantiles (where appropriate) and, separately, a list of graphical parameters that can be used by ggplot or plot but can also be overwritten by the user.
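A base-R sketch of such a list structure. The helper name plotStructure, its arguments and the list element names are assumptions. The data frame could equally be handed to ggplot2, but only base graphics are used here to keep the sketch self-contained.

```r
# Hypothetical helper; the name, arguments and list layout are assumptions.
plotStructure <- function(dfun, qfun, n = 101, probs = c(0.25, 0.5, 0.75), ...) {
  # A range of inputs covering almost all of the probability mass.
  x <- seq(qfun(0.001), qfun(0.999), length.out = n)
  list(
    data = data.frame(x = x, y = dfun(x)),  # inputs and evaluated outputs
    quantiles = qfun(probs),                # model quantiles
    # Default graphical parameters, overwritten by anything passed via '...'.
    graphics = modifyList(
      list(type = "l", col = "black", xlab = "x", ylab = "density"),
      list(...)
    )
  )
}

p <- plotStructure(dnorm, qnorm, col = "blue")
# Base graphics: spread the stored parameters into plot().
do.call(plot, c(list(x = p$data$x, y = p$data$y), p$graphics))
abline(v = p$quantiles, lty = 2)
```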
Issue: How to separate 'core' functions from 'exotic' functions, i.e. functions that a non-technical user may desire vs. more complex functions
Conclusion: Use the Decorator design pattern to separate different types of exotic functions, e.g. Measures, Survival, Statistical Theory, but make all of them available through S3 dispatch. We need to confirm this is possible in R6 (i.e. that we can have dynamic initialization).
Issue: Should options for functions and auto-generation be set via global options or locally (as parameters)?
Conclusion: Wherever possible, all options should be set locally. In the case of overloaded arithmetic operators, e.g. '+', or where we want to reduce the number of arguments, we set defaults in the R6 method, which is then called by the overloaded operator or by S3 dispatch.
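A sketch of the local-default pattern, assuming the R6 package. The option in this hypothetical example is a correlation argument on an add() method. It defaults to 0 in the R6 method, and the overloaded '+' cannot pass extra arguments, so it inherits that default. ToyNormal and add() are hypothetical. The sketch also assumes the summed variables are jointly normal, so the sum is normal.

```r
library(R6)

ToyNormal <- R6Class("ToyNormal",
  public = list(
    initialize = function(mean = 0, var = 1) {
      private$.mean <- mean
      private$.var <- var
    },
    mean = function() private$.mean,
    variance = function() private$.var,
    # The option is a local argument whose default lives in the R6 method.
    # Assumes joint normality, so the sum is again normal.
    add = function(other, correlation = 0) {
      covariance <- correlation * sqrt(private$.var * other$variance())
      ToyNormal$new(mean = private$.mean + other$mean(),
                    var = private$.var + other$variance() + 2 * covariance)
    }
  ),
  private = list(.mean = NULL, .var = NULL)
)

# The overloaded operator takes no extra arguments, so it relies on the default.
"+.ToyNormal" <- function(e1, e2) e1$add(e2)

X <- ToyNormal$new(mean = 1, var = 4)
Y <- ToyNormal$new(mean = 2, var = 9)
(X + Y)$variance()                         # 13, independence by default
X$add(Y, correlation = 0.5)$variance()     # 19, option set locally
```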
Issue: How should the domain/scientific type of the variable be identified?
Conclusion: Make an S3 object called 'Set' in which mathematical sets can be defined and easily parsed. Use this to have a 'ScientificType' as a trait and a 'distrDomain' and 'support' as properties. These can include gaps. Also provide checkmate-style assertDomain, testDomain and checkDomain functions.
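A minimal base-R sketch of an interval-only Set, with no gaps, Cartesian products or unions. Set, contains and checkContains are hypothetical names. The type string records whether each end is open or closed. The check function follows the checkmate style above.

```r
# Hypothetical S3 constructor for an interval; type gives the end closures.
Set <- function(lower, upper, type = "[]") {
  stopifnot(type %in% c("[]", "(]", "[)", "()"), lower <= upper)
  structure(list(lower = lower, upper = upper, type = type), class = "Set")
}

as.character.Set <- function(x, ...) {
  paste0(substr(x$type, 1, 1), x$lower, ", ", x$upper, substr(x$type, 2, 2))
}

print.Set <- function(x, ...) {
  cat(as.character(x), "\n", sep = "")
  invisible(x)
}

# Membership test respecting open and closed ends.
contains <- function(set, x) {
  lowerOK <- if (substr(set$type, 1, 1) == "[") x >= set$lower else x > set$lower
  upperOK <- if (substr(set$type, 2, 2) == "]") x <= set$upper else x < set$upper
  lowerOK & upperOK
}

# checkmate-style check: TRUE, or an error string naming the offending values.
checkContains <- function(set, x) {
  bad <- x[!contains(set, x)]
  if (length(bad) == 0) TRUE else
    paste("Values not in", as.character(set), ":", toString(bad))
}

halfLine <- Set(0, Inf, "[)")
halfLine                          # [0, Inf)
contains(halfLine, c(-1, 0, 5))   # FALSE TRUE TRUE
```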
Issue: For the OO implementation, should the hierarchy follow a ValueSupport -> VariateForm hierarchy or the other way round?
Conclusion: Use multiple inheritance so that aspects of any combination can be utilised. Need to check multiple inheritance compatibility with R6.
Issue: For certain variables in distributions, are lists, dictionaries or classes preferred?
Conclusion: When a variable is purely informative, for example when it lists information, we use dictionaries as they allow for method chaining. If dispatch on the variable is required, or more sophisticated print/summary information, then we use S3 classes.
Issue: From the distr package, how do we separate out the properties of Distribution and ProbFamily?
Conclusion: These are kept as separate classes. ProbFamily contains Distribution (so properties in ProbFamily can't be added as a decorator to Distribution). It is constructed in a similar way to Distribution, so that parameterization is undertaken in construction. ProbFamily focuses on information such as L2Deriv and FisherInfo.
Issue: If a user supplies not all of p/d/q/r then how should the remaining be generated?
Conclusion: Functions are not generated by default, only when a particular wrapper is called to do so. A particular method can then be selected and, where applicable, warnings are returned indicating the accuracy of the method.
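One possible generation method, sketched in base R. It derives a CDF from a user-supplied PDF by numerical integration with stats::integrate and warns when the estimated absolute error exceeds a tolerance. The helper cdfFromPdf and its tol argument are hypothetical. A wrapper like the one described could call a function of this kind only when asked.

```r
# Hypothetical helper: derives a CDF from a PDF by numerical integration.
# It is called explicitly; nothing is generated by default.
cdfFromPdf <- function(pdf, lower = -Inf, tol = 1e-4) {
  function(q) {
    vapply(q, function(qi) {
      res <- stats::integrate(pdf, lower = lower, upper = qi)
      # Report the accuracy of the numerical method when it is poor.
      if (res$abs.error > tol) {
        warning(sprintf("CDF at %g has estimated absolute error %.2g",
                        qi, res$abs.error))
      }
      res$value
    }, numeric(1))
  }
}

cdf <- cdfFromPdf(dnorm)
cdf(c(-1, 0, 1.96))
pnorm(c(-1, 0, 1.96))  # closed-form values, for comparison
```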
Issue: How to define a summary method for distributions?
Conclusion: See Issue #6 for the design. This will now also include a line of the form "Distribution X is parameterized with the following parameters".
Issue: Which methods from distr are carried forward to the upgrade?
Conclusion: All core methods for analysing distributions, and those for common distribution properties, are brought forward. We do not carry forward functions specifically for robust statistics.
Issue: Should we include estimation methods as part of the core distr6 package?
Conclusion: No, but we should have the flexibility to allow for this in the future. This includes separating 'settable' and 'fittable' parameters, i.e. parameterization vs. tuning.
Issue: How do we bring forward these functions from distr?
Conclusion: The abstraction of distribution and probfamily into separate classes allows for this naturally.
Issue: How do we define a distribution in terms of properties and traits?
Conclusion: Properties are object variables set in construction, traits are class variables set in definition. See Issue #2 for the full list.
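A sketch of the distinction, assuming the R6 package. The trait values are fixed in the class definition, so they can be read from the R6 generator's public_fields without creating an object. The properties depend on the parameters and are filled in by initialize(). ToyBinomial and the entries in both lists are hypothetical.

```r
library(R6)

ToyBinomial <- R6Class("ToyBinomial",
  public = list(
    # Trait: fixed when the class is defined, identical for every object.
    traits = list(valueSupport = "discrete", variateForm = "univariate"),
    # Property: depends on the parameters, so it is set during construction.
    properties = NULL,
    initialize = function(size = 10, prob = 0.5) {
      self$properties <- list(support = c(0, size), symmetric = prob == 0.5)
    }
  )
)

B <- ToyBinomial$new(size = 5, prob = 0.3)
B$traits$valueSupport                              # "discrete"
B$properties$support                               # 0 5
ToyBinomial$public_fields$traits$variateForm       # "univariate", no object needed
```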
Issue: How to implement joint, conditional and truncated/huberized distributions.
Conclusion: Use wrappers for all of these, so we have access to the original models and can easily overload functions as required.
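A sketch of truncation as a wrapper, assuming the R6 package and a continuous wrapped distribution. The wrapper keeps the original model accessible and overloads only pdf and cdf. These are renormalised by the mass on [lower, upper]. ToyDistribution, TruncatedWrapper and wrappedModel() are hypothetical names.

```r
library(R6)

# Hypothetical stand-in for a continuous distribution.
ToyDistribution <- R6Class("ToyDistribution",
  public = list(
    initialize = function(pdf, cdf) {
      private$.pdf <- pdf
      private$.cdf <- cdf
    },
    pdf = function(x) private$.pdf(x),
    cdf = function(x) private$.cdf(x)
  ),
  private = list(.pdf = NULL, .cdf = NULL)
)

TruncatedWrapper <- R6Class("TruncatedWrapper",
  public = list(
    initialize = function(dist, lower, upper) {
      stopifnot(lower < upper)
      private$.base <- dist
      private$.lower <- lower
      private$.upper <- upper
    },
    # The original model stays accessible.
    wrappedModel = function() private$.base,
    # Density is rescaled inside [lower, upper] and zero outside.
    pdf = function(x) {
      inside <- x >= private$.lower & x <= private$.upper
      ifelse(inside, private$.base$pdf(x) / private$.mass(), 0)
    },
    # CDF is rescaled, then clamped to [0, 1] outside the truncation range.
    cdf = function(x) {
      Fx <- (private$.base$cdf(x) - private$.base$cdf(private$.lower)) / private$.mass()
      pmin(pmax(Fx, 0), 1)
    }
  ),
  private = list(
    .base = NULL, .lower = NULL, .upper = NULL,
    .mass = function() private$.base$cdf(private$.upper) - private$.base$cdf(private$.lower)
  )
)

N <- ToyDistribution$new(pdf = dnorm, cdf = pnorm)
TN <- TruncatedWrapper$new(N, lower = 0, upper = Inf)  # half-normal
TN$pdf(c(-1, 0))   # 0, then 2 * dnorm(0)
TN$cdf(c(-1, 0))   # 0 0
```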
Issue: How do we separate different parameterizations of a distribution?
Conclusion: Whilst we initially discussed using the Visitor pattern, we quickly realised this would lead to a lot of extra work: implementing each different parameterization, building different parameter interfaces and changing function definitions. Instead we use parameter sets with a 'settable' column. In construction, the parameters supplied by the user define which are 'settable' and therefore which parameterization is used. All other, non-settable, parameters are calculated automatically, so function definitions don't need to be adapted for each parameterization.
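A base-R sketch of a parameter set with a 'settable' column, for a normal distribution that can be parameterized by variance, standard deviation or precision. makeNormalParams and normalPdf are hypothetical. The supplied scale parameter is marked settable, and the others are derived from it. The density is written once in terms of the variance row, whichever parameterization was used.

```r
# Hypothetical parameter set: every parameterization is stored, and 'settable'
# marks the one the user supplied (variance if none was given).
makeNormalParams <- function(mean = 0, var = NULL, sd = NULL, prec = NULL) {
  supplied <- c(var = !is.null(var), sd = !is.null(sd), prec = !is.null(prec))
  if (sum(supplied) > 1) stop("Supply at most one of var, sd and prec")
  # Convert whichever scale parameter was given into the variance.
  v <- if (!is.null(sd)) sd^2 else if (!is.null(prec)) 1 / prec else
    if (!is.null(var)) var else 1
  data.frame(
    id = c("mean", "var", "sd", "prec"),
    value = c(mean, v, sqrt(v), 1 / v),
    settable = c(TRUE, supplied[["var"]] || !any(supplied),
                 supplied[["sd"]], supplied[["prec"]])
  )
}

# One function definition serves every parameterization.
normalPdf <- function(params, x) {
  m <- params$value[params$id == "mean"]
  v <- params$value[params$id == "var"]
  dnorm(x, mean = m, sd = sqrt(v))
}

p <- makeNormalParams(mean = 1, sd = 2)
p                 # var = 4, sd = 2, prec = 0.25; only mean and sd are settable
normalPdf(p, 1)   # equals dnorm(1, mean = 1, sd = 2)
```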