A Julia package to extend the RandomVariables.jl package, as an add-on to the Distributions.jl package. The scope of this package is:
- Allow for conditional expectations
X=RV(Normal()); 𝔼(X|(X<3))
- Allow for addition of two independent random variables to create a new random variable
Z=X+Y
- Allow for the multiplication of two independent random variables to create a new random variable
Z=X*Y
- Allow for the maximum and minimum of two independent random variables to create a new random variable
Z=max(X,Y)
,Z=min(X,Y)
- A plotting functions for the CDF and PDF of random variables
This is still very much a work in progress. General optimizations and bug fixes are needed.
Both Distributions.jl and RandomVariables.jl must be installed and loaded to use ExtendRandomVariables.jl
julia> import Pkg
julia> Pkg.add("Distributions")
julia> Pkg.add("RandomVariables")
julia> Pkg.add("QuadGK")
julia> Pkg.add("Roots")
julia> Pkg.add("Plots")
julia> Pkg.add("FiniteDifferences")
julia> Pkg.add(url = "https://github.com/AMMercier/ExtendRandomVariables.jl.git")
julia> using RandomVariables, Distributions, ExtendRandomVariables, FiniteDifferences, Plots
Mathematics is often a "beautiful machine" where we create abstract objects and allow them to interact in strange and wonderful ways to produce results greater than the initial definitions would lead the creator to believe. Consequently, it is often not necessary to enumerate all the pathways by which these constructs interact, but rather simply provide the ruleset by which they interrelate and allow a user to mix and match as they please. This "toybox" approach to mathematics is easily applicable to - and perhaps highly beneficial for - computer programming. In this way, the goal is not to create a single program, but rather an ecosystem of interacting types and operations to allow emergent behavior. This synergy between mathematics and programming presents an opportunity in a critical area to modern pure and applied mathematics: random variables.
Random variables, a key concept in probability theory, hold paramount importance in various domains, ranging from finance to biology. By their very nature, random variables introduce uncertainty into mathematical models, allowing us to capture the inherent unpredictability of real-world phenomena. Random variables are a function which map from a sample space,
Therefore, the goal of this project is to extend the usefulness of random variables as types form the RandomVariables.jl in Julia to allow operations on random variables in an analogous manner as floats, with easily approachable and familiar syntax like X+Y
. Also, we introduce conditional moments by allowing a conditional expectation. Additionally, we facilitate the plotting of CDF and PDF (or PMF) of the distribution induced by a random variable. In this way, the rules are set such that any given user can utilize rules to craft their own emergent "beautiful machine".
While the majority of the following discussion of the methods of this approach applies to continuous random variables, the same rules analogously apply to discrete random variables, as well.
Random variables can be defined by the RV()
from the RandomVariables.jl package function from any univariate distribution taken from the Distriutions.jl package
X = RV(Normal()) #Random variable X~N(0,1)
Y = RV(Poisson(4)) #Random variable Y~Poisson(4)
All random variables have two components: 1) an underlying distribution and 2) an ID to keep track of dependent vs. independent variables.
Probabilites are only relevant when considering events. That is, the probability function is from the event space, cc
, oo
, and co
or oc
for closed, open, or clopen intervals, respectively. The collection of intervals defining the event is called a "box". For
We define an event through various operators such as X>3
and combine events. For example, we can write
#Events
A = X > 1
B = X ≤ 3
#Combinations
A∩B
Given a random variable f(X)<x
. Therefore, to find P(f(X)<x)
we have
Therefore, in addition to the distribution and ID, we must keep track of the function fInv
. In sum, we can write expressions such as P(exp(X)<3)
.
The following functions are supported by RandomVariables.jl, for
X+a
X*a
-
inv(X)
, i.e.$Z=X^{-1}$ -
X/a
ora/X
exp(X)
sqrt(X)
abs(X)
X^a
We can recall that the probability of event A conditional on the given event
That is, the probability of event P(A∩B)
, we can easily write P(A|B)
.
The conditional expectation differs from the conditional event in that the first argument, 𝔼(X|A)
, is a random variable and not an event. Because we usually require the conditional PDF (or PMF) to compute the conditional expectation and we have direct access to the CDF of any odd combination of random variables through the expression P(X<x)
, we compute the conditional expectation by
Thus, we can compute the conditional expectation for either continuous or discrete random variables using the 𝔼
expression (\bbE
followed by tab
in Julia):
𝔼(X)
𝔼(X|(X<3))
𝔼(X|(exp(X)<2)∩(abs(X)<1.5))
It should be noted that if
Lastly, mathematically, for
We can find the addition or multiplication of two random variables through a "convolutional" framework. For addition of two independent random variables, we have that
which allows X+Y
and X-Y
. Similarly, for the multiplication of two random variables
This is implemented using Gaussian quadrature from the QuadGK.jl package. It is worth noting that for some random variables, the order of the corresponding polynomial is quite large and computation time relatively slow. Other more computationally efficient options to complete these convolutions could be explored in the future, such as FTT. It is worth noting that the evaluation of the integral commonly fails to converge about the range that contains
For max
or min
of two independent random variables
Therefore, if we wish to have the PDF, we must compute
Relatedly, if we wish to determine the quantile function for our random variables, we can find the inverse of the corresponding CDF, find_zero(x -> cdf(d,x) - p, 0)
where d
is our distribution. For a discrete CDF, we have that for the p-th quantile, we find the smallest value of 1e-10
quantile or 1-1e-10
quantile, respectively.
Taken together, we can do some preliminary checks and examples. Consider the following:
#Continuous random variables
julia> X1 = RV(Normal()); Y1 = RV(Normal(3))
RV(Normal{Float64}(μ=3.0, σ=1.0), 11)
julia> Z1=X1+Y1; 𝔼(Z1)
3.0000000000000226
julia> W1=X1*Y2; 𝔼(W1)
-8.719078556333654e-16
#Discrete random variables
julia> X2 = RV(Poisson(3)); Y2 = RV(Poisson(4))
RV(Poisson{Float64}(λ=4.0), 17)
julia> Z2 = X2 + Y2; 𝔼(Z2)
6.99999999697663
julia> W2 = X2*Y2; 𝔼(W2)
11.999971100191376
#Misc. Examples:
julia> 𝔼(W1|(exp(W1) < 1))
-2.3942635174048403
julia> 𝔼(W2|(exp(W2) < 3))
0.38603161430513455
When considering a random variable, there are two primary descriptions we would wish to plot: the cumulative distribution function (CDF) and the probability density (or mass) function (PDF or PMF). Consequently, we can write:
X = RV(Normal())
Y = RV(Exponential(3))
Z = X+Y
plotCDF(Z, -6, 10, xlabel="test")
plotPDF(Z, -6, 10, xlabel="test")
This provides the following plots of the CDF and PDF, respectively.
Additionally, for the PMF of a discrete random variable, we have:
X = RV(Poisson(3)); Y = RV(Poisson(4))
Z = X+Y
plotPDF(Z)
When the end points aren't explicitly specified, as in plotPDF(Z)
, the x limits of the plot are the 1% and 99% quantiles.
Also, we can also use the "bang" operator !
via Julia convention: plotCDF!
and plotPDF!
.
Finally, while plotPDF(X*Y)
works, it is slow (especially on first compiles). We can see this in the following plot of PDF of the product of two independent Gaussian random variables.
The largest component of future work is to implement dependent random variables, which would allow more interesting conditional moments. This could be accomplished through a variety of methods, including covariance structure or including interpolated, empirical distributions. Moreover, more arithmetic operations on random variables (such as X/Y
or X^Y
) would also be appreciated. This extends to the expectation of the transformations of combinations of random variables, such as 𝔼(exp(X+Y))
or 𝔼(exp(X*Y))
which commonly fails due to singularities causing the quadrature to fail. Broadly speaking, for more complex distributions, Monte Carlo methods could be used at the cost of either precision, accuracy, or performance. With respect to plotting, allowing plots for transformations of random variables would be an easy next step. Additionally, more general optimizations could be made, especially with respect to the addition and multiplication of random variables. Furthermore, random variables from mixed distributions (both continuous and discrete) would also be greatly appreciated and would allow for even greater ease of use for the user. Lastly, reworking the base types from RandomVariables.jl to be compatible with automatic differentiation, including differentiation of random variables (i.e., Malliavin calculus), would be a more ambitious extension of the RandomVariables.jl. It is a hope that at some point in the future random variables may be implemented, up to their limitations, similar to integers, floats, or other types we use daily.
Alexander M. Mercier, amercier@g.harvard.edu
MacKenzie, Robert Beverley. "The Darwinian theory of the transmutation of species examined." London: Nisbet & Co (1868): 318.