Customize variability for targeted testing

Variability in the SBC workflow

In SBC, parameters, predictors, and hyperparameters are the three input variables used to verify the performance of your proposed generator and inference engine via simulation [1]. Specific values are assumed for each input variable, and outcome values and posterior samples are simulated from them. From these assumed inputs and the retrieved outputs, SBC quantifies how self-consistent your generator + inference engine is. It relies on the fact that, when the distributions of the input and output random variables are equal, the rank of each true input value among its posterior samples is uniformly distributed (see Talts et al., 2018 for the proof). The rvar<ndraws>[dim] datatype, a multidimensional, sample-based representation of a random variable, fits this simulation-based goal: the different input values being tested are different draws of the same random variable. In the SBC workflow figure below, S is the number of initial parameter draws, N is the dimension of the outcome variable, and M is the number of final posterior samples.
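The rank statistic described above can be sketched in base R. This is a minimal illustration, not the SBC package's implementation: it assumes a toy conjugate model (theta ~ normal(0, 1), y_i ~ normal(theta, 1)) whose exact posterior stands in for the inference engine, so the ranks should come out uniform on 0 to M.

```r
set.seed(1)
S <- 1000  # number of simulations (prior draws of theta)
N <- 8     # observations per simulated dataset
M <- 20    # posterior draws per simulation

sbc_ranks <- vapply(seq_len(S), function(s) {
  theta <- rnorm(1, mean = 0, sd = 1)            # "true" parameter drawn from the prior
  y <- rnorm(N, mean = theta, sd = 1)            # outcome simulated given theta
  post_mean <- sum(y) / (N + 1)                  # exact conjugate posterior:
  post_sd <- sqrt(1 / (N + 1))                   #   theta | y ~ normal(post_mean, post_sd)
  draws <- rnorm(M, mean = post_mean, sd = post_sd)
  sum(draws < theta)                             # rank of theta among the M draws
}, integer(1))

# With a correct inference engine, every rank value 0..M is equally likely
table(factor(sbc_ranks, levels = 0:M))
```

Shifting post_mean or shrinking post_sd should make the counts visibly non-uniform, which is the kind of miscalibration a rank histogram is meant to expose.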

Using variability for targeted testing

The focus of verification can be robustness with respect to one or a combination of: a varying parameter (e.g. how much one's wage affects life expectancy), a predictor (e.g. the range and distribution of wages), or a hyperparameter (e.g. the prior parameter for group heterogeneity, or the observation error). By giving the target variable greater variability with rvar, you can customize your test. For example, the following verification test targets param and predictor: both are given more variability than hyperparam, which is held constant.

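library(SBC)
library(posterior)  # rvar, rfun, ndraws, draws_rvars, as_draws_matrix
library(cmdstanr)   # cmdstan_model, write_stan_file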
S = 200
N = 8
M = 20
chains = 2
thin_ranks = 10

generator <- function(hyperparam, param, predictor){
  # hyperparameters (global constants)
  sigma = hyperparam$sigma
  prior_width = hyperparam$prior_width
  
  # parameters
  w = param$w 
  b = param$b
  
  # predictor
  x = predictor$x 
  
  # generate: rvar arithmetic is applied draw by draw, one draw per parallel simulation
  y_true = 1 / (1 + exp(-w*x-b))
  y <- rfun(rnorm)(n = N, mean = y_true, sd = sigma) 
  gen_rvars <- draws_rvars(N = N, x = x, y = y, prior_width = prior_width, sigma = sigma)
  SBC_datasets(
    parameters = as_draws_matrix(param), 
    generated = draws_rvars_to_standata(gen_rvars)
  )
}

stan_code <- "
data {
  int N;
  vector[N] y;
  vector[N] x;
  real<lower=0> prior_width;
  real<lower=0> sigma;
}

parameters {
  real w;
  real b;
}

model {
  vector[N] y_true = inv_logit(w * x + b);
  y ~ normal(y_true, sigma);
  w ~ normal(0, prior_width);
  b ~ normal(0, prior_width);
}
"
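# iter_sampling = M * thin_ranks / chains = 100 per chain, i.e. 200 draws in total,
# which thin to M = 20 draws when ranks are thinned by thin_ranks = 10
backend <- SBC_backend_cmdstan_sample(cmdstan_model(write_stan_file(stan_code)), chains = chains, iter_sampling = M * thin_ranks / chains)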
datasets <- generator(
  hyperparam = list(prior_width = 10, sigma = 0.1, n_dataval = 10),
  param = draws_rvars(w = rvar(rnorm(S, 0, 1)), b = rvar(rnorm(S, 0, 1))), 
  predictor = draws_rvars(x = rfun(rnorm, ndraws = S) (n = N, 0, 2))
)
result <- compute_results(datasets, backend)
plot_rank_hist(result)
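To see how the rvar arithmetic in the generator broadcasts across draws, here is a base-R sketch. It assumes that an rvar<S>[N] can be pictured as an S × N matrix of draws (the layout draws_of() returns), with each row being one parallel simulation. The names mirror the code above, but the block uses neither the SBC nor the posterior package.

```r
set.seed(2)
S <- 200; N <- 8
sigma <- 0.1                                    # global constant (hyperparameter)

w <- rnorm(S, 0, 1)                             # full variable: one value per simulation
b <- rnorm(S, 0, 1)
x <- matrix(rnorm(S * N, 0, 2), nrow = S)       # full variable: S x N

# w and b have length S = nrow(x), so recycling pairs w[i], b[i] with row i of x
y_true <- 1 / (1 + exp(-(w * x + b)))           # same as plogis(w * x + b)
y <- matrix(rnorm(S * N, mean = y_true, sd = sigma), nrow = S)

dim(y)  # 200 8
```

The element-wise recycling here plays the role that per-draw evaluation plays for rvars: row i of y only ever sees w[i], b[i], and row i of x.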

Variability datatype

Variability of input and outcome variables can be customized with their datatype specification.

Input types

Depending on how fixed an input variable is across and within parallel simulations, there are three variability datatypes: global constant, simulation-wise constant, and full variable.

global constant

use-cases: hyperparameters of scalar type, inputs known to have low variability, fixed parameters for the conditional likelihood (recommended in Gelman et al., 2020), parameters that create difficult posterior geometry (these can be fixed at their MAP estimate)
implementation: plain scalars, e.g. sigma and prior_width in the code above

simulation-wise constant

use-cases: predictors whose values vary within one simulation but are the same across parallel simulation experiments
implementation: rfun(rnorm, ndraws = 1)(n = N, 0, 2), giving an rvar<1>[8] with mean ± sd: [1] -1.17 ± NA ... [8] -5.10 ± NA (the sd is NA because there is only one draw)

full variable

use-cases: parameters or predictors whose values differ across parallel simulation experiments
implementation:
- w, b: rvar(rnorm(S, 0, 1)), giving an rvar<200>[1] with mean ± sd: 0.077 ± 1
- x: rfun(rnorm, ndraws = S)(n = N, 0, 2), giving an rvar<200>[8] with mean ± sd: [1] -1.17 ± NA ... [8] -5.10 ± NA
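The three input types can be pictured as S × N draw matrices. This is a minimal base-R sketch, assuming the same matrix picture of rvar<S>[N] (rows are parallel simulations): a global constant is one scalar, a simulation-wise constant is a single draw of x repeated in every row, and a full variable has fresh values in every row.

```r
set.seed(3)
S <- 200; N <- 8

# global constant: a plain scalar shared everywhere
sigma <- 0.1

# simulation-wise constant: x drawn once (like ndraws = 1), reused by all S simulations
x_once <- rnorm(N, 0, 2)
x_simwise <- matrix(x_once, nrow = S, ncol = N, byrow = TRUE)

# full variable: a new x for every simulation (like ndraws = S)
x_full <- matrix(rnorm(S * N, 0, 2), nrow = S, ncol = N)

# spread across simulations, per element: zero for the simulation-wise constant
apply(x_simwise, 2, sd)
apply(x_full, 2, sd)
```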

Outcome types

latent outcome

a deterministic function of the input variables
y_true: 1 / (1 + exp(-w*x-b)), giving an rvar<200>[8] with mean ± sd: [1] 0.55 ± 0.84 ... [8] 1.06 ± 0.65

outcome

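y: rfun(rnorm)(n = N, mean = y_true, sd = sigma), giving an rvar<200>[8] with mean ± sd: [1] 0.52 ± 0.36 ... [8] 0.48 ± 0.27. When the location and scale arguments are themselves rvars, they are applied draw by draw. In the console session below, loc and scale each have 10 draws (the two columns of the cbind output), and y is generated with N = 3: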

> cbind(draws_of(loc),draws_of(scale))
        [,1]      [,2]
1   8.827531 1.0028804
2   8.773037 1.0078621
3  10.400346 0.9888083
...
9   3.508049 1.0006897
10  9.996540 1.0032488
>   y <- rfun(rnorm) (n = N, mean = loc, sd = scale) # the two calls should be equivalent, but
>   y <- rvar_rng(rnorm, n = N, mean = loc, sd = scale) # rfun is more reliable, though slower
> draws_of(y)
        [,1]      [,2]      [,3]
1   9.141878  8.075538 10.622291
2  12.256540 14.454926  9.243738
3   7.485816  6.869056  3.038485
...
9  13.422072 10.198783 11.414952
10  8.946459  2.942293 10.206024
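The draw-by-draw use of loc and scale can be sketched in base R. This illustrates the idea only, assuming loc and scale are stored as plain length-ndraws vectors; draw_per_row is a hypothetical helper, not a posterior function.

```r
# Hypothetical helper: row i of the result holds N normal draws
# generated with mean loc[i] and sd scale[i]
draw_per_row <- function(loc, scale, N) {
  stopifnot(length(loc) == length(scale))
  out <- vapply(seq_along(loc),
                function(i) rnorm(N, mean = loc[i], sd = scale[i]),
                numeric(N))
  # vapply gives N x ndraws (or a vector if N = 1); return ndraws x N like draws_of()
  t(matrix(out, nrow = N))
}

set.seed(4)
loc <- rnorm(10, mean = 10, sd = 2)   # illustrative values, 10 draws
scale <- rep(1, 10)
y <- draw_per_row(loc, scale, N = 3)
dim(y)  # 10 3
```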

Possible SBC extension

Modelers can customize the amount of variability at each step; one extension targets the last level, outcome variable generation. This is expressed as the added layer D in the figure below. The extension is useful for designing approximation strategies, because it relaxes the constraint that the number of simulated outcomes must equal the number of input simulations.
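One way to picture the extra layer is to simulate D outcome datasets for every input simulation, so that S input draws yield S × D outcomes instead of S. The following is a minimal base-R sketch under that assumption; the array layout and the value of D are illustrative choices, not part of the SBC package.

```r
set.seed(5)
S <- 200; D <- 5; N <- 8
sigma <- 0.1

w <- rnorm(S, 0, 1)
b <- rnorm(S, 0, 1)
x <- matrix(rnorm(S * N, 0, 2), nrow = S)
y_true <- 1 / (1 + exp(-(w * x + b)))           # S x N latent outcome

# For each input simulation s, draw D noisy outcome vectors from the same y_true[s, ]
y <- array(NA_real_, dim = c(S, D, N))
for (d in seq_len(D)) {
  y[, d, ] <- y_true + matrix(rnorm(S * N, 0, sigma), nrow = S)
}
dim(y)  # S input simulations x D outcome replicates x N observations
```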

[1] Each Stan program can be decomposed into factors and variables (Bernstein et al., 2020). The parameters, predictors, and hyperparameters for which specific rvars are given are variables (ovals in the figure), while the objects of the verification test are factors (rectangles). The performance of a factor (your proposal) is quantified by the simulation results given the specific rvar for each variable.
