Package 'stackr'

Title: Create Mixture Models From Predictive Samples
Description: The `stackr` package provides an easy way to combine predictions from individual time series or panel data models into an ensemble. `stackr` stacks models (Yuling Yao, Aki Vehtari, Daniel Simpson, and Andrew Gelman (2018) <doi:10.1214/17-BA1091>) according to the Continuous Ranked Probability Score (CRPS) (Tilmann Gneiting & Adrian E Raftery (2007) <doi:10.1198/016214506000001437>) over k-step ahead predictions. It is therefore especially suited to time series and panel data. A function for leave-one-out CRPS may be added in the future. Predictions need to be predictive distributions represented by predictive samples. Usually, these will be sets of posterior predictive simulation draws generated by an MCMC algorithm. Given some training data with true observed values as well as predictive samples generated from different models, `crps_weights` finds the optimal (in the sense of minimizing expected cross-validation predictive error) weights to form an ensemble from these models. Using these weights, `mixture_from_samples` can then provide samples from the optimal model mixture by drawing from the predictive samples of the individual models in the correct proportion. This gives a mixture model based solely on predictive samples and is in this regard superior to other ensembling techniques like Bayesian Model Averaging.
Authors: Nikos Bosse [aut, cre], Yuling Yao [aut], Sam Abbott [aut], Sebastian Funk [aut]
Maintainer: Nikos Bosse <[email protected]>
License: MIT + file LICENSE
Version: 0.1.0
Built: 2024-12-02 06:15:53 UTC
Source: https://github.com/epiforecasts/stackr

Help Index


Obtain CRPS stacking weights

Description

Given true values and predictive samples from different models, 'crps_weights' returns the stacking weights that produce the ensemble minimising the Continuous Ranked Probability Score (CRPS).
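To make the quantity being minimised concrete, the following is a minimal sketch of the standard sample-based CRPS estimate for a single observation, mean|x_i - y| - 0.5 * mean|x_i - x_j|. The function name `crps_from_samples` and the simulated draws are illustrative assumptions; this is not the package's internal implementation, which fits the weights in a Stan model.

```r
# Sample-based CRPS estimate for one observed value y and a vector of
# predictive draws x (hypothetical helper, not part of stackr).
crps_from_samples <- function(y, x) {
  term_obs <- mean(abs(x - y))                 # distance of the draws to the observation
  term_spread <- mean(abs(outer(x, x, "-")))   # average distance between pairs of draws
  term_obs - 0.5 * term_spread
}

set.seed(1)
y <- 0.3
draws_sharp <- rnorm(500, mean = 0, sd = 1)    # simulated, well-calibrated forecast
draws_wide <- rnorm(500, mean = 0, sd = 5)     # simulated, overly dispersed forecast

crps_sharp <- crps_from_samples(y, draws_sharp)
crps_wide <- crps_from_samples(y, draws_wide)
```

Lower values are better, so in this toy setup the sharper forecast scores lower than the overly wide one.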

Usage

crps_weights(data, lambda = NULL, gamma = NULL, dirichlet_alpha = 1.001)

Arguments

data

a data.frame with the following entries:

  • observed, the true observed values

  • predicted, predicted values corresponding to the true values in observed

  • model, the name of the model used to generate the corresponding predictions

  • geography (optional), the regions for which predictions are generated. If geography is missing, it will be assumed there are no geographical differences to take into account. Internally, regions will be ordered alphabetically

  • date, the date of the corresponding prediction / true value. Numbers indicating time steps also work

lambda

weights given to time points. If lambda is NULL, the default gives more weight to recent time points with lambda[t] = 2 - (1 - t / T)^2. Note that the elements of lambda need not sum to one, as the Stan model automatically constrains the final weights to sum to one irrespective of lambda. lambda = "equal" uses equal weights
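As an illustration of the default formula, the sketch below evaluates lambda[t] = 2 - (1 - t / T)^2 for a short series. The number of time points and the variable names are hypothetical and chosen only for this example.

```r
# Default time weights lambda[t] = 2 - (1 - t / T)^2
n_timepoints <- 5                      # T in the formula; hypothetical value
t_index <- seq_len(n_timepoints)       # t = 1, ..., T
lambda_default <- 2 - (1 - t_index / n_timepoints)^2

print(round(lambda_default, 2))
```

The weights increase with t, so the most recent time point receives the largest weight (2) and the earliest the smallest.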

gamma

weights given to regions. If gamma is NULL, the default is equal weights for the regions. Weights are mapped to regions alphabetically, so make sure that the weights correspond to the regions in alphabetical order.
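One way to avoid mismatched region weights is to keep them in a named vector and reorder them before passing them on. The sketch below assumes that the internal alphabetical ordering agrees with base R's sort(); the region names and weight values are hypothetical.

```r
# Hypothetical regions as they might appear, unordered, in the geography column
regions_in_data <- c("Wales", "England", "Scotland")

# Hypothetical weights, kept as a named vector so the mapping is explicit
region_weights <- c(England = 0.5, Scotland = 0.3, Wales = 0.2)

# Reorder the weights to follow the alphabetical order of the regions
regions_sorted <- sort(unique(regions_in_data))
gamma <- unname(region_weights[regions_sorted])
```

The resulting gamma is an unnamed vector whose entries follow the alphabetically sorted regions.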

dirichlet_alpha

parameter of the Dirichlet prior on the weights. Default is 1.001

Value

returns a vector with the model weights

References

Strictly Proper Scoring Rules, Prediction, and Estimation, Tilmann Gneiting and Adrian E. Raftery, 2007, Journal of the American Statistical Association, Volume 102, Issue 477

Using Stacking to Average Bayesian Predictive Distributions, Yuling Yao, Aki Vehtari, Daniel Simpson, and Andrew Gelman, 2018, Bayesian Analysis 13, Number 3, pp. 917–1003

Examples

## Not run: 
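library("data.table")
library("stackr")

# Split the example data into a training and a test period
splitdate <- as.Date("2020-03-28")
data <- setDT(example_data)

traindata <- data[date <= splitdate]
testdata <- data[date > splitdate]

# Estimate the stacking weights on the training period only
weights <- crps_weights(traindata)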

## End(Not run)

Make mixture model from predictive samples

Description

The function takes a data.frame with predictive samples generated from different models as well as weights corresponding to these models as input. It then returns predictive samples from a mixture model generated by stacking the original models using these weights.
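The idea of drawing from each model's samples in proportion to its weight can be sketched in base R as follows. This is an illustration under simplifying assumptions (two simulated models, a single forecast target, hypothetical weights) and not the package's actual code.

```r
set.seed(42)

# Simulated predictive draws from two hypothetical models for one target
samples_by_model <- list(
  model_a = rnorm(1000, mean = 0, sd = 1),
  model_b = rnorm(1000, mean = 10, sd = 1)
)

# Hypothetical stacking weights, in the same order as the models above
weights <- c(model_a = 0.25, model_b = 0.75)

# Number of draws each model contributes to a mixture of n_draws samples
n_draws <- 1000
n_per_model <- round(weights * n_draws)

# Take the required number of draws from each model and pool them
mixture <- unlist(
  mapply(
    function(draws, n) sample(draws, size = n, replace = FALSE),
    samples_by_model, n_per_model,
    SIMPLIFY = FALSE
  ),
  use.names = FALSE
)
```

Here a quarter of the pooled draws come from model_a and three quarters from model_b, which matches the weights.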

Usage

mixture_from_samples(data, weights = NULL, ...)

Arguments

data

a data.frame with the following entries:

  • observed, the true observed values (optional)

  • predicted, predicted values corresponding to the true values in observed

  • model, the name of the model used to generate the corresponding predictions

  • geography (optional), the regions for which predictions are generated. If geography is missing, it will be assumed there are no geographical differences to take into account. Internally, regions will be ordered alphabetically

  • date, the date of the corresponding prediction / true value. Numbers indicating time steps also work

weights

stacking weights used to combine the original models into a mixture model. If NULL (default), weights will first be estimated using [crps_weights()].

...

any additional parameters to pass to [crps_weights()] if 'weights' is NULL.

Value

data.frame with samples from the mixture model. The following columns are returned:

  • observed, the true observed values, if they were given as input

  • predicted, predicted values corresponding to the true values in observed

  • model, the name of the model used to generate the corresponding predictions

  • geography (optional), the regions for which predictions are generated. If geography is missing, it will be assumed there are no geographical differences to take into account. Internally, regions will be ordered alphabetically

  • date, the date of the corresponding prediction / true value. Numbers indicating time steps also work

References

Using Stacking to Average Bayesian Predictive Distributions, Yuling Yao, Aki Vehtari, Daniel Simpson, and Andrew Gelman, 2018, Bayesian Analysis 13, Number 3, pp. 917–1003

Examples

## Not run: 
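library("data.table")
library("stackr")

data <- setDT(example_data)

# Manually chosen weights, one per model
weights <- c(0.2, 0.3, 0.4, 0.1)

# Draw samples from the mixture defined by these weights
mix <- mixture_from_samples(data, weights = weights)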

## End(Not run)