Title: | Create Mixture Models From Predictive Samples |
---|---|
Description: | The `stackr` package provides an easy way to combine predictions from individual time series or panel data models into an ensemble. `stackr` stacks models (Yuling Yao, Aki Vehtari, Daniel Simpson, and Andrew Gelman (2018) <doi:10.1214/17-BA1091>) according to the Continuous Ranked Probability Score (CRPS) (Tilmann Gneiting & Adrian E Raftery (2007) <doi:10.1198/016214506000001437>) over k-step ahead predictions. It is therefore especially suited for time series and panel data. A function for leave-one-out CRPS may be added in the future. Predictions need to be predictive distributions represented by predictive samples. Usually, these will be sets of posterior predictive simulation draws generated by an MCMC algorithm. Given some training data with true observed values as well as predictive samples generated from different models, `crps_weights` finds the optimal weights (in the sense of minimizing expected cross-validation predictive error) to form an ensemble from these models. Using these weights, `mixture_from_samples` can then provide samples from the optimal model mixture by drawing from the predictive samples of the individual models in the correct proportion. This gives a mixture model solely based on predictive samples and is in this regard superior to other ensembling techniques like Bayesian Model Averaging. |
Authors: | Nikos Bosse [aut, cre], Yuling Yao [aut], Sam Abbott [aut], Sebastian Funk [aut] |
Maintainer: | Nikos Bosse <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.1.0 |
Built: | 2024-12-02 06:15:53 UTC |
Source: | https://github.com/epiforecasts/stackr |
Given true values and predictive samples from different models, 'crps_weights' returns the stacking weights that produce the ensemble minimising the Continuous Ranked Probability Score (CRPS).
crps_weights(data, lambda = NULL, gamma = NULL, dirichlet_alpha = 1.001)
data |
a data.frame with the following entries:
|
lambda |
weights given to timepoints. If |
gamma |
weights given to regions. If |
dirichlet_alpha |
prior for the weights. Default is 1.001 |
Returns a vector with the model weights.
Strictly Proper Scoring Rules, Prediction, and Estimation, Tilmann Gneiting and Adrian E. Raftery, 2007, Journal of the American Statistical Association, Volume 102, Issue 477
Using Stacking to Average Bayesian Predictive Distributions, Yuling Yao, Aki Vehtari, Daniel Simpson, and Andrew Gelman, 2018, Bayesian Analysis 13, Number 3, pp. 917–1003
## Not run:
library("data.table")
splitdate <- as.Date("2020-03-28")
data <- setDT(example_data)
traindata <- data[date <= splitdate]
testdata <- data[date > splitdate]
weights <- crps_weights(traindata)
## End(Not run)
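To make the scoring target concrete, the following is a minimal sketch of the standard sample-based CRPS estimate for a single observation, following the kernel representation in Gneiting and Raftery (2007). It is not taken from the `stackr` source; the helper name `crps_from_samples` is hypothetical and only illustrates the quantity that the stacking weights are chosen to minimise.

```r
# Hypothetical helper (not part of stackr): sample-based CRPS estimate
# for one observed value, CRPS = E|X - y| - 0.5 * E|X - X'|,
# where X and X' are independent draws from the predictive distribution.
crps_from_samples <- function(observed, samples) {
  # Average absolute error of the samples against the observation
  abs_error <- mean(abs(samples - observed))
  # Average absolute difference between all pairs of samples (spread term)
  spread <- mean(abs(outer(samples, samples, "-")))
  abs_error - 0.5 * spread
}

# A degenerate forecast (all samples equal) reduces to the absolute error
crps_from_samples(observed = 3, samples = c(1, 1, 1))

# A forecast that spreads around the observation scores better (lower)
crps_from_samples(observed = 1, samples = c(0, 2))
```

Lower values indicate better forecasts. The pairwise term costs memory quadratic in the number of samples, so this sketch is only suitable for small sample sets.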
The function takes a data.frame with predictive samples generated from different models as well as weights corresponding to these models as input. It then returns predictive samples from a mixture model generated by stacking the original models using these weights.
mixture_from_samples(data, weights = NULL, ...)
data |
a data.frame with the following entries:
|
weights |
stacking weights used to combine the original models into a mixture model. If NULL (default), weights will first be estimated using [crps_weights()]. |
... |
any additional parameters to pass to [crps_weights()] if 'weights' is NULL. |
data.frame with samples from the mixture model. The following columns are returned:
observed, the true observed values, if they were given as input
predicted, predicted values corresponding to the true values in observed
model, the name of the model used to generate the corresponding predictions
geography (optional), the regions for which predictions are generated. If geography is missing, it will be assumed there are no geographical differences to take into account. Internally, regions will be ordered alphabetically
date, the date of the corresponding prediction / true value. Also works with numbers to indicate timesteps
Using Stacking to Average Bayesian Predictive Distributions, Yuling Yao, Aki Vehtari, Daniel Simpson, and Andrew Gelman, 2018, Bayesian Analysis 13, Number 3, pp. 917–1003
## Not run:
library("data.table")
data <- setDT(example_data)
weights <- c(0.2, 0.3, 0.4, 0.1)
mix <- mixture_from_samples(data, weights = weights)
## End(Not run)
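The idea of drawing from the individual models' predictive samples in proportion to the stacking weights can be sketched in base R as follows. This is an illustration under stated assumptions, not the implementation used by `mixture_from_samples`: the model names, the simulated samples and the weights are all made up, and only a single prediction target is considered.

```r
set.seed(42)

# Hypothetical predictive samples from two models for one prediction target
samples <- list(
  model_a = rnorm(1000, mean = 0, sd = 1),
  model_b = rnorm(1000, mean = 5, sd = 1)
)

# Hypothetical stacking weights; they must sum to one
weights <- c(model_a = 0.25, model_b = 0.75)
stopifnot(abs(sum(weights) - 1) < 1e-8)

# Step 1: for every mixture draw, pick a model with probability equal to its weight
n_draws <- 2000
picked <- sample(names(samples), size = n_draws, replace = TRUE, prob = weights)

# Step 2: for every picked model, resample from that model's predictive samples
mixture <- numeric(n_draws)
for (m in names(samples)) {
  idx <- picked == m
  mixture[idx] <- sample(samples[[m]], size = sum(idx), replace = TRUE)
}

# Share of mixture draws that came from each model (close to the weights)
table(picked) / n_draws
```

Because the mixture is built only from existing draws, no parametric form of the individual predictive distributions is needed.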