Package 'lopensemble'

Title: Create Mixture Models From Predictive Samples
Description: Combines predictions from individual time series or panel data models into an ensemble using stacking (Yao, Vehtari, Simpson, and Gelman (2018) <doi:10.1214/17-BA1091>) based on the Continuous Ranked Probability Score (CRPS) (Gneiting and Raftery (2007) <doi:10.1198/016214506000001437>) over k-step ahead predictions. Predictions must be predictive distributions represented by samples, typically posterior predictive simulation draws from a Markov chain Monte Carlo (MCMC) algorithm. Given training data with observed values and predictive samples from different models, optimal stacking weights are computed to minimize expected cross-validation predictive error. These weights can then be used to generate samples from the mixture model by drawing from individual model predictions in the correct proportions.
Authors: Nikos Bosse [aut, cre, cph] (ORCID: <https://orcid.org/0000-0002-7750-5280>), Yuling Yao [aut], Sam Abbott [aut] (ORCID: <https://orcid.org/0000-0001-8057-8037>), Sebastian Funk [aut] (ORCID: <https://orcid.org/0000-0002-2842-3406>)
Maintainer: Nikos Bosse <[email protected]>
License: MIT + file LICENSE
Version: 0.1.2.9000
Built: 2026-06-01 07:51:11 UTC
Source: https://github.com/epiforecasts/lopensemble

Help Index


Obtain CRPS stacking weights

Description

Given true values and predictive samples from different models, 'crps_weights' returns the stacking weights that produce the ensemble minimising the Continuous Ranked Probability Score (CRPS).
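As an illustration of the score being minimised, the CRPS of a sample-based predictive distribution can be estimated from the draws alone, using the identity CRPS = E|X - y| - 0.5 E|X - X'| from Gneiting and Raftery (2007). The sketch below is a minimal stand-alone version; the helper name crps_from_samples is hypothetical and is not part of the package, which may compute the score differently internally.

```r
# Hypothetical helper (not part of lopensemble): sample-based CRPS estimate
# for one observation y and a vector of predictive draws x.
# First term: mean absolute error of the draws against the observation.
# Second term: half the mean absolute difference between all pairs of draws.
crps_from_samples <- function(y, x) {
  mean(abs(x - y)) - 0.5 * mean(abs(outer(x, x, "-")))
}

# A tighter predictive distribution around the observation scores lower
narrow <- crps_from_samples(0, c(-1, 1))
wide <- crps_from_samples(0, c(-2, 2))
```

Lower values indicate better forecasts, so here the narrower set of draws is preferred.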

Usage

crps_weights(data, lambda = NULL, gamma = NULL, dirichlet_alpha = 1.001)

Arguments

data

a data.frame with the following entries:

  • observed, the true observed values

  • predicted, predicted values corresponding to the true values in observed

  • model, the name of the model used to generate the corresponding predictions

  • geography (optional), the regions for which predictions are generated. If geography is missing, it is assumed that there are no geographical differences to take into account. Internally, regions are ordered alphabetically

  • date, the date of the corresponding prediction / true value. Numbers indicating time steps also work

lambda

weights given to time points. If lambda is NULL, the default gives more weight to recent time points with lambda[t] = 2 - (1 - t / T)^2. Note that the elements of lambda need not sum to one, as the Stan model automatically constrains the final weights to sum to one irrespective of lambda. lambda = "equal" uses equal weights
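The default time weights can be reproduced by hand from the formula lambda[t] = 2 - (1 - t / T)^2. The sketch below assumes five time points; the variable names are illustrative only.

```r
# Default-style time weights for T = 5 time points (T chosen for illustration)
n_timepoints <- 5
t <- seq_len(n_timepoints)
lambda <- 2 - (1 - t / n_timepoints)^2
# lambda is 1.36, 1.64, 1.84, 1.96, 2.00: later time points weigh more
```

The weights rise towards 2 at the most recent time point and do not sum to one.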

gamma

weights given to regions. If gamma is NULL, the default is equal weights for all regions. Weights are mapped to regions alphabetically, so make sure that the weights correspond to the regions in alphabetical order.
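One way to avoid mismatched region weights is to keep them in a named vector and sort by name before passing them on. The region names and values below are made up for illustration, and the sketch assumes that sort() on the names gives the same order the package uses internally.

```r
# Hypothetical region weights, deliberately listed out of alphabetical order
gamma_by_region <- c(Wales = 0.2, England = 0.5, Scotland = 0.3)

# Reorder alphabetically by region name, then drop the names
gamma <- unname(gamma_by_region[sort(names(gamma_by_region))])
# gamma is now 0.5, 0.3, 0.2 (England, Scotland, Wales)
```

The resulting vector could then be supplied as the gamma argument.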

dirichlet_alpha

parameter of the Dirichlet prior on the weights. Default is 1.001

Value

returns a vector with the model weights

References

Strictly Proper Scoring Rules, Prediction, and Estimation, Tilmann Gneiting and Adrian E. Raftery, 2007, Journal of the American Statistical Association, Volume 102, Issue 477

Using Stacking to Average Bayesian Predictive Distributions, Yuling Yao, Aki Vehtari, Daniel Simpson, and Andrew Gelman, 2018, Bayesian Analysis 13, Number 3, pp. 917–1003

Examples

## Not run: 
library("data.table")
splitdate <- as.Date("2020-03-28")
data <- setDT(example_data)

traindata <- data[date <= splitdate]
testdata <- data[date > splitdate]

weights <- crps_weights(traindata)

## End(Not run)

Make mixture model from predictive samples

Description

The function takes a data.frame with predictive samples generated from different models as well as weights corresponding to these models as input. It then returns predictive samples from a mixture model generated by stacking the original models using these weights.
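The idea of sampling from a mixture in proportion to the weights can be sketched in a few lines of base R. This is a simplified illustration under stated assumptions and may not match how 'mixture_from_samples' is implemented: the helper draw_mixture is hypothetical, works on plain numeric vectors rather than the data.frame format, and picks a model at random for every draw.

```r
set.seed(1)

# Hypothetical helper (not part of lopensemble): draw n samples from a
# mixture by first picking a model with probability equal to its weight,
# then picking one of that model's predictive draws at random.
draw_mixture <- function(samples, weights, n) {
  stopifnot(length(samples) == length(weights))
  pick <- sample.int(length(samples), size = n, replace = TRUE, prob = weights)
  vapply(pick, function(m) {
    x <- samples[[m]]
    x[sample.int(length(x), 1)]
  }, numeric(1))
}

# Two made-up models with clearly separated predictive draws
samples <- list(
  model_a = rnorm(1000, mean = 0, sd = 1),
  model_b = rnorm(1000, mean = 10, sd = 1)
)
mix <- draw_mixture(samples, weights = c(0.7, 0.3), n = 2000)

# Share of mixture draws that came from model_b, expected to be near 0.3
share_b <- mean(mix > 5)
```

Because the two models barely overlap, the share of large values in the mixture gives a quick check that the draws follow the requested proportions.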

Usage

mixture_from_samples(data, weights = NULL, ...)

Arguments

data

a data.frame with the following entries:

  • observed, the true observed values (optional)

  • predicted, predicted values corresponding to the true values in observed

  • model, the name of the model used to generate the corresponding predictions

  • geography (optional), the regions for which predictions are generated. If geography is missing, it is assumed that there are no geographical differences to take into account. Internally, regions are ordered alphabetically

  • date, the date of the corresponding prediction / true value. Numbers indicating time steps also work

weights

stacking weights used to combine the original models into a mixture model. If NULL (default), weights will first be estimated using 'crps_weights'.

...

any additional parameters to pass to 'crps_weights' if 'weights' is NULL.

Value

data.frame with samples from the mixture model. The following columns are returned:

  • observed, the true observed values, if they were given as input

  • predicted, predicted values corresponding to the true values in observed

  • model, the name of the model used to generate the corresponding predictions

  • geography (optional), the regions for which predictions are generated. If geography is missing, it is assumed that there are no geographical differences to take into account. Internally, regions are ordered alphabetically

  • date, the date of the corresponding prediction / true value. Numbers indicating time steps also work

References

Using Stacking to Average Bayesian Predictive Distributions, Yuling Yao, Aki Vehtari, Daniel Simpson, and Andrew Gelman, 2018, Bayesian Analysis 13, Number 3, pp. 917–1003

Examples

## Not run: 
library("data.table")
data <- setDT(example_data)
weights <- c(0.2, 0.3, 0.4, 0.1)
mix <- mixture_from_samples(data, weights = weights)

## End(Not run)