Title:  Utilities for Scoring and Assessing Predictions 

Description:  `scoringutils` facilitates the evaluation of forecasts in a convenient framework based on `data.table`. It allows users to check their forecasts and diagnose issues, to visualise forecasts and missing data, to transform data before scoring, to handle missing forecasts, to aggregate scores, and to visualise the results of the evaluation. The package mostly focuses on the evaluation of probabilistic forecasts and allows evaluating several different forecast types and input formats. Find more information about the package in the vignettes as well as in the accompanying paper (<doi:10.48550/arXiv.2205.07090>). 
Authors:  Nikos Bosse [aut, cre] , Sam Abbott [aut] , Hugo Gruson [aut] , Johannes Bracher [ctb] , Toshiaki Asakura [ctb] , James Mba Azam [ctb] , Sebastian Funk [aut] 
Maintainer:  Nikos Bosse <[email protected]> 
License:  MIT + file LICENSE 
Version:  1.2.2.9000 
Built:  20240704 12:41:20 UTC 
Source:  https://github.com/epiforecasts/scoringutils 
Adds a column with relative skills computed by running
pairwise comparisons on the scores.
For more information on the computation of relative skill,
see get_pairwise_comparisons().
Relative skill will be calculated for the aggregation level specified in
by.
add_relative_skill(
  scores,
  by = "model",
  metric = intersect(c("wis", "crps", "brier_score"), names(scores)),
  baseline = NULL
)
scores 
An object of class scores (a data.table with scores as produced by score()). 
by 
Character vector with column names that define the grouping level
for the pairwise comparisons. By default ("model"), one relative skill
score is computed per model.
metric 
A string with the name of the metric for which a relative skill shall be computed. By default this is either "crps", "wis" or "brier_score" if any of these are available. 
baseline 
A string with the name of a model. If a baseline is
given, then a scaled relative skill with respect to the baseline will be
returned. By default (NULL), relative skill will not be scaled with
respect to a baseline model.
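A minimal usage sketch, assuming the example_quantile data shipped with the package and the score() function shown elsewhere in this documentation; the baseline model name is an assumption about the models present in the example data:

```r
library(scoringutils)

# Score the example quantile forecasts, then add a relative skill column
# per model, scaled against an assumed baseline model
scores <- score(as_forecast(example_quantile))
scores <- add_relative_skill(
  scores,
  by = "model",
  metric = "wis",
  baseline = "EuroCOVIDhub-baseline"
)
```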
Compute the absolute error of the median, calculated as
$\textrm{abs}(\textrm{observed} - \textrm{median prediction})$.
The median prediction is the predicted value for which quantile_level == 0.5;
the function therefore requires 0.5 to be among the quantile levels in
quantile_level.
ae_median_quantile(observed, predicted, quantile_level)
observed 
Numeric vector of size n with the observed values. 
predicted 
Numeric n x N matrix of predictive
quantiles, n (number of rows) being the number of forecasts (corresponding
to the number of observed values) and N
(number of columns) the number of quantiles per forecast.
If observed is just a single number, then predicted can just be a vector
of size N.
quantile_level 
Vector of size N with the quantile levels for which predictions were made. 
Numeric vector of length n with the absolute error of the median for each forecast.
observed <- rnorm(30, mean = 1:30)
predicted_values <- replicate(3, rnorm(30, mean = 1:30))
ae_median_quantile(
  observed, predicted_values, quantile_level = c(0.2, 0.5, 0.8)
)
Absolute error of the median, calculated as
$\textrm{abs}(\textrm{observed} - \textrm{median\_prediction})$
ae_median_sample(observed, predicted)
observed 
A vector with observed values of size n 
predicted 
n x N matrix of predictive samples, n (number of rows) being
the number of data points and N (number of columns) the number of Monte
Carlo samples. Alternatively, predicted can just be a vector of size n.
vector with the scoring values
observed <- rnorm(30, mean = 1:30)
predicted_values <- matrix(rnorm(30, mean = 1:30))
ae_median_sample(observed, predicted_values)
Process and validate a data.frame (or similar) with forecasts
and observations. If the input passes all input checks, it will be converted
to a forecast object. A forecast object is a data.table
with a class forecast
and an additional class that depends on the forecast type.
See the details section below for more information
on the expected input formats.
as_forecast()
gives users some control over how their data is parsed.
Using the arguments observed
, predicted
, and model
, users can rename
existing columns of their input data to match the required columns for a
forecast object. Using the argument forecast_unit
, users can specify the
columns that uniquely identify a single forecast (and remove the others,
see set_forecast_unit()
for details).
as_forecast(data, ...)

## Default S3 method:
as_forecast(
  data,
  forecast_unit = NULL,
  forecast_type = NULL,
  observed = NULL,
  predicted = NULL,
  model = NULL,
  quantile_level = NULL,
  sample_id = NULL,
  ...
)
data 
A data.frame (or similar) with predicted and observed values.
See the details section of as_forecast() for additional information
on the required input formats.
... 
Additional arguments 
forecast_unit 
(optional) Name of the columns in data (after any renaming) that denote the unit of a single forecast. If NULL (the default), all columns that are not protected columns are assumed to form the unit of a single forecast. 
forecast_type 
(optional) The forecast type you expect the forecasts
to have. If the forecast type as determined by scoringutils based on the
input does not match this, an error will be thrown.
observed 
(optional) Name of the column in data that contains the observed values. This column will be renamed to "observed". 
predicted 
(optional) Name of the column in data that contains the predicted values. This column will be renamed to "predicted". 
model 
(optional) Name of the column in data that contains the names of the models/forecasters that generated the predicted values. This column will be renamed to "model". 
quantile_level 
(optional) Name of the column in data that contains the quantile level of the predicted values. This column will be renamed to "quantile_level". Only applicable to quantile-based forecasts. 
sample_id 
(optional) Name of the column in data that contains the sample id. This column will be renamed to "sample_id". Only applicable to sample-based forecasts. 
Depending on the forecast type, an object of the following class will be returned:
forecast_binary
for binary forecasts
forecast_point
for point forecasts
forecast_sample
for sample-based forecasts
forecast_quantile
for quantile-based forecasts
Various different forecast types / forecast formats are supported. At the moment, those are:
point forecasts
binary forecasts ("soft binary classification")
Probabilistic forecasts in a quantile-based format (a forecast is represented as a set of predictive quantiles)
Probabilistic forecasts in a sample-based format (a forecast is represented as a set of predictive samples)
Forecast types are determined based on the columns present in the input data. Here is an overview of the required format for each forecast type:
All forecast types require a data.frame or similar with columns observed,
predicted, and model.
Point forecasts require a column observed
of type numeric and a column
predicted
of type numeric.
Binary forecasts require a column observed
of type factor with exactly
two levels and a column predicted
of type numeric with probabilities,
corresponding to the probability that observed
is equal to the second
factor level. See details here for more information.
Quantile-based forecasts require a column observed
of type numeric,
a column predicted
of type numeric, and a column quantile_level
of type
numeric with quantile levels (between 0 and 1).
Sample-based forecasts require a column observed
of type numeric,
a column predicted
of type numeric, and a column sample_id
of type
numeric with sample indices.
For more information see the vignettes and the example data
(example_quantile, example_sample_continuous, example_sample_discrete,
example_point, and example_binary).
In order to score forecasts, scoringutils
needs to know which of the rows
of the data belong together and jointly form a single forecast. This is
easy e.g. for point forecasts, where there is one row per forecast. For
quantile- or sample-based forecasts, however, there are multiple rows that
belong to a single forecast.
The forecast unit or unit of a single forecast is then described by the
combination of columns that uniquely identify a single forecast.
For example, we could have forecasts made by different models in various
locations at different time points, each for several weeks into the future.
The forecast unit could then be described as
forecast_unit = c("model", "location", "forecast_date", "forecast_horizon")
.
scoringutils
automatically tries to determine the unit of a single
forecast. It uses all existing columns for this, which means that no columns
unrelated to the forecast unit should be present. As a very simplistic
example, if you had an additional column, "even", that is one if the row
number is even and zero otherwise, then this would mess up scoring, as
scoringutils would think that this column was relevant in defining the
forecast unit.
In order to avoid issues, we recommend setting the forecast unit explicitly,
usually through the forecast_unit
argument in as_forecast()
. This will
drop unneeded columns, while making sure that all
necessary, 'protected columns' like "predicted" or "observed" are retained.
as_forecast(example_binary)
as_forecast(
  example_quantile,
  forecast_unit = c("model", "target_type", "target_end_date", "horizon", "location")
)
The function assesses whether input dimensions match. In the following, n is the number of observations / forecasts. Scalar values may be repeated to match the length of the other input. Allowed options are therefore:
observed
is a vector of length 1 or length n
predicted
is:
a vector of length 1 or length n
a matrix with n rows and 1 column
assert_dims_ok_point(observed, predicted)
observed 
Input to be checked. Should be a factor of length n with
exactly two levels, holding the observed values.
The highest factor level is assumed to be the reference level. This means
that predicted represents the probability that the observed value is equal
to the highest factor level.
predicted 
Input to be checked. 
Returns NULL invisibly if the assertion was successful and throws an error otherwise.
Assert that an object is a forecast object (i.e. a data.table
with a class
forecast
and an additional class forecast_*
corresponding to the forecast
type).
assert_forecast(forecast, forecast_type = NULL, verbose = TRUE, ...)

## Default S3 method:
assert_forecast(forecast, forecast_type = NULL, verbose = TRUE, ...)

## S3 method for class 'forecast_binary'
assert_forecast(forecast, forecast_type = NULL, verbose = TRUE, ...)

## S3 method for class 'forecast_point'
assert_forecast(forecast, forecast_type = NULL, verbose = TRUE, ...)

## S3 method for class 'forecast_quantile'
assert_forecast(forecast, forecast_type = NULL, verbose = TRUE, ...)

## S3 method for class 'forecast_sample'
assert_forecast(forecast, forecast_type = NULL, verbose = TRUE, ...)
forecast 
A forecast object (a validated data.table with predicted and
observed values, see as_forecast()).
forecast_type 
(optional) The forecast type you expect the forecasts
to have. If the forecast type as determined by scoringutils based on the
input does not match this, an error will be thrown.
verbose 
Logical. If FALSE (default is TRUE), no messages and warnings will be created. 
... 
Additional arguments 
Returns NULL
invisibly.
Various different forecast types / forecast formats are supported. At the moment, those are:
point forecasts
binary forecasts ("soft binary classification")
Probabilistic forecasts in a quantile-based format (a forecast is represented as a set of predictive quantiles)
Probabilistic forecasts in a sample-based format (a forecast is represented as a set of predictive samples)
Forecast types are determined based on the columns present in the input data. Here is an overview of the required format for each forecast type:
All forecast types require a data.frame or similar with columns observed,
predicted, and model.
Point forecasts require a column observed
of type numeric and a column
predicted
of type numeric.
Binary forecasts require a column observed
of type factor with exactly
two levels and a column predicted
of type numeric with probabilities,
corresponding to the probability that observed
is equal to the second
factor level. See details here for more information.
Quantile-based forecasts require a column observed
of type numeric,
a column predicted
of type numeric, and a column quantile_level
of type
numeric with quantile levels (between 0 and 1).
Sample-based forecasts require a column observed
of type numeric,
a column predicted
of type numeric, and a column sample_id
of type
numeric with sample indices.
For more information see the vignettes and the example data
(example_quantile, example_sample_continuous, example_sample_discrete,
example_point, and example_binary).
forecast <- as_forecast(example_binary)
assert_forecast(forecast)
The function runs input checks that apply to all input data, regardless of forecast type. The function
asserts that the forecast is a data.table which has columns observed and
predicted, as well as a column called model,
checks the forecast type and forecast unit,
checks that there are no duplicate forecasts,
and, if appropriate, checks that the number of samples / quantiles is the same for all forecasts.
assert_forecast_generic(data, verbose = TRUE)
data 
A data.table with forecasts and observed values that should be validated. 
verbose 
Logical. If FALSE (default is TRUE), no messages and warnings will be created. 
returns the input
Assert that forecast type is as expected
assert_forecast_type(data, actual = get_forecast_type(data), desired = NULL)
data 
A forecast object as produced by as_forecast(). 
actual 
The actual forecast type of the data 
desired 
The desired forecast type of the data 
Returns NULL invisibly if the assertion was successful and throws an error otherwise.
Function assesses whether the inputs correspond to the requirements for scoring binary forecasts.
assert_input_binary(observed, predicted)
observed 
Input to be checked. Should be a factor of length n with
exactly two levels, holding the observed values.
The highest factor level is assumed to be the reference level. This means
that predicted represents the probability that the observed value is equal
to the highest factor level.
predicted 
Input to be checked. 
Returns NULL invisibly if the assertion was successful and throws an error otherwise.
Function assesses whether the inputs correspond to the requirements for scoring intervalbased forecasts.
assert_input_interval(observed, lower, upper, interval_range)
observed 
Input to be checked. Should be a numeric vector with the observed values of size n. 
lower 
Input to be checked. Should be a numeric vector of size n that holds the predicted value for the lower bounds of the prediction intervals. 
upper 
Input to be checked. Should be a numeric vector of size n that holds the predicted value for the upper bounds of the prediction intervals. 
interval_range 
Input to be checked. Should be a vector of size n that denotes the interval range in percent. E.g. a value of 50 denotes a (25%, 75%) prediction interval. 
Returns NULL invisibly if the assertion was successful and throws an error otherwise.
Function assesses whether the inputs correspond to the requirements for scoring point forecasts.
assert_input_point(observed, predicted)
observed 
Input to be checked. Should be a numeric vector with the observed values of size n. 
predicted 
Input to be checked. Should be a numeric vector with the predicted values of size n. 
Returns NULL invisibly if the assertion was successful and throws an error otherwise.
Function assesses whether the inputs correspond to the requirements for scoring quantilebased forecasts.
assert_input_quantile(
  observed,
  predicted,
  quantile_level,
  unique_quantile_levels = TRUE
)
observed 
Input to be checked. Should be a numeric vector with the observed values of size n. 
predicted 
Input to be checked. Should be an n x N matrix of predictive
quantiles, n (number of rows) being the number of data points and N
(number of columns) the number of quantiles per forecast.
If observed is just a single number, then predicted can just be a vector
of size N.
quantile_level 
Input to be checked. Should be a vector of size N that denotes the quantile levels corresponding to the columns of the prediction matrix. 
unique_quantile_levels 
Whether the quantile levels are required to be
unique (TRUE, the default) or not (FALSE).
Returns NULL invisibly if the assertion was successful and throws an error otherwise.
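A minimal sketch of how the assert_input_*() functions behave, using small hypothetical inputs; on success the assertion returns NULL invisibly, otherwise it throws an error:

```r
library(scoringutils)

# Two forecasts, each with three predictive quantiles
observed <- c(15, 12.4)
quantile_level <- c(0.25, 0.5, 0.75)
predicted <- matrix(c(10, 15, 20, 10, 12, 14), nrow = 2, byrow = TRUE)

# Returns NULL invisibly if the inputs are valid, errors otherwise
assert_input_quantile(observed, predicted, quantile_level)
```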
Function assesses whether the inputs correspond to the requirements for scoring samplebased forecasts.
assert_input_sample(observed, predicted)
observed 
Input to be checked. Should be a numeric vector with the observed values of size n. 
predicted 
Input to be checked. Should be a numeric n x N matrix of
predictive samples, n (number of rows) being the number of data points and
N (number of columns) the number of samples per forecast.
If observed is just a single number, then predicted can just be a vector
of size N.
Returns NULL invisibly if the assertion was successful and throws an error otherwise.
Determines bias from quantile forecasts. For an increasing number of quantiles, this measure converges to the sample-based bias version for integer and continuous forecasts.
bias_quantile(observed, predicted, quantile_level, na.rm = TRUE)
observed 
Numeric vector of size n with the observed values. 
predicted 
Numeric n x N matrix of predictive
quantiles, n (number of rows) being the number of forecasts (corresponding
to the number of observed values) and N
(number of columns) the number of quantiles per forecast.
If observed is just a single number, then predicted can just be a vector
of size N.
quantile_level 
Vector of size N with the quantile levels for which predictions were made. Note that if this does not contain the median (0.5) then the median is imputed as being the mean of the two innermost quantiles. 
na.rm 
Logical. Should missing values be removed? 
For quantile forecasts, bias is measured as
$B_t = (1 - 2 \cdot \max \{i | q_{t,i} \in Q_t \land q_{t,i} \leq x_t\})
\mathbf{1}( x_t \leq q_{t, 0.5}) \\
+ (1 - 2 \cdot \min \{i | q_{t,i} \in Q_t \land q_{t,i} \geq x_t\})
\mathbf{1}( x_t \geq q_{t, 0.5}),$
where $Q_t$
is the set of quantiles that form the predictive
distribution at time $t$
and $x_t$
is the observed value. For
consistency, we define $Q_t$
such that it always includes the element
$q_{t, 0} = -\infty$
and $q_{t,1} = \infty$
.
$\mathbf{1}()$
is the indicator function that is $1$
if the
condition is satisfied and $0$
otherwise.
In clearer terms, bias $B_t$
is:
$1 - 2 \cdot$
the maximum percentile rank for which the corresponding
quantile is still smaller than or equal to the observed value,
if the observed value is smaller than the median of the predictive
distribution.
$1 - 2 \cdot$
the minimum percentile rank for which the corresponding
quantile is still larger than or equal to the observed value, if the
observed value is larger
than the median of the predictive distribution.
$0$
if the observed value is exactly the median (both terms cancel
out)
Bias can assume values between -1 and 1 and is 0 ideally (i.e. unbiased).
Note that if the given quantiles do not contain the median, the median is imputed as the mean of the two innermost quantiles.
For a large enough number of quantiles, the
percentile rank will equal the proportion of predictive samples below the
observed value, and the bias metric coincides with the one for
continuous forecasts (see bias_sample()
).
Numeric vector of length n with the quantile bias for each forecast.
predicted <- matrix(c(1.5:23.5, 3.3:25.3), nrow = 2, byrow = TRUE)
quantile_level <- c(0.01, 0.025, seq(0.05, 0.95, 0.05), 0.975, 0.99)
observed <- c(15, 12.4)
bias_quantile(observed, predicted, quantile_level)
Determines bias from predictive Monte Carlo samples. The function automatically recognises whether forecasts are continuous or integer valued and adapts the bias function accordingly.
bias_sample(observed, predicted)
observed 
A vector with observed values of size n 
predicted 
n x N matrix of predictive samples, n (number of rows) being
the number of data points and N (number of columns) the number of Monte
Carlo samples. Alternatively, predicted can just be a vector of size n.
For continuous forecasts, Bias is measured as
$B_t (P_t, x_t) = 1 - 2 \cdot P_t (x_t)$
where $P_t$
is the empirical cumulative distribution function of the
prediction for the observed value $x_t$
. Computationally, $P_t (x_t)$
is
just calculated as the fraction of predictive samples for $x_t$
that are smaller than $x_t$
.
For integer valued forecasts, Bias is measured as
$B_t (P_t, x_t) = 1 - (P_t (x_t) + P_t (x_t + 1))$
to adjust for the integer nature of the forecasts.
In both cases, bias can assume values between -1 and 1 and is 0 ideally.
Numeric vector of length n with the biases of the predictive samples with respect to the observed values.
The integer valued bias function is discussed in: Funk S, Camacho A, Kucharski AJ, Lowe R, Eggo RM, et al. (2019) Assessing the performance of real-time epidemic forecasts: A case study of Ebola in the Western Area region of Sierra Leone, 2014-15. PLOS Computational Biology 15(2): e1006785. doi:10.1371/journal.pcbi.1006785
## integer valued forecasts
observed <- rpois(30, lambda = 1:30)
predicted <- replicate(200, rpois(n = 30, lambda = 1:30))
bias_sample(observed, predicted)

## continuous forecasts
observed <- rnorm(30, mean = 1:30)
predicted <- replicate(200, rnorm(30, mean = 1:30))
bias_sample(observed, predicted)
The function loops over the column names and checks whether they are present. If an issue is encountered, the function immediately stops and returns a message with the first issue encountered.
check_columns_present(data, columns)
data 
A data.frame or similar to be checked 
columns 
A character vector of column names to check 
Returns TRUE if the check was successful and a string with an error message otherwise.
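A short sketch of the check_*() return convention, using a small hypothetical data.frame: on success the function returns TRUE, on failure it returns a string describing the problem instead of throwing an error.

```r
library(scoringutils)

df <- data.frame(observed = 1:3, predicted = 2:4)

# Both columns are present: returns TRUE
check_columns_present(df, c("observed", "predicted"))

# A column is missing: returns a string with an error message
check_columns_present(df, c("observed", "model"))
```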
The function assesses whether input dimensions match. In the following, n is the number of observations / forecasts. Scalar values may be repeated to match the length of the other input. Allowed options are therefore:
observed
is a vector of length 1 or length n
predicted
is:
a vector of length 1 or length n
a matrix with n rows and 1 column
check_dims_ok_point(observed, predicted)
observed 
Input to be checked. Should be a factor of length n with
exactly two levels, holding the observed values.
The highest factor level is assumed to be the reference level. This means
that predicted represents the probability that the observed value is equal
to the highest factor level.
predicted 
Input to be checked. 
Returns TRUE if the check was successful and a string with an error message otherwise.
Runs get_duplicate_forecasts() and returns a message if an issue is
encountered.
check_duplicates(data)
data 
A data.frame as used for as_forecast(). 
Returns TRUE if the check was successful and a string with an error message otherwise.
Function assesses whether the inputs correspond to the requirements for scoring binary forecasts.
check_input_binary(observed, predicted)
observed 
Input to be checked. Should be a factor of length n with
exactly two levels, holding the observed values.
The highest factor level is assumed to be the reference level. This means
that predicted represents the probability that the observed value is equal
to the highest factor level.
predicted 
Input to be checked. 
Returns TRUE if the check was successful and a string with an error message otherwise.
Function assesses whether the inputs correspond to the requirements for scoring intervalbased forecasts.
check_input_interval(observed, lower, upper, interval_range)
observed 
Input to be checked. Should be a numeric vector with the observed values of size n. 
lower 
Input to be checked. Should be a numeric vector of size n that holds the predicted value for the lower bounds of the prediction intervals. 
upper 
Input to be checked. Should be a numeric vector of size n that holds the predicted value for the upper bounds of the prediction intervals. 
interval_range 
Input to be checked. Should be a vector of size n that denotes the interval range in percent. E.g. a value of 50 denotes a (25%, 75%) prediction interval. 
Returns TRUE if the check was successful and a string with an error message otherwise.
Function assesses whether the inputs correspond to the requirements for scoring point forecasts.
check_input_point(observed, predicted)
observed 
Input to be checked. Should be a numeric vector with the observed values of size n. 
predicted 
Input to be checked. Should be a numeric vector with the predicted values of size n. 
Returns TRUE if the check was successful and a string with an error message otherwise.
Function assesses whether the inputs correspond to the requirements for scoring quantilebased forecasts.
check_input_quantile(observed, predicted, quantile_level)
observed 
Input to be checked. Should be a numeric vector with the observed values of size n. 
predicted 
Input to be checked. Should be nxN matrix of predictive
quantiles, n (number of rows) being the number of data points and N
(number of columns) the number of quantiles per forecast.
If 
quantile_level 
Input to be checked. Should be a vector of size N that denotes the quantile levels corresponding to the columns of the prediction matrix. 
Returns TRUE if the check was successful and a string with an error message otherwise.
Function assesses whether the inputs correspond to the requirements for scoring samplebased forecasts.
check_input_sample(observed, predicted)
observed 
Input to be checked. Should be a numeric vector with the observed values of size n. 
predicted 
Input to be checked. Should be a numeric n x N matrix of
predictive samples, n (number of rows) being the number of data points and
N (number of columns) the number of samples per forecast.
If observed is just a single number, then predicted can just be a vector
of size N.
Returns TRUE if the check was successful and a string with an error message otherwise.
The function checks the number of quantiles or samples per forecast. It returns TRUE if the number of quantiles or samples is the same for all forecasts, and a string with an error message otherwise.
check_number_per_forecast(data, forecast_unit)
data 
A data.frame or similar to be checked 
forecast_unit 
Character vector denoting the unit of a single forecast. 
Returns TRUE if the check was successful and a string with an error message otherwise.
Helper function to check whether an input is a numeric vector.
check_numeric_vector(x, ...)
x 
input to check 
... 
Arguments passed on to checkmate::check_numeric(). 
Returns TRUE if the check was successful and a string with an error message otherwise.
Tries to execute an expression. Internally, this is used to
see whether assertions fail when checking inputs (i.e. to convert an
assert_*()
statement into a check). If the expression fails, the error
message is returned. If the expression succeeds, TRUE
is returned.
check_try(expr)
expr 
an expression to be evaluated 
Returns TRUE if the check was successful and a string with an error message otherwise.
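A minimal sketch of the behaviour described above. Note that check_try() is an internal helper, so it may need to be accessed with scoringutils:::check_try() if it is not exported:

```r
library(scoringutils)

# A successful expression: returns TRUE
check_try(sqrt(4))

# A failing expression (here, an explicit error):
# returns the error message as a string instead of throwing
check_try(stop("input is not valid"))
```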
Wrapper around the crps_sample()
function from the
scoringRules package. Can be used for continuous as well as integer
valued forecasts.
crps_sample(observed, predicted, ...)
observed 
A vector with observed values of size n 
predicted 
n x N matrix of predictive samples, n (number of rows) being
the number of data points and N (number of columns) the number of Monte
Carlo samples. Alternatively, predicted can just be a vector of size n.
... 
Additional arguments passed to crps_sample() from the scoringRules package. 
Vector with scores.
Alexander Jordan, Fabian Krüger, Sebastian Lerch, Evaluating Probabilistic Forecasts with scoringRules, https://www.jstatsoft.org/article/view/v090i12
observed <- rpois(30, lambda = 1:30)
predicted <- replicate(200, rpois(n = 30, lambda = 1:30))
crps_sample(observed, predicted)
This function takes a metric function and additional arguments, and returns a new function that includes the additional arguments when calling the original metric function.
This is the expected way to pass additional
arguments to a metric when evaluating a forecast using score()
:
To evaluate a forecast using a metric with an additional argument, you need
to create a custom version of the scoring function with the argument
included. You then need to create an updated version of the list of scoring
functions that includes your customised metric and pass this to score()
.
customise_metric(metric, ...)

customize_metric(metric, ...)
metric 
The metric function to be customised. 
... 
Additional arguments to be included when calling the metric function. 
A customised metric function.
# Create a customised metric function
custom_metric <- customise_metric(mean, na.rm = TRUE)

# Use the customised metric function
values <- c(1, 2, NA, 4, 5)
custom_metric(values)

# Updating metrics list to calculate 70% coverage in `score()`
interval_coverage_70 <- customise_metric(
  interval_coverage, interval_range = 70
)
updated_metrics <- c(
  metrics_quantile(),
  "interval_coverage_70" = interval_coverage_70
)
score(
  as_forecast(example_quantile),
  metrics = updated_metrics
)
Wrapper around the dss_sample()
function from the
scoringRules package.
dss_sample(observed, predicted, ...)
observed 
A vector with observed values of size n 
predicted 
n x N matrix of predictive samples, n (number of rows) being
the number of data points and N (number of columns) the number of Monte
Carlo samples. Alternatively, predicted can just be a vector of size n.
... 
Additional arguments passed to dss_sample() from the scoringRules package. 
Vector with scores.
Alexander Jordan, Fabian Krüger, Sebastian Lerch, Evaluating Probabilistic Forecasts with scoringRules, https://www.jstatsoft.org/article/view/v090i12
observed <- rpois(30, lambda = 1:30)
predicted <- replicate(200, rpois(n = 30, lambda = 1:30))
dss_sample(observed, predicted)
Check whether the data.table has a column called model.
If not, a column called model
is added with the value Unspecified model.
ensure_model_column(data)
data 
A data.frame (or similar) with predicted and observed values.
See the details section of 
The data.table with a column called "model".
A data set with binary predictions for COVID-19 cases and deaths constructed from data submitted to the European Forecast Hub.
example_binary
A data frame with the following columns:
the country for which a prediction was made
name of the country for which a prediction was made
the date for which a prediction was made
the target to be predicted (cases or deaths)
A factor with observed values
the date on which a prediction was made
name of the model that generated the forecasts
forecast horizon in weeks
predicted value
Predictions in the data set were constructed based on the continuous example data by looking at the number of samples below the mean prediction. The outcome was constructed as whether or not the actually observed value was below or above that mean prediction. This should not be understood as sound statistical practice, but rather as a practical way to create an example data set.
The data was created using the script create-example-data.R in the inst/ folder (or the top level folder in a compiled package).
A data set with predictions for COVID-19 cases and deaths submitted to the European Forecast Hub. This data set is like the quantile example data, except that the median has been replaced by a point forecast.
example_point
A data frame with the following columns:
the country for which a prediction was made
the date for which a prediction was made
the target to be predicted (cases or deaths)
observed values
name of the country for which a prediction was made
the date on which a prediction was made
predicted value
name of the model that generated the forecasts
forecast horizon in weeks
The data was created using the script create-example-data.R in the inst/ folder (or the top level folder in a compiled package).
A data set with predictions for COVID-19 cases and deaths submitted to the European Forecast Hub.
example_quantile
A data frame with the following columns:
the country for which a prediction was made
the date for which a prediction was made
the target to be predicted (cases or deaths)
Numeric: observed values
name of the country for which a prediction was made
the date on which a prediction was made
quantile level of the corresponding prediction
predicted value
name of the model that generated the forecasts
forecast horizon in weeks
The data was created using the script create-example-data.R in the inst/ folder (or the top level folder in a compiled package).
A data set with continuous predictions for COVID-19 cases and deaths constructed from data submitted to the European Forecast Hub.
example_sample_continuous
A data frame with the following columns:
the country for which a prediction was made
the date for which a prediction was made
the target to be predicted (cases or deaths)
observed values
name of the country for which a prediction was made
the date on which a prediction was made
name of the model that generated the forecasts
forecast horizon in weeks
predicted value
id for the corresponding sample
The data was created using the script create-example-data.R in the inst/ folder (or the top level folder in a compiled package).
A data set with integer predictions for COVID-19 cases and deaths constructed from data submitted to the European Forecast Hub.
example_sample_discrete
A data frame with the following columns:
the country for which a prediction was made
the date for which a prediction was made
the target to be predicted (cases or deaths)
observed values
name of the country for which a prediction was made
the date on which a prediction was made
name of the model that generated the forecasts
forecast horizon in weeks
predicted value
id for the corresponding sample
The data was created using the script create-example-data.R in the inst/ folder (or the top level folder in a compiled package).
Calculate the correlation between different metrics for a data.frame of
scores as produced by score()
.
get_correlations(scores, metrics = get_metrics(scores), ...)
scores 
An object of class 
metrics 
A character vector with the metrics to show. If set to

... 
Additional arguments to pass down to 
An object of class scores
(a data.table with an additional
attribute metrics
holding the names of the scores) with correlations
between different metrics.
scores <- score(as_forecast(example_quantile))
get_correlations(scores)
For a validated forecast object in a quantile-based format
(see as_forecast()
for more information), this function computes:
interval coverage of central prediction intervals
quantile coverage for predictive quantiles
the deviation between desired and actual coverage (both for interval and quantile coverage)
Coverage values are computed for a specific level of grouping, as specified
in the by
argument. By default, coverage values are computed per model.
Interval coverage
Interval coverage for a given interval range is defined as the proportion of observations that fall within the corresponding central prediction intervals. Central prediction intervals are symmetric around the median and formed by two quantiles that denote the lower and upper bound. For example, the 50% central prediction interval is the interval between the 0.25 and 0.75 quantiles of the predictive distribution.
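As an illustrative sketch of this definition (the observations and quantile values below are made up, not taken from the package), empirical interval coverage can be computed by hand:

```r
# Hypothetical observed values and the 0.25/0.75 predictive quantiles
# that form the 50% central prediction interval
observed <- c(1, 15, 22)
q25 <- c(0, 14, 20)
q75 <- c(2, 16, 21)

# An observation is covered if it falls inside the central interval
covered <- observed >= q25 & observed <= q75

# Empirical 50% interval coverage: proportion of covered observations
mean(covered)
```

For a well-calibrated forecaster, this proportion should approach 0.5 as the number of observations grows.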
Quantile coverage
Quantile coverage for a given quantile level is defined as the proportion of
observed values that are smaller than the corresponding predictive quantile.
For example, the 0.5 quantile coverage is the proportion of observed values
that are smaller than the 0.5 quantile of the predictive distribution.
Just as above, for a single observation and the quantile of a single
predictive distribution, the value will either be TRUE
or FALSE
.
Coverage deviation
The coverage deviation is the difference between the desired coverage (can be either interval or quantile coverage) and the actual coverage. For example, if the desired coverage is 90% and the actual coverage is 80%, the coverage deviation is 0.1.
get_coverage(forecast, by = "model")
forecast 
A forecast object (a validated data.table with predicted and
observed values, see 
by 
character vector that denotes the level of grouping for which the
coverage values should be computed. By default ( 
A data.table with columns "interval_coverage",
"interval_coverage_deviation", "quantile_coverage" and
"quantile_coverage_deviation", as well as the columns specified in by,
holding the coverage values described above.
library(magrittr) # pipe operator
example_quantile %>%
  as_forecast() %>%
  get_coverage(by = "model")
Helper function to identify duplicate forecasts, i.e. instances where there is more than one forecast for the same prediction target.
get_duplicate_forecasts(data, counts = FALSE)
data 
A data.frame as used for 
counts 
Should the output show the number of duplicates per forecast
unit instead of the individual duplicated rows? Default is 
A data.frame with all rows for which a duplicate forecast was found
example <- rbind(example_quantile, example_quantile[1000:1010])
get_duplicate_forecasts(example)
Given a data set with forecasts, this function counts the number of
available forecasts.
The level of grouping can be specified using the by
argument (e.g. to
count the number of forecasts per model, or the number of forecasts per
model and location).
This is useful to determine whether there are any missing forecasts.
get_forecast_counts( forecast, by = get_forecast_unit(forecast), collapse = c("quantile_level", "sample_id") )
forecast 
A forecast object (a validated data.table with predicted and
observed values, see 
by 
character vector or 
collapse 
character vector (default: 
A data.table with columns as specified in by
and an additional
column "count" with the number of forecasts.
get_forecast_counts(
  as_forecast(example_quantile),
  by = c("model", "target_type")
)
Helper function to infer the forecast type based on a data.frame or similar with forecasts and observed values. See the details section below for information on the different forecast types.
get_forecast_type(data)
data 
A data.frame (or similar) with predicted and observed values.
See the details section of 
Character vector of length one with either "binary", "quantile", "sample" or "point".
Various different forecast types / forecast formats are supported. At the moment, those are:
point forecasts
binary forecasts ("soft binary classification")
Probabilistic forecasts in a quantile-based format (a forecast is represented as a set of predictive quantiles)
Probabilistic forecasts in a sample-based format (a forecast is represented as a set of predictive samples)
Forecast types are determined based on the columns present in the input data. Here is an overview of the required format for each forecast type:
All forecast types require a data.frame or similar with columns observed
predicted
, and model
.
Point forecasts require a column observed
of type numeric and a column
predicted
of type numeric.
Binary forecasts require a column observed
of type factor with exactly
two levels and a column predicted
of type numeric with probabilities,
corresponding to the probability that observed
is equal to the second
factor level.
Quantile-based forecasts require a column observed
of type numeric,
a column predicted
of type numeric, and a column quantile_level
of type
numeric with quantile levels (between 0 and 1).
Sample-based forecasts require a column observed
of type numeric,
a column predicted
of type numeric, and a column sample_id
of type
numeric with sample indices.
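To illustrate, the required columns for each format can be sketched with minimal toy data.frames (all values and the model name are made up):

```r
# Point forecast: one row per forecast
point <- data.frame(
  observed = c(10, 20),
  predicted = c(12, 18),
  model = "toy_model"
)

# Binary forecast: observed is a factor with two levels; predicted is
# the probability that observed equals the second factor level
binary <- data.frame(
  observed = factor(c("under", "over"), levels = c("under", "over")),
  predicted = c(0.3, 0.7),
  model = "toy_model"
)

# Quantile-based forecast: one row per quantile level per forecast
quantile_based <- data.frame(
  observed = 10,
  predicted = c(5, 10, 15),
  quantile_level = c(0.25, 0.5, 0.75),
  model = "toy_model"
)

# Sample-based forecast: one row per predictive sample per forecast
sample_based <- data.frame(
  observed = 10,
  predicted = c(8, 11, 9),
  sample_id = 1:3,
  model = "toy_model"
)
```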
For more information see the vignettes and the example data
(example_quantile, example_sample_continuous, example_sample_discrete,
example_point, and example_binary).
Helper function to get the unit of a single forecast, i.e.
the column names that define where a single forecast was made for.
This just takes all columns that are available in the data and subtracts
the columns that are protected, i.e. those returned by
get_protected_columns()
as well as the names of the metrics that were
specified during scoring, if any.
get_forecast_unit(data)
data 
A data.frame (or similar) with predicted and observed values.
See the details section of 
A character vector with the column names that define the unit of a single forecast
In order to score forecasts, scoringutils
needs to know which of the rows
of the data belong together and jointly form a single forecast. This is
easy e.g. for point forecasts, where there is one row per forecast. For
quantile- or sample-based forecasts, however, there are multiple rows that
belong to a single forecast.
The forecast unit or unit of a single forecast is then described by the
combination of columns that uniquely identify a single forecast.
For example, we could have forecasts made by different models in various
locations at different time points, each for several weeks into the future.
The forecast unit could then be described as
forecast_unit = c("model", "location", "forecast_date", "forecast_horizon")
.
scoringutils
automatically tries to determine the unit of a single
forecast. It uses all existing columns for this, which means that no columns
unrelated to the forecast unit should be present. As a very simplistic
example, if you had an additional column, "even", that is one if the row number
is even and zero otherwise, then this would mess up scoring as scoringutils
would then think that this column was relevant in defining the forecast unit.
In order to avoid issues, we recommend setting the forecast unit explicitly,
usually through the forecast_unit
argument in as_forecast()
. This will
drop unneeded columns, while making sure that all
necessary, 'protected columns' like "predicted" or "observed" are retained.
When applying a scoring rule via score()
, the names of the scoring rules
become column names of the
resulting data.table. In addition, an attribute metrics
will be
added to the output, holding the names of the scores as a vector.
This is done so that functions like get_forecast_unit()
or
summarise_scores()
can still identify which columns are part of the
forecast unit and which hold a score.
get_metrics()
accesses and returns the metrics
attribute. If there is no
attribute, the function will return NULL
(or, if error = TRUE,
will
produce an error instead). In addition, it checks the column names of the
input for consistency with the data stored in the metrics
attribute.
Handling a missing or inconsistent metrics
attribute:
If the metrics attribute is missing or is not consistent with the column names of the data.table, you can either
run score()
again, specifying names for the scoring rules manually, or
add/update the attribute manually using
attr(scores, "metrics") <- c("names", "of", "your", "scores")
(the
order does not matter).
get_metrics(scores, error = FALSE)
scores 
A data.table with an attribute 
error 
Throw an error if there is no attribute called 
Character vector with the names of the scoring rules that were used
for scoring or NULL
if no scores were computed previously.
Compare scores obtained by different models in a pairwise tournament. All combinations of two models are compared against each other based on the overlapping set of available forecasts common to both models.
The input should be a scores
object as produced by score()
. Note that
adding additional unrelated columns can unpredictably change results, as
all present columns are taken into account when determining the set of
overlapping forecasts between two models.
The output of the pairwise comparisons is a set of mean score ratios, relative skill scores and p-values.
The following illustrates the pairwise comparison process:
Mean score ratios
For every pair of two models, a mean score ratio is computed. This is simply the mean score of the first model divided by the mean score of the second. Mean score ratios are computed based on the set of overlapping forecasts between the two models. That means that only scores for those targets are taken into account for which both models have submitted a forecast.
(Scaled) Relative skill scores
The relative score of a model is the geometric mean of all mean score ratios which involve that model. If a baseline is provided, scaled relative skill scores will be calculated as well. Scaled relative skill scores are simply the relative skill score of a model divided by the relative skill score of the baseline model.
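The geometric-mean step can be sketched as follows (the score ratios below are made up; the actual computation in get_pairwise_comparisons() is based on all pairwise tournaments between models):

```r
# Hypothetical mean score ratios of one model against two competitors
ratios <- c(0.8, 1.25)

# Relative skill: the geometric mean of all mean score ratios
# involving that model
relative_skill <- exp(mean(log(ratios)))
relative_skill  # 1 here, since 0.8 and 1.25 exactly cancel out
```

For negatively oriented scores, a relative skill below 1 indicates better performance than the other models on average, and above 1 worse performance.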
p-values
In addition, the function computes p-values for the comparison between two
models (again based on the set of overlapping forecasts). P-values can be
computed in two ways: based on a non-parametric Wilcoxon signed-rank test
(internally using wilcox.test()
with paired = TRUE
) or based on a
permutation test. The permutation test is based on the difference in mean
scores between two models. The default null hypothesis is that the mean score
difference is zero (see permutation_test()
).
Adjusted p-values are computed by calling p.adjust()
on the raw p-values.
The code for the pairwise comparisons is inspired by an implementation by
Johannes Bracher.
The implementation of the permutation test follows the function
permutationTest
from the surveillance
package by Michael Höhle,
Andrea Riebler and Michaela Paul.
get_pairwise_comparisons( scores, by = "model", metric = intersect(c("wis", "crps", "brier_score"), names(scores)), baseline = NULL, ... )
scores 
An object of class 
by 
Character vector with column names that define the grouping level
for the pairwise comparisons. By default ( 
metric 
A string with the name of the metric for which a relative skill shall be computed. By default this is either "crps", "wis" or "brier_score" if any of these are available. 
baseline 
A string with the name of a model. If a baseline is
given, then a scaled relative skill with respect to the baseline will be
returned. By default ( 
... 
Additional arguments for the comparison between two models. See

A data.table with the results of pairwise comparisons
containing the mean score ratios (mean_scores_ratio
),
unadjusted (pval
) and adjusted (adj_pval
) p-values, and relative skill
values of each model (..._relative_skill
). If a baseline model is given
then the scaled relative skill is reported as well
(..._scaled_relative_skill
).
Nikos Bosse [email protected]
Johannes Bracher, [email protected]
scores <- score(as_forecast(example_quantile))
pairwise <- get_pairwise_comparisons(scores, by = "target_type")
pairwise2 <- get_pairwise_comparisons(
  scores,
  by = "target_type",
  baseline = "EuroCOVIDhub-baseline"
)

library(ggplot2)
plot_pairwise_comparisons(pairwise, type = "mean_scores_ratio") +
  facet_wrap(~target_type)
Compute the Probability Integral Transformation (PIT) for validated forecast objects.
get_pit(forecast, by, n_replicates = 100)
forecast 
A forecast object (a validated data.table with predicted and
observed values, see 
by 
Character vector with the columns according to which the
PIT values shall be grouped. If you e.g. have the columns 'model' and
'location' in the input data and want to have a PIT histogram for
every model and location, specify 
n_replicates 
The number of draws for the randomised PIT for discrete predictions. Will be ignored if forecasts are continuous. 
A data.table with PIT values according to the grouping specified in
by
.
Sebastian Funk, Anton Camacho, Adam J. Kucharski, Rachel Lowe, Rosalind M. Eggo, W. John Edmunds (2019) Assessing the performance of real-time epidemic forecasts: A case study of Ebola in the Western Area region of Sierra Leone, 2014-15, doi:10.1371/journal.pcbi.1006785
result <- get_pit(as_forecast(example_sample_continuous), by = "model")
plot_pit(result)

# example with quantile data
result <- get_pit(as_forecast(example_quantile), by = "model")
plot_pit(result)
Internal helper function to get the type of a vector (usually of observed or predicted values). The function checks whether the input is a factor, or else whether it is integer (or can be coerced to integer) or whether it's continuous.
get_type(x)
x 
Input the type should be determined for. 
Character vector of length one with either "classification", "integer", or "continuous".
Check whether the observed value is within a given central prediction interval. The prediction interval is defined by a lower and an upper bound formed by a pair of predictive quantiles. For example, a 50% prediction interval is formed by the 0.25 and 0.75 quantiles of the predictive distribution.
interval_coverage(observed, predicted, quantile_level, interval_range = 50)
observed 
Numeric vector of size n with the observed values. 
predicted 
Numeric n x N matrix of predictive
quantiles, n (number of rows) being the number of forecasts (corresponding
to the number of observed values) and N
(number of columns) the number of quantiles per forecast.
If 
quantile_level 
Vector of size N with the quantile levels for which predictions were made. 
interval_range 
A single number with the range of the prediction interval in percent (e.g. 50 for a 50% prediction interval) for which you want to compute interval coverage. 
A vector of length n with elements that are TRUE if the observed value is within the corresponding prediction interval, and FALSE otherwise.
observed <- c(1, -15, 22)
predicted <- rbind(
  c(-1, 0, 1, 2, 3),
  c(-2, 1, 2, 2, 4),
  c(-2, 0, 3, 3, 4)
)
quantile_level <- c(0.1, 0.25, 0.5, 0.75, 0.9)
interval_coverage(observed, predicted, quantile_level)
Check the agreement between desired and actual interval coverage of a forecast.
The function is similar to interval_coverage()
,
but takes all provided prediction intervals into account and
compares nominal interval coverage (i.e. the desired interval coverage) with
the actual observed interval coverage.
A central symmetric prediction interval is defined by a lower and an upper bound formed by a pair of predictive quantiles. For example, a 50% prediction interval is formed by the 0.25 and 0.75 quantiles of the predictive distribution. Ideally, a forecaster should aim to cover about 50% of all observed values with their 50% prediction intervals, 90% of all observed values with their 90% prediction intervals, and so on.
For every prediction interval, the deviation is computed as the difference
between the observed interval coverage and the nominal interval coverage.
For a single observed value and a single prediction interval, coverage is
always either 0 or 1 (FALSE
or TRUE
). This is not the case for a single
observed value and multiple prediction intervals,
but it still doesn't make that much
sense to compare nominal (desired) coverage and actual coverage for a single
observation. In that sense coverage deviation only really starts to make
sense as a metric when averaged across multiple observations.
Positive values of interval coverage deviation are an indication for underconfidence, i.e. the forecaster could likely have issued a narrower forecast. Negative values are an indication for overconfidence, i.e. the forecasts were too narrow.
$\textrm{interval coverage deviation} =
\mathbf{1}(\textrm{observed value falls within interval}) -
\textrm{nominal interval coverage}$
The interval coverage deviation is then averaged across all prediction intervals. The median is ignored when computing coverage deviation.
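The formula above can be sketched for a single forecast with two made-up central prediction intervals:

```r
# One observed value, a 50% interval [0, 2] and a 90% interval [-1, 3]
observed <- 2.5
in_50 <- observed >= 0 & observed <= 2   # FALSE: outside the 50% interval
in_90 <- observed >= -1 & observed <= 3  # TRUE: inside the 90% interval

# Per-interval deviation: indicator(covered) - nominal interval coverage
deviations <- c(as.numeric(in_50) - 0.5, as.numeric(in_90) - 0.9)

# Averaged across the prediction intervals of this forecast
mean(deviations)  # (-0.5 + 0.1) / 2 = -0.2
```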
interval_coverage_deviation(observed, predicted, quantile_level)
observed 
Numeric vector of size n with the observed values. 
predicted 
Numeric n x N matrix of predictive
quantiles, n (number of rows) being the number of forecasts (corresponding
to the number of observed values) and N
(number of columns) the number of quantiles per forecast.
If 
quantile_level 
Vector of size N with the quantile levels for which predictions were made. 
A numeric vector of length n with the interval coverage deviation for each forecast (with the forecast itself comprising one or multiple prediction intervals).
observed <- c(1, -15, 22)
predicted <- rbind(
  c(-1, 0, 1, 2, 3),
  c(-2, 1, 2, 2, 4),
  c(-2, 0, 3, 3, 4)
)
quantile_level <- c(0.1, 0.25, 0.5, 0.75, 0.9)
interval_coverage_deviation(observed, predicted, quantile_level)
Proper Scoring Rule to score quantile predictions, following Gneiting and Raftery (2007). Smaller values are better.
The score is computed as
$\textrm{score} = (\textrm{upper} - \textrm{lower}) + \frac{2}{\alpha}(\textrm{lower}
- \textrm{observed}) *
\mathbf{1}(\textrm{observed} < \textrm{lower}) +
\frac{2}{\alpha}(\textrm{observed} - \textrm{upper}) *
\mathbf{1}(\textrm{observed} > \textrm{upper})$
where $\mathbf{1}()$
is the indicator function and $\alpha$
is the decimal value that indicates how much is outside
the prediction interval.
To improve usability, the user is asked to provide an interval range in
percentage terms, i.e. interval_range = 90 (percent) for a 90 percent
prediction interval. Correspondingly, the user would have to provide the
5% and 95% quantiles (the corresponding alpha would then be 0.1).
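The conversion between interval range (in percent) and alpha described above amounts to:

```r
# A 90% central prediction interval, with the range given in percent
interval_range <- 90
alpha <- (100 - interval_range) / 100  # 0.1

# Quantile levels of the lower and upper interval bounds
lower_level <- alpha / 2      # 0.05
upper_level <- 1 - alpha / 2  # 0.95
```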
No specific distribution is assumed, but the interval has to be symmetric
around the median (i.e. you can't use the 0.1 quantile
as the lower bound and the 0.7 quantile as the upper bound).
Non-symmetric quantiles can be scored using the function quantile_score()
.
interval_score( observed, lower, upper, interval_range, weigh = TRUE, separate_results = FALSE )
observed 
A vector with observed values of size n 
lower 
Vector of size n with the prediction for the lower quantile of the given interval range. 
upper 
Vector of size n with the prediction for the upper quantile of the given interval range. 
interval_range 
Numeric vector (either a single number or a vector of
size n) with the range of the prediction intervals. For example, if you're
forecasting the 0.05 and 0.95 quantile, the interval range would be 90.
The interval range corresponds to 
weigh 
Logical. If 
separate_results 
Logical. If 
Vector with the scoring values, or a list with separate entries if
separate_results
is TRUE
.
Strictly Proper Scoring Rules, Prediction, and Estimation, Tilmann Gneiting and Adrian E. Raftery, 2007, Journal of the American Statistical Association, Volume 102, Issue 477
Evaluating epidemic forecasts in an interval format, Johannes Bracher, Evan L. Ray, Tilmann Gneiting and Nicholas G. Reich, https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1008618
observed <- rnorm(30, mean = 1:30)
interval_range <- rep(90, 30)
alpha <- (100 - interval_range) / 100
lower <- qnorm(alpha / 2, rnorm(30, mean = 1:30))
upper <- qnorm((1 - alpha / 2), rnorm(30, mean = 11:40))

scoringutils:::interval_score(
  observed = observed,
  lower = lower,
  upper = upper,
  interval_range = interval_range
)

# gives a warning, as the interval_range should likely be 50 instead of 0.5
scoringutils:::interval_score(
  observed = 4, upper = 8, lower = 2, interval_range = 0.5
)

# example with missing values and separate results
scoringutils:::interval_score(
  observed = c(observed, NA),
  lower = c(lower, NA),
  upper = c(NA, upper),
  separate_results = TRUE,
  interval_range = 90
)
Test whether an object is a forecast object (see as_forecast()
for more
information).
You can test for a specific forecast_*
class using the appropriate
is_forecast_*
function.
is_forecast(x) is_forecast_sample(x) is_forecast_binary(x) is_forecast_point(x) is_forecast_quantile(x)
x 
An R object. 
is_forecast
: TRUE
if the object is of class forecast
,
FALSE
otherwise.
is_forecast_*
: TRUE
if the object is of class forecast_*
in addition
to class forecast
, FALSE
otherwise.
forecast_binary <- as_forecast(example_binary)
is_forecast(forecast_binary)
Function that shifts a value by some offset and then applies the natural logarithm to it.
log_shift(x, offset = 0, base = exp(1))
x 
vector of input values to be transformed 
offset 
Number to add to the input value before taking the natural logarithm. 
base 
A positive number: the base with respect to which logarithms are computed. Defaults to e = exp(1). 
The output is computed as log(x + offset)
A numeric vector with transformed values
Transformation of forecasts for evaluating predictive performance in an epidemiological context Nikos I. Bosse, Sam Abbott, Anne Cori, Edwin van Leeuwen, Johannes Bracher, Sebastian Funk medRxiv 2023.01.23.23284722 doi:10.1101/2023.01.23.23284722 https://www.medrxiv.org/content/10.1101/2023.01.23.23284722v1
log_shift(1:10)
log_shift(0:9, offset = 1)

transform_forecasts(
  as_forecast(example_quantile)[observed > 0, ],
  fun = log_shift,
  offset = 1
)
This function is a wrapper around the
logs_sample()
function from the
scoringRules package.
The function should be used to score continuous predictions only. While the Log Score is in theory also applicable to discrete forecasts, the problem lies in the implementation: The Log score needs a kernel density estimation, which is not well defined with integervalued Monte Carlo Samples. The Log score can be used for specific discrete probability distributions. See the scoringRules package for more details.
logs_sample(observed, predicted, ...)
observed 
A vector with observed values of size n 
predicted 
n x N matrix of predictive samples, n (number of rows) being
the number of data points and N (number of columns) the number of Monte
Carlo samples. Alternatively, 
... 
Additional arguments passed to logs_sample() from the scoringRules package. 
Vector with scores.
Alexander Jordan, Fabian Krüger, Sebastian Lerch, Evaluating Probabilistic Forecasts with scoringRules, https://www.jstatsoft.org/article/view/v090i12
observed <- rpois(30, lambda = 1:30)
predicted <- replicate(200, rpois(n = 30, lambda = 1:30))
logs_sample(observed, predicted)
Sharpness is the ability of the model to generate predictions within a narrow range and dispersion is the lack thereof. It is a data-independent measure, and is purely a feature of the forecasts themselves.
Dispersion of predictive samples corresponding to one single observed value is measured as the normalised median of the absolute deviation from the median of the predictive samples. For details, see mad() and the explanations given in Funk et al. (2019).
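For a single forecast, the computation reduces to the normalised MAD of the predictive samples (toy samples below; stats::mad() applies the normalising constant 1.4826 by default):

```r
# Hypothetical predictive samples for one forecast
samples <- c(8, 9, 10, 11, 12)

# Median absolute deviation from the median, normalised by 1.4826
mad(samples)  # 1 * 1.4826 = 1.4826
```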
mad_sample(observed = NULL, predicted, ...)
observed 
Placeholder; this argument will be ignored and exists only for consistency with other scoring functions. The output does not depend on any observed values. 
predicted 
n x N matrix of predictive samples, n (number of rows) being
the number of data points and N (number of columns) the number of Monte
Carlo samples. Alternatively, 
... 
Additional arguments passed to mad(). 
Vector with dispersion values.
Funk S, Camacho A, Kucharski AJ, Lowe R, Eggo RM, Edmunds WJ (2019) Assessing the performance of real-time epidemic forecasts: A case study of Ebola in the Western Area region of Sierra Leone, 2014-15. PLoS Comput Biol 15(2): e1006785. doi:10.1371/journal.pcbi.1006785
predicted <- replicate(200, rpois(n = 30, lambda = 1:30))
mad_sample(predicted = predicted)
Helper function that returns a named list of default scoring rules suitable for binary forecasts.
The default scoring rules are:
"brier_score" = brier_score()
"log_score" = logs_binary()
metrics_binary(select = NULL, exclude = NULL)
select 
A character vector of scoring rules to select from the list. If

exclude 
A character vector of scoring rules to exclude from the list.
If 
A list of scoring rules.
metrics_binary()
metrics_binary(select = "brier_score")
metrics_binary(exclude = "log_score")
Helper function that returns a named list of default scoring rules suitable for point forecasts.
The default scoring rules are:
metrics_point(select = NULL, exclude = NULL)
select 
A character vector of scoring rules to select from the list. If

exclude 
A character vector of scoring rules to exclude from the list.
If 
A list of scoring rules.
metrics_point()
metrics_point(select = "ape")
Helper function that returns a named list of default scoring rules suitable for forecasts in a quantile-based format.
The default scoring rules are:
"wis" = wis
"overprediction" = overprediction()
"underprediction" = underprediction()
"dispersion" = dispersion()
"bias" = bias_quantile()
"interval_coverage_50" = interval_coverage()
"interval_coverage_90" = customise_metric( interval_coverage, interval_range = 90 )
"interval_coverage_deviation" = interval_coverage_deviation()
,
"ae_median" = ae_median_quantile()
Note: The interval_coverage_90 scoring rule is created by modifying interval_coverage(), making use of the function customise_metric(). This construct allows the function to deal with arbitrary arguments in ..., while making sure that only those that interval_coverage() can accept get passed on to it. interval_range = 90 is set in the function definition, as passing an argument interval_range = 90 to score() would mean it would also get passed to interval_coverage_50.
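The mechanics of fixing an argument can be sketched with a plain closure (a hypothetical stand-in for illustration; customise_metric() in the package additionally filters out arguments the metric cannot accept):

```r
# fix_args() returns a new function with some arguments pre-set,
# similar in spirit to customise_metric(). toy_coverage() is a
# made-up metric used only for illustration.
fix_args <- function(fun, ...) {
  fixed <- list(...)
  function(...) do.call(fun, c(list(...), fixed))
}

toy_coverage <- function(x, interval_range = 50) {
  mean(abs(x) <= interval_range / 100)
}

coverage_90 <- fix_args(toy_coverage, interval_range = 90)
coverage_90(c(0.1, 0.95)) # interval_range = 90 is applied automatically
```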
metrics_quantile(select = NULL, exclude = NULL)
select 
A character vector of scoring rules to select from the list. If

exclude 
A character vector of scoring rules to exclude from the list.
If 
A list of scoring rules.
metrics_quantile()
metrics_quantile(select = "wis")
Helper function that returns a named list of default scoring rules suitable for forecasts in a sample-based format.
The default scoring rules are:
"crps" = crps_sample()
"log_score" = logs_sample()
"dss" = dss_sample()
"mad" = mad_sample()
"bias" = bias_sample()
"ae_median" = ae_median_sample()
"se_mean" = se_mean_sample()
metrics_sample(select = NULL, exclude = NULL)
select 
A character vector of scoring rules to select from the list. If

exclude 
A character vector of scoring rules to exclude from the list.
If 
A list of scoring rules.
metrics_sample()
metrics_sample(select = "mad")
Uses a probability integral transformation (PIT) (or a randomised PIT for integer forecasts) to assess the calibration of predictive Monte Carlo samples.
pit_sample(observed, predicted, n_replicates = 100)
observed 
A vector with observed values of size n 
predicted 
nxN matrix of predictive samples, n (number of rows) being
the number of data points and N (number of columns) the number of Monte
Carlo samples. Alternatively, 
n_replicates 
The number of draws for the randomised PIT for discrete predictions. Will be ignored if forecasts are continuous. 
Calibration or reliability of forecasts is the ability of a model to correctly identify its own uncertainty in making predictions. In a model with perfect calibration, the observed data at each time point look as if they came from the predictive probability distribution at that time.
Equivalently, one can inspect the probability integral transform of the predictive distribution at time t,
$u_t = F_t(x_t)$

where $x_t$ is the observed data point at time $t \in \{t_1, \ldots, t_n\}$, n being the number of forecasts, and $F_t$ is the (continuous) predictive cumulative probability distribution at time t. If the true probability distribution of outcomes at time t is $G_t$, then the forecasts $F_t$ are said to be ideal if $F_t = G_t$ at all times t. In that case, the probabilities $u_t$ are distributed uniformly.
In the case of discrete outcomes such as incidence counts, the PIT is no longer uniform even when forecasts are ideal. In that case a randomised PIT can be used instead:
$u_t = P_t(k_t - 1) + v \cdot (P_t(k_t) - P_t(k_t - 1))$

where $k_t$ is the observed count, $P_t(x)$ is the predictive cumulative probability of observing incidence $k$ at time $t$, $P_t(-1) = 0$ by definition, and $v$ is standard uniform and independent of $k$. If $P_t$ is the true cumulative probability distribution, then $u_t$ is standard uniform.
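A minimal base-R sketch of this idea for a single integer observation, using the empirical CDF of the predictive samples in place of $P_t$ (an illustrative assumption; pit_sample() handles the full vectorised case with multiple replicates):

```r
# Randomised PIT for one observed count: draw a value uniformly
# between P(k - 1) and P(k), where P is the predictive CDF.
set.seed(1)
predicted <- rpois(1000, lambda = 5) # predictive samples
k <- 4                               # observed count
P <- ecdf(predicted)                 # empirical predictive CDF

v <- runif(1) # standard uniform, independent of k
u <- P(k - 1) + v * (P(k) - P(k - 1))
```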
The function checks whether integer or continuous forecasts were provided. It then applies the (randomised) probability integral transformation and tests the values $u_t$ for uniformity using the Anderson-Darling test.
As a rule of thumb, there is no evidence to suggest a forecasting model is miscalibrated if the p-value found was greater than a threshold of p >= 0.1, some evidence that it was miscalibrated if 0.01 < p < 0.1, and good evidence that it was miscalibrated if p <= 0.01. However, the Anderson-Darling p-values may be overly strict and their actual usefulness may be questionable. In this context it should be noted, though, that uniformity of the PIT is a necessary but not sufficient condition of calibration.
A vector with PIT values. For continuous forecasts, the vector will correspond to the length of observed. For integer forecasts, a randomised PIT will be returned of length length(observed) * n_replicates.
Sebastian Funk, Anton Camacho, Adam J. Kucharski, Rachel Lowe, Rosalind M. Eggo, W. John Edmunds (2019) Assessing the performance of real-time epidemic forecasts: A case study of Ebola in the Western Area region of Sierra Leone, 2014-15, doi:10.1371/journal.pcbi.1006785
## continuous predictions
observed <- rnorm(20, mean = 1:20)
predicted <- replicate(100, rnorm(n = 20, mean = 1:20))
pit <- pit_sample(observed, predicted)
plot_pit(pit)

## integer predictions
observed <- rpois(20, lambda = 1:20)
predicted <- replicate(100, rpois(n = 20, lambda = 1:20))
pit <- pit_sample(observed, predicted, n_replicates = 30)
plot_pit(pit)
Plots a heatmap of correlations between different metrics.
plot_correlations(correlations, digits = NULL)
correlations 
A data.table of correlations between scores as produced
by 
digits 
A number indicating how many decimal places the correlations
should be rounded to. By default ( 
A ggplot object showing a coloured matrix of correlations between metrics.
scores <- score(as_forecast(example_quantile))
correlations <- get_correlations(summarise_scores(scores))
plot_correlations(correlations, digits = 2)
Visualise Where Forecasts Are Available.
plot_forecast_counts( forecast_counts, x, y = "model", x_as_factor = TRUE, show_counts = TRUE )
forecast_counts 
A data.table (or similar) with a column 
x 
Character vector of length one that denotes the name of the column to appear on the xaxis of the plot. 
y 
Character vector of length one that denotes the name of the column to appear on the yaxis of the plot. Default is "model". 
x_as_factor 
Logical (default is 
show_counts 
Logical (default is 
A ggplot object with a plot of forecast counts
library(ggplot2)
forecast_counts <- get_forecast_counts(
  as_forecast(example_quantile),
  by = c("model", "target_type", "target_end_date")
)
plot_forecast_counts(
  forecast_counts,
  x = "target_end_date", show_counts = FALSE
) +
  facet_wrap("target_type")
This function can be used to create a heatmap of one metric across different groups, e.g. the interval score obtained by several forecasting models in different locations.
plot_heatmap(scores, y = "model", x, metric)
scores 
A data.frame of scores based on quantile forecasts as
produced by 
y 
The variable from the scores you want to show on the y-axis. The default for this is "model". 
x 
The variable from the scores you want to show on the x-axis. This could be something like "horizon" or "location". 
metric 
String, the metric that determines the value and colour shown in the tiles of the heatmap. 
A ggplot object showing a heatmap of the desired metric
scores <- score(as_forecast(example_quantile))
scores <- summarise_scores(scores, by = c("model", "target_type"))
scores <- summarise_scores(
  scores,
  by = c("model", "target_type"),
  fun = signif, digits = 2
)
plot_heatmap(scores, x = "target_type", metric = "bias")
Plot interval coverage values (see get_coverage() for more information).
plot_interval_coverage(coverage, colour = "model")
coverage 
A data frame of coverage values as produced by

colour 
According to which variable shall the graphs be coloured? Default is "model". 
ggplot object with a plot of interval coverage
coverage <- get_coverage(as_forecast(example_quantile), by = "model")
plot_interval_coverage(coverage)
Creates a heatmap of the ratios or p-values from a pairwise comparison between models.
plot_pairwise_comparisons( comparison_result, type = c("mean_scores_ratio", "pval") )
comparison_result 
A data.frame as produced by

type 
Character vector of length one that is either "mean_scores_ratio" or "pval". This denotes whether to visualise the ratio or the p-value of the pairwise comparison. Default is "mean_scores_ratio". 
A ggplot object with a heatmap of mean score ratios from pairwise comparisons.
library(ggplot2)
scores <- score(as_forecast(example_quantile))
pairwise <- get_pairwise_comparisons(scores, by = "target_type")
plot_pairwise_comparisons(pairwise, type = "mean_scores_ratio") +
  facet_wrap(~target_type)
Make a simple histogram of the probability integral transformed values to visually check whether a uniform distribution seems likely.
plot_pit(pit, num_bins = "auto", breaks = NULL)
pit 
Either a vector with the PIT values, or a data.table as
produced by 
num_bins 
The number of bins in the PIT histogram, default is "auto".
When 
breaks 
Numeric vector with the break points for the bins in the
PIT histogram. This is preferred when creating a PIT histogram based on
quantilebased data. Default is 
A ggplot object with a histogram of PIT values
# PIT histogram in vector based format
observed <- rnorm(30, mean = 1:30)
predicted <- replicate(200, rnorm(n = 30, mean = 1:30))
pit <- pit_sample(observed, predicted)
plot_pit(pit)

# quantile-based pit
pit <- get_pit(as_forecast(example_quantile), by = "model")
plot_pit(pit, breaks = seq(0.1, 1, 0.1))

# sample-based pit
pit <- get_pit(as_forecast(example_sample_discrete), by = "model")
plot_pit(pit)
Plot quantile coverage values (see get_coverage() for more information).
plot_quantile_coverage(coverage, colour = "model")
coverage 
A data frame of coverage values as produced by

colour 
String, according to which variable shall the graphs be coloured? Default is "model". 
A ggplot object with a plot of quantile coverage.
coverage <- get_coverage(as_forecast(example_quantile), by = "model")
plot_quantile_coverage(coverage)
Visualise the components of the weighted interval score: penalties for overprediction, underprediction and for high dispersion (lack of sharpness).
plot_wis(scores, x = "model", relative_contributions = FALSE, flip = FALSE)
scores 
A data.table of scores based on quantile forecasts as
produced by 
x 
The variable from the scores you want to show on the x-axis. Usually this will be "model". 
relative_contributions 
Logical. Show relative contributions instead
of absolute contributions? Default is 
flip 
Boolean (default is 
A ggplot object showing the contributions of the three components of the weighted interval score.
Bracher J, Ray E, Gneiting T, Reich, N (2020) Evaluating epidemic forecasts in an interval format. https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1008618
library(ggplot2)
scores <- score(as_forecast(example_quantile))
scores <- summarise_scores(scores, by = c("model", "target_type"))
plot_wis(scores,
  x = "model",
  relative_contributions = TRUE
) +
  facet_wrap(~target_type)
plot_wis(scores,
  x = "model",
  relative_contributions = FALSE
) +
  facet_wrap(~target_type, scales = "free_x")
This function prints information about a forecast object, including "Forecast type", "Score columns", "Forecast unit".
## S3 method for class 'forecast' print(x, ...)
x 
A forecast object (a validated data.table with predicted and
observed values, see 
... 
Additional arguments for 
Returns x invisibly.
dat <- as_forecast(example_quantile)
print(dat)
Proper Scoring Rule to score quantile predictions. Smaller values are better.
The quantile score is closely related to the interval score (see wis()) and is the quantile equivalent that works with single quantiles instead of central prediction intervals.
The quantile score, also called pinball loss, for a single quantile level $\tau$ is defined as

$\text{QS}_\tau(F, y) = 2 \cdot \{ \mathbf{1}(y \leq q_\tau) - \tau\} \cdot (q_\tau - y) =
\begin{cases}
2 \cdot (1 - \tau) \cdot |q_\tau - y|, & \text{if } y \leq q_\tau\\
2 \cdot \tau \cdot |q_\tau - y|, & \text{if } y > q_\tau,
\end{cases}$

with $q_\tau$ being the $\tau$-quantile of the predictive distribution $F$, and $\mathbf{1}(\cdot)$ the indicator function.
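A worked base-R sketch of the definition above for a single observation and quantile (illustration only; quantile_score() additionally handles matrices of quantiles and averaging across levels):

```r
# Pinball loss for one observation y and one predicted quantile q at
# level tau, following the case distinction above.
pinball <- function(y, q, tau) {
  2 * (as.numeric(y <= q) - tau) * (q - y)
}

pinball(10, q = 8, tau = 0.25)  # y > q:  2 * 0.25 * |8 - 10|  = 1
pinball(10, q = 12, tau = 0.25) # y <= q: 2 * 0.75 * |12 - 10| = 3
```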
The weighted interval score for a single prediction interval can be obtained as the average of the quantile scores for the lower and upper quantile of that prediction interval:
$\text{WIS}_\alpha(F, y) = \frac{\text{QS}_{\alpha/2}(F, y) + \text{QS}_{1 - \alpha/2}(F, y)}{2}.$
See the SI of Bracher et al. (2021) for more details.
quantile_score() returns the average quantile score across the quantile levels provided. For a set of quantile levels that form pairwise central prediction intervals, the quantile score is equivalent to the interval score.
quantile_score(observed, predicted, quantile_level, weigh = TRUE)
observed 
Numeric vector of size n with the observed values. 
predicted 
Numeric nxN matrix of predictive
quantiles, n (number of rows) being the number of forecasts (corresponding
to the number of observed values) and N
(number of columns) the number of quantiles per forecast.
If 
quantile_level 
Vector of size N with the quantile levels for which predictions were made. 
weigh 
Logical. If 
Numeric vector of length n with the quantile score. The scores are averaged across quantile levels if multiple quantile levels are provided (the result of calling rowMeans() on the matrix of quantile scores that is computed based on the observed and predicted values).
Strictly Proper Scoring Rules, Prediction, and Estimation, Tilmann Gneiting and Adrian E. Raftery, 2007, Journal of the American Statistical Association, Volume 102, Issue 477
Evaluating epidemic forecasts in an interval format, Johannes Bracher, Evan L. Ray, Tilmann Gneiting and Nicholas G. Reich, 2021, https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1008618
observed <- rnorm(10, mean = 1:10)
alpha <- 0.5

lower <- qnorm(alpha / 2, observed)
upper <- qnorm((1 - alpha / 2), observed)

qs_lower <- quantile_score(observed,
  predicted = matrix(lower),
  quantile_level = alpha / 2
)
qs_upper <- quantile_score(observed,
  predicted = matrix(upper),
  quantile_level = 1 - alpha / 2
)
interval_score <- (qs_lower + qs_upper) / 2
interval_score2 <- quantile_score(
  observed,
  predicted = cbind(lower, upper),
  quantile_level = c(alpha / 2, 1 - alpha / 2)
)

# this is the same as the following
wis(
  observed,
  predicted = cbind(lower, upper),
  quantile_level = c(alpha / 2, 1 - alpha / 2)
)
This is a wrapper function designed to run a function safely when it is not completely clear what arguments could be passed to the function.
All named arguments in ... that are not accepted by fun are removed. All unnamed arguments are passed on to the function. In case fun errors, the error will be converted to a warning and run_safely returns NULL.

run_safely can be useful when constructing functions to be used as metrics in score().
run_safely(..., fun, metric_name)
... 
Arguments to pass to 
fun 
A function to execute. 
metric_name 
A character string with the name of the metric. Used to
provide a more informative warning message in case 
The result of fun, or NULL if fun errors.
f <- function(x) {
  x
}
scoringutils:::run_safely(2, fun = f, metric_name = "f")
scoringutils:::run_safely(2, y = 3, fun = f, metric_name = "f")
scoringutils:::run_safely(fun = f, metric_name = "f")
scoringutils:::run_safely(y = 3, fun = f, metric_name = "f")
Transform data from a format that is based on predictive samples to a format based on plain quantiles.
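The core of the conversion can be sketched in base R for a single forecast unit (sample_to_quantile() applies this per forecast; type = 7 is R's default quantile algorithm):

```r
# Compute predictive quantiles from Monte Carlo samples for one
# forecast unit.
set.seed(7)
predicted <- rpois(1000, lambda = 10) # predictive samples
quantile_level <- c(0.05, 0.25, 0.5, 0.75, 0.95)

q <- quantile(predicted, probs = quantile_level, type = 7)
```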
sample_to_quantile( forecast, quantile_level = c(0.05, 0.25, 0.5, 0.75, 0.95), type = 7 )
forecast 
A 
quantile_level 
A numeric vector of quantile levels for which quantiles will be computed. 
type 
Type argument passed down to the quantile function. For more
information, see 
A data.table in a long quantile format.
sample_to_quantile(as_forecast(example_sample_discrete))
score() applies a selection of scoring metrics to a forecast object (a data.table with forecasts and observations) as produced by as_forecast(). score() is a generic that dispatches to different methods depending on the class of the input data.
See the details section for more information on forecast types and input formats. For additional help and examples, check out the Getting Started Vignette as well as the paper Evaluating Forecasts with scoringutils in R.
score(forecast, metrics, ...)

## S3 method for class 'forecast_binary'
score(forecast, metrics = metrics_binary(), ...)

## S3 method for class 'forecast_point'
score(forecast, metrics = metrics_point(), ...)

## S3 method for class 'forecast_sample'
score(forecast, metrics = metrics_sample(), ...)

## S3 method for class 'forecast_quantile'
score(forecast, metrics = metrics_quantile(), ...)
forecast 
A forecast object (a validated data.table with predicted and
observed values, see 
metrics 
A named list of scoring functions. Names will be used as
column names in the output. See 
... 
Additional arguments. Currently unused but allows for future
extensions. If you want to pass arguments to individual metrics, use

An object of class scores. This object is a data.table with unsummarised scores (one score per forecast) and has an additional attribute metrics with the names of the metrics used for scoring. See summarise_scores() for information on how to summarise scores.
Various different forecast types / forecast formats are supported. At the moment, those are:
Point forecasts
Binary forecasts ("soft binary classification")
Probabilistic forecasts in a quantile-based format (a forecast is represented as a set of predictive quantiles)
Probabilistic forecasts in a sample-based format (a forecast is represented as a set of predictive samples)
Forecast types are determined based on the columns present in the input data. Here is an overview of the required format for each forecast type: