Title: | Utilities for Scoring and Assessing Predictions |
---|---|
Description: | scoringutils facilitates the evaluation of forecasts in a convenient framework based on data.table. It allows users to check their forecasts and diagnose issues, to visualise forecasts and missing data, to transform data before scoring, to handle missing forecasts, to aggregate scores, and to visualise the results of the evaluation. The package mostly focuses on the evaluation of probabilistic forecasts and allows evaluating several different forecast types and input formats. Find more information about the package in the Vignettes as well as in the accompanying paper, <doi:10.48550/arXiv.2205.07090>. |
Authors: | Nikos Bosse [aut, cre], Sam Abbott [aut], Hugo Gruson [aut], Johannes Bracher [ctb], Toshiaki Asakura [ctb], James Mba Azam [ctb], Sebastian Funk [aut], Michael Chirico [ctb] |
Maintainer: | Nikos Bosse <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.2.2.9000 |
Built: | 2024-09-30 23:16:21 UTC |
Source: | https://github.com/epiforecasts/scoringutils |
Adds columns with relative skills computed by running pairwise comparisons on the scores. For more information on the computation of relative skill, see get_pairwise_comparisons(). Relative skill will be calculated for the aggregation level specified in by.
add_relative_skill( scores, compare = "model", by = NULL, metric = intersect(c("wis", "crps", "brier_score"), names(scores)), baseline = NULL )
scores |
An object of class |
compare |
Character vector with a single column name that defines the elements for the pairwise comparison. For example, if this is set to "model" (the default), then elements of the "model" column will be compared. |
by |
Character vector with column names that define further grouping
levels for the pairwise comparisons. By default this is |
metric |
A string with the name of the metric for which a relative skill shall be computed. By default this is either "crps", "wis" or "brier_score" if any of these are available. |
baseline |
A string with the name of a model. If a baseline is
given, then a scaled relative skill with respect to the baseline will be
returned. By default ( |
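The exact procedure is documented in get_pairwise_comparisons(). As a rough illustration only, the sketch below assumes that relative skill is a geometric mean of pairwise mean score ratios and that all models forecast the same targets. It uses base R on a made-up scores table and is not the scoringutils implementation; the data and variable names are hypothetical.

```r
# Hypothetical scores: three models, two forecasts each, lower "wis" is better.
scores <- data.frame(
  model = rep(c("a", "b", "c"), each = 2),
  wis = c(1, 3, 2, 6, 4, 12)
)

# Mean score per model (assumes every model forecasts the same targets).
mean_scores <- vapply(split(scores$wis, scores$model), mean, numeric(1))

# Pairwise mean score ratios: entry [i, j] is mean score of i over mean score of j.
ratios <- outer(mean_scores, mean_scores, "/")

# Relative skill as the geometric mean of each model's ratios (assumed definition).
relative_skill <- exp(rowMeans(log(ratios)))

# Scaled relative skill with respect to a baseline model, here "a".
scaled_relative_skill <- relative_skill / relative_skill["a"]
```

Under these assumptions a value below 1 indicates a model that scores better than the average comparison partner, and dividing by the baseline's value puts the baseline at exactly 1.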
Compute the absolute error of the median, calculated as |observed - median prediction|.
The median prediction is the predicted value for which quantile_level == 0.5; the function therefore requires 0.5 to be among the quantile levels in quantile_level.
ae_median_quantile(observed, predicted, quantile_level)
observed |
Numeric vector of size n with the observed values. |
predicted |
Numeric nxN matrix of predictive
quantiles, n (number of rows) being the number of forecasts (corresponding
to the number of observed values) and N
(number of columns) the number of quantiles per forecast.
If |
quantile_level |
Vector of size N with the quantile levels for which predictions were made. |
Numeric vector of length n with the absolute error of the median.
observed <- rnorm(30, mean = 1:30)
predicted_values <- replicate(3, rnorm(30, mean = 1:30))
ae_median_quantile(
  observed, predicted_values, quantile_level = c(0.2, 0.5, 0.8)
)
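To make the calculation concrete, here is a minimal base R sketch that picks the column belonging to quantile level 0.5 and takes the absolute difference to the observed values. The helper name ae_median_from_quantiles is hypothetical and this is not the package's own code.

```r
# Hypothetical helper illustrating the idea; not the scoringutils implementation.
ae_median_from_quantiles <- function(observed, predicted, quantile_level) {
  stopifnot(is.matrix(predicted), ncol(predicted) == length(quantile_level))
  # Locate the column holding the median (tolerant comparison for doubles).
  median_col <- which(abs(quantile_level - 0.5) < 1e-8)
  if (length(median_col) != 1) {
    stop("quantile_level must contain 0.5 exactly once")
  }
  abs(observed - predicted[, median_col])
}

observed <- c(1, 5, 10)
# One row per forecast, one column per quantile level.
predicted <- rbind(
  c(0, 2, 4),
  c(3, 4, 8),
  c(9, 10, 11)
)
ae <- ae_median_from_quantiles(observed, predicted, c(0.2, 0.5, 0.8))
```

The result has one entry per forecast (row), which is why the median must be among the supplied quantile levels.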
Absolute error of the median, calculated as |observed - median forecast|, where the median forecast is the median of the predictive samples.
ae_median_sample(observed, predicted)
observed |
A vector with observed values of size n |
predicted |
nxN matrix of predictive samples, n (number of rows) being
the number of data points and N (number of columns) the number of Monte
Carlo samples. Alternatively, |
vector with the scoring values
observed <- rnorm(30, mean = 1:30)
predicted_values <- matrix(rnorm(30, mean = 1:30))
ae_median_sample(observed, predicted_values)
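For sample-based input the same idea can be sketched in base R by taking the row-wise median of the sample matrix first. The helper ae_median_from_samples is a hypothetical name used for illustration and is not the package's own code.

```r
# Hypothetical helper illustrating the idea; not the scoringutils implementation.
ae_median_from_samples <- function(observed, predicted) {
  # Treat a plain vector as a one-column matrix with one row per observation.
  if (!is.matrix(predicted)) {
    predicted <- matrix(predicted, nrow = length(observed))
  }
  stopifnot(nrow(predicted) == length(observed))
  # Median over the Monte Carlo samples of each forecast (each row).
  median_prediction <- apply(predicted, 1, median)
  abs(observed - median_prediction)
}

observed <- c(2, 10)
predicted <- rbind(
  c(1, 2, 6),
  c(7, 8, 12)
)
ae <- ae_median_from_samples(observed, predicted)
```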
forecast object
There are several as_forecast_<type>() functions to process and validate a data.frame (or similar) with forecasts and observations. If the input passes all input checks, it will be converted to a forecast object. A forecast object is a data.table with a class forecast and an additional class that depends on the forecast type. Every forecast type has its own as_forecast_<type>() function.
The as_forecast_<type>() functions give users some control over how their data is parsed. Using the arguments observed, predicted, etc., users can rename existing columns of their input data to match the required columns for a forecast object. Using the argument forecast_unit, users can specify the columns that uniquely identify a single forecast (and remove the others, see docs for the internal set_forecast_unit() for details).
The following functions are available:
data |
A data.frame (or similar) with predicted and observed values.
See the details section of |
forecast_unit |
(optional) Name of the columns in |
observed |
(optional) Name of the column in |
predicted |
(optional) Name of the column in |
Depending on the forecast type, an object of the following class will be returned:
forecast_binary for binary forecasts
forecast_point for point forecasts
forecast_sample for sample-based forecasts
forecast_quantile for quantile-based forecasts
Various different forecast types / forecast formats are supported. At the moment, those are:
point forecasts
binary forecasts ("soft binary classification")
nominal forecasts ("soft classification with multiple unordered classes")
Probabilistic forecasts in a quantile-based format (a forecast is represented as a set of predictive quantiles)
Probabilistic forecasts in a sample-based format (a forecast is represented as a set of predictive samples)
Forecast types are determined based on the columns present in the input data. Here is an overview of the required format for each forecast type:
All forecast types require a data.frame or similar with columns observed, predicted, and model.
Point forecasts require a column observed of type numeric and a column predicted of type numeric.
Binary forecasts require a column observed of type factor with exactly two levels and a column predicted of type numeric with probabilities, corresponding to the probability that observed is equal to the second factor level. See details here for more information.
Nominal forecasts require a column observed of type factor with N levels (where N is the number of possible outcomes), a column predicted of type numeric with probabilities (which sum to one across all possible outcomes), and a column predicted_label of type factor with N levels, denoting the outcome for which a probability is given. Forecasts must be complete, i.e. there must be a probability assigned to every possible outcome.
Quantile-based forecasts require a column observed of type numeric, a column predicted of type numeric, and a column quantile_level of type numeric with quantile levels (between 0 and 1).
Sample-based forecasts require a column observed of type numeric, a column predicted of type numeric, and a column sample_id of type numeric with sample indices.
For more information see the vignettes and the example data (example_quantile, example_sample_continuous, example_sample_discrete, example_point, example_binary, and example_nominal).
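The requirements for nominal forecasts (probabilities summing to one and a probability for every possible outcome) can be checked by hand. The following base R sketch uses a made-up data.frame in the nominal format described above; the data and variable names are illustrative assumptions, and the checks are not the package's own validation code.

```r
outcomes <- c("low", "medium", "high")

# Hypothetical nominal forecasts: one model, two locations, three outcomes each.
nominal <- data.frame(
  model = "m1",
  location = rep(c("x", "y"), each = 3),
  observed = factor(rep(c("low", "high"), each = 3), levels = outcomes),
  predicted_label = factor(rep(outcomes, times = 2), levels = outcomes),
  predicted = c(0.2, 0.5, 0.3, 0.1, 0.1, 0.8)
)

# Probabilities must sum to one within each forecast (here: each location).
prob_sums <- tapply(nominal$predicted, nominal$location, sum)
sums_to_one <- all(abs(prob_sums - 1) < 1e-8)

# Every possible outcome must receive a probability in each forecast.
labels_per_forecast <- split(
  as.character(nominal$predicted_label), nominal$location
)
is_complete <- all(vapply(
  labels_per_forecast,
  function(x) setequal(x, outcomes),
  logical(1)
))
```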
In order to score forecasts, scoringutils needs to know which of the rows of the data belong together and jointly form a single forecast. This is easy e.g. for point forecasts, where there is one row per forecast. For quantile or sample-based forecasts, however, there are multiple rows that belong to a single forecast.
The forecast unit or unit of a single forecast is then described by the combination of columns that uniquely identify a single forecast. For example, we could have forecasts made by different models in various locations at different time points, each for several weeks into the future. The forecast unit could then be described as forecast_unit = c("model", "location", "forecast_date", "forecast_horizon").
scoringutils automatically tries to determine the unit of a single forecast. It uses all existing columns for this, which means that no columns may be present that are unrelated to the forecast unit. As a very simplistic example, if you had an additional column, "even", that is one if the row number is even and zero otherwise, then this would mess up scoring, as scoringutils would then treat this column as relevant in defining the forecast unit.
In order to avoid issues, we recommend setting the forecast unit explicitly, usually through the forecast_unit argument in the as_forecast_<type>() functions. This will drop unneeded columns, while making sure that all necessary, 'protected columns' like "predicted" or "observed" are retained.
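The effect of an unrelated column on the forecast unit can be reproduced with a small base R sketch. The data and the helper count_forecasts are hypothetical and only count distinct combinations of the chosen columns; they do not call scoringutils.

```r
# Hypothetical sample-based forecasts: 2 models x 2 locations x 4 samples.
forecasts <- expand.grid(
  sample_id = 1:4,
  model = c("a", "b"),
  location = c("x", "y"),
  stringsAsFactors = FALSE
)
forecasts$predicted <- seq_len(nrow(forecasts))
# Unrelated column: 1 if the row number is even, 0 otherwise.
forecasts$even <- as.integer(seq_len(nrow(forecasts)) %% 2 == 0)

# Number of distinct forecasts implied by a given set of unit columns.
count_forecasts <- function(data, forecast_unit) {
  nrow(unique(data[, forecast_unit, drop = FALSE]))
}

n_intended <- count_forecasts(forecasts, c("model", "location"))
n_with_even <- count_forecasts(forecasts, c("model", "location", "even"))
```

With the intended unit there are 4 forecasts of 4 samples each. Including "even" splits every forecast in two, giving 8 apparent forecasts of 2 samples each, which is the kind of problem an explicit forecast_unit avoids.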
Other functions to create forecast objects: as_forecast_binary(), as_forecast_nominal(), as_forecast_point(), as_forecast_quantile(), as_forecast_sample()
as_forecast_binary(example_binary)
as_forecast_quantile(
  example_quantile,
  forecast_unit = c("model", "target_type", "target_end_date", "horizon", "location")
)
forecast object for binary forecasts
Create a forecast object for binary forecasts. See more information on forecast types and expected input formats by calling ?as_forecast().
as_forecast_binary( data, forecast_unit = NULL, observed = NULL, predicted = NULL )
data |
A data.frame (or similar) with predicted and observed values.
See the details section of |
forecast_unit |
(optional) Name of the columns in |
observed |
(optional) Name of the column in |
predicted |
(optional) Name of the column in |
Other functions to create forecast objects: as_forecast, as_forecast_nominal(), as_forecast_point(), as_forecast_quantile(), as_forecast_sample()
as_forecast_<type> functions
Common functionality for as_forecast_<type> functions.
as_forecast_generic( data, forecast_unit = NULL, observed = NULL, predicted = NULL )
data |
A data.frame (or similar) with predicted and observed values.
See the details section of |
forecast_unit |
(optional) Name of the columns in |
observed |
(optional) Name of the column in |
predicted |
(optional) Name of the column in |
This function splits out part of the functionality of as_forecast_<type> that is the same for all as_forecast_<type> functions. It renames the required columns, where appropriate, and sets the forecast unit.
forecast object for nominal forecasts
Nominal forecasts are a form of categorical forecasts where the possible outcomes that the observed values can assume are not ordered. In that sense, nominal forecasts represent a generalisation of binary forecasts.
as_forecast_nominal( data, forecast_unit = NULL, observed = NULL, predicted = NULL, predicted_label = NULL )
data |
A data.frame (or similar) with predicted and observed values.
See the details section of |
forecast_unit |
(optional) Name of the columns in |
observed |
(optional) Name of the column in |
predicted |
(optional) Name of the column in |
predicted_label |
(optional) Name of the column in |
Other functions to create forecast objects: as_forecast, as_forecast_binary(), as_forecast_point(), as_forecast_quantile(), as_forecast_sample()
forecast object for point forecasts
Create a forecast object for point forecasts. See more information on forecast types and expected input formats by calling ?as_forecast().
When converting a forecast_quantile object into a forecast_point object, the 0.5 quantile is extracted and returned as the point forecast.
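The conversion from quantile-based to point forecasts described above amounts to keeping only the rows at quantile level 0.5. A minimal base R sketch on a made-up long-format data.frame (the data and variable names are illustrative, and this is not the package's own method):

```r
# Hypothetical quantile-based forecasts in long format.
quantile_forecast <- data.frame(
  model = "m1",
  location = rep(c("x", "y"), each = 3),
  observed = rep(c(10, 20), each = 3),
  quantile_level = rep(c(0.25, 0.5, 0.75), times = 2),
  predicted = c(8, 11, 13, 15, 18, 24)
)

# Keep the median rows (tolerant comparison for doubles) and drop quantile_level.
is_median <- abs(quantile_forecast$quantile_level - 0.5) < 1e-8
point_forecast <- quantile_forecast[
  is_median, c("model", "location", "observed", "predicted")
]
```

The result has one row per forecast unit, matching the point forecast format.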
as_forecast_point(data, ...)

## Default S3 method:
as_forecast_point(
  data,
  forecast_unit = NULL,
  observed = NULL,
  predicted = NULL,
  ...
)

## S3 method for class 'forecast_quantile'
as_forecast_point(data, ...)
data |
A data.frame (or similar) with predicted and observed values.
See the details section of |
... |
Unused |
forecast_unit |
(optional) Name of the columns in |
observed |
(optional) Name of the column in |
predicted |
(optional) Name of the column in |
Other functions to create forecast objects: as_forecast, as_forecast_binary(), as_forecast_nominal(), as_forecast_quantile(), as_forecast_sample()
forecast object for quantile-based forecasts
Create a forecast object for quantile-based forecasts. See more information on forecast types and expected input formats by calling ?as_forecast().
When creating a forecast_quantile object from a forecast_sample object, the quantiles are estimated by computing empirical quantiles from the samples via quantile(). Note that empirical quantiles are a biased estimator for the true quantiles, in particular in the tails of the distribution and when the number of available samples is low.
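As an illustration of the estimation step, the sketch below applies stats::quantile() with type = 7 to a small made-up set of samples for a single forecast. It shows only the base R call and is not the package's own conversion code.

```r
# Hypothetical predictive samples for one forecast.
samples <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
probs <- c(0.05, 0.25, 0.5, 0.75, 0.95)

# Empirical quantiles; type = 7 is the default interpolation rule of quantile().
estimated <- quantile(samples, probs = probs, type = 7)

# Long format with one row per quantile level, as in a quantile-based forecast.
quantile_rows <- data.frame(
  quantile_level = probs,
  predicted = unname(estimated)
)
```

With only ten samples the 5% and 95% quantiles are interpolated between the two smallest and the two largest samples, which is where the bias mentioned above tends to be largest.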
as_forecast_quantile(data, ...)

## Default S3 method:
as_forecast_quantile(
  data,
  forecast_unit = NULL,
  observed = NULL,
  predicted = NULL,
  quantile_level = NULL,
  ...
)

## S3 method for class 'forecast_sample'
as_forecast_quantile(
  data,
  probs = c(0.05, 0.25, 0.5, 0.75, 0.95),
  type = 7,
  ...
)
data |
A data.frame (or similar) with predicted and observed values.
See the details section of |
... |
Unused |
forecast_unit |
(optional) Name of the columns in |
observed |
(optional) Name of the column in |
predicted |
(optional) Name of the column in |
quantile_level |
(optional) Name of the column in |
probs |
A numeric vector of quantile levels for which
quantiles will be computed. Corresponds to the |
type |
Type argument passed down to the quantile function. For more
information, see |
Other functions to create forecast objects: as_forecast, as_forecast_binary(), as_forecast_nominal(), as_forecast_point(), as_forecast_sample()
forecast object for sample-based forecasts
Create a forecast object for sample-based forecasts.
as_forecast_sample( data, forecast_unit = NULL, observed = NULL, predicted = NULL, sample_id = NULL )
data |
A data.frame (or similar) with predicted and observed values.
See the details section of |
forecast_unit |
(optional) Name of the columns in |
observed |
(optional) Name of the column in |
predicted |
(optional) Name of the column in |
sample_id |
(optional) Name of the column in |
Other functions to create forecast objects: as_forecast, as_forecast_binary(), as_forecast_nominal(), as_forecast_point(), as_forecast_quantile()
Function assesses whether input dimensions match. In the following, n is the number of observations / forecasts. Scalar values may be repeated to match the length of the other input. Allowed options are therefore:
observed is a vector of length 1 or length n
predicted is:
a vector of length 1 or length n
a matrix with n rows and 1 column
assert_dims_ok_point(observed, predicted)
observed |
Input to be checked. Should be a factor of length n with
exactly two levels, holding the observed values.
The highest factor level is assumed to be the reference level. This means
that |
predicted |
Input to be checked. |
Returns NULL invisibly if the assertion was successful and throws an error otherwise.
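The allowed combinations listed above can be expressed as a small predicate. The following base R sketch uses a hypothetical helper, dims_ok_point, that returns TRUE or FALSE instead of throwing an error; it is an illustration of the rules, not the package's own assertion.

```r
# Hypothetical predicate; the real function throws an error instead.
dims_ok_point <- function(observed, predicted) {
  n_obs <- length(observed)
  if (is.matrix(predicted)) {
    # A matrix is only allowed with a single column.
    if (ncol(predicted) != 1) {
      return(FALSE)
    }
    n_pred <- nrow(predicted)
  } else {
    n_pred <- length(predicted)
  }
  # Lengths must match, unless one side is a scalar that can be recycled.
  n_obs == n_pred || n_obs == 1 || n_pred == 1
}

ok_vector <- dims_ok_point(1:3, c(2, 2, 4))
ok_scalar <- dims_ok_point(1:3, 5)
ok_matrix <- dims_ok_point(1:3, matrix(1:3, ncol = 1))
bad_matrix <- dims_ok_point(1:3, matrix(1:6, ncol = 2))
bad_length <- dims_ok_point(1:3, 1:2)
```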
Assert that an object is a forecast object (i.e. a data.table with a class forecast and an additional class forecast_* corresponding to the forecast type).
assert_forecast(forecast, forecast_type = NULL, verbose = TRUE, ...)

## Default S3 method:
assert_forecast(forecast, forecast_type = NULL, verbose = TRUE, ...)

## S3 method for class 'forecast_binary'
assert_forecast(forecast, forecast_type = NULL, verbose = TRUE, ...)

## S3 method for class 'forecast_point'
assert_forecast(forecast, forecast_type = NULL, verbose = TRUE, ...)

## S3 method for class 'forecast_quantile'
assert_forecast(forecast, forecast_type = NULL, verbose = TRUE, ...)

## S3 method for class 'forecast_sample'
assert_forecast(forecast, forecast_type = NULL, verbose = TRUE, ...)
forecast |
A forecast object (a validated data.table with predicted and
observed values, see |
forecast_type |
(optional) The forecast type you expect the forecasts
to have. If the forecast type as determined by |
verbose |
Logical. If |
... |
Currently unused. You cannot pass additional arguments to scoring
functions via |
Returns NULL invisibly.
Various different forecast types / forecast formats are supported. At the moment, those are:
point forecasts
binary forecasts ("soft binary classification")
nominal forecasts ("soft classification with multiple unordered classes")
Probabilistic forecasts in a quantile-based format (a forecast is represented as a set of predictive quantiles)
Probabilistic forecasts in a sample-based format (a forecast is represented as a set of predictive samples)
Forecast types are determined based on the columns present in the input data. Here is an overview of the required format for each forecast type:
All forecast types require a data.frame or similar with columns observed, predicted, and model.
Point forecasts require a column observed of type numeric and a column predicted of type numeric.
Binary forecasts require a column observed of type factor with exactly two levels and a column predicted of type numeric with probabilities, corresponding to the probability that observed is equal to the second factor level. See details here for more information.
Nominal forecasts require a column observed of type factor with N levels (where N is the number of possible outcomes), a column predicted of type numeric with probabilities (which sum to one across all possible outcomes), and a column predicted_label of type factor with N levels, denoting the outcome for which a probability is given. Forecasts must be complete, i.e. there must be a probability assigned to every possible outcome.
Quantile-based forecasts require a column observed of type numeric, a column predicted of type numeric, and a column quantile_level of type numeric with quantile levels (between 0 and 1).
Sample-based forecasts require a column observed of type numeric, a column predicted of type numeric, and a column sample_id of type numeric with sample indices.
For more information see the vignettes and the example data (example_quantile, example_sample_continuous, example_sample_discrete, example_point, example_binary, and example_nominal).
forecast <- as_forecast_binary(example_binary)
assert_forecast(forecast)
The function runs input checks that apply to all input data, regardless of forecast type. The function
asserts that the forecast is a data.table which has columns observed and predicted
checks the forecast type and forecast unit
checks there are no duplicate forecasts
if appropriate, checks the number of samples / quantiles is the same for all forecasts.
assert_forecast_generic(data, verbose = TRUE)
data |
A data.table with forecasts and observed values that should be validated. |
verbose |
Logical. If |
returns the input
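Two of the checks listed above, duplicate forecasts and an equal number of quantiles per forecast, can be sketched in base R on a made-up quantile-format data.frame. The data, the choice of "model" as the only forecast unit column, and the variable names are assumptions for illustration; this is not the package's validation code.

```r
# Hypothetical quantile forecasts; model "b" is missing its median.
forecasts <- data.frame(
  model = c("a", "a", "a", "b", "b"),
  quantile_level = c(0.25, 0.5, 0.75, 0.25, 0.75),
  predicted = c(8, 10, 12, 7, 13),
  observed = 10
)
forecast_unit <- "model"

# A duplicate is a repeated combination of forecast unit and quantile level.
key_columns <- c(forecast_unit, "quantile_level")
has_duplicates <- anyDuplicated(forecasts[, key_columns]) > 0

# Number of quantiles per forecast; all forecasts should have the same count.
n_per_forecast <- table(forecasts[[forecast_unit]])
same_number_of_quantiles <- length(unique(as.vector(n_per_forecast))) == 1
```

Here no rows are duplicated, but the two forecasts have 3 and 2 quantiles respectively, so the second check fails.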
Assert that forecast type is as expected
assert_forecast_type(data, actual = get_forecast_type(data), desired = NULL)
data |
A forecast object (see |
actual |
The actual forecast type of the data |
desired |
The desired forecast type of the data |
Returns NULL invisibly if the assertion was successful and throws an error otherwise.
Function assesses whether the inputs correspond to the requirements for scoring binary forecasts.
assert_input_binary(observed, predicted)
observed |
Input to be checked. Should be a factor of length n with
exactly two levels, holding the observed values.
The highest factor level is assumed to be the reference level. This means
that |
predicted |
Input to be checked. |
Returns NULL invisibly if the assertion was successful and throws an error otherwise.
Function assesses whether the inputs correspond to the requirements for scoring interval-based forecasts.
assert_input_interval(observed, lower, upper, interval_range)
observed |
Input to be checked. Should be a numeric vector with the observed values of size n. |
lower |
Input to be checked. Should be a numeric vector of size n that holds the predicted value for the lower bounds of the prediction intervals. |
upper |
Input to be checked. Should be a numeric vector of size n that holds the predicted value for the upper bounds of the prediction intervals. |
interval_range |
Input to be checked. Should be a vector of size n that denotes the interval range in percent. E.g. a value of 50 denotes a (25%, 75%) prediction interval. |
Returns NULL invisibly if the assertion was successful and throws an error otherwise.
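The relationship between interval_range and the quantile levels of the interval bounds (a range of 50 corresponding to the 25% and 75% quantiles) can be written out as follows. The helper interval_to_quantile_levels is a hypothetical name used for illustration, assuming central prediction intervals.

```r
# Hypothetical helper: quantile levels of the bounds of a central interval.
interval_to_quantile_levels <- function(interval_range) {
  stopifnot(
    is.numeric(interval_range),
    all(interval_range >= 0 & interval_range <= 100)
  )
  # Probability mass left outside the interval, split equally between the tails.
  alpha <- (100 - interval_range) / 100
  data.frame(
    interval_range = interval_range,
    lower = alpha / 2,
    upper = 1 - alpha / 2
  )
}

levels_50_90 <- interval_to_quantile_levels(c(50, 90))
```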
Function assesses whether the inputs correspond to the requirements for scoring nominal forecasts.
assert_input_nominal(observed, predicted, predicted_label)
observed |
Input to be checked. Should be a factor of length n with N levels holding the observed values. n is the number of observations and N is the number of possible outcomes the observed values can assume. |
predicted |
Input to be checked. |
predicted_label |
Factor of length N with N levels, where N is the number of possible outcomes the observed values can assume. |
Returns NULL invisibly if the assertion was successful and throws an error otherwise.
Function assesses whether the inputs correspond to the requirements for scoring point forecasts.
assert_input_point(observed, predicted)
observed |
Input to be checked. Should be a numeric vector with the observed values of size n. |
predicted |
Input to be checked. Should be a numeric vector with the predicted values of size n. |
Returns NULL invisibly if the assertion was successful and throws an error otherwise.
Function assesses whether the inputs correspond to the requirements for scoring quantile-based forecasts.
assert_input_quantile( observed, predicted, quantile_level, unique_quantile_levels = TRUE )
observed |
Input to be checked. Should be a numeric vector with the observed values of size n. |
predicted |
Input to be checked. Should be nxN matrix of predictive
quantiles, n (number of rows) being the number of data points and N
(number of columns) the number of quantiles per forecast.
If |
quantile_level |
Input to be checked. Should be a vector of size N that denotes the quantile levels corresponding to the columns of the prediction matrix. |
unique_quantile_levels |
Whether the quantile levels are required to be
unique ( |
Returns NULL invisibly if the assertion was successful and throws an error otherwise.