Title: | Flexible Hierarchical Nowcasting |
---|---|
Description: | Tools to enable flexible and efficient hierarchical nowcasting of right-truncated epidemiological time-series using a semi-mechanistic Bayesian model with support for a range of reporting and generative processes. Nowcasting, in this context, is gaining situational awareness using currently available observations and the reporting patterns of historical observations. This can be useful when tracking the spread of infectious disease in real-time: without nowcasting, changes in trends can be obfuscated by partial reporting or their detection may be delayed due to the use of simpler methods like truncation. While the package has been designed with epidemiological applications in mind, it could be applied to any set of right-truncated time-series count data. |
Authors: | Sam Abbott [aut, cre] , Adrian Lison [aut] , Sebastian Funk [aut], Carl Pearson [aut] , Hugo Gruson [aut] , Felix Guenther [aut] , Michael DeWitt [aut] , Hannah Choi [ctb], Pratik Gupte [ctb] , Joel Hellewell [ctb] , Luis Rivas [ctb], Sang Woo Park [ctb] , Nathan McIntosh [ctb], James Mba Azam [ctb] , Kath Sherratt [ctb] , Nikos Bosse [ctb] |
Maintainer: | Sam Abbott <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.3.0 |
Built: | 2024-12-14 05:54:02 UTC |
Source: | https://github.com/epiforecasts/epinowcast |
This function calculates and adds the maximum observed delay for each group and reference date in the provided dataset. It first checks the validity of the observation indicator and then computes the maximum delay. If an observation indicator is provided, it further adjusts the maximum observed delay for unobserved data to be negative 1 (indicating no maximum observed).
add_max_observed_delay(new_confirm, observation_indicator = NULL)
new_confirm |
A data.table containing the columns: "reference_date",
"delay", ".group", "new_confirm", and "max_obs_delay".
As produced by |
observation_indicator |
A character string specifying the column name
in |
A data.table with the original columns of new_confirm and an additional "max_obs_delay" column representing the maximum observed delay for each group and reference date. If an observation indicator is provided, unobserved data will have a "max_obs_delay" value of -1.
Helper functions for model modules
add_pmfs(), convolution_matrix(), enw_reference_by_report(), enw_reps_with_complete_refs(), extract_obs_metadata(), extract_sparse_matrix(), latest_obs_as_matrix(), simulate_double_censored_pmf()
This function allows the addition of probability mass functions (PMFs) to produce a new PMF. This is useful, for example, in the context of reporting delays, where the PMF of the sum of two independent Poisson-distributed variables is the convolution of their PMFs.
add_pmfs(pmfs)
pmfs |
A list of vectors describing the probability mass functions to |
A vector describing the probability mass function of the sum of the
Helper functions for model modules
add_max_observed_delay(), convolution_matrix(), enw_reference_by_report(), enw_reps_with_complete_refs(), extract_obs_metadata(), extract_sparse_matrix(), latest_obs_as_matrix(), simulate_double_censored_pmf()
# Sample and analytical PMFs for two Poisson distributions
x <- rpois(10000, 5)
xpmf <- dpois(0:20, 5)
y <- rpois(10000, 7)
ypmf <- dpois(0:20, 7)
# Add sampled Poisson distributions up to get combined distribution
z <- x + y
# Analytical convolution of PMFs
conv_pmf <- add_pmfs(list(xpmf, ypmf))
conv_cdf <- cumsum(conv_pmf)
# Empirical convolution of PMFs
cdf <- ecdf(z)(0:42)
# Compare sampled and analytical CDFs
plot(conv_cdf)
lines(cdf, col = "black")
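For intuition, the convolution that underlies adding PMFs can be written out by hand in base R. This is a minimal sketch, not the package's implementation: convolve_pmfs() is a hypothetical helper introduced only for this illustration, and it assumes both PMFs are indexed from zero.

```r
# Hypothetical helper (not part of epinowcast): discrete convolution of
# two PMFs that both start at zero.
convolve_pmfs <- function(p, q) {
  out <- numeric(length(p) + length(q) - 1)
  for (i in seq_along(p)) {
    # Shift q by (i - 1) steps and weight it by p[i]
    idx <- i + seq_along(q) - 1
    out[idx] <- out[idx] + p[i] * q
  }
  out
}

xpmf <- dpois(0:20, 5)
ypmf <- dpois(0:20, 7)
manual <- convolve_pmfs(xpmf, ypmf)

# The sum of independent Poisson(5) and Poisson(7) variables is Poisson(12)
max(abs(manual[1:21] - dpois(0:20, 12)))
```

Because both inputs are truncated at 20, only the first 21 entries of the result can be compared exactly with the Poisson(12) PMF; later entries miss some of the tail mass.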
This function takes a data.table and applies a rolling sum over a given timestep, aggregating by specified columns. It's particularly useful for aggregating observations over certain periods.
aggregate_rolling_sum(dt, internal_timestep, by = NULL)
dt |
A |
internal_timestep |
An integer indicating the period over which to aggregate. |
by |
A character vector specifying the columns to aggregate by. |
A modified data.table with aggregated observations.
Utility functions
coerce_date(), coerce_dt(), date_to_numeric_modulus(), get_internal_timestep(), is.Date(), stan_fns_as_string()
Converts formulas to strings
as_string_formula(formula)
formula |
A model formula that may use standard fixed
effects, random effects using lme4 syntax (see |
A character string of the supplied formula
Functions used to help convert formulas into model designs
construct_re(), construct_rw(), enw_formula(), enw_manual_formula(), parse_formula(), remove_rw_terms(), re(), rw_terms(), rw(), split_formula_to_terms()
epinowcast:::as_string_formula(~ 1 + age_group)
This function verifies if the difference in calendar dates in the provided observations corresponds to the provided timestep of "month".
check_calendar_timestep(dates, date_var, exact = TRUE)
dates |
Vector of Date class representing dates. |
date_var |
The variable in |
exact |
Logical, if |
This function is used for its side effect of stopping if the check fails. If the check passes, the function returns invisibly.
Functions used for checking inputs
check_group_date_unique(), check_group(), check_max_delay(), check_modules_compatible(), check_module(), check_numeric_timestep(), check_observation_indicator(), check_quantiles(), check_timestep_by_date(), check_timestep_by_group(), check_timestep()
Check observations for reserved grouping variables
check_group(obs)
obs |
An object that will be |
The obs object, which will be modifiable in place.
Functions used for checking inputs
check_calendar_timestep(), check_group_date_unique(), check_max_delay(), check_modules_compatible(), check_module(), check_numeric_timestep(), check_observation_indicator(), check_quantiles(), check_timestep_by_date(), check_timestep_by_group(), check_timestep()
reference_date
and report_date
This function checks that the input data is stratified by reference_date, report_date, and .group. It does this by counting the number of observations for each combination of these variables, and throwing a warning if any combination has more than one observation.
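The counting step can be illustrated in base R. This is a sketch of the idea only, using a hypothetical toy data set, and is not the package's internal code.

```r
# Hypothetical observations in which one combination appears twice
obs <- data.frame(
  reference_date = as.Date(c("2021-01-01", "2021-01-01", "2021-01-01")),
  report_date = as.Date(c("2021-01-01", "2021-01-02", "2021-01-02")),
  .group = c(1, 1, 1)
)
keys <- c("reference_date", "report_date", ".group")

# Build one label per row from the key columns and count occurrences
combo <- do.call(paste, c(obs[keys], sep = "|"))
counts <- table(combo)

# TRUE when any combination has more than one observation
has_duplicates <- any(counts > 1)
has_duplicates
```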
check_group_date_unique(obs)
obs |
An object that will be |
Functions used for checking inputs
check_calendar_timestep(), check_group(), check_max_delay(), check_modules_compatible(), check_module(), check_numeric_timestep(), check_observation_indicator(), check_quantiles(), check_timestep_by_date(), check_timestep_by_group(), check_timestep()
Check if maximum delay specified by the user is long enough and raise potential warnings. This is achieved by computing the share of reference dates where the cumulative case count is below some aspired coverage.
check_max_delay(
  data,
  max_delay = data$max_delay,
  cum_coverage = 0.8,
  maxdelay_quantile_outlier = 0.97,
  warn = TRUE,
  warn_internal = FALSE
)
data |
Output from |
max_delay |
The maximum number of days to model in the delay
distribution. Must be an integer greater than or equal to 1. Observations
with delays larger than the maximum delay will be dropped. If the specified
maximum delay is too short, nowcasts can be biased as important parts of the
true delay distribution are cut off. At the same time, computational cost
scales non-linearly with this setting, so you want the maximum delay to be as
long as necessary, but not much longer. Consider what delays are realistic
for your application, and when in doubt, check if increasing the maximum
delay noticeably changes the delay distribution or nowcasts as estimated by
epinowcast. If it does, your maximum delay may still be too short.
Note that delays are zero indexed and so include the reference date and
|
cum_coverage |
The aspired percentage of cases that the maximum delay should cover. Defaults to 0.8 (80%). |
maxdelay_quantile_outlier |
Only reference dates sufficiently far in the past, determined based on the maximum observed delay, are included (see details). Instead of the overall maximum observed delay, a quantile of the maximum observed delay over all reference dates is used. This is more robust against outliers. Defaults to 0.97 (97%). |
warn |
Should a warning be issued if the cumulative case count is
below |
warn_internal |
Should only be |
The coverage is with respect to the maximum observed case count for the corresponding reference date. As the maximum observed case count is likely smaller than the true overall case count for not yet fully observed reference dates (due to right truncation), only reference dates that are more than the maximum observed delay ago are included. Still, because we can only use the maximum observed delay, not the unknown true maximum delay, the computed coverage values should be interpreted with care, as they are only proxies for the true coverage.
A data.table with the share of reference dates where the cumulative case count is below cum_coverage, stratified by group.
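To make the returned share concrete, the following base R sketch computes it for a small, hypothetical matrix of cumulative counts. It is a simplification under stated assumptions: a single group, every reference date treated as fully observed, and no outlier handling, so it does not reproduce the filtering described above.

```r
# Hypothetical cumulative counts: rows are reference dates,
# columns are delays 0 to 4
cum_counts <- rbind(
  c(10, 30, 45, 50, 50),
  c(5, 10, 20, 40, 50),
  c(20, 40, 48, 50, 50)
)

# Assumption: delays are zero indexed, so max_delay = 3 covers delays 0, 1, 2
max_delay <- 3
cum_coverage <- 0.8

# Coverage relative to the maximum observed count per reference date
coverage <- cum_counts[, max_delay] / cum_counts[, ncol(cum_counts)]

# Share of reference dates whose coverage is below the target
share_below <- mean(coverage < cum_coverage)
share_below
```

Here only the second reference date falls below 80% coverage, so the share is one third.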
Functions used for checking inputs
check_calendar_timestep(), check_group_date_unique(), check_group(), check_modules_compatible(), check_module(), check_numeric_timestep(), check_observation_indicator(), check_quantiles(), check_timestep_by_date(), check_timestep_by_group(), check_timestep()
pobs <- enw_example(type = "preprocessed_observations")
check_max_delay(pobs, max_delay = 20, cum_coverage = 0.8)
Check a model module contains the required components
check_module(module)
module |
A model module. For example |
Functions used for checking inputs
check_calendar_timestep(), check_group_date_unique(), check_group(), check_max_delay(), check_modules_compatible(), check_numeric_timestep(), check_observation_indicator(), check_quantiles(), check_timestep_by_date(), check_timestep_by_group(), check_timestep()
Check that model modules have compatible specifications
check_modules_compatible(modules)
modules |
A list of model modules. |
Functions used for checking inputs
check_calendar_timestep(), check_group_date_unique(), check_group(), check_max_delay(), check_module(), check_numeric_timestep(), check_observation_indicator(), check_quantiles(), check_timestep_by_date(), check_timestep_by_group(), check_timestep()
This function verifies if the difference in numeric dates in the provided observations corresponds to the provided timestep.
check_numeric_timestep(dates, date_var, timestep, exact = TRUE)
dates |
Vector of Date class representing dates. |
date_var |
The variable in |
timestep |
Numeric timestep for date difference. |
exact |
Logical, if |
This function is used for its side effect of stopping if the check fails. If the check passes, the function returns invisibly.
Functions used for checking inputs
check_calendar_timestep(), check_group_date_unique(), check_group(), check_max_delay(), check_modules_compatible(), check_module(), check_observation_indicator(), check_quantiles(), check_timestep_by_date(), check_timestep_by_group(), check_timestep()
This function verifies if the observation_indicator within the provided new_confirm observations is logical. The check is performed to ensure that the observation_indicator is of the correct type.
check_observation_indicator(new_confirm, observation_indicator = NULL)
new_confirm |
A data frame containing the observations to be checked. |
observation_indicator |
A character string specifying the column name
in |
This function is used for its side effect of checking the observation indicator in new_confirm. If the check passes, the function returns invisibly. Otherwise, it stops and returns an error message.
Functions used for checking inputs
check_calendar_timestep(), check_group_date_unique(), check_group(), check_max_delay(), check_modules_compatible(), check_module(), check_numeric_timestep(), check_quantiles(), check_timestep_by_date(), check_timestep_by_group(), check_timestep()
Check required quantiles are present
check_quantiles(posterior, req_probs = c(0.5, 0.95, 0.2, 0.8))
posterior |
A |
req_probs |
A numeric vector of required probabilities. Default: c(0.5, 0.95, 0.2, 0.8). |
Functions used for checking inputs
check_calendar_timestep(), check_group_date_unique(), check_group(), check_max_delay(), check_modules_compatible(), check_module(), check_numeric_timestep(), check_observation_indicator(), check_timestep_by_date(), check_timestep_by_group(), check_timestep()
This function verifies if the difference in dates in the provided observations corresponds to the provided timestep. If the exact argument is set to TRUE, the function checks if all differences exactly match the timestep; otherwise, it checks if the sum of the differences modulo the timestep equals zero. If the check fails, the function stops and returns an error message.
check_timestep(
  obs,
  date_var,
  timestep = "day",
  exact = TRUE,
  check_nrow = TRUE
)
obs |
Any of the types supported by |
date_var |
The variable in |
timestep |
The timestep to use. This can be a string ("day", "week", "month") or a numeric whole number representing the number of days. |
exact |
Logical, if |
check_nrow |
Logical, if |
This function is used for its side effect of stopping if the check fails. If the check passes, the function returns invisibly.
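The difference between the exact and non-exact rules described for this check can be sketched in base R. The helper timestep_ok() is hypothetical and handles only a numeric timestep in days; it mirrors the stated rule rather than the package's internal code, and it returns a logical instead of stopping.

```r
# Hypothetical helper (not part of epinowcast)
timestep_ok <- function(dates, timestep, exact = TRUE) {
  # Differences in days between consecutive sorted dates
  diffs <- as.numeric(diff(sort(as.Date(dates))))
  if (exact) {
    # Every difference must equal the timestep
    all(diffs == timestep)
  } else {
    # The summed differences must be a multiple of the timestep
    sum(diffs) %% timestep == 0
  }
}

weekly <- as.Date("2021-01-04") + c(0, 7, 14)
gappy <- as.Date("2021-01-04") + c(0, 7, 21)

timestep_ok(weekly, 7)
timestep_ok(gappy, 7)
timestep_ok(gappy, 7, exact = FALSE)
```

The gappy series skips a week, so it fails the exact rule but passes the modulo rule.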
Functions used for checking inputs
check_calendar_timestep(), check_group_date_unique(), check_group(), check_max_delay(), check_modules_compatible(), check_module(), check_numeric_timestep(), check_observation_indicator(), check_quantiles(), check_timestep_by_date(), check_timestep_by_group()
This function verifies if the difference in dates within each date in the provided observations corresponds to the provided timestep. This check is performed for both report_date and reference_date and for each group in obs.
check_timestep_by_date(obs, timestep = "day", exact = TRUE)
obs |
Any of the types supported by |
timestep |
The timestep to use. This can be a string ("day", "week", "month") or a numeric whole number representing the number of days. |
exact |
Logical, if |
This function is used for its side effect of checking the timestep by date in obs. If the check passes for all dates, the function returns invisibly. Otherwise, it stops and returns an error message.
Functions used for checking inputs
check_calendar_timestep(), check_group_date_unique(), check_group(), check_max_delay(), check_modules_compatible(), check_module(), check_numeric_timestep(), check_observation_indicator(), check_quantiles(), check_timestep_by_group(), check_timestep()
This function verifies if the difference in dates within each group in the provided observations corresponds to the provided timestep. This check is performed for the specified date_var and for each group in obs.
check_timestep_by_group(obs, date_var, timestep = "day", exact = TRUE)
obs |
Any of the types supported by |
date_var |
The variable in |
timestep |
The timestep to use. This can be a string ("day", "week", "month") or a numeric whole number representing the number of days. |
exact |
Logical, if |
This function is used for its side effect of checking the timestep by group in obs. If the check passes for all groups, the function returns invisibly. Otherwise, it stops and returns an error message.
Functions used for checking inputs
check_calendar_timestep(), check_group_date_unique(), check_group(), check_max_delay(), check_modules_compatible(), check_module(), check_numeric_timestep(), check_observation_indicator(), check_quantiles(), check_timestep_by_date(), check_timestep()
Provides consistent coercion of inputs to IDate with error handling
coerce_date(dates = NULL)
dates |
A vector-like input, which the function attempts
to coerce via |
If any of the elements of dates cannot be coerced, this function will result in an error, indicating all indices which cannot be coerced to IDate.
Internal methods of epinowcast assume dates are represented as IDate.
An IDate vector.
Utility functions
aggregate_rolling_sum(), coerce_dt(), date_to_numeric_modulus(), get_internal_timestep(), is.Date(), stan_fns_as_string()
# works
coerce_date(c("2020-05-28", "2020-05-29"))
# does not, indicates index 2 is problem
tryCatch(
  coerce_date(c("2020-05-28", "2020-o5-29")),
  error = function(e) {
    print(e)
  }
)
data.table
Provides consistent coercion of inputs to data.table with error handling, column checking, and optional selection.
coerce_dt(
  data,
  select = NULL,
  required_cols = select,
  forbidden_cols = NULL,
  group = FALSE,
  dates = FALSE,
  copy = TRUE,
  msg_required = "The following columns are required: ",
  msg_forbidden = "The following columns are forbidden: "
)
data |
Any of the types supported by |
select |
An optional character vector of columns to return; unchecked
n.b. it is an error to include ".group"; use |
required_cols |
An optional character vector of required columns |
forbidden_cols |
An optional character vector of forbidden columns |
group |
A logical; ensure the presence of a |
dates |
A logical; ensure the presence of |
copy |
A logical; if |
msg_required |
A character string; for |
msg_forbidden |
A character string; for |
This function provides a single-point function for getting a "local" version of data provided by the user, in the internally used data.table format. It also enables selectively copying versus not, as well as checking for the presence and/or absence of various columns.
While it is intended to address garbage in from the user, it does not generally attempt to address garbage in from the developer - e.g. if asking for overlapping required and forbidden columns (though that will lead to an always-error condition).
A data.table; the returned object will be a copy, unless copy = FALSE, in which case modifications are made in-place.
Utility functions
aggregate_rolling_sum(), coerce_date(), date_to_numeric_modulus(), get_internal_timestep(), is.Date(), stan_fns_as_string()
Constructs random effect terms
construct_re(re, data)
re |
A random effect as defined using |
data |
A |
A list containing the transformed data ("data"), fixed effects terms ("terms") and a data.frame specifying the random effect structure between these terms (effects). Note that if the specified random effect was not a factor it will have been converted into one.
Functions used to help convert formulas into model designs
as_string_formula(), construct_rw(), enw_formula(), enw_manual_formula(), parse_formula(), remove_rw_terms(), re(), rw_terms(), rw(), split_formula_to_terms()
# Simple examples
form <- epinowcast:::parse_formula(~ 1 + (1 | day_of_week))
data <- enw_example("prepr")$metareference[[1]]
random_effect <- re(form$random[[1]])
epinowcast:::construct_re(random_effect, data)

# A more complex example
form <- epinowcast:::parse_formula(
  ~ 1 + disp + (1 + gear | cyl) + (0 + wt | am)
)
random_effect <- re(form$random[[1]])
epinowcast:::construct_re(random_effect, mtcars)
random_effect2 <- re(form$random[[2]])
epinowcast:::construct_re(random_effect2, mtcars)
This function takes random walks as defined by rw(), produces the required additional variables (denoted using a "c" prefix and constructed using enw_add_cumulative_membership()), and then returns the extended data.frame along with the new fixed effects and the random effect structure.
construct_rw(rw, data)
rw |
A random walk term as defined by |
data |
A |
A list containing the following:
data: The input data.frame with the addition of the new variables required by the specified random walk. These are added using enw_add_cumulative_membership().
terms: A character vector of new fixed effects terms to add to a model formula.
effects: A data.frame describing the random effect structure of the new effects.
Functions used to help convert formulas into model designs
as_string_formula(), construct_re(), enw_formula(), enw_manual_formula(), parse_formula(), remove_rw_terms(), re(), rw_terms(), rw(), split_formula_to_terms()
data <- enw_example("preproc")$metareference[[1]]
epinowcast:::construct_rw(rw(week), data)
epinowcast:::construct_rw(rw(week, day_of_week), data)
This function allows the construction of convolution matrices which can be combined with a vector of primary events to produce a vector of secondary events, for example in the form of a renewal equation or to simulate reporting delays. Time-varying delays are supported as well as distribution padding (to allow for use in renewal equation like approaches).
convolution_matrix(dist, t, include_partial = FALSE)
dist |
A vector or list of vectors describing the distribution to be convolved as a probability mass function. |
t |
Integer value indicating the number of time steps to convolve over. |
include_partial |
Logical, defaults to FALSE. If TRUE, the convolution includes partially complete secondary events. |
A matrix with each column indicating a primary event and each row indicating a secondary event.
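The idea of a matrix whose columns are primary events and whose rows are secondary events can be illustrated with a small base R sketch. toy_convolution_matrix() is a hypothetical helper written for this illustration; its orientation of the distribution and its handling of incomplete windows are assumptions and may differ from convolution_matrix().

```r
# Hypothetical helper (not part of epinowcast): static delay PMF only
toy_convolution_matrix <- function(pmf, t) {
  m <- matrix(0, nrow = t, ncol = t)
  for (primary in seq_len(t)) {
    for (d in seq_along(pmf)) {
      # A primary event at time `primary` contributes pmf[d]
      # to the secondary event d - 1 steps later
      secondary <- primary + d - 1
      if (secondary <= t) {
        m[secondary, primary] <- pmf[d]
      }
    }
  }
  m
}

pmf <- c(0.5, 0.3, 0.2)
cm <- toy_convolution_matrix(pmf, 5)

# Ten primary events at the first time step, none afterwards
primary_events <- c(10, 0, 0, 0, 0)
secondary_events <- as.vector(cm %*% primary_events)
secondary_events
```

Multiplying the matrix by the vector of primary events spreads each primary event over later time steps according to the delay distribution.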
Helper functions for model modules
add_max_observed_delay(), add_pmfs(), enw_reference_by_report(), enw_reps_with_complete_refs(), extract_obs_metadata(), extract_sparse_matrix(), latest_obs_as_matrix(), simulate_double_censored_pmf()
# Simple convolution matrix with a static distribution
convolution_matrix(c(1, 2, 3), 10)
# Include partially reported convolutions
convolution_matrix(c(1, 2, 3), 10, include_partial = TRUE)
# Use a list of distributions
convolution_matrix(rep(list(c(1, 2, 3)), 10), 10)
# Use a time-varying list of distributions
convolution_matrix(c(rep(list(c(1, 2, 3)), 10), list(c(4, 5, 6))), 11)
This function processes a date column in a data.table, converting it to a numeric representation and then computing the modulus with the provided timestep.
date_to_numeric_modulus(dt, date_column, timestep)
dt |
A data.table. |
date_column |
A character string representing the name of the date column in dt. |
timestep |
An integer representing the internal timestep. |
A modified data.table with two new columns: one for the numeric representation of the date minus the minimum date and another for its modulus with the timestep.
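The two derived quantities can be sketched in base R on a plain vector of dates. The variable names below are illustrative only and are not the column names the function adds.

```r
dates <- as.Date("2021-01-01") + c(0, 3, 7, 10)
timestep <- 7

# Days elapsed since the earliest date
numeric_date <- as.numeric(dates - min(dates))

# Position of each date within its timestep window
modulus <- numeric_date %% timestep

data.frame(date = dates, numeric_date, modulus)
```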
Utility functions
aggregate_rolling_sum(), coerce_date(), coerce_dt(), get_internal_timestep(), is.Date(), stan_fns_as_string()
Calculate cumulative reported cases from incidence of new reports
enw_add_cumulative(obs, by = NULL, copy = TRUE)
obs |
A |
by |
A character vector describing the stratification of observations. This defaults to no grouping. This should be used when modelling multiple time series in order to identify them for downstream modelling |
copy |
Should |
The input data.frame with a new variable confirm.
Data converters
enw_add_incidence(), enw_aggregate_cumulative(), enw_cumulative_to_incidence(), enw_incidence_to_cumulative(), enw_incidence_to_linelist(), enw_linelist_to_incidence()
# Default reconstruct incidence
dt <- germany_covid19_hosp[location == "DE"][age_group == "00+"]
dt <- enw_add_incidence(dt)
dt <- dt[, confirm := NULL]
enw_add_cumulative(dt)
# Make use of maximum reported to calculate empirical daily reporting
enw_add_cumulative(dt)
data.frame
This function adds a cumulative membership effect to a data frame. This is useful for specifying models such as random walks (using rw()) where these features can be used in the design matrix with the appropriate formula. Supports grouping via the optional .group column.
Note that cumulative membership is indexed to start with zero (i.e. the first observation is assigned a cumulative membership of zero).
enw_add_cumulative_membership(metaobs, feature, copy = TRUE)
metaobs |
A |
feature |
The name of the column in |
copy |
Should |
A data.frame with new columns cfeature$ that contain the cumulative membership effect for each value of feature. For example, if the original feature was week (with numeric entries 1, 2, 3) then the new columns will be cweek1, cweek2, and cweek3.
Functions used to formulate models
enw_add_pooling_effect(), enw_design(), enw_effects_metadata(), enw_one_hot_encode_feature()
metaobs <- data.frame(week = 1:2)
enw_add_cumulative_membership(metaobs, "week")
metaobs <- data.frame(week = 1:3, .group = c(1, 1, 2))
enw_add_cumulative_membership(metaobs, "week")
This helper function takes a data.frame or data.table of observations and adds the delay (numeric, in days) between reference_date and report_date for each observation.
enw_add_delay(obs, timestep = "day", copy = TRUE)
obs |
A |
timestep |
The timestep to use. This can be a string ("day", "week", "month") or a numeric whole number representing the number of days. |
copy |
Should |
A data.table of observations with a new column delay.
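For a daily timestep, the delay is simply the number of days between the two dates. The base R sketch below shows that calculation on a hypothetical data.frame; it does not cover other timesteps or the copy semantics of enw_add_delay().

```r
# Hypothetical observations for a single reference date
obs <- data.frame(
  reference_date = as.Date("2021-01-01"),
  report_date = as.Date("2021-01-01") + 0:2
)

# Delay in days between reference and report dates
obs$delay <- as.numeric(obs$report_date - obs$reference_date)
obs
```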
Preprocessing functions
enw_add_max_reported(), enw_add_metaobs_features(), enw_assign_group(), enw_complete_dates(), enw_construct_data(), enw_extend_date(), enw_filter_delay(), enw_filter_reference_dates(), enw_filter_report_dates(), enw_flag_observed_observations(), enw_impute_na_observations(), enw_latest_data(), enw_metadata_delay(), enw_metadata(), enw_missing_reference(), enw_preprocess_data(), enw_reporting_triangle_to_long(), enw_reporting_triangle()
obs <- data.frame(report_date = as.Date("2021-01-01") + -2:0)
obs$reference_date <- as.Date("2021-01-01")
enw_add_delay(obs)
Calculate incidence of new reports from cumulative reports
enw_add_incidence(obs, set_negatives_to_zero = TRUE, by = NULL, copy = TRUE)
obs |
A |
set_negatives_to_zero |
Logical, defaults to TRUE. Should negative
counts (for calculated incidence of observations) be set to zero? Currently
downstream modelling does not support negative counts and so setting must be
TRUE if intending to use |
by |
A character vector describing the stratification of observations. This defaults to no grouping. This should be used when modelling multiple time series in order to identify them for downstream modelling |
copy |
Should |
The input data.frame with a new variable new_confirm. If max_confirm is present in the data.frame, then the proportion reported on each day (prop_reported) will also be added.
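The relationship between cumulative counts and incidence can be sketched in base R for a single reference date. This is an illustration under two assumptions: the first report's incidence equals the first cumulative count, and negative differences are truncated at zero as set_negatives_to_zero = TRUE suggests. It is not the package's implementation.

```r
# Hypothetical cumulative counts by report date, including a downward revision
confirm <- c(3, 5, 4, 9)

# Incidence of new reports as first differences of the cumulative counts
new_confirm <- c(confirm[1], diff(confirm))

# Truncate negative incidence at zero
new_confirm_nonneg <- pmax(new_confirm, 0)

# Without downward revisions, cumsum() recovers the cumulative counts
all(cumsum(new_confirm) == confirm)
```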
Data converters
enw_add_cumulative(), enw_aggregate_cumulative(), enw_cumulative_to_incidence(), enw_incidence_to_cumulative(), enw_incidence_to_linelist(), enw_linelist_to_incidence()
# Default reconstruct incidence
dt <- germany_covid19_hosp[location == "DE"][age_group == "00+"]
enw_add_incidence(dt)
# Make use of maximum reported to calculate empirical daily reporting
dt <- enw_add_max_reported(dt)
enw_add_incidence(dt)
Add the latest observations to the nowcast output. This is useful for plotting the nowcast against the latest observations.
enw_add_latest_obs_to_nowcast(nowcast, obs)
nowcast |
A |
obs |
An observation |
A data.frame of nowcast output with the latest observations added.
Functions used for postprocessing of model fits
enw_nowcast_samples(), enw_nowcast_summary(), enw_posterior(), enw_pp_summary(), enw_quantiles_to_long(), enw_summarise_samples()
fit <- enw_example("nowcast")
obs <- enw_example("obs")
nowcast <- summary(fit, type = "nowcast")
enw_add_latest_obs_to_nowcast(nowcast, obs)
reference_date
This is a helper function which adds the maximum (in the sense of latest observed) number of reported cases for each reference_date and computes the proportion of already reported cases for each combination of reference_date and report_date.
enw_add_max_reported(obs, copy = TRUE)
obs |
A |
copy |
Should |
A data.table with new columns max_confirm and cum_prop_reported. max_confirm is the maximum number of cases reported for a certain reference_date. cum_prop_reported is the proportion of cases for a certain reference_date that are reported until a given report_date, relative to all cases so far observed for this reference_date.
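The two columns can be illustrated with a small base R sketch. This is not the package implementation; it assumes that confirm holds cumulative counts and that "maximum" means the count at the latest report date for each reference date.

```r
# Base R sketch (illustration only, not the epinowcast implementation)
obs <- data.frame(
  reference_date = as.Date("2021-01-01"),
  report_date = as.Date("2021-01-01") + 0:2,
  confirm = 1:3
)
# Order so that the last row within each reference date is the latest report
obs <- obs[order(obs$reference_date, obs$report_date), ]
# Assumption: the maximum is the count at the latest report date
obs$max_confirm <- ave(
  obs$confirm, as.character(obs$reference_date),
  FUN = function(x) x[length(x)]
)
# Proportion of the latest count that had been reported by each report date
obs$cum_prop_reported <- obs$confirm / obs$max_confirm
obs
```

The proportion reaches 1 at the latest report date by construction, so it describes reporting relative to what has been seen so far rather than relative to the final count.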
Preprocessing functions
enw_add_delay(), enw_add_metaobs_features(), enw_assign_group(), enw_complete_dates(), enw_construct_data(), enw_extend_date(), enw_filter_delay(), enw_filter_reference_dates(), enw_filter_report_dates(), enw_flag_observed_observations(), enw_impute_na_observations(), enw_latest_data(), enw_metadata_delay(), enw_metadata(), enw_missing_reference(), enw_preprocess_data(), enw_reporting_triangle_to_long(), enw_reporting_triangle()
obs <- data.frame(report_date = as.Date("2021-01-01") + 0:2)
obs$reference_date <- as.Date("2021-01-01")
obs$confirm <- 1:3
enw_add_max_reported(obs)
If not already present, annotates time series data with metadata commonly used in models: day of week, and days, weeks, and months since start of time series.
enw_add_metaobs_features( metaobs, holidays = NULL, holidays_to = "Sunday", datecol = "date" )
metaobs |
Raw data, coercible via |
holidays |
a (potentially empty) vector of dates (or input
coercible to such; see |
holidays_to |
A character string to assign to holidays, when |
datecol |
The column in |
Effects models often need to include covariates for time-based features, such as day of the week (e.g. to reflect different care-seeking and/or reporting behaviour).
This function is called from within enw_preprocess_data() to systematically annotate metaobs with these commonly used metadata, if not already present. However, it can also be used directly on other data.
A copy of the metaobs input, with additional columns:
day_of_week: a factor of values as output from weekdays(), plus the value of holidays_to if it is distinct from the weekday values
day: numeric, 0 based from start of time series
week: numeric, 0 based from start of time series
month: numeric, 0 based from start of time series
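The features listed above can be sketched in base R as follows. The helper name add_time_features is hypothetical, and the week and month definitions used here (completed 7 day blocks and calendar months since the first date) are assumptions rather than the package's confirmed behaviour.

```r
# Base R sketch of time-based features (illustration only)
add_time_features <- function(dates, holidays = NULL, holidays_to = "Sunday") {
  dates <- as.Date(dates)
  # Days since the start of the series, 0 based
  day <- as.numeric(dates - min(dates))
  day_of_week <- weekdays(dates)
  if (length(holidays) > 0) {
    # Holidays that are not in the data simply match nothing
    day_of_week[dates %in% as.Date(holidays)] <- holidays_to
  }
  # Assumption: months are counted as calendar months since the first date
  lt <- as.POSIXlt(dates)
  months <- lt$year * 12 + lt$mon
  data.frame(
    date = dates,
    day_of_week = factor(day_of_week),
    day = day,
    # Assumption: weeks are completed 7 day blocks since the first date
    week = day %/% 7,
    month = months - min(months)
  )
}

features <- add_time_features(
  as.Date("2021-01-25") + 0:9,
  holidays = "2021-01-26", holidays_to = "Holiday"
)
features
```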
Preprocessing functions
enw_add_delay(), enw_add_max_reported(), enw_assign_group(), enw_complete_dates(), enw_construct_data(), enw_extend_date(), enw_filter_delay(), enw_filter_reference_dates(), enw_filter_report_dates(), enw_flag_observed_observations(), enw_impute_na_observations(), enw_latest_data(), enw_metadata_delay(), enw_metadata(), enw_missing_reference(), enw_preprocess_data(), enw_reporting_triangle_to_long(), enw_reporting_triangle()
# make some example data
nat_germany_hosp <- subset(
  germany_covid19_hosp, location == "DE" & age_group == "80+"
)[1:40]
basemeta <- enw_add_metaobs_features(
  nat_germany_hosp,
  datecol = "report_date"
)
basemeta
# with holidays - n.b.: holidays not found are silently ignored
holidaymeta <- enw_add_metaobs_features(
  nat_germany_hosp,
  datecol = "report_date",
  holidays = c(
    "2021-04-04", "2021-04-05",
    "2021-05-01", "2021-05-13",
    "2021-05-24"
  ),
  holidays_to = "Holiday"
)
holidaymeta
subset(holidaymeta, day_of_week == "Holiday")
This function adds a pooling effect to the metadata returned by enw_effects_metadata(). It does this by updating the fixed column to 0 for the effects that match the string argument and adding a new column var_name that is 1 for the effects that match the string argument and 0 otherwise.
enw_add_pooling_effect(effects, var_name = "sd", finder_fn = startsWith, ...)
effects |
A
This is the output of |
var_name |
The name of the new column that will be added to the
|
finder_fn |
A function that will be used to find the effects that
match the string. Defaults to |
... |
Additional arguments to |
A data.table with the following columns:
effects: the name of the effect
fixed: a logical indicating whether the effect is fixed (1) or random (0).
Argument supplied to var_name: a logical indicating whether the effect should be pooled (1) or not (0).
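The pooling logic described above can be sketched in base R. The helper add_pooling_sketch and the example effects table are hypothetical; the real function operates on the data.table returned by enw_effects_metadata().

```r
# Base R sketch of the pooling logic (illustration only)
add_pooling_sketch <- function(effects, var_name = "sd",
                               finder_fn = startsWith, ...) {
  # Find the effects that match, e.g. by prefix when using startsWith()
  matched <- finder_fn(effects$effects, ...)
  # Matched effects are no longer treated as fixed
  effects$fixed <- ifelse(matched, 0, effects$fixed)
  # New indicator column named by var_name
  effects[[var_name]] <- as.numeric(matched)
  effects
}

# Hypothetical effects metadata for a design with factor b and numeric c
effects <- data.frame(effects = c("b2", "b3", "c"), fixed = 1)
pooled <- add_pooling_sketch(effects, prefix = "b")
pooled
```

Passing the match arguments through ... keeps the finder function swappable, for example endsWith() with a suffix argument.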
Functions used to formulate models
enw_add_cumulative_membership(), enw_design(), enw_effects_metadata(), enw_one_hot_encode_feature()
data <- data.frame(a = 1:3, b = as.character(1:3), c = c(1,1,2))
design <- enw_design(a ~ b + c, data)$design
effects <- enw_effects_metadata(design)
enw_add_pooling_effect(effects, prefix = "b")
This function aggregates observations over a specified timestep, ensuring alignment on the same day of week for report and reference dates. It is useful for aggregating data to a weekly timestep, for example, which may be desirable if testing using a weekly timestep or if you are very concerned about runtime. Note that the start of the timestep will be determined by min_reference_date + a single timestep (i.e. the first timestep will be "2022-10-23" if the minimum reference date is "2022-10-16").
enw_aggregate_cumulative( obs, timestep = "day", by = NULL, min_reference_date = min(obs$reference_date, na.rm = TRUE), copy = TRUE )
obs |
An object coercible to a |
timestep |
The timestep to use. This can be a string ("day", "week", "month") or a numeric whole number representing the number of days. |
by |
A character vector of variables to also aggregate by (i.e. as well
as using the |
min_reference_date |
The minimum reference date to start the
aggregation from. Note that the timestep will start from the minimum
reference date + a single time step (i.e. the first timestep will be
"2022-10-23" if the minimum reference date is "2022-10-16"). The default
is the minimum reference date in the |
copy |
Should |
A data.table with aggregated observations.
Data converters
enw_add_cumulative(), enw_add_incidence(), enw_cumulative_to_incidence(), enw_incidence_to_cumulative(), enw_incidence_to_linelist(), enw_linelist_to_incidence()
nat_hosp <- germany_covid19_hosp[location == "DE"][age_group == "00+"]
enw_aggregate_cumulative(nat_hosp, timestep = "week")
Assign a group to each row of a data.table. If by is specified, then each unique combination of the columns in by will be assigned a unique group. If by is not specified, then all rows will be assigned to the same group.
enw_assign_group(obs, by = NULL, copy = TRUE)
obs |
A |
by |
A character vector of column names to group by. Defaults to an empty vector. |
copy |
A logical; make a copy (default) of |
A data.table with a .group column added, ordered by .group and the existing key of obs.
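A base R sketch of the grouping rule may help. The helper assign_group_sketch is hypothetical, and it numbers groups in order of first appearance, which is an assumption and may differ from the ordering the package uses.

```r
# Base R sketch of group assignment (illustration only)
assign_group_sketch <- function(obs, by = NULL) {
  if (length(by) == 0) {
    # No stratification: every row belongs to the same group
    obs$.group <- 1L
  } else {
    # One key per row built from the grouping columns
    key <- do.call(paste, c(unname(as.list(obs[by])), sep = "\r"))
    # Assumption: groups are numbered by first appearance
    obs$.group <- match(key, unique(key))
  }
  obs[order(obs$.group), , drop = FALSE]
}

obs <- data.frame(x = c("a", "b", "a"), y = 1:3)
single <- assign_group_sketch(obs)
by_x <- assign_group_sketch(obs, by = "x")
by_x
```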
Preprocessing functions
enw_add_delay(), enw_add_max_reported(), enw_add_metaobs_features(), enw_complete_dates(), enw_construct_data(), enw_extend_date(), enw_filter_delay(), enw_filter_reference_dates(), enw_filter_report_dates(), enw_flag_observed_observations(), enw_impute_na_observations(), enw_latest_data(), enw_metadata_delay(), enw_metadata(), enw_missing_reference(), enw_preprocess_data(), enw_reporting_triangle_to_long(), enw_reporting_triangle()
obs <- data.frame(x = 1:3, y = 1:3)
enw_assign_group(obs)
enw_assign_group(obs, by = "x")
Ensures that all reference and report dates are present for all groups based on the maximum and minimum dates found in the data. This function may be of use to users when preprocessing their data. In general all features that you may consider using as grouping variables or as covariates need to be included in the by variable.
enw_complete_dates( obs, by = NULL, max_delay, min_date = min(obs$reference_date, na.rm = TRUE), max_date = max(obs$report_date, na.rm = TRUE), timestep = "day", missing_reference = TRUE, completion_beyond_max_report = FALSE, flag_observation = FALSE )
obs |
A |
by |
A character vector describing the stratification of observations. This defaults to no grouping. This should be used when modelling multiple time series in order to identify them for downstream modelling |
max_delay |
The maximum number of days to model in the delay
distribution. Must be an integer greater than or equal to 1. Observations
with delays larger than the maximum delay will be dropped. If the specified
maximum delay is too short, nowcasts can be biased as important parts of the
true delay distribution are cut off. At the same time, computational cost
scales non-linearly with this setting, so you want the maximum delay to be as
long as necessary, but not much longer. Consider what delays are realistic
for your application, and when in doubt, check if increasing the maximum
delay noticeably changes the delay distribution or nowcasts as estimated by
epinowcast. If it does, your maximum delay may still be too short.
Note that delays are zero indexed and so include the reference date and
|
min_date |
The minimum date to include in the data. Defaults to the minimum reference date found in the data. |
max_date |
The maximum date to include in the data. Defaults to the maximum report date found in the data. |
timestep |
The timestep to use. This can be a string ("day", "week", "month") or a numeric whole number representing the number of days. |
missing_reference |
Logical, should entries for cases with missing reference date be completed as well? Default: TRUE |
completion_beyond_max_report |
Logical, should entries be completed beyond the maximum date found in the data? Default: FALSE |
flag_observation |
Logical, should observations that have been
imputed as missing be flagged as not observed? Makes use of
|
A data.table with completed entries for all combinations of reference dates, groups and possible report dates.
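The idea of completing dates can be sketched in base R for a single group. The helper complete_dates_sketch is hypothetical and only builds the grid of reference and report dates; how the package fills counts for newly created rows is not shown here, so they are left as NA.

```r
# Base R sketch of date completion for one group (illustration only)
complete_dates_sketch <- function(obs, max_delay) {
  obs$reference_date <- as.Date(obs$reference_date)
  obs$report_date <- as.Date(obs$report_date)
  max_date <- max(obs$report_date)
  # All combinations of reference date and zero indexed delay
  grid <- expand.grid(
    reference_date = seq(min(obs$reference_date), max_date, by = "day"),
    delay = seq_len(max_delay) - 1
  )
  grid$report_date <- grid$reference_date + grid$delay
  # Do not complete beyond the latest report date found in the data
  grid <- grid[grid$report_date <= max_date, ]
  # Rows without a matching observation get NA counts
  out <- merge(
    grid, obs,
    by = c("reference_date", "report_date"), all.x = TRUE
  )
  out[order(out$reference_date, out$report_date), ]
}

obs <- data.frame(
  report_date = c("2021-10-01", "2021-10-03"),
  reference_date = "2021-10-01",
  confirm = 1
)
completed <- complete_dates_sketch(obs, max_delay = 3)
completed
```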
Preprocessing functions
enw_add_delay(), enw_add_max_reported(), enw_add_metaobs_features(), enw_assign_group(), enw_construct_data(), enw_extend_date(), enw_filter_delay(), enw_filter_reference_dates(), enw_filter_report_dates(), enw_flag_observed_observations(), enw_impute_na_observations(), enw_latest_data(), enw_metadata_delay(), enw_metadata(), enw_missing_reference(), enw_preprocess_data(), enw_reporting_triangle_to_long(), enw_reporting_triangle()
obs <- data.frame(
  report_date = c("2021-10-01", "2021-10-03"), reference_date = "2021-10-01",
  confirm = 1
)
enw_complete_dates(obs)
# Allow completion beyond the maximum date found in the data
enw_complete_dates(obs, completion_beyond_max_report = TRUE, max_delay = 10)
This function is used internally by enw_preprocess_data() to combine various pieces of processed observed data into a single object. It is exposed to the user in order to allow for modular data preprocessing, though this is not currently recommended. See documentation and code of enw_preprocess_data() for more on the expected inputs.
enw_construct_data( obs, new_confirm, latest, missing_reference, reporting_triangle, metareport, metareference, metadelay, max_delay, timestep, by )
obs |
Observations with the addition of empirical reporting proportions and restricted to the specified maximum delay. |
new_confirm |
Incidence of notifications by reference and report date. Empirical reporting distributions are also added. |
latest |
The latest available observations. |
missing_reference |
A |
reporting_triangle |
Incident observations by report and reference date in the standard reporting triangle matrix format. |
metareport |
Metadata for report dates. |
metareference |
Metadata for reference dates derived from observations. |
metadelay |
Metadata for reporting delays produced using
|
max_delay |
Maximum delay to be modelled by epinowcast. |
timestep |
The timestep to use in the process model (i.e. the
reference date model). This can be a string ("day", "week", "month") or a
numeric whole number representing the number of days. If your data does not
have this timestep then you may wish to make use of
|
by |
A character vector describing the stratification of observations. This defaults to no grouping. This should be used when modelling multiple time series in order to identify them for downstream modelling |
A data.table containing processed observations as a series of nested data.frames as well as variables containing metadata. These are:
obs: Observations with the addition of empirical reporting proportions and restricted to the specified maximum delay.
new_confirm: Incidence of notifications by reference and report date. Empirical reporting distributions are also added.
latest: The latest available observations.
missing_reference: Observations missing reference dates.
reporting_triangle: Incident observations by report and reference date in the standard reporting triangle matrix format.
metareference: Metadata for reference dates derived from observations.
metareport: Metadata for report dates.
metadelay: Metadata for reporting delays produced using enw_metadata_delay().
max_delay: Maximum delay to be modelled by epinowcast.
time: Numeric, number of timepoints in the data.
snapshots: Numeric, number of available data snapshots to use for nowcasting.
groups: Numeric, number of groups/strata in the supplied observations (set using by).
max_date: The maximum available report date.
Preprocessing functions
enw_add_delay(), enw_add_max_reported(), enw_add_metaobs_features(), enw_assign_group(), enw_complete_dates(), enw_extend_date(), enw_filter_delay(), enw_filter_reference_dates(), enw_filter_report_dates(), enw_flag_observed_observations(), enw_impute_na_observations(), enw_latest_data(), enw_metadata_delay(), enw_metadata(), enw_missing_reference(), enw_preprocess_data(), enw_reporting_triangle_to_long(), enw_reporting_triangle()
pobs <- enw_example("preprocessed")
enw_construct_data(
  obs = pobs$obs[[1]],
  new_confirm = pobs$new_confirm[[1]],
  latest = pobs$latest[[1]],
  missing_reference = pobs$missing_reference[[1]],
  reporting_triangle = pobs$reporting_triangle[[1]],
  metareport = pobs$metareport[[1]],
  metareference = pobs$metareference[[1]],
  metadelay = pobs$metadelay[[1]],
  max_delay = pobs$max_delay,
  timestep = pobs$timestep[[1]],
  by = c()
)
This function is a wrapper around stats::model.matrix() that can optionally return a sparse design matrix, defined as the unique rows of the design matrix together with an index vector that allows the full design matrix to be reconstructed. This is useful for models that have many repeated rows in the design matrix and that are computationally expensive to fit. This function also allows for the specification of contrasts for categorical variables.
enw_design(formula, data, no_contrasts = FALSE, sparse = TRUE, ...)
formula |
An R formula. |
data |
A |
no_contrasts |
A vector of variable names that should not be
converted to contrasts. If |
sparse |
Logical, if TRUE return a sparse design matrix. Defaults to TRUE. |
... |
Arguments passed on to |
A list containing the formula, the design matrix, and the index.
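The unique rows plus index representation can be sketched with base R alone. This is a minimal illustration of the idea under the assumption that rows are compared exactly; it is not the package's own code and the variable names are hypothetical.

```r
# Base R sketch of a "sparse" design matrix: unique rows plus an index
data <- data.frame(a = 1:4, c = c(1, 1, 2, 1))
full <- stats::model.matrix(a ~ c, data)
# Collapse each row to a key so that repeated rows can be detected
keys <- apply(full, 1, paste, collapse = "\r")
unique_keys <- keys[!duplicated(keys)]
sparse <- full[!duplicated(keys), , drop = FALSE]
# For every observation, the row of the reduced matrix it corresponds to
index <- match(keys, unique_keys)
# The full design matrix can be rebuilt from the unique rows and the index
rebuilt <- sparse[index, , drop = FALSE]
```

Here four observations share two distinct covariate rows, so only two rows need to be stored and multiplied through the model.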
Functions used to formulate models
enw_add_cumulative_membership(), enw_add_pooling_effect(), enw_effects_metadata(), enw_one_hot_encode_feature()
data <- data.frame(a = 1:3, b = as.character(1:3), c = c(1,1,2))
enw_design(a ~ b + c, data)
enw_design(a ~ b + c, data, no_contrasts = TRUE)
enw_design(a ~ b + c, data, no_contrasts = c("b"))
enw_design(a ~ c, data, sparse = TRUE)
enw_design(a ~ c, data, sparse = FALSE)
This function extracts metadata from a design matrix and returns a data.table with the following columns:
effects: the name of the effect
fixed: a logical indicating whether the effect is fixed (1) or random (0).
It automatically drops the intercept (defined as "(Intercept)").
This function is useful for constructing a model design object for random effects when used in combination with enw_add_pooling_effect().
enw_effects_metadata(design)
design |
A design matrix as returned by |
A data.table with the following columns:
effects: the name of the effect
fixed: a logical indicating whether the effect is fixed (1) or random (0)
Functions used to formulate models
enw_add_cumulative_membership(), enw_add_pooling_effect(), enw_design(), enw_one_hot_encode_feature()
data <- data.frame(a = 1:3, b = as.character(1:3), c = c(1,1,2))
design <- enw_design(a ~ b + c, data)$design
enw_effects_metadata(design)
Loads examples of nowcasts produced using example scripts. Used to streamline examples, in package tests and to enable users to explore package functionality without needing to install cmdstanr.
enw_example( type = c("nowcast", "preprocessed_observations", "observations", "script") )
type |
A character string indicating the example to load. Supported options are
|
Depending on type, a data.table of the requested output OR the file name(s) to generate these outputs (type = "script").
Package data sets
germany_covid19_hosp
# Load the nowcast
enw_example(type = "nowcast")
# Load the preprocessed observations
enw_example(type = "preprocessed_observations")
# Load the latest observations
enw_example(type = "observations")
# Load the script used to generate these examples
# Optionally source this script to regenerate the example
readLines(enw_example(type = "script"))
Expectation model module
enw_expectation( r = ~0 + (1 | day:.group), generation_time = 1, observation = ~1, latent_reporting_delay = 1, data, ... )
r |
A formula (as implemented in |
generation_time |
A numeric vector that sums to 1 and defaults to 1. Describes the weighting to apply to previous generations (i.e. as part of a renewal equation). When set to 1 (the default) this corresponds to modelling the daily growth rate. |
observation |
A formula (as implemented in |
latent_reporting_delay |
A numeric vector that defaults to 1. Describes the weighting to apply to past and current latent expected observations (from most recent to least). This can be used both to convolve based on some assumed reporting delay and to rescale observations (by multiplying a probability mass function by some fraction) to account for ascertainment etc. A list of PMFs can be provided to allow for time-varying PMFs. This should be the same length as the modelled time period plus the length of the generation time if supplied. |
data |
Output from |
... |
Additional parameters passed to |
A list containing the supplied formulas, data passed into a list describing the models, a data.frame describing the priors used, and a function that takes the output data and priors and returns a function that can be used to sample from a tightened version of the prior distribution.
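The roles of generation_time and latent_reporting_delay can be illustrated with a small base R sketch. The helpers renewal_sketch and convolve_sketch are hypothetical and only show the arithmetic of a renewal equation followed by a weighted sum over past values; the model itself is implemented in Stan and may differ in detail, for example in how initial values are handled.

```r
# Renewal process with log growth rates r and generation time weights gt
# (illustration only, not the package's Stan implementation)
renewal_sketch <- function(init, r, gt = 1) {
  g <- length(gt)
  stopifnot(length(init) == g)
  x <- c(init, numeric(length(r)))
  for (t in seq_along(r)) {
    # Previous g latent values, most recent first
    past <- x[(g + t - 1):t]
    x[g + t] <- exp(r[t]) * sum(gt * past)
  }
  x
}

# Weighted sum of current and past latent values, most recent first
convolve_sketch <- function(x, w = 1) {
  k <- length(w)
  vapply(k:length(x), function(t) sum(w * x[t:(t - k + 1)]), numeric(1))
}

# With gt = 1 each value is the previous one times exp(r): a daily growth rate
latent <- renewal_sketch(init = 1, r = rep(log(2), 3))
# Half of the expected observations reported with a one day latent delay
expected <- convolve_sketch(latent, w = c(0.5, 0.5))
```

Weights that sum to less than 1 in w would additionally scale the expected observations down, which is how under-ascertainment could be represented in this sketch.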
Model modules
enw_fit_opts(), enw_missing(), enw_obs(), enw_reference(), enw_report()
enw_expectation(data = enw_example("preprocessed"))
Extend a time series with additional dates. This is useful when extending the report dates of a time series to include future dates for nowcasting purposes or to include additional dates for backcasting when using a renewal process as the expectation model.
enw_extend_date( metaobs, days = 20, direction = c("end", "start"), timestep = "day" )
metaobs |
A |
days |
Number of days to add to the time series. Defaults to 20. |
direction |
Should new dates be added at the beginning or end of the data. Default is "end" with "start" also available. |
timestep |
The timestep to use. This can be a string ("day", "week", "month") or a numeric whole number representing the number of days. |
A data.table with the same columns as metaobs but with additional rows for each date in the range of date to date + days (or date - days if direction = "start"). An additional variable observed is added with a value of FALSE for all new dates and TRUE for all existing dates.
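The extension described above can be sketched in base R for a daily series with a single date column. The helper extend_date_sketch is hypothetical and ignores any other columns and non-daily timesteps.

```r
# Base R sketch of extending a daily date series (illustration only)
extend_date_sketch <- function(metaobs, days = 20,
                               direction = c("end", "start")) {
  direction <- match.arg(direction)
  new_dates <- if (direction == "end") {
    max(metaobs$date) + seq_len(days)
  } else {
    min(metaobs$date) - rev(seq_len(days))
  }
  extended <- rbind(
    # Existing dates are flagged as observed
    data.frame(date = metaobs$date, observed = TRUE),
    # Added dates are flagged as not observed
    data.frame(date = new_dates, observed = rep(FALSE, length(new_dates)))
  )
  extended[order(extended$date), ]
}

metaobs <- data.frame(date = as.Date("2021-01-01") + 0:4)
at_end <- extend_date_sketch(metaobs, days = 2)
at_start <- extend_date_sketch(metaobs, days = 2, direction = "start")
```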
Preprocessing functions
enw_add_delay(), enw_add_max_reported(), enw_add_metaobs_features(), enw_assign_group(), enw_complete_dates(), enw_construct_data(), enw_filter_delay(), enw_filter_reference_dates(), enw_filter_report_dates(), enw_flag_observed_observations(), enw_impute_na_observations(), enw_latest_data(), enw_metadata_delay(), enw_metadata(), enw_missing_reference(), enw_preprocess_data(), enw_reporting_triangle_to_long(), enw_reporting_triangle()
metaobs <- data.frame(date = as.Date("2021-01-01") + 0:4)
enw_extend_date(metaobs, days = 2)
enw_extend_date(metaobs, days = 2, direction = "start")
This is a helper function which allows users to filter datasets by reference date. This is useful, for example, when evaluating nowcast performance against fully observed data. Users may wish to combine this function with enw_filter_report_dates(). Note that by definition it is assumed that report dates must be equal to or greater than the corresponding reference date (i.e. a report cannot happen before the event being reported occurs). This means that this function will also filter out any report dates that are earlier than their corresponding reference date.
enw_filter_reference_dates( obs, earliest_date, include_days, latest_date, remove_days )
obs |
A |
earliest_date |
earliest reference date to include in the data set |
include_days |
if |
latest_date |
Date, the latest reference date to include in the returned dataset. |
remove_days |
Integer, if |
A data.table filtered by reference date
Preprocessing functions
enw_add_delay(), enw_add_max_reported(), enw_add_metaobs_features(), enw_assign_group(), enw_complete_dates(), enw_construct_data(), enw_extend_date(), enw_filter_delay(), enw_filter_report_dates(), enw_flag_observed_observations(), enw_impute_na_observations(), enw_latest_data(), enw_metadata_delay(), enw_metadata(), enw_missing_reference(), enw_preprocess_data(), enw_reporting_triangle_to_long(), enw_reporting_triangle()
# Filter by date
enw_filter_reference_dates(
  germany_covid19_hosp,
  earliest_date = "2021-09-01",
  latest_date = "2021-10-01"
)
# Filter by days
enw_filter_reference_dates(
  germany_covid19_hosp,
  include_days = 10,
  remove_days = 10
)
This is a helper function which allows users to create truncated data sets at past time points from a given larger data set. This is useful when evaluating nowcast performance against fully observed data. Users may wish to combine this function with enw_filter_reference_dates().
enw_filter_report_dates(obs, latest_date, remove_days)
obs |
A |
latest_date |
Date, the latest report date to include in the returned dataset. |
remove_days |
Integer, if |
A data.table filtered by report date
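Creating a truncated snapshot can be sketched in base R. The helper filter_report_dates_sketch is hypothetical, and it assumes that remove_days counts back from the latest report date in the data, which is how the argument reads but is not confirmed by the text above.

```r
# Base R sketch of truncating data at a past report date (illustration only)
filter_report_dates_sketch <- function(obs, latest_date, remove_days) {
  stopifnot(!missing(latest_date) || !missing(remove_days))
  obs$report_date <- as.Date(obs$report_date)
  if (!missing(remove_days)) {
    # Assumption: remove_days counts back from the latest report date
    latest_date <- max(obs$report_date) - remove_days
  }
  # Keep only what would have been known on latest_date
  obs[obs$report_date <= as.Date(latest_date), , drop = FALSE]
}

obs <- data.frame(
  reference_date = as.Date("2021-01-01"),
  report_date = as.Date("2021-01-01") + 0:9,
  confirm = 1:10
)
by_days <- filter_report_dates_sketch(obs, remove_days = 3)
by_date <- filter_report_dates_sketch(obs, latest_date = "2021-01-05")
```

Comparing a nowcast fitted to by_days against the later counts in obs is the kind of retrospective evaluation this helper is intended to support.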
Preprocessing functions
enw_add_delay(), enw_add_max_reported(), enw_add_metaobs_features(), enw_assign_group(), enw_complete_dates(), enw_construct_data(), enw_extend_date(), enw_filter_delay(), enw_filter_reference_dates(), enw_flag_observed_observations(), enw_impute_na_observations(), enw_latest_data(), enw_metadata_delay(), enw_metadata(), enw_missing_reference(), enw_preprocess_data(), enw_reporting_triangle_to_long(), enw_reporting_triangle()
# Filter by date
enw_filter_report_dates(germany_covid19_hosp, latest_date = "2021-09-01")
# Filter by days
enw_filter_report_dates(germany_covid19_hosp, remove_days = 10)
Format model fitting options for use with stan
enw_fit_opts( sampler = epinowcast::enw_sample, nowcast = TRUE, pp = FALSE, likelihood = TRUE, likelihood_aggregation = c("snapshots", "groups"), threads_per_chain = 1L, debug = FALSE, output_loglik = FALSE, ... )
sampler |
A function that creates an object that can be used to extract
posterior samples from the specified model. By default this is |
nowcast |
Logical, defaults to |
pp |
Logical, defaults to |
likelihood |
Logical, defaults to |
likelihood_aggregation |
Character string, aggregation over which
to stratify the likelihood when
Note that some model modules override this setting depending on model
requirements. For example, the |
threads_per_chain |
Integer, defaults to |
debug |
Logical, defaults to |
output_loglik |
Logical, defaults to |
... |
Additional arguments to pass to the fitting function being used
by |
A list containing the specified sampler function, data as a list specifying the fitting options to use, and additional arguments to pass to the sampler function when it is called.
Model modules
enw_expectation(), enw_missing(), enw_obs(), enw_reference(), enw_report()
# Default options along with settings to pass to enw_sample
enw_fit_opts(iter_sampling = 1000, iter_warmup = 1000)
Flags observations based on the 'confirm' column. If the '.observed' column does not exist, it is created. Observations are flagged as observed (TRUE) if 'confirm' is not NA.
enw_flag_observed_observations(obs, copy = TRUE)
obs |
A |
copy |
A logical; if |
A data.table with an additional column '.observed' indicating observed observations.
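The flagging rule can be sketched in base R. The helper flag_observed_sketch is hypothetical, and its handling of a pre-existing '.observed' column (keeping TRUE only where confirm is also present) is an assumption.

```r
# Base R sketch of flagging observed rows (illustration only)
flag_observed_sketch <- function(obs) {
  if (!".observed" %in% names(obs)) {
    # Create the flag if it does not exist yet
    obs$.observed <- TRUE
  }
  # Assumption: an existing flag stays TRUE only where confirm is present
  obs$.observed <- obs$.observed & !is.na(obs$confirm)
  obs
}

dt <- data.frame(id = 1:3, confirm = c(NA, 1, 2))
flagged <- flag_observed_sketch(dt)
flagged
```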
Preprocessing functions
enw_add_delay(), enw_add_max_reported(), enw_add_metaobs_features(), enw_assign_group(), enw_complete_dates(), enw_construct_data(), enw_extend_date(), enw_filter_delay(), enw_filter_reference_dates(), enw_filter_report_dates(), enw_impute_na_observations(), enw_latest_data(), enw_metadata_delay(), enw_metadata(), enw_missing_reference(), enw_preprocess_data(), enw_reporting_triangle_to_long(), enw_reporting_triangle()
dt <- data.frame(id = 1:3, confirm = c(NA, 1, 2))
enw_flag_observed_observations(dt)
This function allows models to be defined using a flexible formula interface that supports fixed effects and random effects (using lme4 syntax). Note that the returned fixed effects design matrix is sparse, so the supplied index is required to link observations to the appropriate design matrix row.
enw_formula(formula, data, sparse = TRUE)
formula |
A model formula that may use standard fixed
effects, random effects using lme4 syntax (see |
data |
A |
sparse |
Logical, defaults to |
A list containing the following:
formula: The user supplied formula
parsed_formula: The formula as parsed by parse_formula()
extended_formula: The flattened version of the formula with both user supplied terms and terms added for the user supplied complex model components.
fixed: A list containing the fixed effect formula, sparse design matrix, and the index linking the design matrix with observations.
random: A list containing the random effect formula, sparse design matrix, and the index linking the design matrix with random effects.
Functions used to help convert formulas into model designs
as_string_formula(), construct_re(), construct_rw(), enw_manual_formula(), parse_formula(), remove_rw_terms(), re(), rw_terms(), rw(), split_formula_to_terms()
# Use meta data for reference dates from the Germany COVID-19
# hospitalisation data.
obs <- enw_filter_report_dates(
  germany_covid19_hosp[location == "DE"],
  remove_days = 40
)
obs <- enw_filter_reference_dates(obs, include_days = 40)
pobs <- enw_preprocess_data(
  obs,
  by = c("age_group", "location"), max_delay = 20
)
data <- pobs$metareference[[1]]
# Model with fixed effects for age group
enw_formula(~ 1 + age_group, data)
# Model with random effects for age group
enw_formula(~ 1 + (1 | age_group), data)
# Model with a random effect for age group and a random walk
enw_formula(~ 1 + (1 | age_group) + rw(week), data)
# Model defined without a sparse fixed effects design matrix
enw_formula(~1, data[1:20, ], sparse = FALSE)
# Model using an interaction in the right hand side of a random effect
# to specify an independent random effect per strata.
enw_formula(~ (1 + day | week:month), data = data)
Format formula data for use with stan
enw_formula_as_data_list(formula, prefix, drop_intercept = FALSE)
formula |
The output of |
prefix |
A character string indicating variable label to use as a prefix. |
drop_intercept |
Logical, defaults to |
A list defining the model formula. This includes:
prefix_fintercept: Is an intercept present for the fixed effects design matrix.
prefix_fdesign: The fixed effects design matrix.
prefix_fnrow: The number of rows of the fixed design matrix.
prefix_findex: The index linking design matrix rows to observations.
prefix_fnindex: The length of the index.
prefix_fncol: The number of columns (i.e. effects) in the fixed effect design matrix (minus 1 if drop_intercept = TRUE).
prefix_rdesign: The random effects design matrix.
prefix_rncol: The number of columns (i.e. random effects) in the random effect design matrix (minus 1 as the intercept is dropped).
Functions used to help convert models into the format required for stan
enw_get_cache(), enw_model(), enw_priors_as_data_list(), enw_replace_priors(), enw_sample(), enw_set_cache(), enw_stan_to_r(), enw_unset_cache(), remove_profiling(), write_stan_files_no_profile()
f <- enw_formula(~ 1 + (1 | cyl), mtcars)
enw_formula_as_data_list(f, "mtcars")

# A missing formula produces the default list
enw_formula_as_data_list(prefix = "missing")
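The fields of the returned list can be easier to follow with a small worked example. The base R sketch below is not the package's implementation. It assumes that the fixed effects design matrix is stored as its unique rows plus an index back to the observations, and shows how quantities analogous to prefix_fdesign, prefix_fnrow, prefix_findex, prefix_fnindex, and prefix_fncol could relate to one another. The toy data and all object names are hypothetical.

```r
# Toy data: five observations from two age groups (hypothetical values)
toy <- data.frame(age_group = c("a", "b", "a", "b", "a"))

# Full fixed effects design matrix (intercept plus age group contrast)
full_design <- model.matrix(~ 1 + age_group, data = toy)

# Keep only the unique rows and record which unique row each observation uses
fdesign <- unique(full_design)
row_key <- function(m) apply(m, 1, paste, collapse = ",")
findex <- match(row_key(full_design), row_key(fdesign))

fnrow <- nrow(fdesign)     # rows of the reduced design matrix
fnindex <- length(findex)  # length of the index
fncol <- ncol(fdesign)     # columns, including the intercept
```

Indexing the reduced matrix with the index (fdesign[findex, ]) recovers the full design matrix, which is why only the unique rows need to be stored.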
Retrieves the user-set cache location for Stan models. This path can be set through the enw_cache_location function call. If no environment variable is available, the output from tempdir() will be returned.
enw_get_cache()
A string representing the file path for the cache location
Functions used to help convert models into the format required for stan
enw_formula_as_data_list(), enw_model(), enw_priors_as_data_list(), enw_replace_priors(), enw_sample(), enw_set_cache(), enw_stan_to_r(), enw_unset_cache(), remove_profiling(), write_stan_files_no_profile()
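The fallback behaviour described for the cache location can be sketched in base R. This is an illustration only, not the package's code: the helper name get_cache_sketch() is hypothetical, and using an environment variable called enw_cache_location is an assumption based on the description above.

```r
# Hypothetical helper: look up a cache directory from an environment
# variable and fall back to tempdir() when it is unset or empty.
get_cache_sketch <- function(var = "enw_cache_location") {
  location <- Sys.getenv(var, unset = "")
  if (nzchar(location)) location else tempdir()
}

# With no variable set, the temporary directory is returned
Sys.unsetenv("enw_cache_location")
fallback <- get_cache_sketch()

# With the variable set, its value is returned instead
Sys.setenv(enw_cache_location = file.path(tempdir(), "stan-cache"))
configured <- get_cache_sketch()
Sys.unsetenv("enw_cache_location")
```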
Imputes NA values in the 'confirm' column. NA values are replaced with the last available observation or 0.
enw_impute_na_observations(obs, by = NULL, copy = TRUE)
obs |
A |
by |
A character vector of column names to group by. Defaults to an empty vector. |
copy |
A logical; if |
A data.table with imputed 'confirm' column where NA values have been replaced with the last available observation or zero.
Preprocessing functions
enw_add_delay(), enw_add_max_reported(), enw_add_metaobs_features(), enw_assign_group(), enw_complete_dates(), enw_construct_data(), enw_extend_date(), enw_filter_delay(), enw_filter_reference_dates(), enw_filter_report_dates(), enw_flag_observed_observations(), enw_latest_data(), enw_metadata_delay(), enw_metadata(), enw_missing_reference(), enw_preprocess_data(), enw_reporting_triangle_to_long(), enw_reporting_triangle()
dt <- data.frame(
  id = 1:3, confirm = c(NA, 1, 2),
  reference_date = as.Date("2021-01-01")
)
enw_impute_na_observations(dt)
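The imputation rule (last available observation, otherwise zero) can be illustrated on a plain vector. The sketch below is a base R approximation under the assumption that values are already ordered in time within a single group; the helper impute_locf_then_zero() is hypothetical and is not the package's implementation.

```r
# Hypothetical helper: carry the last non-missing value forward, then set any
# values that are still missing (i.e. leading NAs) to zero.
impute_locf_then_zero <- function(x) {
  seen <- cumsum(!is.na(x))  # number of non-missing values seen so far
  filled <- c(NA, x[!is.na(x)])[seen + 1]
  filled[is.na(filled)] <- 0
  filled
}

confirm <- c(NA, 1, NA, 2, NA)
imputed <- impute_locf_then_zero(confirm)
```

Here the leading NA has no earlier observation to borrow from, so it becomes zero, while later gaps take the most recent observed value.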
This function takes a data.table of aggregate counts or something coercible to a data.table (such as a data.frame) and converts it to a line list where each row represents a case.
enw_incidence_to_linelist( obs, reference_date = "reference_date", report_date = "report_date" )
obs |
An object coercible to a |
reference_date |
A character string of the variable name to use
for the |
report_date |
A character string of the variable name to use
for the |
A data.table with the following variables: id, reference_date, report_date, and any other variables in the obs object. Rows in obs will be duplicated based on the new_confirm column. reference_date and report_date may be renamed if reference_date and report_date are supplied.
Data converters
enw_add_cumulative(), enw_add_incidence(), enw_aggregate_cumulative(), enw_cumulative_to_incidence(), enw_incidence_to_cumulative(), enw_linelist_to_incidence()
incidence <- enw_add_incidence(germany_covid19_hosp)
incidence <- enw_filter_reference_dates(
  incidence[location == "DE"],
  include_days = 10
)
enw_incidence_to_linelist(incidence, reference_date = "onset_date")
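The row duplication described above can be sketched in base R. This is only an illustration of the idea of expanding counts into one row per case; the toy data are hypothetical and the package's own handling of column names and extra variables is not reproduced.

```r
# Toy incidence data: counts of new cases by reference and report date
incidence <- data.frame(
  reference_date = as.Date(c("2021-01-01", "2021-01-01", "2021-01-02")),
  report_date = as.Date(c("2021-01-01", "2021-01-02", "2021-01-02")),
  new_confirm = c(2L, 0L, 3L)
)

# Repeat each row index new_confirm times (rows with zero counts vanish)
rows <- rep(seq_len(nrow(incidence)), times = incidence$new_confirm)
linelist <- incidence[rows, c("reference_date", "report_date")]

# Give each case its own identifier
linelist$id <- seq_len(nrow(linelist))
rownames(linelist) <- NULL
```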
Filter observations for the latest available reported data for each reference date. Note this is not the same as filtering for the maximum report date in all cases as data may only be updated up to some maximum number of days.
enw_latest_data(obs)
obs |
A |
A data.table of observations filtered for the latest available data for each reference date.
Preprocessing functions
enw_add_delay(), enw_add_max_reported(), enw_add_metaobs_features(), enw_assign_group(), enw_complete_dates(), enw_construct_data(), enw_extend_date(), enw_filter_delay(), enw_filter_reference_dates(), enw_filter_report_dates(), enw_flag_observed_observations(), enw_impute_na_observations(), enw_metadata_delay(), enw_metadata(), enw_missing_reference(), enw_preprocess_data(), enw_reporting_triangle_to_long(), enw_reporting_triangle()
# Filter for latest reported data
enw_latest_data(germany_covid19_hosp)
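The difference between the latest data per reference date and the overall maximum report date can be seen in a small base R sketch. The toy data are hypothetical and grouping is ignored for simplicity; this is not the package's implementation.

```r
# Reference date 2021-01-01 stops being updated on 2021-01-03, while
# reference date 2021-01-02 is updated until 2021-01-04
obs <- data.frame(
  reference_date = as.Date(c(
    "2021-01-01", "2021-01-01", "2021-01-02", "2021-01-02", "2021-01-02"
  )),
  report_date = as.Date(c(
    "2021-01-01", "2021-01-03", "2021-01-02", "2021-01-03", "2021-01-04"
  )),
  confirm = c(1, 4, 2, 3, 5)
)

# Latest report date available for each reference date
latest_report <- ave(
  as.numeric(obs$report_date), obs$reference_date,
  FUN = max
)
latest <- obs[as.numeric(obs$report_date) == latest_report, ]

# Filtering on the overall maximum report date would drop 2021-01-01
overall_max <- obs[obs$report_date == max(obs$report_date), ]
```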
This function takes a line list (i.e. tabular data where each row represents a case) and aggregates to a count (new_confirm) of cases by user-specified reference_dates and report_dates. This enables the use of enw_preprocess_data() and other epinowcast() preprocessing functions.
enw_linelist_to_incidence( linelist, reference_date = "reference_date", report_date = "report_date", by = NULL, max_delay, completion_beyond_max_report = FALSE, copy = TRUE )
linelist |
An object coercible to a |
reference_date |
A date or a variable that can be coerced to a date
that represents the date of interest for the case. For example, if the
|
report_date |
A date or a variable that can be coerced to a date that represents the date the case was reported. The default is "report_date". |
by |
A character vector of variables to also aggregate by (i.e. as well
as using the |
max_delay |
The maximum number of days between the |
completion_beyond_max_report |
Logical, should entries be completed beyond the maximum date found in the data? Default: FALSE |
copy |
Should |
A data.table with the following variables: reference_date, report_date, new_confirm, confirm, delay, and any variables specified in by.
Data converters
enw_add_cumulative(), enw_add_incidence(), enw_aggregate_cumulative(), enw_cumulative_to_incidence(), enw_incidence_to_cumulative(), enw_incidence_to_linelist()
linelist <- data.frame(
  onset_date = as.Date(c("2021-01-02", "2021-01-03", "2021-01-02")),
  report_date = as.Date(c("2021-01-03", "2021-01-05", "2021-01-04"))
)
enw_linelist_to_incidence(linelist, reference_date = "onset_date")

# Specify a custom maximum delay and allow completion beyond the maximum
# observed delay
enw_linelist_to_incidence(
  linelist,
  reference_date = "onset_date", max_delay = 5,
  completion_beyond_max_report = TRUE
)
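To show what the aggregation produces, the base R sketch below counts cases per reference and report date and derives delay and cumulative counts. It is a simplified illustration with hypothetical toy data: it does not complete missing date combinations or apply max_delay, both of which the package function handles.

```r
linelist <- data.frame(
  onset_date = as.Date(c(
    "2021-01-02", "2021-01-02", "2021-01-02", "2021-01-03"
  )),
  report_date = as.Date(c(
    "2021-01-03", "2021-01-03", "2021-01-04", "2021-01-05"
  ))
)

# Count cases sharing the same reference (onset) and report date
linelist$new_confirm <- ave(
  rep(1L, nrow(linelist)), linelist$onset_date, linelist$report_date,
  FUN = sum
)
incidence <- unique(linelist)
incidence <- incidence[order(incidence$onset_date, incidence$report_date), ]

# Delay in days between reference and report
incidence$delay <- as.integer(incidence$report_date - incidence$onset_date)

# Cumulative count by reference date
incidence$confirm <- ave(
  incidence$new_confirm, incidence$onset_date,
  FUN = cumsum
)
```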
For most typical use cases enw_formula() should provide sufficient flexibility to allow models to be defined. However, there may be some instances where more manual model specification is required. This function supports this by allowing the user to supply vectors of fixed, random, and customised random effects (where they are not first treated as fixed effect terms). Prior to 1.0.0 this was the main interface for specifying models and it is still used internally to handle some parts of the model specification process.
enw_manual_formula( data, fixed = NULL, random = NULL, custom_random = NULL, no_contrasts = FALSE, add_intercept = TRUE )
data |
A |
fixed |
A character vector of fixed effects. |
random |
A character vector of random effects. Random effects specified here will be added to the fixed effects. |
custom_random |
A vector of random effects. Random effects added here will not be added to the vector of fixed effects. This can be used to specify random effects for fixed effects that only have a partial name match. |
no_contrasts |
Logical, defaults to |
add_intercept |
Logical, defaults to |
A list specifying the fixed effects (formula, design matrix, and design matrix index), and random effects (formula and design matrix).
Functions used to help convert formulas into model designs
as_string_formula(), construct_re(), construct_rw(), enw_formula(), parse_formula(), remove_rw_terms(), re(), rw_terms(), rw(), split_formula_to_terms()
data <- enw_example("prep")$metareference[[1]]
enw_manual_formula(data, fixed = "week", random = "day_of_week")
Extract metadata from raw data, either by reference or by report date. For the target date chosen (reference or report), confirm, max_confirm, and cum_prop_reported are dropped and the first observation for each group and date is retained.
enw_metadata(obs, target_date = c("reference_date", "report_date"))
obs |
A |
target_date |
A character string, either "reference_date" or "report_date". The column corresponding to this string will be used as the target date for metadata extraction. |
A data.table with columns:
date, a Date column
.group, a grouping column
and the first observation for each group and date. The data.table is sorted by .group and date.
Preprocessing functions
enw_add_delay(), enw_add_max_reported(), enw_add_metaobs_features(), enw_assign_group(), enw_complete_dates(), enw_construct_data(), enw_extend_date(), enw_filter_delay(), enw_filter_reference_dates(), enw_filter_report_dates(), enw_flag_observed_observations(), enw_impute_na_observations(), enw_latest_data(), enw_metadata_delay(), enw_missing_reference(), enw_preprocess_data(), enw_reporting_triangle_to_long(), enw_reporting_triangle()
obs <- data.frame(
  reference_date = as.Date("2021-01-01"),
  report_date = as.Date("2022-01-01"), x = 1:10
)
enw_metadata(obs, target_date = "reference_date")
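The "first observation for each group and date" rule can be sketched in base R as follows. The toy data are hypothetical, and dropping the non-target date column (here report_date) is an assumption made for this illustration rather than something stated above.

```r
obs <- data.frame(
  .group = c(1, 1, 1, 2, 2),
  reference_date = as.Date(c(
    "2021-01-02", "2021-01-01", "2021-01-01", "2021-01-01", "2021-01-01"
  )),
  report_date = as.Date(c(
    "2021-01-03", "2021-01-01", "2021-01-02", "2021-01-01", "2021-01-02"
  )),
  confirm = c(3, 1, 2, 5, 6),
  age_group = c("a", "a", "a", "b", "b")
)

# Drop count columns and (by assumption) the non-target date column
meta <- obs[, setdiff(names(obs), c("confirm", "report_date"))]

# Use the target date as the metadata date
names(meta)[names(meta) == "reference_date"] <- "date"

# Keep the first observation per group and date, sorted by group and date
meta <- meta[!duplicated(meta[c(".group", "date")]), ]
meta <- meta[order(meta$.group, meta$date), ]
```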
Calculate delay metadata based on the supplied maximum delay and independent of other metadata or date indexing. These data are meant to be used in conjunction with metadata on the date of reference. Users can build additional features with this data.frame or regenerate it using this function in the output of enw_preprocess_data().
enw_metadata_delay(max_delay = 20, breaks = 4, timestep = "day")
max_delay |
The maximum number of days to model in the delay
distribution. Must be an integer greater than or equal to 1. Observations
with delays larger than the maximum delay will be dropped. If the specified
maximum delay is too short, nowcasts can be biased as important parts of the
true delay distribution are cut off. At the same time, computational cost
scales non-linearly with this setting, so you want the maximum delay to be as
long as necessary, but not much longer. Consider what delays are realistic
for your application, and when in doubt, check if increasing the maximum
delay noticeably changes the delay distribution or nowcasts as estimated by
epinowcast. If it does, your maximum delay may still be too short.
Note that delays are zero indexed and so include the reference date and
|
breaks |
Numeric, defaults to 4. The number of breaks to use when constructing a categorised version of numeric delays. |
timestep |
The timestep to use. This can be a string ("day", "week", "month") or a numeric whole number representing the number of days. |
A data.frame of delay metadata. This includes:
delay: The numeric delay from reference date to report.
delay_cat: The categorised delay. This may be useful for model building.
delay_week: The numeric week since the delay was reported. This again may be useful for model building.
delay_head: A logical variable defining if the delay is in the lower 25% of the potential delays. This may be particularly useful when building models that assume a parametric distribution in order to increase the weight of the head of the reporting distribution in a pragmatic way.
delay_tail: A logical variable defining if the delay is in the upper 75% of the potential delays. This may be particularly useful when building models that assume a parametric distribution in order to increase the weight of the tail of the reporting distribution in a pragmatic way.
Preprocessing functions
enw_add_delay(), enw_add_max_reported(), enw_add_metaobs_features(), enw_assign_group(), enw_complete_dates(), enw_construct_data(), enw_extend_date(), enw_filter_delay(), enw_filter_reference_dates(), enw_filter_report_dates(), enw_flag_observed_observations(), enw_impute_na_observations(), enw_latest_data(), enw_metadata(), enw_missing_reference(), enw_preprocess_data(), enw_reporting_triangle_to_long(), enw_reporting_triangle()
enw_metadata_delay(max_delay = 20, breaks = 4)
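A rough base R sketch of how delay metadata of this shape could be built is given below. It is not the package's code and rests on several assumptions: that zero-indexed delays run from 0 to max_delay - 1, that delay_cat uses equally spaced left-closed bins, and that delay_head and delay_tail are read as "at or below the 25th percentile" and "at or above the 75th percentile" of the delays.

```r
max_delay <- 20
breaks <- 4

# Zero-indexed delays (assumption: 0 up to max_delay - 1)
delay <- 0:(max_delay - 1)

metadelay <- data.frame(
  delay = delay,
  # Categorised delay using equally spaced, left-closed bins
  delay_cat = cut(
    delay,
    breaks = seq(0, max_delay, length.out = breaks + 1),
    right = FALSE
  ),
  # Whole weeks elapsed since the reference date
  delay_week = delay %/% 7,
  # Head and tail indicators (assumed percentile-based definitions)
  delay_head = delay <= quantile(delay, probs = 0.25),
  delay_tail = delay >= quantile(delay, probs = 0.75)
)
```

With these assumptions each delay category holds five delays, and the head and tail flags each mark five of the twenty delays.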
Missing reference data model module
enw_missing(formula = ~1, data)
formula |
A formula (as implemented in |
data |
Output from |
A list containing the supplied formulas, data passed into a list describing the models, a data.frame describing the priors used, and a function that takes the output data and priors and returns a function that can be used to sample from a tightened version of the prior distribution.
Model modules
enw_expectation(), enw_fit_opts(), enw_obs(), enw_reference(), enw_report()
# Missingness model with a fixed intercept only
enw_missing(data = enw_example("preprocessed"))

# No missingness model specified
enw_missing(~0, data = enw_example("preprocessed"))
Returns reports with missing reference dates as well as calculating the proportion of reports for a given reference date that were missing.
enw_missing_reference(obs)
obs |
A |
A data.table of missing counts and proportions by report date and group.
Preprocessing functions
enw_add_delay(), enw_add_max_reported(), enw_add_metaobs_features(), enw_assign_group(), enw_complete_dates(), enw_construct_data(), enw_extend_date(), enw_filter_delay(), enw_filter_reference_dates(), enw_filter_report_dates(), enw_flag_observed_observations(), enw_impute_na_observations(), enw_latest_data(), enw_metadata_delay(), enw_metadata(), enw_preprocess_data(), enw_reporting_triangle_to_long(), enw_reporting_triangle()
obs <- data.frame(
  report_date = c("2021-10-01", "2021-10-03"), reference_date = "2021-10-01",
  confirm = 1
)
obs <- rbind(
  obs,
  data.frame(report_date = "2021-10-04", reference_date = NA, confirm = 4)
)
obs <- enw_complete_dates(obs)
obs <- enw_assign_group(obs)
obs <- enw_add_incidence(obs)
enw_missing_reference(obs)
Load and compile the nowcasting model
enw_model( model = system.file("stan", "epinowcast.stan", package = "epinowcast"), include = system.file("stan", package = "epinowcast"), compile = TRUE, threads = TRUE, profile = FALSE, target_dir = epinowcast::enw_get_cache(), stanc_options = list(), cpp_options = list(), verbose = TRUE, ... )
model |
A character string indicating the path to the model. If not supplied the package default model is used. |
include |
A character string specifying the path to any stan files to include in the model. If missing the package default is used. |
compile |
Logical, defaults to |
threads |
Logical, defaults to |
profile |
Logical, defaults to |
target_dir |
The path to a directory in which the manipulated .stan
files without profiling statements should be stored. To avoid overwriting
the original .stan files, this should be different from the directory of the
original model and the |
stanc_options |
A list of options to pass to the |
cpp_options |
A list of options to pass to the |
verbose |
Logical, defaults to |
... |
Additional arguments passed to |
A cmdstanr model.
Functions used to help convert models into the format required for stan
enw_formula_as_data_list(), enw_get_cache(), enw_priors_as_data_list(), enw_replace_priors(), enw_sample(), enw_set_cache(), enw_stan_to_r(), enw_unset_cache(), remove_profiling(), write_stan_files_no_profile()
mod <- enw_model()
A generic wrapper around posterior::draws_df() with opinionated defaults to extract the posterior samples for the nowcast ("pp_inf_obs" from the stan code). The functionality of this function can be used directly on the output of epinowcast() using the supplied summary.epinowcast() method.
enw_nowcast_samples(fit, obs, max_delay = NULL, timestep = "day")
fit |
A |
obs |
An observation |
max_delay |
Maximum delay to which nowcasts should be summarised. Must be equal to (default) or larger than the modelled maximum delay. If it is larger, then nowcasts for unmodelled dates are added by assuming that case counts beyond the modelled maximum delay are fully observed. |
timestep |
The timestep to use. This can be a string ("day", "week", "month") or a numeric whole number representing the number of days. |
A data.frame of posterior samples for the nowcast prediction. This uses observed data where available and the posterior prediction where not.
Functions used for postprocessing of model fits
enw_add_latest_obs_to_nowcast(), enw_nowcast_summary(), enw_posterior(), enw_pp_summary(), enw_quantiles_to_long(), enw_summarise_samples()
fit <- enw_example("nowcast")
enw_nowcast_samples(
  fit$fit[[1]], fit$latest[[1]], fit$max_delay, "day"
)
A generic wrapper around enw_posterior() with opinionated defaults to extract the posterior prediction for the nowcast ("pp_inf_obs" from the stan code). The functionality of this function can be used directly on the output of epinowcast() using the supplied summary.epinowcast() method.
enw_nowcast_summary( fit, obs, max_delay = NULL, timestep = "day", probs = c(0.05, 0.2, 0.35, 0.5, 0.65, 0.8, 0.95) )
fit |
A |
obs |
An observation |
max_delay |
Maximum delay to which nowcasts should be summarised. Must be equal to (default) or larger than the modelled maximum delay. If it is larger, then nowcasts for unmodelled dates are added by assuming that case counts beyond the modelled maximum delay are fully observed. |
timestep |
The timestep to use. This can be a string ("day", "week", "month") or a numeric whole number representing the number of days. |
probs |
A vector of numeric probabilities to produce quantile summaries for. By default these are the 5%, 20%, 35%, 50%, 65%, 80%, and 95% quantiles; the 5%, 20%, 80%, and 95% quantiles are the minimum set required for plotting functions to work. |
A data.frame summarising the model posterior nowcast prediction. This uses observed data where available and the posterior prediction where not.
Functions used for postprocessing of model fits
enw_add_latest_obs_to_nowcast(), enw_nowcast_samples(), enw_posterior(), enw_pp_summary(), enw_quantiles_to_long(), enw_summarise_samples()
fit <- enw_example("nowcast")
enw_nowcast_summary(
  fit$fit[[1]], fit$latest[[1]], fit$max_delay
)
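The kind of quantile summary controlled by probs can be sketched in base R on simulated draws. The samples below are simulated purely for illustration (they are not model output), and the q-prefixed column names are an assumption about naming rather than a statement of the package's output format.

```r
set.seed(1)

# Simulated posterior samples for two reference dates (illustrative only)
samples <- data.frame(
  reference_date = rep(as.Date("2021-01-01") + 0:1, each = 1000),
  sample = c(rpois(1000, lambda = 50), rpois(1000, lambda = 80))
)

probs <- c(0.05, 0.2, 0.35, 0.5, 0.65, 0.8, 0.95)

# One row of quantiles per reference date
by_date <- split(samples$sample, samples$reference_date)
nowcast_summary <- do.call(
  rbind,
  lapply(by_date, quantile, probs = probs, names = FALSE)
)
colnames(nowcast_summary) <- paste0("q", probs * 100)
```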
Setup observation model and data
enw_obs(family = c("negbin", "poisson"), observation_indicator = NULL, data)
family |
Character string, the observation model to use in the
likelihood; enforced by |
observation_indicator |
A character string, the name of the column in
the data that indicates whether an observation is observed or not (using a
logical variable) and therefore whether or not it should be used in the
likelihood. This variable should be present in the data input to
|
data |
Output from |
A list as required by stan.
Model modules
enw_expectation(), enw_fit_opts(), enw_missing(), enw_reference(), enw_report()
enw_obs(data = enw_example("preprocessed"))
This function takes a data.frame and a categorical variable, performs one-hot encoding, and column-binds the encoded variables back to the data.frame.
enw_one_hot_encode_feature(metaobs, feature, contrasts = FALSE)
metaobs |
A data.frame containing the data to be encoded. |
feature |
The name of the categorical variable to one-hot encode as a character string. |
contrasts |
Logical. If TRUE, create one-hot encoded variables with contrasts; if FALSE, create them without contrasts. Defaults to FALSE. |
Functions used to formulate models
enw_add_cumulative_membership(), enw_add_pooling_effect(), enw_design(), enw_effects_metadata()
metaobs <- data.frame(week = 1:2)
enw_one_hot_encode_feature(metaobs, "week")
enw_one_hot_encode_feature(metaobs, "week", contrasts = TRUE)

metaobs <- data.frame(week = 1:6)
enw_one_hot_encode_feature(metaobs, "week")
enw_one_hot_encode_feature(metaobs, "week", contrasts = TRUE)
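The difference between encoding with and without contrasts can be illustrated with base R's model.matrix(). This sketch shows the general technique only; whether the package builds its encoding this way is an assumption, and the object names are hypothetical.

```r
metaobs <- data.frame(week = 1:3)
metaobs$week <- factor(metaobs$week)

# Without contrasts: one indicator column per level
no_contrasts <- model.matrix(~ 0 + week, data = metaobs)

# With (treatment) contrasts: the first level becomes the baseline, so it
# has no column of its own once the intercept is removed
with_contrasts <- model.matrix(~ week, data = metaobs)[, -1, drop = FALSE]

# Column-bind the encoded variables back onto the original data
encoded <- cbind(metaobs, no_contrasts)
```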
Plot nowcast quantiles
enw_plot_nowcast_quantiles(nowcast, latest_obs = NULL, log = FALSE, ...)
nowcast |
A |
latest_obs |
A |
log |
Logical, defaults to |
... |
Additional arguments passed to |
A ggplot2 plot.
Plotting functions
enw_plot_obs(), enw_plot_pp_quantiles(), enw_plot_quantiles(), enw_plot_theme(), plot.epinowcast()
nowcast <- enw_example("nowcast")
nowcast <- summary(nowcast, probs = c(0.05, 0.2, 0.8, 0.95))
enw_plot_nowcast_quantiles(nowcast)
Generic quantile plot
enw_plot_obs(obs, latest_obs = NULL, log = TRUE, ...)
obs |
A |
latest_obs |
A |
log |
Logical, defaults to |
... |
Additional arguments passed to |
A ggplot2 plot.
Plotting functions
enw_plot_nowcast_quantiles(), enw_plot_pp_quantiles(), enw_plot_quantiles(), enw_plot_theme(), plot.epinowcast()
nowcast <- enw_example("nowcast")
obs <- enw_example("obs")

# Plot observed data by reference date
enw_plot_obs(obs, x = reference_date)

# Plot observed data by reference date with more recent data
enw_plot_obs(nowcast$latest[[1]], obs, x = reference_date)
Plot posterior prediction quantiles
enw_plot_pp_quantiles(pp, log = FALSE, ...)
pp |
A |
log |
Logical, defaults to |
... |
Additional arguments passed to |
A ggplot2 plot.
Plotting functions
enw_plot_nowcast_quantiles(), enw_plot_obs(), enw_plot_quantiles(), enw_plot_theme(), plot.epinowcast()
nowcast <- enw_example("nowcast")
nowcast <- summary(
  nowcast,
  type = "posterior_prediction", probs = c(0.05, 0.2, 0.8, 0.95)
)
enw_plot_pp_quantiles(nowcast) +
  ggplot2::facet_wrap(ggplot2::vars(reference_date), scales = "free")
Generic quantile plot
enw_plot_quantiles(posterior, latest_obs = NULL, log = FALSE, ...)
posterior |
A |
latest_obs |
A |
log |
Logical, defaults to |
... |
Additional arguments passed to |
A ggplot2 plot.
enw_plot_nowcast_quantiles(), enw_plot_pp_quantiles()
Plotting functions
enw_plot_nowcast_quantiles(), enw_plot_obs(), enw_plot_pp_quantiles(), enw_plot_theme(), plot.epinowcast()
nowcast <- enw_example("nowcast")
nowcast <- summary(nowcast, probs = c(0.05, 0.2, 0.8, 0.95))
enw_plot_quantiles(nowcast, x = reference_date)
Package plot theme
enw_plot_theme(plot)
plot |
|
ggplot2 plot object.
Plotting functions
enw_plot_nowcast_quantiles(), enw_plot_obs(), enw_plot_pp_quantiles(), enw_plot_quantiles(), plot.epinowcast()
A generic wrapper around posterior::summarise_draws() with opinionated defaults.
enw_posterior(fit, variables = NULL, probs = c(0.05, 0.2, 0.8, 0.95), ...)
fit |
A |
variables |
A character vector of variables to return posterior summaries for. By default summaries for all parameters are returned. |
probs |
A vector of numeric probabilities to produce quantile summaries for. By default these are the 5%, 20%, 80%, and 95% quantiles which are also the minimum set required for plotting functions to work. |
... |
Additional arguments that may be passed but will not be used. |
A data.frame summarising the model posterior.
Functions used for postprocessing of model fits
enw_add_latest_obs_to_nowcast(), enw_nowcast_samples(), enw_nowcast_summary(), enw_pp_summary(), enw_quantiles_to_long(), enw_summarise_samples()
fit <- enw_example("nowcast")
enw_posterior(fit$fit[[1]], variables = "expr_beta")
This function summarises posterior predictives for observed data (by report and reference date). The functionality of this function can be used directly on the output of epinowcast() using the supplied summary.epinowcast() method.
enw_pp_summary(fit, diff_obs, probs = c(0.05, 0.2, 0.35, 0.5, 0.65, 0.8, 0.95))
fit |
A |
diff_obs |
A |
probs |
A vector of numeric probabilities to produce quantile summaries for. By default these are the 5%, 20%, 35%, 50%, 65%, 80%, and 95% quantiles; the 5%, 20%, 80%, and 95% quantiles are the minimum set required for plotting functions to work. |
A data.table summarising the posterior predictions.
Functions used for postprocessing of model fits
enw_add_latest_obs_to_nowcast(), enw_nowcast_samples(), enw_nowcast_summary(), enw_posterior(), enw_quantiles_to_long(), enw_summarise_samples()
fit <- enw_example("nowcast")
enw_pp_summary(fit$fit[[1]], fit$new_confirm[[1]], probs = c(0.5))
This function preprocesses raw observations under the assumption they are reported as cumulative counts by a reference and report date and is used to assign groups. It also constructs data objects used by visualisation and modelling functions including the observed empirical probability of a report on a given day, the cumulative probability of report, the latest available observations, incidence of observations, and metadata about the date of reference and report (used to construct models). This function wraps other preprocessing functions that may instead be used individually if required. Note that internally reports beyond the user-specified delay are dropped for modelling purposes with the cum_prop_reported and max_confirm variables allowing the user to check the impact this may have (if cum_prop_reported is significantly below 1 a longer max_delay may be appropriate). Also note that if missing reference or report dates are suspected to occur in your data then these need to be completed with enw_complete_dates().
enw_preprocess_data( obs, by = NULL, max_delay, timestep = "day", set_negatives_to_zero = TRUE, ..., copy = TRUE )
obs |
A |
by |
A character vector describing the stratification of observations. This defaults to no grouping. This should be used when modelling multiple time series in order to identify them for downstream modelling. |
max_delay |
The maximum number of days to model in the delay distribution. If not specified the maximum observed delay is assumed to be the true maximum delay in the model. Otherwise, an integer greater than or equal to 1 can be specified. Observations with delays larger than the maximum delay will be dropped. If the specified maximum delay is too short, nowcasts can be biased as important parts of the true delay distribution are cut off. At the same time, computational cost scales non-linearly with this setting, so you want the maximum delay to be as long as necessary, but not much longer. Steps to take to determine the maximum delay:
Note that delays are zero indexed and so include the reference date and
|
timestep |
The timestep to use in the process model (i.e. the
reference date model). This can be a string ("day", "week", "month") or a
numeric whole number representing the number of days. If your data does not
have this timestep then you may wish to make use of
|
set_negatives_to_zero |
Logical, defaults to TRUE. Should negative
counts (for calculated incidence of observations) be set to zero? Currently
downstream modelling does not support negative counts and so this setting must be
TRUE if intending to use |
... |
Other arguments to |
copy |
A logical; if |
If max_delay is numeric, it will be internally coerced to integer using as.integer().
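As a rough illustration of how a maximum delay restricts the data, the base R sketch below computes zero-indexed delays and drops those beyond the maximum. It assumes that delays 0 to max_delay - 1 are retained; the toy data and the helper name filter_by_max_delay() are hypothetical and are not package code.

```r
# Toy long-format observations (hypothetical data, not from the package)
obs <- data.frame(
  reference_date = as.Date("2021-10-01") + c(0, 0, 0, 1, 1, 2),
  report_date = as.Date("2021-10-01") + c(0, 1, 4, 1, 2, 2)
)

# Illustrative helper (hypothetical): keep observations with delay < max_delay
filter_by_max_delay <- function(obs, max_delay) {
  max_delay <- as.integer(max_delay) # numeric input is coerced to integer
  stopifnot(max_delay >= 1L)
  # Delays are zero indexed: a report on the reference date has delay 0
  obs$delay <- as.integer(obs$report_date - obs$reference_date)
  # Assumption: delays 0, ..., max_delay - 1 are kept
  obs[obs$delay < max_delay, , drop = FALSE]
}

filtered <- filter_by_max_delay(obs, max_delay = 3)
filtered$delay
```

Here the single observation with a delay of 4 days is dropped, which is the kind of truncation that can bias a nowcast if the maximum delay is set too short.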
A data.table containing processed observations as a series of nested data.frames as well as variables containing metadata. These are:
obs
: (observations with the addition of empirical reporting proportions
and restricted to the specified maximum delay).
new_confirm
: Incidence of notifications by reference and report date.
Empirical reporting distributions are also added.
latest
: The latest available observations.
missing_reference
: Observations missing reference dates.
reporting_triangle
: Incident observations by report and reference date in
the standard reporting triangle matrix format.
metareference
: Metadata reference dates derived from observations.
metareport
: Metadata for report dates.
metadelay
: Metadata for reporting delays produced using
enw_metadata_delay()
.
max_delay
: Maximum delay to be modelled by epinowcast.
time
: Numeric, number of timepoints in the data.
snapshots
: Numeric, number of available data snapshots to use for
nowcasting.
groups
: Numeric, Number of groups/strata in the supplied observations
(set using by
).
max_date
: The maximum available report date.
Preprocessing functions
enw_add_delay()
,
enw_add_max_reported()
,
enw_add_metaobs_features()
,
enw_assign_group()
,
enw_complete_dates()
,
enw_construct_data()
,
enw_extend_date()
,
enw_filter_delay()
,
enw_filter_reference_dates()
,
enw_filter_report_dates()
,
enw_flag_observed_observations()
,
enw_impute_na_observations()
,
enw_latest_data()
,
enw_metadata_delay()
,
enw_metadata()
,
enw_missing_reference()
,
enw_reporting_triangle_to_long()
,
enw_reporting_triangle()
library(data.table)
# Filter example hospitalisation data to be national and over all ages
nat_germany_hosp <- germany_covid19_hosp[location == "DE"]
nat_germany_hosp <- nat_germany_hosp[age_group == "00+"]
# Preprocess with default settings
pobs <- enw_preprocess_data(nat_germany_hosp)
pobs
data.frame
to list
Converts priors defined in a data.frame
into a list
format for use by stan. In addition it adds "_p" to all
variable names in order to allow them to be distinguished from
their standard usage within modelling code.
enw_priors_as_data_list(priors)
priors |
A |
A named list with each entry specifying a prior as a length two vector (specifying the mean and standard deviation of the prior).
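The conversion described above can be sketched in base R as follows. This is an illustrative re-implementation under the assumption that the input has variable, mean, and sd columns; priors_as_list() is a hypothetical name and the package's own output may differ in detail.

```r
# Illustrative re-implementation in base R (not the package source)
priors_as_list <- function(priors) {
  # One length-two vector (mean, sd) per prior
  out <- Map(function(m, s) c(m, s), priors$mean, priors$sd)
  # Suffix names with "_p" to distinguish them from model variables
  names(out) <- paste0(priors$variable, "_p")
  out
}

priors <- data.frame(variable = c("x", "y"), mean = c(1, 0), sd = c(2, 0.5))
prior_list <- priors_as_list(priors)
str(prior_list)
```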
Functions used to help convert models into the format required for stan
enw_formula_as_data_list()
,
enw_get_cache()
,
enw_model()
,
enw_replace_priors()
,
enw_sample()
,
enw_set_cache()
,
enw_stan_to_r()
,
enw_unset_cache()
,
remove_profiling()
,
write_stan_files_no_profile()
priors <- data.frame(variable = "x", mean = 1, sd = 2)
enw_priors_as_data_list(priors)
Convert summarised quantiles from wide to long format
enw_quantiles_to_long(posterior)
posterior |
A |
A data.frame
of quantiles in long format.
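To make the wide-to-long reshaping concrete, here is a minimal base R sketch. It assumes quantile columns are named with a "q" prefix followed by the percentage (for example q5, q50, q95) and that the long output uses quantile and prediction columns; these names, the toy data, and the helper quantiles_to_long() are assumptions for illustration only.

```r
# Hypothetical wide summary with one column per quantile
wide <- data.frame(
  variable = c("a", "b"),
  q5 = c(1, 10), q50 = c(2, 20), q95 = c(3, 30)
)

# Illustrative helper (hypothetical): stack quantile columns into long format
quantiles_to_long <- function(wide, prefix = "q") {
  qcols <- grep(paste0("^", prefix, "[0-9.]+$"), names(wide), value = TRUE)
  id_cols <- setdiff(names(wide), qcols)
  long <- lapply(qcols, function(col) {
    data.frame(
      wide[, id_cols, drop = FALSE],
      # Convert the percentage in the column name into a probability
      quantile = as.numeric(sub(prefix, "", col, fixed = TRUE)) / 100,
      prediction = wide[[col]]
    )
  })
  do.call(rbind, long)
}

long <- quantiles_to_long(wide)
long
```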
Functions used for postprocessing of model fits
enw_add_latest_obs_to_nowcast()
,
enw_nowcast_samples()
,
enw_nowcast_summary()
,
enw_posterior()
,
enw_pp_summary()
,
enw_summarise_samples()
fit <- enw_example("nowcast")
posterior <- enw_posterior(fit$fit[[1]], var = "expr_lelatent_int[1,1]")
enw_quantiles_to_long(posterior)
Reference date logit hazard reporting model module
enw_reference( parametric = ~1, distribution = c("lognormal", "none", "exponential", "gamma", "loglogistic"), non_parametric = ~0, data )
parametric |
A formula (as implemented in |
distribution |
A character vector describing the parametric delay distribution to use. Current options are: "none", "lognormal", "gamma", "exponential", and "loglogistic", with the default being "lognormal". |
non_parametric |
A formula (as implemented in |
data |
Output from |
A list containing the supplied formulas, data passed into a list
describing the models, a data.frame
describing the priors used, and a
function that takes the output data and priors and returns a function that
can be used to sample from a tightened version of the prior distribution.
Model modules
enw_expectation()
,
enw_fit_opts()
,
enw_missing()
,
enw_obs()
,
enw_report()
# Parametric model with a lognormal distribution
enw_reference(
  parametric = ~1, distribution = "lognormal",
  data = enw_example("preprocessed")
)
# Non-parametric model with a random effect per delay
enw_reference(
  parametric = ~0, non_parametric = ~ 1 + (1 | delay),
  data = enw_example("preprocessed")
)
# Combined parametric and non-parametric model
enw_reference(
  parametric = ~1, non_parametric = ~ 0 + (1 | delay_cat),
  data = enw_example("preprocessed")
)
Construct a lookup of reference dates by report
enw_reference_by_report( missing_reference, reps_with_complete_refs, metareference, max_delay )
missing_reference |
|
reps_with_complete_refs |
A |
metareference |
|
max_delay |
The maximum number of days to model in the delay
distribution. Must be an integer greater than or equal to 1. Observations
with delays larger than the maximum delay will be dropped. If the specified
maximum delay is too short, nowcasts can be biased as important parts of the
true delay distribution are cut off. At the same time, computational cost
scales non-linearly with this setting, so you want the maximum delay to be as
long as necessary, but not much longer. Consider what delays are realistic
for your application, and when in doubt, check if increasing the maximum
delay noticeably changes the delay distribution or nowcasts as estimated by
epinowcast. If it does, your maximum delay may still be too short.
Note that delays are zero indexed and so include the reference date and
|
A wide data.frame
with each row being a complete report date and
the columns being the observation index for each reporting delay
Helper functions for model modules
add_max_observed_delay()
,
add_pmfs()
,
convolution_matrix()
,
enw_reps_with_complete_refs()
,
extract_obs_metadata()
,
extract_sparse_matrix()
,
latest_obs_as_matrix()
,
simulate_double_censored_pmf()
This function is used internally by epinowcast to replace
default model priors with user-specified ones (restricted to
normal priors with specified means and standard deviations). A common
use would be extracting the posterior from a previous epinowcast()
run (using summary(nowcast, type = "fit")
) and using this as a prior.
enw_replace_priors(priors, custom_priors)
priors |
A |
custom_priors |
A |
A data.table of prior definitions (variable, mean and sd).
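The replacement logic can be sketched in base R as below. The sketch assumes that index suffixes such as "[1]" are stripped from custom prior names before matching, which is inferred from the "x[1]" example rather than confirmed from the package source; replace_priors_sketch() is a hypothetical name.

```r
# Illustrative sketch (not the package source)
replace_priors_sketch <- function(priors, custom_priors) {
  # Assumption: indices such as "[1]" are dropped before matching names
  custom_priors$variable <- gsub("\\[.*\\]", "", custom_priors$variable)
  idx <- match(custom_priors$variable, priors$variable)
  keep <- !is.na(idx) # ignore custom priors with no matching default
  priors$mean[idx[keep]] <- custom_priors$mean[keep]
  priors$sd[idx[keep]] <- custom_priors$sd[keep]
  priors
}

priors <- data.frame(variable = c("x", "y"), mean = c(1, 2), sd = c(1, 2))
custom_priors <- data.frame(variable = "x[1]", mean = 10, sd = 2)
updated <- replace_priors_sketch(priors, custom_priors)
updated
```

In this toy case the prior for x is replaced while the prior for y is left at its default.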
Functions used to help convert models into the format required for stan
enw_formula_as_data_list()
,
enw_get_cache()
,
enw_model()
,
enw_priors_as_data_list()
,
enw_sample()
,
enw_set_cache()
,
enw_stan_to_r()
,
enw_unset_cache()
,
remove_profiling()
,
write_stan_files_no_profile()
# Update priors from a data.frame
priors <- data.frame(variable = c("x", "y"), mean = c(1, 2), sd = c(1, 2))
custom_priors <- data.frame(variable = "x[1]", mean = 10, sd = 2)
enw_replace_priors(priors, custom_priors)
# Update priors from a previous model fit
default_priors <- enw_reference(
  distribution = "lognormal",
  data = enw_example("preprocessed")
)$priors
print(default_priors)
fit_priors <- summary(
  enw_example("nowcast"),
  type = "fit",
  variables = c("refp_mean_int", "refp_sd_int", "sqrt_phi")
)
fit_priors
enw_replace_priors(default_priors, fit_priors)
Report date logit hazard reporting model module
enw_report(non_parametric = ~0, structural = ~0, data)
non_parametric |
A formula (as implemented in |
structural |
A formula with fixed effects and using only binary
variables, and factors describing the known reporting structure (i.e. weekday
only reporting). The base case (i.e. the first factor entry) should describe
the dates for which reporting is possible. Internally dates with a non-zero
element in the design matrix have their hazard set to 0. This can use
features defined by report date as defined in |
data |
Output from |
A list containing the supplied formulas, data passed into a list
describing the models, a data.frame
describing the priors used, and a
function that takes the output data and priors and returns a function that
can be used to sample from a tightened version of the prior distribution.
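For intuition about what setting a hazard to 0 does to a reporting delay distribution, the following base R sketch converts a probability mass function into discrete-time hazards and back. It illustrates the general discrete-time hazard idea only; the function names and toy values are hypothetical and this is not the package's Stan implementation.

```r
# Convert a delay PMF into discrete-time hazards:
# h[d] = p[d] / (1 - sum of p over earlier delays)
pmf_to_hazard <- function(pmf) {
  surv_before <- 1 - c(0, cumsum(pmf)[-length(pmf)])
  pmf / surv_before
}

# Convert hazards back into a PMF:
# p[d] = h[d] * product of (1 - h) over earlier delays
hazard_to_pmf <- function(hazard) {
  surv_before <- c(1, cumprod(1 - hazard)[-length(hazard)])
  hazard * surv_before
}

pmf <- c(0.5, 0.3, 0.2)
hazard <- pmf_to_hazard(pmf)

# Hypothetical structural effect: no reporting is possible at the second
# delay, so its hazard is set to zero
hazard_structural <- hazard
hazard_structural[2] <- 0
pmf_structural <- hazard_to_pmf(hazard_structural)
pmf_structural
```

With the second hazard zeroed, the probability mass that would have been reported at that delay shifts to the next delay at which reporting is possible.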
Model modules
enw_expectation()
,
enw_fit_opts()
,
enw_missing()
,
enw_obs()
,
enw_reference()
enw_report(data = enw_example("preprocessed"))
Constructs the reporting triangle with each row representing a reference date and columns being observations by report date
enw_reporting_triangle(obs)
obs |
A |
A data.frame
with each row being a reference date, and columns
being observations by reporting delay.
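A reporting triangle of this shape can be built from long-format incidence with base R alone. The sketch below uses hypothetical toy data and tapply(); it is meant to show the layout rather than reproduce the package's output format.

```r
# Hypothetical long-format incidence by reference date and delay
long <- data.frame(
  reference_date = as.Date("2021-10-01") + c(0, 0, 0, 1, 1, 2),
  delay = c(0, 1, 2, 0, 1, 0),
  new_confirm = c(10, 4, 1, 8, 3, 12)
)

# Rows are reference dates, columns are delays; cells not yet reported are NA
triangle <- tapply(
  long$new_confirm,
  list(reference_date = format(long$reference_date), delay = long$delay),
  sum
)
triangle
```

The NA cells in the lower right corner are the not-yet-reported counts that a nowcast aims to estimate.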
Preprocessing functions
enw_add_delay()
,
enw_add_max_reported()
,
enw_add_metaobs_features()
,
enw_assign_group()
,
enw_complete_dates()
,
enw_construct_data()
,
enw_extend_date()
,
enw_filter_delay()
,
enw_filter_reference_dates()
,
enw_filter_report_dates()
,
enw_flag_observed_observations()
,
enw_impute_na_observations()
,
enw_latest_data()
,
enw_metadata_delay()
,
enw_metadata()
,
enw_missing_reference()
,
enw_preprocess_data()
,
enw_reporting_triangle_to_long()
obs <- enw_example("preprocessed")$new_confirm
enw_reporting_triangle(obs)
Recast the reporting triangle from wide to long format
enw_reporting_triangle_to_long(obs)
obs |
A |
A long format reporting triangle as a data.frame
with additional
variables new_confirm
and delay
.
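The reverse reshaping, from a wide triangle to long format, might look like the base R sketch below. The toy data are hypothetical, delay columns are assumed to be named by their delay, and dropping unreported (NA) cells is a choice made here for illustration that may not match the package's behaviour.

```r
# Hypothetical wide reporting triangle: one column per delay
wide <- data.frame(
  reference_date = as.Date("2021-10-01") + 0:2,
  `0` = c(10, 8, 12), `1` = c(4, 3, NA), `2` = c(1, NA, NA),
  check.names = FALSE
)

# Stack each delay column, recording the delay as an integer variable
delay_cols <- setdiff(names(wide), "reference_date")
long <- do.call(rbind, lapply(delay_cols, function(d) {
  data.frame(
    reference_date = wide$reference_date,
    delay = as.integer(d),
    new_confirm = wide[[d]]
  )
}))

# Drop cells that have not been reported yet and order by date then delay
long <- long[!is.na(long$new_confirm), ]
long <- long[order(long$reference_date, long$delay), ]
long
```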
Preprocessing functions
enw_add_delay()
,
enw_add_max_reported()
,
enw_add_metaobs_features()
,
enw_assign_group()
,
enw_complete_dates()
,
enw_construct_data()
,
enw_extend_date()
,
enw_filter_delay()
,
enw_filter_reference_dates()
,
enw_filter_report_dates()
,
enw_flag_observed_observations()
,
enw_impute_na_observations()
,
enw_latest_data()
,
enw_metadata_delay()
,
enw_metadata()
,
enw_missing_reference()
,
enw_preprocess_data()
,
enw_reporting_triangle()
obs <- enw_example("preprocessed")$new_confirm
rt <- enw_reporting_triangle(obs)
enw_reporting_triangle_to_long(rt)
Identify report dates with complete (i.e. up to the maximum delay) reference dates
enw_reps_with_complete_refs(new_confirm, max_delay, by = NULL, copy = TRUE)
new_confirm |
|
max_delay |
The maximum number of days to model in the delay
distribution. Must be an integer greater than or equal to 1. Observations
with delays larger than the maximum delay will be dropped. If the specified
maximum delay is too short, nowcasts can be biased as important parts of the
true delay distribution are cut off. At the same time, computational cost
scales non-linearly with this setting, so you want the maximum delay to be as
long as necessary, but not much longer. Consider what delays are realistic
for your application, and when in doubt, check if increasing the maximum
delay noticeably changes the delay distribution or nowcasts as estimated by
epinowcast. If it does, your maximum delay may still be too short.
Note that delays are zero indexed and so include the reference date and
|
by |
A character vector describing the stratification of observations. This defaults to no grouping. This should be used when modelling multiple time series in order to identify them for downstream modelling |
copy |
A logical; if |
A data.frame
containing a report_date
variable, and grouping
variables specified for report dates that have complete reporting.
Helper functions for model modules
add_max_observed_delay()
,
add_pmfs()
,
convolution_matrix()
,
enw_reference_by_report()
,
extract_obs_metadata()
,
extract_sparse_matrix()
,
latest_obs_as_matrix()
,
simulate_double_censored_pmf()
Fit a CmdStan model using NUTS
enw_sample(data, model = epinowcast::enw_model(), diagnostics = TRUE, ...)
data |
A list of data as produced by model modules (for example
|
model |
A |
diagnostics |
Logical, defaults to |
... |
Additional parameters passed to the |
A data.frame
containing the cmdstanr
fit, the input data, the
fitting arguments, and optionally summary diagnostics.
Functions used to help convert models into the format required for stan
enw_formula_as_data_list()
,
enw_get_cache()
,
enw_model()
,
enw_priors_as_data_list()
,
enw_replace_priors()
,
enw_set_cache()
,
enw_stan_to_r()
,
enw_unset_cache()
,
remove_profiling()
,
write_stan_files_no_profile()
Acts as a wrapper to scoringutils::score()
. In particular, it
handles filtering nowcast summary output and linking this output to
observed data. See the documentation for the scoringutils
package for more
on forecast scoring.
enw_score_nowcast( nowcast, latest_obs, log = FALSE, check = FALSE, round_to = 3, ... )
nowcast |
A posterior nowcast or posterior prediction as returned by
|
latest_obs |
A |
log |
Logical, defaults to FALSE. Should scores be calculated on the log scale (with a 0.01 shift) for both observations and nowcasts. Scoring in this way can be thought of as a relative score vs the more usual absolute measure. It may be useful when targets are on very different scales or when the forecaster is more interested in good all round performance versus good performance for targets with large values. |
check |
Logical, defaults to FALSE. Should
|
round_to |
Integer defaults to 3. Number of digits to round scoring output to. |
... |
Arguments passed on to
|
A data.table
as returned by scoringutils::score()
.
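To illustrate why scoring on the log scale behaves like a relative measure, the base R sketch below compares a simple absolute error on the natural scale and on the log scale with a 0.01 shift. The absolute error is a stand-in chosen for brevity and is not one of the scores computed by scoringutils; the function name and values are hypothetical.

```r
# Illustrative only: absolute error on the natural and shifted log scales
abs_error <- function(observed, predicted, log = FALSE, shift = 0.01) {
  if (log) {
    # The shift keeps the transformation defined for zero counts
    observed <- log(observed + shift)
    predicted <- log(predicted + shift)
  }
  abs(observed - predicted)
}

# Two targets on very different scales, both over-predicted by a factor of 2
observed <- c(10, 1000)
predicted <- c(20, 2000)
natural_error <- abs_error(observed, predicted)
log_error <- abs_error(observed, predicted, log = TRUE)
natural_error
log_error
```

On the natural scale the larger target dominates, whereas on the log scale both targets receive nearly the same error because the relative miss is the same.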
library(data.table)
library(scoringutils)
# Summarise example nowcast
nowcast <- enw_example("nowcast")
summarised_nowcast <- summary(nowcast)
# Load latest available observations
obs <- enw_example("observations")
# Keep the last 7 days of data
obs <- obs[reference_date > (max(reference_date) - 7)]
# Score on the absolute scale
scores <- enw_score_nowcast(summarised_nowcast, obs)
summarise_scores(scores, by = "location")
# Score overall on a log scale
log_scores <- enw_score_nowcast(summarised_nowcast, obs, log = TRUE)
summarise_scores(log_scores, by = "location")
This function allows the user to set a cache location for Stan models rather than a temporary directory. This can reduce the need for model compilation on every new model run across sessions or within a session. For R version 4.0.0 and above, it's recommended to use the persistent cache as shown in the example.
enw_set_cache(path, type = c("session", "persistent", "all"))
path |
A valid filepath representing the desired cache location. If the directory does not exist it will be created. |
type |
A character string specifying the cache type. It can be one of
"session", "persistent", or "all". Default is "session".
"session" sets the cache for the current session, "persistent" writes the
cache location to the user’s |
The string of the filepath set.
Functions used to help convert models into the format required for stan
enw_formula_as_data_list()
,
enw_get_cache()
,
enw_model()
,
enw_priors_as_data_list()
,
enw_replace_priors()
,
enw_sample()
,
enw_stan_to_r()
,
enw_unset_cache()
,
remove_profiling()
,
write_stan_files_no_profile()
# Set to local directory
my_enw_cache <- enw_set_cache(file.path(tempdir(), "test"))
enw_get_cache()
## Not run:
# Use the package cache in R >= 4.0
if (getRversion() >= "4.0.0") {
  enw_set_cache(
    tools::R_user_dir(package = "epinowcast", "cache"),
    type = "all"
  )
}
## End(Not run)
A simple binomial simulator of missing data by reference date using simulated or observed data as an input. This function may be used to validate missing data models, as part of examples and case studies, or to explore the implications of missing data for your use case.
enw_simulate_missing_reference(obs, proportion = 0.2, by = NULL)
obs |
A |
proportion |
Numeric, the proportion of observations that are missing a reference date, indexed by reference date. Currently only a fixed proportion is supported and this defaults to 0.2. |
by |
A character vector describing the stratification of observations. This defaults to no grouping. This should be used when modelling multiple time series in order to identify them for downstream modelling |
A data.table
of the same format as the input but with a simulated
proportion of observations now having a missing reference date.
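The idea of a binomial simulator of missingness can be sketched in base R as follows. The toy counts, column names, and seed are hypothetical; the sketch shows only the binomial thinning step and not how the package restructures the data afterwards.

```r
set.seed(123)
# Hypothetical counts by reference date
obs <- data.frame(
  reference_date = as.Date("2021-08-01") + 0:4,
  confirm = c(20, 35, 0, 50, 10)
)
proportion <- 0.35

# Each observation independently loses its reference date with
# probability `proportion`
obs$missing <- rbinom(nrow(obs), size = obs$confirm, prob = proportion)
obs$confirm_with_reference <- obs$confirm - obs$missing
obs
```

Because the draw is binomial, the realised share of missing observations varies around the requested proportion, more so for small counts.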
# Load and filter Germany hospitalisations
nat_germany_hosp <- subset(
  germany_covid19_hosp, location == "DE" & age_group == "00+"
)
nat_germany_hosp <- enw_filter_report_dates(
  nat_germany_hosp, latest_date = "2021-08-01"
)
# Make sure observations are complete
nat_germany_hosp <- enw_complete_dates(
  nat_germany_hosp, by = c("location", "age_group"), missing_reference = FALSE
)
# Simulate
enw_simulate_missing_reference(
  nat_germany_hosp, proportion = 0.35, by = c("location", "age_group")
)
epinowcast
stan functions in R
This function facilitates the exposure of Stan functions from
the epinowcast package in R. It utilizes the expose_functions()
method
of cmdstanr::CmdStanModel for this purpose. This function is useful for
developers and contributors to the epinowcast package, as well as for
users interested in exploring and prototyping with model functionalities.
enw_stan_to_r( files = list.files(include), include = system.file("stan", "functions", package = "epinowcast"), global = TRUE, verbose = TRUE, ... )
files |
A character vector specifying the names of Stan files to be
exposed. These must be in the |
include |
A character string specifying the directory containing Stan
files. Defaults to the 'stan/functions' directory of the |
global |
A logical value indicating whether to expose the functions
globally. Defaults to |
verbose |
Logical, defaults to |
... |
Additional arguments passed to enw_model. |
An object of class CmdStanModel
with functions from the model
exposed for use in R.
Functions used to help convert models into the format required for stan
enw_formula_as_data_list()
,
enw_get_cache()
,
enw_model()
,
enw_priors_as_data_list()
,
enw_replace_priors()
,
enw_sample()
,
enw_set_cache()
,
enw_unset_cache()
,
remove_profiling()
,
write_stan_files_no_profile()
# Compile functions in stan/functions/hazard.stan
stan_functions <- enw_stan_to_r("hazard.stan")
# These functions can now be used in R
stan_functions$functions$prob_to_hazard(c(0.5, 0.1, 0.1))
# or exposed globally and used directly
prob_to_hazard(c(0.5, 0.1, 0.1))
This function summarises posterior samples for arbitrary strata. It optionally holds out the observed data (variables that are not ".draw", ".iteration", ".sample", ".chain") and joins this to the summarised posterior.
enw_summarise_samples( samples, probs = c(0.05, 0.2, 0.35, 0.5, 0.65, 0.8, 0.95), by = c("reference_date", ".group"), link_with_obs = TRUE )
samples |
A |
probs |
A vector of numeric probabilities to produce quantile summaries for. By default these are the 5%, 20%, 35%, 50%, 65%, 80%, and 95% quantiles; the 5%, 20%, 80%, and 95% quantiles are the minimum set required for plotting functions to work. |
by |
A character vector of variables to summarise by. Defaults to
|
link_with_obs |
Logical, should the observed data be linked to the
posterior summary? This is useful for plotting the posterior against the
observed data. Defaults to |
A data.frame
summarising the posterior samples.
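Summarising samples by stratum can be sketched in base R as shown below. The simulated samples, the column names, and the "q" naming of quantile columns are assumptions for illustration; the package's own summary may include further statistics.

```r
set.seed(1)
# Hypothetical posterior samples for two reference dates
samples <- data.frame(
  reference_date = rep(as.Date("2021-10-01") + 0:1, each = 100),
  sample = c(rnorm(100, mean = 50, sd = 5), rnorm(100, mean = 80, sd = 8))
)
probs <- c(0.05, 0.5, 0.95)

# Summarise one stratum: mean plus the requested quantiles
summarise_one <- function(x) {
  q <- quantile(x, probs = probs, names = FALSE)
  names(q) <- paste0("q", probs * 100)
  c(mean = mean(x), q)
}

# Apply the summary to each reference date and bind into a data.frame
by_date <- split(samples$sample, samples$reference_date)
summary_mat <- t(vapply(by_date, summarise_one, numeric(1 + length(probs))))
summary_df <- data.frame(
  reference_date = as.Date(rownames(summary_mat)),
  summary_mat
)
summary_df
```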
Functions used for postprocessing of model fits
enw_add_latest_obs_to_nowcast()
,
enw_nowcast_samples()
,
enw_nowcast_summary()
,
enw_posterior()
,
enw_pp_summary()
,
enw_quantiles_to_long()
fit <- enw_example("nowcast")
samples <- summary(fit, type = "nowcast_sample")
enw_summarise_samples(samples, probs = c(0.05, 0.5, 0.95))
Optionally removes the enw_cache_location
environment variable from
the user .Renviron file and/or removes it from the local
environment. If you unset the local cache and want to switch
back to using the persistent cache, you can reload the
.Renviron
file using readRenviron("~/.Renviron")
.
enw_unset_cache(type = c("session", "persistent", "all"))
type |
A character string specifying the type of cache to unset.
It can be one of "session", "persistent", or "all". Default is "session".
"session" unsets the cache for the current session, "persistent" removes the
cache location from the user’s |
The prior cache location if it existed, otherwise NULL
.
Functions used to help convert models into the format required for stan
enw_formula_as_data_list()
,
enw_get_cache()
,
enw_model()
,
enw_priors_as_data_list()
,
enw_replace_priors()
,
enw_sample()
,
enw_set_cache()
,
enw_stan_to_r()
,
remove_profiling()
,
write_stan_files_no_profile()
enw_unset_cache()
Provides a user friendly interface around package functionality to produce a nowcast from observed preprocessed data, and a series of user defined models. By default a model that assumes a fixed parametric reporting distribution with a flexible expectation model is used. Explore the individual model components for additional documentation and see the package case studies for example model specifications for different tasks.
epinowcast( data, reference = epinowcast::enw_reference(parametric = ~1, distribution = "lognormal", non_parametric = ~0, data = data), report = epinowcast::enw_report(non_parametric = ~0, structural = ~0, data = data), expectation = epinowcast::enw_expectation(r = ~0 + (1 | day:.group), generation_time = 1, observation = ~1, latent_reporting_delay = 1, data = data), missing = epinowcast::enw_missing(formula = ~0, data = data), obs = epinowcast::enw_obs(family = "negbin", data = data), fit = epinowcast::enw_fit_opts(sampler = epinowcast::enw_sample, nowcast = TRUE, pp = FALSE, likelihood = TRUE, debug = FALSE, output_loglik = FALSE), model = epinowcast::enw_model(), priors, ... )
data |
Output from |
reference |
The reference date indexed reporting process model
specification as defined using |
report |
The report date indexed reporting process model
specification as defined using |
expectation |
The expectation model specification as defined using
|
missing |
The missing reference date model specification as defined
using |
obs |
The observation model as defined by |
fit |
Model fit options as defined using |
model |
The model to use within |
priors |
A |
... |
Additional model modules to pass to |
An object of the class "epinowcast" which inherits from
enw_preprocess_data()
and data.table
, and combines the input data,
priors, and output from the sampler specified in enw_fit_opts()
.
Other epinowcast:
plot.epinowcast()
,
summary.epinowcast()
# Load data.table and ggplot2
library(data.table)
library(ggplot2)
# Use 2 cores
options(mc.cores = 2)
# Load and filter Germany hospitalisations
nat_germany_hosp <- germany_covid19_hosp[location == "DE"][age_group == "00+"]
nat_germany_hosp <- enw_filter_report_dates(
  nat_germany_hosp, latest_date = "2021-10-01"
)
# Make sure observations are complete
nat_germany_hosp <- enw_complete_dates(
  nat_germany_hosp, by = c("location", "age_group")
)
# Make a retrospective dataset
retro_nat_germany <- enw_filter_report_dates(
  nat_germany_hosp, remove_days = 40
)
retro_nat_germany <- enw_filter_reference_dates(
  retro_nat_germany, include_days = 40
)
# Get latest observations for the same time period
latest_obs <- enw_latest_data(nat_germany_hosp)
latest_obs <- enw_filter_reference_dates(
  latest_obs, remove_days = 40, include_days = 20
)
# Preprocess observations (note this maximum delay is likely too short)
pobs <- enw_preprocess_data(retro_nat_germany, max_delay = 20)
# Fit the default nowcast model and produce a nowcast
# Note that we have reduced samples for this example to reduce runtimes
nowcast <- epinowcast(pobs,
  fit = enw_fit_opts(
    save_warmup = FALSE, pp = TRUE,
    chains = 2, iter_warmup = 500, iter_sampling = 500
  )
)
nowcast
# Plot the nowcast vs latest available observations
plot(nowcast, latest_obs = latest_obs)
# Plot posterior predictions for the delay distribution by date
plot(nowcast, type = "posterior") +
  facet_wrap(vars(reference_date), scales = "free")
This function extracts metadata from the provided dataset to be used in the observation model.
extract_obs_metadata(new_confirm, observation_indicator = NULL)
new_confirm |
A data.table containing the columns: "reference_date",
"delay", ".group", "new_confirm", and "max_obs_delay".
As produced by |
observation_indicator |
A character string specifying the column name
in |
A list containing:
st
: time index of each snapshot (snapshot time).
ts
: snapshot index by time and group.
sl
: number of reported observations per snapshot (snapshot
length).
csl
: cumulative version of sl.
lsl
: number of consecutive reported observations per
snapshot accounting for missing data.
clsl
: cumulative version of lsl.
nsl
: number of observed observations per snapshot (snapshot
length).
cnsl
: cumulative version of nsl.
sg
: group index of each snapshot (snapshot group).
Helper functions for model modules
add_max_observed_delay()
,
add_pmfs()
,
convolution_matrix()
,
enw_reference_by_report()
,
enw_reps_with_complete_refs()
,
extract_sparse_matrix()
,
latest_obs_as_matrix()
,
simulate_double_censored_pmf()
This helper function allows the extraction of a sparse matrix from a matrix
using a similar approach to that implemented in
rstan::extract_sparse_parts()
and returns these elements in a named
list for use in stan. This function is used in the construction of the
expectation model (see enw_expectation()
).
extract_sparse_matrix(mat, prefix = "")
mat |
A matrix to extract the sparse matrix from. |
prefix |
A character string to prefix the names of the returned list. |
A list representing the sparse matrix, containing:
nw: Count of non-zero elements in mat.
w: Vector of non-zero elements in mat. Equivalent to the numeric values from mat excluding zeros.
nv: Length of v.
v: Vector of row indices corresponding to each non-zero element in w. Indicates the row location in mat for each non-zero value.
nu: Length of u.
u: Vector indicating the starting indices in w for non-zero elements of each row in mat. Helps identify the partition of w into different rows of mat.
Helper functions for model modules: add_max_observed_delay(), add_pmfs(), convolution_matrix(), enw_reference_by_report(), enw_reps_with_complete_refs(), extract_obs_metadata(), latest_obs_as_matrix(), simulate_double_censored_pmf()
mat <- matrix(1:12, nrow = 4)
mat[2, 2] <- 0
mat[3, 1] <- 0
extract_sparse_matrix(mat)
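To make the idea of a row-wise sparse representation concrete, here is a minimal base R sketch of a generic compressed sparse row (CSR) decomposition applied to the same example matrix. It is not the package's implementation: the function name csr_parts and the element names values, col_index, and row_start are hypothetical, and the layout follows the common CSR convention rather than the exact element names listed above.

```r
# Illustrative CSR decomposition in base R (all names are hypothetical)
csr_parts <- function(mat) {
  # Locate non-zero entries as (row, col) pairs
  idx <- which(mat != 0, arr.ind = TRUE)
  # CSR stores entries row by row, so sort by row and then by column
  idx <- idx[order(idx[, "row"], idx[, "col"]), , drop = FALSE]
  # Count non-zero entries in every row, including rows with none
  per_row <- tabulate(idx[, "row"], nbins = nrow(mat))
  list(
    values = mat[idx],
    col_index = unname(idx[, "col"]),
    # 1-based position in `values` where each row starts, plus an end marker
    row_start = c(1L, 1L + cumsum(per_row))
  )
}

mat <- matrix(1:12, nrow = 4)
mat[2, 2] <- 0
mat[3, 1] <- 0
parts <- csr_parts(mat)
str(parts)
```

The entries of row i are then values[row_start[i]:(row_start[i + 1] - 1)] whenever that row has at least one non-zero entry.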
Hospitalisations in Germany by date of report and reference
germany_covid19_hosp
An object of class data.table (inherits from data.frame) with 1536885 rows and 5 columns.
A data.table
Package data sets: enw_example()
This function converts the string representation of the timestep to its corresponding numeric value or returns the numeric input (if it is a whole number). For "day" and "week", it returns 1 and 7 respectively. For "month", it returns "month" as months are not a fixed number of days. If the input is a numeric whole number, it is returned as is.
get_internal_timestep(timestep)
timestep |
The timestep to use. This can be a string ("day", "week", "month") or a numeric whole number representing the number of days. |
A numeric value representing the number of days for "day" and "week", "month" for "month", or the input value if it is a numeric whole number.
Utility functions: aggregate_rolling_sum(), coerce_date(), coerce_dt(), date_to_numeric_modulus(), is.Date(), stan_fns_as_string()
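The mapping described for get_internal_timestep() can be sketched in a few lines of base R. This is a hypothetical re-implementation of the documented behaviour only (the function name timestep_to_days and the error messages are assumptions), not the package's code.

```r
# Hypothetical sketch of the documented timestep mapping
timestep_to_days <- function(timestep) {
  if (is.character(timestep)) {
    # Strings map to a number of days, except "month", which is returned
    # unchanged because months do not have a fixed length
    switch(timestep,
      day = 1,
      week = 7,
      month = "month",
      stop("timestep must be 'day', 'week', or 'month'")
    )
  } else if (is.numeric(timestep) && timestep == round(timestep)) {
    # Whole numbers are interpreted as a number of days and returned as is
    timestep
  } else {
    stop("timestep must be a whole number or a supported string")
  }
}

timestep_to_days("day")
timestep_to_days("week")
timestep_to_days("month")
timestep_to_days(14)
```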
Checks that an object is a date
is.Date(x)
x |
An object |
A logical
Utility functions: aggregate_rolling_sum(), coerce_date(), coerce_dt(), date_to_numeric_modulus(), get_internal_timestep(), stan_fns_as_string()
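A check of this kind is commonly written with inherits(). The sketch below uses a hypothetical stand-in, is_date_sketch(), and assumes that "a date" means an object of class "Date"; the package may implement the check differently.

```r
# Hypothetical stand-in for a date check based on the object's class
is_date_sketch <- function(x) inherits(x, "Date")

is_date_sketch(as.Date("2021-10-01"))
is_date_sketch("2021-10-01")
```

Under this assumption a character string that merely looks like a date is not treated as a date until it has been converted with as.Date().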
Convert latest observed data to a matrix
latest_obs_as_matrix(latest)
latest |
|
A matrix with each column being a group and each row a reference date
Helper functions for model modules: add_max_observed_delay(), add_pmfs(), convolution_matrix(), enw_reference_by_report(), enw_reps_with_complete_refs(), extract_obs_metadata(), extract_sparse_matrix(), simulate_double_censored_pmf()
This function uses a series of internal functions to break an input formula into its component parts, each of which can then be handled separately. Currently supported components are fixed effects, lme4 style random effects, and random walks using the rw() helper function.
parse_formula(formula)
formula |
A model formula that may use standard fixed
effects, random effects using lme4 syntax (see |
A list of formula components. These currently include:
fixed: A character vector of fixed effect terms.
random: A list of lme4 style random effects.
rw: A character vector of rw() random walk terms.
The random walk functions used internally by this function were adapted from code written by J Scott (under an MIT license) as part of the epidemia package (https://github.com/ImperialCollegeLondon/epidemia/).
Functions used to help convert formulas into model designs: as_string_formula(), construct_re(), construct_rw(), enw_formula(), enw_manual_formula(), remove_rw_terms(), re(), rw_terms(), rw(), split_formula_to_terms()
epinowcast:::parse_formula(~ 1 + age_group + location)
epinowcast:::parse_formula(~ 1 + age_group + (1 | location))
epinowcast:::parse_formula(~ 1 + (age_group | location))
epinowcast:::parse_formula(~ 1 + (1 | location) + rw(week, location))
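The splitting of a formula into fixed, random, and random walk components can be illustrated without the package by inspecting the term labels that stats::terms() produces. This is a simplified sketch under stated assumptions: classify_terms() is a hypothetical helper, it returns every component as a character vector, and it ignores the intercept, so its output will not match parse_formula() exactly.

```r
# Hypothetical helper: classify formula terms by their textual form
classify_terms <- function(formula) {
  # Term labels exclude the intercept and drop the parentheses around
  # lme4 style terms, e.g. "(1 | location)" becomes "1 | location"
  labels <- attr(stats::terms(formula), "term.labels")
  is_rw <- grepl("^rw\\(", labels)
  is_random <- grepl("|", labels, fixed = TRUE) & !is_rw
  list(
    fixed = labels[!is_rw & !is_random],
    random = labels[is_random],
    rw = labels[is_rw]
  )
}

parts <- classify_terms(
  ~ 1 + age_group + (1 | location) + rw(week, location)
)
str(parts)
```

Because stats::terms() does not evaluate the terms, week and location do not need to exist as objects for this to run.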
plot method for class "epinowcast".
## S3 method for class 'epinowcast'
plot(
  x,
  latest_obs = NULL,
  type = c("nowcast", "posterior_prediction"),
  log = FALSE,
  ...
)
x |
A |
latest_obs |
A |
type |
Character string indicating the plot required; enforced by
|
log |
Logical, defaults to |
... |
Additional arguments to the plot function specified by |
ggplot2 object
Other epinowcast: epinowcast(), summary.epinowcast()
Plotting functions: enw_plot_nowcast_quantiles(), enw_plot_obs(), enw_plot_pp_quantiles(), enw_plot_quantiles(), enw_plot_theme()
nowcast <- enw_example("nowcast")
latest_obs <- enw_example("obs")

# Plot nowcast
plot(nowcast, latest_obs = latest_obs, type = "nowcast")

# Plot posterior predictions by reference date
plot(nowcast, type = "posterior_prediction") +
  ggplot2::facet_wrap(ggplot2::vars(reference_date), scales = "free")
Defines random effect terms using the lme4 syntax
re(formula)
formula |
A random effect as returned by |
A list defining the fixed and random effects of the specified random effect
Functions used to help convert formulas into model designs: as_string_formula(), construct_re(), construct_rw(), enw_formula(), enw_manual_formula(), parse_formula(), remove_rw_terms(), rw_terms(), rw(), split_formula_to_terms()
form <- epinowcast:::parse_formula(~ 1 + (1 | age_group))
re(form$random[[1]])
form <- epinowcast:::parse_formula(~ 1 + (location | age_group))
re(form$random[[1]])
Remove profiling statements from a character vector representing stan code
remove_profiling(s)
s |
Character vector representing stan code |
A character vector of the stan code without profiling statements
Functions used to help convert models into the format required for stan: enw_formula_as_data_list(), enw_get_cache(), enw_model(), enw_priors_as_data_list(), enw_replace_priors(), enw_sample(), enw_set_cache(), enw_stan_to_r(), enw_unset_cache(), write_stan_files_no_profile()
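One way to strip profiling from Stan code is to replace each profile("name") { ... } block with its body. The sketch below does this with a recursive PCRE pattern so that nested braces inside the block are kept intact. It is an illustration rather than the package's implementation: strip_profile_blocks() is a hypothetical name, and the sketch assumes that profile names are quoted strings and that braces in the code are balanced.

```r
# Hypothetical helper: unwrap profile("name") { ... } blocks in Stan code
strip_profile_blocks <- function(s) {
  # Group 1 captures the block body; (?1) recurses into it so that inner
  # { ... } pairs are matched as balanced units
  pattern <- "profile\\(\"[^\"]*\"\\)\\s*\\{((?:[^{}]++|\\{(?1)\\})*+)\\}"
  # Repeat so that profile blocks nested in other profile blocks are handled
  while (any(grepl(pattern, s, perl = TRUE))) {
    s <- gsub(pattern, "\\1", s, perl = TRUE)
  }
  s
}

stan_code <- "profile(\"model\") { x = 1; if (y) { z = 2; } }"
stripped <- strip_profile_blocks(stan_code)
cat(stripped, "\n")
```

The body of the block, including the inner if statement and its braces, is left unchanged; only the profile wrapper is removed.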
This function removes random walk terms denoted using rw() from a formula so that they can be processed on their own.
remove_rw_terms(formula)
formula |
A model formula that may use standard fixed
effects, random effects using lme4 syntax (see |
A formula object with the random walk terms removed.
This function was adapted from code written by J Scott (under an MIT license) as part of the epidemia package (https://github.com/ImperialCollegeLondon/epidemia/).
Functions used to help convert formulas into model designs: as_string_formula(), construct_re(), construct_rw(), enw_formula(), enw_manual_formula(), parse_formula(), re(), rw_terms(), rw(), split_formula_to_terms()
epinowcast:::remove_rw_terms(~ 1 + age_group + location)
epinowcast:::remove_rw_terms(~ 1 + age_group + location + rw(week, location))
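For intuition, removing rw() terms can be approximated in base R by dropping the matching term labels and rebuilding the formula. This is only a sketch: drop_rw_terms() is a hypothetical helper, it assumes a one-sided formula with an intercept, and it is not the adapted epidemia code used by the package.

```r
# Hypothetical helper: rebuild a one-sided formula without its rw() terms
drop_rw_terms <- function(formula) {
  labels <- attr(stats::terms(formula), "term.labels")
  kept <- labels[!grepl("^rw\\(", labels)]
  # Term labels drop the parentheses around lme4 style terms, so restore them
  bars <- grepl("|", kept, fixed = TRUE)
  kept[bars] <- paste0("(", kept[bars], ")")
  # Always keep an explicit intercept (an assumption of this sketch)
  stats::as.formula(paste("~", paste(c("1", kept), collapse = " + ")))
}

f <- drop_rw_terms(~ 1 + age_group + (1 | location) + rw(week, location))
print(f)
```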
A call to rw() can be used in the 'formula' argument of model construction functions in the epinowcast package such as enw_formula(). Does not evaluate arguments but instead simply passes information for use in model construction.
rw(time, by, type = c("independent", "dependent"))
time |
Defines the random walk time period. |
by |
Defines the grouping parameter used for the random walk. If not specified no grouping is used. Currently this is limited to a single variable. |
type |
Character string, how standard deviation of grouped random
walks is estimated: "independent", or "dependent" across groups;
enforced by |
A list defining the time frame, group, and type with class "enw_rw_term" that can be interpreted by construct_rw().
Functions used to help convert formulas into model designs: as_string_formula(), construct_re(), construct_rw(), enw_formula(), enw_manual_formula(), parse_formula(), remove_rw_terms(), re(), rw_terms(), split_formula_to_terms()
rw(time)
rw(time, location)
rw(time, location, type = "dependent")
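The examples above work even though time and location are not defined as objects, because the arguments are captured rather than evaluated. A minimal sketch of that pattern using substitute() follows; rw_sketch() and its class name are hypothetical, and the structure of the list returned by the real rw() may differ.

```r
# Hypothetical sketch of a term constructor that captures its arguments
# as text instead of evaluating them
rw_sketch <- function(time, by, type = c("independent", "dependent")) {
  type <- match.arg(type)
  out <- list(
    time = deparse(substitute(time)),
    # `by` is optional; record NULL when it is not supplied
    by = if (missing(by)) NULL else deparse(substitute(by)),
    type = type
  )
  class(out) <- "rw_sketch_term"
  out
}

# `week` and `location` are never evaluated, so they need not exist
term <- rw_sketch(week, location, type = "dependent")
unclass(term)
```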
This function extracts random walk terms denoted using rw() from a formula so that they can be processed on their own.
rw_terms(formula)
formula |
A model formula that may use standard fixed
effects, random effects using lme4 syntax (see |
A character vector containing the random walk terms that have been identified in the supplied formula.
This function was adapted from code written by J Scott (under an MIT license) as part of the epidemia package (https://github.com/ImperialCollegeLondon/epidemia/).
Functions used to help convert formulas into model designs: as_string_formula(), construct_re(), construct_rw(), enw_formula(), enw_manual_formula(), parse_formula(), remove_rw_terms(), re(), rw(), split_formula_to_terms()
epinowcast:::rw_terms(~ 1 + age_group + location)
epinowcast:::rw_terms(~ 1 + age_group + location + rw(week, location))
This function simulates the probability mass function of a daily double-censored process. The process involves two distributions: a primary distribution which represents the censoring process for the primary event and another distribution (which is offset by the primary).
simulate_double_censored_pmf(
  max,
  fun_primary = stats::runif,
  primary_args = list(),
  fun_dist = stats::rlnorm,
  dist_args = list(...),
  n = 1e+06,
  ...
)
max |
Maximum value for the computed CDF. If not specified, the maximum value is the maximum simulated delay. |
fun_primary |
Primary distribution function (default is |
primary_args |
List of additional arguments to be passed to the primary distribution function. |
fun_dist |
Distribution function to be added to the primary (default is
|
dist_args |
List of additional arguments to be passed to the distribution function. |
n |
Number of simulations (default is 1e6). |
... |
Additional arguments to be passed to the distribution function.
This is an alternative to |
A numeric vector representing the PMF.
Helper functions for model modules: add_max_observed_delay(), add_pmfs(), convolution_matrix(), enw_reference_by_report(), enw_reps_with_complete_refs(), extract_obs_metadata(), extract_sparse_matrix(), latest_obs_as_matrix()
simulate_double_censored_pmf(10, meanlog = 0, sdlog = 1)
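The double-censoring idea can be sketched directly in base R: draw the primary event time within a day, add a continuous delay, and record both events only by the day on which they fall. The steps below are an assumption about how such a simulation might be organised, using the same default distributions (uniform primary, log-normal delay) and a smaller number of draws; they are not the package's implementation.

```r
set.seed(123)
n <- 1e5
max_delay <- 10

# Primary event time within the day (uniform on [0, 1))
primary <- runif(n)
# Secondary event time: primary time plus a continuous log-normal delay
secondary <- primary + rlnorm(n, meanlog = 0, sdlog = 1)

# Both events are observed only to the day, giving a discrete delay in days
delay <- floor(secondary) - floor(primary)

# Empirical CDF on 0..max_delay, differenced into a PMF and renormalised so
# that the truncated PMF sums to one
cdf <- ecdf(delay)(0:max_delay)
pmf <- c(cdf[1], diff(cdf))
pmf <- pmf / sum(pmf)
round(pmf, 3)
```

Renormalising after truncation at max_delay is a design choice of this sketch; it treats delays longer than the maximum as unobserved rather than assigning them to the final day.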
Split formula into individual terms
split_formula_to_terms(formula)
formula |
A model formula that may use standard fixed
effects, random effects using lme4 syntax (see |
A character vector of formula terms
Functions used to help convert formulas into model designs: as_string_formula(), construct_re(), construct_rw(), enw_formula(), enw_manual_formula(), parse_formula(), remove_rw_terms(), re(), rw_terms(), rw()
epinowcast:::split_formula_to_terms(~ 1 + age_group + location)
Read in a stan function file as a character string
stan_fns_as_string(files, include)
files |
A character vector specifying the names of Stan files to be
exposed. These must be in the |
include |
A character string specifying the directory containing Stan
files. Defaults to the 'stan/functions' directory of the |
A character string containing the stan functions.
Utility functions: aggregate_rolling_sum(), coerce_date(), coerce_dt(), date_to_numeric_modulus(), get_internal_timestep(), is.Date()
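Reading Stan function files into a single string can be sketched with readLines() and paste(). The helper read_stan_fns() and the throwaway file below are hypothetical; the real stan_fns_as_string() may post-process the code in ways not shown here.

```r
# Hypothetical helper: read Stan files from a directory into one string
read_stan_fns <- function(files, include) {
  paths <- file.path(include, files)
  code <- vapply(
    paths,
    function(path) paste(readLines(path), collapse = "\n"),
    character(1)
  )
  paste(code, collapse = "\n")
}

# Demonstrate with a throwaway file in a temporary directory
stan_dir <- tempfile("stan_fns_")
dir.create(stan_dir)
writeLines(
  c("real add_one(real x) {", "  return x + 1;", "}"),
  file.path(stan_dir, "add_one.stan")
)
fns <- read_stan_fns("add_one.stan", include = stan_dir)
cat(fns, "\n")
```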
summary method for class "epinowcast".
## S3 method for class 'epinowcast'
summary(
  object,
  type = c("nowcast", "nowcast_samples", "fit", "posterior_prediction"),
  max_delay = object$max_delay,
  ...
)
object |
A |
type |
Character string indicating the summary to return; enforced by
|
max_delay |
Maximum delay to which nowcasts should be summarised. Must be equal to (default) or larger than the modelled maximum delay. If it is larger, then nowcasts for unmodelled dates are added by assuming that case counts beyond the modelled maximum delay are fully observed. |
... |
Additional arguments passed to summary specified by |
A summary data.frame
summary epinowcast
Other epinowcast: epinowcast(), plot.epinowcast()
nowcast <- enw_example("nowcast")

# Summarise nowcast posterior
summary(nowcast, type = "nowcast")

# Nowcast posterior samples
summary(nowcast, type = "nowcast_samples")

# Nowcast model fit
summary(nowcast, type = "fit")

# Posterior predictions
summary(nowcast, type = "posterior_prediction")
Write copies of the .stan files of a Stan model and its #include files with all profiling statements removed.
write_stan_files_no_profile(
  stan_file,
  include_paths = NULL,
  target_dir = epinowcast::enw_get_cache()
)
stan_file |
The path to a .stan file containing a Stan program. |
include_paths |
Paths to directories where Stan should look for files specified in #include directives in the Stan program. |
target_dir |
The path to a directory in which the manipulated .stan
files without profiling statements should be stored. To avoid overriding of
the original .stan files, this should be different from the directory of the
original model and the |
A list containing the path to the .stan file without profiling statements and the include_paths for the included .stan files without profiling statements
Functions used to help convert models into the format required for stan: enw_formula_as_data_list(), enw_get_cache(), enw_model(), enw_priors_as_data_list(), enw_replace_priors(), enw_sample(), enw_set_cache(), enw_stan_to_r(), enw_unset_cache(), remove_profiling()