This vignette is partly based on this nowcasting example by Sebastian Funk and Sam Abbott.

In this case study we walk through a simple approach to jointly
estimating the effective reproduction number over time and the delay
from a positive test to this test being reported. For more on the
`epinowcast`

package see the documentation.

An epidemic is in progress. We want to know what the
**effective reproduction number is at the current time**.
This is the average number of secondary infections caused by a single
infected individual at a given point in time and is a measure of the
transmissibility of the infection in the population at that time.

Unfortunately, we can’t directly observe infections and the data we do observe suffers from reporting delays.

- A linelist of cases with a date of interest (for example test positive date) and a date of report (for example date of hospitalisation report);
- An estimate of the distribution of times from infection to positive test;
- An estimate of the generation time distribution, which is the distribution of the time intervals between the infection of a primary case and the infection of secondary cases caused by the primary case.
- An estimate of the ascertainment rate, which is the proportion of infections that are reported as hospitalisations.

- Visualise and contextualise the data this case study uses (COVID-19 hospitalisations in Germany);
- Specify a joint model of the effective reproduction number, the delay from infection to hospitalisation and the delay from hospitalisation to report;
- Fit this model to both the data that would have been available in real-time, and retrospectively;
- Visualise the nowcast from the real-time data in comparison to the data that was ultimately observed;
- Compare real-time and retrospective reproduction number estimates and delay distribution estimates;
- Outline the limitations and strengths of this approach and highlight areas where it can be extended.

`epinowcast`

is still in development and so the following
case study has pain points and limitations that we are working on
improving. If you have any suggestions or feedback please get in touch
via the issues
page. We also welcome contributions via pull requests (see our contributing
guidelines.

See the package
documentation for guidance on installing the package and getting
setup with `cmdstanr`

(the backend used here for fitting
models).

As well as the `epinowcast`

package we will use the
following packages in this vignette.

- Here we make use of
`data.table`

for simple data transformations but tools from the tidyverse or base R tools could just as easily be used; - We also use
`purrr`

to make use of the`partial()`

function which allows us to partially specify a function for later reuse. This is useful for specifying the model as we can specify the parts of the model that are common to all models and then specify the parts that are specific to each model when we call the updated function; - Finally, we use
`ggplot2`

for plotting.

In this case study, we use data sourced from the Robert Koch Institute via the Germany Nowcasting hub. The data represent hospitalisation counts by date of positive test and date of test report in Germany. These data have been used extensively for nowcasting and forecasting COVID-19 hospitalisations in Germany and are described in detail in Wolffram et al. (2023). We use these data as they are a good example of the type of data that are available in many settings. We first summarise the data,

```
summary(germany_covid19_hosp)
#> reference_date location age_group confirm
#> Min. :2021-04-06 DE : 90405 00-04:219555 Min. : 0.00
#> 1st Qu.:2021-05-15 DE-BB : 90405 00+ :219555 1st Qu.: 0.00
#> Median :2021-06-23 DE-BE : 90405 05-14:219555 Median : 1.00
#> Mean :2021-06-25 DE-BW : 90405 15-34:219555 Mean : 11.44
#> 3rd Qu.:2021-08-02 DE-BY : 90405 35-59:219555 3rd Qu.: 6.00
#> Max. :2021-10-20 DE-HB : 90405 60-79:219555 Max. :1552.00
#> (Other):994455 80+ :219555
#> report_date
#> Min. :2021-04-06
#> 1st Qu.:2021-06-24
#> Median :2021-08-03
#> Mean :2021-07-31
#> 3rd Qu.:2021-09-11
#> Max. :2021-10-20
#>
```

In this case study we only consider national level data without
stratification by age group. We can filter these data using
`data.table`

as follows,

```
germany_hosp <- germany_covid19_hosp[location == "DE"][age_group == "00+"]
germany_hosp <- germany_hosp[, .(reference_date, report_date, confirm)]
germany_hosp
#> reference_date report_date confirm
#> <IDat> <Date> <int>
#> 1: 2021-04-06 2021-04-06 149
#> 2: 2021-04-07 2021-04-07 312
#> 3: 2021-04-08 2021-04-08 424
#> 4: 2021-04-09 2021-04-09 288
#> 5: 2021-04-10 2021-04-10 273
#> ---
#> 12911: 2021-07-27 2021-10-16 81
#> 12912: 2021-07-28 2021-10-17 159
#> 12913: 2021-07-29 2021-10-18 145
#> 12914: 2021-07-30 2021-10-19 117
#> 12915: 2021-07-31 2021-10-20 132
```

These data are already in a format that can be used with
`epinowcast`

, as it contains

- a reference date (column
`reference_date`

): the date of the observation, in this example the date of a positive test - a report date (column
`report_date`

): the date of report for a given set of observations by reference date - a count (column
`confirm`

): the total number of hospitalisations by reference date and report date. Note that this is cumulative by report date.

The package also provides a range of tools to convert data from line list, incidence, or other common formats into the required format (see Data converters). For example we could convert the data to a individual case linelist,

```
germany_covid19_hosp_linelist <- germany_covid19_hosp |>
enw_add_incidence() |>
enw_incidence_to_linelist()
germany_covid19_hosp_linelist
#> id reference_date location age_group report_date delay
#> <int> <IDat> <fctr> <fctr> <IDat> <int>
#> 1: 1 2021-04-06 DE 00+ 2021-04-06 0
#> 2: 2 2021-04-06 DE 00+ 2021-04-06 0
#> 3: 3 2021-04-06 DE 00+ 2021-04-06 0
#> 4: 4 2021-04-06 DE 00+ 2021-04-06 0
#> 5: 5 2021-04-06 DE 00+ 2021-04-06 0
#> ---
#> 11486202: 11486202 2021-10-20 DE-TH 15-34 2021-10-20 115
#> 11486203: 11486203 2021-10-20 DE-TH 60-79 2021-10-20 117
#> 11486204: 11486204 2021-10-20 DE-TH 60-79 2021-10-20 117
#> 11486205: 11486205 2021-10-20 DE-TH 60-79 2021-10-20 117
#> 11486206: 11486206 2021-10-20 DE-TH 60-79 2021-10-20 117
```

This linelist could then be itself converted into the format
`epinowcast`

requires using the
`enw_linelist_to_incidence()`

function,

```
incidence_from_linelist <- enw_linelist_to_incidence(
germany_covid19_hosp_linelist,
reference_date = "reference_date",
report_date = "report_date",
by = c("age_group", "location"),
max_delay = 30
)
incidence_from_linelist
#> Key: <age_group, location>
#> age_group location report_date reference_date new_confirm confirm
#> <fctr> <fctr> <IDat> <IDat> <int> <int>
#> 1: 00-04 DE-BY 2021-04-06 <NA> 0 0
#> 2: 00-04 DE-BY 2021-04-07 <NA> 0 0
#> 3: 00-04 DE-BY 2021-04-08 <NA> 0 0
#> 4: 00-04 DE-BY 2021-04-09 <NA> 0 0
#> 5: 00-04 DE-BY 2021-04-10 <NA> 0 0
#> ---
#> 2049593: 80+ DE-TH 2021-10-19 2021-10-18 0 0
#> 2049594: 80+ DE-TH 2021-10-20 2021-10-18 0 0
#> 2049595: 80+ DE-TH 2021-10-19 2021-10-19 1 1
#> 2049596: 80+ DE-TH 2021-10-20 2021-10-19 0 1
#> 2049597: 80+ DE-TH 2021-10-20 2021-10-20 0 0
#> delay
#> <int>
#> 1: 0
#> 2: 1
#> 3: 2
#> 4: 3
#> 5: 4
#> ---
#> 2049593: 1
#> 2049594: 2
#> 2049595: 0
#> 2049596: 1
#> 2049597: 0
```

Here we have specified the `reference_date`

and
`report_date`

columns, the `by`

columns (which are
used to group the data), and the `max_delay`

which is the
maximum delay between the reference date and the report date (note that
the observed delay is longer than our specified delay and so is used
instead). The `enw_linelist_to_incidence()`

function will
then calculate the incidence by reference date and report date for each
group.

For this case study we will only consider data from the 1st of March
2020 onwards. As a first step, we filter the data to only include
observations from the 1st of May 2021 until the 1st of August 2021. We
do this using the `enw_filter_report_dates()`

and
`enw_filter_reference_dates()`

functions,

```
complete_germany_hosp <- germany_hosp |>
enw_filter_report_dates(latest_date = "2021-08-01") |>
enw_filter_reference_dates(earliest_date = "2021-05-01") |>
enw_complete_dates(missing_reference = FALSE) |>
enw_add_incidence()
```

Here we have also used `enw_complete_dates()`

to make sure
that we have complete data for all dates between the earliest and latest
dates. We have also used `enw_add_incidence()`

to add the
incidence column to the data.

Next we split the data into two parts, the data that was available at
the time and that will be used to fit the model
(`rt_nat_germany`

) and the data that was available
retrospectively about the same time period will be used to evaluate the
model (`retro_nat_germany`

). We again do this using the
`enw_filter_report_dates()`

and
`enw_filter_reference_dates()`

functions as follows:

- Create the real-time dataset (
`rt_nat_germany`

) by filtering the data to only include reported observations from the 1st of May 2021 until the 1st of July 2021;

```
rt_germany <- complete_germany_hosp |>
enw_filter_report_dates(latest_date = "2021-07-01")
rt_germany
#> report_date reference_date confirm new_confirm delay
#> <IDat> <IDat> <int> <int> <int>
#> 1: 2021-05-01 2021-05-01 234 234 0
#> 2: 2021-05-02 2021-05-01 352 118 1
#> 3: 2021-05-03 2021-05-01 391 39 2
#> 4: 2021-05-04 2021-05-01 464 73 3
#> 5: 2021-05-05 2021-05-01 507 43 4
#> ---
#> 1949: 2021-06-30 2021-06-29 31 11 1
#> 1950: 2021-07-01 2021-06-29 35 4 2
#> 1951: 2021-06-30 2021-06-30 20 20 0
#> 1952: 2021-07-01 2021-06-30 37 17 1
#> 1953: 2021-07-01 2021-07-01 20 20 0
```

- Create the retrospective dataset (
`retro_germany`

) by filtering the data to only include observations with reference dates (i.e. they may have been reported anytime up to the 1st of August 2021) from the 1st of May 2021 until the 1st of July 2021;

```
retro_germany <- complete_germany_hosp |>
enw_filter_reference_dates(latest_date = "2021-07-01")
retro_germany
#> report_date reference_date confirm new_confirm delay
#> <IDat> <IDat> <int> <int> <int>
#> 1: 2021-05-01 2021-05-01 234 234 0
#> 2: 2021-05-02 2021-05-01 352 118 1
#> 3: 2021-05-03 2021-05-01 391 39 2
#> 4: 2021-05-04 2021-05-01 464 73 3
#> 5: 2021-05-05 2021-05-01 507 43 4
#> ---
#> 3871: 2021-07-28 2021-07-01 57 0 27
#> 3872: 2021-07-29 2021-07-01 57 0 28
#> 3873: 2021-07-30 2021-07-01 57 0 29
#> 3874: 2021-07-31 2021-07-01 57 0 30
#> 3875: 2021-08-01 2021-07-01 57 0 31
```

Finally we can create a dataset that contains the latest data
available for each reference date. We do this using the
`enw_latest_data()`

function,

```
latest_germany_hosp <- retro_germany |>
enw_latest_data()
head(latest_germany_hosp, n = 10)
#> reference_date report_date confirm new_confirm delay
#> <IDat> <IDat> <int> <int> <int>
#> 1: 2021-05-01 2021-08-01 1007 0 92
#> 2: 2021-05-02 2021-08-01 781 0 91
#> 3: 2021-05-03 2021-08-01 467 0 90
#> 4: 2021-05-04 2021-08-01 823 0 89
#> 5: 2021-05-05 2021-08-01 1028 0 88
#> 6: 2021-05-06 2021-08-01 1016 0 87
#> 7: 2021-05-07 2021-08-01 892 0 86
#> 8: 2021-05-08 2021-08-01 822 0 85
#> 9: 2021-05-09 2021-08-01 561 0 84
#> 10: 2021-05-10 2021-08-01 364 0 83
```

Before we define, or fit, a model we should visualise the data to get
an idea of the trends in the data and its reporting structure. There is
currently no function in `epinowcast`

to visualise the data,
but we can use the `ggplot2`

package to do this manually as
follows,

```
gh_vis_cohorts <- copy(retro_germany)[
,
report_date := fcase(
report_date <= as.Date("2021-05-15"), as.Date("2021-05-15"),
report_date <= as.Date("2021-06-01"), as.Date("2021-06-01"),
report_date <= as.Date("2021-06-15"), as.Date("2021-06-15"),
report_date <= as.Date("2021-07-01"), as.Date("2021-07-01"),
report_date <= as.Date("2021-07-15"), as.Date("2021-07-15"),
report_date <= as.Date("2021-08-01"), as.Date("2021-08-01")
) |>
factor(levels = rev(c(
"2021-05-15", "2021-06-01", "2021-06-15", "2021-07-01",
"2021-07-15", "2021-08-01"
)))
]
gh_vis_cohorts_by_reference <- gh_vis_cohorts[,
.(confirm = sum(new_confirm)),
by = .(reference_date)
]
gh_vis_cohorts_by_ref_rep <- gh_vis_cohorts[,
.(confirm = sum(new_confirm)),
by = .(reference_date, report_date)
]
gh_vis_cohorts_by_ref_rep |>
ggplot() +
aes(
x = reference_date, y = confirm, fill = report_date, group = report_date
) +
geom_col(position = "stack", alpha = 1, col = "grey") +
geom_vline(
aes(xintercept = as.Date(report_date)),
linetype = 2, alpha = 0.9
) +
scale_y_continuous(labels = \(x)(scales::comma(x, accuracy = 1))) +
scale_fill_brewer(
palette = "Blues", aesthetics = c("color", "fill")
) +
theme_bw() +
labs(
x = "Date of positive test",
y = "Hospitalised cases by date of positive test",
fill = "Report date"
) +
guides(fill = guide_legend(reverse = TRUE)) +
theme(legend.position = "bottom")
```

The `epinowcast`

package provides a flexible framework for
modelling the incidence of infections and subsequent measures such as
hospitalisations that are themselves reported with some delay. The
framework is based on a generative model that describes the hypothesised
process by which the data are generated. The generative model is then
used to define a likelihood function that describes the probability of
observing the data given the model parameters. The likelihood function
is then combined with prior information used to fit the model parameters
to the data. In the following sections we describe the generative model
for the incidence of infections and hospitalisations. If you would like
to know more about the general framework see the model
vignette.

The first part of the generative process we consider is the expected number of hospitalisations by date positive test. This is the number of hospitalisations that we would expect to observe if we had perfect reporting of hospitalisations and there was no unmodelled variance. This part of the model is also decomposed into three parts: the expected number of infections, the delay from infection to positive test, and the probability of hospitalisation given an infection. These three parts are combined to give the expected number of hospitalisations by date positive test. We describe each of these parts in turn.

We model expected infections as a function of the effective
(instantaneous) reproduction number and previous infections weighted by
the generation time using the renewal equation^{[1,2]}. This corresponds to the
commonly used Susceptible-Exposed-Infected-Recovered (SEIR) model when
the appropriate generation time is specified^{[3]}.

We model the instantaneous reproduction number (*R*_{t}) on the log
scale as a weekly random walk as follows

where *r*_{0} is the
intercept, and $\sum_{i = 0}^{t \% 7}
\epsilon_i$ is a weekly random walk.

We choose this model because it is flexible, can be used to model a
range of different scenarios, and is parsimonious. For example, if we
set *σ*_{ϵ} = 0
then the model reduces to a constant reproduction number. If we set
*σ*_{ϵ} > 0
then the model allows for changes in the reproduction number over time.
Other sensible choices for the model include a random walk with a longer
time scale (e.g. monthly) or shorter time scale (e.g. daily), a random
walk with a drift term, or a random walk with a drift term and a
seasonal component.

We model the expected number of infections/latent notifications
(*λ*^{l}) using
a renewal process^{[1,2]}. This model is a generalisation
of the default `epinowcast`

model and implies that current
infections/notifications are dependent on past infections/notifications
based on a kernel. This kernel is defined by the generation time
distribution, the probability distribution of the time between infection
and subsequent infectiousness, and the previously defined model for the
effective reproduction number. The model is defined as follows,

We assume the generation time (*g*) is a gamma distribution with mean
4 days and a standard deviation of 3 days. To initialise the model we
assume that the first *P* latent
notifications are log-normally distributed with mean *μ*_{g, t}^{l}
and standard deviation *σ*_{g, t}^{l}.
This is equivalent to assuming that the first *P* latent notifications are
independent and identically distributed. The mean of the log-normal
distribution for each group is the log of the latest reported case count
for the first reference date for that group scaled by the sum of the
latent reporting delay. The standard deviation is assumed to be 1.

In many settings, such as in this case study, there may be additional
reporting delays on top of those that are directly observed in the data,
and therefore “nowcastable”, a common example is the delay from exposure
to symptom onset. For these settings we support modelling “latent”
reporting delays as a convolution of the underlying expected counts with
the potential for these delays to vary over time and by group. This
implementation is similar to that implemented in `EpiNow2`

and `epidemia`

as well as other similar models^{[4–7]}. In addition to this, we
support modelling ascertainment through the use of improper probability
mass functions (i.e. by not enforcing a sum to 1 constraint) and
inferring ascertainment where possible (for example day of the week
reporting patterns).

Here we assume an infection to hospitalisation ratio of 2% modified by a random effect for the day of the week to account for within week reporting periodicity. We also assume a lognormal reporting delay with a mean of 5 days and a standard deviation of 2 days, representing the delay from infection to positive test. We assume that the reporting delay is the same for all days of the week. This model is defined as follows,

Where *ν*_{i} is a random
effect for each day of the week which represents reporting
periodicity.

`epinowcast::enw_expectation()`

We first need to define the weekly random walk model for the effective reproduction number. We can do this using the formula interface as follows,

Next we define the generation time distribution. As already discussed, we assume a gamma distribution with mean 4 days and a standard deviation of 3 days. These summary parameters first need to be transformed to the shape and scale parameters used for the gamma distribution. We then need to account for daily censoring and finally normalise the probability mass function (PMF) to account for right truncation.

```
# first transform mean and sd to shape and scale
gamma_mean <- 4
gamma_sd <- 3
gamma_shape <- gamma_mean^2 / gamma_sd^2
gamma_scale <- gamma_sd^2 / gamma_mean
# then discretise the distribution
gt_pmf <- simulate_double_censored_pmf(
max = 15, shape = gamma_shape, scale = gamma_scale, fun_dist = rgamma
) |>
# and normalise it to account for right truncation
(\(x) x / sum(x))()
plot(gt_pmf)
```

Here and for other PMFs we set the maximum delay to be 15 days as this gives good coverage of the majority of the distribution and limits the computational burden of fitting the model.

Now we define the latent reporting delay in a similar way remembering we specified a lognormal distribution with a mean of 5 days and a standard deviation of 2 days. We again also need to account for daily censoring and normalise the PMF to account for right truncation.

```
lgn_mean <- 5
lgn_sd <- 2
meanlog <- log(lgn_mean^2 / sqrt(lgn_sd^2 + lgn_mean^2))
sdlog <- sqrt(log(1 + lgn_sd^2 / lgn_mean^2))
# then discretise the distribution
reporting_pmf <- simulate_double_censored_pmf(
max = 15, meanlog = meanlog, sdlog = sdlog, fun_dist = rlnorm
) |>
# normalise it to account for right truncation
(\(x) x / sum(x))() |>
# and scale it by the infection to hospitalisation ration (2%)
(\(x) x * 0.02)()
plot(reporting_pmf)
```

The last part of the model to define is the remaining part of the ascertainment model for expected hospitalisations. As we have already defined the base ascertainment ratio (i.e. the 2% infection to hospitalisation ratio) we only need to define the day of the week random effect. We can do this using the formula interface as follows,

Finally, we can combine these four parts of the model to define the
expected hospitalisations by date of positive test. We do this using the
`epinowcast::enw_expectation()`

function as follows,

Given case counts both by date of reference and by date of report, we can estimate the reporting delay distribution directly and jointly with the underlying process model, rather than relying on external estimates from other sources (though we may want to account for external information in our priors).

We consider the reporting delay to follow a LogNormal(*μ*^{d}, *σ*^{d})
distribution, with parameters

This distribution is discretised into daily probabilities *p*_{t, d}
and adjusted for the maximum delay, see
`vignette("distributions")`

for details.

Expected notifications by date of positive test (*t*) and reporting date can now be
found by multiplying expected final notifications by date of positive
test for each *t* with the
probability of reporting for each day of delay (*p*_{t, d}).
We assume a Poisson observation model and produce a nowcast of final
observed hospitalisations at each reference date by summing posterior
estimates for unobserved notification and observed notifications for
that reference date.

Before fitting the model we have just defined we need to preprocess
the data in order for it to be in the correct format to work with
`epinowcast`

and to produce metadata that describes aspects
of the data we use in the model (for example the maximum delay, the
number of groups, and the number of observations in each group). We do
this using the `enw_preprocess_data()`

function as follows
for both the real-time and retrospective datasets. Note that we set
`max_delay = 30`

to constrain the modelled maximum delay to
30 days. This is a pragmatic choice to ensure that the model can be fit
in a reasonable amount of time, but we could also set this to be longer
if we wanted to and if the data suggested a longer delay was
possible.

```
rt_germany_pobs <- enw_preprocess_data(rt_germany, max_delay = 30)
rt_germany_pobs
#> obs new_confirm latest
#> <list> <list> <list>
#> 1: <data.table[1425x8]> <data.table[1425x9]> <data.table[62x8]>
#> missing_reference reporting_triangle metareference metareport
#> <list> <list> <list> <list>
#> 1: <data.table[0x4]> <data.table[62x32]> <data.table[62x8]> <data.table[91x10]>
#> metadelay max_delay time snapshots by groups max_date
#> <list> <num> <int> <int> <list> <int> <IDat>
#> 1: <data.table[30x5]> 30 62 62 1 2021-07-01
#> timestep
#> <char>
#> 1: day
```

and do the same for the retrospective data

```
retro_germany_pobs <- enw_preprocess_data(retro_germany, max_delay = 30)
retro_germany_pobs
#> obs new_confirm latest
#> <list> <list> <list>
#> 1: <data.table[1860x8]> <data.table[1860x9]> <data.table[62x8]>
#> missing_reference reporting_triangle metareference
#> <list> <list> <list>
#> 1: <data.table[0x4]> <data.table[62x32]> <data.table[62x8]>
#> metareport metadelay max_delay time snapshots by
#> <list> <list> <num> <int> <int> <list>
#> 1: <data.table[120x10]> <data.table[30x5]> 30 62 62
#> groups max_date timestep
#> <int> <IDat> <char>
#> 1: 1 2021-07-30 day
```

`epinowcast`

modelBefore we are ready to fit the model we need to first specify some
fitting options for using `cmdstanr`

(more information on
these options can be found in the `cmdstanr`

documentation).
We use 2 chains with 2 threads per chain (so using 4 CPU cores in total)
500 warmup samples and 1000 iterations per chain. As this is a complex
model we set `adapt_delta`

to 0.98 (the default is 0.8) so
that the Hamiltonian Monte Carlo sampler can explore the posterior
distribution more efficiently. For the same reason, we have also set the
`max_treedepth`

to 15 (the default is 10). Finally, we set
`save_warmup`

to `FALSE`

to save space (we don’t
need the warmup samples for this analysis), and `pp`

to
`TRUE`

so that we can use posterior predictive checks to
assess the model fit.

```
fit_module <- partial(enw_fit_opts,
chains = 2,
parallel_chains = 2,
threads_per_chain = floor(min(max(1, parallel::detectCores() / 4), 2)),
iter_warmup = 500,
iter_sampling = 1000,
adapt_delta = 0.95,
max_treedepth = 12,
save_warmup = FALSE,
pp = TRUE,
show_messages = FALSE, # set this to TRUE to show fitting messages
refresh = 0 # remove this to show fitting progress
)
```

If you have more than 4 CPU cores available you can increase the
number of threads per chain to make use of these additional cores. For
example if you have 8 CPU cores available you could set
`threads_per_chain = 4`

to use all 8 cores (as there are 2
chains).

We also need to compile the model using `cmdstanr`

before
fitting it (we only have to do this once after installing the package).
In order for this line to work you should have followed the installation
guide in the README. This
will take around a minute to run when called for the first time.

Now we are ready to bring together all the modules we have specified, combine them with the model we have just compiled, and fit our synthetic data. We first fit to the data that would have been available in real-time,

```
germany_nowcast <- epinowcast(
data = rt_germany_pobs,
expectation = expectation_module(data = rt_germany_pobs),
reference = reference_module(data = rt_germany_pobs),
obs = obs_module(data = rt_germany_pobs),
fit = fit_module(),
model = epinowcast_model
)
```

Fitting this model with the default options will take around 5 minutes to run on a laptop with 4 CPU cores. If you have more CPU cores available you can reduce the time it takes to fit the model by increasing the number of threads per chain (see the fitting options section above).

and then on the retrospectively observed data.

```
retro_germany_nowcast <- epinowcast(
data = retro_germany_pobs,
expectation = expectation_module(data = retro_germany_pobs),
reference = reference_module(data = retro_germany_pobs),
obs = obs_module(data = retro_germany_pobs),
fit = fit_module(),
model = epinowcast_model
)
```

You may see a number of warning messages from `cmdstanr`

when fitting the model. These generally occur when the model is being
initialised and are due to this process not currently being optimised in
`epinowcast`

. The model should still fit correctly unless
these warnings continue into the fitting process (outside of the warmup
phase). This is something we are actively working on improving in future
versions of `epinowcast`

.

We first plot the nowcast based on real-time data. Effectively here we are trying to use our model to estimate the hospitalisations that will ultimately be reported based on the data that is currently available. We can compare this to the most recent observations to see how well the model is doing.

Internally, this plotting function (see
`?plot.epinowcast()`

) sums observed data and posterior
estimates for unobserved data to produce a nowcast of the number of
hospitalisations that will ultimately be reported.

In this instance, the model has done a good job of estimating the number of hospitalisations that will ultimately be reported based on the data that is currently available. This is not always the case, and the model can sometimes over or under estimate the number of hospitalisations that will ultimately be reported.

An important thing to note here is that this model is clearly not perfect with more complete data being less well estimated by the model. This likely indicates some misspecification in the reporting delay model which could be fixed by making it more complex or even by making it entirely non-parametric (vs being based on a LogNormal).

We can also plot the nowcast based on the retrospective data. As this plot uses observed data where available this is effectively plotting observed data after a 30 day delay (as this was the maximum we specified) compared to data that will ultimately be observed. This can be used to get a feel for how well specified our maximum delay is. Here we see that a small number of hospitalisations will be reported after 30 days. This will impact the performance of even an ideal model as it will not be able to predict these hospitalisations and so should be accounted for in the model evaluation.

To better understand the fit of the model to data we can instead plot
the posterior predictions for the observed data. This is effectively the
same as the plot above but instead of plotting the posterior predictions
for the unobserved data we plot the posterior predictions for the
observed data. This can be used to get a feel for how well the model is
able to predict the observed data. Rather than using the build in plot
function (by changing `type = "posterior_prediction"`

) we
instead use `enw_plot_pp_quantiles()`

so we can control the
number of references dates we plot (as otherwise the plot will be quite
overwhelming!).

```
plot_select_pp_dates <- function(nowcast, dates) {
nowcast |>
summary(type = "posterior_prediction") |>
(\(x) x[reference_date %in% dates])() |>
enw_plot_pp_quantiles() +
facet_wrap(vars(reference_date), scales = "free")
}
plot_select_pp_dates(
retro_germany_nowcast,
as.Date(c("2021-05-01", "2021-05-14", "2021-06-01", "2021-07-01"))
)
```

It is clear from this plot that for these example dates the model is in the main able to predict the observed data well. However, there are also some aspects of the data that the model is less well able to predict, in particular the apparent periodicity in the reporting delay. This is likely due to the model not being able to account for the day of the week reporting periodicity. This could be fixed by adding a day of the week random effect to the reporting delay model which we demonstrate in the package README.

A key output of the model is the posterior distribution of the effective reproduction number. We can plot this for both the real-time and retrospective data to see how the estimates change as more data becomes available. Ideally, we would hope that the real-time estimates would overlap with the retrospective estimates. This would indicate that our model is able to accurately estimate hospitalisations that will be ultimately reported based on those that have already been reported.

```
get_rt_posterior <- function(nowcast, expectation = expectation_module) {
rt <- enw_posterior(nowcast$fit[[1]], variables = "r")
cols <- c("mean", "median", "q5", "q20", "q80", "q95")
rt[, (cols) := lapply(.SD, exp), .SDcols = cols]
rt <- cbind(
expectation(data = nowcast)$data_raw$r[, .(date)], rt
)
return(rt)
}
rt <- rbind(
get_rt_posterior(germany_nowcast)[, Data := "Real-time"],
get_rt_posterior(retro_germany_nowcast)[, Data := "Retrospective"]
)
ggplot(rt) +
aes(x = date, col = Data, fill = Data) +
geom_line(aes(y = median), linewidth = 1, alpha = 0.6) +
geom_line(aes(y = mean), linetype = 2) +
geom_ribbon(aes(ymin = q5, ymax = q95), alpha = 0.2, linewidth = 0.2) +
geom_ribbon(aes(ymin = q20, ymax = q80, col = NULL), alpha = 0.2) +
geom_hline(yintercept = 1, linetype = 2) +
scale_y_continuous(trans = "log") +
scale_fill_brewer(palette = "Dark2", aesthetics = c("color", "fill")) +
theme_bw() +
labs(
x = "Date of infection",
y = "Effective reproduction number"
) +
theme(legend.position = "bottom")
```