The package currently supports two models both with various options. A baseline single strain model and a two strain model with flexible variable dynamics. Both share the same underlying structure with expected notifications being assumed to be a product of past expected notifications, and an exponentiated time-varying term. This term is then modelled in both instances as a differenced autoregressive process with a signal lag term. In the two strain model this process can either be modelled jointly for variants, as a correlated process, or independently. By default it is assumed to vary inline with the timestep of the data but this can be altered to speed up computation or to improve out of sample performance. Observed cases are estimated from expected cases by assuming either a negative binomial or Poisson observation model. In the two strain model variant of concern positive samples are estimated using a Beta-binomial or binomial observation model. Detailed model definitions are given below. Prior values when given represent the package defaults and can, and generally should, altered by the user based on their domain specific knowledge.
We model the mean (λt) of reported cases (Ct) as an order 1 autoregressive (AR(P)) process by time unit (t). The model is initialised by assuming that the initial reported cases are representative with a small amount of error (2.5%) for each t < P.
Where rt can be
interpreted as the growth rate and the exponential of rt as the
effective reproduction number (Rt) assuming a
mean generation time equal to the scaling rate used (see the
documentation for scale_r
in ?forecast()
).
rt is then
itself modelled as a piecewise constant differenced AR(1) process,
Where,
We then assume a negative binomial observation model with overdispersion ϕc for reported cases (Ct),
Where σ, and $\frac{1}{\sqrt{phi_c}}$ are truncated to be
greater than 0 and β is
truncated to be between -1 and 1. Optionally a Poisson observation model
may instead be used (see the documentation for
overdispersion
in ?forecast()
).
The stan code for this model available here
or can be loaded into R
using the following code,
We model strain dynamics using the single strain model as a starting point but with the addition of strain specific AR(P) case model and a beta binomial (or optionally binomial) observation process for sequence data. The variant of concerns growth rate is modelled either as a fixed scaling of the non-variant of concern growth rate, as an independent AR(1) process, or in a vector autoregression framework as a correlated AR(1) process. This last formulation is described in detail below along with the modifications required to define the other two formulations. Parameters related to the variant of concern (VoC) are given the δ superscript and parameters related to non-VoC cases are given the o superscript.
Mean reported cases are again defined using a AR(1) process on the log scale for each strain and then combined for overall mean reported cases.
Where C0δ is derived by calculating the mean proportion of cases that had the VoC for the first time point using the overall number of reported cases, the number of sequenced cases, and the number of sequences that were positive for the VoC. The growth rate for both VoC and non-VoC cases (rto, δ) is again modelled as differenced AR(1) processes but now combined with a variant specific modifier (s0o, δ) to growth when variant sequences are first reported (tseq), and a correlation structure for the time and strain varying error terms (ηo, δ).
Where,
Where Σ is a 2 × 2 covariance matrix which we decompose for computational stability into a diagonal matrix containing variant specific scale parameters (Δ) and a symmetric correlation matrix (Ω) as follows[1],
Where Ω has a Lewandowski-Kurowicka-Joe (LKJ) prior where ω controls the prior expectation for correlation between variants (with ω = 1 resulting in a uniform prior over all correlations, ω < 1 placing more weight on larger correlations and ω > 1 placing more weight on small amounts of correlations). By default we set ω = 0.5 giving more weight to correlations between variants dynamics.
On top of this correlated strain dynamic model
forecast.vocs
also offers two other options that represent
extremes in the correlated model. The first of these assumes that both
strains evolve with independent differenced AR(1) growth rates with only
a scaling factor (s^{}) linking them. The second assumes that strains
growth rates are completely correlated in time (i.e they are governed by
a single AR(1) differenced process) and only differ based on a scaling
factor (sδ). See the
documentation for variant_relationship
in
?forecast()
for details.
We then assume a negative binomial observation model with overdispersion ϕc for reported cases (Ct) as in the single strain model,
Where σ, and $\frac{1}{\sqrt{phi_c}}$ are truncated to be
greater than 0 and β is
truncated to be between -1 and 1. Again a Poisson observation model may
instead be used (see the documentation for overdispersion
in ?forecast()
).
Finally, the mean proportion of samples that have the VoC (pt) is then estimated using the mean reported cases with the VoC and the overall mean reported cases.
We assume a beta binomial observation model for the number of sequences (Nt) that are positive (Pt) for the VoC with overdispersion ϕs.
Where σo, δ,
and $\frac{1}{\sqrt{\phi_s}}$ are
truncated to be greater than 0. A binomial observation model is also
available (see the documentation for overdispersion
in
?forecast()
).
The stan code for this model available here
or can be loaded into R
using the following code,
As well as posterior predictions and forecasts for both notifications by variant and variant of concern proportion the models also return several summary statistics which may be useful for drawing inferences about underlying transmission dynamics. These include the log scale growth rate (gto, δ), the instantaneous effective reproduction number (Rto, δ), and the transmission advantage of the variant of concern (At). These are calculated as follows:
Ts is a
user set scaling parameter that defines the timespan over which the
summary metrics apply dependent on the time step of the data. It can be
set using the scale_r
and defaults to 1 which returns
summary statistics scaled to the timestep of the data. Depending on the
setup of the model used these summary measures will be more or less
related to their epidemiological definitions. In particular, adding a
weighting to past expected cases that is more complex than a simple lag
may cause interpretation issues.
The models are implemented in stan
using
cmdstanr
with no defaults altered[1,2]. Due to the complexity of the
posterior it is likely that increasing the adapt_delta
may
be required to mitigate potential bias in posterior estimates[3]. forecast.vocs
incorporates additional functionality written in R[4] to enable plotting forecasts and
posterior predictions, summarising forecasts, and scoring them using
scoringutils
[5].
All functionality is modular allowing users to extend and alter the
underlying model whilst continuing to use the package framework.