Overview
estimate_dist() fits a parametric delay distribution
from interval-censored linelist data via MCMC. The model accounts for
double interval censoring of the primary and secondary events, right
truncation due to a finite observation period, and optional left
truncation. The likelihood is provided by the primarycensored
package[1]; see its
documentation for derivations and a fuller account of why these
adjustments are needed[2,3].
The conceptual motivation in EpiNow2 is covered in
vignette("delays"), and a worked example in
vignette("estimate_dist_workflow"). For analyses needing
time-varying delays, partial pooling across strata, or regression on
covariates, epidist builds
on the same likelihood and is the recommended next step.
Data and notation
Each observation \(i = 1, \ldots,
N\) is a record of a primary event (e.g. infection) and a
secondary event (e.g. symptom onset). Both events are observed only as
intervals. Let \(P_i = [p^{-}_i,
p^{+}_i)\) be the primary event window and \(S_i = [s^{-}_i, s^{+}_i)\) be the secondary
event window. We refer to \(w^{P}_i = p^{+}_i
- p^{-}_i\) as the primary window width and to \(w^{S}_i = s^{+}_i - s^{-}_i\) as the
secondary window width. At daily resolution both are typically equal to
one.
The observed delay interval is \([d^{-}_i,
d^{+}_i) = [s^{-}_i - p^{-}_i,\, s^{+}_i - p^{-}_i)\), the
difference between the secondary window and the start of the primary
window. Each observation also has a relative observation time \(D_i\) and an optional left truncation point
\(L_i\), both measured from \(p^{-}_i\). \(D_i\) is the time at which the data were
extracted relative to the primary event start and acts as the right
truncation point. Per-observation weights \(n_i\) allow identical \((d^{-}_i, d^{+}_i, w^{P}_i, D_i)\)
combinations to be aggregated; this affects only computational
efficiency, not the likelihood.
The continuous, unobserved delay \(T_i\) between the true primary event time
\(T^{P}_i\) and the true secondary
event time \(T^{S}_i\) is assumed to
follow a parametric distribution with cumulative distribution function
\(F(\cdot \,|\, \theta)\), where \(\theta\) is the parameter vector. We denote
the primary event distribution within its window by \(f_P\). The model treats \(\theta\) as unknown and \(f_P\) as known.
Likelihood
Truncation
The model handles right truncation at \(D_i\) and optional left truncation at \(L_i\) by renormalising \(F^{*}\):
\[\begin{equation}
F^{*}_{[L_i, D_i]}(t \,|\, \theta, w^{P}_i)
= \frac{F^{*}(t \,|\, \theta, w^{P}_i) - F^{*}(L_i \,|\, \theta,
w^{P}_i)}
{F^{*}(D_i \,|\, \theta, w^{P}_i) - F^{*}(L_i \,|\, \theta,
w^{P}_i)}
\end{equation}\]
for \(L_i \leq t \leq D_i\) and zero
outside the interval. \(D_i\) may be
set to \(\infty\) to indicate no right
truncation; in that case the denominator reduces to \(1 - F^{*}(L_i \,|\, \theta, w^{P}_i)\).
\(L_i\) defaults to \(0\) and is not currently exposed at the R
interface.
Discrete observation likelihood
The probability mass for observation \(i\) is the difference of the truncated,
primary-censored CDF at the secondary window boundaries:
\[\begin{equation}
\Pr(d^{-}_i \leq T_i < d^{+}_i \,|\, \theta, w^{P}_i, L_i, D_i)
= F^{*}_{[L_i, D_i]}(d^{+}_i \,|\, \theta, w^{P}_i)
- F^{*}_{[L_i, D_i]}(d^{-}_i \,|\, \theta, w^{P}_i).
\end{equation}\]
The aggregated log-likelihood used in estimate_dist.stan
is
\[\begin{equation}
\log \mathcal{L}(\theta) = \sum_{i=1}^{N} n_i \log
\Pr(d^{-}_i \leq T_i < d^{+}_i \,|\, \theta, w^{P}_i, L_i, D_i),
\end{equation}\]
implemented as the call to primarycensored_lpmf() inside
the model block.
Untruncated approximation
When \(D_i\) is much larger than the
largest observed delay, the right truncation factor is numerically
indistinguishable from one and the renormalisation in the equation above
can be skipped. estimate_dist() applies this approximation
by setting \(D_i\) to \(\infty\) whenever \(D_i > c\, \max_j d^{+}_j\), where \(c\) is the obs_time_threshold
argument and defaults to \(2\)
following the same heuristic used by epidist. Setting
obs_time_threshold = Inf disables the approximation.
Delay families and parameterisations
The model supports five delay families, selected by the
dist argument. Parameter names match the natural
parameterisation used elsewhere in EpiNow2.
"lognormal" |
meanlog, sdlog |
\(T > 0\) |
"gamma" |
shape, rate |
\(T > 0\) |
"normal" |
mean, sd |
\(T \in \mathbb{R}\) |
"exp" |
rate |
\(T > 0\) |
"weibull" |
shape, scale |
\(T > 0\) |
For "lognormal", "gamma", and
"weibull" with a uniform primary, the primary-censored CDF
\(F^{*}\) has a closed form and is
computed analytically. The other combinations use numerical integration
over the primary window.
Priors
Priors are passed via the priors argument as a named
list of dist_spec objects, with names matching the
parameter names of the chosen family. Each prior is converted into a
Normal() density on the parameter and applied
independently. Lower bounds are imposed for parameters that must be
positive (e.g. sdlog, rate,
shape, scale); the truncation is handled by
params_lp() in
inst/stan/functions/params.stan.
The default priors are family-specific:
\[\begin{align}
\text{lognormal:}\quad \text{meanlog} &\sim \mathrm{Normal}(1, 1),
&
\text{sdlog} &\sim \mathrm{Normal}(0.5, 0.5), \\
\text{gamma:}\quad \text{shape} &\sim \mathrm{Normal}(2, 2), &
\text{rate} &\sim \mathrm{Normal}(0.5, 0.5), \\
\text{normal:}\quad \text{mean} &\sim \mathrm{Normal}(5, 5), &
\text{sd} &\sim \mathrm{Normal}(1, 1), \\
\text{exponential:}\quad \text{rate} &\sim \mathrm{Normal}(0.5,
0.5), \\
\text{Weibull:}\quad \text{shape} &\sim \mathrm{Normal}(2, 2),
&
\text{scale} &\sim \mathrm{Normal}(5, 5).
\end{align}\]
These defaults are deliberately wide and are intended as starting
points rather than fits to any particular delay. Users are encouraged to
set priors that reflect domain knowledge about the delay being
estimated; see vignette("estimate_dist_workflow") for
guidance on translating prior beliefs into the lognormal
parameterisation.