Here we demonstrate basic usage of epidemia; in particular we infer time-varying reproduction numbers during the 1918 H1N1 Influenza pandemic in Baltimore using only daily case data and a discrete serial interval. \(R_t\) is modeled as following a weekly random walk. Of course epidemia is capable of much more complex modeling, and such cases are considered in future sections.
In addition to inferring reproduction numbers and seeded infections, we show how to do posterior predictive checks and how to extend the model to treats latent infections as parameters.
First use the following command.
This ensure that chains are run in parallel rather than sequentially. The data for this example is provided by the R package EpiEstim.
## $incidence ##  5 1 6 15 2 3 8 7 2 15 4 17 4 10 31 11 13 36 13 ##  33 17 15 32 27 70 58 32 69 54 80 405 192 243 204 280 229 304 265 ##  196 372 158 222 141 172 553 148 95 144 85 143 87 73 70 62 116 44 38 ##  60 45 60 27 51 34 22 16 11 18 11 10 8 13 3 3 6 6 13 ##  5 6 6 5 5 1 2 2 3 8 4 1 2 3 1 0 ## ## $si_distr ##  0.000 0.233 0.359 0.198 0.103 0.053 0.027 0.014 0.007 0.003 0.002 0.001
First we form the
data argument for the model fitting function
epim(). Recall that this must contain all observations (case data), and
covariates to fit the model. Since \(R_t\) will follow a weekly random walk, the
only covariate required is a
date is constructed so that the first observations are seen on the second
day of the epidemic rather than the first. This is so that the first observation
can be explained by seeded infections on the previous day.
We need to specify models for \(R_t\) and the case data. \(R_t\) is modeled through
a call to
epirt(). The parameterisation of \(R_t\) is defined by the
formula argument. A random walk can be specified using the
function. This has an optional
time argument which tells epidemia which
data to use to define the random walk increments. If unspecified it
occurs at the same frequency as \(R_t\), i.e. a daily random walk is used. The random
walk error is modeled as half-normal with a scale hyperparameter. The value of
this is set using the
prior_scale argument. This is used in the snippet
In this case the prior on the intercept specifies the prior on \(R_0\). Since a log link is being used, the prior above gives \(R_0\) a prior mean of around 2.
Each observation vector is modeled through a call to
epiobs(). These are
then collected into a list and passed to
epim() as the
argument. In this case only case data is used and so their is only one call
epiobs() in the list.
For the purposes of this exercise we have assumed that roughly all infections
manifest as a case. There is a small symmetric error around this to account for
possible over/under-reporting. The
i2o argument implies that cases are
recorded with equal probability in any of the four days post infection.
Finally, we collect all remaining arguments required for
gen takes a discrete generation distribution. Here we have
used the serial interval provided by EpiEstim.
algorithm = "sampling"
ensures that the model is fit using Stan’s adaptive HMC sampler.
seed set the number of MCMC iterations and seeds respectively, and are passed
directly on to
fm1 <- do.call(epim, args)
The fitted model
fm1 does not treat infections as latent parameters, and
propagates infections using the renewal process. To extend
the model as described in Section 5.2 of the
model description, we simply use
latent = TRUE in
the call to
The results of both models are shown in 1. The key difference stems from the infection process. In the basic model, infections are deterministic given the reproduction numbers and seeds. However when infection counts are low, we generally expect high variance in the infection process. Since this variance is unaccounted for, the model appears to place too much confidence in the reproduction numbers in this setting. The extended model on the other hand, has much wider credible intervals for \(R\) when infections are low. This is intuitive: when counts are low, changes in infections could be explained by either the variance in the offspring distribution of those who are infected, or by changes in the \(R\) value. This model captures this intuition.
Posterior predictive checks are shown in the bottom panel of Figure 1, and show that both models are able to fit the data well.