Here we demonstrate basic usage of **epidemia**; in particular we infer time-varying
reproduction numbers during the 1918 H1N1 Influenza pandemic in Baltimore using
only daily case data and a discrete serial interval. \(R_t\) is modeled as
following a weekly random walk. Of course **epidemia** is capable of much more
complex modeling, and such cases are considered in future sections.

In addition to inferring reproduction numbers and seeded infections, we show how to do posterior predictive checks and how to extend the model to treats latent infections as parameters.

First use the following command.

options(mc.cores = parallel::detectCores())

This ensure that chains are run in parallel rather than sequentially. The data for this example is provided by the R package EpiEstim.

```
## $incidence
## [1] 5 1 6 15 2 3 8 7 2 15 4 17 4 10 31 11 13 36 13
## [20] 33 17 15 32 27 70 58 32 69 54 80 405 192 243 204 280 229 304 265
## [39] 196 372 158 222 141 172 553 148 95 144 85 143 87 73 70 62 116 44 38
## [58] 60 45 60 27 51 34 22 16 11 18 11 10 8 13 3 3 6 6 13
## [77] 5 6 6 5 5 1 2 2 3 8 4 1 2 3 1 0
##
## $si_distr
## [1] 0.000 0.233 0.359 0.198 0.103 0.053 0.027 0.014 0.007 0.003 0.002 0.001
```

First we form the `data`

argument for the model fitting function
`epim()`

. Recall that this must contain all observations (case data), and
covariates to fit the model. Since \(R_t\) will follow a weekly random walk, the
only covariate required is a `week`

column.

date <- as.Date("1918-01-01") + seq(0, along.with = c(NA, Flu1918$incidence)) data <- data.frame( city = "Baltimore", cases = c(NA, Flu1918$incidence), date = date, week = week(date) )

`date`

is constructed so that the first observations are seen on the second
day of the epidemic rather than the first. This is so that the first observation
can be explained by seeded infections on the previous day.

We need to specify models for \(R_t\) and the case data. \(R_t\) is modeled through
a call to `epirt()`

. The parameterisation of \(R_t\) is defined by the
`formula`

argument. A random walk can be specified using the `rw()`

function. This has an optional `time`

argument which tells **epidemia** which
column in `data`

to use to define the random walk increments. If unspecified it
occurs at the same frequency as \(R_t\), i.e. a daily random walk is used. The random
walk error is modeled as half-normal with a scale hyperparameter. The value of
this is set using the `prior_scale`

argument. This is used in the snippet
below.

rt <- epirt( formula = R(city, date) ~ rw(time = week, prior_scale = 0.1), prior_intercept = normal(log(2), 0.2), link = 'log' )

In this case the prior on the intercept specifies the prior on \(R_0\). Since a log link is being used, the prior above gives \(R_0\) a prior mean of around 2.

Each observation vector is modeled through a call to `epiobs()`

. These are
then collected into a list and passed to `epim()`

as the `obs`

argument. In this case only case data is used and so their is only one call
to `epiobs()`

in the list.

obs <- epiobs( formula = cases ~ 1, prior_intercept = rstanarm::normal(location=1, scale=0.01), link = "identity", i2o = rep(.25,4) )

For the purposes of this exercise we have assumed that roughly all infections
manifest as a case. There is a small symmetric error around this to account for
possible over/under-reporting. The `i2o`

argument implies that cases are
recorded with equal probability in any of the four days post infection.

Finally, we collect all remaining arguments required for `epim()`

.

args <- list( rt = rt, inf = epiinf(gen = Flu1918$si_distr), obs = obs, data = data, iter = 1e3, seed = 12345 )

The argument `gen`

takes a discrete generation distribution. Here we have
used the serial interval provided by EpiEstim. `algorithm = "sampling"`

ensures that the model is fit using Stan’s adaptive HMC sampler. `iter`

and
`seed`

set the number of MCMC iterations and seeds respectively, and are passed
directly on to `rstan::sampling`

.

fm1 <- do.call(epim, args)

The fitted model `fm1`

does not treat infections as latent parameters, and
propagates infections using the renewal process. To extend
the model as described in Section 5.2 of the
model description, we simply use `latent = TRUE`

in
the call to `epiinf()`

.

args$inf <- epiinf(gen = Flu1918$si_distr, latent=TRUE, prior_aux = normal(10,2)) fm2 <- do.call(epim, args)

The results of both models are shown in 1. The key difference stems from the infection process. In the basic model, infections are deterministic given the reproduction numbers and seeds. However when infection counts are low, we generally expect high variance in the infection process. Since this variance is unaccounted for, the model appears to place too much confidence in the reproduction numbers in this setting. The extended model on the other hand, has much wider credible intervals for \(R\) when infections are low. This is intuitive: when counts are low, changes in infections could be explained by either the variance in the offspring distribution of those who are infected, or by changes in the \(R\) value. This model captures this intuition.

Posterior predictive checks are shown in the bottom panel of Figure 1, and show that both models are able to fit the data well.