Here we describe how to partially pool parameters underlying the reproduction
numbers in epidemia. This is done using a special operator in the formula passed to
epirt()
. If you have previously used any of the lme4,
nlmer, gamm4, glmer or rstanarm packages then this
syntax will be familiar.
A general R formula is written as y ~ model
, where y
is the response that
is modeled as some function of the linear predictor which is symbolically
represented by model
. model
is made up of a series of terms separated by
+
. In epidemia, as in many other packages, parameters can be partially pooled
by using terms of the form (expr | factor)
, where both expr
and factor
are
R expressions. expr
is a standard linear model (i.e. treated the same as
model
), and is parsed to produce a model matrix. The syntax (expr | factor)
makes explicit that columns in this model matrix have separate effects for
different levels of the factor variable.
Of course, separate effects can also be specified using the standard interaction
operator :
. This however corresponds to no pooling, in that parameters
at different levels are given separate priors. The |
operator, on the other
hand, ensures that parameters at different levels are given a common prior, which
is itself parameterised by parameters which are given hyperpriors. This allows
information to be shared between levels of the factor. To be concrete, suppose
that the model matrix parse from expr
has \(p\) columns,
and that factor
has \(L\) levels. The \(p\)-dimensional parameter vector for
the \(l\)^{th} group can be denoted by \(\theta_l\). In epidemia, this vector
is modeled as multivariate normal with an unknown covariance matrix. Specifically,
\[\begin{equation}
\theta_{l} \sim N(0, \Sigma),
\end{equation}\]
where the covariance \(\Sigma\) is given a prior. epidemia offers the same
priors for covariance matrices as rstanarm; in particular the rstanarm::decov
and
rstanarm::lkj
priors can be used. Note that \(\Sigma\) is not assumed diagonal, i.e.
the effects within each level may be correlated. For more information on priors
for covariance matrices, please see this article.
If independence is desired for parameters in \(\theta_l\), we can simply replace
(expr | factor)
with (expr || factor)
. This latter term effectively
expands into \(p\) terms of the form (expr_1 | factor)
, \(\ldots\), (expr_p | factor)
,
where expr_1
produces the first column of the model matrix given by expr
,
and so on. From the above discussion, the effects are independent across terms,
and essentially \(\Sigma\) is replaced by \(p\) one-dimensional covariance matrices
(i.e. variances).
The easiest way to become familiar with how the |
operator works is to
see a multitude of examples. Here, we give many examples, their interpretations,
and where possible we compare the models to the no pooling and full pooling equivalents.
For a comprehensive reference on mixed model formulas, please see the
lme4 vignette.
There are many possible ways to specify intercepts.
R(region, date) ~ 1 + ...
: Full pooling, common intercept for all regionsR(region, date) ~ region + ...
: Separate intercepts for each region, not pooled.R(region, date) ~ (1 | region) + ...
Separate intercepts for each region which are partially pooled.R(region, date) ~ (1 | continent) + ...
Separate intercepts based on a factor other than region
, partially pooled.We can also partially pool effects of covariates. Let npi
be some indicator
of a non-pharmaceutical intervention.
R(region, date) ~ 1 + npi + ...
: Full pooling. Effect of NPI the same across all regions.R(region, date) ~ 1 + npi:region + ...
: No pooling. Separate effect in each regionR(region, date) ~ 1 + (0 + npi|region) + ...
: Partial pooling. Separate effects in each region.R(region, date) ~ 1 + (npi|region) + ...
: Right hand side expands to 1 + (1 + npi|region)
, and so
both the intercept and effect are partially pooled.The final example above shows that it is important to remember that to parse the
term (expr | factor)
, epim()
first parses expr
into a model matrix
in the same way as functions like lm
and glm
parse models. In this case,
the intercept term is implicit. Therefore, if this is to be avoided, we must
explicitly use 0 +
or -1 +
.
By default, the vector of partially pooled intercepts and slopes for each region
are correlated. The ||
operator can be used to specify independence. An example:
R(region, date) ~ npi + (npi || region) + ...
: Right hand side expands to
1 + npi + (1 | region) + (npi | region) + ...
. Separate intercepts and
effects for each region which are partially pooled. The intercept and NPI effect
are assumed independent within regions.Often we have groupings that are nested. For example, suppose we are modeling an epidemic at quite a fine scale, say at the level of local districts. Often there will be little data for any given district, and so no pooling will give highly variable estimates of reproduction numbers. Nonetheless, pooling at a broad scale, say at the country level may hide region specific variations.
If we have another variable, say county
, which denotes the county to which
each district belongs, we can in theory use a formula of the form
R(district, date) ~ (1 | county / district) + ...
The right hand side expands to (1 | county) + (1 | county:district)
.
There is a county level intercept, which is partially
pooled across different counties. There are also district intercepts
which are partially pooled within each county.