1 Partial Pooling in epidemia


We describe how to partially pool parameters underlying the reproduction numbers. This is done using a special operator in the formula passed to epirt(). If you have previously used any of the lme4, nlmer, gamm4, glmer or rstanarm packages then this syntax will be familiar.

A general R formula is written as y ~ model, where y is the response that is modeled as some function of the linear predictor which is symbolically represented by model. model is made up of a series of terms separated by +. In epidemia, as in many other packages, parameters can be partially pooled by using terms of the form (expr | factor), where both expr and factor are R expressions. expr is a standard linear model (i.e. treated the same as model), and is parsed to produce a model matrix. The syntax (expr | factor) makes explicit that columns in this model matrix have separate effects for different levels of the factor variable.

Of course, separate effects can also be specified using the standard interaction operator :. This however corresponds to no pooling, in that parameters at different levels are given separate priors. The | operator, on the other hand, ensures that effects for different levels are given a common prior. This common prior itself has parameters which are given hyperpriors. This allows information to be shared between different levels of the factor. To be concrete, suppose that the model matrix parsed from expr has \(p\) columns, and that factor has \(L\) levels. The \(p\)-dimensional parameter vector for the \(l\)th group can be denoted by \(\theta_l\). In epidemia, this vector is modeled as multivariate normal with an unknown covariance matrix. Specifically, \[\begin{equation} \theta_{l} \sim N(0, \Sigma), \end{equation}\] where the covariance \(\Sigma\) is given a prior. epidemia offers the same priors for covariance matrices as rstanarm; in particular the decov() and lkj() priors from rstanarm can be used. Note that \(\Sigma\) is not assumed diagonal, i.e.  the effects within each level may be correlated.

If independence is desired for parameters in \(\theta_l\), we can simply replace (expr | factor) with (expr || factor). This latter term effectively expands into \(p\) terms of the form (expr_1 | factor), \(\ldots\), (expr_p | factor), where expr_1 produces the first column of the model matrix given by expr, and so on. From the above discussion, the effects are independent across terms, and essentially \(\Sigma\) is replaced by \(p\) one-dimensional covariance matrices (i.e. variances).

1.1 Example Formulas


The easiest way to become familiar with how the | operator works is to see a multitude of examples. Here, we give many examples, their interpretations, and where possible we compare the models to the no pooling and full pooling equivalents. For a comprehensive reference on mixed model formulas, please see Bates et al. (2015).

There are many possible ways to specify intercepts. Table 1.1 demonstrates some of these, including fully pooled, partially pooled and unpooled. Effects may also be partially pooled. This is shown in Table 1.2.

Table 1.1: Different intercept specifications. The intercept often has an interpretation as setting \(R_0\) in each region. The left hand side of each formula is assumed to take the form R(region, date).
Formula R.H.S. Interpretation
1 + ... Full pooling, common intercept for all regions.
region + ... Separate intercepts for each region, not pooled.
(1 | region) + ... Separate intercepts for each region which are partially pooled.
(1 | continent) + ... Separate intercepts based on a factor other than region, partially pooled.
Table 1.2: Different covariate specifications. Here NPI refers to some non-pharamceutical intervention. The left hand side of each formula is assumed to take the form R(region, date).
Formula R.H.S. Interpretation
1 + npi + ... Full pooling. Effect of NPI the same across all regions.
1 + npi:region + ... No pooling. Separate effect in each region.
1 + (0 + npi|region) + ... Partial pooling. Separate effects in each region.
1 + (npi|region) + ... Right hand side expands to 1 + (1 + npi|region), and so both the intercept and effect are partially pooled.

The final example in Table 1.2 shows that it is important to remember that to parse the term (expr | factor), epim() first parses expr into a model matrix in the same way as functions like lm() and glm() parse models. In this case, the intercept term is implicit. Therefore, if this is to be avoided, we must explicitly use either (0 + npi | region) or (-1 + npi | region).

1.1.1 Independent Effects

By default, the vector of partially pooled intercepts and slopes for each region are correlated. The || operator can be used to specify independence. For example, consider a formula of the form

R(region, date) ~ npi + (npi || region) + ...

The right hand side expands to 1 + npi + (1 | region) + (npi | region) + .... Separate intercepts and effects for each region which are partially pooled. The intercept and NPI effect are assumed independent within regions.

1.1.2 Nested Groupings

Often groupings that are nested. For example, suppose we wish to model an epidemic at quite a fine scale, say at the level of local districts. Often there will be little data for any given district, and so no pooling will give highly variable estimates of reproduction numbers. Nonetheless, pooling at a broad scale, say at the country level may hide region specific variations.

If we have another variable, say county, which denotes the county to which each district belongs, we can in theory use a formula of the form

R(district, date) ~ (1 | county / district) + ...

The right hand side expands to (1 | county) + (1 | county:district). There is a county level intercept, which is partially pooled across different counties. There are also district intercepts which are partially pooled within each county.

2 References


Bates, Douglas, Martin Mächler, Ben Bolker, and Steve Walker. 2015. “Fitting Linear Mixed-Effects Models Using Lme4.” Journal of Statistical Software 67 (1): 1–48. https://doi.org/10.18637/jss.v067.i01.