The open-source R (R Core Team 2020) package epidemia provides a framework for Bayesian, regression-oriented modeling of the temporal dynamics of infectious diseases. Typically, but not exclusively, these models are fit to areal time-series; i.e. aggregated event counts for a given population and period. Disease dynamics are described explicitly; observed data are linked to latent infections, which are in turn modeled as a self-exciting process tempered by time-varying reproduction numbers. Regression models are specified for several objects in the model. For example, reproduction numbers are expressed as a transformed predictor, which may include both covariates and autoregressive terms. A range of prior distributions can be specified for unknown parameters by leveraging the functionality of rstanarm (Goodrich et al. 2020). Multilevel models are supported by partially pooling covariate effects appearing in the predictor for reproduction numbers between multiple populations.
The mathematical framework motivating the implemented models has been described in Bhatt et al. (2020). Specific analyses using such models have appeared during the COVID-19 pandemic, and have been used to estimate the effect of control measures (Flaxman et al. 2020; Mellan et al. 2020; Olney et al. 2021), and to forecast disease dynamics under assumed epidemiological parameters and mitigation scenarios (Vollmer et al. 2020; Hawryluk et al. 2020). The modeling approach has been extended to estimate differences in transmissibility between COVID-19 lineages (Faria et al. 2021; Volz et al. 2021).
Models of infectious disease dynamics are commonly classified as either mechanistic or statistical (Myers et al. 2000). Mechanistic models derive infection dynamics from theoretical considerations about how diseases spread within and between communities. Examples include deterministic compartmental models (DCMs) (Kermack and McKendrick 1927, 1932, 1933), which propose differential equations that govern the change in infections over time. These equations are motivated by contacts between individuals in susceptible and infected classes. Purely statistical models, on the other hand, make few assumptions about the transmission mechanism, and instead infer future dynamics from the history of the process and related covariates. Examples include Generalized Linear Models (GLMs), time series approaches such as Auto Regressive Integrated Moving Average (ARIMA) models (Box and Jenkins 1962), and more modern forecasting methods based on machine learning.
epidemia provides models which are semi-mechanistic. These are statistical models that explicitly describe infection dynamics. Self-exciting processes are used to propagate infections in discrete time. Previous infections directly precipitate new infections. Moreover, the memory kernel of the process allows an individual’s infectiousness to depend explicitly on the time since infection. This approach has been used in multiple previous works (Fraser 2007; Cori et al. 2013; Nouvellet et al. 2018; Cauchemez et al. 2008) and has been shown to correspond to a Susceptible-Exposed-Infected-Recovered (SEIR) model when a particular form for the generation distribution is used (Champredon, Dushoff, and Earn 2018). In addition, population adjustments may be applied to account for depletion of the susceptible population. The models are statistical in the sense that they define a likelihood function for the observed data. After also specifying prior distributions for model parameters, samples from the posterior can then be obtained using either Hamiltonian Monte Carlo or Variational Bayes methods.
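The self-exciting mechanism described above can be illustrated with a minimal sketch. It is written in plain Python rather than through epidemia's R interface, and it computes only expected infections under a discrete renewal equation, with no observation model or population adjustment. The function name `simulate_renewal`, the generation distribution `gen`, the seed values, and the reproduction numbers are all assumptions chosen for illustration.

```python
def simulate_renewal(seeds, rt, gen):
    """Expected infections under a discrete renewal equation (illustrative).

    seeds: infections on the initial seeding days
    rt:    reproduction number for each day after the seeding period
    gen:   generation distribution; gen[k - 1] is the weight for lag k days
    """
    infections = list(seeds)
    for r in rt:
        t = len(infections)
        # Infectious load: past infections weighted by time since infection.
        load = sum(
            infections[t - k] * gen[k - 1]
            for k in range(1, min(t, len(gen)) + 1)
        )
        # New infections are the load scaled by the current reproduction number.
        infections.append(r * load)
    return infections


# Hypothetical inputs: a three-day generation distribution and constant R_t.
gen = [0.2, 0.5, 0.3]
seeds = [10.0, 10.0, 10.0]
growing = simulate_renewal(seeds, [1.5] * 10, gen)
declining = simulate_renewal(seeds, [0.7] * 10, gen)
```

With a reproduction number above one the expected infections grow, and below one they decline. In the models described here the reproduction numbers would instead be unknown quantities given a regression structure and priors.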
The Bayesian approach has certain advantages in this context. Several aspects of these models are fundamentally unidentified (Roosa and Chowell 2019). For most diseases, infection counts are not fully observable and suffer from under-reporting (Gibbons et al. 2014). Recorded counts could be explained by a high-infection, low-ascertainment regime, or alternatively by a low-infection, high-ascertainment regime. If a series of mitigation efforts is applied in sequence to control an epidemic, then their effects may be confounded and difficult to disentangle (Bhatt et al. 2020). Bayesian approaches using MCMC allow full exploration of posterior correlations between such coupled parameters. Informative or weakly informative priors may be incorporated to regularize the model and help mitigate identifiability problems, which may otherwise pose difficulties for sampling (Gelman et al. 2008; Gelman and Shalizi 2013).
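The trade-off between infections and ascertainment can be made concrete with a small numerical sketch. It assumes, purely for illustration, that expected recorded counts equal a constant ascertainment rate multiplied by the infections on the same day; the function name and all numbers are hypothetical.

```python
import math


def expected_cases(infections, ascertainment):
    """Expected recorded counts when a fixed fraction of infections is observed."""
    return [ascertainment * i for i in infections]


# Hypothetical infection curve, used only for illustration.
base = [100.0, 150.0, 225.0, 337.5]

# Regime A: four times as many infections, 10% ascertainment.
high_inf_low_asc = expected_cases([4 * i for i in base], 0.1)
# Regime B: the base infections, 40% ascertainment.
low_inf_high_asc = expected_cases(base, 0.4)

# The two regimes imply the same expected data.
same = all(
    math.isclose(a, b)
    for a, b in zip(high_inf_low_asc, low_inf_high_asc)
)
```

Because both regimes imply the same expected counts, the data alone cannot separate them; only prior information on the ascertainment rate or on infections can.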
epidemia’s functionality can be used for a number of purposes. A researcher can simulate infection dynamics under assumed parameters by setting tight priors around the assumed values. It is then possible to sample directly from the prior distribution without conditioning on data. This allows in-silico experimentation; for example, assessing the effect of varying a single parameter (such as reproduction numbers, seeded infections, or the incubation period). Another goal of modeling is to assess whether a simple and parsimonious model of reality can replicate observed phenomena. This helps to isolate the processes that are useful for explaining the data. Models of varying complexity can be specified within epidemia, largely as a result of its regression-oriented framework. Posterior predictive checks can be used to assess model fit. If the model is deemed misspecified, additional features may be considered. These could include population adjustments, explicit modeling of super-spreader events (Wong and Collins 2020), alternative and over-dispersed models for the data, or more flexible functional forms for reproduction numbers or ascertainment rates. Such extensions can be made rapidly within epidemia’s framework.
Forecasting models are critical during an ongoing epidemic as they are used to inform policy decisions under uncertainty. As a sign of their importance, the United States Centers for Disease Control and Prevention (CDC) has run a series of forecasting challenges, including the FluSight seasonal forecasting challenges since 2015 (https://www.cdc.gov/flu/weekly/flusight/) and more recently the COVID-19 Forecast Hub (https://covid19forecasthub.org/). Similar challenges have been run by the European Centre for Disease Prevention and Control (ECDC) (https://covid19forecasthub.eu/). Long-term forecasts quantify the cost of an unmitigated epidemic, and provide a baseline from which to infer the effects of control measures. Short-term forecasts are crucial in informing decisions on how to distribute resources such as PPE or respirators, or whether hospitals should increase capacity and cancel less urgent procedures. Traditional statistical approaches often give unrealistic long-term forecasts as they do not explicitly account for population effects. The semi-mechanistic approach of epidemia combines the strengths of statistical approaches with plausible infection dynamics, and can thus be used for forecasting over different horizons.
Andersson, Håkan, and Tom Britton. 2000. Stochastic Epidemic Models and Their Statistical Analysis. Vol. 151. New York, NY: Springer New York. https://doi.org/10.1007/978-1-4612-1158-7.
Bhatt, Samir, Neil Ferguson, Seth Flaxman, Axel Gandy, Swapnil Mishra, and James A Scott. 2020. “Semi-Mechanistic Bayesian Modeling of COVID-19 with Renewal Processes.” arXiv Preprint arXiv:2012.00394. https://arxiv.org/abs/2012.00394.
Box, G. E. P., and G. M. Jenkins. 1962. “Some Statistical Aspects of Adaptive Optimization and Control.” Journal of the Royal Statistical Society: Series B (Methodological) 24 (2): 297–331. https://doi.org/10.1111/j.2517-6161.1962.tb00460.x.
Cauchemez, Simon, Alain Jacques Valleron, Pierre Yves Boëlle, Antoine Flahault, and Neil M. Ferguson. 2008. “Estimating the impact of school closure on influenza transmission from Sentinel data.” Nature. https://doi.org/10.1038/nature06732.
Champredon, David, Jonathan Dushoff, and David J. D. Earn. 2018. “Equivalence of the Erlang-distributed SEIR epidemic model and the renewal equation.” SIAM Journal on Applied Mathematics. https://doi.org/10.1137/18M1186411.
Chatzilena, Anastasia, Edwin van Leeuwen, Oliver Ratmann, Marc Baguelin, and Nikolaos Demiris. 2019. “Contemporary statistical inference for infectious disease models using Stan.” Epidemics 29 (December). https://doi.org/10.1016/j.epidem.2019.100367.
Cori, Anne. 2020. EpiEstim: Estimate Time Varying Reproduction Numbers from Epidemic Curves. https://cran.r-project.org/package=EpiEstim.
Cori, Anne, Neil M. Ferguson, Christophe Fraser, and Simon Cauchemez. 2013. “A new framework and software to estimate time-varying reproduction numbers during epidemics.” American Journal of Epidemiology. https://doi.org/10.1093/aje/kwt133.
Doremalen, Neeltje van, Trenton Bushmaker, Dylan H. Morris, Myndi G. Holbrook, Amandine Gamble, Brandi N. Williamson, Azaibi Tamin, et al. 2020. “Aerosol and Surface Stability of SARS-CoV-2 as Compared with SARS-CoV-1.” New England Journal of Medicine 382 (16). https://doi.org/10.1056/NEJMc2004973.
Faria, Nuno R., Thomas A. Mellan, Charles Whittaker, Ingra M. Claro, Darlan da S. Candido, Swapnil Mishra, Myuki A. E. Crispim, et al. 2021. “Genomics and epidemiology of the P.1 SARS-CoV-2 lineage in Manaus, Brazil.” Science, April. https://doi.org/10.1126/science.abh2644.
Flaxman, Seth, Swapnil Mishra, Axel Gandy, H Juliette T Unwin, Thomas A Mellan, Helen Coupland, Charles Whittaker, et al. 2020. “Estimating the effects of non-pharmaceutical interventions on COVID-19 in Europe.” Nature. https://doi.org/10.1038/s41586-020-2405-7.
Fraser, Christophe. 2007. “Estimating individual and household reproduction numbers in an emerging epidemic.” PLoS ONE. https://doi.org/10.1371/journal.pone.0000758.
Gelman, Andrew, Aleks Jakulin, Maria Grazia Pittau, and Yu-Sung Su. 2008. “A weakly informative default prior distribution for logistic and other regression models.” The Annals of Applied Statistics 2 (4). https://doi.org/10.1214/08-AOAS191.
Gelman, Andrew, and Cosma Rohilla Shalizi. 2013. “Philosophy and the practice of Bayesian statistics.” British Journal of Mathematical and Statistical Psychology 66 (1). https://doi.org/10.1111/j.2044-8317.2011.02037.x.
Gibbons, Cheryl L, Marie-Josée J Mangen, Dietrich Plass, Arie H Havelaar, Russell John Brooke, Piotr Kramarz, Karen L Peterson, et al. 2014. “Measuring underreporting and under-ascertainment in infectious disease datasets: a comparison of methods.” BMC Public Health 14 (1). https://doi.org/10.1186/1471-2458-14-147.
Goodrich, Ben, Jonah Gabry, Imad Ali, and Sam Brilleman. 2020. “rstanarm: Bayesian applied regression modeling via Stan.” https://mc-stan.org/rstanarm/.
Grinsztajn, Léo, Elizaveta Semenova, Charles C Margossian, and Julien Riou. 2021. “Bayesian workflow for disease transmission modeling in Stan.” http://arxiv.org/abs/2006.02985.
Groendyke, Chris, and David Welch. 2018. “epinet: An R Package to Analyze Epidemics Spread across Contact Networks.” Journal of Statistical Software 83 (11). https://doi.org/10.18637/jss.v083.i11.
Hauser, Anthony, Michel J. Counotte, Charles C. Margossian, Garyfallos Konstantinoudis, Nicola Low, Christian L. Althaus, and Julien Riou. 2020. “Estimation of SARS-CoV-2 mortality during the early stages of an epidemic: A modeling study in Hubei, China, and six regions in Europe.” PLOS Medicine 17 (7). https://doi.org/10.1371/journal.pmed.1003189.
Hawryluk, Iwona, Thomas A. Mellan, Henrique Hoeltgebaum, Swapnil Mishra, Ricardo P. Schnekenberg, Charles Whittaker, Harrison Zhu, et al. 2020. “Inference of COVID-19 epidemiological distributions from Brazilian hospital data.” Journal of the Royal Society Interface 17 (172). https://doi.org/10.1098/rsif.2020.0596.
Held, Leonhard, Michael Höhle, and Mathias Hofmann. 2005. “A statistical framework for the analysis of multivariate infectious disease surveillance counts.” Statistical Modelling 5 (3). https://doi.org/10.1191/1471082X05st098oa.
Held, Leonhard, and Michaela Paul. 2012. “Modeling seasonality in space-time infectious disease surveillance data.” Biometrical Journal 54 (6). https://doi.org/10.1002/bimj.201200037.
Höhle, Michael, and Ulrike Feldmann. 2007. “RLadyBug—An R package for stochastic epidemic models.” Computational Statistics & Data Analysis 52 (2). https://doi.org/10.1016/j.csda.2006.11.016.
Jenness, Samuel M., Steven M. Goodreau, and Martina Morris. 2018. “EpiModel: An R Package for Mathematical Modeling of Infectious Disease over Networks.” Journal of Statistical Software 84 (8). https://doi.org/10.18637/jss.v084.i08.
Kermack, William Ogilvy, and A. G. McKendrick. 1927. “A contribution to the mathematical theory of epidemics.” Proceedings of the Royal Society of London. Series A, Containing Papers of a Mathematical and Physical Character. https://doi.org/10.1098/rspa.1927.0118.
———. 1932. “Contributions to the mathematical theory of epidemics. II. —The problem of endemicity.” Proceedings of the Royal Society of London. Series A, Containing Papers of a Mathematical and Physical Character 138 (834). https://doi.org/10.1098/rspa.1932.0171.
———. 1933. “Contributions to the mathematical theory of epidemics. III.—Further studies of the problem of endemicity.” Proceedings of the Royal Society of London. Series A, Containing Papers of a Mathematical and Physical Character 141 (843). https://doi.org/10.1098/rspa.1933.0106.
Liboschik, Tobias, Konstantinos Fokianos, and Roland Fried. 2017. “tscount: An R Package for Analysis of Count Time Series Following Generalized Linear Models.” Journal of Statistical Software 82 (5): 1–51. https://doi.org/10.18637/jss.v082.i05.
Mellan, Thomas A., Henrique H. Hoeltgebaum, Swapnil Mishra, Charlie Whittaker, Ricardo P. Schnekenberg, Axel Gandy, H. Juliette T. Unwin, et al. 2020. “Subnational analysis of the COVID-19 epidemic in Brazil.” https://doi.org/10.1101/2020.05.09.20096701.
Merl, Daniel, Leah R. Johnson, Robert B. Gramacy, and Marc Mangel. 2010. “amei: An R Package for the Adaptive Management of Epidemiological Interventions.” Journal of Statistical Software 36 (6). https://doi.org/10.18637/jss.v036.i06.
Meyer, Sebastian, Leonhard Held, and Michael Höhle. 2017. “Spatio-Temporal Analysis of Epidemic Phenomena Using the R Package surveillance.” Journal of Statistical Software 77 (11). https://doi.org/10.18637/jss.v077.i11.
Myers, M. F., D. J. Rogers, J. Cox, A. Flahault, and S. I. Hay. 2000. “Forecasting disease risk for increased epidemic preparedness in public health.” Advances in Parasitology 47: 309–30. https://doi.org/10.1016/s0065-308x(00)47013-2.
Nouvellet, Pierre, Anne Cori, Tini Garske, Isobel M. Blake, Ilaria Dorigatti, Wes Hinsley, Thibaut Jombart, et al. 2018. “A simple approach to measure transmissibility and forecast incidence.” Epidemics. https://doi.org/10.1016/j.epidem.2017.02.012.
Obadia, Thomas, Romana Haneef, and Pierre-Yves Boëlle. 2012. “The R0 package: a toolbox to estimate reproduction numbers for epidemic outbreaks.” BMC Medical Informatics and Decision Making 12 (1). https://doi.org/10.1186/1472-6947-12-147.
Olney, Andrew M, Jesse Smith, Saunak Sen, Fridtjof Thomas, and H Juliette T Unwin. 2021. “Estimating the Effect of Social Distancing Interventions on COVID-19 in the United States.” American Journal of Epidemiology, January. https://doi.org/10.1093/aje/kwaa293.
Paul, M., and L. Held. 2011. “Predictive assessment of a non-linear random effects model for multivariate time series of infectious disease counts.” Statistics in Medicine 30 (10). https://doi.org/10.1002/sim.4177.
Paul, M., L. Held, and A. M. Toschke. 2008. “Multivariate modelling of infectious disease surveillance data.” Statistics in Medicine 27 (29). https://doi.org/10.1002/sim.3440.
R Core Team. 2020. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
Roosa, Kimberlyn, and Gerardo Chowell. 2019. “Assessing parameter identifiability in compartmental dynamic models using a computational approach: application to infectious disease transmission models.” Theoretical Biology and Medical Modelling 16 (1). https://doi.org/10.1186/s12976-018-0097-6.
Stan Development Team. 2018. “The Stan Core Library.” http://mc-stan.org/.
———. 2020. “RStan: the R interface to Stan.” https://mc-stan.org/.
Vasileios, Siakoulis. 2015. acp: Autoregressive Conditional Poisson. https://cran.r-project.org/package=acp.
Vollmer, Michaela A. C., Swapnil Mishra, H. Juliette T Unwin, Axel Gandy, Thomas A. Mellan, Valerie Bradley, Harrison Zhu, et al. 2020. “Report 20: Using mobility to estimate the transmission intensity of COVID-19 in Italy: A subnational analysis with future scenarios.” https://doi.org/10.1101/2020.05.05.20089359.
Volz, Erik, Swapnil Mishra, Meera Chand, Jeffrey C. Barrett, Robert Johnson, Lily Geidelberg, Wes R. Hinsley, et al. 2021. “Assessing transmissibility of SARS-CoV-2 lineage B.1.1.7 in England.” Nature. https://doi.org/10.1038/s41586-021-03470-x.
Wallinga, Jacco, and Peter Teunis. 2004. “Different epidemic curves for severe acute respiratory syndrome reveal similar impacts of control measures.” American Journal of Epidemiology. https://doi.org/10.1093/aje/kwh255.
Wong, Felix, and James J. Collins. 2020. “Evidence that coronavirus superspreading is fat-tailed.” Proceedings of the National Academy of Sciences 117 (47). https://doi.org/10.1073/pnas.2018490117.