British statistician George Box famously said that all models are wrong, but some are useful. Here’s what he meant, and why you may find disease modeling for the novel coronavirus so confusing.
Disease models help us understand the behavior of infectious disease in a population by providing projections and forecasts. Projections allow us to see what might happen under different conditions — for example, how modifying our behavior through social distancing might change when and how many people get sick. It is often a good thing when observed reality is different from a model’s projection because it means that we were able to control the course of disease with an intervention.
Forecasts differ from projections in that they tell us what we can expect to happen given current conditions, much like a weather forecast. Perhaps even more importantly, traditional epidemiologic models provide a framework for understanding very complex problems and for prioritizing the areas where attention, resources, and research are most needed.
Approaches to disease modeling
There are several ways to approach disease modeling, depending on the information you have and the questions you are interested in asking. A common output of a disease model is an epidemic curve (Figure 1) that shows when and how many people get sick or die.
Many of the models produced for COVID-19 are based on the traditional epidemiologic compartment model of disease, which divides a population into discrete boxes (“compartments”) and determines the rate at which individuals move between disease states. Many such models are also called SIR or SEIR models based on the labels of the compartments (S = susceptible, E = exposed, I = infectious, R = recovered or removed).1 The compartments are very flexible in that they can be subdivided, or new ones added, to capture important disease characteristics, such as the proportion of infectious individuals who require isolation, hospitalization, or ventilation. Differential equations are then used to calculate the rates of change between compartments, typically represented by Greek letters, as a disease progresses through a population in order to generate an epidemic curve (Figure 2).
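To make the compartment idea concrete, here is a minimal sketch of an SEIR model in Python. It is not the code behind any of the models named in this article. The parameter names (beta, sigma, gamma) and every numeric value are assumptions chosen only for illustration, and the equations are stepped forward with a simple fixed-step (Euler) method rather than a production-quality solver.

```python
def simulate_seir(population, initial_infectious, beta, sigma, gamma, days, dt=0.1):
    """Step a basic SEIR model forward in time with fixed-step Euler updates.

    beta  = transmission rate per day (assumed value)
    sigma = 1 / average latent period in days (assumed value)
    gamma = 1 / average infectious period in days (assumed value)
    Returns one (day, S, E, I, R) tuple per simulated day.
    """
    s = float(population - initial_infectious)
    e = 0.0
    i = float(initial_infectious)
    r = 0.0
    steps_per_day = round(1 / dt)
    history = [(0, s, e, i, r)]
    for day in range(1, days + 1):
        for _ in range(steps_per_day):
            # Flows between compartments during one small time step
            new_exposed = beta * s * i / population * dt
            new_infectious = sigma * e * dt
            new_removed = gamma * i * dt
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - new_removed
            r += new_removed
        history.append((day, s, e, i, r))
    return history


# Hypothetical inputs: 100,000 people, 10 initial cases,
# a 5-day latent period and a 7-day infectious period.
curve = simulate_seir(population=100_000, initial_infectious=10,
                      beta=0.5, sigma=1 / 5, gamma=1 / 7, days=200)
peak_day, peak_infectious = max(((day, i) for day, _, _, i, _ in curve),
                                key=lambda pair: pair[1])
print(f"Infectious count peaks on day {peak_day} at about {peak_infectious:,.0f} people")
```

Plotting the I column against the day produces the familiar epidemic curve. Rerunning with a smaller beta, as a stand-in for social distancing, gives a lower and later peak.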
Compartment models are very helpful for understanding which of the variables that affect disease behavior matter most; however, they require good input data to generate accurate projections. This can be a challenge for a novel disease such as COVID-19, for which many input variables are not well understood. Another potential drawback is that they use a “mass action” principle to explain disease spread. (The mass action principle can be understood basically as a bunch of balls bouncing around inside a compartment. If an “infectious” ball bounces into a “susceptible” ball, the susceptible ball becomes an infectious ball, which can in turn infect other susceptible balls until recovered.2)
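In equation form, the bouncing-balls picture is commonly written as shown here. This is a standard textbook form, not something taken from any particular model named in this article, and the symbols β (transmission rate) and N (total population) are assumptions introduced for illustration.

```latex
\frac{dS}{dt} = -\beta \, \frac{S \, I}{N}
```

The number of new infections per unit of time is proportional to the product of the susceptible (S) and infectious (I) counts, so doubling either one doubles the rate of new infections. The formula treats every individual as equally likely to meet every other individual, which is exactly the simplification that the mass action principle makes.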
Although this type of modeling works relatively well at the population level, it may not accurately represent what actually happens with infectious diseases such as COVID-19, for which social networks and super-spreader events appear to play a large role in transmission. Examples of compartment models include those from Youyang Gu (SEIS; S = susceptible, E = exposed, I = infectious, S = susceptible again), Columbia (SEIR), and Northeastern University (SLIR; S = susceptible, L = latent, I = infectious, R = removed).
Other models attempt to compensate for the lack of knowledge about specific variables by fitting a curve to data observed in other countries that are further along in the disease process, and then applying this curve to the current pattern of data in a given country or state where the disease has not yet progressed as far. The best-known model using this approach is the original model from the Institute for Health Metrics and Evaluation (IHME).3 It used the starting dates of restrictions on mass gatherings, school closings and stay-at-home orders to model projections in the U.S. based on other countries such as China and South Korea (Figure 3).
A strength of this model is that it shows both the model’s uncertainty and the actual data observed. It is important to note that the intervals around the estimate do not represent best- and worst-case estimates, as is commonly assumed, but rather the range of uncertainty around the model’s estimate under what is close to a best-case scenario (i.e., that the impact of the distancing interventions in the U.S. will approximate that of the more extreme China lockdowns or the more sophisticated South Korean contact tracing).
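The curve-fitting idea can be sketched in a few lines of Python. This is a deliberately simplified illustration and not the IHME method: it assumes the reference location's cumulative curve has a logistic shape, and it fits only a single scale factor by least squares. All names and numbers are hypothetical, and the "observed" values are synthetic, generated inside the script.

```python
import math


def logistic(day, plateau, growth_rate, midpoint):
    """S-shaped cumulative curve that levels off at `plateau`."""
    return plateau / (1 + math.exp(-growth_rate * (day - midpoint)))


# Assumed shape of the cumulative curve in a location further along,
# normalized so that it levels off at 1.0 (days 0 through 60).
reference_shape = [logistic(day, 1.0, 0.2, 30) for day in range(61)]

# Synthetic early counts for the target location (first 15 days only).
# They are generated here with a deterministic wobble standing in for
# reporting noise; they are not real data.
observed_early = [round(5000 * reference_shape[day] * (1 + 0.05 * math.sin(day)))
                  for day in range(15)]

# Least-squares scale factor that best maps the reference shape
# onto the early observations.
numerator = sum(obs * ref for obs, ref in zip(observed_early, reference_shape))
denominator = sum(ref * ref for ref in reference_shape[:len(observed_early)])
scale = numerator / denominator

# Apply the scaled reference curve to the full 61-day window.
projection = [scale * ref for ref in reference_shape]
print(f"Fitted scale factor: {scale:,.0f}")
print(f"Projected cumulative count on day 60: {projection[-1]:,.0f}")
```

The projection is only as good as the assumption that the target location will follow the same shape as the reference location, which is why the choice of comparison countries matters so much.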
What is the Association for Veterinary Informatics?
Veterinary clinical informatics blends information technology, communications, social and behavioral science, and veterinary medicine to improve the quality and safety of patient care. Founded in 1981, the Association for Veterinary Informatics (AVI) is a nonprofit international and interdisciplinary organization comprised of individuals involved in biomedical informatics research, design, implementation, education and advocacy within the domain of veterinary medicine.
The group’s mission is to guide and transform the veterinary profession in understanding, using and extending the practice of informatics. AVI members have access to exclusive member content, including news, a member directory, past symposia details, and forums.
A final class is agent-based models (ABMs), also known as individual-based models. ABMs compensate for the weakness of compartment models, which assume disease transmission through the mass action principle, by simulating the operations and interactions of individuals within a population. Multiple simulations are run with each agent following relatively simple rules and with some random variation to generate emergent results based on the interactions and actions of individuals.
These models can incorporate important variables such as age, household composition, environmental changes (change in temperature and humidity in the summer), social interactions and behavioral changes that individuals may make in response to evolving circumstances (such as staying home more often if they observe more deaths in their community, or less often if they do not). Although agents’ actions are based on simple rules, they can capture unexpected aggregate phenomena that result from their interactions with other agents and the environment around them.
ABMs are quite computationally intensive, and they require significant data on environment and behavior, such as the density of different types of workplaces. The best-known COVID-19 ABM is from London’s Imperial College; this model investigated non-pharmaceutical interventions in COVID-19 and was used to guide early policy for the U.K. and U.S. (Figure 4).4
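A toy agent-based model shows how simple individual rules plus random variation produce a range of outcomes. The sketch below is an illustration only and bears no relation to the Imperial College code. The household structure, the contact probabilities, and the rule that agents halve their community contacts when many people are sick are all assumptions made up for this example.

```python
import random


def run_outbreak(seed, num_households=150, household_size=4, days=120,
                 community_contacts=6, p_household=0.10, p_community=0.03,
                 infectious_days=7, caution_threshold=30):
    """Simulate one outbreak; return (infectious count per day, total ever infected)."""
    rng = random.Random(seed)
    num_agents = num_households * household_size
    state = ["S"] * num_agents          # "S", "I" or "R" for each agent
    days_left = [0] * num_agents        # remaining infectious days
    for agent in rng.sample(range(num_agents), 3):
        state[agent] = "I"
        days_left[agent] = infectious_days

    infectious_per_day = []
    for _day in range(days):
        infectious_now = [a for a in range(num_agents) if state[a] == "I"]
        # Behavioral rule: everyone halves community contacts when many are sick
        cautious = len(infectious_now) > caution_threshold
        contacts_today = community_contacts // 2 if cautious else community_contacts

        newly_infected = set()
        for agent in infectious_now:
            # Household members are exposed every day
            home_start = (agent // household_size) * household_size
            for other in range(home_start, home_start + household_size):
                if state[other] == "S" and rng.random() < p_household:
                    newly_infected.add(other)
            # A few random contacts out in the community
            for _contact in range(contacts_today):
                other = rng.randrange(num_agents)
                if state[other] == "S" and rng.random() < p_community:
                    newly_infected.add(other)

        for agent in infectious_now:
            days_left[agent] -= 1
            if days_left[agent] == 0:
                state[agent] = "R"
        for agent in newly_infected:
            state[agent] = "I"
            days_left[agent] = infectious_days
        infectious_per_day.append(state.count("I"))

    ever_infected = num_agents - state.count("S")
    return infectious_per_day, ever_infected


# Run the same rules 20 times with different random seeds.
results = [run_outbreak(seed) for seed in range(20)]
sizes = sorted(total for _, total in results)
print(f"Total infected across 20 runs ranged from {sizes[0]} to {sizes[-1]} of 600 agents")
```

Because each run uses different random draws, the output is a spread of outbreak sizes rather than a single number, which is why ABM results are usually reported as a summary over many simulations.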
How do I know if I can trust a model’s output?
To understand any model, you must know the assumptions on which it is based, such as how much contact people are assumed to have with affected individuals and how likely they are to die if infected. Knowing the assumptions of a model is like reading the materials and methods of a study — you cannot fully understand or trust the conclusion without it.
The assumptions on which the model is based should be readily available. For models that use data inputs, such as compartment models, you should know what sources provided the values and what those values were. Models should provide a sensitivity analysis, which tests how the model output changes when the underlying data or assumptions are changed. If there are variables for which small changes in the input value lead to large changes in the output of the model, those variables should be examined carefully. Models should also be validated or calibrated to see if the projections match what we have seen actually happen. It is important to note that during an epidemic, accurate data collection can be challenging, and reported data against which models are validated may themselves be imperfect. For example, deaths are often undercounted in the beginning of an epidemic and are revised later to reflect the actual, higher number. Models are dynamic and are updated as knowledge and data change, and as our interventions change the course of the disease.
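A one-at-a-time sensitivity analysis can be sketched as follows. The example uses a bare-bones SIR model with hypothetical baseline values, increases each input by 10% in turn, and reports how much the peak number of infectious people changes. The function name, the parameter names, and all values are assumptions for illustration; real analyses typically vary many inputs over plausible ranges.

```python
def peak_infectious(beta, gamma, population=100_000, initial_infectious=10,
                    days=300, dt=0.1):
    """Return the peak infectious count from a simple SIR model (Euler steps)."""
    s = float(population - initial_infectious)
    i = float(initial_infectious)
    peak = i
    for _ in range(round(days / dt)):
        new_infections = beta * s * i / population * dt
        new_removals = gamma * i * dt
        s -= new_infections
        i += new_infections - new_removals
        peak = max(peak, i)
    return peak


# Hypothetical baseline inputs
baseline = {"beta": 0.3, "gamma": 0.1}
base_peak = peak_infectious(**baseline)

# Bump each input by 10% while holding the other at its baseline value
sensitivity = {}
for name in baseline:
    bumped = dict(baseline)
    bumped[name] *= 1.10
    sensitivity[name] = (peak_infectious(**bumped) - base_peak) / base_peak

for name, change in sensitivity.items():
    print(f"+10% in {name} changes the peak by {change:+.1%}")
```

An input whose 10% change moves the output by much more than 10% is one whose source and value deserve the closest scrutiny.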
Which type of model is the best?
None of these approaches to modeling is a perfect representation of the real world. In other words, all models are wrong. The good news is that models do not need to be perfect to be useful.5 All approaches to modeling have strengths and weaknesses, and the “best” model typically depends on the question you are trying to answer and the quality of the available data. For forecast models, a combination of forecasts to create an ensemble or consensus model may be the best approach. This is similar to the “spaghetti” models6 often used to forecast the likely path of hurricanes. Models that have been more accurate in the past often are given greater weight. For example, the Centers for Disease Control and Prevention generates an ensemble model7 that combines nine different types of models (Figure 5), and Reich Lab has an interactive real-time model.8
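One simple way to give more weight to models with better track records is to weight each forecast by the inverse of its past error. The sketch below illustrates that idea only; it is not the method used by the CDC or Reich Lab, and the model names, forecasts, and error values are invented for the example.

```python
def ensemble_forecast(forecasts, past_errors):
    """Combine forecasts, weighting each model by 1 / its past average error."""
    weights = {name: 1.0 / past_errors[name] for name in forecasts}
    total_weight = sum(weights.values())
    return sum(forecasts[name] * weights[name] / total_weight for name in forecasts)


# Hypothetical forecasts of next week's deaths from three models
forecasts = {"model_a": 1200.0, "model_b": 1500.0, "model_c": 900.0}
# Hypothetical average absolute error of each model on earlier weeks
past_errors = {"model_a": 100.0, "model_b": 200.0, "model_c": 400.0}

combined = ensemble_forecast(forecasts, past_errors)
print(f"Ensemble forecast: {combined:,.0f}")
```

The combined value always falls between the lowest and highest individual forecasts and sits closest to the models that have been most accurate so far.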
So, why are there so many different models for the novel coronavirus? Because there are many questions of interest, many ways to answer those questions, and much that is still unknown about the disease. All types of models can generate misleading results, and it is critical, when considering the quality of a model, to understand the assumptions it makes and the sources and reliability of the input data. Each individual model helps us to understand the whole better. The models do not compete with one another; rather, they are dynamic snapshots that complement one another. Models provide us a range of possibilities based on our best guess of where we are now to help us prepare for the future or change our behavior to help control our future.
Dr. Dennis owns and practices full-time at Stratham-Newfields Veterinary Hospital in New Hampshire. She is currently the AAVSB representative to the AVMA’s American Board of Veterinary Specialties, chair of the AAHA Veterinary Informatics Committee and Diagnostic Terms Editorial Board, a peer reviewer for JAAHA, chair of the Al Hahn Award for Lifetime Achievement in Veterinary Informatics, and a member of the Outreach and Credentials Committees with ABVP. Dr. Dennis lectures nationally about topics including ethics exhaustion and informatics.
Ms. Piché recently graduated with a degree in biomedical science from the University of New Hampshire. She is currently working in companion animal practice while exploring her options for pursuing a graduate or professional degree.
Dr. Kreisler, current president of the Association for Veterinary Informatics, is an associate professor at Midwestern University, where she teaches shelter and community medicine, public health, epidemiology, appraisal of the veterinary literature, and practice ownership. Her work outside the classroom includes driving a 33-foot mobile veterinary clinic, providing surgical and medical services to shelter and community animals, investigating disease outbreaks, and researching questions relevant to clinical decision making.
Bradhurst RA, Roche SE, East IJ, et al. A hybrid modeling approach to simulating foot-and-mouth disease outbreaks in Australian livestock. Front Environ Sci. 2015;3. doi: 10.3389/fenvs.2015.00017
Ferguson NM, Laydon D, Nedjati-Gilani G, et al. Impact of non-pharmaceutical interventions (NPIs) to reduce COVID-19 mortality and healthcare demand. Imperial College London. Published March 16, 2020. Accessed June 22, 2020. https://doi.org/10.25561/77482