First, the positive. It has been heartening to see the emergence of data sites like Worldometer and The Verge, which have promoted interest in, and attention to, data-driven analysis of the COVID-19 outbreak. It’s also been encouraging to see “data science-y” phrases, such as “flattening the curve,” become part of our everyday language. (View the Data Literacy Project video on the topic.) News organizations, too, have been publishing useful comparison charts and graphs, examining the outbreak across countries and looking at how COVID-19 relates to other virus outbreaks and to conventional causes of death.
All of these developments are useful in forming a baseline understanding of how we process this outbreak. My favorite graph is one by 91-DIVOC (below), which has been published by many news outlets. Graphs like these are getting the public grounded in evaluating the risk of COVID-19 exposure and understanding its global prevalence.
Image Credit: 91-DIVOC
But, the news is not all good.
What I have noticed, with some concern, is how difficult it has been for the media to convey (what should be) a simple metric: What is the mortality rate of COVID-19? A recent New York Times article attempted to explain why this has been such a challenge, noting varying population densities and inconsistencies in public health reporting practices as contributing to the confusion. Although the article implies that many factors conspire to make a definitive answer unknowable (at least in the short term), I found some charts in the media that nevertheless tried to provide an answer. To be clear, this is not an exercise in picking apart any organization’s presentation of data, so much as it is an attempt to point out how we can improve our ability to interpret charts rigorously and critically.
Image Credit: The New York Times
The New York Times published this chart (above) in a February 28 article, indicating that the projected fatality rate of the virus would be “below 3%.” Note the early date of the article: this analysis, it would seem, relied on limited data supplied by the Chinese government. Also note the lack of specific sources (i.e., “many estimates”) and that the Y-axis uses a logarithmic scale, which compresses large differences and can mislead readers who don’t examine the chart closely.
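To see why a logarithmic axis is easy to misread at a glance: equal vertical distances on a log scale represent equal *ratios*, not equal differences. A quick sketch (the percentages here are illustrative, not taken from the chart):

```python
import math

# On a log10 axis, the visual gap between 0.1% and 1% is the same as the
# gap between 1% and 10%, even though the second step is a much larger
# absolute change in fatality rate.
gap_low = math.log10(1.0) - math.log10(0.1)    # distance from 0.1% to 1%
gap_high = math.log10(10.0) - math.log10(1.0)  # distance from 1% to 10%

print(gap_low, gap_high)  # both gaps are identical on the log axis
```

A reader skimming such a chart can therefore underestimate how different two curves really are.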
Image Credit: MPR NEWS
On March 11, MPR News provided this assessment (above). It’s helpful that this chart does not use a logarithmic scale, but note the figure of 1.4% sourced from, among others, the World Health Organization (WHO). This figure can be confusing, as it appears to conflict with what the WHO states in one of its COVID-19 situation reports published March 6. Here is a quote: “Mortality for COVID-19 appears higher than for influenza, especially seasonal influenza. While the true mortality of COVID-19 will take some time to fully understand, the data we have so far indicate that the crude mortality ratio (the number of reported deaths divided by the reported cases) is between 3-4% [...].” Again, I don’t fault any particular organization. Rather, I would like to ask that readers attempt to challenge what is being presented to them.
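The WHO’s crude mortality ratio, as defined in the quote above, is simple division: reported deaths over reported cases. A minimal sketch (the counts below are invented for illustration, not actual figures); note that the result depends entirely on which cases get reported, which is one reason published estimates like 1.4% and 3–4% can coexist:

```python
def crude_mortality_ratio(reported_deaths: int, reported_cases: int) -> float:
    """Crude mortality ratio per the WHO definition: reported deaths
    divided by reported cases, expressed as a percentage."""
    if reported_cases <= 0:
        raise ValueError("reported_cases must be positive")
    return 100.0 * reported_deaths / reported_cases

# Hypothetical counts: 3,500 deaths among 100,000 reported cases.
print(crude_mortality_ratio(3_500, 100_000))  # 3.5 (%)

# If mild cases go unreported, the same deaths over fewer reported
# cases inflate the ratio -- here 3,500 / 70,000 reported cases.
print(crude_mortality_ratio(3_500, 70_000))  # 5.0 (%)
```

This is why the crude ratio is a snapshot of reporting, not the “true mortality” the WHO says will take time to understand.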
Image Credit: Worldometer; note this chart is updated frequently
Now, take this chart (above) from Worldometer, showing daily new cases of the virus in Iran. How should we interpret it? It looks like there have been three ebbs in new virus cases, only to be followed by two spikes. Does this mean that we should expect a new surge? Is our “flatten the curve” bias leading us to see a retreat in new virus cases when the evidence does not support one? Consider these questions alongside the other assumptions we may hold about the virus, such as its ability to mutate, our susceptibility to the same or a mutated strain, and the influence of warmer months and of varying pandemic response strategies.
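One way to guard against over-reading single-day ebbs and spikes is to smooth the series before judging its direction. A minimal sketch with invented daily counts; a trailing moving average is one common smoothing choice, not something Worldometer necessarily applies:

```python
def moving_average(values, window=7):
    """Trailing moving average; early entries use whatever history exists."""
    result = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        result.append(sum(values[start:i + 1]) / (i + 1 - start))
    return result

# Hypothetical daily new-case counts with a one-day dip on day 6 (value 40).
daily_cases = [50, 80, 60, 120, 90, 40, 150, 70, 110, 95]
smoothed = moving_average(daily_cases, window=3)

# The raw series drops to 40 on day 6, but the 3-day average there is
# (120 + 90 + 40) / 3, so the smoothed curve barely registers an "ebb."
print(smoothed)
```

If the smoothed curve is still trending up, an apparent retreat in the raw counts may be reporting noise rather than a real decline.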
In short, how data is presented can be confusing, and we must be wary of letting our eyes and our preconceptions get the better of us. As champions of data literacy and analytics, we need to do everything we can, within our communities and across our physical and virtual worlds, to support sound policy decisions based on a collective understanding of the situation, uninfluenced by political ideology or gut feelings. The process, outlined below as six questions, should look familiar to anyone who has worked with data before.
- What are the core definitions of data that we care about?
- What are the definitions of our common measures and KPIs?
- What are our assumptions about the data?
- What are the clear policy priorities that we wish to enable?
- What statistical models fit most closely with what we are experiencing?
- What should we do?
Following these steps is essential to making appropriate recommendations. While many of us may feel tempted to jump immediately to step six, let’s remember that we must first have a clear understanding of the situation before we can act – and that can only happen if we act as data-literate citizens.
Click here to learn more about how Qlik defines data literacy and why it matters.