COVID-19 Is Holding A Mirror Up To Our Data Literacy Skills

How Do We Look?

Last week, Mike Potter – Qlik’s CTO – was describing how evaluating the COVID-19 outbreak is an interesting exercise in data literacy. Qlik defines data literacy as the ability to read, work with, analyze and communicate with data. Mike was curious about how the public at large consumed the growing mass of available data and how it would affect our societal and governmental decision making.

First, the positive. It has been heartening to see the emergence of data sites like Worldometer and The Verge, which have promoted interest in data-driven analysis of the COVID-19 outbreak. It’s also been encouraging to see “data science-y” phrases, such as “flattening the curve,” become part of our everyday language. (View the Data Literacy Project video on the topic.) News organizations, too, have been publishing useful comparison charts and graphs that examine the outbreak across countries and look at how COVID-19 relates to other virus outbreaks and conventional causes of death.

All of these developments are useful in forming a baseline understanding of how we process this outbreak. My favorite graph is one by 91-DIVOC (below), which has been published by many news outlets. Graphs like these are getting the public grounded in evaluating the risk of COVID-19 exposure and understanding its global prevalence.

Image Credit: 91-DIVOC

But the news is not all good. What I have noticed, with some concern, is how difficult it has been for the media to convey what should be a simple metric: What is the mortality rate of COVID-19? A recent New York Times article attempted to explain why this has been such a challenge, noting that varying population densities and inconsistencies in public health reporting practices contribute to the confusion. Although the article implies that many factors conspire to make a definitive answer unknowable (at least in the short term), I found some charts in the media that nevertheless tried to provide one. To be clear, this is not an exercise in picking apart any organization’s presentation of data so much as an attempt to point out how we can improve our ability to interpret charts rigorously and critically.

Image Credit: The New York Times

The New York Times published this chart (above) in a February 28 article, indicating that the projected fatality rate of the virus would be “below 3%.” Note the early date of the article. This analysis, it would seem, relied on limited data supplied by the Chinese government. Also, note the lack of specific sources (the chart cites only “many estimates”) and that the Y-axis uses a logarithmic scale, which people could misinterpret if they don’t read the chart closely.
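To see why a logarithmic axis is easy to misread, here is a minimal Python sketch. The axis range and the fatality rates in it are hypothetical values chosen for illustration, not figures from the New York Times chart, and the function names are my own. It compares where the same values would sit on a logarithmic axis versus a linear one.

```python
import math


def log_axis_position(value, axis_min, axis_max):
    """Fractional height (0 = bottom, 1 = top) of a value on a log10 axis."""
    span = math.log10(axis_max) - math.log10(axis_min)
    return (math.log10(value) - math.log10(axis_min)) / span


def linear_axis_position(value, axis_min, axis_max):
    """Fractional height (0 = bottom, 1 = top) of a value on a linear axis."""
    return (value - axis_min) / (axis_max - axis_min)


# Hypothetical fatality rates in percent; assumed for illustration only.
AXIS_MIN, AXIS_MAX = 0.1, 100.0
hypothetical_rates = [0.1, 1.0, 10.0, 100.0]

for rate in hypothetical_rates:
    log_pos = log_axis_position(rate, AXIS_MIN, AXIS_MAX)
    lin_pos = linear_axis_position(rate, AXIS_MIN, AXIS_MAX)
    print(f"rate {rate:>5}%  log axis: {log_pos:.2f}  linear axis: {lin_pos:.2f}")
```

On the logarithmic axis, each tenfold increase moves a point up by the same distance, so a 1% rate and a 10% rate can look like near neighbors. On a linear axis, the same two values sit far apart relative to the top of the chart.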

Image Credit: MPR NEWS

On March 11, MPR News provided this assessment (above). It’s helpful that this chart does not use a logarithmic scale, but note the figure of 1.4% sourced from, among others, the World Health Organization (WHO). This figure can be confusing, as it appears to conflict with what the WHO states in one of its COVID-19 situation reports published March 6. Here is a quote: “Mortality for COVID-19 appears higher than for influenza, especially seasonal influenza. While the true mortality of COVID-19 will take some time to fully understand, the data we have so far indicate that the crude mortality ratio (the number of reported deaths divided by the reported cases) is between 3-4% [...].” Again, I don’t fault any particular organization. Rather, I would like to ask that readers attempt to challenge what is being presented to them.
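The WHO quote defines the crude mortality ratio as reported deaths divided by reported cases. The short Python sketch below applies that definition to made-up counts. The numbers are hypothetical and are not WHO or MPR News data; they only show how much the result depends on the denominator. I am not claiming this explains the gap between the two published figures.

```python
def crude_mortality_ratio(reported_deaths, reported_cases):
    """Crude mortality ratio: reported deaths divided by reported cases."""
    if reported_cases <= 0:
        raise ValueError("reported_cases must be positive")
    return reported_deaths / reported_cases


# Hypothetical counts, assumed for illustration only.
deaths = 35
narrow_case_count = 1000   # e.g., only cases confirmed by testing
broader_case_count = 2500  # e.g., a wider case definition

print(f"{crude_mortality_ratio(deaths, narrow_case_count):.1%}")
print(f"{crude_mortality_ratio(deaths, broader_case_count):.1%}")
```

With the same number of deaths, the narrower case count yields a ratio of 3.5% and the broader one yields 1.4%. Before comparing two published rates, it is worth asking which cases each source counted.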

Image Credit: Worldometer; note this chart is updated frequently

Now, take this chart (above) from Worldometer related to daily new cases of the virus in Iran. How should we interpret this? It looks like there have been three ebbs in new virus cases, only to be followed by two spikes. Does this mean that we should expect a new surge? Is our “flatten the curve” bias causing us to believe that there is a retreat in new virus cases when the evidence does not support this? Consider these questions along with other assumptions we may have about the virus, such as its ability to mutate, our susceptibility to contracting the same or a mutated strain, or the influence of the warmer months and varying pandemic response strategies.

In short, the way data is presented can be confusing, and we must be wary of letting our eyes and our preconceptions get the better of us. As champions of data literacy and analytics, we need to help our communities, both physical and virtual, in every way possible to make sound policy decisions based on a collective understanding of the situation, uninfluenced by political ideology or gut feelings. The process, which I’ve outlined in six steps below in question form, should look familiar to anyone who has worked with data before.

  • What are the core definitions of data that we care about?
  • What are the definitions of our common measures and KPIs?
  • What are our assumptions about the data?
  • What are the clear policy priorities that we wish to enable?
  • What statistical models fit most closely with what we are experiencing?
  • What should we do?

Following these steps is essential to making appropriate recommendations. While many of us may feel tempted to jump immediately to step six, let’s remember that we must first have a clear understanding of the situation before we can act – and that can only happen if we behave as data-literate citizens.


Our own @JoeDosSantos challenges us to read COVID-19 graphs with more rigor as #dataliterate citizens. Read his latest blog post.

 
