Sifting Through COVID-19 Research With Qlik and Machine Learning

Research on COVID-19 is being produced at an accelerating rate, and machine intelligence could be crucial in helping the medical community find key information and insights. When I first came across the COVID-19 Open Research Dataset (CORD-19), it contained about 57,000 scholarly articles. Just one month later, it holds over 158,000. If the clues to fighting COVID-19 lie in this vast repository of knowledge, how can Qlik help?

The call to action on the CORD-19 dataset has been issued through an AI challenge to the world’s AI experts. The desired outcome is a set of text and data mining tools that can help answer high-priority scientific questions. Looking through the submissions to the challenge, I realized that this is a very ambitious goal if we rely entirely on AI as it exists today. So, I tried to approach this problem a bit differently with Qlik.

My goal was to give researchers the ability to find COVID-19 related articles based on Qlik’s native search and exploration capabilities, and then apply machine learning (ML) techniques within the chosen context. You’ll see the solution at the end of this post.

Best of Both Worlds

Over the last two years, I’ve been working on an open source project (https://github.com/nabeel-oz/qlik-py-tools) that provides Data Science and ML capabilities for Qlik. I have been fascinated with the idea of using advanced analytics techniques within the free-form exploratory experience of Qlik Sense.

In this case, I used two ML capabilities: Named Entity Recognition and Clustering. The first technique extracts biomedical entities from the title and abstract of each article using a pre-trained Deep Learning model. These entities become a rich new dimension for search and exploration in the Qlik Sense app. Entity extraction runs once, during the reload process, whereas clustering runs in real time for interactive analysis. As a user drills down by making selections, an algorithm groups the selected research articles into clusters based on the similarity of the entities appearing in their titles and abstracts.
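To make the clustering step more concrete, here is a minimal sketch of grouping articles by the entities they share. It is an illustration only, not the algorithm used in the app: it assumes the entities have already been extracted, measures similarity with Jaccard overlap, and links articles with a simple union-find. The article IDs, entity sets, function names and the 0.3 threshold are all hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two sets of entity strings."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_by_entities(article_entities, threshold=0.3):
    """Group articles whose entity sets are similar enough.

    Single-link grouping: two articles share a cluster if a chain of
    pairs with similarity >= threshold connects them.
    """
    parent = {article_id: article_id for article_id in article_entities}

    def find(x):
        # Walk up to the cluster representative, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(article_entities, 2):
        if jaccard(article_entities[a], article_entities[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for article_id in article_entities:
        clusters.setdefault(find(article_id), []).append(article_id)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical entities for the articles left after a user's selections.
selected = {
    "a1": {"ace2", "spike protein", "sars-cov-2"},
    "a2": {"ace2", "spike protein", "receptor binding"},
    "a3": {"remdesivir", "antiviral", "clinical trial"},
    "a4": {"remdesivir", "clinical trial", "mortality"},
}

clusters = cluster_by_entities(selected, threshold=0.3)
print(clusters)  # [['a1', 'a2'], ['a3', 'a4']]
```

Because the input is just the set of articles in the current selection, the grouping can be recomputed each time the selection changes, which is the interactive behaviour described above.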

In short, the solution combines Qlik’s associative experience with ML techniques in a way that boosts human intelligence and ability.

Staying Up To Date

A key part of the challenge is staying up to date with a growing body of research. With Qlik, it is standard practice to set up incremental loads, and this solution makes the process relatively straightforward.
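The idea behind an incremental load is to process only the articles that were not seen in a previous run. In Qlik this logic would normally live in the load script; the Python sketch below only illustrates the principle. It assumes each article row carries a unique identifier in a column named `cord_uid`, and the function name and sample data are hypothetical.

```python
import csv
import io


def load_new_articles(metadata_file, seen_ids):
    """Return only rows whose ID is not yet in seen_ids, and record them."""
    new_rows = []
    for row in csv.DictReader(metadata_file):
        uid = row["cord_uid"]  # assumed unique article identifier
        if uid not in seen_ids:
            seen_ids.add(uid)
            new_rows.append(row)
    return new_rows


# Two hypothetical snapshots of the dataset; the second adds one article.
snapshot_1 = "cord_uid,title\nu1,Paper one\nu2,Paper two\n"
snapshot_2 = "cord_uid,title\nu1,Paper one\nu2,Paper two\nu3,Paper three\n"

seen = set()
first = load_new_articles(io.StringIO(snapshot_1), seen)
second = load_new_articles(io.StringIO(snapshot_2), seen)

print(len(first), len(second))  # 2 1
print([row["title"] for row in second])  # ['Paper three']
```

Only the newly added rows would then need to go through the expensive entity extraction step, so each refresh costs roughly in proportion to the new articles rather than the whole dataset.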

Applicability To Unstructured Data

While this app was built for the CORD-19 dataset, the techniques can be applied to text data in general. This is a demonstration of how Qlik can help provide intelligence on the vast amount of unstructured data that is usually being left out of analytics solutions at most organizations today.

“The value of data in the fight against COVID-19 cannot be overstated. The ability to bring unstructured data and research information into the solution and combine it with data sets from the WHO, CDC, Johns Hopkins and others will accelerate our society’s ability to leverage knowledge and resources to stop the pandemic and recover in as fast, healthy, safe and economical way as possible,” said Julie Kae, Executive Director of Qlik.org. “Qlik.org has made many assets available, which can be found at www.qlik.org/covid19 and proudly includes this unique solution for public access, as well.”

Demonstration

See below to watch a short demonstration of the solution.

The project is being maintained on GitHub, and the app is publicly available and can be accessed by clicking here.

How can #Qlik and #machinelearning help you sift through #COVID19 research? Find out by reading our own Nabeel Asif's blog post.

 
