Research on COVID-19 is being produced at an accelerating
rate, and machine intelligence could be crucial in helping the medical
community find key information and insights. When I came across the COVID-19 Open Research
Dataset (CORD-19), it contained about 57,000
scholarly articles. Just one month later, it has over 158,000 articles. If the
clues to fighting COVID-19 lie in this vast repository of knowledge, how can
we find them? A call to action on the CORD-19 dataset has been issued through an AI challenge to the world’s AI
experts. The desired outcome is a set of text and data mining tools that can
help answer high-priority scientific questions. Looking through the submissions
to the challenge, I realized that this is a very ambitious goal if we rely
entirely on AI as it exists today. So, I tried to approach this problem a bit
differently with Qlik.
My goal was to give researchers the ability to find COVID-19-related articles using
Qlik’s native search and exploration capabilities, and then apply machine
learning (ML) techniques within the chosen context. You’ll see the solution at
the end of this post.
Best of Both Worlds
Over the last two years, I’ve been working on an open source project (https://github.com/nabeel-oz/qlik-py-tools) that provides data
Science and ML capabilities for Qlik. I have been fascinated with the idea of
using advanced analytics techniques within the free-form exploratory experience
of Qlik Sense.
In this case, I used two ML capabilities: Named Entity Recognition and Clustering. The first
extracts biomedical entities from the title and abstract of each
article using a pre-trained Deep Learning model. The extracted entities become a rich
new dimension for search and exploration in the Qlik Sense app. While entity extraction is
done during the reload process, the clustering algorithm runs in real time for
interactive analysis. As a user drills down by making selections, the algorithm
groups research articles into clusters based on the similarity of the entities
appearing in their titles and abstracts.
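The post doesn't say which clustering algorithm the app uses, so the following is only a minimal Python sketch of the idea, assuming entities have already been extracted for each article during the reload. It swaps in a simple technique: articles are compared by the Jaccard similarity of their entity sets, and any pair at or above a threshold is linked, which amounts to single-linkage grouping implemented with union-find. The function names, the 0.3 threshold, and the sample data are hypothetical.

```python
from itertools import combinations

# Hypothetical sketch: not the app's actual clustering algorithm.


def jaccard(a: set[str], b: set[str]) -> float:
    """Share of entities two articles have in common (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_articles(entities: dict[str, set[str]], threshold: float = 0.3) -> list[list[str]]:
    """Single-linkage grouping: articles whose entity sets have a Jaccard
    similarity >= threshold end up in the same cluster (transitively)."""
    parent = {doc_id: doc_id for doc_id in entities}

    def find(x: str) -> str:
        # Follow parent pointers to the cluster root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Link every sufficiently similar pair of articles.
    for a, b in combinations(entities, 2):
        if jaccard(entities[a], entities[b]) >= threshold:
            parent[find(a)] = find(b)

    # Collect articles by cluster root; largest clusters first.
    clusters: dict[str, list[str]] = {}
    for doc_id in entities:
        clusters.setdefault(find(doc_id), []).append(doc_id)
    return sorted(clusters.values(), key=len, reverse=True)


# Entities for the articles in the user's current selection (made-up data).
articles = {
    "a1": {"SARS-CoV-2", "ACE2", "spike protein"},
    "a2": {"SARS-CoV-2", "ACE2", "TMPRSS2"},
    "a3": {"hydroxychloroquine", "QT prolongation"},
    "a4": {"hydroxychloroquine", "azithromycin", "QT prolongation"},
}
print(cluster_articles(articles))  # [['a1', 'a2'], ['a3', 'a4']]
```

Passing in only the currently selected articles and re-running on each new selection mirrors the drill-down behavior described above. Comparing every pair is quadratic in the number of articles, so a larger-scale version would likely need a more efficient method.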
In short, the solution combines Qlik’s associative experience with ML techniques
in a way that boosts human intelligence and ability.
Staying Up To Date
A key part of the challenge is staying up to date with a growing body of research. With Qlik, it is standard practice to set up incremental loads, and this solution makes the process relatively straightforward.
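The post doesn't show the load script, so here is only a language-neutral illustration of the core idea behind an incremental load, written in Python: remember which articles have already been processed and send only new ones through the expensive entity-extraction step. It assumes the CORD-19 metadata file has a `cord_uid` column identifying each article; the function name `new_articles` and the sample rows are hypothetical.

```python
import csv
import io

# Hypothetical sketch of incremental-load logic, not the app's load script.


def new_articles(metadata_csv: str, processed_ids: set[str]) -> list[dict[str, str]]:
    """Return only rows whose cord_uid has not been processed yet."""
    reader = csv.DictReader(io.StringIO(metadata_csv))
    seen = set(processed_ids)  # copy so the caller's set is not modified
    fresh = []
    for row in reader:
        uid = row["cord_uid"]
        if uid and uid not in seen:
            fresh.append(row)
            seen.add(uid)  # skip repeated rows for the same article
    return fresh


data = "cord_uid,title\nu1,First\nu2,Second\nu2,Second (dup)\nu3,Third\n"
rows = new_articles(data, {"u1"})
print([r["cord_uid"] for r in rows])  # ['u2', 'u3']
```

After the new rows are processed, their IDs would be added to the stored set of processed IDs so the next reload skips them. In Qlik, that stored state would typically live in a QVD file.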
Applicability To Unstructured Data
While this app was built for the CORD-19 dataset, the techniques can be applied to text data in general. This is a demonstration of how Qlik can help provide intelligence on the vast amount of unstructured data that most organizations today leave out of their analytics solutions.
“The value of data in the fight against COVID-19 cannot be overstated. The ability to bring unstructured data and research information into the solution and combine it with data sets from the WHO, CDC, Johns Hopkins and others will accelerate our society’s ability to leverage knowledge and resources to stop the pandemic and recover in as fast, healthy, safe and economical way as possible,” said Julie Kae, Executive Director of Qlik.org. “Qlik.org has made many assets available, which can be found at www.qlik.org/covid19 and proudly includes this unique solution for public access, as well.”
See below to watch a short
demonstration of the solution.
The project is being maintained on GitHub, and the app is
publicly available and can be accessed by clicking here.
How can #Qlik and #machinelearning help you sift through #COVID19 research? Find out by reading our own Nabeel Asif's blog post.