The jury is still out on how COVID-19 will affect the adoption of AI and machine learning (ML), but the signs so far suggest that adoption will only increase, with businesses relying more heavily on AI and ML to function in a changed world.
Prior to the pandemic, the AI and ML market was growing at a robust rate, projected to reach nearly $100B in global spending by 2023, according to IDC. That projection does not seem to have changed. In fact, from my perspective, the crisis has acted as a catalyst for that growth, with companies big and small deploying AI to run projections and prepare for what lies ahead.
Although AI has great potential to address COVID-19-related disruptions across sectors, spurring novel uses such as touchless robo-deliveries in retail or remote diagnostics in healthcare, the Achilles' heel of AI and ML adoption continues to be appropriate data management. AI and ML models can be exceptional at recognizing patterns too subtle for human operators; to do so successfully, however, they need to be exposed to and trained on huge volumes of data. And because there are no generic, all-purpose AI and ML models, that data needs to be relevant: a solution that predicts, say, a customer's propensity to respond to a specific offer among thousands of offers, or one that identifies fraudulent activity among millions of transactions, must be trained for that particular task, with appropriate data.
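To make the task-specific-data point concrete, here is a minimal, purely illustrative sketch in standard-library Python (with made-up transaction amounts, not any real product's method): a trivial anomaly scorer can flag an unusual transaction only because it was fitted on data relevant to that specific task.

```python
from statistics import mean, stdev

def fit(amounts):
    """'Train' a trivial anomaly model: learn the mean and spread
    of normal transaction amounts from task-relevant data."""
    return mean(amounts), stdev(amounts)

def is_suspicious(amount, model, threshold=3.0):
    """Flag a transaction whose amount lies more than `threshold`
    standard deviations from the training mean."""
    mu, sigma = model
    return abs(amount - mu) / sigma > threshold

# Hypothetical training data: typical purchase amounts in dollars.
normal_transactions = [12.5, 40.0, 27.3, 33.1, 19.9, 45.2, 22.8, 30.4]

model = fit(normal_transactions)
print(is_suspicious(35.0, model))    # a typical amount is not flagged
print(is_suspicious(5000.0, model))  # an extreme outlier is flagged
```

A real fraud model would of course learn far subtler patterns from millions of labeled transactions, but the dependency is the same: without relevant training data, there is no model.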
Therein lies the problem. Although organizations have massive volumes of data, they lack the right technology infrastructure to ensure that the data is well defined, accessible, of the right quality and integrity, and consumption-ready for AI and ML. At the highest level, three key data challenges hamper the success of AI initiatives. The first is the lack of a single, central source of data.
Data lakes can provide that single, central source of data to feed and train AI and ML models, thanks to their ability to store massive volumes of all types of data – structured, semi-structured and unstructured. But data lakes on their own offer little value to AI and ML initiatives; the raw data they hold must first be refined. This brings us to our second challenge: turning raw data into analytics-ready data quickly.
Automation can help accelerate the transformation and refinement of raw data into an analytics-ready stage, while alleviating the need for specialized data engineering and programming skills. An integrated catalog can help generate rich metadata to ensure data is easily understandable and searchable.
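As an illustration of the kind of metadata an integrated catalog generates, here is a hypothetical, standard-library-only Python sketch (the column names and data are invented) that profiles a raw CSV extract into simple catalog metadata: column names, row count, and per-column null counts.

```python
import csv
import io

def profile(csv_text):
    """Generate simple catalog metadata for a CSV dataset so the
    data is easier to understand and search: column names, row
    count, and how many values are missing in each column."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    columns = list(rows[0].keys()) if rows else []
    null_counts = {c: sum(1 for r in rows if not r[c]) for c in columns}
    return {"columns": columns, "row_count": len(rows), "null_counts": null_counts}

# Hypothetical raw extract with one missing region value.
raw = "customer_id,region,amount\n1,EMEA,100\n2,,250\n3,APAC,75\n"
metadata = profile(raw)
print(metadata)
```

A production catalog would add data types, value distributions, and business glossary terms, but even this thin layer of metadata is what makes a lake's raw files discoverable rather than opaque.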
However, the need for speed has to be balanced with the need for veracity and trust, which brings us to our third and most important data challenge: data confidence.
Data confidence comes in many, equally important facets: change propagation to keep source and target schemas in sync; the ability to persist change history for end-to-end lineage; and integrated security and governance to enable enterprise-level access controls. With these capabilities in place, your data scientists can bypass tedious data preparation and focus on high-value modeling and training tasks.
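One of those facets, persisting change history for lineage, can be sketched in a few lines. The following is a simplified, hypothetical illustration in standard-library Python (the table names and change events are invented), showing an append-only change log from which the ordered history of any table can be reconstructed.

```python
from datetime import datetime, timezone

class ChangeLog:
    """Append-only record of schema and data changes, so the
    end-to-end lineage of a table can be reconstructed later."""

    def __init__(self):
        self.events = []

    def record(self, table, change_type, detail):
        """Persist one change event with a UTC timestamp."""
        self.events.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "table": table,
            "change_type": change_type,
            "detail": detail,
        })

    def lineage(self, table):
        """Return the ordered change history for one table."""
        return [e for e in self.events if e["table"] == table]

log = ChangeLog()
log.record("orders", "schema", "added column discount")
log.record("customers", "schema", "renamed column zip to postal_code")
log.record("orders", "data", "backfilled discount for 2020 rows")

for event in log.lineage("orders"):
    print(event["change_type"], "-", event["detail"])
```

Real change-data-capture tooling persists this history durably and propagates the schema changes downstream automatically; the point here is only that lineage is a log you keep, not something you can recover after the fact.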
AI usage is only going to accelerate in the coming years. A well-architected, modern data lake can provide that single source of trusted, analytics-ready data to help you maximize the return on your AI investments.
To learn how Qlik can help you architect data lakes of the future to accelerate your AI and ML journey, register for our upcoming webinar on June 3: “One Source of Truth for AI & Analytic Data: Optimizing Your Data Lake Pipeline for Faster Business Insights.” Presented jointly by me and experts from TDWI and AWS, the webinar will highlight the importance of managed data lakes for the success of AI and ML programs and the core capabilities required for building a performant data lake.
We look forward to talking with you and answering your questions on June 3.
In the meantime, we encourage you to learn more about our Data Lake Creation solution and try our Qlik Replicate product for free to see how quick and easy it is to ingest multi-source, multi-format data into a data lake to accelerate your AI and ML initiatives.