In these uncertain times of the COVID-19 crisis, one thing is certain – data is key to decision making, now more than ever. And, the need for speed in getting access to data as it changes has only accelerated. It’s no wonder, then, that organisations are looking to technologies that help solve the problem of streaming data continuously, so they can run their businesses in real-time.
I'm writing this blog post during an unusually hot summer, in lockdown. I've taken the opportunity of an evening to dust off the CDs (remember them?) for a music revival by genre. It was during a ska session that the much-covered "A Message to You, Rudy" – by UK ska band The Specials – came on, and it got me thinking about data streaming messages.
What Is Data Streaming?
Data streaming is designed to deliver real-time insight that can improve your business and give it a competitive edge. Data is processed in real time from potentially thousands of sources, such as sensors, financial trading floor transactions, e-commerce purchases, web and mobile applications, social networks and many more. By aggregating and analysing this real-time data, you gain actionable insights to improve business agility, make better-informed decisions, fine-tune operations, improve customer service and act quickly to respond to any opportunity or, indeed, crisis that may come your way.
Effective database streaming requires a sophisticated streaming architecture and a Big Data solution like Apache Kafka. Kafka is a fast, scalable and durable publish-subscribe messaging system that supports stream processing by simplifying data ingest. Kafka can process more than 100,000 transactions per second, making it an ideal tool for enabling database streaming to support Big Data analytics and data lake initiatives.
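To make the publish-subscribe model concrete, here is a minimal sketch using the confluent-kafka Python client. The broker address, topic name and order payload are illustrative assumptions, not details from this post.

```python
# Minimal publish-subscribe sketch with the confluent-kafka Python client.
# The broker address, topic name and payload are illustrative assumptions.
import json

from confluent_kafka import Consumer, Producer

BROKER = "localhost:9092"   # assumed local Kafka broker
TOPIC = "orders"            # assumed topic name

# Publisher: send one order event to the topic.
producer = Producer({"bootstrap.servers": BROKER})
producer.produce(TOPIC, key="order-1001", value=json.dumps({"amount": 42.50}))
producer.flush()  # block until the message is delivered

# Subscriber: read events back from the same topic.
consumer = Consumer({
    "bootstrap.servers": BROKER,
    "group.id": "demo-readers",
    "auto.offset.reset": "earliest",
})
consumer.subscribe([TOPIC])
try:
    while True:
        msg = consumer.poll(1.0)
        if msg is None or msg.error():
            continue
        print(msg.key(), json.loads(msg.value()))
finally:
    consumer.close()
```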
What Is Stream Processing?
Stream processing is a method of performing transformations or generating analytics on transactional data inside a stream. Traditional ETL-based data integration is performed on data that will be analysed in a database, data lake or data warehouse, and analytics are typically run within a data warehouse on structured, cleansed data. In contrast, streaming platforms like Apache Kafka enable both integration and in-stream analytics on data as it moves through the stream. Typically, data is ingested into a stream via change data capture (CDC) technologies.
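As a rough illustration of in-stream transformation, the sketch below reads CDC-style change events from one topic, keeps only high-value orders, and publishes them to a derived topic. The topic names, the event shape and the amount threshold are assumptions made for the example.

```python
# Sketch of a simple in-stream transformation: read CDC-style change events,
# filter them, and publish the result to a new, derived topic.
# Topic names and the event shape are assumptions for illustration.
import json

from confluent_kafka import Consumer, Producer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "order-filter",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["cdc.orders"])               # assumed input topic fed by CDC
producer = Producer({"bootstrap.servers": "localhost:9092"})

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    event = json.loads(msg.value())              # e.g. {"op": "insert", "after": {...}}
    row = event.get("after") or {}
    if event.get("op") in ("insert", "update") and row.get("amount", 0) > 1000:
        # Derived stream: only large orders flow on to downstream consumers.
        producer.produce("orders.high_value",
                         key=str(row.get("order_id")),
                         value=json.dumps(row))
        producer.poll(0)                         # serve delivery callbacks
```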
Stream Processing Use Cases
What Is CDC?
CDC is an optimal mechanism for capturing and delivering transactional data from sources into a streaming platform. Stream processing can then take this CDC-generated data and create new streams for additional use cases, or it can generate analysis within the stream of transactions.
The basic concept of stream processing is that database values are converted to a changelog, which stream processing operations then reconstruct into current values or use to create new ones. This is also, in essence, how CDC replicates source data to a target.
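A toy sketch of that idea: replaying a changelog of insert, update and delete events to rebuild the current state of a table keyed by primary key. The event shape below is an assumption for illustration.

```python
# Toy sketch: replay a CDC-style changelog to reconstruct the current state
# of a table keyed by primary key. The event shape is an assumption.
changelog = [
    {"op": "insert", "key": 1, "after": {"name": "Rudy", "status": "new"}},
    {"op": "update", "key": 1, "after": {"name": "Rudy", "status": "active"}},
    {"op": "insert", "key": 2, "after": {"name": "Staple", "status": "new"}},
    {"op": "delete", "key": 2, "after": None},
]

state = {}
for event in changelog:
    if event["op"] == "delete":
        state.pop(event["key"], None)            # remove the row
    else:
        state[event["key"]] = event["after"]     # insert or overwrite the row

print(state)  # {1: {'name': 'Rudy', 'status': 'active'}}
```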
Why Use Log-based CDC?
Log-based CDC has low to near-zero impact on production sources while new streams are created and in-stream analytics run in near real time rather than in batch. It also lets you avoid processing duplicate messages, process messages individually or in aggregate, and perform time-based aggregation of messages.
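To show what de-duplication and time-based aggregation can look like on a stream of change events, here is a small sketch in plain Python. The field names (change_id, ts, amount) and the one-minute window are assumptions for the example.

```python
# Sketch: de-duplicate change events by a unique change id, then perform a
# simple time-based (one-minute tumbling window) aggregation of order amounts.
# Field names (change_id, ts, amount) are illustrative assumptions.
from collections import defaultdict

events = [
    {"change_id": "c1", "ts": 5,  "amount": 20.0},
    {"change_id": "c2", "ts": 42, "amount": 15.0},
    {"change_id": "c2", "ts": 42, "amount": 15.0},   # duplicate delivery, skipped
    {"change_id": "c3", "ts": 65, "amount": 50.0},
]

seen = set()
windows = defaultdict(float)
for e in events:
    if e["change_id"] in seen:                       # avoid processing duplicates
        continue
    seen.add(e["change_id"])
    window_start = e["ts"] - (e["ts"] % 60)          # one-minute tumbling window
    windows[window_start] += e["amount"]

print(dict(windows))  # {0: 35.0, 60: 50.0}
```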
However, using Kafka for database streaming can create a variety of challenges. Source systems may be adversely impacted. A significant amount of custom development by highly skilled data engineers may be required. And scaling efficiently to support many and varied data sources can be difficult.
To help overcome these challenges, look to Qlik. We have a strong partnership with Confluent, built around Qlik Replicate, which leverages CDC. Below are six reasons to choose Confluent and Qlik for your real-time Apache Kafka data streaming.
If you want to learn more, please visit http://www.qlik.com/confluent, and I highly recommend reading the eBook "Apache Kafka Transaction Data Streaming for Dummies," written jointly with Confluent. It explains why Confluent and Qlik are ideal solutions for your Apache Kafka data streaming needs.