Apache Kafka is a massively scalable distributed platform for publishing, storing and processing data streams. Kafka streams integrate real-time data from diverse source systems and make that data consumable as a message sequence by applications and analytics platforms such as data lake Hadoop systems. Kafka technology is used by some of the world's leading enterprises in support of streaming applications and data lake analytics, but for many organizations there are still questions about how to integrate Kafka streams into existing enterprise data infrastructures in a way that maximizes benefits while minimizing costs and risks.
Although Kafka has been employed in high-profile production deployments, it remains a relatively new technology with programming interfaces that are unfamiliar to many enterprise development teams. Organizations seeking to implement Kafka streams run the risk that a lack of relevant programming expertise may result in delays launching Kafka initiatives, or that once Kafka implementations are in place they may lack the agility needed to keep pace with changing business requirements.
Qlik Replicate eases these problems by serving as a producer to Kafka and automating the creation of inbound Kafka streams. With Qlik Replicate you can use a graphical interface to configure and execute data publishing pipelines from diverse source systems into a Kafka cluster, without having to do any manual coding or scripting. This empowers data architects and data scientists to supply real-time source data to Kafka-Hadoop pipelines and other Kafka-based pipelines, without being tied up waiting on the availability of expert development staff.
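For context, the kind of hand-written producer code that a GUI-driven pipeline is meant to replace might look roughly like the sketch below. It is only an illustration, not Qlik Replicate's implementation: it assumes the third-party kafka-python client, a broker at `localhost:9092`, and a JSON message format, and the helper names `serialize_row` and `publish_rows` are hypothetical.

```python
import json
from typing import Iterable, Mapping


def serialize_row(row: Mapping[str, object]) -> bytes:
    """Encode one source row as UTF-8 JSON with a stable key order."""
    # default=str is a simplification for values such as dates or decimals.
    return json.dumps(row, sort_keys=True, default=str).encode("utf-8")


def publish_rows(
    rows: Iterable[Mapping[str, object]],
    topic: str,
    bootstrap_servers: str = "localhost:9092",  # assumed broker address
) -> int:
    """Send each row to a Kafka topic and return the number of rows sent."""
    # Imported here so the serializer can be used without the client installed.
    from kafka import KafkaProducer  # third-party: pip install kafka-python

    producer = KafkaProducer(bootstrap_servers=bootstrap_servers)
    sent = 0
    try:
        for row in rows:
            producer.send(topic, value=serialize_row(row))
            sent += 1
        producer.flush()  # block until buffered messages are delivered
    finally:
        producer.close()
    return sent
```

Even this small sketch leaves out error handling, retries, schema changes, and change capture at the source, which is the kind of custom work each additional source system would otherwise require.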
Part of the appeal and power of Kafka is its ability to integrate streaming data from multiple diverse source systems into one highly scalable stream processing and subscription platform. However, when a large number of heterogeneous source systems publish into Kafka through different clients or scripts, the resulting pipelines become difficult to maintain and lack transparency.
Qlik Replicate reduces maintenance complexity and increases transparency by providing a single unified solution through which all source-to-Kafka pipelines can be managed. Qlik supports GUI-driven integration between Kafka and a wide range of source systems, including all major database systems – leveraging Qlik's low-impact, agentless change data capture technology – as well as major SAS applications, enterprise data warehouse platforms, and legacy mainframe systems. Through a single interface you can configure, execute, monitor, and update all your Kafka data ingestion pipelines, with seamless support for native Kafka features such as topics and partitions.
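To make topics and partitions concrete, the sketch below shows one way a captured change could be mapped to a topic, a message key, and a partition. Everything in it is an assumption for illustration: the `cdc.<table>` topic naming, the JSON envelope, and the function names are hypothetical and do not describe Qlik Replicate's actual message format. The CRC32 hash is a stand-in as well; Kafka's Java client hashes keys with murmur2 by default.

```python
import json
import zlib
from typing import Mapping


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a message key to a partition so equal keys always land together."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    # CRC32 is an illustrative stand-in, not Kafka's default murmur2 hash.
    return zlib.crc32(key) % num_partitions


def build_change_message(
    table: str,
    operation: str,
    primary_key: object,
    row: Mapping[str, object],
    num_partitions: int,
) -> dict:
    """Build a hypothetical change message for one captured row change."""
    key = str(primary_key).encode("utf-8")
    value = json.dumps({"op": operation, "data": row}, sort_keys=True)
    return {
        "topic": f"cdc.{table}",  # assumed one-topic-per-table convention
        "partition": choose_partition(key, num_partitions),
        "key": key,
        "value": value.encode("utf-8"),
    }


if __name__ == "__main__":
    insert = build_change_message("orders", "insert", 1001, {"id": 1001, "qty": 2}, 6)
    update = build_change_message("orders", "update", 1001, {"id": 1001, "qty": 3}, 6)
    # Both changes to the same row share a key, so they share a partition
    # and consumers read them in the order they were produced.
    print(insert["topic"], insert["partition"] == update["partition"])
```

Keying messages by primary key is a common design choice because Kafka preserves ordering only within a partition, so all changes to a given row stay in sequence.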
Along with supporting Kafka streams implementations, Qlik Replicate supports other data integration pipelines between all major on-premises and cloud-based sources and destinations. Your team can use Qlik Replicate as a direct Hadoop data ingestion tool, a database migration tool, or a tool for replicating on-premises data to cloud targets such as Amazon Redshift. Qlik engineers have answered the question "What is data replication?" for the modern enterprise by developing a unified, any-to-any replication solution that supports the full range of modern data replication use cases.