The Architectural Principles of Qlik Replicate: Support for Full Load & CDC Data Replication

A Look Under the Hood of the Foundation of Qlik Data Integration Platform – Qlik Replicate – and How It Supports Both Full Load and CDC

In this technical post, let's take a look under the hood of Qlik Replicate, which is the foundation of our complete data integration platform. Here we look at Qlik Replicate's support for both Full Load (a.k.a. batch) and change data capture (CDC).

Qlik Replicate has a web-based architecture and typically runs as a middle-tier server. It comprises three domains: the sources, the replication server and the targets. Data sources and targets can be on-premises or in the cloud. DBAs and data engineers interact with these domains through a web-based interface with Console and Designer views. The same web-based architecture carries through the rest of the data integration platform, which offers scalable, easy-to-use solutions for building high-performance data pipelines.

Support for Full Load and CDC Replication

With Qlik Replicate, the Full Load process consists of three steps:

  1. Creating files or tables at the target database;
  2. Automatically defining the metadata that is required at the target; and
  3. Populating the tables with data from the source.
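The three steps above can be sketched with two in-memory SQLite databases standing in for a source and a target. This is only an illustration of the idea, not how Qlik Replicate is implemented: the `customers` table, the `full_load` function and the use of SQLite on both sides are assumptions made for the example. Note that the sketch reads the source metadata first, because it needs the column definitions before it can create the target table.

```python
import sqlite3

# Hypothetical source with one small table (illustration only).
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, balance REAL)")
source.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Ada", 10.5), (2, "Linus", 0.0)],
)

target = sqlite3.connect(":memory:")


def full_load(source, target, table):
    """Copy one table: derive metadata, create the target table, populate it."""
    # Define the metadata: read column names and types from the source catalog.
    # Each row is (cid, name, type, notnull, dflt_value, pk).
    columns = source.execute(f"PRAGMA table_info({table})").fetchall()
    col_defs = ", ".join(f"{name} {ctype}" for _, name, ctype, *_ in columns)

    # Create the table at the target.
    target.execute(f"CREATE TABLE {table} ({col_defs})")

    # Populate the target table with data from the source.
    placeholders = ", ".join("?" for _ in columns)
    rows = source.execute(f"SELECT * FROM {table}")
    target.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
    target.commit()
    return target.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


loaded = full_load(source, target, "customers")
print(loaded, "rows loaded")
```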

Unlike the CDC process, Full Load copies data into one or multiple tables or files at a time. This is done for efficiency reasons. Although the source tables may be subject to update activity during the Full Load process, there is no need to stop applications on the source. To guarantee the consistency of the data, the CDC process is automatically activated as soon as the Load starts.

However, changes are not applied at the target until after the table loading completes. Although the data on the target may not be consistent while the Load is active, the target data’s consistency and integrity at the conclusion of the Load are guaranteed. In addition, the Load process can be interrupted. When it restarts, it will continue from wherever it was stopped. New tables can be added to an existing target without reloading the existing tables. Similarly, columns in previously populated target tables can be added or dropped without the need to reload.
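To make the interplay between Full Load and CDC more concrete, here is a minimal sketch in which changes captured during the load are cached and only applied once the load has finished. The `Change` class and both functions are hypothetical names invented for this example, and plain dictionaries stand in for the source snapshot and the target table. The sketch assumes upsert-style application, so replaying a change that the snapshot already reflects does no harm.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Change:
    op: str                      # "insert", "update" or "delete"
    key: int                     # primary key of the affected row
    row: Optional[dict] = None   # new row image (None for deletes)


def apply_change(table: dict, change: Change) -> None:
    """Apply one change with upsert semantics (an assumption of this sketch)."""
    if change.op == "delete":
        table.pop(change.key, None)
    else:
        table[change.key] = change.row


def full_load_with_cached_changes(snapshot: dict, changes_during_load: list) -> dict:
    target = {}
    cached = []

    # Bulk-copy the snapshot. Changes captured in the meantime are cached,
    # not applied, so the target may be inconsistent at this point.
    for key, row in snapshot.items():
        target[key] = dict(row)
    cached.extend(changes_during_load)

    # Only after the load completes are the cached changes applied, in order.
    for change in cached:
        apply_change(target, change)
    return target


snapshot = {1: {"name": "Ada"}, 2: {"name": "Linus"}}
changes = [
    Change("update", 1, {"name": "Ada L."}),
    Change("delete", 2),
    Change("insert", 3, {"name": "Grace"}),
]
result = full_load_with_cached_changes(snapshot, changes)
print(result)
```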

The CDC process obtains a stream of filtered events, or changes in data or metadata, from the transaction and archive log files of the source. One of its most important functions is to buffer all the changes for a given transaction into a single unit, which is sent forward to the target when the transaction commits. As mentioned above, during the initial load, CDC also buffers all the changes that occur within a transaction until all affected tables have finished loading. If changes cannot be applied to the target database in a reasonable timeframe, they are buffered on the replication server for as long as necessary. This removes the need to re-read the source database logs, which could take a significant amount of time.
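The per-transaction buffering described above can be sketched as follows. The event format (dictionaries with `tx`, `kind` and `change` keys) and the `TransactionBuffer` class are hypothetical and chosen only for illustration. Discarding the buffered changes of a rolled-back transaction is an assumption of this sketch, not something stated about Qlik Replicate.

```python
from collections import defaultdict


class TransactionBuffer:
    """Groups log events by transaction and forwards them only on commit."""

    def __init__(self):
        self._open = defaultdict(list)   # tx id -> buffered changes
        self.forwarded = []              # committed units, in commit order

    def on_log_event(self, event: dict) -> None:
        tx, kind = event["tx"], event["kind"]
        if kind == "change":
            # Hold the change until the fate of its transaction is known.
            self._open[tx].append(event["change"])
        elif kind == "commit":
            # Forward the whole transaction as a single unit.
            self.forwarded.append((tx, self._open.pop(tx, [])))
        elif kind == "rollback":
            # Assumption: rolled-back work never reaches the target.
            self._open.pop(tx, None)


events = [
    {"tx": 1, "kind": "change", "change": "INSERT orders 10"},
    {"tx": 2, "kind": "change", "change": "UPDATE orders 7"},
    {"tx": 1, "kind": "change", "change": "UPDATE stock 3"},
    {"tx": 2, "kind": "rollback"},
    {"tx": 1, "kind": "commit"},
]

buffer = TransactionBuffer()
for event in events:
    buffer.on_log_event(event)
print(buffer.forwarded)
```

Although the two transactions interleave in the log, the target only ever sees transaction 1, and sees it as one complete unit.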

Two important processes that are essential for both Full Load and CDC are Filter/Compress and Transformation:

  • Filter/Compress — Filtering conditions can be defined on the values of one or more source columns, which means that rows and columns which are not relevant are discarded before replicating them to the target database. This may be useful, for example, when a column is not in the target database schema or when a row does not pass the user-defined predicates on the rows within the source tables.
  • Transformation — There may be circumstances in which the data in the replicated tables should not be an exact copy of the source data. Qlik Replicate allows users to define those changes to tables and columns and applies them automatically. Examples include renaming the target table, renaming any column in the target table, deleting a target column, changing the data type and/or length of any target column, and adding target columns. Qlik Replicate performs data-type transformations as required, calculates the values of computed fields, and applies the changes to the target as one transaction. When no user-defined transformation is set but replication runs between heterogeneous databases, some conversion between the databases' data types may still be required. In these cases, Qlik Replicate automatically takes care of the required transformations and computations during the Load or CDC execution.
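As a rough illustration of the two processes above, the following sketch filters rows with a predicate, drops columns that are not wanted at the target, renames a column and adds a computed one. The `make_pipeline` helper and the sample column names are hypothetical. In Qlik Replicate these rules are defined in the task configuration rather than in Python code.

```python
def make_pipeline(predicate, keep_columns, rename, computed):
    """Build a row-level filter/transform function.

    predicate    -- returns True for rows that should be replicated
    keep_columns -- source columns to carry over (others are discarded)
    rename       -- mapping of source column name -> target column name
    computed     -- mapping of new target column -> function of the source row
    """
    def process(row):
        if not predicate(row):
            return None                      # row filtered out
        out = {rename.get(col, col): row[col] for col in keep_columns}
        for name, fn in computed.items():
            out[name] = fn(row)              # computed field
        return out
    return process


source_rows = [
    {"id": 1, "country": "DE", "amount": 12.5, "internal_note": "x"},
    {"id": 2, "country": "US", "amount": 99.0, "internal_note": "y"},
]

process = make_pipeline(
    predicate=lambda r: r["country"] == "DE",
    keep_columns=["id", "amount"],
    rename={"amount": "amount_eur"},
    computed={"amount_cents": lambda r: int(round(r["amount"] * 100))},
)

target_rows = [out for out in map(process, source_rows) if out is not None]
print(target_rows)
```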

Qlik Replicate applies these lightweight filters and transformations in memory. More complex transformations can be handled elsewhere in the Qlik Data Integration platform, such as the automatic creation of data marts, data warehouses and data lakes through Data Warehouse Automation, Data Lake Creation and Enterprise Data Catalog, which are delivered by other products, Qlik Compose and Qlik Catalog.

There is a lot of flexibility in the data flows that can be configured with Qlik Replicate, such as replicating data from one source to many targets (fan-out) or many sources to one target (fan-in) and moving data from multiple disparate systems into the cloud.

A quick note on the persistent store: it is not a permanent store for the data that is being replicated. It holds only the configuration and state of each task.

Each pairing of a replication source and target is configured as a single task, and it's this configuration that is stored (metadata, selected fields and tables, and the required transformations). The last state of each replication task is also stored, so Qlik Replicate can pick up where it left off after interruptions, errors or pauses in the replication tasks. Think bookmarks.
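The bookmark idea can be sketched as a small state file that records how far a task has progressed, so that a restarted task resumes instead of starting over. Everything here is hypothetical: the `TaskState` class, the JSON file layout and the `max_events` parameter (used only to simulate an interruption) are assumptions made for the example and say nothing about how Qlik Replicate stores its state.

```python
import json
import os
import tempfile


class TaskState:
    """Hypothetical persistent store: keeps a task's bookmark, not its data."""

    def __init__(self, path):
        self.path = path

    def load(self):
        if not os.path.exists(self.path):
            return 0                          # fresh task: start at the beginning
        with open(self.path) as f:
            return json.load(f)["next_position"]

    def save(self, next_position):
        # Write to a temporary file, then rename, so a crash never leaves
        # a half-written bookmark behind.
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump({"next_position": next_position}, f)
        os.replace(tmp, self.path)


def run_task(change_log, state, applied, max_events=None):
    """Apply changes from the bookmark onward; stop early to simulate a pause."""
    position = state.load()
    processed = 0
    while position < len(change_log):
        if max_events is not None and processed >= max_events:
            return
        applied.append(change_log[position])  # stand-in for applying to a target
        position += 1
        state.save(position)                  # move the bookmark forward
        processed += 1


change_log = ["c1", "c2", "c3", "c4", "c5"]
state = TaskState(os.path.join(tempfile.mkdtemp(), "task_state.json"))
applied = []

run_task(change_log, state, applied, max_events=2)   # interrupted after two changes
run_task(change_log, state, applied)                 # resumes from the bookmark
print(applied)
```

In this sketch the bookmark is saved after each change is applied, so a crash between the two steps would replay one change on restart. A real system has to choose how to handle that window, for example by making the apply step idempotent.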

And all of this can be done at scale. We have many customers with hundreds of tasks running in their production systems, automating their data integration requirements.

In a separate post, I'll detail Qlik Replicate's zero-footprint architecture and scalability.

If you would like to try Qlik Replicate for yourself, you can take a Test Drive in a controlled sandbox environment and see how easy it is to start data replication or get in touch to discuss your requirements.

Peek under the hood at the Architectural Principles of @Qlik Replicate w/ @adammayerwrk
