Hybrid Cloud Data Transfer

Introduction to Hybrid Cloud Data Transfer

We all know that cloud architectures have redefined IT. Elastic resource availability, usage-based pricing, and a lighter in-house IT management burden have proven compelling for most modern enterprises. Odds are that your organization is getting more and more serious about running operations and analytics workloads in the public cloud, most notably Amazon Web Services (AWS), Microsoft Azure, or Google Cloud Platform. Most data infrastructure, including data warehouses, data lakes, and streaming systems, can run in the cloud, where it requires the same data integration and processing steps outlined in Chapter 3.

New Cloud-Based Data Architectures

As you adopt cloud platforms, data movement becomes more critical than ever. This is because clouds are not a monolithic, permanent destination. Odds are also that your organization has no plans to abandon on-premises datacenters wholesale anytime soon. Hybrid cloud architectures will remain the rule. In addition, as organizations’ knowledge and requirements evolve, IT will need to move workloads frequently between clouds and even back on-premises.

Change Data Capture (CDC) makes the necessary cloud data transfer more efficient and cost-effective. You can continuously synchronize on-premises and cloud data repositories without repeated, disruptive batch jobs. When implemented correctly, CDC consumes less bandwidth than batch loads, helping organizations reduce the need for expensive transmission lines. Transmitting updates over a wide area network (WAN) does raise new performance, reliability, and security requirements. Typical methods to address these requirements include multipathing, intermediate data staging, and encryption of data in flight.
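
To make the idea concrete, here is a minimal Python sketch of shipping change events to a cloud staging endpoint. The endpoint URL, event shape, and helper names are assumptions for illustration, not any particular product's API; real CDC tools supply their own transport. The point is that HTTPS keeps the data encrypted in flight and only changed rows, not full tables, cross the WAN.

    # A minimal sketch, assuming a hypothetical staging endpoint and
    # event shape; real CDC tools provide their own transport.
    import json
    import urllib.request

    STAGING_URL = "https://staging.example.com/cdc/apply"  # hypothetical

    def ship_changes(events):
        """POST a micro-batch of change events; HTTPS encrypts them in flight."""
        body = json.dumps(events).encode("utf-8")
        request = urllib.request.Request(
            STAGING_URL,
            data=body,
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(request, timeout=30) as response:
            return response.status == 200

    # A micro-batch holds only the rows that changed since the last send,
    # not a full table scan.
    ship_changes([
        {"op": "update", "table": "orders", "key": 4711,
         "after": {"status": "shipped"}},
        {"op": "insert", "table": "orders", "key": 4712,
         "after": {"status": "new"}},
    ])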

CDC Use Case for Cloud Architectures

A primary CDC use case for cloud architectures is zero-downtime migration or replication. Here, CDC is a powerful complement to an initial batch load (see the sketch after this list). By capturing and applying ongoing updates:

  • CDC ensures that the target is fully current when the initial batch load completes, and keeps it current thereafter.

  • You can switch from source to target without any pause in production operations.
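
The cutover sequence can be summarized in a brief Python sketch. The helper functions here (bulk load, change replay, traffic switch) are hypothetical stand-ins for whatever a real migration tool provides; the structure, not the names, is the point.

    import time

    def bulk_load_snapshot():
        """Hypothetical: copy all source tables, returning the change-log
        position recorded when the snapshot began."""
        return "lsn-0001"

    def apply_changes_since(position):
        """Hypothetical: replay changes captured after `position` onto the
        target; return the new position and the remaining event lag."""
        return "lsn-0042", 0

    def switch_traffic_to_target():
        """Hypothetical: repoint applications at the new cloud target."""
        print("Cutover complete; production never paused.")

    # 1. Initial batch load of the source.
    position = bulk_load_snapshot()

    # 2. CDC replays everything that changed during and after the load,
    #    looping until the target has fully caught up.
    position, lag = apply_changes_since(position)
    while lag > 0:
        time.sleep(1)
        position, lag = apply_changes_since(position)

    # 3. Switch users over with no pause in production operations.
    switch_traffic_to_target()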

Case Study (Fanatics)

The sports merchandise retailer Fanatics realized these benefits by using CDC to replicate 100 TB of data from its transactional, e-commerce, and back-office systems, running on SQL Server and Oracle on-premises, to a new analytics platform on Amazon S3. This cloud analytics platform, which included an Amazon EMR (Hadoop)-based data lake, Spark, and Redshift, replaced the company's on-premises data warehouse.
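
A topology like this might be declared along the following lines. This Python sketch is purely illustrative: the task structure, host names, and bucket names are assumptions, not Qlik Replicate's actual configuration format.

    # Hypothetical declaration of a replication topology: multiple
    # on-premises sources feeding one cloud data lake.
    replication_tasks = [
        {
            "name": "oracle-transactional-to-s3",
            "source": {"type": "oracle", "host": "onprem-oracle"},      # assumed
            "target": {"type": "s3", "bucket": "analytics-data-lake"},  # assumed
            "mode": ["full_load", "cdc"],  # batch load first, then live changes
        },
        {
            "name": "sqlserver-ecommerce-to-s3",
            "source": {"type": "sqlserver", "host": "onprem-mssql"},    # assumed
            "target": {"type": "s3", "bucket": "analytics-data-lake"},
            "mode": ["full_load", "cdc"],
        },
    ]

    for task in replication_tasks:
        print(f"{task['name']}: {task['source']['type']} -> "
              f"s3://{task['target']['bucket']} ({' + '.join(task['mode'])})")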

Hybrid Cloud Data Transfer Results

By pairing the initial batch load with CDC for subsequent updates, Fanatics maintained uptime and minimized WAN bandwidth expenses. Its analysts now gain actionable, real-time insights into customer behavior and purchasing patterns.
