Announcing Attunity Compose for Hive

A new way to accelerate data loading and transformation for Hadoop Data Lakes.

Attunity Compose for Hive automates the data pipeline that creates analytics-ready data. It leverages the latest innovations in Hadoop, such as the new ACID MERGE SQL capability available today in Apache Hive (part of the Hortonworks Data Platform 2.6 distribution), to automatically and efficiently process data insertions, updates and deletions.

Attunity Compose for Hive was announced at the DataWorks Summit 2017. Itamar Ankorion, Chief Marketing Officer at Attunity, explained: “We help large corporations around the world implement strategic data lake initiatives by making data available in real-time for analytics and enabling them to overcome the inherent challenges associated with building modern data systems. Attunity Compose for Hive directly addresses these challenges to automate the implementation of Hadoop Hive. It works by eliminating complex and lengthy manual development work for faster and more efficient implementation of analytics-ready data sets.”

How Does Attunity Compose for Hive Work?

Attunity Compose for Hive automates the creation, loading and transformation of data into Hadoop Hive structures. It fully automates the pipeline of business intelligence (BI) ready data into Hive, to create both Operational Data Stores (ODS) and Historical Data Stores (HDS). Attunity Compose leverages the latest innovations in Hadoop such as the new ACID Merge SQL capabilities that are available today in Apache Hive (part of the Hortonworks 2.6 distribution), to process data insertions, updates and deletions.

Attunity Replicate integrates with Attunity Compose to accelerate data ingestion, data landing, SQL schema creation, data transformation and ODS & HDS creation/updates.

With Attunity Compose for Hive, you have:

  • Real-time data ingestion and landing. Leverage tight integration with Attunity Replicate to ingest data in batch or via continuous data capture (CDC), then copy that data to an on-premises or cloud target.
  • Comprehensive automation. Generate Hive schemas automatically for ODS and HDS targets, with all necessary data transformations applied seamlessly.
  • Continuous, non-disruptive data store updates. Leverage the ANSI SQL-compliant ACID MERGE operation to process data insertions, updates and deletions in a single pass.
  • Transaction consistency. Partition updates by time to ensure each transaction update is processed holistically for maximum consistency.
  • Improved operational visibility. Support for slowly changing dimensions to understand change impact with a granular history of updates (such as customer address changes) within the Historical Data Store.
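The single-pass MERGE behavior described above can be sketched in HiveQL. This is an illustrative example only, not Attunity-generated code: the table, column, and operation-flag names (`op` = 'I'/'U'/'D') are assumptions modeled on a typical CDC change table.

```sql
-- Hypothetical sketch: apply a batch of captured changes to an ODS table
-- in a single pass using Hive's ACID MERGE (available as of HDP 2.6).
MERGE INTO ods.customers AS t
USING staging.customers_changes AS s
ON t.customer_id = s.customer_id
WHEN MATCHED AND s.op = 'D' THEN DELETE              -- source row was deleted
WHEN MATCHED AND s.op = 'U' THEN                     -- source row was updated
  UPDATE SET name = s.name, address = s.address
WHEN NOT MATCHED AND s.op = 'I' THEN                 -- source row was inserted
  INSERT VALUES (s.customer_id, s.name, s.address);
```

Because inserts, updates and deletes are resolved in one statement, the target table never passes through a partially applied intermediate state.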

Data Automation to Hive in Five Steps

Step 1: Use Attunity Replicate to ingest data into Hadoop and partition the data

Attunity Replicate transfers data into Hadoop and the HDFS file system in parallel, via the WebHDFS and HttpFS protocols or over NFS, and connects to HCatalog via ODBC and HQL scripts. As data is loaded into Hadoop, data partitioning creates metadata that identifies consistent, transactionally verified datasets. Data files are uploaded to HDFS according to the maximum size and time definitions, and stored in a directory under the change table directory. Whenever the specified partition timeframe ends, a partition is created in Hive that points to the HDFS directory.
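The final step of that flow, registering a completed change-data partition so Hive can see the files Replicate wrote, can be sketched as plain HiveQL. The table name, partition key, and HDFS path below are assumptions for illustration, not the product's actual naming scheme.

```sql
-- Illustrative sketch: once a partition timeframe closes, expose the HDFS
-- directory containing the change files as a Hive partition.
ALTER TABLE landing.orders__ct
  ADD IF NOT EXISTS PARTITION (partition_name = '20170612T140000')
  LOCATION '/replicate/changes/orders/20170612T140000';
```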

Attunity Replicate Ingest Data into Hadoop

Step 2: Connect to the Hadoop cluster and configure the CDC and ETL processes

The images below showcase the connections into Hive and into the source database, Northwind, a MySQL instance.

Connect to the Hadoop Cluster

By optionally storing the history of changes through the Manage Metadata -> Save Changes screen, you can choose to design an Operational Data Store, a Historical Data Store, or both.

Creating a Data Warehouse

Step 3: Generate Hive LLAP code for loading data

Generate Hive LLAP Code

Attunity Compose considers these key items while generating Hive ETL calls:

  • Extracting data from the sources (initial load and CDC)
  • Loading data into the landing zone in transactionally consistent data partitions to maintain integrity
  • Transforming data in the landing zone from sequence to ORC format
  • Handling ETL for DELETE operations
  • Scaling to support a large number of sources, tables and transactions, with consideration for parallel processing of tasks
  • Intelligently managing the number of parallel ETL processes to prevent Hadoop cluster overload
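The sequence-to-ORC transformation in the list above can be sketched as two HiveQL statements. The schema and table names are hypothetical; the point is the pattern of copying rows from a SequenceFile-backed landing table into an ORC table that is efficient for analytics.

```sql
-- Illustrative sketch of the landing-zone format conversion.
-- Target table stored as ORC for columnar, compressed analytics access.
CREATE TABLE IF NOT EXISTS ods.orders_orc (
  order_id   BIGINT,
  status     STRING,
  updated_at TIMESTAMP
)
STORED AS ORC;

-- Copy rows out of the SequenceFile-backed landing table.
INSERT INTO TABLE ods.orders_orc
SELECT order_id, status, updated_at
FROM landing.orders;
```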

When changes are made to the source system, the data is delivered to the [table]’_delivery’ zone, which serves as the final presentation layer.

Data Delivery to Table

Audits are carried throughout the process: a set of per-record audit entries is kept in the [table]’_landing’ Hive tables, which hold the change tables and a record of each table’s partitions. The ‘attrep_cdc_partitions’ table records when changes arrive in each CDC partition.

CDC Partitions Table

Reviewing the change records shows how the latest merge is produced: ‘I’ (Insert) and ‘U’ (Update) operations identify the most recent inserts and updates, and an additional reconciliation step handles records where a delete occurred.
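A common way to express that reconciliation is to keep only the most recent change record per key and let the operation flag decide the outcome. The query below is a hedged sketch: the column names (`op`, `change_seq`) are assumptions based on a typical CDC change table, not the product's generated SQL.

```sql
-- Illustrative reconciliation query: for each key, select only the latest
-- change record; rows whose op is 'D' are then treated as deletes downstream.
SELECT customer_id, op, name, address
FROM (
  SELECT c.*,
         ROW_NUMBER() OVER (PARTITION BY customer_id
                            ORDER BY change_seq DESC) AS rn
  FROM landing.customers__ct c
) latest
WHERE rn = 1;
```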

Manage ETL Sets

Step 4: Configure the Parallelism and Optimizations needed

To avoid overloading the Hadoop cluster, throttle the run by limiting the number of SQL statements executed in parallel. In the Manage ETL Set screen, go to ETL Commands > Settings > Advanced and set the maximum number of concurrent database connections to use.

Configure the Parallelism and Optimizations needed

Step 5: Show the data through Hive

Finally, the reconciled delivery zone of data is presented through Hive.

Show the Data through Hive

In summary, the business benefits of Attunity Compose for Hive are:

  • Faster data lake operational readiness
  • Reduced development time
  • Reduced reliance on Hadoop skills
  • Standard SQL access

Would you like to try Attunity Compose for Hive? To learn more or participate in the beta program, please click here.
