Hadoop Data Lake

For organizations looking to turn big data into business intelligence and competitive advantage, the Hadoop data lake has emerged as a powerful complement to the traditional enterprise data warehouse.

Enterprise data warehouses are a premium platform for integrating structured data for traditional reporting and analytic tasks. Hadoop data lakes are a newer approach. They support the aggregation and ad hoc analysis of vast quantities of structured, semi-structured, and unstructured data on a low-cost, readily scalable cluster of commodity servers. Big data and Hadoop empower organizations to find patterns, correlations, and anomalies in long-term historical data and in types of data and content that cannot be stored in or queried by relational database systems.

But to reap the benefits of big data analytics, enterprises must meet the challenges of creating, loading, and managing a Hadoop data lake.

Qlik (Attunity): The Fast and Easy Way to Ingest Data into a Hadoop Data Lake

Qlik Replicate (formerly Attunity Replicate) is the big data integration solution trusted by thousands of organizations worldwide. With industry-leading support for replicating data from nearly any type of source system to nearly any type of target system, Qlik (Attunity) is uniquely well-suited to the task of filling and continuously refreshing a Hadoop data lake.

Qlik (Attunity) provides a simple, universal, and real-time solution for moving structured, semi-structured, and unstructured data into a Hadoop data lake running any of the major Hadoop distributions, such as Cloudera, Hortonworks, or MapR. With Qlik (Attunity), you have centralized control over Hadoop data ingestion from any type of data source, including:

  • All major relational database systems
  • All major data warehouse systems
  • All major mainframe systems
  • Enterprise applications such as SAS and Salesforce
  • Flat files and media files

For all such sources, and for any target system including Hadoop, you can configure and run end-to-end replication jobs through an intuitive GUI, with no manual coding. Qlik (Attunity) automatically creates and executes the code necessary to invoke the native APIs on the source and target systems.

By enabling self-service for creation and execution of big data ingestion jobs and flows, Qlik (Attunity) empowers analysts and data scientists to quickly add new feeds to a Hadoop data lake. And while self-service makes your big data operations more agile, Qlik (Attunity) change data capture (CDC) technology allows for continuous updates from the source systems to the data lake, ensuring that your team's data lake analytics are always based on the freshest possible data.
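To make the idea of change data capture concrete, here is a minimal, generic sketch in Python. It is not Qlik Replicate's implementation or API; the `ChangeEvent` type, the `apply_changes` function, and the in-memory dictionary standing in for a data lake table are all hypothetical names chosen for illustration. The sketch only shows the general principle: instead of reloading a full table, the target is kept current by applying a stream of insert, update, and delete events captured at the source.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ChangeEvent:
    """One captured change from a source table (hypothetical structure)."""
    op: str                     # "insert", "update", or "delete"
    key: int                    # primary key of the affected row
    row: Optional[dict] = None  # new row image; None for deletes


def apply_changes(target: Dict[int, dict], events: List[ChangeEvent]) -> Dict[int, dict]:
    """Apply captured changes, in order, to a keyed copy of the table."""
    for event in events:
        if event.op in ("insert", "update"):
            # Inserts and updates both write the latest row image.
            target[event.key] = event.row
        elif event.op == "delete":
            # Ignore deletes for keys the target has never seen.
            target.pop(event.key, None)
        else:
            raise ValueError(f"unknown operation: {event.op}")
    return target


# Assumed example data: the target starts with one row already loaded.
lake_table = {1: {"name": "Ada", "city": "London"}}
captured = [
    ChangeEvent("insert", 2, {"name": "Grace", "city": "New York"}),
    ChangeEvent("update", 1, {"name": "Ada", "city": "Paris"}),
    ChangeEvent("delete", 2),
]
apply_changes(lake_table, captured)
```

After the three events are applied, the copy holds only row 1 with its updated city, which is the state the source table would have at that point. Applying events in order matters: swapping the insert and delete for key 2 would leave a different result.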

Maximizing the Value of a Hadoop Data Lake

Along with making it easy to load data into a Hadoop data lake and keep it fresh, Qlik (Attunity) helps you maximize your return on your data lake investment through enterprise features including:

  • Unified monitoring of Hadoop and EDW data and resource usage. With Qlik Visibility (formerly Attunity Visibility) you gain insight into Hadoop data and processing usage by application, by line of business, and by user, in support of capacity planning, performance optimization, charge-back and show-back, and ROI assessment. From the same console you can monitor and analyze data and resource usage within your enterprise data warehouse, so you can identify data and workloads that may be better offloaded to the lower-cost data lake in order to improve cost and performance.
  • Replication of Hadoop output to other target systems. Just as Qlik (Attunity) makes it easy to move data into your Hadoop data lake, it also makes it easy to move the output of Hadoop jobs to other target systems such as database systems, your data warehouse, or cloud-based targets like Amazon S3 or Elastic MapReduce.
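
As a rough illustration of the usage reporting described in the first item above, the following Python sketch totals resource usage by a chosen dimension, which is the kind of rollup that charge-back and show-back reports rely on. It does not use or represent Qlik Visibility; the record fields (`application`, `line_of_business`, `user`, `cpu_hours`) and the `usage_by` function are assumptions made up for this example.

```python
from collections import defaultdict
from typing import Dict, List


def usage_by(records: List[dict], dimension: str) -> Dict[str, float]:
    """Sum CPU hours per value of the given dimension (hypothetical fields)."""
    totals: Dict[str, float] = defaultdict(float)
    for record in records:
        totals[record[dimension]] += record["cpu_hours"]
    return dict(totals)


# Assumed sample usage records, one per job run.
usage_records = [
    {"application": "fraud-model", "line_of_business": "risk", "user": "ana", "cpu_hours": 2.0},
    {"application": "fraud-model", "line_of_business": "risk", "user": "raj", "cpu_hours": 3.0},
    {"application": "churn-report", "line_of_business": "marketing", "user": "ana", "cpu_hours": 1.5},
]

by_lob = usage_by(usage_records, "line_of_business")
by_user = usage_by(usage_records, "user")
```

The same records can be rolled up by application, line of business, or user simply by changing the `dimension` argument, which mirrors the different views a capacity-planning or charge-back report would need.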

Five Principles for Effectively Managing Your Data Lake Pipeline