A few months ago, at Qlik Connect, we announced Qlik Open Lakehouse, a new, fully managed capability that brings the best of the Upsolver platform into Qlik Talend Cloud. Today, we are excited to announce the general availability of Qlik Open Lakehouse within Qlik Talend Cloud, making it easy and cost-effective for organizations to ingest, optimize, and manage massive amounts of data directly in Apache Iceberg–based lakehouses.
Why Qlik Open Lakehouse?
The rise of open lakehouses marks a fundamental shift in how enterprises approach modern data architecture. By blending the scalability of data lakes with the reliability of data warehouses, lakehouses powered by open table formats such as Apache Iceberg deliver flexibility, governance, and performance — all while eliminating costly silos.
Yet building and running a performant Iceberg lakehouse has been challenging. From complex ingestion across diverse sources and ongoing pipeline management to manual table optimization and maintenance overhead, organizations have struggled to unlock the full promise of Iceberg. Qlik Open Lakehouse changes that by automating the hard parts and delivering a seamless, truly end-to-end solution.

Key Benefits
Qlik Open Lakehouse handles all the hard parts automatically. It maps source schemas to target structures, resolves data type conflicts, supports rule-based data and metadata standardization, performs automatic file compaction, and handles updates and soft deletes while tracking SCD Type 2 history, continuously and with no manual intervention.
The lakehouse cluster runs in the customer’s AWS environment, using Amazon S3 as storage and a fully managed lakehouse cluster as compute. It does not require a data warehouse for ingestion into Iceberg, and it uses cost-effective Amazon EC2 Spot Instances (with built-in failover mechanisms) to cut data ingestion compute costs by up to 80-90%, delivering true value for customers building open lakehouses.
Some of the key capabilities of Qlik Open Lakehouse include:
High-throughput ingestion into Iceberg: Real-time CDC and batch data ingestion from hundreds of sources (databases, SaaS apps, SAP, mainframes and more) directly into Iceberg tables, with just a few clicks.
Dramatically lower ingestion costs: Utilizes a decoupled, shared-nothing, auto-healing architecture that scales seamlessly to match usage by dynamically provisioning and deprovisioning EC2 instances, including low-cost Spot Instances (usually up to 50-80% cheaper), delivering dramatic compute cost savings compared to comparable architectures.
Adaptive Iceberg Optimization: Continuous monitoring and automated compaction and cleanup of Iceberg tables deliver 2.5x–5x faster queries while reducing costs by up to 50%. The optimizer continuously evaluates each table and adapts to produce the most impactful optimizations for it, delivering query speed and cost reduction out of the box, without manual tuning or dedicated engineering resources.
Interoperable by design: Integrates with leading Iceberg catalogs, starting with AWS Glue Catalog in this release, ensuring compatibility with diverse query and processing engines including Snowflake, Amazon Athena, Amazon SageMaker Studio, Trino, Presto, and more. Customers can also use Qlik Cloud Analytics with Amazon Athena to effortlessly query data in the lakehouse (a brief sketch follows this list). Additional catalog integrations, including Snowflake Open Catalog, Apache Polaris (incubating), and Databricks Unity Catalog, are coming soon.
Data warehouse mirroring: Seamlessly “mirror” Iceberg tables into cloud data warehouses like Snowflake without duplicating data, ensuring interoperability and reducing costs while minimizing data movement.
Unified platform: As part of the broader Qlik Talend Cloud platform, it offers customers an end-to-end solution to ingest, transform, govern, optimize, and manage Iceberg-based pipelines from a single unified platform, with no need to stitch together multiple tools or engines.
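To make that interoperability concrete, here is a minimal sketch of running an Athena query against a lakehouse table via boto3. This is not Qlik's API; the database, table, and S3 output location are illustrative assumptions, and Qlik Cloud Analytics handles this wiring for you.

```python
import time

import boto3

# Illustrative only: querying an Iceberg table registered in AWS Glue Catalog
# with Amazon Athena. Database, table, and output bucket are placeholders.
athena = boto3.client("athena", region_name="us-east-1")

run = athena.start_query_execution(
    QueryString="SELECT order_id, amount FROM orders LIMIT 10",
    QueryExecutionContext={"Database": "sales", "Catalog": "AwsDataCatalog"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)
qid = run["QueryExecutionId"]

# Poll until the query finishes, then print the result rows.
while True:
    state = athena.get_query_execution(QueryExecutionId=qid)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    for row in athena.get_query_results(QueryExecutionId=qid)["ResultSet"]["Rows"]:
        print([col.get("VarCharValue") for col in row["Data"]])
```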
Let’s dive a little deeper into the architecture of Qlik Open Lakehouse.
High Level Architecture
Qlik Open Lakehouse is designed to remove complexity and give organizations the tools to manage and operate Iceberg at enterprise scale, without the need for specialized engineering teams or endless manual tuning.
The Qlik Open Lakehouse architecture combines low-latency ingestion, scalable and cost-effective compute, and efficient data processing to deliver a modern lakehouse experience.
With Qlik Open Lakehouse, data never leaves the customer’s private cloud. It runs in the customer’s own AWS environment, which helps maintain data sovereignty, security, and governance, and it leverages native AWS components, including EC2, S3, and Glue Catalog, to deliver a truly simplified end-to-end experience.
A high-level architecture diagram is shown below:

1. Ingestion Engine Built for Scale
Handles both batch and real-time ingestion from hundreds of operational and cloud data sources.
Built-in Change Data Capture (CDC) captures changes from operational systems in real time, preserving historical context with automated SCD Type 2 tracking (a minimal sketch of this pattern follows these bullets).
The Data Movement Gateway runs in the customer’s on-premises or cloud environment. It securely captures changes (CDC) from source systems such as RDBMS, SAP, or mainframes and sends the data directly to the customer’s Amazon S3 landing zone.
Automatically maps source data types to target types, resolves type conflicts, and seamlessly executes row-level data standardization and transformations.
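For readers unfamiliar with SCD Type 2, the following minimal Python sketch illustrates the row-versioning pattern the platform maintains automatically; the column names (key, valid_from, valid_to, is_current) are hypothetical, not Qlik's actual schema.

```python
from datetime import datetime

def apply_scd2(history: list[dict], change: dict, now: datetime) -> None:
    """Apply one upstream change using SCD Type 2: close the current version
    of the record instead of overwriting it, then append the new version."""
    for row in history:
        if row["key"] == change["key"] and row["is_current"]:
            row["valid_to"] = now        # close the previous version
            row["is_current"] = False
    history.append({**change, "valid_from": now, "valid_to": None, "is_current": True})

history: list[dict] = []
apply_scd2(history, {"key": 1, "city": "Boston"}, datetime(2024, 1, 1))
apply_scd2(history, {"key": 1, "city": "Austin"}, datetime(2024, 6, 1))
# history now holds both versions: the Boston row, closed as of 2024-06-01,
# and the Austin row marked current.
```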
2. Lakehouse Cluster (EC2 Auto Scaling Group)
The lakehouse cluster is a group of Amazon EC2 Spot Instances responsible for data processing. The cluster instances coordinate and execute the workloads that process incoming data from the landing zone and store the results in the target location in Iceberg format.
A lakehouse cluster with a single Spot Instance is automatically created during network integration setup. Customers can create and manage additional clusters, or scale existing clusters seamlessly, to support their ongoing lakehouse requirements.
In addition to defining the number of Spot and On-Demand instances to use, customers can configure the scaling strategy that best suits their workload and budget: low cost, low latency, consistent low latency, or no scaling. A rough sketch of the underlying mixed Spot/On-Demand pattern follows.
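Qlik provisions and manages these clusters for you; purely as a mental model, the mix of a small On-Demand baseline with cheaper Spot capacity resembles an EC2 Auto Scaling group with a mixed instances policy. A hypothetical boto3 sketch, where all names, subnets, and sizes are illustrative assumptions:

```python
import boto3

# Illustrative only: Qlik Open Lakehouse manages cluster provisioning itself.
# This shows the general AWS pattern of keeping one stable On-Demand node and
# running all additional capacity on cheaper Spot Instances.
autoscaling = boto3.client("autoscaling", region_name="us-east-1")

autoscaling.create_auto_scaling_group(
    AutoScalingGroupName="lakehouse-cluster-demo",        # placeholder name
    MinSize=1,
    MaxSize=8,
    VPCZoneIdentifier="subnet-0abc1234,subnet-0def5678",  # placeholder subnets
    MixedInstancesPolicy={
        "LaunchTemplate": {
            "LaunchTemplateSpecification": {
                "LaunchTemplateName": "lakehouse-node",   # placeholder template
                "Version": "$Latest",
            },
            # Several instance types widen the Spot pools available for failover.
            "Overrides": [{"InstanceType": t} for t in ("m5.xlarge", "m6i.xlarge", "r5.xlarge")],
        },
        "InstancesDistribution": {
            "OnDemandBaseCapacity": 1,                    # one stable node
            "OnDemandPercentageAboveBaseCapacity": 0,     # the rest on Spot
            "SpotAllocationStrategy": "capacity-optimized",
        },
    },
)
```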
3. Adaptive Iceberg Optimizer
Traditional Iceberg optimization requires constant manual tuning. Qlik changes that with an intelligent optimizer that continuously monitors workloads and table characteristics.
Qlik’s Adaptive Iceberg Optimizer, a key component of the Open Lakehouse, executes the right compaction and cleanup operations automatically, boosting query performance by up to 2.5-5x while optimizing costs.
Runs continuously in the background, freeing data engineers from repetitive maintenance tasks like those sketched below.
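To appreciate what is being automated here: with plain open-source Iceberg, these chores are typically run by hand as Spark maintenance procedures. A minimal PySpark sketch of the equivalent manual operations, where the catalog name ("glue") and table ("sales.orders") are illustrative assumptions:

```python
from pyspark.sql import SparkSession

# Illustrative only: the standard open-source Iceberg maintenance procedures
# that the Adaptive Iceberg Optimizer runs for you, continuously and adaptively.
spark = SparkSession.builder.appName("iceberg-maintenance").getOrCreate()

# Compact many small data files into fewer large ones to speed up scans.
spark.sql("CALL glue.system.rewrite_data_files(table => 'sales.orders')")

# Expire old snapshots to bound metadata growth and reclaim storage.
spark.sql(
    "CALL glue.system.expire_snapshots("
    "table => 'sales.orders', older_than => TIMESTAMP '2024-01-01 00:00:00')"
)

# Delete files no longer referenced by any snapshot.
spark.sql("CALL glue.system.remove_orphan_files(table => 'sales.orders')")
```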
4. Elastic Compute and Interoperability
Data is stored once in Iceberg and can be queried by a wide range of query and processing engines, including Snowflake, Amazon Athena, Amazon SageMaker Studio, Trino, Presto, Dremio, and more. Customers can also use Qlik Cloud Analytics together with Amazon Athena to easily query data in the lakehouse.
Effectively decouples storage, compute, and query engines, eliminating vendor lock-in and allowing organizations to pick the best engine for each workload, delivering on the true promise and potential of open table formats.
With data warehouse mirroring, customers can create external Iceberg tables in Snowflake to query data stored in the lakehouse without duplicating it. This lets customers run Snowflake’s analytics engine directly on Iceberg-managed data stored as Parquet on S3, as the sketch below illustrates.
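Under the hood, this follows Snowflake's pattern for externally managed Iceberg tables. As a rough illustration via the Snowflake Python connector, where the catalog integration, external volume, and table names are all assumptions and the exact DDL Qlik generates may differ:

```python
import snowflake.connector

# Illustrative only: querying an externally managed Iceberg table in Snowflake.
# The catalog integration, external volume, and table names are placeholders.
conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="...", warehouse="ANALYTICS_WH"
)
cur = conn.cursor()

# Point Snowflake at the Iceberg table registered in AWS Glue; the data itself
# stays in Parquet on S3 and is not copied into Snowflake.
cur.execute("""
    CREATE ICEBERG TABLE analytics.public.orders_mirror
      EXTERNAL_VOLUME = 'lakehouse_s3_volume'
      CATALOG = 'glue_catalog_integration'
      CATALOG_TABLE_NAME = 'orders'
""")

# Query it like any other Snowflake table.
cur.execute("SELECT COUNT(*) FROM analytics.public.orders_mirror")
print(cur.fetchone())
```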
5. End-to-End Automation
Automated management of snapshot expiration, orphaned-file cleanup, and file compaction ensures ongoing performance without manual intervention.
Optimizations and governance happen transparently, letting teams focus on innovation instead of infrastructure.
6. Governance & Trust at the Core
Automated data lineage and impact analysis, so teams know exactly where data originated, how it has been transformed, and where it is used.
Data quality validation rules, semantic type detection, and the Qlik Trust Score provide quality observability to ensure data meets business and governance policies.
Delivers trusted data quality that’s ready for analytics, machine learning, and AI.
Even more exciting, all of the capabilities launching today as part of Qlik Open Lakehouse are included in the Standard Edition of Qlik Talend Cloud and above, making it even easier and more affordable for customers to adopt and work with Apache Iceberg.
The result: a lakehouse that feels as easy to manage as an analytics engine, but with the openness, scale, and cost-efficiency of the cloud.
For more details on the architecture and implementation, please visit the Qlik Lakehouse Documentation page.
Qlik Open Lakehouse: What’s Available Today
With this GA release, customers can:
Deploy and manage an Iceberg-based lakehouse in AWS using Amazon S3 with just a few clicks.
Ingest CDC and batch data from 200+ sources directly into Iceberg tables, without the need for a data warehouse.
Create and manage pipelines with Qlik Open Lakehouse as the target, using AWS Glue as the catalog and Amazon S3 as storage.
Continuously and automatically optimize Iceberg tables using the Adaptive Iceberg Optimizer, with no manual effort.
Automatically manage SCD Type 1 and Type 2 outputs.
Perform rule-based and ad-hoc data transformations.
Mirror Iceberg tables into Snowflake for hybrid transformation workflows (see the mirroring sketch above).
Integrate seamlessly with catalogs (AWS Glue Catalog) and leading query engines (see above); a minimal open-source query sketch follows.
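Because the tables are standard Iceberg in a standard catalog, they are also readable with plain open-source tooling. A minimal sketch using PyIceberg against AWS Glue Catalog, where the namespace, table, and filter column are illustrative assumptions:

```python
from pyiceberg.catalog import load_catalog

# Illustrative only: reading a Glue-cataloged Iceberg table written by the
# lakehouse, using open-source PyIceberg. Names below are placeholders.
catalog = load_catalog("default", **{"type": "glue", "glue.region": "us-east-1"})

table = catalog.load_table("sales.orders")

# Push the filter and limit down into the Iceberg scan, then materialize
# the matching rows as a pandas DataFrame.
df = table.scan(row_filter="order_date >= '2024-01-01'", limit=100).to_pandas()
print(df.head())
```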
What’s Coming Next
Qlik is committed to extending the Open Lakehouse roadmap with:
Advanced streaming ingestion into Iceberg: low-latency, high-volume pipelines at scale from streaming sources such as Kafka, Kinesis, and S3
Streaming transformations: transform complex hierarchical data structures into easy-to-query table structures
Deeper ecosystem integrations: additional catalogs, including Snowflake Open Catalog, Apache Polaris, and Databricks Unity Catalog
Enhanced observability: advanced cluster monitoring and improved visibility into data profiling and table optimizations
Plug-and-play connectors: pre-built connectors for more BI, AI/ML, and data governance platforms.
Conclusion
With Qlik Open Lakehouse now generally available, organizations can finally realize the full potential of Apache Iceberg and the open lakehouse model. By combining simplicity, automation, and interoperability, Qlik delivers a foundation for trusted analytics, real-time insights, and AI/ML innovation utilizing the power of Iceberg — at scale and at lower cost.
Start building your open lakehouse today.
Visit Qlik Open Lakehouse to learn more and request a demo.