Effortless Table Management with Qlik's Adaptive Iceberg Optimizer: Boost Performance, Cut Costs

Boost query performance by more than 2.5X and reduce storage costs by 50% compared to self-tuned Hive tables, without lifting a finger.

Many data teams still need to manually develop and schedule custom tasks to maintain each and every table in their lakehouse, leading to inconsistent query performance and runaway costs. Code- and engineering-intensive workflows are used to ensure tables are always optimized, old versions of tables are properly expired, and newly ingested data doesn't conflict with ongoing optimizations. This creates unnecessary overhead and reduces the value businesses can generate from their lakehouse for their analytics and AI implementations.

Qlik Open Lakehouse, a fully managed capability within Qlik Talend Cloud, includes our industry-leading Adaptive Iceberg Optimizer, an intelligent agent that continuously monitors Iceberg tables and their data files and optimizes how they are organized and stored for faster queries and lower storage costs. Adaptive Iceberg Optimizer automatically manages your tables to eliminate write conflicts and possible data corruption, freeing up hundreds of engineering hours spent on monitoring, troubleshooting and optimization tasks.

Adaptive Iceberg Optimizer works on Iceberg tables ingested and managed as part of pipelines within Qlik Talend Cloud and is integrated with the AWS ecosystem, including AWS Glue Data Catalog, Amazon S3, Amazon Athena and Amazon SageMaker Studio. Data within the lakehouse can be queried using many Iceberg-compatible query engines. Optimizations are powered by Qlik's highly scalable, cloud-native lakehouse platform, making it easy and cost-effective to manage petabyte-scale Iceberg lakehouse implementations.

Data layout and file optimization is a hard problem

Organizations adopt Apache Iceberg and the lakehouse architecture in order to make all their data available for analytics and AI, in a more scalable way than traditional data warehouses or data lakes. However, ingesting operational data into ready-to-query formats often becomes a stumbling block.

To extract the best query performance from Iceberg tables, files must be grouped into the right number of partitions, with rows sorted and coalesced efficiently among other techniques, to produce an ideal layout that best matches changing query patterns. Getting this right consistently requires deep technical expertise, long testing cycles and tedious manual troubleshooting, tuning and tweaking.
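One of the layout tasks mentioned above, coalescing many small files into fewer large ones, can be illustrated with a minimal sketch. It is not Qlik's implementation: the `DataFile` type, the `plan_compaction` function and the first-fit-decreasing strategy are assumptions chosen for illustration, and the 512 MB target is only an example value.

```python
from dataclasses import dataclass

MB = 1024 * 1024


@dataclass(frozen=True)
class DataFile:
    """Hypothetical stand-in for a data file entry in a table's metadata."""
    path: str
    size_bytes: int


def plan_compaction(files, target_bytes=512 * MB):
    """Group small files into rewrite batches of at most target_bytes each.

    Uses first-fit decreasing bin packing. Files already at or above the
    target size are left alone.
    """
    small = sorted(
        (f for f in files if f.size_bytes < target_bytes),
        key=lambda f: f.size_bytes,
        reverse=True,
    )
    groups = []   # list of lists of DataFile
    totals = []   # running byte total for each group
    for f in small:
        for i, total in enumerate(totals):
            if total + f.size_bytes <= target_bytes:
                groups[i].append(f)
                totals[i] += f.size_bytes
                break
        else:
            # No existing group has room, so start a new one.
            groups.append([f])
            totals.append(f.size_bytes)
    # Rewriting a single file on its own would not reduce the file count.
    return [g for g in groups if len(g) > 1]
```

Even this toy planner leaves open the questions of target size, sort order and when to run, which is where the manual tuning effort tends to go.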

But even after that effort is complete, the work is not done. Engineers also need to answer:

Which tables should be optimized?
How often should optimizations be performed?
Which table properties should be tuned and to what values?
How can we ensure table data is retained for only the allowed duration?

Having gathered these answers, teams must then develop the code and procedures to implement and run these operational tasks. Tasks will include scheduling optimizations per table, recovering from write conflicts, diagnosing failures, and validating proper expiration and deletion of data. Finally, teams must continuously test, tune and update these jobs as data volume increases, new tables are added, and access patterns change.
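As a rough illustration of just one of these hand-built tasks, the sketch below selects table snapshots that are old enough to expire while always protecting the most recent ones. The function name, the tuple layout and the default values are assumptions made for this example, not part of any Qlik or Iceberg API.

```python
from datetime import datetime, timedelta, timezone


def snapshots_to_expire(snapshots, now, max_age=timedelta(days=5), min_keep=1):
    """Return the ids of snapshots that a retention policy would expire.

    snapshots: iterable of (snapshot_id, committed_at) tuples (hypothetical shape).
    A snapshot is expired when it is older than max_age, unless it is among
    the min_keep most recent snapshots.
    """
    ordered = sorted(snapshots, key=lambda s: s[1], reverse=True)  # newest first
    protected = {sid for sid, _ in ordered[:min_keep]}
    cutoff = now - max_age
    return [sid for sid, ts in ordered if ts < cutoff and sid not in protected]


if __name__ == "__main__":
    now = datetime(2025, 1, 10, tzinfo=timezone.utc)
    history = [
        (1, datetime(2025, 1, 1, tzinfo=timezone.utc)),
        (2, datetime(2025, 1, 3, tzinfo=timezone.utc)),
        (3, datetime(2025, 1, 9, tzinfo=timezone.utc)),
    ]
    print(snapshots_to_expire(history, now))
```

A production job would also have to delete the files that only the expired snapshots reference, retry on commit conflicts, and verify that nothing still needed was removed.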

Qlik’s Adaptive Iceberg Optimizer enables you to scale your data management automatically

Adaptive Iceberg Optimizer in Qlik Open Lakehouse takes care of all Iceberg data layout, optimization and maintenance for you. The lakehouse runs in the customer’s AWS environment, using Amazon S3 for storage and a fully managed lakehouse cluster for compute. It integrates with AWS Glue Data Catalog; additional catalog integrations, including Snowflake Open Catalog, Apache Polaris (incubating) and Databricks Unity Catalog, are coming soon. Qlik-managed Iceberg tables can be queried using engines such as Snowflake, Amazon Redshift, Amazon Athena, Dremio, Starburst, Presto and Trino, as well as other Iceberg-compatible engines.

Once it's running, Adaptive Iceberg Optimizer offers three main capabilities:

Algorithmic analysis determines the most impactful way to optimize your Iceberg tables

Adaptive Iceberg Optimizer determines when and how to optimize Iceberg data. It also calculates when to delete files based on factors such as data profile, table properties, frequency of row-level changes, cost and performance characteristics. Using advanced algorithms, Adaptive Iceberg Optimizer continuously evaluates and combines these factors to produce the most impactful optimizations possible for each individual table, delivering unmatched query speeds and cost reduction out of the box.
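To make the idea of combining several factors into a per-table decision more concrete, here is a deliberately simple sketch. It does not describe Adaptive Iceberg Optimizer's actual algorithm, which is not detailed here. The `TableStats` fields, the scoring formula and the weighting are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableStats:
    """Hypothetical per-table metrics an optimizer might track."""
    name: str
    data_files: int
    small_files: int      # data files below the target size
    delete_files: int     # row-level delete files not yet merged
    queries_per_day: int


def optimization_priority(t: TableStats) -> float:
    """Score a table: more fragmentation and more reads give a higher score."""
    if t.data_files == 0:
        return 0.0
    small_ratio = t.small_files / t.data_files
    delete_ratio = t.delete_files / t.data_files
    # Weight fragmentation by read traffic so busy tables are handled first.
    return (small_ratio + delete_ratio) * (1 + t.queries_per_day)


def rank_tables(tables):
    """Return tables ordered from most to least urgent to optimize."""
    return sorted(tables, key=optimization_priority, reverse=True)
```

In this toy model a lightly fragmented table that is queried heavily can outrank a badly fragmented table that is rarely read. That is the kind of trade-off a fixed schedule applied equally to every table cannot express.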

During ingestion and compaction, Adaptive Iceberg Optimizer collects and refreshes table statistics, without the need to run ANALYZE on each table. These statistics assist query engines in further accelerating the planning and execution of queries on Iceberg tables.

Intelligent optimization uniquely adapts to your data for improved Iceberg lakehouse performance

Not all tables are created equal. Adaptive Iceberg Optimizer, as its name suggests, adapts to the chaotic, suboptimal characteristics of raw data to uniquely structure, organize and optimize, delivering the most impactful performance and cost savings, per table. Previously, engineers applied a one-size-fits-all approach to table optimization or sacrificed performance and cost savings by only tuning the most popular tables. With Adaptive Iceberg Optimizer, you get both performance and cost savings across all of your tables without lifting a finger.

In our tests, querying Iceberg tables optimized by Qlik was substantially faster, delivering 2.5x to 5x performance improvements with 50% storage cost reductions compared to unmanaged Iceberg tables.

Automatic file layout and partitioning make your life easier and your queries faster

A key feature coming soon (in Q1 2026) to Qlik's Adaptive Iceberg Optimizer is Adaptive Clustering, which is designed to simplify and accelerate the creation and management of Iceberg-based lakehouses, even for previously unseen data.

With Adaptive Clustering, you will no longer need to worry about partitioning, as it figures out the right data layout for your tables, delivering better read and write performance compared to manually tuned and partitioned tables. This makes it easy for any user to create and load new Iceberg tables without needing prior knowledge of the data layout or planning and testing different partitioning strategies. Using Adaptive Clustering, you will be able to onboard new datasets and make them available to analytics and data science users in minutes and without data engineering resources.

Cluster keys that you choose will be used to dynamically partition or cluster rows based on characteristics of your data such as density, cardinality and skew. Once applied, benefits of the optimized clusters or partitions apply across all query engines. Adaptive Clustering additionally provides flexibility to redefine clustering keys without rewriting existing files, allowing you to evolve table layout over time to meet users' query needs.
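The characteristics named above can be measured cheaply from a sample of a candidate key column. The sketch below is an illustration only and does not reflect how Adaptive Clustering works internally. The `profile_column` function and its output fields are hypothetical, and the share of rows held by the most common value is used as a simple proxy for skew.

```python
from collections import Counter


def profile_column(values):
    """Summarize cardinality and skew for a sample of a candidate cluster key."""
    if not values:
        raise ValueError("cannot profile an empty sample")
    counts = Counter(values)
    n = len(values)
    distinct = len(counts)
    return {
        "cardinality": distinct,                       # number of distinct values
        "distinct_ratio": distinct / n,                # near 1.0 means almost unique
        "top_value_share": max(counts.values()) / n,   # near 1.0 means heavy skew
    }


if __name__ == "__main__":
    sample = ["us"] * 8 + ["de", "fr"]
    print(profile_column(sample))
```

A key with very high cardinality would create too many tiny partitions if partitioned by value, and a heavily skewed key would create a few oversized ones. Profiles like this are one input a layout decision could plausibly weigh.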

Adaptive Clustering on Iceberg tables will be able to deliver a performance boost out of the box on any new or existing lakehouse deployment. It dramatically reduces the total number of partitions and files compared to manually tuned partitioned tables, resulting in significantly faster query performance.

Get started today

You can get started today by exploring Qlik Open Lakehouse. Request a demo to see how Adaptive Iceberg Optimizer can work for you.

Learn More:

Blog: Qlik Open Lakehouse – Now Generally Available

Technical Documentation for Qlik Open Lakehouse

Product Webpage – Qlik Open Lakehouse

Blog: Basics of Apache Iceberg

The Iceberg Data Lakehouse Stack: Choosing the Right Building Blocks
