Indeed, one of my recent discoveries was “5 Best Practices for Data Warehouse
Deployment” by Kent Graziano, Chief Tech Evangelist at Snowflake, one of
Qlik’s industry partners. Kent offers some great tips that are data warehouse
agnostic, so I encourage you to
download it and take a read.
Those five data warehouse best practices, as laid out in the eBook, align in
many ways with what we do here at Qlik.
I explore and expand upon them in this blog post, taking key quotes from Kent
about those practices and providing my take on each.
1. Create a Data Model
“The first key step in data warehouse development is to create a data
model [….] Having a data model gives you a way to document the data sets
that will be incorporated into the data warehouse, the relationship
between those data sets, and the business requirements of the data
I couldn’t agree more with this best practice. It takes a very brave or
foolish person to embark on a data warehouse project without a data model,
even though it’s certainly possible. However, it’s completely understandable
why many proceed without it. At first, the amount of data and number tables
are manageable, and modeling seems like overkill. Also, many practitioners
feel that they don’t have time or expertise to utilize standardized
architectural approaches, such as Kimball, Inmon or Data Vault. Finally,
many enterprises simply lack the appropriate tools and support. Regardless
of the reason, data modeling will make your warehouse easier to maintain,
more efficient to operate and improve the collaboration between data
warehouse implementors and consumers alike.
2. Adopt an Agile Data Warehouse Methodology
“In the past, data warehouse creation was a large, monolithic,
multi-quarter (or multi-year) effort, subject to the traditional
“waterfall” process. In the modern age, that’s no longer the norm as many
organizations are choosing to adopt a more flexible and iterative, or
Agile, design approach.”
I find myself, once again, violently nodding my head in agreement with Kent.
We simply don’t have the luxury of time anymore for traditional data
warehouse techniques. In fact, software developers have known this for
years, and you’d be hard pressed to find a dev team that doesn’t use agile
techniques today. However, it’s also fair to admit that the data warehousing
community has some way to go — we haven’t reached the sophistication of
making it a standard practice to use blue-green deployment techniques, as
supported by continuous integration/continuous delivery (CI/CD) tooling. (If
you want to read more of my thoughts about this point, read my book
co-authored with Microsoft: “Data Warehouse Automation in Azure for Dummies”.)
3. Favor ELT over ETL
“In the past, data warehousing development took an extract-transform-load
(ETL) approach, extracting the data to be imported into the data warehouse
from the source systems, cleaning it or applying business rules to it on
an external server, and then loading it into the target data warehouse.
Increased data warehousing computing power and capabilities have yielded a
new preferred approach: extract-load-transform (ELT).”
Oh, I have so much to add to this subject. First, ELT has become the de
facto standard for data loading, as the huge volumes of data we now require
are often too big for ETL batch windows (i.e., the nightly 8-hour update is
just not long enough to load updates).
Second, ELT is faster to
load data because no transformation processes occur during the update. Raw
data is simply replicated, as is, to “landing” tables in the warehouse,
which are then “transformed” by other means later in the data pipeline
process. In contrast, ETL joins disparate tables before loading, thereby
consuming additional time and computer resources.
represents the philosophy of change data capture (CDC) solutions, such as
Qlik Replicate. When data warehouses use CDC, then the landing zones are
continuously ingesting small data updates in real-time, thus ensuring that
the data warehouse always contains the most current information.
4. Adopt a Data Warehouse Automation Tool
“The goal of the data warehouse is to activate and deliver data more
quickly so it can inform business decisions and drive greater value. One
way to increase speed of delivery is to adopt the Agile methodology.
Another is to adopt automation tools that can help develop and deploy code
Agile data warehousing requires more than just a change in manual process.
It also requires a new toolchain that can deliver productivity gains by
automating manual, mundane and repetitive tasks. The areas ripe for
automation are: (1) data modeling, (2) data warehouse table creation, (3)
data mart creation, (4) documentation, and (5) data ingest and update. By
automating one, some or all those activities will pay efficiency and
productivity dividends that will increase data warehousing agility. Luckily,
for us, Qlik Compose automates all those tasks and offers extra
functionality to manage data warehouse operational activities, too.
5. Train Your Staff on New Approaches
“A move to the Agile methodology or automated code development isn’t
just a shift in skill sets—it’s a shift in mindset. Training and education
are required to ensure a data warehouse team is leveraging these new
approaches and technologies effectively.”
There’s an old joke that goes something like this: CEO “What if we train our
people and they leave?” IT Manager “What if we don’t and they stay?” This
dilemma clearly highlights that moving to an agile data warehousing model
doesn’t just mean implementing new tools and technology, but also requires
an investment in people and skills. Only then will you achieve the goal of
timelier data delivery that helps business drive insights.
Kent certainly had some great advice for data warehouse
practitioners, and I hope my comments added extra insights to the
conversation. Nevertheless, my last piece of advice is really an appeal. If
your data warehouse is struggling to keep up with business demands,
requiring too many resources or failing to deliver business value, then
please consider rethinking your strategy. Maybe it’s time to modernize your
approach. But, whatever you decide, don’t take my advice, take