Don’t Take My Data Warehouse Advice, Take Snowflake’s

Snowflake recently released an eBook that contained some very practical advice for every data warehouse practitioner

I’ve been involved with data warehousing in one way or another for over 15 years, and, in that time, earned a reputation for knowing a bit about the subject. I hesitate to call myself an “expert,” but folks often seek out my thoughts on data warehousing. I’m always happy to oblige. That being said, the most rewarding experience is when I read other expert advice from around the industry.

Indeed, one of my recent discoveries was “5 Best Practices for Data Warehouse Deployment” by Kent Graziano, Chief Tech Evangelist at Snowflake, one of Qlik’s industry partners. Kent offers some great tips that are data warehouse agnostic, so I encourage you to download it and take a read.

Those five data warehouse best practices, as laid out in the eBook, align in many ways with what we do here at Qlik. I explore and expand upon them in this blog post, taking key quotes from Kent about those practices and providing my take on each.

1. Create a Data Model

“The first key step in data warehouse development is to create a data model [….] Having a data model gives you a way to document the data sets that will be incorporated into the data warehouse, the relationship between those data sets, and the business requirements of the data warehouse.”


I couldn’t agree more with this best practice. It takes a very brave or foolish person to embark on a data warehouse project without a data model, even though it’s certainly possible. However, it’s completely understandable why many proceed without it. At first, the amount of data and number tables are manageable, and modeling seems like overkill. Also, many practitioners feel that they don’t have time or expertise to utilize standardized architectural approaches, such as Kimball, Inmon or Data Vault. Finally, many enterprises simply lack the appropriate tools and support. Regardless of the reason, data modeling will make your warehouse easier to maintain, more efficient to operate and improve the collaboration between data warehouse implementors and consumers alike.

2. Adopt an Agile Data Warehouse Methodology

“In the past, data warehouse creation was a large, monolithic, multi-quarter (or multi-year) effort, subject to the traditional “waterfall” process. In the modern age, that’s no longer the norm as many organizations are choosing to adopt a more flexible and iterative, or Agile, design approach.”


I find myself, once again, violently nodding my head in agreement with Kent. We simply don’t have the luxury of time anymore for traditional data warehouse techniques. In fact, software developers have known this for years, and you’d be hard pressed to find a dev team that doesn’t use agile techniques today. However, it’s also fair to admit that the data warehousing community has some way to go — we haven’t reached the sophistication of making it a standard practice to use blue-green deployment techniques, as supported by continuous integration/continuous delivery (CI/CD) tooling. (If you want to read more of my thoughts about this point, read my book co-authored with Microsoft: “Data Warehouse Automation in Azure for Dummies”.)

3. Favor ELT over ETL

“In the past, data warehousing development took an extract-transform-load (ETL) approach, extracting the data to be imported into the data warehouse from the source systems, cleaning it or applying business rules to it on an external server, and then loading it into the target data warehouse. Increased data warehousing computing power and capabilities have yielded a new preferred approach: extract-load-transform (ELT).”


Oh, I have so much to add to this subject. First, ELT has become the de facto standard for data loading, as the huge volumes of data we now require are often too big for ETL batch windows (i.e., the nightly 8-hour update is just not long enough to load updates).

Second, ELT is faster to load data because no transformation processes occur during the update. Raw data is simply replicated, as is, to “landing” tables in the warehouse, which are then “transformed” by other means later in the data pipeline process. In contrast, ETL joins disparate tables before loading, thereby consuming additional time and computer resources.

Finally, ELT represents the philosophy of change data capture (CDC) solutions, such as Qlik Replicate. When data warehouses use CDC, then the landing zones are continuously ingesting small data updates in real-time, thus ensuring that the data warehouse always contains the most current information.

4. Adopt a Data Warehouse Automation Tool

“The goal of the data warehouse is to activate and deliver data more quickly so it can inform business decisions and drive greater value. One way to increase speed of delivery is to adopt the Agile methodology. Another is to adopt automation tools that can help develop and deploy code more quickly.”


Agile data warehousing requires more than just a change in manual process. It also requires a new toolchain that can deliver productivity gains by automating manual, mundane and repetitive tasks. The areas ripe for automation are: (1) data modeling, (2) data warehouse table creation, (3) data mart creation, (4) documentation, and (5) data ingest and update. By automating one, some or all those activities will pay efficiency and productivity dividends that will increase data warehousing agility. Luckily, for us, Qlik Compose automates all those tasks and offers extra functionality to manage data warehouse operational activities, too.

5. Train Your Staff on New Approaches

“A move to the Agile methodology or automated code development isn’t just a shift in skill sets—it’s a shift in mindset. Training and education are required to ensure a data warehouse team is leveraging these new approaches and technologies effectively.”


There’s an old joke that goes something like this: CEO “What if we train our people and they leave?” IT Manager “What if we don’t and they stay?” This dilemma clearly highlights that moving to an agile data warehousing model doesn’t just mean implementing new tools and technology, but also requires an investment in people and skills. Only then will you achieve the goal of timelier data delivery that helps business drive insights.

Next Steps

Kent certainly had some great advice for data warehouse practitioners, and I hope my comments added extra insights to the conversation. Nevertheless, my last piece of advice is really an appeal. If your data warehouse is struggling to keep up with business demands, requiring too many resources or failing to deliver business value, then please consider rethinking your strategy. Maybe it’s time to modernize your approach. But, whatever you decide, don’t take my advice, take Snowflake’s.


@cbearman shares his thoughts on a recent eBook by @KentGraziano @SnowflakeDB in his latest blog post on #datawarehouse best practices. Read his post to learn more.

 

In this article:

Get ready to transform your entire business with data.

Follow Qlik