Nothing to See Here, Move Along
“OK, boomer!” I hear you shout. “Data warehouse methodologies are old hat, Daddio!” In fact, I heard much of the same pushback from many prospects. My favorite responses were as follows:
- “We don’t have time for that. We need to build the warehouse today.”
- “A data warehouse methodology is too complicated.”
- “Data warehouse design is too slow.”
- “We don’t have that sort of expertise.”
- “We’re only doing a quick project. That’s too bloated.”
- “We’ll just write SQL for ourselves.”
At first glance, much of these reasons seem fair. After all,
legacy data warehouses that relied on methodologies have a less-than-stellar
reputation for being responsive to business needs – especially when you compare
them to today’s cloud offerings. So, it makes sense to cut out the middleman
and pipe the data straight from the source to the cloud. Also, much of the
appeal of a cloud data warehouse is that it is infinitely more flexible than a traditional
solution. So, what’s the problem?
Getting Out Over Our Skis
It’s true: Cloud data warehouses offer much greater flexibility, because they scale compute and storage to match demand. They are also extremely easy to adopt. In many instances, you can just slide a credit card to get started and provision a data warehouse with a few simple clicks of a mouse. However, it’s this seductive simplicity that encourages us to cut corners, leading us to build and fill multiple data warehouses without a thought.
The more warehouses we build, the more tables, fields and data we need to manage. As our warehouses grow, our focus shifts to wrangling the SQL that manages the data. As a result, what began as a few simple scripts balloons to a mass of spaghetti code. The quick and nimble cloud data warehouse is now difficult to change, and we’ve come full circle.
A Framework For Agility
This is where a data warehouse design methodology can help. The three most common design methodologies in use today are as follows:
1. Inmon – Corporate Information Factory
- Bill Inmon’s top-down approach models the data warehouse
in a hub and spoke pattern, where entities and relationships are in third
normal form (3NF). Data marts are created as needed for reporting and
2. Kimball – Dimensional Modelling
- Ralph Kimball’s bottom-up approach starts with
the metrics and measurements for reporting and creates marts as star schemas. Dimensional
modelling optimizes the data warehouse for fast retrieval of data.
3. Linstedt – Data Vault
- Dan Linstedt’s Data Vault architecture is a
hybrid approach that combines the best of 3NF and dimension modeling. This
technique enables historical storage of data, integration of data from
different operational systems, and lineage tracing of incoming data. The Data
Vault approach is based on the concept of Hubs, Links and Satellites.
Although a detailed description of each methodology goes beyond the scope of this blog post, it’s important to recognize the benefits of employing any of the methods with a cloud data warehouse. The chief benefit of using a warehouse design methodology is that it can be automated with tooling. Most automation solutions actually generate the SQL code required to implement the methodology and provide a framework for the entire data warehouse lifecycle. Ultimately, this can save you hours of SQL coding and reduce code management pain, which means you can spend more time on designing and delivering the data for your analytics projects.
So, the next time you’re looking to implement a cloud data warehouse like Snowflake, Amazon Redshift, Google BigQuery or Azure Synapse, don’t be tempted to skip the design methodology. You won’t regret it!
my free eBook, “Data Warehouse Automation in Azure for Dummies,” for more information. You can also learn more about data warehouse automation by clicking here.