Not to be outdone, my colleague Ritu Jain recently posted how Qlik’s data integration platform supports Databrick’s vision of a “data lake house” — a concept that combines the best approaches of data warehousing and data lakes for a single source of truth.
Those posts are proof that the data market is incredibly dynamic.
Questions, Questions, Questions
With all this change, it’s only natural to feel a little overwhelmed and uncertain about implementing a data strategy that can stand the test of time and reliably deliver data for analytic insights. However, the decision that seems to pay dividends, regardless of data requirement, is to implement a data warehouse. The data warehouse has several benefits. It can hold your entire data history, regardless of the retention policies of your transactional systems. It can also relieve the real-time query burden from transactional systems, too, by being the authoritative source for your historical and analytic reporting. Moreover, new cost-effective cloud offerings ensure that the data warehouse is now within reach of practically every business. Consequently, the question I’m most frequently asked is “How do I begin architecting a data warehouse?”
In answering, I like to focus on four major architectural questions outlined below and my responses to each.
- What is your approach to designing/building your data warehouse?
- I strongly recommend using a well-understood and documented data warehouse design methodology, such as Inmon, Kimball or Data Vault 2.0, for operational efficiency.
- What is your approach to filling your data warehouse?
- This question gives clues about the type of data and frequency updates that we want for our data warehouse. For example, if you need to analyze the freshest data sets, then perhaps change data capture (CDC) is a good option for continual data ingestion.
- How will the data be consumed?
- This question offers insight as to where the data will be used and in what format. For example, it’s highly likely that a variety of data marts will need to be created for historical reporting.
- How will you manage change?
- This is possibly the greatest challenge of data warehousing. There are primarily two types of change that affect the data warehouse, regardless of the outcome:
- Changes that impact the data warehouse structure. E.g. adding new columns, data sources or creating new data mart tables.
- Changes to the data values contained within the warehouse. E.g. Updating an address table when businesses move locations or open new offices. In the case of a slowly changing dimension, you also need to consider keeping a record of the change. One other often overlooked change is designing a strategy for handling late arriving fact or dimension values due to system error or latency.
- Being able to efficiently handle change will often lead to successful data warehousing outcomes.
The architectural considerations with Microsoft Azure
Synapse for a data warehouse are depicted as follows:
I realize my answers may gloss over some logistics. I also assumed
that you had chosen a data warehouse, assembled a team and performed some
initial requirements gathering. Despite those assumptions, building, filling,
consuming and changing a data warehouse remain the core competencies of
operating a data warehouse.
Answers, Answers, Answers
You might be thinking “Where can I find out more about architecting a data warehouse for the long term?” You’re in luck: We’ve partnered with TDWI and Microsoft to discuss the topic at a webinar, to be held March 11, titled “2020 and Beyond: Architecting Your Data Warehouse for the New Decade.”
Here’s a sneak peak of the webinar’s format and content:
David Stodder, Senior Director of Research at TDWI, will kick off proceedings. He’s going to discuss the current data market trends and the data requirements he’s hearing from his clients.
Matt Basile, Azure Data Program Manager at Microsoft, will follow David and will describe the latest innovations in Azure Synapse Analytics. Incidentally, Matt and I co-authored “Data Warehouse Automation in Azure for Dummies.”
I follow Matt and close out the discussion with the responses to the architectural considerations depicted in the above graphic and implied by my earlier questions. The webinar promises to be a very lively and informative session.
Register today to find out if your data architecture is ready for the challenges of a new decade and new era of data transformation.