We previously mentioned the Data Mesh concept first introduced by Zhamak Dehghani in her seminal 2019 blog post. Zhamak’s contention is that we need a new approach to data management because of the failures of centralized data warehouses and lakes. She later revised her thinking in a subsequent blog post and postulated that four principles of her Data Mesh approach constitute a new data management paradigm.
In summary, Data Mesh creates a foundation for deriving value from analytical data at scale while the data landscape, use cases and responses are constantly changing. This is achieved by adhering to four underpinning principles as follows:
Figure 1. Data Mesh Attributes
- Domain-oriented – Decentralized data ownership and architecture that allow the autonomous nodes on the mesh to grow as the number of data sources, the number of use cases, and the diversity of data access models increase.
- Data-as-a-product – The data product is the new unit of architecture, built, deployed, and maintained as a single quantum. This ensures that data consumers can easily discover, understand, and securely use high-quality data from across many domains.
- Self-service data infrastructure – Lets domain teams autonomously create and consume data products. The infrastructure platform uses abstractions to hide the underlying complexity of building, executing, and maintaining secure and interoperable data products.
- Federated governance – Global governance and interoperability standards are baked into the mesh ecosystem, helping data consumers derive value from aggregating and correlating independent data products within the mesh ecosystem.
These principles combine to form a decentralized and distributed data mesh in which domain data product owners leverage common data infrastructure via self-service to develop pipelines that share data in a governed and open manner.
The data mesh concept differs from past paradigms, where pipelines (code) were managed as components independent of the data they produce. The storage infrastructure of past paradigms, such as an instance of a warehouse or a lake, was also often shared among many datasets. The data product, by contrast, is a composition of all components - code, data, and infrastructure - at the granularity of a data domain's bounded context, as shown below.
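As a rough illustration of that composition, a data product's descriptor might bundle all three components - code, data, and infrastructure - in one declaration scoped to a single domain. This is only a sketch; every name, field, and value below is hypothetical, not part of any standard data mesh specification.

```python
from dataclasses import dataclass, field

@dataclass
class OutputPort:
    """A hypothetical published interface through which consumers read the product's data."""
    name: str        # e.g. "daily_orders"
    format: str      # e.g. "parquet"
    schema_ref: str  # pointer to a versioned, published schema

@dataclass
class DataProduct:
    """Hypothetical descriptor for one deployable 'quantum': code + data + infrastructure."""
    domain: str                 # the bounded context, e.g. "orders"
    pipeline_image: str         # the code: a container image for the pipeline
    storage: str                # the infrastructure: storage provisioned per product
    owner: str                  # the domain team accountable for the product
    output_ports: list[OutputPort] = field(default_factory=list)

    def discoverable_metadata(self) -> dict:
        """Metadata a mesh catalog could index so consumers can discover this product."""
        return {
            "domain": self.domain,
            "owner": self.owner,
            "ports": [p.name for p in self.output_ports],
        }

# Example: an "orders" domain team declaring its product as one unit.
orders = DataProduct(
    domain="orders",
    pipeline_image="registry.example.com/orders-pipeline:1.4",
    storage="s3://mesh-orders-product",
    owner="orders-team",
    output_ports=[OutputPort("daily_orders", "parquet", "schemas/orders/v2")],
)
```

The point of the sketch is the scoping: nothing in the descriptor is shared with other domains, so the product can be built, deployed, and governed as a single unit.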
Having briefly introduced these differing architectural philosophies, let's compare them side by side.
In my opinion, a data hub is the only concept that is a true architecture, as it defines a topology. The data fabric idea builds on many concepts introduced by the data hub, but really defines a stack of technologies with which to augment your data architectures. Therefore, a data fabric is a technology-enabled implementation capable of many outputs, only one of which is data products. This neatly brings us to the third concept, Data Mesh. In my opinion, we’re in the early days of defining Data Mesh. Consequently, I believe it’s a concept with very sound principles, but the market is only now catching up with infrastructure and tooling solutions to enable enterprise implementation.
Many folks consider the data market mature, staid, and lacking innovation, but the recent data architecture debate contradicts that conventional wisdom. In fact, the data market has never been more vibrant and dynamic, driven by an explosion of new data types, a growth in consumption patterns, and a renewed focus on the data pipeline as a core construct for architectures such as data hubs, technology stacks such as a data fabric, and emerging concepts like the data mesh. However, one thing is clear: the data architecture debate is far from over. And, once again, we're just getting started.
Join the conversation and let me know your thoughts. Do you agree or disagree with my assessment?