Digging in a bit deeper, let’s first discuss six factors that distinguish a data fabric from a standard data integration ecosystem:
- Augmented data catalog. Your data catalog will include and analyze all types of metadata (structural, descriptive, and administrative) in order to provide context to your information.
- Knowledge graph. To help you and the AI/ML algorithms interpret the meaning of your data, you will build and manage a knowledge graph that formally illustrates the relationships between entities in your data (concepts, objects, events, etc.). The graph should be enhanced with unified data semantics, which describe the meaning of the data components themselves.
- Metadata activation. You will switch from manual (passive) metadata to automatic (active) metadata. Active metadata management leverages machine learning to allow you to create and process metadata at massive scale.
- Recommendation engine. Based on your active metadata, AI/ML algorithms will continuously analyze, learn, and make recommendations and predictions about your data integration and management ecosystem.
- Data prep & ingestion. All common data preparation and delivery approaches will be supported, including the five key patterns of data integration: ETL, ELT, data streaming, application integration, and data virtualization.
- DataOps. Bring your DevOps team together with your data engineers and data scientists to ensure that your fabric supports the needs of both IT and business users.
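To make the first two factors more concrete, here is a minimal Python sketch of a catalog entry that carries the three metadata types and a tiny knowledge graph that records relationships between datasets. The names `CatalogEntry` and `KnowledgeGraph`, the predicates `references` and `feeds`, and the sample datasets are all hypothetical and chosen only for illustration; a real data fabric product would provide its own catalog and graph tooling.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """One dataset in a hypothetical augmented data catalog."""
    name: str
    structural: dict = field(default_factory=dict)      # e.g. column names and types
    descriptive: dict = field(default_factory=dict)     # e.g. tags, free-text summary
    administrative: dict = field(default_factory=dict)  # e.g. owner, retention policy


class KnowledgeGraph:
    """Stores (subject, predicate, object) triples that link entities."""

    def __init__(self):
        self._edges = defaultdict(set)

    def add(self, subject, predicate, obj):
        self._edges[subject].add((predicate, obj))

    def related(self, subject, predicate):
        """Return the sorted objects linked to `subject` by `predicate`."""
        return sorted(o for p, o in self._edges[subject] if p == predicate)


# Illustrative data only: an "orders" dataset described by all three metadata types.
orders = CatalogEntry(
    name="orders",
    structural={"columns": {"order_id": "int", "customer_id": "int"}},
    descriptive={"tags": ["sales"]},
    administrative={"owner": "sales-ops"},
)

graph = KnowledgeGraph()
graph.add("orders", "references", "customers")
graph.add("orders", "feeds", "revenue_dashboard")
graph.add("customers", "feeds", "revenue_dashboard")

print(graph.related("orders", "references"))  # ['customers']
```

In this sketch the catalog entry supplies context about a single dataset, while the graph captures how datasets relate to one another, which is the kind of information a recommendation engine could draw on.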
As the diagram above also shows, when data is provisioned from sources to consumers, a data fabric brings together data from a wide variety of sources across your organization, including operational data sources and data repositories such as your data warehouse, data lakes, and data marts. This is one reason why data fabric is appropriate for data mesh design.
The data fabric supports the scale of big data for both batch processes and real-time streaming data, and it provides consistent capabilities across cloud, hybrid multicloud, on-premises, and edge environments. It creates fluidity across data environments and provides you with a complete, accurate, and up-to-date dataset for analytics, other applications, and business processes. It also reduces time and expense by providing pre-packaged components and connectors to stitch everything together, so you don’t have to manually code each connection.
Your data fabric architecture will depend on your specific data needs and situation. But according to the research firm Forrester, there are six common layers for modern enterprise data fabrics:
- Data management provides governance and security
- Data ingestion identifies connections between structured and unstructured data
- Data processing extracts only relevant data
- Data orchestration cleans, transforms, and integrates data
- Data discovery identifies new ways to integrate different data sources
- Data access enables users to explore data via analytic and BI tools based on access permissions
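As a rough illustration of how some of these layers might hand data to one another, the following Python sketch chains four of them (ingestion, processing, orchestration, and access) as plain functions over in-memory records. It is a toy model under stated assumptions, not how any particular data fabric product is implemented: the function names, the sample sources `crm` and `billing`, the `analyst` role, and the permission table are all hypothetical, and the data management and data discovery layers are left out for brevity.

```python
def ingest(sources):
    """Data ingestion: gather records from every source and tag their origin."""
    return [dict(rec, source=name) for name, recs in sources.items() for rec in recs]


def process(records, required_field):
    """Data processing: keep only relevant records (here, those with the field set)."""
    return [r for r in records if r.get(required_field) is not None]


def orchestrate(records):
    """Data orchestration: clean and transform (here, normalize email addresses)."""
    return [dict(r, email=r["email"].strip().lower()) for r in records]


def access(records, role, permissions):
    """Data access: expose only the fields the given role is permitted to see."""
    allowed = permissions.get(role, set())
    return [{k: v for k, v in r.items() if k in allowed} for r in records]


# Illustrative inputs only.
sources = {
    "crm": [
        {"email": " Ana@Example.com ", "plan": "pro"},
        {"email": None, "plan": "free"},
    ],
    "billing": [{"email": "bo@example.com", "amount": 20}],
}
permissions = {"analyst": {"email", "source"}}

records = orchestrate(process(ingest(sources), "email"))
view = access(records, "analyst", permissions)
print(view)
# [{'email': 'ana@example.com', 'source': 'crm'},
#  {'email': 'bo@example.com', 'source': 'billing'}]
```

An unknown role receives records with no fields at all, a deny-by-default choice made here to mirror the idea that access is granted only through explicit permissions.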