Strategic change data capture (CDC) technology should be a comprehensive solution that addresses many business requirements and can be applied to solve a variety of real time and on-demand initiatives that can substantially improve return on investment. The following are just some examples of how a CDC/ETL combination can be implemented to solve real business problems:
- Business Intelligence (BI) and Data Warehousing
- Building Operational Data Stores (ODS)
- Real time Dashboards
- Data Propagation
- Data Synchronization
- Data Quality
BI and Data Warehousing
As described in my previous post, a common case for using CDC is in conjunction with ETL tools such as SSIS for faster more efficient data extract in data warehouse implementations. Traditional ETL processes require that the operational system(s) be taken off line for a given period of time. This period of time is referred to as a “batch window,” typically measured in hours and sometimes days, during which the system is busy with moving the data and cannot perform operational and other mission critical functions. Given the limitation of this “bulk” approach, most IT shops update their data warehouse only daily, and often on a weekly basis, meaning the data on which business decisions are being made can be up to a day or a week old.
Competitive pressures nowadays demand much more up-to-the-minute information, so companies now look for ways to update their data warehouse faster, and reduce transfer latency as much as possible. CDC is one such approach, and the leading vendors of the technology work seamlessly with ETL and EAI tools like SSIS or BizTalk Server respectively. The following diagram illustrates what a CDC-based solution used in conjunction with SSIS (or BizTalk) looks like:
CDC delivers changes to an ETL or EAI tool in batch or real time. This allows dramatic improvements in the efficiency of the entire process, reduces or totally eliminates batch windows, delivers information with low latency, and reduces the associated costs including CPU cycles, storage, network bandwidth and human resources.
Building Operational Data Stores (ODS)
An ODS is an integrated data repository addressing a specific business area (marketing, finance, support and maintenance, etc.) and providing complete and current information that can be used by business users and BI applications. The ODS is stored in a relational database and receives updates from a dedicated set of programs, ETL or EAI tools. Given the business need fulfilled by an operational data store, it requires timely updates. CDC provides an efficient mechanism to keep an ODS up-to-date, by identifying and delivering the changes to the ETL or EAI tool on a continuous basis, rather than periodically querying the entire database for changes. In addition, CDC can push changes in near real time to support ODS applications that have very low latency requirements.
Real Time Dashboards
Dashboards provide managers with selected metrics, known as Key Performance Indicators (KPI), that measure the performance of various business operations. These metrics can represent sales trends, margin tracking, financial triggers and others that alert the business user to a business condition requiring attention. Dashboards however, are only as good as the metrics they provide and the process of updating these metrics is typically done at the data integration level. Metrics that are updated infrequently mean that the user might become aware of a business problem too late. How far would you get with a fuel gauge that is updated every 2 hours?
Implementing CDC provides a method of identifying changes to different data sources that are required in order to measure a certain KPI. Ideally the CDC solution should come with data filters that can process data changes in a given order and to re-calculate new values of the KPI close to the time of the change. This in turn would provide greater, more timely visibility to the business user resulting in much faster and more effective response to changing business conditions.
Data propagation addresses the need to have one or more copies of the data from a given data source. Common examples include, making production data available for reporting purposes and accessible by various departments; distributing data from a source system to multiple data centers, and often to multiple ODS to improve response time.
With CDC, the process of propagating data can be made much more efficient and reduce the latency in making new data available.
Data synchronization is typically required in order to keep two or more systems in sync and up to date. A common reason for data synchronization is system migration (as a result of business consolidation; downsizing, etc…) where running two systems in parallel for a period of time, sometimes for several years, is required. During that period, transactions that are captured by one system need to be updated in the other as well. Another common reason for data synchronization is a merger or acquisition where systems used by one company overlap with others, and both needs to be kept up to date.
CDC captures changes as they occur and processes them using tools such as SSIS and BizTalk Server. By incorporating CDC, data synchronization can be made efficient and real time.\
Many BI initiatives fail because of poor data quality. As a result, organizations are looking for ways to improve the quality of data available to business users. In some cases, data is cleansed as part of the process of creating a new data store (e.g. DW). In others, processes are put in place to clean up the source data. With CDC, companies can identify and capture changes to source systems as they happen and immediately feed them into data quality processes. Such processes can then apply quality rules and recommend whether clean-up activities are required. This shortens the time for ‘bad’ data to reside in the system.
CDC can be used for many types of initiatives, and enterprises should take a strategic view of CDC to make sure that it can address current and future requirements. In my next post, I will provide an overview of key capabilities and functions required in a strategic CDC solution, and ones that you should look for when evaluating any CDC technology.
To learn more visit Change Data Capture CDC for ETL