But worse than no data is bad or incomplete data. Bad or incomplete data gives people the confidence to make bold decisions, only to find out later that they were bad decisions.
The challenge is that data in its raw form is often unusable. We’ve all heard the adage “garbage in, garbage out.” If you’ve ever tried to build analytics on raw data, you know that even the most structured ERP systems allow a fair amount of “garbage in!” Even when the data comes from a data warehouse, some companies assume it is already clean, which is often not the case; and even when it is clean, it may still need to be combined with dirty data from other sources.
Therefore, a very important step in data integration is what people in our business call “transformation.” As part of the Extract/Transform/Load (ETL) process, the “transform” step is where you put in the smarts to recognize that ‘Atlnta’ and ‘Atnalta’ mean ‘Atlanta.’ You may also want to enrich the data with the latitude and longitude of ‘Atlanta’ so it can be plotted on a map. Now apply that logic to multiple fields in the same table, or across multiple tables, and you have scripted potentially hundreds of instances across multiple transformation scripts. Then, after seeing the analytics, the user discovers someone using ‘Atl’ and realizes that variant should be included as well. Imagine the nightmare of going back into each of those hundreds of instances to modify the logic every time a new way to misspell ‘Atlanta’ turns up. In addition, depending on the size of the data and/or the need for frequent refreshes, you might need incremental loading, making the logic even more complex to maintain.
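To make the problem concrete, here is a minimal sketch of that hard-coded style of transform logic. It is written in Python rather than Qlik script, and the field names and coordinate values are illustrative assumptions, not taken from any real customer script. The point is that every misspelling lives inside the script itself, so a newly discovered variant means editing every copy of this logic:

```python
# Hard-coded transform: each misspelling and each enrichment value
# lives directly in the script. Copies of this mapping may exist in
# many scripts, and every copy must be edited when a new variant
# (e.g. 'Atl') shows up in the data.

CITY_FIXES = {
    "Atlnta": "Atlanta",
    "Atnalta": "Atlanta",
    # A newly discovered variant like "Atl" must be added here,
    # and in every other script that repeats this mapping.
}

CITY_COORDS = {
    "Atlanta": (33.749, -84.388),  # latitude, longitude for map plotting
}

def transform(row):
    """Clean the city field, then enrich it with coordinates."""
    city = CITY_FIXES.get(row["city"], row["city"])
    lat, lon = CITY_COORDS.get(city, (None, None))
    return {**row, "city": city, "lat": lat, "lon": lon}

rows = [{"id": 1, "city": "Atlnta"}, {"id": 2, "city": "Atlanta"}]
cleaned = [transform(r) for r in rows]
print(cleaned)
```

Multiply this by hundreds of fields and tables, and the maintenance burden described above becomes clear.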
In talking to Jeff Robbins and Larry Aaron, Qlik consultants who have developed hundreds of ETL scripts for Qlik customers, I learned this is a pretty common scenario. Jeff, like many of our consultants, grew tired of developing scripts field by field and table by table, so he built his own utility to aid in future ETL work. He calls it the “Structured ETL Engine for Qlik,” or SEEQ for short. Using a rules-based approach, he sets up the transform logic in an externalized table, then reuses that same rules table every time the logic is needed. So, in my example above, instead of spending hours updating hundreds of instances, he could make one small change in the rules table, which would ensure every instance was correctly transformed on the next data refresh. No scripting, no fuss!
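SEEQ’s internals aren’t shown here, but the rules-based idea behind it can be sketched as follows: the transform logic lives in one external rules table (represented below as an inline CSV with hypothetical columns), and the load logic simply applies whatever rules that table contains. Fixing a newly found variant like ‘Atl’ is then one new row in the table, not a change to hundreds of scripts. This is an illustrative Python sketch, not SEEQ’s actual implementation:

```python
import csv
import io

# External rules table: one row per (field, raw value, clean value).
# In practice this would be a file or database table maintained
# outside the load scripts; the format here is a hypothetical example.
RULES_CSV = """field,raw,clean
city,Atlnta,Atlanta
city,Atnalta,Atlanta
city,Atl,Atlanta
"""

def load_rules(text):
    """Build a {field: {raw: clean}} lookup from the rules table."""
    rules = {}
    for row in csv.DictReader(io.StringIO(text)):
        rules.setdefault(row["field"], {})[row["raw"]] = row["clean"]
    return rules

def apply_rules(row, rules):
    """Apply every matching rule to a record; unmatched values pass through."""
    return {field: rules.get(field, {}).get(value, value)
            for field, value in row.items()}

rules = load_rules(RULES_CSV)
print(apply_rules({"id": "7", "city": "Atl"}, rules))
```

Because every load reads the same rules table, one edit to that table corrects every transformed instance on the next refresh, which is the maintenance win described above.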