This blog post describes how to bridge that gap by augmenting
your Salesforce footprint with a modern cloud data warehouse.
Salesforce at the Center of Your Business
Salesforce is undoubtedly the pioneer in software as a
service and you’d be hard pressed to find a business today that doesn’t use it.
In addition, current Salesforce offerings have expanded beyond the traditional
CRM roots to encompass virtually every aspect of the customer interaction journey,
as the diagram below illustrates.
However, not every company uses all the Salesforce modules
to run their business, and it’s not uncommon to find a mix of differing
solutions. In that event, conventional wisdom suggests that every system feeds
customer data into Salesforce, making the platform “the center of gravity” for
Consequently, it’s relatively easy to use Salesforce to answer common operational questions:
- “What’s the European sales forecast for the current quarter?”
- “How many products were sold this month?”
- “Which customers have open customer support tickets?”
- “How many leads did a marketing campaign generate?”
That's Data-driven. Where's the Problem?
I know what you’re thinking: “What’s the problem if all my business data is stored in Salesforce?”
Several issues immediately come to mind.
Firstly, not all your business data belongs or needs to live in Salesforce. Replicating unnecessary data without a specific business reason will just bloat the size of your Salesforce instance and increase storage costs.
Secondly, Salesforce is an application that’s optimized for transactions and not for analysis. Each “database table” is an object with fields and reference data. An “Account” object, for example, can have many with references to other objects. Consequently, performing complex analyses can be extremely difficult and slow to execute.
Finally, the Salesforce data model doesn’t always contain the complete record history, which means that multi-dimensional historical reporting is virtually impossible. In dimensional modeling, numerical and text data are stored as facts and attributes in a star schema pattern. Each dimension provides a different perspective or axis on the transactional data.
For example, suppose we decide to record sales transactions from a retail store. In the Salesforce model, each transaction conducted at the register is logged as either a sale or a return “Order” object. This object may identify the product, customer, account, and date of the transaction. In a dimensional model, you can easily examine all the transactions by product, customer, date or some combination of them. Product, customer and date are considered dimensions of the transaction data.
In summary, it’s really the age-old discussion of when it’s appropriate to use online transaction processing (OLTP) and when to use online analytics processing (OLAP).
Augment Your Warehouse with Salesforce Data
There’s never been a better time to flip the script and put the data warehouse at the center of your decision making. This is because there are several great cloud data warehouses offerings to choose from. Solutions, such as Amazon Redshift, Azure Synapse, Google BigQuery, and Snowflake Cloud Data Platform, are all elastic, pay-as-you-go, self-service options.
So, now you can just import the data that your teams require to do their jobs into Salesforce, while letting related analytics systems handle specialty reporting.
Data Mapping Methods
Many practitioners will tell you that the Salesforce data
model is extremely flexible and easy to customize. However, every company which
implements a warehouse often defines a data model to suit their entire organizational
reporting needs. As a result, it’s much more efficient to add Salesforce data
to your organizational data model than the other way around (i.e., the black
lines in the above diagram).
Syncing Salesforce with the Data Warehouse
Filling and updating the data warehouse with Salesforce data
requires several phases. The first phase is called the “initial load,” which
extracts the bulk of Salesforce data and lands it into the warehouse. The
second phase focuses on keeping the data warehouse current with incremental Salesforce
Not surprisingly, there are many options to load a data
warehouse with Salesforce data. The most common are as follows:
- Manual CSV extract, and load – This is the most basic and manual technique that uses the Salesforce data loader to export objects to a CSV file.
- Bulk API – If you have a developer handy who understands the Salesforce Bulk API, then this is another option. The developer writes queries that return Salesforce data sets in a JSON or XML format that can then be manually or programmatically loaded into the data warehouse.
- ETL – ETL tools vary greatly in degrees of reliability, functionality, scalability, usability and price. However, they all work in a batch mode and will require target tables to be present in the data warehouse before successful loading.
- CDC – Many change data capture (CDC) solutions will provide an “initial load” function that generates the DDL for in the Salesforce objects, creates the tables in the data warehouse and loads the initial Salesforce data sets.
We should not be shocked to discover that there’s also a variety
of methods to keep the warehouse fed with current data. They are as follows:
- Batch ETL – This procedure executes an ETL process on a scheduled basis looking for changes to the Salesforce objects since the process was last run.
- “Salesforce Events” – The simplest event is a “workflow rule” that can be triggered when an object is created, deleted or updated. The rules can then call an outbound message configured to update an external data warehouse via an Heroku application.
- CDC Streaming – For the past couple of years, Salesforce has offered a native CDC option that works in a similar manner to workflow rules. Salesforce publishes change events that can be synchronized in real-time to external data stores.
CDC has the advantage that external
systems can be updated in real-time, keeping target data fresh.
Using Salesforce CDC with Qlik Replicate
Arguably the best data movement method is to use Qlik Replicate with
Salesforce CDC because this enables you to stream data changes between a wide
variety of heterogeneous databases, data lakes and data warehouse platforms,
not just Salesforce. Also, Qilk Replicate’s agentless solution moves your data
easily and efficiently without impacting the performance of production data
systems. Read the official documentation
for further details on how to configure Qlik Replicate and Salesforce CDC.
Data Warehouse Automation with the Qlik Data Integration Platform
Another important issue to consider is warehouse automation. Data Warehouse Automation (DWA) accelerates and simplifies the data warehouse lifecycle for faster time to insights.
While the scope of DWA goes beyond this blog post, let me just mention that the Qlik Data Integration platform automates the entire data warehouse lifecycle to accelerate the availability of analytics-ready data. Our model-driven approach helps data engineers deliver on the promise of agile data warehouses by automating every step of the data warehouse pipeline from data modeling and real-time ingestion, to data marts creation and governed data consumption.
In short, by combining Salesforce data with DWA, you can improve your data warehouse accuracy and efficiency, which can deliver demonstrable growth to your business.
To learn more about DWA, click here.