We Don’t Need a Data Warehouse. We Have Salesforce!

The Salesforce Customer Relationship Management (CRM) solution has been the at the center of practically every business for well over a decade. But, while it’s perfectly suited to improve operational efficiency, many discover that it’s challenging to use for multi-dimensional data analysis.

This blog post describes how to bridge that gap by augmenting your Salesforce footprint with a modern cloud data warehouse.

Salesforce at the Center of Your Business

Salesforce is undoubtedly the pioneer in software as a service and you’d be hard pressed to find a business today that doesn’t use it. In addition, current Salesforce offerings have expanded beyond the traditional CRM roots to encompass virtually every aspect of the customer interaction journey, as the diagram below illustrates.

However, not every company uses all the Salesforce modules to run their business, and it’s not uncommon to find a mix of differing solutions. In that event, conventional wisdom suggests that every system feeds customer data into Salesforce, making the platform “the center of gravity” for operational insight.

Consequently, it’s relatively easy to use Salesforce to answer common operational questions:

  • “What’s the European sales forecast for the current quarter?”
  • “How many products were sold this month?”
  • “Which customers have open customer support tickets?”
  • “How many leads did a marketing campaign generate?”

That's Data-driven. Where's the Problem?

I know what you’re thinking: “What’s the problem if all my business data is stored in Salesforce?

Several issues immediately come to mind.

Firstly, not all your business data belongs or needs to live in Salesforce. Replicating unnecessary data without a specific business reason will just bloat the size of your Salesforce instance and increase storage costs.

Secondly, Salesforce is an application that’s optimized for transactions and not for analysis. Each “database table” is an object with fields and reference data. An “Account” object, for example, can have many with references to other objects. Consequently, performing complex analyses can be extremely difficult and slow to execute.

Finally, the Salesforce data model doesn’t always contain the complete record history, which means that multi-dimensional historical reporting is virtually impossible. In dimensional modeling, numerical and text data are stored as facts and attributes in a star schema pattern. Each dimension provides a different perspective or axis on the transactional data.

For example, suppose we decide to record sales transactions from a retail store. In the Salesforce model, each transaction conducted at the register is logged as either a sale or a return “Order” object. This object may identify the product, customer, account, and date of the transaction. In a dimensional model, you can easily examine all the transactions by product, customer, date or some combination of them. Product, customer and date are considered dimensions of the transaction data.

In summary, it’s really the age-old discussion of when it’s appropriate to use online transaction processing (OLTP) and when to use online analytics processing (OLAP).

Augment Your Warehouse with Salesforce Data

There’s never been a better time to flip the script and put the data warehouse at the center of your decision making. This is because there are several great cloud data warehouses offerings to choose from. Solutions, such as Amazon Redshift, Azure Synapse, Google BigQuery, and Snowflake Cloud Data Platform, are all elastic, pay-as-you-go, self-service options.

So, now you can just import the data that your teams require to do their jobs into Salesforce, while letting related analytics systems handle specialty reporting.

Data Mapping Methods

Many practitioners will tell you that the Salesforce data model is extremely flexible and easy to customize. However, every company which implements a warehouse often defines a data model to suit their entire organizational reporting needs. As a result, it’s much more efficient to add Salesforce data to your organizational data model than the other way around (i.e., the black lines in the above diagram).

Syncing Salesforce with the Data Warehouse

Filling and updating the data warehouse with Salesforce data requires several phases. The first phase is called the “initial load,” which extracts the bulk of Salesforce data and lands it into the warehouse. The second phase focuses on keeping the data warehouse current with incremental Salesforce changes.

Initial Load

Not surprisingly, there are many options to load a data warehouse with Salesforce data. The most common are as follows:

  1. Manual CSV extract, and load – This is the most basic and manual technique that uses the Salesforce data loader to export objects to a CSV file.
  2. Bulk API – If you have a developer handy who understands the Salesforce Bulk API, then this is another option. The developer writes queries that return Salesforce data sets in a JSON or XML format that can then be manually or programmatically loaded into the data warehouse.
  3. ETL – ETL tools vary greatly in degrees of reliability, functionality, scalability, usability and price. However, they all work in a batch mode and will require target tables to be present in the data warehouse before successful loading.
  4. CDC – Many change data capture (CDC) solutions will provide an “initial load” function that generates the DDL for in the Salesforce objects, creates the tables in the data warehouse and loads the initial Salesforce data sets.

Incremental Updates

We should not be shocked to discover that there’s also a variety of methods to keep the warehouse fed with current data. They are as follows:

  1. Batch ETL – This procedure executes an ETL process on a scheduled basis looking for changes to the Salesforce objects since the process was last run.
  2. “Salesforce Events” – The simplest event is a “workflow rule” that can be triggered when an object is created, deleted or updated. The rules can then call an outbound message configured to update an external data warehouse via an Heroku application.
  3. CDC Streaming – For the past couple of years, Salesforce has offered a native CDC option that works in a similar manner to workflow rules. Salesforce publishes change events that can be synchronized in real-time to external data stores.

CDC has the advantage that external systems can be updated in real-time, keeping target data fresh.

Using Salesforce CDC with Qlik Replicate

Arguably the best data movement method is to use Qlik Replicate with Salesforce CDC because this enables you to stream data changes between a wide variety of heterogeneous databases, data lakes and data warehouse platforms, not just Salesforce. Also, Qilk Replicate’s agentless solution moves your data easily and efficiently without impacting the performance of production data systems. Read the official documentation for further details on how to configure Qlik Replicate and Salesforce CDC.

Data Warehouse Automation with the Qlik Data Integration Platform

Another important issue to consider is warehouse automation. Data Warehouse Automation (DWA) accelerates and simplifies the data warehouse lifecycle for faster time to insights.

While the scope of DWA goes beyond this blog post, let me just mention that the Qlik Data Integration platform automates the entire data warehouse lifecycle to accelerate the availability of analytics-ready data. Our model-driven approach helps data engineers deliver on the promise of agile data warehouses by automating every step of the data warehouse pipeline from data modeling and real-time ingestion, to data marts creation and governed data consumption.

In short, by combining Salesforce data with DWA, you can improve your data warehouse accuracy and efficiency, which can deliver demonstrable growth to your business.

To learn more about DWA, click here.

Our own @cbearman discusses augmenting your Salesforce footprint with a modern cloud #datawarehouse. Read his latest blog post.


In this article:

Keep up with the latest insights to drive the most value from your data.

Get ready to transform your entire business with data.

Follow Qlik