It started off with a simple survey, and the intentions behind it were good. To improve the customer experience, decrease churn, and boost revenue, a certain movie and TV streaming company attempted to collect data on consumer preferences.
They asked customers what types of programming they wanted to see and worked to offer them more of their desired content.
Sound logic, but it backfired.
Viewers didn’t end up watching what they stated in the survey and instead reverted to watching the same old same old. What happened? Their actions spoke much louder than their words. While customers may have aspired to watch all types of superior programming — documentaries, biopics, and Emmy-nominated series — their best intentions were often subverted by their desire for the familiar, feel-good content they had always watched.
I remember a similar scenario in one of my MBA statistics courses, when one of my classmates shared the results of her regression project. She had asked her fellow classmates to rank the traits they found most important in dog breeds (e.g., loyalty, affection, size). She then asked us to rank our preferred breeds of dogs based on descriptions and pictures. Turns out the traits we said were most important were the opposite of the traits embodied by the breeds we actually preferred — in fact, the two rankings were nearly inverted. So, what happened?
We consciously chose qualities in our pets we thought we should choose, like cuddliness and cuteness. But when it came time to select from real breeds — each a trade-off among traits, with the perfect dog no longer a possibility — which traits actually mattered most became readily apparent. (For those wondering, Pomeranians are my dog of choice.)
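That "nearly inverted" relationship between what we said and what we chose can be quantified with a Spearman rank correlation, which runs from +1 (identical rankings) to -1 (perfectly reversed rankings). The sketch below uses hypothetical traits and rankings for illustration, not the actual data from my classmate's project:

```python
def spearman_rho(rank_a, rank_b):
    """Spearman rank correlation for two rankings with no ties."""
    n = len(rank_a)
    d_sq = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - (6 * d_sq) / (n * (n ** 2 - 1))

# Hypothetical example: 1 = most important / most preferred.
traits = ["loyalty", "affection", "cuddliness", "cuteness", "size"]
stated_importance = [1, 2, 3, 4, 5]    # what we said mattered
revealed_preference = [5, 4, 3, 2, 1]  # what our breed choices implied

rho = spearman_rho(stated_importance, revealed_preference)
print(rho)  # -1.0: the two rankings are perfectly inverted
```

A rho near -1, as in this contrived case, is exactly the "opposite rankings" pattern the project uncovered: the stated data and the revealed data point in opposite directions.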
In theory, these data collectors were going to the best possible source — the users themselves. But in both cases, it backfired, and I assure you it's not limited to these two examples. I'm talking about self-reported data, and I'm here to tell you that you should trust big data before you trust self-reported data every time. Every. Time.
In this data-driven world we are conditioned to believe that all data is good data. But self-reported data is especially faulty, beyond the usual margin of error from sampling problems or respondents randomly completing the questions just to get it over with. Usually, self-reported data falls short because people report their aspirations rather than their actions, choose the answers they think they should choose, and default to the familiar when it actually counts.
The best way to analyze user behavior, preference, satisfaction, or feeling is to analyze their actions, not their words. Where users browse online, what they click on, what they search for, at what point they abandon their cart, what they buy, what they listen to, what others who bought that item also buy, and so on, will tell you a lot more about them than they themselves ever could.
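As a minimal sketch of what "actions over words" looks like in practice, the snippet below compares a viewer's survey answer against a hypothetical watch log and ranks genres by minutes actually watched. All titles, genres, and numbers are invented for illustration:

```python
from collections import Counter

# What the viewer *said* they wanted to watch (hypothetical survey answer).
survey_answer = ["documentary", "biopic", "drama"]

# What the viewer *actually* watched (hypothetical event log).
watch_log = [
    {"title": "Sitcom S1E1", "genre": "comedy", "minutes": 22},
    {"title": "Sitcom S1E2", "genre": "comedy", "minutes": 22},
    {"title": "True Crime Doc", "genre": "documentary", "minutes": 48},
    {"title": "Sitcom S1E3", "genre": "comedy", "minutes": 22},
    {"title": "Reality Finale", "genre": "reality", "minutes": 41},
]

# Weight each genre by minutes watched — behavior, not stated preference.
minutes_by_genre = Counter()
for event in watch_log:
    minutes_by_genre[event["genre"]] += event["minutes"]

revealed_ranking = [genre for genre, _ in minutes_by_genre.most_common()]
print(revealed_ranking)  # comedy ranks first, despite the survey answer
```

The same pattern — aggregate logged events, then rank or model the result — scales from a toy `Counter` to the petabyte-scale pipelines described below; only the tooling changes, not the principle.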
This was not possible a few years ago, but this data and petabytes more can now be consumed by BI applications, predictive analytics tools, and machine learning software to predict user behavior and uncover detailed answers that no survey ever could.
Don’t trust self-reported data; let the data do the reporting for the user.