In my previous blog posts, I’ve talked about how you can aggregate data depending on the data type, as well as how you can re-express your data to get more value from it. For this post, let’s look at some of the different ways of measuring your data.
Average or Median?
The inspiration for this post comes from a virtual office discussion. The question was whether an average (mean) value accurately represented the data we were looking at. The answer was the standard “it depends.” For a data set that is evenly (symmetrically) distributed, the mean usually does a good job. But for a data set with outliers, you also need to look at the median. Let’s look at what an average can look like in a skewed data set.
This data set contains the salaries of a company’s employees. As you can see, most people earn around the same amount. But there is an outlier: one person, presumably in a more senior position, earns substantially more than everyone else.
Without this outlier, the average falls somewhere among the salaries of the other employees. With the outlier, the average is much higher, and it no longer reflects a typical salary within the company.
If we evaluate the data set by its median instead, we get the middle value once the salaries are sorted. This value barely changes, even with the high value (i.e., the outlier) in the data set. Adding a single very high or very low value moves the median at most to the neighboring value in the sorted list, no matter how extreme the new value is.
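To make the difference concrete, here is a minimal Python sketch using the standard library’s `statistics` module. The salary figures are made up for illustration and are not the data set discussed above; the sketch simply compares the mean and median before and after adding one extreme value.

```python
from statistics import mean, median

# Hypothetical salaries (illustrative only, not the post's data):
# most employees earn a similar amount.
salaries = [42_000, 45_000, 47_000, 48_000, 50_000, 52_000]

# Add one hypothetical outlier, e.g. a much higher executive salary.
with_outlier = salaries + [400_000]

def summarize(label, values):
    """Print the mean and median of a list of salaries."""
    print(f"{label}: mean={mean(values):,.0f} median={median(values):,.0f}")

summarize("Without outlier", salaries)   # Without outlier: mean=47,333 median=47,500
summarize("With outlier", with_outlier)  # With outlier: mean=97,714 median=48,000
```

A single extreme value roughly doubles the mean, while the median only moves from the midpoint of the two middle salaries to the next value in the sorted list.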
Raw Data or Normalized Data?
Another topic we discussed was how to judge how often something is being used. Looking only at the raw numbers, something can seem very popular. But once we normalize the data, we may find that this is not the case.
So, what do I mean by this?
Let’s look at an example using hypothetical sales across a couple of countries for a fictional company. Looking only at the numbers, we can see that the company sells the most to China.
But is that the full picture? What if we normalize the data, for example, by looking at how much people spend within each country? To do this, we need the population of each country, so that we can divide each country’s sales by its population.
With this information, we get a different picture, with Sweden as the standout. Its population is much smaller than China’s, but its sales per capita are higher.
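The normalization step can be sketched in a few lines of Python. The sales totals below are hypothetical, and the populations are rough, rounded approximations used only as illustrative inputs; the point is that ranking by raw totals and ranking by sales per capita can produce different winners.

```python
# Hypothetical sales totals (illustrative only, not the post's data).
sales = {"China": 5_000_000, "Sweden": 1_000_000}

# Approximate, rounded populations; treat these as illustrative inputs.
population = {"China": 1_400_000_000, "Sweden": 10_500_000}

def per_capita(sales, population):
    """Normalize raw sales by dividing each country's total by its population."""
    return {country: sales[country] / population[country] for country in sales}

normalized = per_capita(sales, population)

# Rank by raw totals versus by the normalized (per-capita) values.
top_raw = max(sales, key=sales.get)
top_per_capita = max(normalized, key=normalized.get)

for country in sales:
    print(f"{country}: total={sales[country]:,} per capita={normalized[country]:.4f}")
print("Top by raw sales:", top_raw)
print("Top per capita:", top_per_capita)
```

Dividing by population is only one possible normalization; depending on the question, you might divide by number of customers, stores, or households instead.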
By normalizing your data, you get another view, one that gives a more accurate picture of, say, a product’s true popularity. That’s why it’s important to consider normalizing data instead of looking only at the raw statistics – otherwise you could be overlooking crucial insights.
Where to Go Next?
If you’re interested in this topic, make sure to read the whitepaper Tips to improve your data literacy using Qlik Sense available on Qlik Community.