Average or Median?
The inspiration for this post comes from a virtual office discussion. The question under consideration was whether an average (mean) value accurately represented the data that we were looking at. The answer to that question was the very standard “it depends.” For a data set that is evenly distributed, it usually does. But for a data set that contains outliers, you also need to look at the median value. Let’s have a look at what an average value can look like in a skewed data set.
In this data set, we have the salary for employees of a company. As you can see, most people are earning around the same amount. But, there is an outlier: One person, presumably of a higher position in the company, is making substantially more than everyone else.
Without this outlier, we get an average that falls somewhere
in between the salaries earned by the rest of the employees; with the
outlier, however, we get a much higher average. That inflated average is not a
typical salary within the company.
Now, if we evaluate the data set by locating the median instead,
we are looking at the middle value of the sorted data. This value won’t
change much, even if we have the high value (i.e., the outlier) within the data set.
Adding a single very high or very low value shifts the median by at most one
position in the sorted data, no matter how extreme that value is.
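To make the comparison concrete, here is a minimal sketch in Python using the standard library's statistics module. The salary figures are invented purely for illustration and are not the data discussed above; the variable names are likewise assumptions.

```python
from statistics import mean, median

# Hypothetical salaries, invented for illustration only.
typical_salaries = [42_000, 45_000, 47_000, 48_000, 50_000]
outlier_salary = 250_000  # assumed value for the one much higher earner

with_outlier = typical_salaries + [outlier_salary]

# The mean is pulled toward the outlier.
mean_without = mean(typical_salaries)
mean_with = mean(with_outlier)

# The median only moves to a neighboring position in the sorted data.
median_without = median(typical_salaries)
median_with = median(with_outlier)

print(f"mean   without outlier: {mean_without:,.0f}")
print(f"mean   with outlier:    {mean_with:,.0f}")
print(f"median without outlier: {median_without:,.0f}")
print(f"median with outlier:    {median_with:,.0f}")
```

With these made-up numbers, the mean with the outlier ends up higher than every typical salary, while the median stays between the two middle salaries.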
Raw Data or Normalized Data?
Another topic we discussed was how often something is being used. Just looking at the raw numbers, it can seem like something is very popular. But, once we normalize the data, we can see that may not be the case.
So, what do I mean by this?
Let’s check out an example using hypothetical sales across a couple of countries for a theoretical company. Just looking at the numbers, we can see that the company sells the most to China.
But is that the full picture? What if we try to normalize
the data, for example, by looking at how much is being spent per person within each country?
We can do this by dividing each country’s sales by its population.
With this information, we now have a different picture, with
Sweden being the standout. Its population is far smaller than China’s, but, per
capita, the sales are better.
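The per-capita calculation can be sketched in a few lines of Python. The sales figures below are invented for illustration, the populations are rough rounded values, and the third country is added only to make the comparison a little richer; none of these numbers come from the example above.

```python
# Hypothetical total sales per country, invented for illustration.
sales = {"China": 5_000_000, "USA": 3_000_000, "Sweden": 500_000}

# Rough, rounded populations, used only to show the calculation.
population = {"China": 1_400_000_000, "USA": 330_000_000, "Sweden": 10_000_000}

# Normalize: sales divided by population gives sales per person.
sales_per_capita = {
    country: sales[country] / population[country] for country in sales
}

# The leader by raw sales and the leader per capita can differ.
top_raw = max(sales, key=sales.get)
top_per_capita = max(sales_per_capita, key=sales_per_capita.get)

for country in sales:
    print(f"{country}: raw={sales[country]:,} per capita={sales_per_capita[country]:.4f}")

print("Highest raw sales:", top_raw)
print("Highest sales per capita:", top_per_capita)
```

Dividing by population is only one possible normalization. Depending on the question, the number of customers or the number of active users might be a better denominator.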
By normalizing your data, you get another view, offering a more accurate representation of the true popularity of, say, a given product being sold. That’s why it’s important to consider normalizing data instead of just looking at the raw statistics – you could be overlooking crucial insights.
Where to Go Next?
If you’re interested in this topic, make sure to read the whitepaper “Tips to improve your data literacy using Qlik Sense,” available on Qlik Community.