When interpreting your data, choosing the right measure of central tendency is crucial to making informed business decisions. Two of the most common metrics are the average and median. While they may seem straightforward, they can lead to very different conclusions based on the data they summarize. Let’s dive into what each one represents and how they impact your analysis.
What is the Average?
The average can be thought of as the "balance point" or center of mass of a dataset. It considers every data point in your dataset to give a single value that represents the central tendency.
To calculate the average, you:
- Add up all the values in your dataset.
- Divide by the total number of values.
For example, let’s say you have the following revenues from nine customers: $100, $120, $200, $300, $300, $370, $400, $500, $1,200
To calculate the average revenue per customer:
Step 1: Add the Values
$100 + $120 + $200 + $300 + $300 + $370 + $400 + $500 + $1,200 = $3,490
Step 2: Divide by the Number of Values
$3,490 ÷ 9 ≈ $388 (rounded from $387.78)
So, the average revenue per customer is about $388.
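The two steps above can be reproduced in a few lines of Python. This is only a minimal sketch: the variable names are illustrative, and the values are the nine customer revenues from the example.

```python
# Customer revenues from the example, in dollars.
revenues = [100, 120, 200, 300, 300, 370, 400, 500, 1200]

# Step 1: add the values.
total = sum(revenues)

# Step 2: divide by the number of values.
average = total / len(revenues)

print(f"Total: ${total:,}")           # Total: $3,490
print(f"Average: ${average:,.2f}")    # Average: $387.78
```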
What is the Median?
The median is the middle value when the data is sorted in ascending order. Unlike the average (also called the mean), the median is much less sensitive to extreme values, or outliers, and is often a better representation of central tendency when data is skewed.
To calculate the median, you:
- Sort the data from smallest to largest.
- If the dataset contains an odd number of values, the median is the middle value.
- If the dataset contains an even number of values, the median is the average of the two middle values.
Let’s take the same dataset:
Step 1: Sort the Data
The data is already sorted: $100, $120, $200, $300, $300, $370, $400, $500, $1,200
Step 2: Find the Middle Value
Since there are 9 values (an odd number), the median is the fifth value: $300.
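As a rough illustration, the median rule for both odd and even counts could be written as below. The helper name `median_by_hand` is hypothetical and exists only for this sketch; Python's standard `statistics.median` gives the same result and is what you would normally use.

```python
from statistics import median


def median_by_hand(values):
    """Return the median using the sort-and-pick-the-middle rule."""
    ordered = sorted(values)          # Step 1: sort ascending
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd count: take the middle value
        return ordered[mid]
    # even count: average the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


revenues = [100, 120, 200, 300, 300, 370, 400, 500, 1200]

print(median_by_hand(revenues))       # 9 values, fifth value: 300

# Dropping the smallest value leaves 8 values, so the two middle
# values (300 and 370) are averaged.
print(median_by_hand(revenues[1:]))   # 335.0

# The standard library agrees with the hand-written version.
print(median(revenues) == median_by_hand(revenues))  # True
```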
Impact of Outliers on Average vs. Median
One of the most significant differences between the average and the median is how they react to outliers—values that are significantly higher or lower than the rest of the data.
In the example above, a single outlier (one customer with a total revenue of $1,200) caused the average to be about 29% higher than the median ($388 vs. $300). This effect becomes even more extreme with larger or more numerous outliers.
Let’s look at another example. Suppose you have a dataset of monthly revenue from your store, with one extreme outlier: $5,000, $6,000, $6,500, $7,000, $100,000
Step 1: Calculate the Average
$5,000 + $6,000 + $6,500 + $7,000 + $100,000 = $124,500
$124,500 ÷ 5 = $24,900
Step 2: Find the Median
The sorted values are: $5,000, $6,000, $6,500, $7,000, $100,000
The median is the third value: $6,500
In this case, the average revenue is $24,900, which is heavily influenced by the $100,000 outlier. The median revenue is $6,500, which provides a more accurate picture of the typical revenue, unaffected by the extreme outlier.
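To see how much the single outlier drives the result, the sketch below computes both measures with and without the $100,000 month. It uses Python's standard `statistics` module; the variable names are illustrative, and removing the outlier is done here only for comparison, not as a recommendation to discard data.

```python
from statistics import mean, median

# Monthly revenue from the example, in dollars.
monthly_revenue = [5000, 6000, 6500, 7000, 100000]

print(f"Average: ${mean(monthly_revenue):,.0f}")    # Average: $24,900
print(f"Median:  ${median(monthly_revenue):,.0f}")  # Median:  $6,500

# The same comparison with the outlier month left out.
without_outlier = monthly_revenue[:-1]

print(f"Average without outlier: ${mean(without_outlier):,.0f}")    # $6,125
print(f"Median without outlier:  ${median(without_outlier):,.0f}")  # $6,250
```

Removing one value moves the average from $24,900 to $6,125, while the median only shifts from $6,500 to $6,250.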
Many business datasets are skewed: a few extreme values (e.g., large orders or high-value customers) pull the average away from the typical value. The choice between average and median can therefore have a large impact on your decision-making:
- When to use the average: If the data is roughly symmetric (for example, normally distributed) and has no large outliers, the average and the median will give you very similar results. In these cases, the average is often used because it takes every value into account and ties directly to the total (average × number of values = sum).
- When to use the median: If the data is skewed or contains outliers (such as the time between orders), the median is often a better representation of the typical case. It helps you avoid being misled by extreme values.
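One informal way to check which situation you are in is to compare the two measures directly. The sketch below is an assumption-laden illustration, not an established rule: `relative_gap` is a hypothetical helper, the "symmetric" dataset is made up for contrast, and no particular cutoff is implied. It also assumes the median is not zero.

```python
from statistics import mean, median


def relative_gap(values):
    """Distance between average and median, as a fraction of the median.

    Hypothetical helper for illustration; assumes the median is non-zero.
    """
    m = median(values)
    return abs(mean(values) - m) / m


# Made-up, roughly symmetric data: average and median coincide.
symmetric = [90, 95, 100, 105, 110]

# Monthly revenue example from above, with one extreme outlier.
monthly_revenue = [5000, 6000, 6500, 7000, 100000]

print(f"{relative_gap(symmetric):.0%}")        # 0%
print(f"{relative_gap(monthly_revenue):.0%}")  # 283%
```

A gap near zero suggests either measure will tell a similar story; a large gap is a hint that the data is skewed and the median may better describe the typical case.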