Have you ever wondered why RetentionX splits your average order value (AOV) into two datasets, one for new customers and one for repeat customers? Analyzing data can be a complex task, especially with large datasets. One phenomenon that can lead to surprising and sometimes misleading conclusions is Simpson's Paradox. You may be thinking that this sounds like an abstract theory that doesn't apply to your real-world problems. Don't worry: the concept is simple, and it is highly relevant to your decision-making. Let's learn more about Simpson's Paradox and how it can affect your AOV analysis.
What is Simpson’s Paradox?
Well, first of all, it has nothing to do with Homer Simpson. Simpson's Paradox occurs when a trend that appears in each of several groups of data disappears or reverses once the groups are combined and the same calculation is performed on the aggregate. It is named after Edward H. Simpson, a British statistician and former cryptanalyst, who described it in 1951.
But let's look at an example:
Say you and your partner are trying to find the perfect restaurant for a nice dinner. Knowing that this process could lead to hours of arguing, you decide to check some online reviews. You find that your first choice, Giovanni's Pizza Place, is recommended by a higher percentage of guests. But just as you are about to claim victory, your partner explains, using the same data, that Jack's Seafood Restaurant is recommended by a higher percentage of both men and women, making it the clear winner.
Wait, what? What is going on here? Actually, both you and your partner are right, and you have entered the world of Simpson's Paradox. The data shows that Jack's Seafood Restaurant is preferred when the data is separated, but Giovanni's Pizza Place is preferred when the data is combined!
How is this possible? The problem is that comparing only the percentages within each group ignores the sample size, that is, the number of guests behind each percentage. Each percentage is a fraction: the number of guests who would recommend the restaurant divided by the number of guests asked. Jack's Seafood Restaurant has far more responses from men than from women, while the opposite is true for Giovanni's Pizza Place. Since the men in this data recommend restaurants at a lower rate than the women, Jack's combined figure is dominated by its many lower-rate responses from men. The result is a lower overall recommendation rate for Jack's Seafood Restaurant when the data is combined, even though it wins within each group.
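To make the restaurant example concrete, here is a minimal Python sketch. The review counts are invented purely for illustration (the story above gives no numbers), and the helper names `rate` and `overall_rate` are our own. The key point is that the combined rate pools the raw counts instead of averaging the two group percentages.

```python
# Hypothetical review counts, invented for illustration only:
# (guests who recommend, guests asked) per restaurant and group.
reviews = {
    "Jack's Seafood Restaurant": {"men": (180, 300), "women": (45, 50)},
    "Giovanni's Pizza Place": {"men": (25, 50), "women": (255, 300)},
}


def rate(recommended, asked):
    """Share of the asked guests who recommend the restaurant."""
    return recommended / asked


def overall_rate(groups):
    """Pool the raw counts first, then divide (not the mean of the group rates)."""
    recommended = sum(r for r, _ in groups.values())
    asked = sum(a for _, a in groups.values())
    return recommended / asked


for name, groups in reviews.items():
    per_group = ", ".join(f"{g}: {rate(*c):.0%}" for g, c in groups.items())
    print(f"{name} -> {per_group}, combined: {overall_rate(groups):.0%}")
```

With these made-up counts, Jack's leads among men (60% vs. 50%) and among women (90% vs. 85%), yet Giovanni's leads overall (80% vs. roughly 64%), because most of Jack's responses come from the group with the lower recommendation rate.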
Understanding Simpson's Paradox in Analyzing AOV
The same phenomenon can occur with your data. That's why it's important to distinguish between new and repeat customers when analyzing your customers' AOV, as they typically show different buying behaviors. New customers tend to have lower AOVs as they are just exploring your products and getting to know your brand, while repeat customers are likely to have higher basket sizes.
But let's look at a practical example of how Simpson's Paradox can affect AOV analysis:
Let's say you invest several thousand dollars to improve cross-selling during the checkout process, with the goal of increasing AOV. A few months after implementation, you want to measure the success of these changes and are shocked to find that AOV has decreased by 4%. Your first assumption is that you made a big mistake. However, after a deeper dive into the AOV of new and repeat customers, your evaluation might change completely, and you may actually be more than happy with the first results.
Evaluating the AOV of your new and repeat customers separately shows that AOV has increased for both groups. At the same time, you attracted far more new customers than before. Because new customers generally have a lower AOV than repeat customers, their larger share of all orders pulls the overall AOV down, even though each group's AOV went up.
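The same effect can be reproduced with a few lines of Python. This is only a sketch: the revenue and order figures are hypothetical, chosen so that both segments improve by 4% while the overall AOV falls by 4%, and the function names are our own rather than anything RetentionX exposes.

```python
# Hypothetical figures, invented for illustration:
# (revenue in dollars, number of orders) per customer segment.
before = {"new": (20_000, 400), "repeat": (60_000, 600)}
after = {"new": (35_360, 680), "repeat": (64_480, 620)}


def aov(revenue, orders):
    """Average order value: revenue divided by number of orders."""
    return revenue / orders


def overall_aov(period):
    """AOV across all segments: total revenue divided by total orders."""
    revenue = sum(r for r, _ in period.values())
    orders = sum(o for _, o in period.values())
    return revenue / orders


def pct_change(old, new):
    """Relative change from old to new."""
    return (new - old) / old


for segment in before:
    change = pct_change(aov(*before[segment]), aov(*after[segment]))
    print(f"{segment}: {change:+.1%}")
print(f"overall: {pct_change(overall_aov(before), overall_aov(after)):+.1%}")
```

In this made-up scenario, new-customer AOV rises from $50 to $52 and repeat-customer AOV from $100 to $104, but new customers grow from 40% to about 52% of all orders, so the overall AOV drops from $80.00 to $76.80.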
Resolving the Paradox
To keep Simpson's Paradox from leading you to wrong conclusions, always consider how the data was generated and which factors an aggregate figure might hide. Segmentation is key! By breaking down your customer base into smaller segments, you can uncover patterns that the overall number conceals. For example, run separate analyses for subscribers vs. non-subscribers, male vs. female customers, or one-time buyers vs. repeat customers.
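If you work with raw order exports, segmenting by an attribute takes only a small helper. The sketch below assumes each order is a dictionary with a `value` field plus segment attributes. The field names `customer_type` and `subscriber`, the sample orders, and the function `aov_by_segment` are all hypothetical and do not reflect any RetentionX export format.

```python
from collections import defaultdict


def aov_by_segment(orders, key):
    """Return the AOV for each distinct value of order[key]."""
    totals = defaultdict(lambda: [0.0, 0])  # segment -> [revenue, order count]
    for order in orders:
        bucket = totals[order[key]]
        bucket[0] += order["value"]
        bucket[1] += 1
    return {segment: revenue / count for segment, (revenue, count) in totals.items()}


# Hypothetical sample orders, invented for illustration.
orders = [
    {"value": 40.0, "customer_type": "new", "subscriber": False},
    {"value": 60.0, "customer_type": "new", "subscriber": True},
    {"value": 90.0, "customer_type": "repeat", "subscriber": True},
    {"value": 110.0, "customer_type": "repeat", "subscriber": False},
]

print(aov_by_segment(orders, "customer_type"))
print(aov_by_segment(orders, "subscriber"))
```

Running the same helper with different keys is a quick way to see which split actually separates customers with different buying behavior. In this toy data, customer type does (AOV of 50 vs. 100), while subscriber status does not (75 in both groups).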