Decoding Your Recommendations Performance

Product recommendations, also known as “recs,” are a cornerstone of an effective ecommerce merchandising strategy. When fully optimized, recs can increase retailer revenue by up to 5%.


Retail Gazette – LFW: project technology

“Gone are the days where stores are the forefront of the marketing stage and mobile devices are used predominantly for text messages and playing snake”, says Matthieu Chouard at RichRelevance, an omnichannel personalisation specialist. This couldn’t be more true than at this season’s ‘Fashion Month’, where technology dominated the catwalk.


Information Age – What the retail sector can learn from London Fashion Week's tech innovation

This season’s London Fashion Week showcased some flamboyant fashion, and equally dazzling examples of omnichannel, personalised digital marketing strategies.
Every season, the illustrious London Fashion Week gets more high tech as retailers seek to make the show an interactive, omnichannel brand experience.


Entrepreneur – Data Driven: What Amazon's Jeff Bezos Taught Me About Running a Company

My experience with Jeff Bezos changed me forever.

In 2003, Amazon hired me directly out of Stanford. I initially turned down six separate offers until management coaxed me into running the company’s Customer Behavior Research group focused on data-mining research and development. Upon arrival in Seattle, I was bounced around from one manager to another, including working directly with Bezos himself.


Journal du Net – The Tipping Point for Outliers in A/B Testing

Malcolm Gladwell recently popularized the term “outlier” by using it to refer to high-performing people. In the context of data, however, outliers are data points that lie far from other data points, in other words, atypical ones…

The Tipping Point for Outliers in A/B Testing

Malcolm Gladwell recently popularized the term ‘outlier’ when referring to successful individuals. In data terms, however, outliers are data points that are far removed from other data points, or flukes. Though they will make up a small portion of your total data population, ignoring their presence can jeopardize the validity of your findings. So, what exactly are outliers, how do you define them, and why are they important?

A common A/B test we like to perform here at RichRelevance compares a client’s site with our recommendations against the same site without them, to determine the value the recommendations add. A handful of observations (or even a single observation) in this type of experiment can skew the outcome of the entire test. For example, if the recommendation side of an A/B test has historically been winning by $500/day on average, a single additional $500 order on the no-recommendation side will single-handedly nullify the apparent lift of the recommendations for that day.
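The arithmetic behind that example can be sketched in a few lines of Python. The daily revenue totals below are hypothetical numbers chosen only so that the recommendation side leads by $500; `daily_lift` is an illustrative helper, not part of any RichRelevance report.

```python
def daily_lift(recs_revenue, no_recs_revenue):
    """Revenue difference between the two sides of the test for one day."""
    return recs_revenue - no_recs_revenue


# Hypothetical daily revenue totals in dollars (illustrative only).
recs_side = 10_500.0
no_recs_side = 10_000.0

# A typical day: the recommendation side leads by $500.
lift_before = daily_lift(recs_side, no_recs_side)

# The same day with one extra $500 order on the no-recommendation side.
lift_after = daily_lift(recs_side, no_recs_side + 500.0)

print(f"Lift on a typical day:      ${lift_before:,.2f}")
print(f"Lift with one extra order:  ${lift_after:,.2f}")
```

One order moves the day's result from a clear win to a tie, which is why a single observation deserves scrutiny.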

This $500 purchase is considered an outlier. Outliers are defined as data points that deviate strongly from the rest of the observations in an experiment. The threshold for “deviating strongly” is open to interpretation, but a typical choice is three standard deviations from the mean, which (for normally distributed data) captures the most extreme 0.3% of observations, split between the highest and lowest values.
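As a minimal sketch of the three-standard-deviation rule, the Python below computes the share of a normal distribution that falls beyond three standard deviations and flags such points in a sample. The function names and the order values are assumptions made for illustration, and the population standard deviation is used here as one reasonable choice; the original post does not specify which estimator is applied.

```python
import math
import statistics


def tail_fraction(k):
    """Share of a normal distribution more than k standard deviations
    from the mean, counting both tails."""
    return math.erfc(k / math.sqrt(2))


def flag_outliers(values, k=3.0):
    """Return the values lying more than k standard deviations from the mean."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population SD; an assumption of this sketch
    return [v for v in values if abs(v - mean) > k * sd]


# Hypothetical order values: twenty ordinary orders and one very large one.
orders = [90, 95, 100, 105, 110] * 4 + [1000]

print(f"Share beyond 3 SD: {tail_fraction(3):.2%}")
print("Flagged orders:", flag_outliers(orders))
```

Note that a single extreme point also inflates the standard deviation it is measured against, so in very small samples even a large value may not cross the threshold.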

Variation is to be expected in any experiment, but outliers deviate so far from expectations, and occur so infrequently, that they are not considered indicative of the behavior of the population. For this reason, we built our A/B/MVT reports to automatically remove outliers, using the three-standard-deviations-from-the-mean method, before calculating results. This spares clients the panic or anger that test results skewed by outliers can cause.

At first glance, it may seem odd to proactively remove the most extreme 0.3% of observations in a test. Our product is designed to upsell, cross-sell, and generally increase basket size as much as possible. So, in an A/B test like the one above, if recommendations drive an order from $100 to $200, that’s great news for the recommendations side of the test. But if the recommendations are so effective that they drive an order from $100 to $1,000, that’s bad news, because what would have been a $100 order has become an outlier and now gets thrown out.
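To make the trade-off concrete, here is a hedged sketch of trimming outliers before comparing the two sides of a test. It assumes a single threshold computed from the pooled orders of both sides; the post does not say whether the actual reports pool the data or treat each side separately, and the function names and order values are hypothetical.

```python
import statistics


def trim_outliers(values, mean, sd, k=3.0):
    """Keep only values within k standard deviations of the mean."""
    return [v for v in values if abs(v - mean) <= k * sd]


def compare_sides(recs_orders, control_orders, k=3.0):
    """Return (revenue difference after trimming, number of orders removed).

    The threshold is computed once from the pooled orders of both sides,
    which is an assumption of this sketch.
    """
    pooled = recs_orders + control_orders
    mean = statistics.fmean(pooled)
    sd = statistics.pstdev(pooled)
    recs_kept = trim_outliers(recs_orders, mean, sd, k)
    control_kept = trim_outliers(control_orders, mean, sd, k)
    removed = len(pooled) - len(recs_kept) - len(control_kept)
    return sum(recs_kept) - sum(control_kept), removed


# Hypothetical orders: the recommendations side includes one $1,000 order.
recs = [100, 110, 120, 130, 140] * 4 + [1000]
control = [95, 105, 115, 125, 135] * 4

raw_difference = sum(recs) - sum(control)
trimmed_difference, removed = compare_sides(recs, control)

print("Raw difference:    ", raw_difference)
print("Trimmed difference:", trimmed_difference, f"({removed} order removed)")
```

In this made-up data the $1,000 order accounts for most of the raw difference, and removing it leaves the smaller but more representative gap between the two sides.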

In order for a test to be statistically valid, all rules of the testing game should be determined before the test begins. Otherwise, we potentially expose ourselves to a whirlpool of subjectivity mid-test. Should a $500 order only count if it was directly driven by attributable recommendations? Should all $500+ orders count if there are an equal number on both sides? What if a side is still losing after including its $500+ orders? Can they be included then?

By defining outlier thresholds before the test begins (for RichRelevance tests, three standard deviations from the mean) and establishing a methodology that removes them, both the random noise and the subjectivity of A/B test interpretation are significantly reduced. This is key to minimizing headaches while managing A/B tests.

Of course, understanding outliers is useful outside of A/B tests as well. If a commute typically takes 45 minutes, a 60-minute commute (i.e. a 15-minute-late employee) can be chalked up to variance. However, a three-hour commute would certainly be an outlier. While we’re not suggesting that you use hypothesis testing as grounds to discipline late employees, differentiating between statistical noise and behavior not representative of the population can aid in understanding when things are business as usual or when conditions have changed.
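The commute example can be expressed as a z-score check. The 45-minute mean comes from the text, but the 10-minute standard deviation is purely an assumed figure for illustration, as are the function names.

```python
def z_score(observation, mean, sd):
    """Number of standard deviations an observation lies from the mean."""
    return (observation - mean) / sd


def is_outlier(observation, mean, sd, k=3.0):
    """True when the observation is more than k standard deviations from the mean."""
    return abs(z_score(observation, mean, sd)) > k


TYPICAL_COMMUTE_MIN = 45.0  # from the example in the text
ASSUMED_SD_MIN = 10.0       # hypothetical spread, not given in the text

for minutes in (60.0, 180.0):
    z = z_score(minutes, TYPICAL_COMMUTE_MIN, ASSUMED_SD_MIN)
    verdict = "outlier" if is_outlier(minutes, TYPICAL_COMMUTE_MIN, ASSUMED_SD_MIN) else "ordinary variance"
    print(f"{minutes:.0f}-minute commute: z = {z:.1f} ({verdict})")
```

Under that assumed spread, the 60-minute commute sits 1.5 standard deviations from the mean, while the three-hour commute sits 13.5 standard deviations away.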
