You Should Be Able to Explain Why Your Algorithm Recommended That Product

At some point in almost every customer conversation we have, the question comes up in some form: "Can I see why it's recommending this?" The person asking might be a growth lead who noticed the algorithm surfacing a product that seems off. It might be a founder who wants to understand what's happening inside the black box before they trust it with their homepage. It might be a merchandiser who suspects the system is over-indexing on a single category because inventory is deep there.

In all those cases, the right answer isn't a shrug followed by "the model found a pattern." That answer destroys trust in the recommendations, and for good reason. If you can't explain a recommendation, you can't validate it, debug it, or defend it to your team. And when the algorithm makes a mistake — which it will, because all algorithms make mistakes — you can't diagnose why without explainability.

This is the case for explainability in product recommendation systems. Not regulatory compliance, not academic fairness analysis — just the operational reality that a recommendation you can't explain is a recommendation you can't manage.

What "Explainability" Means in a Recommendation Context

Explainability in machine learning has a technical literature that spans hundreds of papers — SHAP values, LIME, attention weights, saliency maps. For DTC brand recommendations, most of that sophistication is irrelevant. What merchants actually need is simpler: for any given recommendation shown to any given shopper, they should be able to see which features drove the recommendation score, at a human-readable level of granularity.

That looks something like this: "Product X was recommended because: this shopper viewed three products in the outerwear category in their last two sessions (high category affinity weight), this product has a 78% return-visit rate among shoppers who viewed it without purchasing in the same session (high browse-to-return correlation), and this product is currently in the top 15% of inventory availability in the shopper's likely size range (availability score elevated)."

That explanation is tractable. A merchandiser can read it and either confirm it makes sense or identify a problem. "Category affinity: outerwear" makes sense for a shopper who's been browsing coats. If it said "category affinity: kitchen" for a shopper who only browses outerwear, that's a flag that something in the feature pipeline has a data quality issue — maybe the session attribution logic is mis-assigning page views to the wrong user, or the category taxonomy has a product miscategorized.
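You can turn the "kitchen affinity for an outerwear shopper" check into an automated flag. The sketch below makes a few assumptions. It assumes the attribution record names the category behind the top affinity feature, and that you can pull the category of each page view attributed to the shopper. The function name `flag_affinity_mismatch` and the 10% threshold are hypothetical. They are not part of Revlance.

```python
from collections import Counter


def flag_affinity_mismatch(attributed_category, viewed_categories, min_share=0.1):
    """Return True when the category credited by attribution barely appears in browsing.

    attributed_category: category named by the top affinity feature (hypothetical input).
    viewed_categories: category of each page view attributed to this shopper.
    min_share: hypothetical threshold; below this share of views, flag for review.
    """
    if not viewed_categories:
        # No browsing history to compare against, so there is nothing to flag.
        return False
    counts = Counter(viewed_categories)
    share = counts[attributed_category] / len(viewed_categories)
    return share < min_share


views = ["outerwear", "outerwear", "outerwear", "footwear"]
print(flag_affinity_mismatch("outerwear", views))
print(flag_affinity_mismatch("kitchen", views))
```

A flag from a check like this does not diagnose anything by itself. It tells the team which attribution records to open first when they look for session-attribution or taxonomy problems.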

We're not saying every recommendation needs a fully rendered explanation visible to the shopper — that's a UX choice that most stores don't make. We're saying that the merchant team operating the personalization system should be able to pull up feature attribution for any recommendation, on demand, without having to file a support ticket with the vendor.

The Feature Attribution Architecture in Revlance

Revlance's recommendation scoring model produces feature attribution alongside every recommendation score. The score itself is a weighted combination of approximately 15 feature families. The top contributing features are stored alongside the recommendation in a lightweight attribution object that persists for 48 hours after the recommendation is generated.
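The sketch below shows that shape as a linear weighted sum, where the attribution comes straight out of the scoring step. The text says only "weighted combination", so Revlance's real model, weights, and storage format may differ. `AttributionRecord`, `score_with_attribution`, and the feature names are hypothetical. The 48-hour expiry mirrors the retention window described above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone

# Retention window taken from the text; everything else here is hypothetical.
ATTRIBUTION_TTL = timedelta(hours=48)


def _utc_now():
    return datetime.now(timezone.utc)


@dataclass
class AttributionRecord:
    product_id: str
    score: float
    top_features: list  # (feature_family, contribution) pairs, largest magnitude first
    created_at: datetime = field(default_factory=_utc_now)

    def is_expired(self, now=None):
        """True once the record is older than the 48-hour retention window."""
        if now is None:
            now = _utc_now()
        return now - self.created_at > ATTRIBUTION_TTL


def score_with_attribution(product_id, features, weights, top_k=3):
    """Score a product as a weighted sum of feature-family values.

    In a linear score each term weight * value is exactly that family's
    contribution, so attribution comes out of the scoring step for free.
    """
    contributions = {name: weights[name] * value for name, value in features.items()}
    score = sum(contributions.values())
    top = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)[:top_k]
    return AttributionRecord(product_id, score, top)


weights = {"category_affinity": 0.5, "return_visit_rate": 0.3, "availability": 0.2}
features = {"category_affinity": 0.9, "return_visit_rate": 0.78, "availability": 0.85}
record = score_with_attribution("product-x", features, weights, top_k=2)
print(record.top_features)  # category_affinity first, then return_visit_rate
```

With a nonlinear model, the per-family contributions would have to be estimated with an attribution method rather than read off the score directly. This is one reason the dashboard numbers are an approximation.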

The features fall into three groups:

Shopper-side signals: category affinity scores per session history depth, price tier engagement, recency of session activity, session-local preference delta (what the shopper has been looking at in the current session), and cross-category co-engagement patterns (a shopper who consistently pairs kitchen and entertaining accessories has a different profile from one who sticks to a single category).

Product-side signals: inventory availability (including size-specific availability for apparel), recency of the product (new arrivals get a temporary elevation weight), return-visit rate (how often shoppers who viewed this product without buying came back to buy it later), and price-to-affinity match (how closely the product's price aligns with the shopper's demonstrated price tier).

Contextual signals: time of day/week conversion patterns for this product category, device type interaction history for this shopper, and acquisition source affinity (some products convert better from certain acquisition cohorts, and those patterns become part of the prior).
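Two of the product-side signals are easy to misread, so here is one plausible way to compute them. This is a sketch only. The event schema, the use of the median as the shopper's price tier, and the log-ratio scoring are all assumptions, not Revlance's definitions. `return_visit_rate` and `price_affinity_match` are hypothetical names.

```python
import math
from collections import defaultdict
from statistics import median


def return_visit_rate(events, product_id):
    """Share of shoppers who viewed product_id in a session without buying it
    and then bought it in a later session.

    events: iterable of (shopper_id, session_no, action, product_id) tuples,
    where action is "view" or "purchase" and session_no increases over time.
    """
    views = defaultdict(set)
    buys = defaultdict(set)
    for shopper, session, action, pid in events:
        if pid != product_id:
            continue
        if action == "view":
            views[shopper].add(session)
        elif action == "purchase":
            buys[shopper].add(session)

    qualified = returned = 0
    for shopper, view_sessions in views.items():
        # Sessions where the shopper viewed the product but did not buy it.
        browse_only = [s for s in view_sessions if s not in buys[shopper]]
        if not browse_only:
            continue
        qualified += 1
        first = min(browse_only)
        if any(s > first for s in buys[shopper]):
            returned += 1
    return returned / qualified if qualified else 0.0


def price_affinity_match(product_price, engaged_prices):
    """1.0 when the price equals the shopper's median engaged price; 0.5 at double or half."""
    if not engaged_prices or product_price <= 0:
        return 0.0
    typical = median(engaged_prices)
    if typical <= 0:
        return 0.0
    return math.exp(-abs(math.log(product_price / typical)))


events = [
    ("a", 1, "view", "coat-1"), ("a", 2, "purchase", "coat-1"),  # browsed, came back, bought
    ("b", 1, "view", "coat-1"),                                   # browsed, never returned
    ("c", 1, "view", "coat-1"), ("c", 1, "purchase", "coat-1"),  # same-session buy: excluded
]
print(return_visit_rate(events, "coat-1"))  # 0.5
print(price_affinity_match(80, [40, 45, 40]))
```

The sketch scores price match on the ratio rather than the absolute difference. That way, a product at twice the shopper's usual price is penalized the same whether the shopper's tier is $20 or $200.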

In the merchant dashboard, you can click on any recommended product for any profile and see which of these feature families contributed the most to the recommendation score, displayed as a percentage of total score contribution. This isn't a perfect causal explanation — feature attribution in ML models is an approximation, not a mechanical trace — but it's precise enough to be operationally useful.
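Displaying contributions as percentages is a small normalization step on top of the per-family contributions. The sketch below assumes contributions like the ones a linear score produces. `contribution_shares` and the family names are hypothetical. Using absolute values, so that a family that pushed the score down still appears, is a choice made for this sketch and not necessarily what the dashboard does.

```python
def contribution_shares(contributions):
    """Express each feature family's contribution as a percentage of the total.

    Absolute values are used so that a family that pushed the score down still
    shows up with a visible share instead of cancelling out a positive one.
    """
    total = sum(abs(v) for v in contributions.values())
    if total == 0:
        return {name: 0.0 for name in contributions}
    shares = {name: 100 * abs(v) / total for name, v in contributions.items()}
    # Largest share first, which is the order a dashboard would list them in.
    return dict(sorted(shares.items(), key=lambda kv: kv[1], reverse=True))


shares = contribution_shares(
    {"availability": 0.25, "category_affinity": 0.45, "price_tier_match": -0.30}
)
for family, pct in shares.items():
    print(f"{family}: {pct:.0f}%")
```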

When Explainability Catches Real Problems

We built this into Revlance from the start, rather than adding it later, because we found explainability to be a debugging tool as often as an audit tool.

A specific example of the kind of problem it catches: a DTC home goods store with a deep inventory of candles noticed that the personalization system was recommending candle holders to a very high percentage of new visitors, regardless of their browsing behavior. The explanation in the attribution interface showed that "cross-category co-engagement: candles + candle holders" was contributing a large fraction of the recommendation score for the candle holders product — meaning the system had learned that candle shoppers also buy candle holders, and was front-loading that recommendation aggressively.

The problem: the correlation was real, but it was driven almost entirely by a single promotional period three months earlier, when both products were discounted together. The model had learned a promotional artifact as a genuine preference signal. Without the explainability layer, this would have been invisible. With it, the team identified the issue, flagged the promotional period data, and we applied a correction to the training signal weighting. The candle holder recommendation rate for low-signal visitors dropped to a more appropriate level.
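One simple form such a correction can take is down-weighting co-purchases that fall inside a flagged promotional window. The text doesn't say exactly how Revlance adjusted the weighting. The function `weighted_co_engagement`, the 0.2 weight, and the order data below are hypothetical and for illustration only.

```python
from datetime import date


def weighted_co_engagement(orders, item_a, item_b, promo_start, promo_end, promo_weight=0.2):
    """Weighted share of orders containing item_a that also contain item_b.

    orders: iterable of (order_date, set_of_product_ids).
    Orders dated inside [promo_start, promo_end] count with promo_weight instead
    of 1.0, so a one-off bundle discount can't dominate the learned association.
    """
    with_a = both = 0.0
    for order_date, items in orders:
        if item_a not in items:
            continue
        weight = promo_weight if promo_start <= order_date <= promo_end else 1.0
        with_a += weight
        if item_b in items:
            both += weight
    return both / with_a if with_a else 0.0


# Illustrative data: eight bundled orders during a promotion, ten ordinary candle orders after it.
orders = [(date(2024, 3, 5), {"candle", "candle_holder"})] * 8
orders += [(date(2024, 6, 1), {"candle"})] * 9
orders += [(date(2024, 6, 2), {"candle", "candle_holder"})]

promo = (date(2024, 3, 1), date(2024, 3, 14))
raw = weighted_co_engagement(orders, "candle", "candle_holder", *promo, promo_weight=1.0)
corrected = weighted_co_engagement(orders, "candle", "candle_holder", *promo)
print(f"raw: {raw:.2f}, promo-weighted: {corrected:.2f}")
```

Down-weighting keeps whatever genuine association survives outside the promotion. Dropping the window entirely would also discard real purchases made during it.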

This kind of data quality issue is endemic to behavioral recommendation systems. Promotions, seasonal anomalies, site navigation changes, and catalog reorganizations all leave artifacts in behavioral training data. Explainability doesn't prevent these artifacts from forming — it makes them findable.

The Trust Gap Between Merchants and Algorithms

There's a pattern we see repeatedly with DTC brands adopting personalization for the first time: initial enthusiasm followed by a crisis of trust when the algorithm does something unexpected. A product the merchandising team doesn't think highly of gets surfaced prominently. A new arrival that the team is excited about doesn't show up in recommendations. A shopper who clearly wants outerwear gets shown kitchen accessories.

When these moments happen without explainability, the typical response is to build an ever-expanding list of override rules: "never show product X to segment Y," "always boost collection Z regardless of behavioral signal," "pin these ten items to the top of every new visitor's recommendations." Over time, the rule overrides accumulate until the personalization system is doing very little actual personalization — it's spending most of its recommendation surface area on what the merchandising team already knew they wanted to show.
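The erosion is easy to quantify. The sketch below assumes only two rule types, pin and block, and a fixed number of recommendation slots. `apply_overrides` and its parameters are hypothetical. With ten pinned items and twelve slots, five out of six slots are decided before the model's ranking is even consulted.

```python
def apply_overrides(ranked, pinned=(), blocked=frozenset(), slots=12):
    """Apply pin and block rules to a model ranking.

    Returns the final slate plus the share of slots decided by pins rather
    than by the model's ranking.
    """
    # Deduplicate pins (keeping order), drop blocked ones, cap at the slot count.
    pins = [p for p in dict.fromkeys(pinned) if p not in blocked][:slots]
    slate = list(pins)
    for product in ranked:
        if len(slate) >= slots:
            break
        if product not in blocked and product not in slate:
            slate.append(product)
    return slate, len(pins) / slots


model_ranking = [f"model-{i}" for i in range(20)]
pinned = [f"pinned-{i}" for i in range(10)]
slate, rule_share = apply_overrides(model_ranking, pinned=pinned, slots=12)
print(f"{rule_share:.0%} of slots decided by rules")  # 83%
```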

Explainability short-circuits this pattern by giving the merchandising team a way to understand and evaluate algorithm decisions rather than override them reflexively. When the algorithm surfaces a product they didn't expect, they can look at the feature attribution and either say "yes, that makes sense — I didn't know this product had such strong return-visit rates" or "no, that's wrong, here's the data quality issue that caused it." The second case is an override that makes sense. The first case is the algorithm teaching the team something about their catalog they didn't know.

What Explainability Doesn't Solve

Feature attribution doesn't tell you whether a recommendation is good in an absolute sense, only why the model ranked it as it did. A recommendation can score highly on every signal type and still be a bad recommendation. The feature data itself might be stale, or the scoring might not have fully accounted for the context the recommendation appears in (for example, a cart page with a specific purchase in progress).

Explainability is also not a substitute for A/B testing as the primary validation mechanism for recommendation quality. Knowing why a recommendation was made is different from knowing whether it made a positive difference to conversion or AOV. Both types of evidence are necessary. Feature attribution tells you what the model thinks it knows. Test data tells you whether what the model knows translates to real-world outcomes.
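On the test-data side, a standard check for a difference in conversion rate is the two-proportion z-test. The sketch below uses only the standard library. It assumes independent visitors, a sample size fixed before the test starts, and conversion as the only metric. AOV is a continuous metric and needs a different test, such as a t-test or a bootstrap. The counts in the usage example are illustrative.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rate between control (a) and variant (b)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("no variation in outcomes; the test is undefined")
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Illustrative counts: control converts 200 of 10,000 visitors, variant 260 of 10,000.
z, p = two_proportion_z_test(200, 10_000, 260, 10_000)
print(f"z = {z:.2f}, p = {p:.4f}")
```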

The goal is a system where you trust the recommendations enough to let them run, you understand them well enough to catch errors when they happen, and you have test results that validate whether that trust is warranted. Explainability is the second part of that three-part requirement — not a replacement for the first or the third.