Merchandising Rules vs ML: When to Use Each (And When to Use Both)

By Brianna Okafor

The framing of "rules vs machine learning" in merchandising is often presented as a progression: you start with rules, you graduate to ML. That framing is wrong in a useful way — it implies that ML is categorically superior and rules are a primitive stepping stone. In practice, rules and ML are solving different problems, have different operational profiles, and the right architecture uses both deliberately rather than treating one as a replacement for the other.

This post is a practical framework for making that distinction. What types of decisions are rules genuinely better at? What does ML handle that rules can't? And what does the interaction layer between them look like in a real personalization implementation?

What Rules Are Actually Good At

Rules encode business intent explicitly. When a merchandising team needs to guarantee a specific outcome — "the new collection hero product is pinned to position 1 for the next three weeks" — a rule is the right mechanism. It's direct, auditable, and deterministic. A VP of Growth can look at the grid, point to position 1, and confirm the rule is working. No black box, no inference required.

Rules also handle hard constraints well. "Never surface out-of-stock items" is a constraint, not a preference — it should fire unconditionally, not be weighted against some affinity score that might occasionally override it. "Suppress products below 20% margin from the homepage grid during Q4" is a business constraint that should override any model's relevance ranking. These are the kinds of decisions where you need predictability, not optimization.
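To see why a constraint should be a filter rather than a weight, here is a minimal sketch that compares the two. The `Product` fields, the affinity values, and the 0.5 penalty are hypothetical and chosen only for illustration. With any fixed penalty, a high enough affinity score can still push an out-of-stock item above in-stock ones. A hard filter makes that outcome impossible.

```python
from dataclasses import dataclass

@dataclass
class Product:
    sku: str
    affinity: float   # hypothetical model score in [0, 1]
    in_stock: bool

def rank_with_penalty(products, penalty=0.5):
    """Soft approach: out-of-stock items are down-weighted, not removed."""
    def score(p):
        return p.affinity - (0.0 if p.in_stock else penalty)
    return [p.sku for p in sorted(products, key=score, reverse=True)]

def rank_with_hard_filter(products):
    """Constraint approach: out-of-stock items never reach the ranking step."""
    eligible = [p for p in products if p.in_stock]
    return [p.sku for p in sorted(eligible, key=lambda p: p.affinity, reverse=True)]

catalog = [
    Product("A", affinity=0.95, in_stock=False),
    Product("B", affinity=0.40, in_stock=True),
    Product("C", affinity=0.30, in_stock=True),
]
print(rank_with_penalty(catalog))      # A scores 0.45 after the penalty and still ranks first
print(rank_with_hard_filter(catalog))  # A can never appear
```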

Fast iteration is another area where rules win. Writing a rule takes minutes. Training or retraining an affinity model takes hours to days, depending on your data volume and pipeline architecture. When your team needs to react to a real-time business event — a viral moment, a PR crisis, a supply chain problem that makes a product category temporarily unavailable — rules are the right instrument. You can deploy a rule override in the time it takes to write it and confirm it's active.
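One reason rule overrides deploy this quickly is that they can be plain data read at request time, with no retraining step. The sketch below assumes a hypothetical `SuppressCategoryRule` record with an expiry timestamp, so an emergency override for an unavailable category takes effect on the next request and lapses on its own instead of lingering after the problem is resolved.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class SuppressCategoryRule:
    """Hypothetical rule record: hide a category until `expires_at`."""
    category: str
    expires_at: datetime

    def is_active(self, now):
        return now < self.expires_at

def apply_rules(ranked, categories, rules, now):
    """Drop products whose category is covered by any active suppression rule."""
    blocked = {r.category for r in rules if r.is_active(now)}
    return [sku for sku in ranked if categories[sku] not in blocked]

now = datetime(2024, 6, 1, 9, 0)
rules = [SuppressCategoryRule("swimwear", expires_at=now + timedelta(hours=48))]
categories = {"s1": "swimwear", "t1": "tops", "t2": "tops"}
print(apply_rules(["s1", "t1", "t2"], categories, rules, now))                      # swimwear hidden
print(apply_rules(["s1", "t1", "t2"], categories, rules, now + timedelta(days=3)))  # rule has expired
```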

The real failure mode for rules isn't that they're wrong; it's that they don't scale. A rule that pins the top three SKUs by revenue to the first row of the grid gives every shopper the same answer, based on a single signal. For 10 shoppers that might be perfect. For 10,000, preferences vary enormously, and the rule keeps serving the same answer to people who want different things, with no way of knowing it.

What ML Handles That Rules Can't

ML-based recommendation models are fundamentally about learning patterns that are too complex or too numerous to encode manually. The affinity relationship between a specific shopper's browsing behavior and a product they haven't seen yet is not something you can write a rule for — there are too many combinations, the patterns vary by shopper, and they change over time.

Collaborative filtering, the class of algorithms that powers most recommendation systems at this layer, learns from the aggregate of what many shoppers do and finds patterns that predict individual preference. "Shoppers who hover over product X for a long time and then buy it tend to also buy product Y within 45 days" is a pattern that can be discovered from data and acted on in recommendations. You couldn't write that rule without first knowing the pattern, and you can only know the pattern by looking at enough data to observe it.
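A minimal sketch of the idea, using item-to-item co-occurrence counting, one of the simplest forms of collaborative filtering. It assumes each shopper's history is just a list of purchased SKUs; it does not model hover duration or the 45-day window from the example above, and the histories are made up. Products that often appear alongside what a shopper already owns are scored higher, without anyone writing a rule that links them.

```python
from collections import Counter, defaultdict
from itertools import combinations

def build_cooccurrence(histories):
    """Count how often each pair of products appears in the same shopper's history."""
    co = defaultdict(Counter)
    for items in histories:
        for a, b in combinations(sorted(set(items)), 2):
            co[a][b] += 1
            co[b][a] += 1
    return co

def recommend(co, shopper_items, k=3):
    """Score unseen products by summed co-occurrence with what the shopper already has."""
    owned = set(shopper_items)
    scores = Counter()
    for item in owned:
        for other, count in co[item].items():
            if other not in owned:
                scores[other] += count
    return [sku for sku, _ in scores.most_common(k)]

histories = [
    ["X", "Y"],
    ["X", "Y", "Z"],
    ["X", "Z"],
    ["X", "Y"],
    ["W", "Z"],
]
co = build_cooccurrence(histories)
print(recommend(co, ["X"]))  # Y co-occurs with X three times, Z twice
```

Production systems typically normalize these counts (for example, with cosine similarity) so that very popular items don't dominate every list.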

Session-level preference inference is another ML-native capability. Using a shopper's current session behavior to update their preference vector in real time, and then using that updated vector to rerank what they see next, is a sequential (session-based) modeling problem. The inputs (scroll events, hover events, click sequence, time on page) need to be mapped to an output (a reranked product list) in a way that responds to the specific session's pattern. Rules can approximate parts of this ("if the shopper browsed category X, show category X first"), but the approximation gets worse as the pattern space grows.
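The "update the vector, then rerank" loop can be sketched in a few lines. This is a hand-tuned heuristic, not a trained sequence model. It assumes hypothetical 3-dimensional product embeddings and made-up per-event weights, and it nudges the session vector toward each product the shopper engages with using an exponential moving average. A real session model would learn both the embeddings and the update from data.

```python
# Hypothetical 3-dimensional product embeddings (e.g. learned offline).
EMBEDDINGS = {
    "boots":   [0.9, 0.1, 0.0],
    "sandals": [0.1, 0.9, 0.0],
    "socks":   [0.7, 0.2, 0.1],
    "hat":     [0.0, 0.1, 0.9],
}
# Hypothetical signal strength per event type: stronger intent moves the vector further.
EVENT_WEIGHTS = {"hover": 0.2, "click": 0.5, "add_to_cart": 0.8}

def update_preference(pref, event, sku):
    """Move the session preference vector toward the product the shopper engaged with."""
    alpha = EVENT_WEIGHTS[event]
    target = EMBEDDINGS[sku]
    return [(1 - alpha) * p + alpha * t for p, t in zip(pref, target)]

def rerank(pref, candidates):
    """Order candidates by dot-product similarity to the current preference vector."""
    def score(sku):
        return sum(p * e for p, e in zip(pref, EMBEDDINGS[sku]))
    return sorted(candidates, key=score, reverse=True)

pref = [1 / 3, 1 / 3, 1 / 3]  # neutral starting vector for a new session
for event, sku in [("hover", "boots"), ("click", "boots")]:
    pref = update_preference(pref, event, sku)
print(rerank(pref, ["sandals", "socks", "hat"]))  # socks move up after interest in boots
```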

Long-tail product discovery is another genuine ML advantage. A model trained on behavioral co-occurrence data can identify that product Z — which has never been in any merchandising rule because it's not a top seller — converts well for shoppers with a specific preference profile that the team never thought to create a rule for. The model finds the correlation in the data. Rules can only reflect correlations that a human has explicitly identified and encoded.
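As an illustration of the kind of correlation involved, the sketch below computes a simple segment-level conversion lift for one product from impression logs. This is a diagnostic, not a recommendation model: a model learns relationships like this implicitly across every product and profile at once. The event schema, segment names, and counts are hypothetical. A lift well above 1 means the product converts far better for that segment than on average, even if its overall sales are modest.

```python
from collections import defaultdict

def conversion_lift(events, product):
    """events: (segment, sku, converted) impression records.
    Returns each segment's conversion rate for `product` divided by its overall rate."""
    seg = defaultdict(lambda: [0, 0])  # segment -> [conversions, impressions]
    conv_total = imp_total = 0
    for segment, sku, converted in events:
        if sku != product:
            continue
        seg[segment][0] += int(converted)
        seg[segment][1] += 1
        conv_total += int(converted)
        imp_total += 1
    if conv_total == 0:
        return {}
    overall = conv_total / imp_total
    return {s: (c / n) / overall for s, (c, n) in seg.items()}

events = (
    [("outdoor", "Z", True)] * 4 + [("outdoor", "Z", False)] * 6
    + [("other", "Z", True)] * 2 + [("other", "Z", False)] * 88
    + [("other", "bestseller", True)] * 30
)
print(conversion_lift(events, "Z"))  # "outdoor" lift is roughly 6.7, "other" is below 1
```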

The Failure Mode on the ML Side

ML models are opaque by nature, and the opacity creates real operational problems. When a recommendation looks wrong — "why is the model showing this shopper a product they bought three years ago and have no current interest in?" — the answer is in the model's weight matrix, not in a readable rule. For a small team managing a personalization system, the inability to inspect and override model outputs is a genuine friction point.

Models also have warm-up requirements. A freshly deployed model with limited training data produces recommendations that are often indistinguishable from random or from simple bestseller ranking. The model needs enough behavioral signal — typically weeks of production traffic — to learn the patterns that make its recommendations better than the baseline. During that warm-up period, a rules-only fallback produces better outcomes than an undertrained model.

We're not saying you should delay ML adoption until you have a large behavioral data set — the model starts learning from day one and improves continuously. The point is that over-relying on model output before it has sufficient signal will produce a worse experience than a well-designed rules layer. The two need to coexist during the warm-up period, with rules handling more of the ranking responsibility and the model's weight increasing as confidence in its signal depth grows.
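One way to implement that handoff is a linear blend of a rules score and a model score, with the model's share ramping up as signal accumulates. This is a minimal sketch under stated assumptions: sessions observed stands in for "signal depth", and the 50,000-session ramp and 0.9 cap are hypothetical values, not recommendations. The cap keeps some rules influence even after the model is fully trusted.

```python
def model_weight(observed_sessions, full_confidence_at=50_000, max_weight=0.9):
    """Hypothetical ramp: the model's share grows linearly with accumulated
    sessions and is capped so the rules layer always keeps some influence."""
    ramp = min(observed_sessions / full_confidence_at, 1.0)
    return max_weight * ramp

def blended_score(rule_score, model_score, observed_sessions):
    w = model_weight(observed_sessions)
    return (1 - w) * rule_score + w * model_score

def rank(products, observed_sessions):
    """products: {sku: {"rule": score, "model": score}}, both scaled to [0, 1]."""
    return sorted(
        products,
        key=lambda sku: blended_score(products[sku]["rule"],
                                      products[sku]["model"],
                                      observed_sessions),
        reverse=True,
    )

products = {
    "A": {"rule": 1.0, "model": 0.3},  # bestseller the rules layer favors
    "B": {"rule": 0.2, "model": 0.9},  # item the model thinks this shopper wants
}
for sessions in (0, 25_000, 50_000):
    print(sessions, rank(products, sessions))
```

Early on, the bestseller wins. Halfway up the ramp it still narrowly leads. Once the model has enough signal, its pick takes over.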

The Architecture: Rules as Constraints, ML as Scoring

The most functional implementation of rules + ML in a merchandising system follows a layered architecture. The model generates a ranked list of products for each shopper based on the full behavioral signal stack. The rules layer then applies constraints to that ranked list before it's displayed.

This layering looks something like:

Step 1 — Model scoring: Generate a per-shopper affinity-ranked list of products for the current context (collection page, homepage module, recommendation widget). This list reflects the full behavioral signal: historical purchase patterns, current session behavior, category affinity scores, price range preferences.

Step 2 — Hard suppression rules: Remove from the list any products that violate hard constraints: out-of-stock, below margin floor, in a suppressed category, already in cart. These rules fire before any display logic and are non-negotiable — the model's score for a suppressed product is irrelevant.

Step 3 — Pin/boost rules: Apply any active campaign rules: pin a specific product to positions 1-2, boost a collection's newest arrivals into positions 3-6, apply a promotional badge to qualifying items. These override the model's ranking for specific positions while leaving the model's output intact for the remaining positions.

Step 4 — Final display: The remaining ranked list, after suppression and pinning, is what the shopper sees.

This architecture preserves the model's intelligence for the bulk of the grid while giving the merchandising team meaningful, predictable control over the exceptions. The team can write rules without needing to understand the model's internals. The model can optimize for relevance without its output being randomly overridden in ways it can't learn from.

When to Use Rules Alone (And When That's the Right Call)

There are catalog stages where a rules-only approach is still the right decision. A store with fewer than 500 SKUs and under 10,000 sessions per month doesn't have the behavioral data density to train a meaningful affinity model. The model will essentially replicate what a bestseller sort with some category filtering would produce. In that case, a well-designed rules layer with careful position management produces equivalent results with less infrastructure complexity and more operator control.

There are also specific placement contexts where rules alone are appropriate regardless of catalog size. The "New Arrivals" section of a homepage, by definition, should show new arrivals in reverse chronological order — a model trained on historical behavioral data will deprioritize new products because they have no behavioral history, which is exactly backwards for this specific placement. Rule: sort by launch date descending, filter to last 60 days.
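That rule is simple enough to express directly. The sketch below assumes products carry a hypothetical `launch_date` field. It also excludes products whose launch date is still in the future, which is a design choice rather than something the rule above specifies.

```python
from datetime import date, timedelta

def new_arrivals(products, today, window_days=60):
    """Rules-only placement: launches from the last `window_days`, newest first."""
    cutoff = today - timedelta(days=window_days)
    recent = [p for p in products if cutoff <= p["launch_date"] <= today]
    return [p["sku"] for p in sorted(recent, key=lambda p: p["launch_date"], reverse=True)]

products = [
    {"sku": "jacket", "launch_date": date(2024, 5, 1)},
    {"sku": "scarf",  "launch_date": date(2024, 3, 1)},   # older than 60 days
    {"sku": "beanie", "launch_date": date(2024, 5, 20)},
]
print(new_arrivals(products, today=date(2024, 6, 1)))
```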

The signal that rules are no longer sufficient is consistent and measurable: overall grid CTR falls even as traffic grows, clicks concentrate in the top six positions while positions 7 and beyond collect almost none, and per-segment analysis shows that one shopper segment converts well on your static sort while others bounce at elevated rates. Together, these indicators mean your grid is optimized for the average shopper and is actively underperforming for everyone else.
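Two of these indicators can be computed from basic event logs. This is a minimal sketch that assumes a hypothetical log format: a list of clicked grid positions, and `(segment, bounced)` pairs per session. The segment names and numbers are illustrative only.

```python
from collections import defaultdict

def click_share_by_band(click_positions, top_n=6):
    """Fraction of grid clicks landing in positions 1..top_n vs. top_n+1 and beyond."""
    total = len(click_positions)
    if total == 0:
        return {"top": 0.0, "rest": 0.0}
    top = sum(1 for pos in click_positions if pos <= top_n)
    return {"top": top / total, "rest": (total - top) / total}

def bounce_rate_by_segment(sessions):
    """sessions: iterable of (segment, bounced) pairs -> bounce rate per segment."""
    counts = defaultdict(lambda: [0, 0])  # segment -> [bounces, sessions]
    for segment, bounced in sessions:
        counts[segment][0] += int(bounced)
        counts[segment][1] += 1
    return {seg: b / n for seg, (b, n) in counts.items()}

clicks = [1, 1, 2, 3, 4, 5, 6, 2, 1, 9]
sessions = ([("returning", False)] * 8 + [("returning", True)] * 2
            + [("new", True)] * 6 + [("new", False)] * 4)
print(click_share_by_band(clicks))        # 90% of clicks land in the top six positions
print(bounce_rate_by_segment(sessions))   # one segment bounces far more than the other
```

Tracked over time, a rising top-band share alongside a widening gap between segment bounce rates is the pattern described above.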

Decision Criteria: A Practical Heuristic

Use rules when: the decision has a clear correct answer that your team can specify explicitly, the cost of a wrong output is high enough to require determinism (brand campaigns, compliance constraints), or the speed of deployment is more important than optimization (reactive merchandising responses).

Use ML when: the correct answer varies by shopper, the pattern space is too large to enumerate manually, or you need the recommendation to update in response to within-session behavior rather than just catalog-level attributes.

Use both when: you need deterministic business constraints respected unconditionally while also wanting the best possible relevance ranking for everything the constraints don't govern. Which, for most DTC brands at catalog scale, is most of the time.

The mistake to avoid in either direction: running a rules-only system when your catalog and audience diversity have grown beyond what rules can serve, or running a model-only system that has no mechanism for your team to guarantee specific outcomes when the business requires it. Both failures are real and common; the architecture that avoids both is the one that uses each approach for what it's actually good at.