Metica Technical Series

Beyond A/B Testing: How Contextual Bandits Underpin Personalization, And How to Design Them

By

Andrew Maher

ML engineering

Feb 5, 2025

From Familiarity to Innovation: It’s Time to Rethink Player Personalization

We don’t know what we don’t know—and sometimes, what we don’t know can be the difference between success and failure.

Right now, one of the most critical decisions facing game studios is whether to stick with the familiarity of A/B testing or embrace the evolving power of Contextual Multi-Armed Bandits (contextual bandits). The challenge? Many still aren’t sure what contextual bandits are or how they can transform personalization and drive growth.

Personalization is poised to become one of the biggest buzzwords of 2025, and for good reason. At Metica, we believe it isn’t just a trend: players increasingly demand it, and it is essential for success. Developing a personalization strategy is no longer optional.

But where do you even begin? That’s where contextual bandits come in, and in this post, we’ll tell you how to think about them.

Traditionally, the gaming industry has relied heavily on A/B testing. It’s familiar, straightforward, and effective in many cases. But now, there’s a more dynamic approach to personalizing player journeys—one that adapts in real time.

Enter Contextual Bandits, an AI-driven, game-changing solution that tailors decisions to individual players based on their unique behaviors. Think of contextual bandits as your game’s secret weapon for scaling personalization, delivering experiences that keep players engaged and coming back for more. 

Our first post on this topic drew so much interest that we wrote a follow-up, both to explain further why contextual bandits are so powerful and to guide you through setting one up for your game. We have also created a short video you can share with your teams.


What’s the Difference? A/B Testing vs. Contextual Bandits

A/B testing is the reliable old faithful of optimization. It compares two or more options (say, different contents for a live-op) and identifies the one that works best overall. The winning option is then rolled out to everyone. While effective, its limitations are clear: a one-size-fits-all solution doesn’t actually fit all players, and A/B tests can be slow and cumbersome to converge.

Contextual Bandits, on the other hand, adapt dynamically, matching the right option to the right player at the right time. This creates many "winners" tailored to individual preferences. For example:

  • A/B Testing: Chooses one universal live-op for all players.

  • Contextual Bandits: Delivers a live-op tailored to players with a low gem balance while offering a more challenging option to experienced players with high gem balances.

With Contextual Bandits, you move beyond optimizing for the mythical “average” player, instead tailoring experiences to each individual, which drives engagement, retention, and revenue.
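
To make the difference concrete, here is a minimal Python sketch. It is an offline illustration on made-up logs, not a real bandit (which would learn online while exploring), and the segment names, variant names, and rewards are all assumptions for the example. It contrasts the single A/B-style winner with the per-segment winners a contextual approach would aim for.

```python
from collections import defaultdict

# Hypothetical logged outcomes: (gem_balance_segment, live_op_variant, reward).
# All names and numbers are invented purely for illustration.
LOGS = [
    ("low_gems", "gem_gift_event", 1.0),
    ("low_gems", "gem_gift_event", 1.0),
    ("low_gems", "hard_challenge", 0.0),
    ("low_gems", "hard_challenge", 0.0),
    ("high_gems", "gem_gift_event", 0.0),
    ("high_gems", "gem_gift_event", 1.0),
    ("high_gems", "hard_challenge", 1.0),
    ("high_gems", "hard_challenge", 1.0),
]


def mean_reward_by(logs, key):
    """Average reward, grouped by key(segment, variant)."""
    totals, counts = defaultdict(float), defaultdict(int)
    for segment, variant, reward in logs:
        group = key(segment, variant)
        totals[group] += reward
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}


def ab_winner(logs):
    """A/B-style result: the one variant with the best overall average."""
    means = mean_reward_by(logs, lambda segment, variant: variant)
    return max(means, key=means.get)


def per_segment_winners(logs):
    """Contextual-style result: the best variant within each segment."""
    means = mean_reward_by(logs, lambda segment, variant: (segment, variant))
    best = {}
    for (segment, variant), mean in means.items():
        if segment not in best or mean > means[(segment, best[segment])]:
            best[segment] = variant
    return best
```

On this toy data, the overall winner is the gem gift event, yet players with high gem balances respond better to the harder challenge, which is exactly the preference a single rollout would miss.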


Common Examples of Misusing A/B Testing

We get it—moving from a familiar method to something new can feel daunting. But we’ve seen countless games stick with A/B testing when Contextual Bandits would provide more useful results. Here are a few examples:

  1. Ad Frequency: Determining how often ads should be shown isn’t an A/B testing question. Players respond to ads in vastly different ways: some tolerate frequent ads, others prefer fewer. A middle-ground frequency shouldn’t be imposed on all players.

  2. Game Difficulty: Picking the challenge level of different parts of a game isn’t suited for A/B testing either. People enter games with different experience and skill levels, and they have varied reasons for playing a specific game. Enforcing a uniform difficulty level doesn’t account for these differences.

  3. Dynamic Offers: Determining the right offer to show a player (its contents, as well as its price) should involve Contextual Bandits. Giving a bandit recent gameplay and spending patterns allows it to determine the best offer for each player instead of testing arbitrary numbers.


Setting Up Contextual Bandits in Your Game

Curious about implementing Contextual Bandits? Here’s how to get started:

  1. Identify Key Decision Points
    Pinpoint where Contextual Bandits could make the biggest impact, such as:

    • In-app purchases (IAPs): Optimize pricing and bundles based on player profiles.

    • Ad placements: Balance revenue generation with retention.

    • Gameplay mechanics: Deliver bespoke experiences to players that keep them engaged.

    • LiveOps Events: Tailor rewards or entry criteria to match individual play styles.

      Sense Check: Ensure you have enough data to fuel the bandit’s learning—ideally at least 1,000 daily active players or 100 meaningful interactions per variant over a week.
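
The sense check for this step can be written down as a small helper. This is only a sketch: the function name and argument shapes are assumptions, and the thresholds are simply the rules of thumb quoted above, not hard limits.

```python
def has_enough_data(daily_active_players, weekly_interactions_per_variant,
                    min_dau=1_000, min_interactions=100):
    """Return True if either rule of thumb from the sense check is met.

    `weekly_interactions_per_variant` maps a variant name to the number of
    meaningful interactions it received over the last week (hypothetical shape).
    """
    enough_players = daily_active_players >= min_dau
    # Every variant must reach the minimum; an empty mapping never qualifies.
    enough_interactions = bool(weekly_interactions_per_variant) and all(
        count >= min_interactions
        for count in weekly_interactions_per_variant.values()
    )
    return enough_players or enough_interactions
```

For example, a decision point with 200 daily players would pass if every variant collected at least 100 interactions in the week, and fail if any one variant fell short.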


  2. Design Bold, Diverse Variants
    Forget safe, incremental changes. Contextual Bandits thrive on diversity. Instead of testing $4.95, $5, and $5.05, go for $1, $5, and $10.

    • Broader options uncover hidden preferences and maximize engagement.

    • Avoid redundancy; overlapping options waste time and limit insights.

      Sense Check: The bandit will pick freely from the variants you’ve specified; make sure you’re happy that any of them can be shown to any player. If not, consider redesigning the underlying optimization for a more desirable player experience.
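
One way to catch overlapping price variants before launch is a quick spacing check. The sketch below is an assumption-laden example: the helper name is hypothetical and the 20% minimum gap is an arbitrary illustrative threshold, not a recommended value.

```python
from itertools import combinations


def near_duplicate_pairs(prices, min_relative_gap=0.2):
    """Return pairs of (positive) prices that sit too close together.

    Two prices are flagged when the gap between them is less than
    `min_relative_gap` of the larger price.
    """
    flagged = []
    for lower, higher in combinations(sorted(prices), 2):
        if (higher - lower) / higher < min_relative_gap:
            flagged.append((lower, higher))
    return flagged
```

With this threshold, the $4.95, $5, and $5.05 set is flagged on every pair, while $1, $5, and $10 passes cleanly.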


  3. Choose the Player Context Wisely
    Carefully design the player context—the data points the bandit uses to make decisions—so the bandit can best learn player preferences:

    • Avoid overly specific fields (like install timestamp, player ID—or even UA campaign ID) because they don’t allow the bandit to generalize between players.

    • Make sure each field contributes new information; don’t introduce redundant data points (like gross and net player spend simultaneously).

    • Numeric fields (like player level or hard-currency balance) are often easier for the bandit to handle.

    • Consider the context availability. Only a few fields (like UA source) are really available for a first-install decision point. Later-game events can be supported by richer signals.

      Sense Check: Match the number of context fields to the traffic volume of the decision point. There are no clear-cut rules, but if a million players enter the decision point every day, there’s an opportunity cost to optimizing against only a player’s level. Conversely, if only ten players a day enter the decision point, a context vector containing one hundred fields will only add noise to the system.
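
A simple way to spot overly specific fields is to measure how often their values repeat across players. The sketch below assumes player records arrive as dictionaries; the field names, the sample data, and the 0.5 uniqueness threshold are all hypothetical choices for illustration.

```python
def overly_specific_fields(players, max_unique_ratio=0.5):
    """Flag string-valued fields whose values are nearly unique per player.

    A field like a player ID has a different value for almost every player,
    so a bandit cannot use it to generalize between players.
    """
    flagged = []
    fields = sorted({field for player in players for field in player})
    for field in fields:
        values = [player[field] for player in players if field in player]
        if not values or not all(isinstance(value, str) for value in values):
            continue  # numeric fields are skipped by this check
        if len(set(values)) / len(values) > max_unique_ratio:
            flagged.append(field)
    return flagged


# Hypothetical player records.
PLAYERS = [
    {"player_id": "p1", "ua_source": "organic", "player_level": 3},
    {"player_id": "p2", "ua_source": "organic", "player_level": 12},
    {"player_id": "p3", "ua_source": "paid_social", "player_level": 7},
    {"player_id": "p4", "ua_source": "organic", "player_level": 7},
]
```

Here only the player ID is flagged: UA source repeats across players, so it can still help the bandit generalize.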


  4. Pick the Right Success Metric
    What does success look like for your bandit? Align metrics with decision points:

    • IAP revenue for offer optimization.

    • Retention metrics for gameplay adjustments.

    • Engagement rates for LiveOps events.

      Sense Check: Are you measuring what truly matters? For example, if you’re optimizing ad frequency, don’t just track total revenue; include session length to ensure you’re not sacrificing engagement for short-term gains. Similarly, if your goal is IAP revenue, focus on revenue per player within a defined timeframe, like three days, to capture immediate impact.
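
As an example of a windowed metric, the sketch below computes one player’s revenue in the three days after a decision. The function name, the (timestamp, amount) shape of the purchase history, and the use of integer cents are assumptions made for this example.

```python
from datetime import datetime, timedelta


def revenue_within_window(decision_time, purchases, window=timedelta(days=3)):
    """Sum purchase amounts (in cents) made within `window` of the decision.

    `purchases` is an iterable of (timestamp, amount_cents) pairs. Purchases
    before the decision, or at or after the end of the window, are ignored.
    """
    window_end = decision_time + window
    return sum(
        amount_cents
        for timestamp, amount_cents in purchases
        if decision_time <= timestamp < window_end
    )


# Hypothetical purchase history for one player.
decision = datetime(2025, 2, 5, 12, 0)
purchases = [
    (datetime(2025, 2, 4, 18, 0), 199),  # before the decision: ignored
    (datetime(2025, 2, 5, 13, 0), 499),  # inside the window
    (datetime(2025, 2, 7, 9, 0), 99),    # inside the window
    (datetime(2025, 2, 9, 12, 0), 999),  # after the window: ignored
]
reward = revenue_within_window(decision, purchases)
```

Fixing the window up front means every player is scored over the same horizon, so early and late arrivals are compared fairly.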



Overcoming Common Challenges

Implementing Contextual Bandits can seem daunting. But many of the obstacles you might encounter are manageable with the right approach:

  1. Low Traffic or Conversions: Start with fewer variants if player volume is low. Scale up as data availability increases.

  2. Changing Player Behavior: For sudden shifts (e.g., holiday surges), reset the optimization—or update variants to keep it relevant.

  3. Multiple Concurrent Bandits: Don’t worry (too much) about running multiple bandits in parallel. Each bandit assigns its variants separately, so the effects of the other optimizations are spread roughly evenly across its variants and tend to average out.
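
To show where a reset and a small variant list fit in, here is a toy bandit in Python. It uses epsilon-greedy selection over discrete context buckets, a deliberately simple technique chosen for illustration; it is not a description of how Metica’s bandits work, and the class and method names are hypothetical.

```python
import random
from collections import defaultdict


class EpsilonGreedyBandit:
    """Toy contextual bandit: one running mean per (context, variant) pair."""

    def __init__(self, variants, epsilon=0.1, seed=None):
        self.variants = list(variants)  # start small when traffic is low
        self.epsilon = epsilon          # share of decisions spent exploring
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        """Forget everything learned, e.g. after a sudden behavior shift."""
        self.counts = defaultdict(int)
        self.means = defaultdict(float)

    def choose(self, context):
        """Explore at random with probability epsilon, otherwise exploit."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.variants)
        return max(self.variants, key=lambda v: self.means[(context, v)])

    def update(self, context, variant, reward):
        """Fold one observed reward into the running mean."""
        key = (context, variant)
        self.counts[key] += 1
        self.means[key] += (reward - self.means[key]) / self.counts[key]
```

Calling reset() after a holiday surge discards stale estimates, and passing a longer variants list later is how you would scale up once data allows.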


The Future of Personalization in Gaming

The gaming industry has only begun to explore the potential of Contextual Bandits. While A/B testing will always have its place, the future lies in real-time, data-driven personalization that delivers tailored experiences for every player.

At Metica, we believe Contextual Bandits are more than a tool—they represent a mindset for the next era of gaming. Ready to unlock your game’s potential and elevate player engagement? Let’s talk.

Want to find out more?

Book a free 15-minute call with one of our experts