Mutually exclusive experiments

This topic describes how to use mutually exclusive experiments to prevent interaction effects that could invalidate your results. Use exclusion groups to ensure that users are not exposed to more than one experiment that tests the same feature.

How exclusion groups work

For experiments that are not mutually exclusive, Optimizely buckets users with a seed, or unique value, that is different for each experiment. The seed determines whether a user enters a particular experiment. Because the seeds are random and unique to each experiment, bucketing decisions are independent across experiments, and some users enter more than one experiment. For example, imagine two experiments, A and B, each with a 20% traffic allocation (the percentage of total traffic that is eligible for the experiment). Because the two assignments are independent, the shares multiply (for example, 20% × 20% = 4% of traffic lands in both). The expected traffic allocation is:

16% of traffic falls in experiment A only
16% of traffic falls in experiment B only
4% of traffic falls in both experiment A and experiment B
64% of traffic is not in any experiment
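The split above follows from independent bucketing, and a short simulation reproduces it. This is a minimal sketch under stated assumptions: SHA-256 stands in for whatever hash Optimizely actually uses, the 10,000-bucket space is assumed, and the seeds `seed-A` and `seed-B` and the helpers `bucket`, `in_experiment`, and `overlap_shares` are hypothetical names chosen for illustration.

```python
import hashlib

NUM_BUCKETS = 10_000  # assumption: size of the bucketing space


def bucket(seed: str, user_id: str) -> int:
    """Map a user to a bucket in [0, NUM_BUCKETS) for one seed (SHA-256 is a stand-in hash)."""
    digest = hashlib.sha256(f"{seed}:{user_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_BUCKETS


def in_experiment(seed: str, user_id: str, allocation: float) -> bool:
    """A user enters the experiment when their bucket falls below the allocation cutoff."""
    return bucket(seed, user_id) < allocation * NUM_BUCKETS


def overlap_shares(num_users: int = 100_000, allocation: float = 0.20) -> dict:
    """Share of simulated users in A only, B only, both, or neither, using separate seeds."""
    counts = {"A only": 0, "B only": 0, "A and B": 0, "neither": 0}
    for i in range(num_users):
        user = f"user-{i}"
        in_a = in_experiment("seed-A", user, allocation)  # hypothetical seed for A
        in_b = in_experiment("seed-B", user, allocation)  # hypothetical seed for B
        if in_a and in_b:
            counts["A and B"] += 1
        elif in_a:
            counts["A only"] += 1
        elif in_b:
            counts["B only"] += 1
        else:
            counts["neither"] += 1
    return {label: n / num_users for label, n in counts.items()}


if __name__ == "__main__":
    for label, share in overlap_shares().items():
        print(f"{label}: {share:.1%}")
```

With 100,000 simulated users, the printed shares land close to 16%, 16%, 4%, and 64%.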

In the example above, the results of experiments A and B may be skewed: if users who see both A and B behave differently from users who see only A or only B, the overlap distorts the results of both experiments. This is called an interaction effect.
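A small worked calculation shows how an interaction effect can shift what an experiment reports. The effect sizes below are hypothetical, chosen only for illustration: A's treatment is assumed to lift conversion by 2 points on its own but to lower it by 1 point for users who also receive B's treatment, and B is assumed to split its users evenly between control and treatment.

```python
# Hypothetical effect sizes, chosen only to illustrate an interaction effect.
A_LIFT_ALONE = 0.02     # A's lift for users who do not get B's treatment
A_LIFT_WITH_B = -0.01   # A's lift for users who also get B's treatment


def measured_a_lift(share_in_b_treatment: float) -> float:
    """Expected lift A reports when a share of its users also receive B's treatment."""
    return (1 - share_in_b_treatment) * A_LIFT_ALONE + share_in_b_treatment * A_LIFT_WITH_B


# With 20% allocation each and independent bucketing, 20% of A's users are also in B;
# if B splits them evenly, 10% of A's users get B's treatment.
overlapping = measured_a_lift(0.20 * 0.5)
exclusive = measured_a_lift(0.0)
print(f"A's measured lift with overlap: {overlapping:+.2%}")
print(f"A's measured lift without overlap: {exclusive:+.2%}")
```

Under these assumptions, A reports a lift of about 1.7 points instead of the 2 points it would show with no overlap, even though nothing about A itself changed.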

If experiments A and B are mutually exclusive, Optimizely uses a single random seed, unique to the exclusion group, to bucket users for both experiments and assigns each bucketed user to either A or B, never both. This ensures that the experiments cannot overlap for the same users. With the same 20% allocations, the traffic allocation looks something like this:

20% of traffic falls in experiment A only
20% of traffic falls in experiment B only
60% of traffic is not in any experiment
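One way to picture a shared group seed is a single bucket per user, with each experiment owning a disjoint range of buckets. This is a minimal sketch, not Optimizely's implementation: SHA-256 is a stand-in hash, and the 10,000-bucket space, the seed `exclusion-group-1`, the `RANGES` layout, and the helpers `group_bucket`, `assign`, and `exclusive_shares` are all hypothetical.

```python
import hashlib
from typing import Optional

NUM_BUCKETS = 10_000              # assumption: size of the bucketing space
GROUP_SEED = "exclusion-group-1"  # hypothetical seed shared by the whole group

# Each experiment in the group owns a disjoint range of buckets.
RANGES = {
    "A": range(0, 2_000),      # 20% of buckets
    "B": range(2_000, 4_000),  # the next 20% of buckets
}


def group_bucket(user_id: str) -> int:
    """Bucket a user once for the whole group (SHA-256 is a stand-in hash)."""
    digest = hashlib.sha256(f"{GROUP_SEED}:{user_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_BUCKETS


def assign(user_id: str) -> Optional[str]:
    """Return the single experiment whose range contains the user's bucket, if any."""
    b = group_bucket(user_id)
    for name, bucket_range in RANGES.items():
        if b in bucket_range:
            return name
    return None


def exclusive_shares(num_users: int = 100_000) -> dict:
    """Share of simulated users assigned to A, to B, or to neither."""
    counts = {"A": 0, "B": 0, "none": 0}
    for i in range(num_users):
        counts[assign(f"user-{i}") or "none"] += 1
    return {label: n / num_users for label, n in counts.items()}


if __name__ == "__main__":
    print(exclusive_shares())
```

Because each user gets exactly one bucket and the ranges do not intersect, no user can land in both A and B; the shares come out near 20%, 20%, and 60%.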

Optimizely also keeps experiments in an exclusion group mutually exclusive when they run at different times. It does this by assigning bucket ranges to new experiments using a stratified sample of the available buckets, where the strata consist of all current and previously allocated bucket ranges.
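The stratified allocation described above can be sketched as a toy exclusion group that draws a new experiment's buckets from every stratum of free buckets in proportion to its size. This is a simplified illustration under stated assumptions, not Optimizely's algorithm: the 10,000-bucket space, the `ExclusionGroup` class, its `start`/`stop` methods, and the "never" stratum for never-allocated buckets are all hypothetical.

```python
import random
from collections import defaultdict


class ExclusionGroup:
    """Toy exclusion group that allocates buckets by stratified sampling.

    Strata are the buckets last held by each stopped experiment plus the
    never-allocated buckets. Simplified illustration only.
    """

    def __init__(self, num_buckets: int = 10_000, seed: int = 0):
        self.num_buckets = num_buckets
        self.current = [None] * num_buckets     # experiment running in each bucket
        self.history = ["never"] * num_buckets  # last experiment that held each bucket
        self.rng = random.Random(seed)

    def start(self, name: str, fraction: float) -> dict:
        """Give `name` a share of free buckets drawn proportionally from every stratum."""
        wanted = round(fraction * self.num_buckets)
        strata = defaultdict(list)
        for b in range(self.num_buckets):
            if self.current[b] is None:  # running experiments keep their buckets
                strata[self.history[b]].append(b)
        free = sum(len(bs) for bs in strata.values())
        if wanted > free:
            raise ValueError("not enough free traffic left in the exclusion group")
        # Largest-remainder rounding keeps the per-stratum counts summing to `wanted`.
        takes = {s: wanted * len(bs) // free for s, bs in strata.items()}
        remainders = {s: wanted * len(bs) % free for s, bs in strata.items()}
        leftover = wanted - sum(takes.values())
        for s in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
            takes[s] += 1
        for s, bs in strata.items():
            for b in self.rng.sample(bs, takes[s]):
                self.current[b] = name
        return dict(takes)

    def stop(self, name: str) -> None:
        """Free an experiment's buckets and remember who last held them."""
        for b in range(self.num_buckets):
            if self.current[b] == name:
                self.current[b] = None
                self.history[b] = name


group = ExclusionGroup()
group.start("A", 0.30)
group.start("B", 0.30)
group.stop("A")
print(group.start("C", 0.20))  # C draws from A's old buckets and never-used buckets
```

In this run, C takes 857 buckets from A's former range and 1,143 never-used buckets, matching the 3,000 to 4,000 ratio of those free strata, while B's running buckets are never touched.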

Best practices

To guard against any possibility of interaction effects, you might consider making all of your experiments mutually exclusive. However, making every experiment in a project mutually exclusive can require more traffic than you have, because each experiment in an exclusion group receives only a share of the group's traffic. As a best practice, let some experiments overlap and make others mutually exclusive, depending on the traffic you need to reach statistical significance and which parts of your codebase you are testing.
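The traffic trade-off can be made concrete with simple arithmetic. All inputs below are hypothetical: each experiment is assumed to need 20,000 visitors to reach significance, 5,000 visitors per day are assumed to be eligible, and the helper `days_to_reach_sample` is an illustrative name. Overlapping experiments can each see all eligible traffic, while four mutually exclusive experiments split it four ways.

```python
import math


def days_to_reach_sample(required_visitors: int, daily_visitors: int, traffic_share: float) -> int:
    """Days until an experiment receiving `traffic_share` of daily traffic reaches its sample."""
    return math.ceil(required_visitors / (daily_visitors * traffic_share))


# Hypothetical inputs, chosen only for illustration.
REQUIRED = 20_000    # visitors each experiment needs (assumed)
DAILY = 5_000        # eligible visitors per day (assumed)
NUM_EXPERIMENTS = 4

overlapping = days_to_reach_sample(REQUIRED, DAILY, 1.0)                  # each sees all traffic
exclusive = days_to_reach_sample(REQUIRED, DAILY, 1 / NUM_EXPERIMENTS)    # traffic split four ways
print(f"overlapping: {overlapping} days, mutually exclusive: {exclusive} days")
```

Under these assumptions, the experiments finish in 4 days when they overlap and in 16 days when they are mutually exclusive.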

You are more likely to see interaction effects if:

You are running two experiments on the same area of an application.
You are running two experiments on the same flow, where strong overlap between their audiences is likely.
You are running an experiment that may have a significant impact on a conversion metric that is shared with other experiments.

If these points do not apply, you usually do not need to create mutually exclusive experiments. Because bucketing is independent across experiments, each variation of one experiment is exposed to the other experiment in roughly the same proportion.
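Proportional exposure can be checked with a simulation: among users in A's control and among users in A's treatment, the share who also receive B's treatment should be about the same. This is a minimal sketch under stated assumptions: SHA-256 is a stand-in hash, each experiment uses its own seed with a 20% allocation split evenly between a control and a treatment, and `unit_hash`, `variation`, and `b_treatment_share_by_a_variation` are hypothetical helpers.

```python
import hashlib
from typing import Optional


def unit_hash(seed: str, user_id: str) -> float:
    """Deterministic number in [0, 1) per seed and user (SHA-256 stand-in, 53-bit precision)."""
    digest = hashlib.sha256(f"{seed}:{user_id}".encode()).digest()
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def variation(experiment: str, user_id: str, allocation: float = 0.20) -> Optional[str]:
    """Bucket with the experiment's own seed: lower half of the allocation is control."""
    h = unit_hash(experiment, user_id)
    if h < allocation / 2:
        return "control"
    if h < allocation:
        return "treatment"
    return None


def b_treatment_share_by_a_variation(num_users: int = 200_000) -> dict:
    """For each variation of A, the share of its users who also get B's treatment."""
    totals = {"control": 0, "treatment": 0}
    with_b = {"control": 0, "treatment": 0}
    for i in range(num_users):
        user = f"user-{i}"
        a = variation("A", user)
        if a is None:
            continue
        totals[a] += 1
        if variation("B", user) == "treatment":
            with_b[a] += 1
    return {v: with_b[v] / totals[v] for v in totals}


if __name__ == "__main__":
    print(b_treatment_share_by_a_variation())
```

Both shares come out near 10% (a 20% chance of entering B times a 50% chance of its treatment), so B's influence lands on A's control and treatment in roughly equal measure.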

However, in some scenarios we recommend creating mutually exclusive experiments or running experiments sequentially (waiting for one to end before starting the next). Even when experiments are mutually exclusive, interaction effects can still arise between experiments that run at different times: after you have experimented on a population of users, you can never get a truly unbiased population for future experiments. If you are concerned about interaction effects between experiments that run at different times, finish all experiments in an exclusion group before creating new ones.

For example, suppose you create an exclusion group with four experiments (A, B, C, and D), each running at 25% traffic allocation. If you stop experiment D and start experiment E, E's results could be biased, because every user in E was previously exposed to experiment D. To make sure E's traffic is sampled evenly across all previous experiments, wait for A, B, and C to finish before starting E.
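The two timings in this example can be compared by splitting E's buckets across the free strata in proportion to their size. This is a hypothetical sketch, not Optimizely's implementation: it assumes a 10,000-bucket group in which A, B, C, and D each held 2,500 buckets, and `stratified_take` is an illustrative helper using largest-remainder rounding.

```python
def stratified_take(free_by_stratum: dict, wanted: int) -> dict:
    """Split `wanted` buckets across strata in proportion to each stratum's free buckets."""
    free = sum(free_by_stratum.values())
    if wanted > free:
        raise ValueError("not enough free buckets")
    takes = {s: wanted * n // free for s, n in free_by_stratum.items()}
    remainders = {s: wanted * n % free for s, n in free_by_stratum.items()}
    leftover = wanted - sum(takes.values())
    for s in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        takes[s] += 1  # largest-remainder rounding so the total equals `wanted`
    return takes


# Hypothetical 10,000-bucket group; A, B, C, and D each held 2,500 buckets (25%).
# Scenario 1: only D has stopped, so D's old range is the only free stratum.
early = stratified_take({"D": 2_500}, wanted=2_500)
# Scenario 2: A, B, C, and D have all finished, so every old range is free.
later = stratified_take({"A": 2_500, "B": 2_500, "C": 2_500, "D": 2_500}, wanted=2_500)
print(early)  # every bucket E receives previously belonged to D
print(later)  # E's buckets are spread evenly across A, B, C, and D
```

Starting E early gives it only D's former users, while waiting gives it 625 buckets from each earlier experiment's range.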

When making important business decisions, evaluate your risk tolerance for experiment overlap, and review your prioritized roadmap so that your variation designs, goals, and execution schedule best meet your business needs.