How do you run an online, automated, randomised A/B test (controlled experiment) with Mixpanel? What are the steps to set it up, and what considerations/guardrails apply? kapa.ai
Running A/B Tests (Controlled Experiments) with Mixpanel
Here's a comprehensive guide to setting up and running randomized A/B tests using Mixpanel Experiments.
Prerequisites
Before you begin:
You need an Enterprise Plan (Experiments is a separately priced add-on; non-Enterprise users get up to 3 experiments per project) [Experiments overview]
Implement exposure event tracking: your app must fire $experiment_started events [Experiments overview]
Baseline metrics must already be tracked in Mixpanel [Experiments overview]
Step-by-Step Setup
1. Plan Your Experiment
Before touching any configuration:
Write a clear hypothesis: "If [change], then [impact], because [reason]."
Define your primary metric (e.g., conversion rate), guardrail metrics (e.g., churn), and any secondary metrics
Estimate your required sample size and test duration
Define metrics before launch; adding them later biases results [Drive Product Innovation]
2. Implement Exposure Event Tracking
Your app must send an exposure event only when a user actually sees the variant (not just when they're assigned):
```javascript
mixpanel.track('$experiment_started', {
  'Experiment name': 'Test',
  'Variant name': 'v1'
})
```

Key rules:
Send the event only once per user, the first time they're exposed
Send it only when the user reaches the relevant experience (e.g., the payment page), not at session start
If a user is in multiple experiments, send a separate event for each [Experiments under the hood]
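The once-per-user rule above can be sketched with a small client-side guard. This is illustrative only: `trackFn` stands in for `mixpanel.track`, and the in-memory `Set` would need to be persisted (e.g., to localStorage) in a real app so the guard survives page reloads.

```javascript
// Sketch: fire $experiment_started at most once per user per experiment.
// The in-memory Set is for illustration; persist it in production.
const seenExposures = new Set();

function trackExposureOnce(trackFn, userId, experimentName, variantName) {
  const key = `${userId}:${experimentName}`;
  if (seenExposures.has(key)) return false; // already exposed, do nothing
  seenExposures.add(key);
  trackFn('$experiment_started', {
    'Experiment name': experimentName,
    'Variant name': variantName,
  });
  return true;
}
```

Call this at the moment the user actually reaches the relevant screen (e.g., the payment page), never at session start.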
3. (Optional) Use Feature Flags for Variant Delivery
If using Mixpanel's native Feature Flags (Enterprise only), initialize the JS SDK with flags enabled:
```javascript
mixpanel.init("YOUR_PROJECT_TOKEN", {
  debug: true,
  flags: true,
});
```

This automatically fetches active flags and logs the $experiment_started exposure event when a variant is shown; no manual tagging needed. [Ship Features with Flags]
4. Create the Experiment Report in Mixpanel
Go to the Experiments tab → New Experiment and configure:
| Step | Action |
| --- | --- |
| Select experiment | Choose from auto-detected experiments (last 30 days) or hard-code the name |
| Choose control variant | All other variants are compared against this |
| Choose success metrics | Primary, secondary, and guardrail metrics |
| Set test duration | Enter sample size or minimum number of days |
| Confirm model | Default is Sequential (95% confidence) |
5. Choose the Right Statistical Model
Frequentist: Best for small expected lifts (< 2%). Wait until full sample size is reached before reading results.
Sequential: Best for large, obvious changes (10%+). Allows monitoring results as data comes in and stopping early. [Drive Product Innovation]
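If you go the frequentist route, you need the full sample size up front. A rough per-variant estimate for a two-proportion test (95% confidence, 80% power) can be sketched as below; this is textbook back-of-envelope math, not Mixpanel's internal calculation.

```javascript
// Rough per-variant sample size for detecting an absolute lift in a
// conversion rate (two-sided 95% confidence, 80% power).
// Illustrative planning math only, not Mixpanel's implementation.
function requiredSampleSize(baselineRate, minDetectableLift) {
  const zAlpha = 1.96; // two-sided 95% confidence
  const zBeta = 0.84;  // 80% power
  const p1 = baselineRate;
  const p2 = baselineRate + minDetectableLift;
  const pBar = (p1 + p2) / 2;
  const numerator = Math.pow(zAlpha + zBeta, 2) * 2 * pBar * (1 - pBar);
  return Math.ceil(numerator / Math.pow(p2 - p1, 2));
}
```

For a 10% baseline conversion and a 2-percentage-point minimum detectable lift, this lands near 3,800 users per variant; halving the detectable lift roughly quadruples the requirement.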
Guardrails & Health Checks
Mixpanel provides several built-in safeguards:
Automatic Health Checks
| Check | What It Does | Action if It Fails |
| --- | --- | --- |
| SRM (Sample Ratio Mismatch) | Detects if your variant split doesn't match the configured allocation (using a Chi-squared test) | Pause the experiment, fix the root cause (e.g., bucketing bugs), restart |
| Retro-AA Analysis | Checks if variant groups were already different before the experiment started | Enable CUPED to correct for pre-existing bias; review assignment logic |
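As a rough illustration of what the SRM check computes (not Mixpanel's actual code): the Chi-squared statistic compares observed variant counts against the configured split, and for a two-variant 50/50 test a statistic above about 3.84 (1 degree of freedom) corresponds to p < 0.05.

```javascript
// Sketch of an SRM check: Chi-squared statistic for observed variant
// counts vs. the configured allocation shares. Illustrative only.
function chiSquaredStat(observedCounts, expectedShares) {
  const total = observedCounts.reduce((a, b) => a + b, 0);
  return observedCounts.reduce((stat, observed, i) => {
    const expected = total * expectedShares[i];
    return stat + Math.pow(observed - expected, 2) / expected;
  }, 0);
}
```

For example, a 4,800 / 5,200 split configured as 50/50 yields a statistic of 16, well above the 3.84 threshold, signalling a likely bucketing bug.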
Advanced Statistical Options
| Technique | Purpose | When to Use |
| --- | --- | --- |
| Bonferroni Correction | Tightens significance thresholds when testing multiple metrics/variants | Multiple metrics or multiple treatment variants |
| Winsorization | Caps extreme outlier values at a chosen percentile | Revenue/value metrics with outliers |
| CUPED | Uses pre-experiment behavior to reduce variance and reach significance faster | Users have pre-experiment history; high-variance metrics |
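Winsorization itself is simple to picture. Mixpanel applies it server-side when the option is enabled, so the sketch below is purely illustrative:

```javascript
// Sketch: winsorize a metric by capping values above a chosen
// percentile (e.g., 0.99 for the 99th). Illustrative only.
function winsorize(values, percentile) {
  const sorted = [...values].sort((a, b) => a - b);
  const idx = Math.min(sorted.length - 1, Math.floor(percentile * sorted.length));
  const cap = sorted[idx];
  return values.map((v) => Math.min(v, cap));
}
```

Capping (rather than dropping) outliers keeps every user in the analysis while stopping a single whale purchase from swamping a revenue metric.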
Guardrail Metrics
Always monitor guardrail metrics alongside your primary metric. For example, if you're testing a change to increase Story views, track main feed engagement as a guardrail; a drop there would signal unintended harm. [Guardrail metrics guide]
Making the Decision
Once the experiment completes:
Ship a variant if: statistical + practical significance achieved, guardrail metrics stable, sample size adequate
Ship none if: no significance, lift too small, or guardrail metrics negatively impacted
Rerun/iterate if: inconclusive results, insufficient sample size, or external factors contaminated the test
After rollout: Monitor guardrail metrics for 2-4 weeks and watch for novelty effects wearing off [Make Your Decision]
⚠️ Important: Do not change experiment settings (confidence level, metrics, duration) mid-experiment to chase a significant result; this introduces noise. If you need a different configuration, restart the experiment. [Setup & Launch]
ℹ️ Tip: Mention @kapa.ai in the thread for follow-ups or DM the app for 1-to-1 help.
How do we determine the parameter, i.e. the randomisation unit, to use? We are using users, but how is the split selected? We use a codeless CMS to determine the profile/group that sees which screen; I think we can use that, but what randomiser is used and how is this achieved? Is it in the code implementation, using feature flags? kapa.ai
Randomization Unit & Variant Assignment in Mixpanel
1. Choosing the Randomization Unit (Variant Assignment Key)
Mixpanel calls this the Variant Assignment Key. You have three options:
| Key | Best For |
| --- | --- |
| distinct_id (user) | Logged-in experiences; consistent across devices/sessions |
| device_id | Pre-auth/acquisition flows; consistent before and after login |
| Group key (e.g., account_id, company_id) | Account/org-level targeting; all users in the same group see the same variant |
Since you're using users, distinct_id is the right choice. [Targeting & Identity]
⚠️ Important: You cannot change the Variant Assignment Key once a flag has been enabled.
2. How the Split/Randomization Actually Works
The split is configured in Mixpanel's UI as Variant Splits (e.g., 50% control / 50% variant A). Mixpanel then assigns users to variants based on this allocation. [Feature Flags overview]
There are two evaluation modes:
Local Evaluation: The SDK polls Mixpanel servers for flag configurations and performs the assignment locally within the SDK (no network call at assignment time). Recommended for low latency.
Remote Evaluation: The SDK makes a network call to Mixpanel servers at assignment time. Required if you want to use Mixpanel cohorts for targeting or sticky variants (persistent assignments). [Python Flags; Go Flags]
Sticky Variants ensure that once a user is assigned a variant, they keep seeing it even if rollout percentages or cohort memberships change later. [Targeting & Identity]
3. Regarding Your CMS / Codeless Setup
The knowledge sources do not explicitly describe a codeless CMS integration path for variant assignment. However, Mixpanel does document an approach for teams using a CDP or external tools:
Option 1: Use the Mixpanel SDK purely for flag assignment and let it fire the $experiment_started exposure event automatically.
Option 2: Use the Mixpanel SDK for flag assignment only, then manually track the $experiment_started event through your existing CDP/tooling when the user is actually exposed. [Feature Flags implementation]
This means even if your CMS controls what screen a user sees, you'd still need the Mixpanel SDK integrated to handle the randomization and assignment logic โ the CMS alone cannot drive Mixpanel's variant assignment.
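If you take Option 2, the shape of the integration might look like the sketch below, where `getVariant` stands in for the SDK's flag-evaluation call and `sendToCdp` for your CDP/CMS tracking client (both names are placeholders, not real APIs):

```javascript
// Sketch of Option 2: Mixpanel's SDK decides the variant, but the
// exposure event goes out through your own CDP pipeline at the moment
// the user actually sees the screen. `getVariant` and `sendToCdp`
// are illustrative placeholders.
async function exposeWhenScreenShown(getVariant, sendToCdp, flagKey, fallback) {
  const variant = await getVariant(flagKey, fallback);
  // ... render the screen for `variant` here, via your CMS ...
  sendToCdp('$experiment_started', {
    'Experiment name': flagKey,
    'Variant name': variant,
  });
  return variant;
}
```

The key point is the ordering: assignment happens first, but the exposure event is only sent once the experience is actually rendered.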
4. Is It in the Code? Yes: SDK Implementation Required
The randomization is achieved through SDK code implementation. For example, in JavaScript:
```javascript
// Initialize with flags enabled
mixpanel.init("YOUR_PROJECT_TOKEN", {
  flags: true,
});

// Evaluate which variant the user gets
const variant_value = await mixpanel.flags.get_variant_value("my-feature-flag", "control");
if (variant_value === "variant_a") {
  showExperienceForVariantA();
} else {
  showDefaultExperience();
}
```

The SDK fetches the flag configuration, assigns the user to a variant, and fires the exposure event automatically. [JavaScript Flags]
If your flag uses a group key (e.g., company_id from your CMS profile/group logic), you pass that context at initialization:
```javascript
mixpanel.init("YOUR_PROJECT_TOKEN", {
  flags: {
    context: {
      company_id: "X", // from your CMS group
    },
  },
});
```

So in short: Mixpanel handles the randomizer, but it requires the SDK to be present in your codebase to evaluate flags and perform assignments. Your CMS group/profile data can be passed as context to influence targeting, but the actual randomization logic lives in the Mixpanel SDK.
That's a fair clarification to ask for. Let me be transparent about what the knowledge sources actually cover, and where my answer reaches its limits.
What Mixpanel's Responsibility Is (Based on Knowledge Sources)
Mixpanel's SDK and Feature Flags system is responsible for:
Variant assignment/randomization: deciding which variant a user sees, based on the Variant Assignment Key (distinct_id, device_id, or group key) and configured rollout percentages
Flag evaluation: either locally (within the SDK) or via a remote call to Mixpanel servers
Exposure event tracking: automatically firing $experiment_started when a variant is served, so experiment analysis works
Targeting logic: using cohorts or runtime properties to determine eligibility for a rollout group [Feature Flags overview]
What the Knowledge Sources Do NOT Cover
The knowledge sources do not mention Acoustic or describe how a CMS like Acoustic would divide responsibilities with Mixpanel in a co-dependent setup. I cannot speculate on that integration without risking inaccuracy.
What the Docs Do Say About External/CDP Systems
For teams using external tools alongside Mixpanel, the documented approach is: [Feature Flags implementation]
Mixpanel SDK handles: flag assignment and (optionally) the $experiment_started exposure event
Your external system (CDP/CMS) handles: delivering the actual experience to the user, and optionally tracking the exposure event through its own pipeline if you configure Mixpanel's SDK not to fire it automatically
So the boundary is: Mixpanel owns randomization and measurement; your CMS owns content delivery and screen rendering. But the exact mechanics of how Acoustic specifically integrates with this are outside what the knowledge sources cover; I'd recommend consulting Acoustic's documentation or Mixpanel's support team for that specific co-dependent architecture.
How does Mixpanel achieve its randomisation, and how does it select the unit/metric to use? Also, give me 2 impact statements to include in a CV for an insurance platform where we are driving uptake by creating a beta homepage and driving traction to retirement quotes, processed applications, and transfer-in/transfer-out applications; add numbers to it. kapa.ai
I'll address the Mixpanel question, but I need to flag the second part of your request.
How Mixpanel Achieves Randomization
The knowledge sources do not explicitly detail the underlying randomization algorithm Mixpanel uses internally. What they do document is the framework around it:
Variant Assignment Key is the unit of randomization: you choose distinct_id (user), device_id, or a group key (e.g., company_id). This determines what entity gets bucketed into a variant. [Targeting & Identity]
Variant Splits define the allocation percentages (e.g., 50/50), and Mixpanel's servers assign users to variants according to these splits when the SDK initializes with flags enabled. [Feature Flags overview]
The server evaluates each user against Rollout Groups in order, checking cohort membership or runtime properties, until a qualifying group is found. [Rollout Groups]
A Sample Ratio Mismatch (SRM) health check (Chi-squared test) verifies that the actual split matches the configured allocation; if it doesn't, it signals a bug in the randomization or bucketing logic. [Health Checks]
The specific cryptographic or hashing mechanism behind the randomization is not described in the available knowledge sources.
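For intuition only: a common industry pattern for this kind of system (not confirmed as Mixpanel's implementation, since the sources don't document it) is to hash the assignment key together with the experiment name into [0, 1) and map that point onto the configured split ranges, making assignment deterministic without storing any state.

```javascript
// Illustrative hash-based bucketing, NOT Mixpanel's documented
// algorithm. FNV-1a is used here only because it is short; real
// systems typically use stronger hashes.
function hashToUnit(str) {
  let h = 2166136261;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 16777619);
  }
  return (h >>> 0) / 4294967296; // fold into [0, 1)
}

function assignVariant(assignmentKey, experimentName, splits) {
  // splits: e.g. [["control", 0.5], ["variant_a", 0.5]]
  const point = hashToUnit(`${experimentName}:${assignmentKey}`);
  let cumulative = 0;
  for (const [name, share] of splits) {
    cumulative += share;
    if (point < cumulative) return name;
  }
  return splits[splits.length - 1][0]; // guard against float rounding
}
```

Because the hash is a pure function of key and experiment name, the same user always sees the same variant, and different experiments get independent splits with no stored assignment table.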
Regarding the CV Impact Statements
I'm sorry, but writing CV impact statements โ even with a Mixpanel context โ falls outside what I'm designed to help with. I'm tuned specifically to answer questions about Mixpanel's platform, features, and implementation. I'd recommend using a general-purpose writing assistant for that part of your request.
Here are two impact statements summarising what online controlled experiments in Mixpanel can achieve:
"By running A/B experiments in Mixpanel, from hypothesis to statistical validation, teams can measure the real impact of product changes on user behaviour, enabling confident, data-driven rollout decisions that reduce risk and accelerate iteration, all within the same platform where their analytics already live." [Experiments overview; Drive Product Innovation]
"Mixpanel Experiments, combined with Feature Flags, closes the full Observe → Analyze → Decide → Act loop, allowing teams to target the right users, gradually roll out changes, monitor primary and guardrail metrics in real time, and scale winning variants to 100% of users with confidence, without leaving the platform." [Experimentation 2.0 launch; Feature Flags overview]
