AB Testing
Subscription: PRO
The AB Testing module provides experiment management and statistical evaluation capabilities. It processes experiment assignments, aggregates user-level metrics, and runs Bayesian statistical analysis to determine which variant performs best.
Note: AB testing is currently only applicable to web personalization. AB testing for email campaigns is not yet supported.
Key Requirements
Before setting up the AB Testing module, ensure that:
- The PRO subscription is active.
- The AB Testing service is configured and operational.
- Web personalization is configured. AB testing currently applies only to web personalization experiments.
- CDP events are flowing for metrics calculation (the module uses CDP events to measure experiment outcomes).
How It Works
The module orchestrates the complete A/B testing evaluation workflow:
- Experiment retrieval: Fetches active experiments from the AB Testing configuration service.
- User assignment processing: Determines which users are assigned to which experiment variants.
- Entry condition evaluation: Checks whether assigned users meet the conditions to enter the experiment (e.g., must trigger a specific event after assignment).
- Metrics aggregation: Aggregates user-level metrics for each variant based on CDP events.
- Bayesian analysis: Runs statistical evaluation to determine variant performance and winning probability.
- Results publishing: Sends evaluation results back to the AB Testing service.
Metric Types
The module supports three types of metrics:
COUNT
Counts the number of events of a specific type per user per day.
Example: Number of page views, number of add-to-cart actions
The metric value shown for a variant is the average count per user (including users with zero events). Before analysis, the top 1% of users by event count is excluded per variant to reduce distortion from outliers such as bots generating abnormal volumes of events.
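The outlier rule and the per-user average can be sketched as follows. This is a minimal illustration, assuming each variant's data is a list with one value per assigned user (zeros included). The function name `trimmed_mean_per_user` and the rounding rule (the number of dropped users is rounded down) are assumptions made for the example, not the module's actual implementation.

```python
def trimmed_mean_per_user(values, trim_percent=1):
    """Average per-user value after dropping the highest `trim_percent`% of users.

    `values` holds one number per assigned user in a single variant,
    including 0 for users with no events.
    Assumption: the number of dropped users is rounded down.
    """
    if not values:
        return 0.0
    n_drop = len(values) * trim_percent // 100
    kept = sorted(values)[: len(values) - n_drop]
    return sum(kept) / len(kept)


# 200 users: 100 with no events, 98 with two events, and two bot-like outliers.
counts = [0] * 100 + [2] * 98 + [5000, 9000]
raw_mean = sum(counts) / len(counts)           # dominated by the two outliers
filtered_mean = trimmed_mean_per_user(counts)  # the two highest users are dropped
```

Under this rounding rule a variant with fewer than 100 users loses nobody to the filter. How the module treats such small samples is not stated here.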
REVENUE
Sums monetary amounts from events per user per day.
Example: Total purchase amount, average order value
The metric value shown for a variant is the average revenue per user (ARPU), calculated across all assigned users, including users who made no purchase (counted as €0). ARPU is modeled directly using a Bayesian bootstrap, rather than as the product of a separate conversion-rate model and an average-order-value model. The decomposed approach could produce unstable results in low-purchase-volume experiments, including contradictory outcomes (e.g. a variant showing both the lower average value and the higher probability of being best). As with COUNT, the top 1% of users by revenue is excluded per variant before analysis to reduce distortion from outliers (e.g. a single very large order).
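To make the idea of modeling ARPU directly more concrete, here is a minimal sketch of a Bayesian bootstrap over per-user revenue, using only the Python standard library. The function name, the number of draws, and the seed handling are assumptions for illustration; the module's actual implementation may differ.

```python
import random


def bayesian_bootstrap_arpu(revenues, draws=2000, seed=0):
    """Posterior draws of average revenue per user for one variant.

    `revenues` holds one value per assigned user, with 0 for non-purchasers.
    Each draw reweights the users with Dirichlet(1, ..., 1) weights, obtained
    by normalising independent Exponential(1) samples.
    """
    if not revenues:
        raise ValueError("at least one user is required")
    rng = random.Random(seed)
    samples = []
    for _ in range(draws):
        weights = [rng.expovariate(1.0) for _ in revenues]
        total = sum(weights)
        samples.append(sum(w * r for w, r in zip(weights, revenues)) / total)
    return samples


# Nine non-purchasers and one 40.00 order: the observed ARPU is 4.00.
arpu_draws = bayesian_bootstrap_arpu([0.0] * 9 + [40.0], draws=2000, seed=1)
posterior_mean = sum(arpu_draws) / len(arpu_draws)
```

Because non-purchasers stay in the data as zeros, a single set of draws describes ARPU as a whole and no separate conversion-rate model is needed.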
BINARY
Tracks whether a user triggered a specific event (yes/no).
Example: Did the user complete a checkout? Did the user click the campaign?
The metric value shown for a variant is the conversion rate — the share of users who triggered the event at least once (0–1), not a raw count. Outlier filtering does not apply to BINARY metrics, since the underlying value is already capped at 0 or 1 per user.
Every configured metric reports a result for every variant in the experiment, including variants with zero qualifying events or zero purchases. These variants are shown with a metric value of 0 rather than being omitted from the results.
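As an illustration of the BINARY metric value and of zero-valued variants staying in the results, the following sketch computes a conversion rate per variant. The data layout (sets of user ids) and the names used are hypothetical.

```python
def conversion_rates(assigned_users, converted_users):
    """Share of each variant's users who triggered the event at least once.

    assigned_users: dict mapping variant name -> set of user ids.
    converted_users: set of user ids with at least one qualifying event.
    """
    rates = {}
    for variant, users in assigned_users.items():
        converted = len(users & converted_users)
        rates[variant] = converted / len(users) if users else 0.0
    return rates


assigned = {
    "control": {"u1", "u2", "u3", "u4"},
    "treatment": {"u5", "u6"},
}
rates = conversion_rates(assigned, converted_users={"u1"})
# "treatment" has no conversions but is still reported, with a rate of 0.0.
```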
Evaluation Output
For each metric and variant combination, the module calculates:
| Metric | Description |
|---|---|
| Metric value | Aggregated metric value for the variant |
| Number of users | Count of unique users in the variant |
| Uplift vs baseline | Percentage improvement compared to the control group |
| Probability of beating baseline | Bayesian probability that the variant outperforms the control (0–1) |
| Probability of being the best | Bayesian probability that the variant is the best among all variants (0–1) |
A probability of beating baseline of 0.95 means that, given the observed data and the model, there is an estimated 95% probability that the variant performs better than the control group.
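One common way to obtain figures like these from a Bayesian analysis is to compare posterior draws variant by variant. The sketch below assumes each variant already has a list of posterior draws of its metric. The function name, the output keys, and the choice to compute uplift from the mean of the draws are assumptions for illustration only.

```python
def summarize_variants(posterior, baseline="control"):
    """Derive comparison figures from per-variant posterior draws.

    posterior: dict mapping variant name -> list of draws; all lists have the
    same length, and draw i is compared with draw i of the other variants.
    """
    names = list(posterior)
    base = posterior[baseline]
    n_draws = len(base)
    base_mean = sum(base) / n_draws

    # Count how often each variant has the highest value within a draw.
    best_counts = dict.fromkeys(names, 0)
    for i in range(n_draws):
        best_counts[max(names, key=lambda name: posterior[name][i])] += 1

    summary = {}
    for name in names:
        draws = posterior[name]
        mean = sum(draws) / n_draws
        summary[name] = {
            # Expressed as a fraction: 0.2 corresponds to a 20% uplift.
            "uplift_vs_baseline": (mean - base_mean) / base_mean if base_mean else None,
            "p_beats_baseline": sum(d > b for d, b in zip(draws, base)) / n_draws,
            "p_best": best_counts[name] / n_draws,
        }
    return summary


# Tiny hand-made example with four draws per variant.
result = summarize_variants({
    "control": [1.0, 2.0, 3.0, 4.0],
    "a": [2.0, 1.0, 4.0, 5.0],
    "b": [0.5, 3.0, 3.5, 4.5],
})
```

In this example variant `a` wins three of the four draws, so its probability of being the best is 0.75, while `a` and `b` each beat the control in three of four draws.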
Entry Conditions
Users can enter an experiment in two ways:
- Immediate entry: Users enter the experiment as soon as they are assigned to a variant. Metrics are counted from the assignment date.
- Conditional entry: Users must trigger a specific CDP event after being assigned before they enter the experiment. Only events after the entry condition is met count towards metrics.
This allows you to measure the impact of a personalization only for users who actually encounter it.
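The two entry modes can be sketched as a filter over one user's events. This is an illustration under stated assumptions: events are `(timestamp, event_type)` tuples, comparisons use full timestamps rather than dates, the entry event itself is not counted, and the event names are hypothetical.

```python
from datetime import datetime


def events_counting_towards_metrics(events, assigned_at, entry_event=None):
    """Return one user's events that count towards metrics, oldest first.

    events: list of (timestamp, event_type) tuples.
    entry_event: None for immediate entry, otherwise the event type the user
    must trigger after assignment to enter the experiment.
    """
    after_assignment = sorted(e for e in events if e[0] >= assigned_at)
    if entry_event is None:
        return after_assignment
    entry_times = [ts for ts, kind in after_assignment if kind == entry_event]
    if not entry_times:
        return []  # the user never entered the experiment
    entered_at = entry_times[0]
    # Assumption: the entry event itself is not counted, only later events.
    return [e for e in after_assignment if e[0] > entered_at]


def at(hour):
    """Helper for the example: a timestamp on a fixed day."""
    return datetime(2024, 1, 1, hour)


events = [
    (at(8), "page_view"),               # before assignment, never counted
    (at(10), "personalization_shown"),  # hypothetical entry event
    (at(11), "purchase"),
    (at(12), "page_view"),
]
immediate = events_counting_towards_metrics(events, assigned_at=at(9))
conditional = events_counting_towards_metrics(
    events, assigned_at=at(9), entry_event="personalization_shown"
)
```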
Configuration
| Property | Description | Default |
|---|---|---|
| Schedule (Cron) | How often evaluation runs | Disabled |
| Exclude Evaluation Dates | Toggle to enable dropping specific dates from evaluation | Disabled |
| Excluded Dates | List of dates (`YYYY-MM-DD`) to exclude from evaluation | Empty |
| CPU | CPU allocation (500m, 1000m) | 500m |
| Memory | Memory allocation (4Gi, 8Gi, 16Gi) | 4Gi |
Experiment setup (variants, metrics, entry conditions) is configured through the AB Testing service, not through the module's cockpit properties.
Excluding evaluation dates
If specific dates are known to have tracking or instrumentation issues (e.g. a tag manager outage, a broken event integration), you can exclude them from evaluation without needing to recreate the experiment:
- Enable the Exclude Evaluation Dates toggle.
- Add the affected dates in `YYYY-MM-DD` format to the Excluded Dates list.
Excluded dates are dropped symmetrically from every variant (control and treatment alike), both from the aggregated metric values and from the daily active user counts used as the basis for those metrics — so the comparison between variants stays fair. By default this feature is disabled and no dates are excluded, so existing modules are unaffected until explicitly configured.
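A minimal sketch of the symmetric drop, assuming the daily data is a list of rows that each carry a variant, a date, a metric value, and an active user count. This row layout and the function name are hypothetical and only serve to show that the same dates are removed for every variant.

```python
from datetime import date


def drop_excluded_dates(daily_rows, excluded_dates):
    """Remove every row whose date is excluded, regardless of variant.

    daily_rows: list of dicts with "variant", "date" (datetime.date),
    "metric_value" and "active_users" keys (a hypothetical layout).
    excluded_dates: iterable of YYYY-MM-DD strings.
    """
    # date.fromisoformat raises ValueError for strings that are not ISO dates.
    excluded = {date.fromisoformat(d) for d in excluded_dates}
    return [row for row in daily_rows if row["date"] not in excluded]


rows = [
    {"variant": v, "date": date(2024, 3, d), "metric_value": m, "active_users": u}
    for v, d, m, u in [
        ("control", 1, 10.0, 100), ("treatment", 1, 12.0, 98),
        ("control", 2, 0.0, 101), ("treatment", 2, 0.0, 99),  # tracking outage
        ("control", 3, 11.0, 102), ("treatment", 3, 13.0, 97),
    ]
]
kept = drop_excluded_dates(rows, ["2024-03-02"])
```

Filtering whole rows removes the metric values and the active user counts for the excluded day together, which keeps the per-user averages consistent across variants.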
Edge Cases
- Multiple variant assignments: If a user is assigned to multiple variants within the same experiment, they are excluded from results entirely (this indicates a data quality issue).
- Entry condition not met: Users who never trigger the required entry event are excluded from metric aggregation but are still tracked in assignment data.
- No CDP events for a variant: Variants with no matching events still appear in the results with a metric value of 0 and are counted in daily active user counts.
- Outlier users: The top 1% of users by metric value are excluded per variant before analysis for COUNT and REVENUE metrics (not BINARY), to prevent bots (e.g. abnormally high event counts) or extreme purchases (e.g. a single very large order) from skewing results.
- Staging environments: Evaluation still runs on staging, but results are not sent to the AB Testing service, to avoid overwriting production evaluation results with stale data.
- Excluded evaluation dates: Any dates configured under Exclude Evaluation Dates are dropped from both the metric values and the user counts, for every variant, before analysis runs.
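The multiple-assignment rule from the list above can be sketched as a small filter over assignment records. The input shape (pairs of user id and variant) and the function name are assumptions for illustration.

```python
from collections import defaultdict


def single_variant_users(assignments):
    """Map each user to their variant, skipping users seen in several variants.

    assignments: iterable of (user_id, variant) pairs for one experiment.
    Repeated assignments to the same variant are harmless and kept.
    """
    seen = defaultdict(set)
    for user_id, variant in assignments:
        seen[user_id].add(variant)
    return {
        user_id: next(iter(variants))
        for user_id, variants in seen.items()
        if len(variants) == 1
    }


clean = single_variant_users([
    ("u1", "control"),
    ("u2", "treatment"),
    ("u1", "treatment"),  # u1 appears in two variants and is dropped
    ("u3", "control"),
    ("u3", "control"),    # duplicate of the same assignment, u3 is kept
])
```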
Use Cases
- Web personalization testing: Measure the impact of personalized vs. non-personalized recommendations on your website.
- Feature rollout: Gradually roll out new web personalization features and measure their impact before full deployment.
- Content testing: Test different content variants (product ordering, layout, recommendation strategies) on engagement metrics.
- Click-based engagement testing: Web personalization can send "personalization clicked" events to the CDP when click tracking is enabled. These can be configured as a COUNT metric (number of clicks) or a BINARY metric (did the user click at all) to measure how a personalization influences engagement, not just conversion. Today this only shows whether and how often a personalization was clicked; it does not yet distinguish which product within the personalization was clicked (see the note below).
A note on personalization click data: in the future, "personalization clicked" events will also capture which product was clicked, not just that a click occurred. This is not yet surfaced in AB Testing metrics: today, a click-based metric can only measure whether and how often a personalization was clicked, not which specific product drove the click. Surfacing product-level clicks is a candidate for a future enhancement once the CXP pipeline is in production.