AB Testing

Subscription: PRO

The AB Testing module provides experiment management and statistical evaluation capabilities. It processes experiment assignments, aggregates user-level metrics, and runs Bayesian statistical analysis to determine which variant performs best.

Note: AB testing is currently only applicable to web personalization. AB testing for email campaigns is not yet supported.


Key Requirements

Before setting up the AB Testing module, ensure that:

The PRO subscription is active.
The AB Testing service is configured and operational.
Web personalization is configured — AB testing currently only applies to web personalization experiments.
CDP events are flowing for metrics calculation (the module uses CDP events to measure experiment outcomes).


How It Works

The module orchestrates the complete A/B testing evaluation workflow:

Experiment retrieval — Fetches active experiments from the AB Testing configuration service
User assignment processing — Determines which users are assigned to which experiment variants
Entry condition evaluation — Checks whether assigned users meet the conditions to enter the experiment (e.g., must trigger a specific event after assignment)
Metrics aggregation — Aggregates user-level metrics for each variant based on CDP events
Bayesian analysis — Runs statistical evaluation to determine variant performance and winning probability
Results publishing — Sends evaluation results back to the AB Testing service

Metric Types

The module supports three types of metrics:

COUNT

Counts the number of events of a specific type per user per day.

Example: Number of page views, number of add-to-cart actions

The metric value shown for a variant is the average count per user (including users with zero events). Before analysis, the top 1% of users by event count is excluded per variant to reduce distortion from outliers such as bots generating abnormal volumes of events.
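As a rough illustration of the COUNT calculation described above, the sketch below averages per-user event counts for one variant after dropping the top 1% of users. The function name `average_count_per_user`, the input shapes, and the exact trimming rule (rank-based, rounding the number of dropped users down) are assumptions made for illustration; the module's actual cutoff rule may differ.

```python
import math


def average_count_per_user(counts_by_user, assigned_users, trim_fraction=0.01):
    """Average event count per assigned user for one variant.

    counts_by_user: dict mapping user id -> total event count (users with events only)
    assigned_users: list of all user ids assigned to the variant
    trim_fraction:  share of top users (by count) to drop before averaging
    """
    # Users with no events are included with a count of zero.
    per_user = sorted(counts_by_user.get(u, 0) for u in assigned_users)

    # Assumption: rank-based trimming, rounding the number of dropped users down.
    n_drop = math.floor(len(per_user) * trim_fraction)
    kept = per_user[: len(per_user) - n_drop]

    return sum(kept) / len(kept) if kept else 0.0


# Hypothetical variant with 200 users, two of whom look like bots.
users = [f"u{i}" for i in range(200)]
counts = {"u0": 1000, "u1": 900, "u2": 4, "u3": 6}
avg = average_count_per_user(counts, users)
```

With these made-up inputs, the two highest-volume users fall into the trimmed 1%, so the average reflects only the remaining 198 users, most of whom have zero events.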

REVENUE

Sums monetary amounts from events per user per day.

Example: Total purchase amount, total order value

The metric value shown for a variant is the average revenue per user (ARPU), calculated across all assigned users, including users who made no purchase (counted as €0). ARPU is modeled directly using a Bayesian bootstrap, rather than being decomposed into a conversion-rate model multiplied by an average-order-value model. Modeling ARPU directly avoids unstable results in low-purchase-volume experiments, where the decomposed approach could produce contradictory outcomes (e.g. a variant showing both the lower average value and the higher probability of being best). As with COUNT, the top 1% of users by revenue is excluded per variant before analysis to reduce distortion from outliers (e.g. a single very large order).
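To make the idea of modeling ARPU directly more concrete, here is a minimal Bayesian bootstrap sketch using only the Python standard library. It is not the module's implementation: the function name, the number of draws, and the sample data are all assumptions. Each draw reweights the observed users with Dirichlet(1, ..., 1) weights (generated here by normalising Exponential(1) samples) and computes the weighted mean revenue.

```python
import random


def bayesian_bootstrap_arpu(revenue_per_user, n_draws=2000, seed=0):
    """Return posterior draws of ARPU for one variant.

    revenue_per_user: one value per assigned user, 0.0 for users without a purchase
    """
    if not revenue_per_user:
        raise ValueError("at least one user is required")
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        # Normalised Exponential(1) samples are Dirichlet(1, ..., 1) weights.
        raw = [rng.expovariate(1.0) for _ in revenue_per_user]
        total = sum(raw)
        draws.append(sum(w * r for w, r in zip(raw, revenue_per_user)) / total)
    return draws


# Hypothetical data: 100 assigned users per variant, most with no purchase.
control = [0.0] * 95 + [20.0, 25.0, 30.0, 35.0, 40.0]
treatment = [0.0] * 92 + [20.0, 25.0, 30.0, 30.0, 35.0, 40.0, 45.0, 50.0]

control_draws = bayesian_bootstrap_arpu(control, seed=1)
treatment_draws = bayesian_bootstrap_arpu(treatment, seed=2)

# Share of paired draws in which the treatment ARPU exceeds the control ARPU.
p_beats = sum(t > c for t, c in zip(treatment_draws, control_draws)) / len(control_draws)
```

Because non-purchasers stay in the data as zeros, conversion and order value are captured in a single quantity, so the average and the win probability come from the same set of draws.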

BINARY

Tracks whether a user triggered a specific event (yes/no).

Example: Did the user complete a checkout? Did the user click the campaign?

The metric value shown for a variant is the conversion rate: the share of users who triggered the event at least once, expressed as a value between 0 and 1 rather than a raw count. Outlier filtering does not apply to BINARY metrics, since each user's value is already capped at 0 or 1.
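A small sketch of the BINARY calculation, assuming events arrive as a list of user ids with one entry per event; the function name and input shape are hypothetical. Repeated events from the same user collapse to a single conversion, which is why no outlier filtering is needed.

```python
def conversion_rate(event_user_ids, assigned_users):
    """Share of assigned users who triggered the event at least once (0 to 1)."""
    assigned = set(assigned_users)
    if not assigned:
        return 0.0
    # A user counts once, no matter how many times they triggered the event.
    # Events from users outside the variant are ignored.
    converted = set(event_user_ids) & assigned
    return len(converted) / len(assigned)


rate = conversion_rate(["a", "a", "a", "b", "z"], ["a", "b", "c", "d"])
```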

Every configured metric reports a result for every variant in the experiment, including variants with zero qualifying events or zero purchases. These variants are shown with a metric value of 0 rather than being omitted from the results.


Evaluation Output

For each metric and variant combination, the module calculates:

Output	Description
Metric value	Aggregated metric value for the variant
Number of users	Count of unique users in the variant
Uplift vs baseline	Percentage improvement compared to the control group
Probability of beating baseline	Bayesian probability that the variant outperforms the control (0–1)
Probability of being the best	Bayesian probability that the variant is the best among all variants (0–1)

A probability of beating baseline of 0.95 means that, given the observed data, there is an estimated 95% probability that the variant performs better than the control group.
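The outputs above can be derived from posterior draws in a few lines. The sketch below assumes each variant already has a list of posterior draws of equal length (for example from a bootstrap), and that uplift is computed from the posterior means; both are assumptions for illustration, as are the function and key names.

```python
def summarize_variants(draws_by_variant, baseline="control"):
    """Compute uplift and win probabilities from aligned posterior draws.

    draws_by_variant: dict mapping variant name -> list of posterior draws
                      (all lists must have the same length)
    """
    names = list(draws_by_variant)
    n = len(draws_by_variant[baseline])
    means = {v: sum(d) / n for v, d in draws_by_variant.items()}

    # For each draw index, find which variant has the highest value.
    best_counts = dict.fromkeys(names, 0)
    for i in range(n):
        winner = max(names, key=lambda v: draws_by_variant[v][i])
        best_counts[winner] += 1

    summary = {}
    for v in names:
        row = {"prob_best": best_counts[v] / n}
        if v != baseline:
            base = draws_by_variant[baseline]
            row["prob_beats_baseline"] = sum(x > b for x, b in zip(draws_by_variant[v], base)) / n
            # Uplift as a percentage of the baseline mean (undefined if that mean is 0).
            row["uplift_pct"] = (
                (means[v] - means[baseline]) / means[baseline] * 100
                if means[baseline] else None
            )
        summary[v] = row
    return summary


# Tiny hand-made example with four draws per variant.
summary = summarize_variants({
    "control": [1.0, 1.2, 0.9, 1.1],
    "variant_a": [1.3, 1.1, 1.0, 1.4],
    "variant_b": [0.8, 1.5, 0.7, 1.0],
})
```

In this toy input, `variant_a` is highest in three of the four draws, so its probability of being the best is 0.75. The probabilities of being the best sum to 1 across all variants, whereas the probabilities of beating baseline are computed per variant and need not sum to anything in particular.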


Entry Conditions

Users can enter an experiment in two ways:

Immediate entry — Users enter the experiment as soon as they are assigned to a variant. Metrics are counted from the assignment date.
Conditional entry — Users must trigger a specific CDP event after being assigned before they enter the experiment. Only events after the entry condition is met count towards metrics.

This allows you to measure the impact of a personalization only for users who actually encounter it.
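The difference between the two entry modes can be sketched as a filter over one user's events. The function name, the tuple layout, and the event name `personalization_viewed` are hypothetical. The sketch also assumes timestamp-level comparison and that, for conditional entry, only events strictly after the entry event count.

```python
from datetime import datetime


def events_counting_towards_metrics(events, assigned_at, entry_event=None):
    """Return one user's events that count towards metrics.

    events:      list of (timestamp, event_type) tuples for the user
    assigned_at: timestamp of the user's variant assignment
    entry_event: event type required for conditional entry, or None for immediate entry
    """
    after_assignment = sorted(e for e in events if e[0] >= assigned_at)

    if entry_event is None:
        # Immediate entry: everything from assignment onwards counts.
        return after_assignment

    # Conditional entry: find the first qualifying event after assignment.
    entered_at = next((ts for ts, kind in after_assignment if kind == entry_event), None)
    if entered_at is None:
        return []  # The user never entered the experiment.

    # Assumption: only events strictly after the entry event count.
    return [e for e in after_assignment if e[0] > entered_at]


def at(hour):
    return datetime(2024, 1, 1, hour)


user_events = [
    (at(8), "page_view"),
    (at(10), "page_view"),
    (at(11), "personalization_viewed"),
    (at(12), "add_to_cart"),
    (at(13), "purchase"),
]
immediate = events_counting_towards_metrics(user_events, assigned_at=at(9))
conditional = events_counting_towards_metrics(
    user_events, assigned_at=at(9), entry_event="personalization_viewed"
)
```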


Configuration

Property	Description	Default
Schedule (Cron)	How often evaluation runs	Disabled
Exclude Evaluation Dates	Toggle to enable dropping specific dates from evaluation	Disabled
Excluded Dates	List of dates (`YYYY-MM-DD`) to drop from evaluation once the toggle above is enabled	Empty
CPU	CPU allocation (500m, 1000m)	500m
Memory	Memory allocation (4Gi, 8Gi, 16Gi)	4Gi

Experiment setup (variants, metrics, entry conditions) is configured through the AB Testing service, not through the module's cockpit properties.

Excluding evaluation dates

If specific dates are known to have tracking or instrumentation issues (e.g. a tag manager outage, a broken event integration), you can exclude them from evaluation without needing to recreate the experiment:

Enable the Exclude Evaluation Dates toggle.
Add the affected dates in YYYY-MM-DD format to the Excluded Dates list.

Excluded dates are dropped symmetrically from every variant (control and treatment alike), both from the aggregated metric values and from the daily active user counts used as the basis for those metrics, so the comparison between variants stays fair. This feature is disabled by default and no dates are excluded, so existing modules are unaffected until it is explicitly configured.
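The symmetric drop can be pictured as a single filter applied to daily rows that carry both the metric value and the active user count. The row layout and field names below are assumptions for illustration, not the module's data model.

```python
from datetime import date


def drop_excluded_dates(daily_rows, excluded_dates):
    """Remove excluded dates from daily per-variant rows.

    daily_rows:     list of dicts with keys "date" (YYYY-MM-DD), "variant",
                    "metric_sum" and "active_users"
    excluded_dates: list of YYYY-MM-DD strings
    """
    # date.fromisoformat raises ValueError for strings that are not valid ISO dates.
    excluded = {date.fromisoformat(d) for d in excluded_dates}
    # Dropping whole rows removes the metric value and the user count together,
    # and the same dates are removed for every variant.
    return [r for r in daily_rows if date.fromisoformat(r["date"]) not in excluded]


rows = [
    {"date": "2024-03-01", "variant": "control", "metric_sum": 40, "active_users": 20},
    {"date": "2024-03-01", "variant": "treatment", "metric_sum": 44, "active_users": 21},
    {"date": "2024-03-02", "variant": "control", "metric_sum": 3, "active_users": 19},
    {"date": "2024-03-02", "variant": "treatment", "metric_sum": 2, "active_users": 22},
]
filtered = drop_excluded_dates(rows, ["2024-03-02"])
```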


Edge Cases

Multiple variant assignments — If a user is assigned to multiple variants within the same experiment, they are excluded from results entirely (indicates a data quality issue).
Entry condition not met — Users who never trigger the required entry event are excluded from metric aggregation but still tracked in assignment data.
No CDP events for a variant — Variants with no matching events still appear in the results with a metric value of 0 and are counted in daily active user counts.
Outlier users — The top 1% of users by metric value are excluded per variant before analysis for COUNT and REVENUE metrics (not BINARY), to prevent bots (e.g. abnormally high event counts) or extreme purchases (e.g. a single very large order) from skewing results.
Staging environments — Evaluation still runs on staging, but results are not sent to the AB Testing service, to avoid overwriting production evaluation results with stale data.
Excluded evaluation dates — Any dates configured under Exclude Evaluation Dates are dropped from both the metric values and the user counts, for every variant, before analysis runs.
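As an illustration of the multiple-assignment edge case above, the following sketch separates users with exactly one variant from users seen in several. The function name and the `(user_id, variant)` input shape are hypothetical.

```python
from collections import defaultdict


def split_clean_assignments(assignments):
    """Separate users with exactly one variant from users with several.

    assignments: iterable of (user_id, variant) pairs for one experiment
    Returns (clean, conflicted): a dict user_id -> variant, and a set of user ids.
    """
    variants_by_user = defaultdict(set)
    for user_id, variant in assignments:
        variants_by_user[user_id].add(variant)

    # Repeated assignments to the same variant are harmless and stay clean.
    clean = {u: next(iter(v)) for u, v in variants_by_user.items() if len(v) == 1}
    conflicted = {u for u, v in variants_by_user.items() if len(v) > 1}
    return clean, conflicted


clean, conflicted = split_clean_assignments([
    ("u1", "control"),
    ("u2", "variant_a"),
    ("u1", "control"),
    ("u3", "control"),
    ("u3", "variant_a"),
])
```

Only the users in `clean` would go on to metric aggregation; the size of `conflicted` is a useful signal when investigating the underlying data quality issue.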


Use Cases

Web personalization testing — Measure the impact of personalized vs. non-personalized recommendations on your website.
Feature rollout — Gradually roll out new web personalization features and measure their impact before full deployment.
Content testing — Test different content variants (product ordering, layout, recommendation strategies) on engagement metrics.
Click-based engagement testing — Web personalization can send "personalization clicked" events to the CDP when click tracking is enabled. These can be configured as a COUNT metric (number of clicks) or BINARY metric (did the user click at all) to measure how a personalization influences engagement, not just conversion. Note: today this only tells you whether and how often a personalization was clicked — it does not yet distinguish which product within the personalization was clicked (see note below).

A note on personalization click data: in the future, "personalization clicked" events will also capture which product was clicked, not just that a click occurred. This is not yet surfaced in AB Testing metrics. Today, a click-based metric can only measure whether and how often a personalization was clicked, not which specific product drove the click. Surfacing product-level clicks is a candidate for a future enhancement once the CXP pipeline is in production.