diff --git a/website/docs/feature-flag-tutorials/use-cases/a-b-testing.md b/website/docs/feature-flag-tutorials/use-cases/a-b-testing.md index 791a2cc9aaf6..082fb956d023 100644 --- a/website/docs/feature-flag-tutorials/use-cases/a-b-testing.md +++ b/website/docs/feature-flag-tutorials/use-cases/a-b-testing.md @@ -1,112 +1,235 @@ --- -title: How to do A/B Testing +title: How to do A/B Testing using Feature Flags slug: /feature-flag-tutorials/use-cases/a-b-testing --- -## What is A/B Testing? +Feature flags are a great way to run A/B or multivariate tests with minimal code modifications, and Unleash offers built-in features that make it easy to get started. In this tutorial, we will walk through how to do an A/B test using Unleash with your application. -**A/B testing** is a randomized controlled experiment where you test two or more versions of a feature to see which version performs better. If you have more than two versions, it's known as multivariate testing. Coupled with analytics, A/B and multivariate testing enable you to better understand your users and how to serve them better. +## How to Perform A/B Testing with Feature Flags -Feature flags are a great way to run A/B tests to decouple them from your code, and Unleash ships with features to make it easy to get started with. In this tutorial, we will walk through how to do an A/B test using Unleash with your application. +To follow along with this tutorial, you need access to an Unleash instance to create and manage feature flags. Head over to our [Quick Start documentation](/quickstart) for options, including running locally or using an [Unleash SaaS instance](https://www.getunleash.io/pricing?). -## How to Perform A/B Testing with Unleash +With Unleash set up, you can use your application to talk to Unleash through one of our [SDKs](/reference/sdks). -To follow along with this tutorial, you will need an Unleash instance. If you’d prefer to self-host Unleash, read our [Quickstart guide](/quickstart). 
Alternatively, if you’d like your project to be hosted by Unleash, go to [www.getunleash.io](https://www.getunleash.io/pricing?_gl=1*1ytmg93*_gcl_au*MTY3MTQxNjM4OS4xNzIxOTEwNTY5*_ga*OTkzMjI0MDMwLjE3MDYxNDc3ODM.*_ga_492KEZQRT8*MTcyNzQzNTQwOS4yMzcuMS4xNzI3NDM1NDExLjU4LjAuMA). +In this tutorial, you will learn how to set up and run an A/B test using feature flags. You will learn: -With Unleash set up, you can use your application to talk to Unleash through one of our SDKs. +1. [How to use feature flags to define variants of your application for testing](#create-a-feature-flag) +2. [Target specific users for each test variant](#target-users-for-ab-testing) +3. [Manage cross-session visibility of test variants](#manage-user-session-behavior) +4. [Connect feature flag impression data to conversion outcomes](#track-ab-testing-for-your-key-performance-metrics) +5. [Roll out the winning variant to all users](#rollout-the-winning-variant-to-all-users) -To conduct an A/B test, we will need to create the feature flag that will implement an activation strategy. In the next section, we will explore what strategies are and how they are configured in Unleash. +You will also learn about how to [automate advanced A/B testing strategies](#multi-arm-bandit-tests-to-find-the-winning-variant) such as multi-arm bandit testing using feature flags. -In the projects view, the Unleash platform shows a list of feature flags that you’ve generated. Click on the ‘New Feature Flag' button to create a new feature flag. +### Create a Feature Flag -![Create a new feature flag in Unleash.](/img/react-tutorial-create-new-flag.png) +To do A/B testing, we'll create a feature flag to implement the rollout strategy. After that, we'll explore what strategies are and how they are configured in Unleash. -Next, you will create a feature flag on the platform and turn it on for your app. +In the Unleash Admin UI, open a project and click **New feature flag**. 
-Flags can be used with different purposes and we consider experimentation important enough to have its own flag type. Experimentation flags have a lifetime expectancy suited to let you run an experiment and gather enough data to know whether it was a success or not. Learn more about [feature flag types](/reference/feature-toggles#feature-flag-types) in our documentation. +![Create a new feature flag in the Unleash Admin UI.](/img/use-case-new-flag.png) -The feature flag we are creating is considered an ‘Experimentation’ flag type. The project will be ‘Default’ or the named project in which you are working in for the purpose of this tutorial. As the number of feature flags grows, you can organize them in your projects. +Next, you will create a feature flag and turn it on. -Read our docs on [Projects](/reference/projects) to learn more about how to configure and manage them for your team/organization. A description of the flag can help properly identify its specific purposes. However, this field is optional. +Feature flags can be used for different purposes and we consider experimentation important enough to have its own flag type. Experimentation flags have a lifetime expectancy suited for running an experiment and gathering enough data to know whether the experiment was a success or not. The feature flag we are creating is considered an Experiment flag type. -![Create a feature flag by filling out the form fields.](/img/react-tutorial-create-flag-form.png) +![Create a feature flag by filling out the form fields.](/img/use-case-create-experiment-flag.png) -Once you have completed the form, you can click ‘Create feature flag’. +Once you have completed the form, click **Create feature flag**. -Your new feature flag has been created and is ready to be used. Upon returning to your projects view, enable the flag for your development environment, which makes it accessible to use in your app. +Your new feature flag is now ready to be used. 
Next, we will configure the A/B testing strategy for your flag. -![Enable the development environment for your feature flag for use in your application.](/img/tutorial-enable-dev-env.png) +### Target Users for A/B Testing -Next, we will configure the A/B testing strategy for your new flag. +With an A/B testing strategy, you’ll be able to: -### Implementing a Default Activation Strategy for A/B Testing +- Determine the percentage of users exposed to the new feature +- Determine the percentage of users that get exposed to each version of the feature -An important Unleash concept that enables developers to perform an A/B test is an [activation strategy](/reference/activation-strategies). An activation strategy defines who will be exposed to a particular flag or flags. Unleash comes pre-configured with multiple activation strategies that let you enable a feature only for a specified audience, depending on the parameters under which you would like to release a feature. +To target users accordingly, let's create an [activation strategy](/reference/activation-strategies). This Unleash concept defines who will be exposed to a particular flag. Unleash comes pre-configured with multiple activation strategies that let you enable a feature only for a specified audience, depending on the parameters under which you would like to release a feature. ![Anatomy of an activation strategy](/img/anatomy-of-unleash-strategy.png) -Different strategies use different parameters. Predefined strategies are bundled with Unleash. The default strategy is the gradual rollout strategy with 100% rollout, which basically means that the feature is enabled for all users. In this case, we have only enabled the flag in the development environment for all users in the previous section. +Different strategies use different parameters. Predefined strategies are bundled with Unleash. The default strategy is a gradual rollout to 100%, which means that the feature is enabled for all users. 
In this tutorial, we'll adjust the percentage of users who have access to the feature.

:::note
Activation strategies are defined on the server. For server-side SDKs, activation strategy implementation is done on the client side. For front-end SDKs, the feature is calculated on the server side.
:::

-There are two more advanced extensions of a default strategy that you will see available to customize in the form:

Open your feature flag and click **Add strategy**.

![Add your first strategy from the flag view in Unleash.](/img/use-case-experiment-add-strategy.png)

The gradual rollout strategy form has multiple fields that control the rollout of your feature. You can name the strategy something relevant to the A/B test you’re creating, but this field is optional.

-- [Strategy Variants](/reference/strategy-variants)
-- [Strategy Constraints](/reference/strategy-constraints)

![In the gradual rollout form, you can configure the parameters of your A/B tests and releases.](/img/use-case-experiment-gradual-rollout.png)

-Variants and constraints are not required for A/B testing. These additional customizations can be built on top of the overall strategy should you need more granular conditions for your feature beyond the rollout percentage.

Next, configure the rollout percentage so that only a portion of your users is targeted, or keep it at 100% to target all users. For example, if you set the dial to 35%, only 35% of all users are targeted, and the remaining 65% will not experience any variation of the new feature.

There are two more advanced extensions of the default strategy that you can customize in the form:

-[Strategy variants](/reference/strategy-variants) can expose a particular version of a feature to select user bases when a flag is enabled. From there, a way to use the variants is to view the performance metrics and see which is more efficient.
We can create several variations of this feature to release to users and gather performance metrics to determine which one yields better results.

- [Strategy variants](/reference/strategy-variants)
- [Strategy constraints](/reference/strategy-constraints)

-For A/B testing, _strategy variants_ are most applicable for more granular conditions of a feature release. In the next section, we’ll explore how to apply a strategy variant on top of an A/B test for more advanced use cases.

With strategy variants and constraints, you can extend your overall strategy. They help you define more granular conditions for your feature beyond the rollout percentage. We recommend using strategy variants to configure an A/B test.

-### Applying Strategy Variants

[Strategy variants](/reference/strategy-variants) let you expose a particular version of a feature to select user bases when a flag is enabled. You can then collect data to determine which variant performs better, which we'll cover later in this tutorial.

-Using strategy variants in your activation strategy is the canonical way to run A/B tests with Unleash and your application. You can expose a particular version of the feature to select user bases when a flag is enabled. From there, a way to use the variants is to view the performance metrics and see which is more efficient.

Using strategy variants in your activation strategy is the canonical way to run A/B tests with Unleash and your application.

![This diagram breaks down how strategy variants sit on top of activation strategies for flags in Unleash.](/img/tutorial-building-blocks-strategy-variants.png)

A variant has four components that define it:

-- **name**: This must be unique among the strategy's variants. When working with a feature with variants in a client, you will typically use the variant's name to find out which variant it is.
-- **weight**: The weight is the likelihood of any one user getting this specific variant. See the weights section for more info.
-- **value**
-- **(optional) payload**: A variant can also have an associated payload. Use this to deliver more data or context. See the payload section for more details.

- a name: must be unique among the strategy's variants. You typically use the name to identify the variant in your client.
- a weight: the [variant weight](/reference/strategy-variants#variant-weight) is the likelihood of any one user getting this specific variant.
- an optional payload: a variant can also have an associated [payload](/reference/strategy-variants#variant-payload) to deliver more data or context. The payload type defines its data format and can be one of `string`, `json`, `csv`, or `number`.
- a value: the payload data associated with the variant. Define this if you want to return a value other than `enabled`/`disabled`. It must correspond with the payload type.

-While teams may have different goals for measuring performance, Unleash enables you to configure a strategy for the feature variants within your application/service and the platform.

Open the gradual rollout strategy, select the **Variants** tab, and click **Add variant**. Enter a unique name for the variant. For the purpose of this tutorial, we’ve created 2 variants: `variantA` and `variantB`. In a real-world use case, we recommend more descriptive names that clearly map to the versions of the feature you’re testing. Create additional variants if you need to test more versions.

-## A/B Testing with Enterprise Security Automation

Next, decide the percentage of users to target for each variant, known as the variant weight. By default, the weight is split evenly, so each of our 2 variants targets 50% of users. For example, 50% of the users within the 35% rollout you defined earlier would experience `variantA`. Toggle **Custom percentage** to change the default variant weights.
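To build intuition for how variant weights and stickiness interact, here is a sketch of deterministic, weighted variant assignment. This is illustrative only: Unleash's SDKs handle this internally (they normalize a MurmurHash3 hash of the stickiness fields), so you never write this yourself; the tiny FNV-1a hash and function names below are stand-ins to keep the example self-contained.

```javascript
// Tiny FNV-1a hash so the sketch has no dependencies. Unleash itself uses
// MurmurHash3; the point is only that the hash is deterministic.
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

// Deterministically pick a variant: the same sessionId and groupId always
// land in the same weight bucket, which is what stickiness guarantees.
function pickVariant(sessionId, groupId, variants) {
  const totalWeight = variants.reduce((sum, v) => sum + v.weight, 0);
  const bucket = fnv1a(`${groupId}:${sessionId}`) % totalWeight;
  let cumulative = 0;
  for (const v of variants) {
    cumulative += v.weight;
    if (bucket < cumulative) return v.name;
  }
}

const variants = [
  { name: "variantA", weight: 50 },
  { name: "variantB", weight: 50 },
];

// The same inputs always produce the same assignment.
const first = pickVariant("sessionId123", "ab-testing-example", variants);
const second = pickVariant("sessionId123", "ab-testing-example", variants);
console.log(first === second); // true
```

Because the assignment is a pure function of the stickiness fields and the weights, a user keeps seeing the same variant across visits without any per-user state being stored.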
-For large-scale organizations, managing feature flags across many teams can be complex and challenging. Unleash was architected for your feature flag management to be scalable and traceable for enterprises, which boosts overall internal security posture while delivering software efficiently. +![You can configure multiple strategy variants for A/B testing within the gradual rollout form.](/img/use-case-experiment-variants.png) -After you have implemented an A/B test, we recommend managing it by: +### Manage User Session Behavior -- Tracking performance of feature releases within your application -- Reviewing audit logs of each change to your flag configurations over time by project collaborators within your organization, which is exportable for reporting -- Reviewing and approving change requests to your flags and strategy configurations +Unleash is built to give developers confidence in their ability to run A/B tests effectively. One critical component of implementing A/B testing strategies is maintaining a consistent experience for each user across multiple user sessions. -Read our documentation on how to effectively manage [feature flags at scale](/topics/feature-flags/best-practices-using-feature-flags-at-scale) while reducing security risks. Let’s walk through these recommended Unleash features in the subsequent sections. +For example, user `uuid1234` should be the target of `variantA` regardless of their session. The original subset of users that get `variantA` will continue to experience that variation of the feature over time. At Unleash, we call this [stickiness](/reference/stickiness). You can define the parameter of stickiness in the gradual rollout form. By default, stickiness is calculated by `sessionId` and `groupId`. -### Enabling Impression Data +### Track A/B Testing for your Key Performance Metrics -Once you have created a feature flag and configured your A/B test, you can use Unleash to collect insights about the ongoing results of the test. 
One way to collect this data is through enabling [impression data](/reference/impression-data#impression-event-data) per feature flag. Impression data contains information about a specific feature flag activation check. It’s important to review data from an A/B test, as this could inform you on how (and if) users interact with the feature you have released. +An A/B testing strategy is most useful when you can track the results of a feature rollout to users. When your team has clearly defined the goals for your A/B tests, you can use Unleash to analyze how results tie back to key metrics, like conversion rates or time spent on a page. One way to collect this data is by enabling [impression data](/reference/impression-data) per feature flag. Impression data contains information about a specific feature flag activation check. -Strategy variants are meant to work with impression data. You get the name of the variant to your analytics which allows you a better understanding of what happened, rather than seeing a simple true/false from your logs. +To enable impression data for your rollout, navigate to your feature flag form and turn the toggle on. -To enable impression data for your flag, navigate to your feature flag form and turn the toggle on. +![Enable impression data in the strategy rollout form for your flag.](/img/use-case-experiment-enable-impression-data.png) -Next, in your application code, use the SDK to capture the impression events as they are being emitted in real time. Follow [language and framework-specific tutorials](/languages-and-frameworks) to learn how to capture the events and send them to data analytics and warehouse platforms of your choice. +Next, in your application code, use the SDK to capture the impression events as they are being emitted in real time. +Your client SDK will emit an impression event when it calls `isEnabled` or `getVariant`. Some front-end SDKs emit impression events only when a flag is enabled. 
You can define custom event types to track specific user actions. If you want to confirm that users from your A/B test have the new feature, listen for the `isEnabled` event. If you have created variants, listen for the `getVariant` event.

-Now that the application is capturing impression events, you can configure the correct data fields and formatting to send to any analytics tool or data warehouse you use.

Strategy variants are meant to work with impression data. You get the name of the variant sent to your analytics tool, which gives you a better understanding of what happened, rather than seeing a simple true/false in your logs.

-#### Collect Event Type Data

The output from the impression data in your app may look like this code snippet:

-Your client SDK will emit an impression event when it calls `isEnabled` or `getVariant`. Some front-end SDKs emit impression events only when a flag is enabled.

```js
{
    "eventType": "getVariant",
    "eventId": "c41aa58b-d2c7-45cf-b668-7267f465e01a",
    "context": {
        "sessionId": 386689528,
        "appName": "my-example-app",
        "environment": "default"
    },
    "enabled": true,
    "featureName": "ab-testing-example",
    "impressionData": true,
    "variant": "variantA"
}
```

-You can define custom event types to track specific user actions. If you want to confirm that users from your A/B test have the new feature, Unleash will receive the `isEnabled` event. If you have created one or more variations of the same feature, known as strategy variants, the `getVariant` event type will be sent to Unleash.

To capture impression events in your app, follow our [language and framework-specific tutorials](/languages-and-frameworks).

-### Automating A/B Tests with Actions & Signals

Now that your application is capturing impression events, you can configure the correct data fields and formatting to send to any analytics tool or data warehouse you use.
-
Here are two code examples of collecting impression data in an application and sending it to Google Analytics:

Example 1

```ts
unleash.on(UnleashEvents.Impression, (e: ImpressionEvent) => {
    // send to Google Analytics, something like
    gtag("event", "screen_view", {
        app_name: e.context.appName,
        feature: e.featureName,
        treatment: e.enabled ? e.variant : "Control", // in case we use the disabled feature for control
    });
});
```

Example 2

```ts
unleash.on(UnleashEvents.Impression, (e: ImpressionEvent) => {
    if (e.enabled) {
        // send to Google Analytics, something like
        gtag("event", "screen_view", {
            app_name: e.context.appName,
            feature: e.featureName,
            treatment: e.variant, // in case we use a variant for the control treatment
        });
    }
});
```

In these example code snippets, `e` references the event object from the impression data output. Map these values into the appropriate calls to your analytics tools and data warehouses.

In some cases, like in Example 1, you may want to use the "disabled feature" state as the "Control group".
+
Alternatively, in Example 2, you can expose the feature to 100% of users and use two variants: "Control" and "Test". In either case, the variants are always used for the "Test" group. The difference is determined by how you use the "Control" group.

An advantage of having your feature disabled for the Control group is that you can use metrics to see how many users are exposed to the experiment in comparison to the ones that are not. If you use only variants (for both the test and control groups), the feature metric will show 100% exposure, and you would have to look deeper into the variants to know how many users received each treatment.

Here is an example of a payload sent to Google Analytics that includes impression event data:

```js
{
    "client_id": "unleash_client",
    "user_id": "uuid1234",
    "timestamp_micros": "1730407349525000",
    "non_personalized_ads": true,
    "events": [
        {
            "name": "select_item",
            "params": {
                "items": [],
                "event": "screen_view",
                "app_name": "myAppName",
                "feature": "myFeatureName",
                "treatment": "variantValue"
            }
        }
    ]
}
```

By enabling impression data for your feature flag and listening to events within your application code, you can leverage this data flowing to your integrated analytics tools to make informed decisions faster and adjust your strategies based on real user behavior.

### Rollout the Winning Variant to All Users

After you have implemented your A/B test and measured the performance of a feature for a subset of users, you can decide which variant provides the best experience to roll out to all users in production.

Unleash gives you control over which environments you release your feature to, when you release the feature, and to whom. Every team's release strategy may vary, but the overarching goal of A/B testing is to select the most effective experience for users, whether it be a change in your app's UI, a web performance improvement, or a backend optimization.
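Deciding on the winner usually comes down to comparing conversion rates per variant once impression and conversion events are flowing into your analytics tool. Here is a minimal sketch of that comparison with hypothetical event counts and a quick two-proportion z-test to check that the difference is unlikely to be noise; the numbers and helper names are assumptions for illustration, not an Unleash API.

```javascript
// Hypothetical aggregates pulled from your analytics tool.
const results = {
  variantA: { impressions: 1000, conversions: 130 },
  variantB: { impressions: 1000, conversions: 170 },
};

// Pick the variant with the highest observed conversion rate.
function winningVariant(results) {
  let best = null;
  for (const [name, { impressions, conversions }] of Object.entries(results)) {
    const rate = conversions / impressions;
    if (!best || rate > best.rate) best = { name, rate };
  }
  return best;
}

// Two-proportion z-test: is the difference likely real before you roll out?
// |z| > 1.96 corresponds to roughly 95% confidence for a two-sided test.
function zScore(a, b) {
  const p1 = a.conversions / a.impressions;
  const p2 = b.conversions / b.impressions;
  const pooled =
    (a.conversions + b.conversions) / (a.impressions + b.impressions);
  const se = Math.sqrt(
    pooled * (1 - pooled) * (1 / a.impressions + 1 / b.impressions)
  );
  return (p2 - p1) / se;
}

console.log(winningVariant(results).name); // "variantB"
console.log(zScore(results.variantA, results.variantB).toFixed(2)); // "2.50"
```

With these sample counts the z-score clears the 1.96 threshold, so rolling out `variantB` would be a statistically defensible call; with smaller samples or a smaller gap, you would keep the test running longer.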
+
When rolling out the winning variant, your flag may already be on in your production environment. In the Unleash Admin UI, adjust the rollout strategy configuration to release the feature to 100% of your user base.

After the flag has been available to 100% of users over time, archive the flag and clean up your codebase.

## A/B Testing with Enterprise Automation

With Unleash, you can automate your feature flags using [actions](/reference/actions) and [signals](/reference/signals). When running A/B tests, configure your projects to execute tasks in response to application metrics and thresholds you define. If an experimental feature that targets part of your user base logs errors, your actions can automatically disable the feature, giving your team time to triage while still providing a seamless, alternative experience to users. In another case, you can use actions to modify the percentage of users targeted for variations of a feature based on users engaging with one variation more than the other.

### Multi-arm Bandit Tests to Find the Winning Variant

When running complex multivariate tests with numerous combinations, automating the process of finding the best variation of a feature is an optimal, cost-effective approach for organizations with a large user base. [Multi-arm bandit tests](https://en.wikipedia.org/wiki/Multi-armed_bandit) are a powerful technique used in A/B testing to allocate traffic to different versions of a feature or application in a way that maximizes the desired outcome, such as conversion rate or click-through rate. This approach offers several advantages over traditional A/B testing and is a viable solution for large enterprise teams.

The variants you created with Unleash would be the "arms" in the multi-armed bandit context.
You can use a multi-arm bandit algorithm, such as [epsilon-greedy](https://www.geeksforgeeks.org/epsilon-greedy-algorithm-in-reinforcement-learning/) or [Thompson sampling](https://en.wikipedia.org/wiki/Thompson_sampling), to dynamically allocate traffic based on the performance of each variant: experiment with different variants to gather information, then allocate more traffic to the variants that perform better. As the test progresses, the algorithm adjusts the traffic allocation to favor the variants that show promising results. After completing the test, you can analyze the data to determine the winning variant. By dynamically allocating traffic based on performance, multi-arm bandit tests can identify the winning variant more quickly than traditional A/B testing.

![This is a graph comparing traditional A/B testing and multi-arm bandit selection.](/img/use-case-ab-testing-vs-bandit.png)

> [Image Source: Matt Gershoff](https://blog.conductrics.com/balancing-earning-with-learning-bandits-and-adaptive-optimization/)

To use Unleash to conduct a multi-arm bandit test, follow these steps:

1. Collect the necessary data on each variant’s performance by enabling impression data for your feature flag
2. Capture impression events in your application code
3. Funnel the impression events captured from your application code to an external analytics tool
4. Create [signal endpoints](/reference/signals) in Unleash and point them to your external analytics tools
5. Create [actions](/reference/actions) in Unleash that can react to your signals

Learn how to configure [actions](/reference/actions) and [signals](/reference/signals) from our documentation to get started.

This approach minimizes the "regret" associated with allocating traffic to lower-performing variants. Multi-arm bandit tests using Unleash can adapt to changing conditions, such as seasonal fluctuations or user behavior changes.
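To make the allocation loop concrete, here is a minimal epsilon-greedy sketch. It is illustrative only, not an Unleash feature: the arm names and reward bookkeeping are assumptions, and in practice the rewards would come from your analytics events while the resulting traffic split would be applied by updating your strategy variant weights in Unleash.

```javascript
// Minimal epsilon-greedy sketch: with probability epsilon, explore a random
// arm; otherwise exploit the arm with the best observed conversion rate.
// The `random` parameter is injectable so the demo below is deterministic.
function createBandit(arms, epsilon = 0.1, random = Math.random) {
  const stats = new Map(arms.map((arm) => [arm, { plays: 0, rewards: 0 }]));

  function rateOf(arm) {
    const s = stats.get(arm);
    return s.plays === 0 ? 0 : s.rewards / s.plays;
  }

  return {
    // Pick which variant ("arm") the next user should see.
    selectArm() {
      if (random() < epsilon) {
        return arms[Math.floor(random() * arms.length)]; // explore
      }
      return arms.reduce((best, arm) =>
        rateOf(arm) > rateOf(best) ? arm : best
      ); // exploit
    },
    // Record an outcome for an arm (1 = converted, 0 = did not convert).
    recordReward(arm, reward) {
      const s = stats.get(arm);
      s.plays += 1;
      s.rewards += reward;
    },
    rateOf,
  };
}

// Deterministic demo: variantB converts more often, so exploitation favors it.
const bandit = createBandit(["variantA", "variantB"], 0.1, () => 0.5);
bandit.recordReward("variantA", 1);
bandit.recordReward("variantA", 0);
bandit.recordReward("variantB", 1);
console.log(bandit.selectArm()); // "variantB"
```

In a production setup you would periodically recompute the split from fresh analytics data and push the new weights to the corresponding Unleash variants, rather than routing traffic in application code.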
In some cases, they can be used to ensure that users are not exposed to suboptimal experiences for extended periods. + +A/B tests are performed safely and strategically with extra safeguards when you automate your flags based on user activity and other metrics of your choice. diff --git a/website/static/img/tutorial-building-blocks-strategy-variants.png b/website/static/img/tutorial-building-blocks-strategy-variants.png new file mode 100644 index 000000000000..e1cfd4ecaa29 Binary files /dev/null and b/website/static/img/tutorial-building-blocks-strategy-variants.png differ diff --git a/website/static/img/use-case-ab-testing-vs-bandit.png b/website/static/img/use-case-ab-testing-vs-bandit.png new file mode 100644 index 000000000000..58283439e6d8 Binary files /dev/null and b/website/static/img/use-case-ab-testing-vs-bandit.png differ diff --git a/website/static/img/use-case-create-experiment-flag.png b/website/static/img/use-case-create-experiment-flag.png new file mode 100644 index 000000000000..623f218daed5 Binary files /dev/null and b/website/static/img/use-case-create-experiment-flag.png differ diff --git a/website/static/img/use-case-experiment-add-strategy.png b/website/static/img/use-case-experiment-add-strategy.png new file mode 100644 index 000000000000..bc498afd5207 Binary files /dev/null and b/website/static/img/use-case-experiment-add-strategy.png differ diff --git a/website/static/img/use-case-experiment-enable-impression-data.png b/website/static/img/use-case-experiment-enable-impression-data.png new file mode 100644 index 000000000000..d4f1e2357430 Binary files /dev/null and b/website/static/img/use-case-experiment-enable-impression-data.png differ diff --git a/website/static/img/use-case-experiment-gradual-rollout.png b/website/static/img/use-case-experiment-gradual-rollout.png new file mode 100644 index 000000000000..d87b950b5ea2 Binary files /dev/null and b/website/static/img/use-case-experiment-gradual-rollout.png differ diff --git 
a/website/static/img/use-case-experiment-variants.png b/website/static/img/use-case-experiment-variants.png new file mode 100644 index 000000000000..6ed1a41e5135 Binary files /dev/null and b/website/static/img/use-case-experiment-variants.png differ diff --git a/website/static/img/use-case-new-flag.png b/website/static/img/use-case-new-flag.png new file mode 100644 index 000000000000..6904b7aa7501 Binary files /dev/null and b/website/static/img/use-case-new-flag.png differ