Redis

Flagsmith · Dec 26, 2024 · 2674b9a · 2674b9a
1 parent 56e8a9f
commit 2674b9a
Show file tree

Hide file tree

Showing 4 changed files with 161 additions and 175 deletions.
diff --git a/docs/docs/deployment/hosting/real-time/benchmark.js b/docs/docs/deployment/hosting/real-time/benchmark.js
@@ -3,7 +3,6 @@
 
 import { sleep } from 'k6';
 import http from 'k6/http';
-import exec from 'k6/execution';
 
 export const options = {
     discardResponseBodies: true,
@@ -12,7 +11,7 @@ export const options = {
         subscribe: {
             exec: 'subscribers',
             executor: 'ramping-vus',
-            stages: [{ duration: '1m', target: 20000 }],
+            stages: [{ duration: '1m', target: 10000 }],
         },
         // Publish an update to the same environment every 10s
         publish: {
@@ -36,7 +35,7 @@ export function subscribers() {
 
 export function publish() {
     const body = JSON.stringify({
-        updated_at: exec.vu.iterationInScenario,
+        updated_at: new Date().toISOString(),
     });
     http.post(`http://localhost:8088/sse/environments/${env}/queue-change`, body, {
         headers: {

diff --git a/docs/docs/deployment/hosting/real-time/deployment.md b/docs/docs/deployment/hosting/real-time/deployment.md
@@ -4,50 +4,46 @@ sidebar_label: Deployment
 sidebar_position: 30
 ---
 
-## NATS with JetStream
+## Redis
 
-First, deploy NATS with JetStream enabled. NATS nodes can be deployed in
-[many different ways](https://docs.nats.io/nats-concepts/overview). If you don't require high availability for real-time
-flags, a single NATS can support tens of thousands of clients.
+First, deploy a Redis-compatible store, such as [Valkey](https://valkey.io/). Many managed options for Redis-compatible
+stores are offered by different cloud providers. Some examples:
 
-The only required configuration is to enable [JetStream](https://docs.nats.io/nats-concepts/jetstream). In the
-[NATS Helm chart](https://github.com/nats-io/k8s/tree/main/helm/charts/nats), set `config.jetstream` to `true`.
+- [Amazon ElastiCache (AWS)](https://aws.amazon.com/elasticache/features/)
+- [Memorystore for Valkey (GCP)](https://cloud.google.com/memorystore/docs/valkey/product-overview)
+- [Azure cache for Redis](https://azure.microsoft.com/en-us/products/cache)
 
-If you are using the [NATS CLI](https://github.com/nats-io/natscli) to launch NATS, use the `-js` or `--jetstream` flag:
-
-```
-nats-server --jetstream
-```
-
-See the [JetStream docs](https://docs.nats.io/running-a-nats-service/configuration/resource_management) for more
-details.
+Clustering Redis is recommended for high availability, but not required. All
+[current Redis versions](https://redis.io/docs/latest/operate/rs/installing-upgrading/product-lifecycle/) are supported.
 
 ### Authentication
 
-By default, NATS allows reading or writing to any subject without authentication. Several
-[authentication methods](https://docs.nats.io/running-a-nats-service/configuration/securing_nats/auth_intro) can be
-enabled if required.
-
-The SSE service can only authenticate to NATS using URL-based methods.
-[Token authentication](https://docs.nats.io/running-a-nats-service/configuration/securing_nats/auth_intro/tokens) is the
-simplest method if you don't have other requirements.
+Redis must not require a password for the default user. Options for authentication will be added in the future.
 
 ## SSE service
 
-Run the `flagsmith/sse:v4.0.0-beta` image, setting thse environment variables:
+Run the `flagsmith/sse` image, setting these environment variables:
+
+| Variable name                   | Description                                                                                                                     | Default      |
+| ------------------------------- | ------------------------------------------------------------------------------------------------------------------------------- | ------------ |
+| `SSE_AUTHENTICATION_TOKEN`      | Shared secret for authentication on the `/queue-change` endpoint                                                                | **Required** |
+| `REDIS_HOST`                    | Hostname of the Redis load balancer                                                                                             | `localhost`  |
+| `REDIS_PORT`                    | Port number to use when connecting to Redis                                                                                     | 6379         |
+| `REDIS_SCAN_COUNT`              | Number of Redis keys to [`SCAN`](https://redis.io/docs/latest/commands/scan/) at once when updating the in-memory cache         | 500          |
+| `CACHE_UPDATE_INTERVAL_SECONDS` | How long (in seconds) to wait between each `SCAN` to update the in-memory cache                                                 | 1            |
+| `USE_CLUSTER_MODE`              | Whether to connect to Redis with [Cluster Mode](https://redis.io/docs/latest/operate/oss_and_stack/management/scaling/) enabled | `False`      |
+| `REDIS_USE_SSL`                 | Whether to connect to Redis with SSL                                                                                            | `False`      |
+| `MAX_STREAM_AGE`                | How long (in seconds) to keep SSE connections alive for. If negative, connections are kept open indefinitely                    | 30           |
+| `STREAM_DELAY`                  | How long (in seconds) to wait before checking the internal cache for updates                                                    | 1            |
 
-| Variable name              | Description                                                      | Default                 |
-| -------------------------- | ---------------------------------------------------------------- | ----------------------- |
-| `NATS_URL`                 | URL of any NATS node                                             | `nats://127.0.0.1:4222` |
-| `LISTEN_ADDR`              | Addresses to listen for HTTP connections on                      | `:8088`                 |
-| `SSE_AUTHENTICATION_TOKEN` | Shared secret for authentication on the `/queue-change` endpoint | **Required**            |
+The SSE service will expose its endpoints to `0.0.0.0:8000`.
 
 ## API and task processor
 
 The Flagsmith API and task processor need to know about the SSE service. On both the API and task processor, set these
 environment variables:
 
-- `SSE_SERVER_BASE_URL` points to the SSE service load balancer. For example: http://my-sse-service:8088
+- `SSE_SERVER_BASE_URL` points to the SSE service load balancer. For example: `http://my-sse-service:8000`
 - `SSE_AUTHENTICATION_TOKEN` can be set to any non-empty string, as long as the SSE service and task processor share the
   same value.
 
@@ -58,3 +54,76 @@ its environments are updated.
 
 Lastly, client applications should set their Flagsmith SDK's realtime endpoint URL to the load balancer for the SSE
 service.
+
+## Example: Docker Compose
+
+The following Docker Compose file defines a simple Flagsmith deployment. The highlighted lines are required to support
+real-time flag updates.
+
+```yaml title="compose.yaml"
+services:
+ # highlight-start
+ valkey:
+  image: valkey/valkey:latest
+  # highlight-end
+
+  # highlight-start
+ sse:
+  image: flagsmith/sse:3.3.0
+  environment:
+   SSE_AUTHENTICATION_TOKEN: changeme
+   REDIS_HOST: valkey
+  depends_on:
+   - valkey
+ # highlight-end
+
+ flagsmith:
+  # highlight-start
+  image: flagsmith/flagsmith-private-cloud:latest
+  # highlight-end
+  environment:
+   # highlight-start
+   SSE_AUTHENTICATION_TOKEN: changeme
+   SSE_SERVER_BASE_URL: 'http://sse:8000'
+   # highlight-end
+   DATABASE_URL: postgresql://postgres:password@postgres:5432/flagsmith
+   USE_POSTGRES_FOR_ANALYTICS: 'true'
+   ENVIRONMENT: production
+   DJANGO_ALLOWED_HOSTS: '*'
+   ALLOW_ADMIN_INITIATION_VIA_CLI: 'true'
+   FLAGSMITH_DOMAIN: 'localhost:8000'
+   DJANGO_SECRET_KEY: secret
+   ENABLE_ADMIN_ACCESS_USER_PASS: 'true'
+   TASK_RUN_METHOD: TASK_PROCESSOR
+  ports:
+   - '8000:8000'
+  depends_on:
+   - postgres
+
+ # The flagsmith_processor service is only needed if TASK_RUN_METHOD set to TASK_PROCESSOR
+ # in the application environment
+ flagsmith_processor:
+  image: flagsmith/flagsmith:latest
+  environment:
+   # highlight-start
+   SSE_AUTHENTICATION_TOKEN: changeme
+   SSE_SERVER_BASE_URL: 'http://sse:8000'
+   # highlight-end
+   DATABASE_URL: postgresql://postgres:password@postgres:5432/flagsmith
+   USE_POSTGRES_FOR_ANALYTICS: 'true'
+  depends_on:
+   - flagsmith
+  command: run-task-processor
+
+ postgres:
+  image: postgres:15.5-alpine
+  environment:
+   POSTGRES_PASSWORD: password
+   POSTGRES_DB: flagsmith
+  container_name: flagsmith_postgres
+  volumes:
+   - pgdata:/var/lib/postgresql/data
+
+volumes:
+ pgdata:
+```
diff --git a/docs/docs/deployment/hosting/real-time/index.md b/docs/docs/deployment/hosting/real-time/index.md
@@ -22,103 +22,79 @@ We assume you already have the [Flagsmith API](/deployment/hosting/locally-api.m
 ## Infrastructure
 
 The **real-time flag updates system** is supported by additional infrastructure that your existing Flagsmith deployment
-integrates will use:
+integrates with:
 
 - **Server-sent events (SSE) service containers**, running the private
-  [`flagsmith/sse`](https://hub.docker.com/repository/docker/flagsmith/sse) Docker image (tag `v4.0.0-beta` or later).
-  These serve the real-time endpoint that Flagsmith clients can connect to, and receive updates from the
+  [`flagsmith/sse`](https://hub.docker.com/repository/docker/flagsmith/sse) Docker image. These serve the real-time
+  endpoint that Flagsmith clients can connect to, and receive updates from the
   [task processor](/deployment/configuration/task-processor).
-- A **[NATS](https://docs.nats.io/)** cluster with [JetStream](https://docs.nats.io/nats-concepts/jetstream) persistent
-  storage, which guarantees at-least-once delivery for updates.
-  [What is NATS\?](https://docs.nats.io/nats-concepts/what-is-nats)
+- A Redis-compatible key-value store that the [task processor](/deployment/configuration/task-processor.md) and SSE
+  service can connect to.
 
 This diagram shows how all the components initiate connections to each other:
 
 ```mermaid
 graph LR
     api[Task processor]
     sse[SSE service]
-    nats[NATS cluster]
+    nats[Redis]
     client1[Client]
     client2[Client]
 
     api -->|Publish| sse
-    sse -->|Publish| nats
-    sse -->|Subscribe| nats
+    sse -->|Fetch| nats
     client1 -->|Connect| sse
     client2 -->|Connect| sse
 ```
 
-The task processor publishes to the SSE service instead of NATS to support a previous architecture for real-time flags
-that did not use NATS. We may add the option to have the task processor or Flagsmith API publish directly to NATS in the
-future.
-
 ## How it works
 
-Real-time flags use a fully distributed and horizontally scalable architecture. Any SSE service instance or NATS server
-can respond to any client's request. All components can be scaled out or in as needed. Stateful or sticky sessions are
-not used.
+Real-time flags use a fully distributed and horizontally scalable architecture. Any SSE service instance can respond to
+any client's request. All components can be scaled out or in as needed. Stateful or sticky sessions are not used.
 
 The following sequence diagram shows how a Flagsmith client application connects to the real-time updates stream and
 receives messages when the environment is updated.
 
 ```mermaid
 sequenceDiagram
+    loop Every second, background
+      SSE service->>Redis: Fetch update timestamps for all environments
+      SSE service-->SSE service: Update in-memory cache
+    end
     Client->>SSE service: Subscribe to environment updates
-    SSE service->>Client: Send latest known update timestamp
-    SSE service->>NATS: Subscribe to environment's subject
+    SSE service->>Client: Send latest update timestamp
+    loop Every second, per subscriber
+      SSE service-->SSE service: Read latest update from in-memory cache
+      SSE service->>Client: Send latest update timestamp, if changed
+      Client-->Client: Store latest update timestamp
+    end
     Task processor->>SSE service: Notify environment updated
-    SSE service->>NATS: Publish to environment's subject
-    NATS->>SSE service: Receive environment message
-    SSE service->>Client: Notify environment updated
-    SSE service->>NATS: Acknowledge update message
-    Client-->Client: Store latest update timestamp
+    SSE service->>Redis: Store latest update timestamp
 ```
 
 ### SSE service
 
 The **server-sent events (SSE)** service provides the real-time API endpoints that Flagsmith clients connect to. Clients
 connect to any service instance and hold an HTTP connection open for as long as they want to receive updates over SSE.
 
-This service also accepts HTTP requests from the Flagsmith task processor to get notified of environment updates. NATS
-is used as the messaging system to ensure these updates are distributed to clients.
+This service also accepts HTTP requests from the Flagsmith task processor to get notified of environment updates. Redis
+is used as the storage layer to distribute these updates to clients.
 
 [HTTP/2 is recommended](https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events/Using_server-sent_events)
-for client connections, especially if the clients are web browsers. h2c (HTTP/2 over plaintext TCP) is supported, but
-TLS is strongly recommended for performance and security.
-
-### NATS
-
-NATS persists a subject for each environment using a JetStream stream. SSE service instances subscribe and publish to
-these subjects and fan out the updates over SSE to the relevant connected clients.
-
-Persistent storage is required to guarantee at-least-once delivery. In practice, this is very little storage as messages
-are small (~100 bytes) and persist in the stream for a short time.
-
-While NATS can function without persistent storage, the SSE service requires it. This allows us to support use cases
-such as emergency circuit breakers with higher reliability than if we were using only Core NATS, and mitigates the
-impact of any SSE instance suddenly becoming unavailable.
+for client connections, especially if the clients are web browsers.
 
-<details>
+### Redis
 
-<summary>
+Redis stores a key for each environment containing a timestamp of when it was last updated. SSE service instances will
+periodically fetch all environment's keys and cache them in memory. This cache is used to publish update notifications
+to connected clients.
 
-What does "at-least-once delivery" mean exactly?
-
-</summary>
-
-NATS provides an at-least-once delivery guarantee only for the SSE service. If NATS acknowledges a write, all SSE
-service instances with clients subscribed to that environment are guaranteed to eventually receive the update at least
-once. This guarantee does not extend to the clients of the SSE service.
-
-Each SSE service instance creates one NATS consumer per connected client. Messages are acknowledged to NATS only if the
-SSE service was able to write out the response for that client over TCP. This does not guarantee that the client
-actually received the message, e.g. if intermediate proxies accept these messages but do not deliver them to clients.
-
-</details>
+If Redis or its stored data are unavailable, clients will not be able to receive updates.
 
 ## How to use it
 
+Refer to the [deployment guide](deployment) for instructions on setting up the required infrastructure.
+
 The `flagsmith/sse` service provides the following HTTP endpoints:
 
 | Method | Route                                          | Called by           | Description                                                    | Authentication                   |