Redis
rolodato committed Dec 26, 2024
1 parent 56e8a9f commit 2674b9a
Showing 4 changed files with 161 additions and 175 deletions.
5 changes: 2 additions & 3 deletions docs/docs/deployment/hosting/real-time/benchmark.js
@@ -3,7 +3,6 @@

import { sleep } from 'k6';
import http from 'k6/http';
import exec from 'k6/execution';

export const options = {
discardResponseBodies: true,
@@ -12,7 +11,7 @@ export const options = {
subscribe: {
exec: 'subscribers',
executor: 'ramping-vus',
stages: [{ duration: '1m', target: 20000 }],
stages: [{ duration: '1m', target: 10000 }],
},
// Publish an update to the same environment every 10s
publish: {
@@ -36,7 +35,7 @@ export function subscribers() {

export function publish() {
const body = JSON.stringify({
updated_at: exec.vu.iterationInScenario,
updated_at: new Date().toISOString(),
});
http.post(`http://localhost:8088/sse/environments/${env}/queue-change`, body, {
headers: {
125 changes: 97 additions & 28 deletions docs/docs/deployment/hosting/real-time/deployment.md
@@ -4,50 +4,46 @@ sidebar_label: Deployment
sidebar_position: 30
---

## NATS with JetStream
## Redis

First, deploy NATS with JetStream enabled. NATS nodes can be deployed in
[many different ways](https://docs.nats.io/nats-concepts/overview). If you don't require high availability for real-time
flags, a single NATS can support tens of thousands of clients.
First, deploy a Redis-compatible store, such as [Valkey](https://valkey.io/). Many managed options for Redis-compatible
stores are offered by different cloud providers. Some examples:

The only required configuration is to enable [JetStream](https://docs.nats.io/nats-concepts/jetstream). In the
[NATS Helm chart](https://github.com/nats-io/k8s/tree/main/helm/charts/nats), set `config.jetstream` to `true`.
- [Amazon ElastiCache (AWS)](https://aws.amazon.com/elasticache/features/)
- [Memorystore for Valkey (GCP)](https://cloud.google.com/memorystore/docs/valkey/product-overview)
- [Azure cache for Redis](https://azure.microsoft.com/en-us/products/cache)

If you are using the [NATS CLI](https://github.com/nats-io/natscli) to launch NATS, use the `-js` or `--jetstream` flag:

```
nats-server --jetstream
```

See the [JetStream docs](https://docs.nats.io/running-a-nats-service/configuration/resource_management) for more
details.
Clustering Redis is recommended for high availability, but not required. All
[current Redis versions](https://redis.io/docs/latest/operate/rs/installing-upgrading/product-lifecycle/) are supported.

### Authentication

By default, NATS allows reading or writing to any subject without authentication. Several
[authentication methods](https://docs.nats.io/running-a-nats-service/configuration/securing_nats/auth_intro) can be
enabled if required.

The SSE service can only authenticate to NATS using URL-based methods.
[Token authentication](https://docs.nats.io/running-a-nats-service/configuration/securing_nats/auth_intro/tokens) is the
simplest method if you don't have other requirements.
Redis must not require a password for the default user. Options for authentication will be added in the future.

## SSE service

Run the `flagsmith/sse:v4.0.0-beta` image, setting thse environment variables:
Run the `flagsmith/sse` image, setting these environment variables:

| Variable name | Description | Default |
| ------------------------------- | ------------------------------------------------------------------------------------------------------------------------------- | ------------ |
| `SSE_AUTHENTICATION_TOKEN` | Shared secret for authentication on the `/queue-change` endpoint | **Required** |
| `REDIS_HOST` | Hostname of the Redis load balancer | `localhost` |
| `REDIS_PORT` | Port number to use when connecting to Redis | 6379 |
| `REDIS_SCAN_COUNT` | Number of Redis keys to [`SCAN`](https://redis.io/docs/latest/commands/scan/) at once when updating the in-memory cache | 500 |
| `CACHE_UPDATE_INTERVAL_SECONDS` | How long (in seconds) to wait between each `SCAN` to update the in-memory cache | 1 |
| `USE_CLUSTER_MODE` | Whether to connect to Redis with [Cluster Mode](https://redis.io/docs/latest/operate/oss_and_stack/management/scaling/) enabled | `False` |
| `REDIS_USE_SSL` | Whether to connect to Redis with SSL | `False` |
| `MAX_STREAM_AGE` | How long (in seconds) to keep SSE connections alive for. If negative, connections are kept open indefinitely | 30 |
| `STREAM_DELAY` | How long (in seconds) to wait before checking the internal cache for updates | 1 |

| Variable name | Description | Default |
| -------------------------- | ---------------------------------------------------------------- | ----------------------- |
| `NATS_URL` | URL of any NATS node | `nats://127.0.0.1:4222` |
| `LISTEN_ADDR` | Addresses to listen for HTTP connections on | `:8088` |
| `SSE_AUTHENTICATION_TOKEN` | Shared secret for authentication on the `/queue-change` endpoint | **Required** |
The SSE service will expose its endpoints to `0.0.0.0:8000`.

## API and task processor

The Flagsmith API and task processor need to know about the SSE service. On both the API and task processor, set these
environment variables:

- `SSE_SERVER_BASE_URL` points to the SSE service load balancer. For example: http://my-sse-service:8088
- `SSE_SERVER_BASE_URL` points to the SSE service load balancer. For example: `http://my-sse-service:8000`
- `SSE_AUTHENTICATION_TOKEN` can be set to any non-empty string, as long as the SSE service and task processor share the
same value.

@@ -58,3 +54,76 @@ its environments are updated.

Lastly, client applications should set their Flagsmith SDK's realtime endpoint URL to the load balancer for the SSE
service.

## Example: Docker Compose

The following Docker Compose file defines a simple Flagsmith deployment. The highlighted lines are required to support
real-time flag updates.

```yaml title="compose.yaml"
services:
# highlight-start
valkey:
image: valkey/valkey:latest
# highlight-end

# highlight-start
sse:
image: flagsmith/sse:3.3.0
environment:
SSE_AUTHENTICATION_TOKEN: changeme
REDIS_HOST: valkey
depends_on:
- valkey
# highlight-end

flagsmith:
# highlight-start
image: flagsmith/flagsmith-private-cloud:latest
# highlight-end
environment:
# highlight-start
SSE_AUTHENTICATION_TOKEN: changeme
SSE_SERVER_BASE_URL: 'http://sse:8000'
# highlight-end
DATABASE_URL: postgresql://postgres:password@postgres:5432/flagsmith
USE_POSTGRES_FOR_ANALYTICS: 'true'
ENVIRONMENT: production
DJANGO_ALLOWED_HOSTS: '*'
ALLOW_ADMIN_INITIATION_VIA_CLI: 'true'
FLAGSMITH_DOMAIN: 'localhost:8000'
DJANGO_SECRET_KEY: secret
ENABLE_ADMIN_ACCESS_USER_PASS: 'true'
TASK_RUN_METHOD: TASK_PROCESSOR
ports:
- '8000:8000'
depends_on:
- postgres

# The flagsmith_processor service is only needed if TASK_RUN_METHOD set to TASK_PROCESSOR
# in the application environment
flagsmith_processor:
image: flagsmith/flagsmith:latest
environment:
# highlight-start
SSE_AUTHENTICATION_TOKEN: changeme
SSE_SERVER_BASE_URL: 'http://sse:8000'
# highlight-end
DATABASE_URL: postgresql://postgres:password@postgres:5432/flagsmith
USE_POSTGRES_FOR_ANALYTICS: 'true'
depends_on:
- flagsmith
command: run-task-processor

postgres:
image: postgres:15.5-alpine
environment:
POSTGRES_PASSWORD: password
POSTGRES_DB: flagsmith
container_name: flagsmith_postgres
volumes:
- pgdata:/var/lib/postgresql/data

volumes:
pgdata:
```
84 changes: 30 additions & 54 deletions docs/docs/deployment/hosting/real-time/index.md
@@ -22,103 +22,79 @@ We assume you already have the [Flagsmith API](/deployment/hosting/locally-api.m
## Infrastructure

The **real-time flag updates system** is supported by additional infrastructure that your existing Flagsmith deployment
integrates will use:
integrates with:

- **Server-sent events (SSE) service containers**, running the private
[`flagsmith/sse`](https://hub.docker.com/repository/docker/flagsmith/sse) Docker image (tag `v4.0.0-beta` or later).
These serve the real-time endpoint that Flagsmith clients can connect to, and receive updates from the
[`flagsmith/sse`](https://hub.docker.com/repository/docker/flagsmith/sse) Docker image. These serve the real-time
endpoint that Flagsmith clients can connect to, and receive updates from the
[task processor](/deployment/configuration/task-processor).
- A **[NATS](https://docs.nats.io/)** cluster with [JetStream](https://docs.nats.io/nats-concepts/jetstream) persistent
storage, which guarantees at-least-once delivery for updates.
[What is NATS\?](https://docs.nats.io/nats-concepts/what-is-nats)
- A Redis-compatible key-value store that the [task processor](/deployment/configuration/task-processor.md) and SSE
service can connect to.

This diagram shows how all the components initiate connections to each other:

```mermaid
graph LR
api[Task processor]
sse[SSE service]
nats[NATS cluster]
nats[Redis]
client1[Client]
client2[Client]
api -->|Publish| sse
sse -->|Publish| nats
sse -->|Subscribe| nats
sse -->|Fetch| nats
client1 -->|Connect| sse
client2 -->|Connect| sse
```

The task processor publishes to the SSE service instead of NATS to support a previous architecture for real-time flags
that did not use NATS. We may add the option to have the task processor or Flagsmith API publish directly to NATS in the
future.

## How it works

Real-time flags use a fully distributed and horizontally scalable architecture. Any SSE service instance or NATS server
can respond to any client's request. All components can be scaled out or in as needed. Stateful or sticky sessions are
not used.
Real-time flags use a fully distributed and horizontally scalable architecture. Any SSE service instance can respond to
any client's request. All components can be scaled out or in as needed. Stateful or sticky sessions are not used.

The following sequence diagram shows how a Flagsmith client application connects to the real-time updates stream and
receives messages when the environment is updated.

```mermaid
sequenceDiagram
loop Every second, background
SSE service->>Redis: Fetch update timestamps for all environments
SSE service-->SSE service: Update in-memory cache
end
Client->>SSE service: Subscribe to environment updates
SSE service->>Client: Send latest known update timestamp
SSE service->>NATS: Subscribe to environment's subject
SSE service->>Client: Send latest update timestamp
loop Every second, per subscriber
SSE service-->SSE service: Read latest update from in-memory cache
SSE service->>Client: Send latest update timestamp, if changed
Client-->Client: Store latest update timestamp
end
Task processor->>SSE service: Notify environment updated
SSE service->>NATS: Publish to environment's subject
NATS->>SSE service: Receive environment message
SSE service->>Client: Notify environment updated
SSE service->>NATS: Acknowledge update message
Client-->Client: Store latest update timestamp
SSE service->>Redis: Store latest update timestamp
```

### SSE service

The **server-sent events (SSE)** service provides the real-time API endpoints that Flagsmith clients connect to. Clients
connect to any service instance and hold an HTTP connection open for as long as they want to receive updates over SSE.

This service also accepts HTTP requests from the Flagsmith task processor to get notified of environment updates. NATS
is used as the messaging system to ensure these updates are distributed to clients.
This service also accepts HTTP requests from the Flagsmith task processor to get notified of environment updates. Redis
is used as the storage layer to distribute these updates to clients.

[HTTP/2 is recommended](https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events/Using_server-sent_events)
for client connections, especially if the clients are web browsers. h2c (HTTP/2 over plaintext TCP) is supported, but
TLS is strongly recommended for performance and security.

### NATS

NATS persists a subject for each environment using a JetStream stream. SSE service instances subscribe and publish to
these subjects and fan out the updates over SSE to the relevant connected clients.

Persistent storage is required to guarantee at-least-once delivery. In practice, this is very little storage as messages
are small (~100 bytes) and persist in the stream for a short time.

While NATS can function without persistent storage, the SSE service requires it. This allows us to support use cases
such as emergency circuit breakers with higher reliability than if we were using only Core NATS, and mitigates the
impact of any SSE instance suddenly becoming unavailable.
for client connections, especially if the clients are web browsers.

<details>
### Redis

<summary>
Redis stores a key for each environment containing a timestamp of when it was last updated. SSE service instances will
periodically fetch all environment's keys and cache them in memory. This cache is used to publish update notifications
to connected clients.

What does "at-least-once delivery" mean exactly?

</summary>

NATS provides an at-least-once delivery guarantee only for the SSE service. If NATS acknowledges a write, all SSE
service instances with clients subscribed to that environment are guaranteed to eventually receive the update at least
once. This guarantee does not extend to the clients of the SSE service.

Each SSE service instance creates one NATS consumer per connected client. Messages are acknowledged to NATS only if the
SSE service was able to write out the response for that client over TCP. This does not guarantee that the client
actually received the message, e.g. if intermediate proxies accept these messages but do not deliver them to clients.

</details>
If Redis or its stored data are unavailable, clients will not be able to receive updates.
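
The polling design described in the Redis section and the sequence diagram above can be sketched roughly as follows. This is only an illustration in JavaScript (the language of `benchmark.js`), not the SSE service's actual code: a `Map` named `store` stands in for Redis, and the names `cache`, `refreshCache` and `nextUpdate` are hypothetical.

```javascript
// Hypothetical stand-in for Redis: environment key -> last-updated timestamp.
const store = new Map();

// In-memory cache held by a single SSE service instance.
const cache = new Map();

// Background step ("Every second, background" in the diagram):
// copy every environment's timestamp from the store into the cache.
function refreshCache() {
  for (const [env, updatedAt] of store) {
    cache.set(env, updatedAt);
  }
}

// Per-subscriber step ("Every second, per subscriber" in the diagram):
// return the cached timestamp only if it differs from the last value sent
// to this subscriber; otherwise return null, meaning nothing is sent.
function nextUpdate(subscriber) {
  const latest = cache.get(subscriber.env);
  if (latest === undefined || latest === subscriber.lastSent) {
    return null;
  }
  subscriber.lastSent = latest;
  return latest;
}
```

In this sketch, a subscriber such as `{ env: 'env1', lastSent: undefined }` only sees a new timestamp after `refreshCache()` has run, which mirrors why clients stop receiving updates when the store is unavailable: the cache is simply never refreshed.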

## How to use it

Refer to the [deployment guide](deployment) for instructions on setting up the required infrastructure.

The `flagsmith/sse` service provides the following HTTP endpoints:

| Method | Route | Called by | Description | Authentication |