docs: add sops (#147)
Signed-off-by: flakey5 <73616808+flakey5@users.noreply.github.com>
Co-authored-by: Brian Muenzenmeyer <brian.muenzenmeyer@gmail.com>
flakey5 and bmuenzenmeyer authored Dec 22, 2024
1 parent ccc7553 commit 643077a
Showing 13 changed files with 322 additions and 38 deletions.
6 changes: 4 additions & 2 deletions CONTRIBUTING.md
@@ -108,7 +108,9 @@ The steps below will give you a general idea of how to prepare your local enviro
npm run test:e2e
```
10. Once you're happy with your changes, add and commit them to your branch, then push the branch to your fork.
10. To run the worker locally, see [Dev Setup](./docs/dev-setup.md).
11. Once you're happy with your changes, add and commit them to your branch, then push the branch to your fork.

```bash
git add .
@@ -119,7 +121,7 @@ The steps below will give you a general idea of how to prepare your local enviro
> [!IMPORTANT]\
> Before committing and opening a Pull Request, please go first through our [Commit](#commit-guidelines) and [Pull Request](#pull-request-policy) guidelines outlined below.

11. Create a Pull Request.
12. Create a Pull Request.

### CLI Commands

12 changes: 12 additions & 0 deletions docs/README.md
@@ -0,0 +1,12 @@
# Documentation

Documentation for the Release Worker.

## Table of Contents

- [Architecture](./architecture.md)
- [Dev Setup](./dev-setup.md)
- [Debugging Production](./debugging-prod.md)
- [Deploying](./deploying.md)
- [R2](./r2.md)
- [Node.js Release Process](./release-process.md)
55 changes: 55 additions & 0 deletions docs/architecture.md
@@ -0,0 +1,55 @@
# Architecture

Documentation on the architecture of the worker (i.e. how it works, how it fits into Node.js' infrastructure, etc.).

## Network Request Flow

A high-level overview of how a request flows through Node.js' infrastructure:

```mermaid
flowchart LR
request[Request] --> cloudflare(Cloudflare Routing Rules)
cloudflare -- /dist/, /download/, /docs/, /api/, /metrics/ --> worker@{ shape: procs, label: "Release Worker"}
cloudflare -- /... --> website(Website)
worker -- Cache miss --> r2[(R2 bucket)]
worker -- Error --> originServer(Origin Server)
originServer
website
r2
```
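
The routing split in the diagram above can be sketched as a prefix check. This is only an illustration: the real routing is configured in Cloudflare rather than written as code, and the names `WORKER_PREFIXES` and `routeFor` are hypothetical.

```typescript
// Path prefixes that the diagram above routes to the Release Worker.
const WORKER_PREFIXES: string[] = ["/dist/", "/download/", "/docs/", "/api/", "/metrics/"];

type Target = "release-worker" | "website";

// Hypothetical helper: decides which service a request path would reach.
function routeFor(pathname: string): Target {
  const matchesWorker = WORKER_PREFIXES.some((prefix) => pathname.startsWith(prefix));
  return matchesWorker ? "release-worker" : "website";
}
```

For example, `routeFor("/dist/index.json")` selects the Release Worker, while a path outside the listed prefixes falls through to the website.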

## Worker Request Flow

The Release Worker uses a middleware approach to routing requests.

When an instance of the worker starts up, it registers a number of routes and their middlewares.
It then builds a "chain" of middlewares to call in the same order they're given to handle the request.

When a request hits the worker, the router gives it to the first middleware in the chain.
That middleware can then either handle the request and return a response or pass it on to the next middleware.
This goes on until the request is handled or we run out of middlewares to handle the request, at which point we throw an error.
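
A minimal sketch of this chaining, assuming simplified request and response shapes. Every name here (`SketchRequest`, `SketchResponse`, `buildChain`, the stand-in middlewares) is hypothetical and not taken from the worker's source, and the sketch is synchronous for brevity where a real worker would be asynchronous.

```typescript
// Hypothetical, simplified shapes; the worker's real types are not shown in these docs.
interface SketchRequest {
  method: string;
  path: string;
}

interface SketchResponse {
  status: number;
  body: string;
}

// A middleware either returns a response or calls next() to defer to the next one.
type Middleware = (req: SketchRequest, next: () => SketchResponse) => SketchResponse;

// Builds a handler that tries each middleware in the order given.
function buildChain(middlewares: Middleware[]): (req: SketchRequest) => SketchResponse {
  return (req) => {
    const dispatch = (index: number): SketchResponse => {
      const current = middlewares[index];
      if (current === undefined) {
        // Ran out of middlewares without any of them handling the request.
        throw new Error(`No middleware handled ${req.method} ${req.path}`);
      }
      return current(req, () => dispatch(index + 1));
    };
    return dispatch(0);
  };
}

// Stand-in middlewares for illustration only.
const files = new Map<string, string>([["/dist/index.json", "[]"]]);

const storageMiddleware: Middleware = (req, next) => {
  const body = files.get(req.path);
  return body === undefined ? next() : { status: 200, body };
};

const notFoundMiddleware: Middleware = () => ({ status: 404, body: "Not Found" });

const handle = buildChain([storageMiddleware, notFoundMiddleware]);
```

Here `handle` answers from the in-memory map when it can and otherwise falls through to the not-found middleware. A chain without a final catch-all throws once the last middleware defers.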

We currently have the following middlewares (in no particular order):

- [CacheMiddleware](../src/middleware/cacheMiddleware.ts) - Caches responses to GET requests.
- [R2Middleware](../src/middleware/r2Middleware.ts) - Fetches resource from R2.
- [OriginMiddleware](../src/middleware/originMiddleware.ts) - Fetches resource from the origin server.
Used as a fallback if the R2 middleware fails.
- [NotFoundMiddleware](../src/middleware/notFoundMiddleware.ts) - Handles not found requests.
- [OptionsMiddleware](../src/middleware/optionsMiddleware.ts) - Handles OPTIONS requests.
- [SubstituteMiddleware](../src/middleware/subtituteMiddleware.ts) - Handles requests that need URL substitution (e.g. `/dist/latest/` -> `/dist/<latest version>`) and then feeds them back into the router.
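
As a rough illustration of the URL substitution described for the Substitute Middleware, the sketch below rewrites a `/dist/latest/` path to a concrete version. The function name `substitutePath` and the version string in the usage note are assumptions made for this example; how the worker actually resolves the latest version is not covered here.

```typescript
// Hypothetical helper: rewrites "/dist/latest/..." to "/dist/<latest version>/...".
// Paths that do not start with the alias are returned unchanged.
function substitutePath(path: string, latestVersion: string): string {
  const aliasPrefix = "/dist/latest/";
  if (!path.startsWith(aliasPrefix)) {
    return path;
  }
  const remainder = path.slice(aliasPrefix.length);
  return `/dist/${latestVersion}/${remainder}`;
}
```

With a made-up version such as `v22.0.0`, `substitutePath("/dist/latest/SHASUMS256.txt", "v22.0.0")` yields `/dist/v22.0.0/SHASUMS256.txt`, which would then be fed back into the router as a new request.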

### Diagram

```mermaid
flowchart TD
request[Request] --> worker(Release Worker)
worker --> routerHandle("Router.handle")
routerHandle -- HTTP GET --> cacheMiddleware("Cache Middleware")
routerHandle -- HTTP HEAD --> r2Middleware
routerHandle -- HTTP OPTIONS --> optionsMiddleware("Options Middleware")
routerHandle -- Request --> substituteMiddleware("Substitute Middleware")
substituteMiddleware -- Substituted Request --> routerHandle
cacheMiddleware -- Cache miss --> r2Middleware("R2 Middleware")
r2Middleware -- Error --> originMiddleware("Origin Middleware")
```
16 changes: 16 additions & 0 deletions docs/debugging-prod.md
@@ -0,0 +1,16 @@
# Debugging Prod

Steps to aid with debugging the Release Worker's production environment.

> [!NOTE]
> This is mostly meant for Node.js Web Infra team members.
> Some of these steps require access to resources only made available to Collaborators.

## Steps

- Check [Sentry](https://nodejs-org.sentry.io/issues/?project=4506191181774848).
All errors should be reported here.

- If a local reproduction is found, you can debug it with Cloudflare's implementation of [Chrome's DevTools](https://developers.cloudflare.com/workers/observability/dev-tools/).

- Cloudflare provides basic stats on the worker's Cloudflare dash page [here](https://dash.cloudflare.com/07be8d2fbc940503ca1be344714cb0d1/workers/services/view/dist-worker/production).
19 changes: 0 additions & 19 deletions docs/deploy.md

This file was deleted.

11 changes: 11 additions & 0 deletions docs/deploying.md
@@ -0,0 +1,11 @@
# Deploying the Worker

Guide on how to deploy the Release Worker.

## Staging Deployments

The Release Worker is automatically deployed to its staging environment when a new commit is pushed to the `main` branch through the [Deploy Worker](https://github.com/nodejs/release-cloudflare-worker/actions/workflows/deploy.yml) action.

## Production Deployments

The Release Worker is deployed to its production environment by a Collaborator manually running the [Deploy Worker](https://github.com/nodejs/release-cloudflare-worker/actions/workflows/deploy.yml) action.
39 changes: 22 additions & 17 deletions docs/dev-setup.md
@@ -1,33 +1,38 @@
# Dev Setup

Guide to setting up this worker for development.
Documentation on how to run the Release Worker locally.

## Have Node Installed
## Steps

Node needs to be installed for the thing that serves Node downloads (latest LTS/even numbered major recommended)
### 1. Prepare environment

## Install Dependencies
Read and follow the [Getting Started](../CONTRIBUTING.md) guide to get your local environment set up.

Run `npm install`
### 2. Set up your Cloudflare account

## Testing
Currently we run the worker in [remote mode](https://developers.cloudflare.com/workers/testing/local-development/#develop-using-remote-resources-and-bindings) as there isn't a nice way to locally populate an R2 bucket.
This means that, to run the Release Worker locally, you must have a Cloudflare account that has an R2 bucket named
`dist-prod`.
You will also need to populate the bucket yourself.

To run unit tests, `npm run test:unit`. To run e2e (end-to-end) tests, `npm run test:e2e`.
Both of these will hopefully change in the future to make running the Release Worker easier.

See the [/test](../tests/) folder for more info on testing.
### 3. Create secrets for directory listings

## Running Locally
This step is optional but recommended.

Spin up a Workerd instance on your machine that serves this worker
The Release Worker uses R2's S3 API for directory listings.
In order for directory listings to work, you need to make an R2 API key for your `dist-prod` bucket and provide it to the worker.

### Login to Cloudflare Dash From Wrangler CLI
Generating the API key can be done through the Cloudflare dashboard [here](https://dash.cloudflare.com/?account=/r2/api-tokens).

Run `wrangler login`
Then, make a `.dev.vars` file in the root of this repository with the following:

### R2 Bucket
```
S3_ACCESS_KEY_ID=<your access key id>
S3_ACCESS_KEY_SECRET=<your access key secret>
```

Create an R2 bucket named `dist-prod`. This is the bucket that the worker reads from. It will either need to have a copy of Node's dist folder in it or something mimicking the folder there.
### 4. Run the worker

### Starting the Local Server

Run `npm start`. This starts a Workerd instance in remote mode.
Start the worker locally with `npm start`. You may be prompted to log into your Cloudflare account.
34 changes: 34 additions & 0 deletions docs/r2.md
@@ -0,0 +1,34 @@
# R2

## What is it?

[R2](https://developers.cloudflare.com/r2/) is Cloudflare's blob storage provider.
We use it to store all of the release assets served by the Release Worker.

## Noteworthy points

### Directories

R2 stores files flatly, meaning a directory does not exist in R2.

However, R2 allows characters such as slashes (/) in an object's name.
For directories, we can then specify a prefix (like `nodejs/release/`) and R2 will only return objects whose names start with that prefix.
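
To make the prefix idea concrete, here is an in-memory sketch that treats a flat list of object keys as a directory tree. It only emulates the behavior locally: `listDirectory` and the sample keys in the usage note are hypothetical, and a real listing would be performed by R2 (for example through the S3 API's `Prefix` and `Delimiter` parameters) rather than by filtering keys client-side.

```typescript
interface DirectoryListing {
  files: string[];
  directories: string[];
}

// Hypothetical helper: derives one "directory" level from flat object keys.
function listDirectory(keys: string[], prefix: string): DirectoryListing {
  const files: string[] = [];
  const directories = new Set<string>();

  for (const key of keys) {
    if (!key.startsWith(prefix)) {
      continue; // Object lives outside the requested "directory".
    }
    const rest = key.slice(prefix.length);
    if (rest === "") {
      continue; // The key is the prefix itself.
    }
    const slashIndex = rest.indexOf("/");
    if (slashIndex === -1) {
      files.push(rest); // Direct child object.
    } else {
      directories.add(rest.slice(0, slashIndex + 1)); // Nested "subdirectory".
    }
  }

  return { files, directories: Array.from(directories) };
}
```

Given the keys `nodejs/release/index.json` and `nodejs/release/v20.0.0/node.tar.gz`, listing the prefix `nodejs/release/` returns the file `index.json` and the directory `v20.0.0/`.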

### Bindings API

R2 allows integration with Workers through their [bindings API](https://developers.cloudflare.com/r2/api/workers/workers-api-usage/).
We use this when fetching files.

### S3 API

Due to some performance issues we were seeing with R2's `list` binding command, we opted to use R2's S3 API for listing directories.

### Buckets

We have two R2 buckets:

- `dist-staging` - Holds staged releases. This bucket is private and should not be publicly accessible.

- `dist-prod` - Holds released versions of Node.js. Everything in this bucket should be considered publicly accessible.

(see [Release Process](./release-process.md) for more information on how we use these buckets)
55 changes: 55 additions & 0 deletions docs/release-process.md
@@ -0,0 +1,55 @@
# Release Process

Documentation on the general order of events that happen when releasing a new version of Node.js.

> [!NOTE]
> This focuses on the flow of release assets (binaries, doc files).
> This may not include the full process for releases (e.g. getting necessary approvals).

## Release types

### Mainline releases

Mainline releases refer to the main release branch of Node.js.

### Nightly releases

Node.js has multiple release branches that are promoted nightly.

- `nightly` - Nightly builds from the `main` Node.js branch
- `v8-canary` - Builds with the latest V8 canary
- `rc` - Release candidates
- `test` - Test builds

<details>
<summary><b>Deprecated release branches</b></summary>

These branches no longer receive new releases.

- `chakracore-nightly` - Chakracore nightly builds
- `chakracore-rc` - Chakracore release candidates
- `chakracore-release` - Chakracore releases

</details>

## Release flow

### 1. Release CI is triggered

New builds are scheduled on the release CI (https://ci-release.nodejs.org).
These builds compile Node.js on the various platforms and compile the docs.

Upon a build completing successfully, the build's output (binaries, doc files) will then be uploaded to the origin server and the `dist-staging` bucket in Node.js' Cloudflare account.

The release assets synced to the origin server are under the `/home/staging/nodejs/` path.
The release assets synced to the `dist-staging` bucket are under the `/nodejs/` [_prefix_](./r2.md#directories).

### 2. Release promotion

When a release is ready to be published, it is promoted.
For mainline releases, this is done by the releaser running the [`release.sh`](https://github.com/nodejs/node/tree/main/tools/release.sh) script in the Node.js repository.
For nightly releases, this is done once a day by [automated tooling](https://github.com/nodejs/build/blob/main/ansible/www-standalone/tools/promote/promote_nightly.sh).

On the origin server, the release's assets are copied from `/home/staging/nodejs/` to `/home/dist/nodejs/`.

For R2, the release's assets are copied from the `dist-staging` bucket to the `dist-prod` bucket.
9 changes: 9 additions & 0 deletions docs/sops/README.md
@@ -0,0 +1,9 @@
# Standard Operating Procedures

Documents detailing standardized processes for the Release Worker.

## Table of Contents

- [Incident Flow](./incident-flow.md)
- [Rolling Back a Release](./rolling-back-a-release.md)
- [Switching between the Worker and Origin Server](./switch-between-worker-and-origin.md)
43 changes: 43 additions & 0 deletions docs/sops/incident-flow.md
@@ -0,0 +1,43 @@
# Incident Flow

Procedure for what to do if there's an incident with the Release Worker.

## Steps

1. If the incident was caused by a recent change, try
[rolling back the release](./rolling-back-a-release.md).

2. If the incident affects traffic towards the Release Worker, update the Node.js status page (https://status.nodejs.org).
If it is an ongoing security incident that we cannot disclose publicly yet, do not include the details of the incident in the status page.

- Optionally, but preferably, echo the updates on social media.

- For any prolonged incidents, please consider pinning an issue tracking the incident so as to avoid spam.

- Please also monitor any issues in repositories such as this one,
[nodejs/node](https://github.com/nodejs/node),
and [nodejs/nodejs.org](https://github.com/nodejs/nodejs.org)
for users asking about the incident and link them to the status page.

3. [Steps for debugging the worker when it's deployed](../debugging-prod.md)

4. If there is an ongoing security incident requiring code changes, a force push to the `main` branch can be performed by a [Collaborator](../CONTRIBUTING.md#contributing) if there is reasonable risk that opening a PR with the change would allow more bad actors to exploit the vulnerability.
The code changes must still be approved by another Collaborator before the force push is performed, however.

5. If the issue requires support from Cloudflare, try reaching out through the
`ext-nodejs-cloudflare` channel in the OpenJS Slack.

6. If needed, create an issue on this repository to serve as a discussion board
for any changes that need to be made to avoid the same incident from
happening again.

## What qualifies as an incident?

There are no exact criteria; however, these cases will most likely call for an incident to be declared:

1. The production deployment of the Release Worker is unavailable to the public or is otherwise operating in a way that impacts users' abilities to interact with it en masse.
This includes behaviors that we are responsible for and those that Cloudflare is responsible for.

2. There is an ongoing security issue that involves the production deployment of the Release Worker.

Note, however, that the Node.js Web Infrastructure, Build, and TSC teams can declare an incident whenever they see fit.
34 changes: 34 additions & 0 deletions docs/sops/rolling-back-a-release.md
@@ -0,0 +1,34 @@
# Rolling Back A Release

> [!WARNING]
> Rolling back a release should only be done when necessary,
> such as a quick fix for an ongoing incident,
> and by a [Collaborator](../CONTRIBUTING.md#contributing).
> The Web Infrastructure team should be made aware each time this happens.

## Option A: via GitHub Actions

This is the preferred way, but takes a little bit longer.

1. Create a new branch

2. [Revert the commit](https://git-scm.com/docs/git-revert)

3. Push & create a new PR

4. Merge PR & Deploy it

If the rollback is prompted by an incident where the worker is entirely unavailable (i.e. all requests failing) or there is a security vulnerability present,
a Collaborator may forcibly push the commit reverting the release onto the `main` branch.

## Option B: via Cloudflare Dash

This requires `Workers Admin` permissions on Node.js' Cloudflare account.

1. Go to the Release Worker's [deployment page](https://dash.cloudflare.com/?account=/workers/services/view/dist-worker/production/deployments)

2. Find the previously deployed version in the table

3. Click the three dots on the right side of the version's entry, then click `Rollback to v...`

4. Make a revert commit to reflect the change in Git (see [Option A](#option-a-via-github-actions)).