INFRASTRUCTURE.md


This diagram shows the current changelog.com setup:

%% https://fontawesome.com/search
graph TD
    classDef link stroke:#59b287,stroke-width:3px;
    
    %% Code & assets
    subgraph GitHub
        repo{{ fab:fa-github thechangelog/changelog.com }}:::link
        click repo "https://github.com/thechangelog/changelog.com"

        cicd[/ fa:fa-circle-check GitHub Action - Ship It \]:::link
        click cicd "https://github.com/thechangelog/changelog.com/actions/workflows/ship_it.yml"
        
        automation[\ fab:fa-golang Dagger Go SDK /]:::link
        click automation "https://github.com/thechangelog/changelog.com/blob/master/magefiles/magefiles.go"

        registry(( fab:fa-github ghcr.io )):::link
        click registry "https://github.com/orgs/thechangelog/packages"

        chat(( fab:fa-slack Slack )):::link
        click chat "https://changelog.slack.com/archives/C03SA8VE2"

        repo -.-> |.github/workflows/ship_it.yml| cicd
        cicd --> |magefiles/magefiles.go| automation
        
        cicd --> |success #dev| chat
    end
    
    repo -.- |2022.fly| app
    
    registry --> |ghcr.io/changelog/changelog-prod| app
    container --> |flyctl deploy| app
        
    repo -.- |fly.io/dagger-engine-2023-05-20| container

    %% PaaS - https://fly.io/dashboard/changelog
    subgraph Fly.io
    
        proxy{fa:fa-globe Proxy}
        proxy ==> |https| app


        container([ fa:fa-project-diagram Dagger Engine 2023-05-20 ]):::link
        click container "https://fly.io/apps/dagger-engine-2023-05-20"
            
        app(( fab:fa-phoenix-framework App changelog-2022-03-13.fly.dev )):::link
        style app fill:#488969;
        click app "https://fly.io/apps/changelog-2022-03-13"
            
        dbw([ fa:fa-database PostgreSQL Leader 2023-07-31 ]):::link
        click dbw "https://fly.io/apps/changelog-postgres-2023-07-31"
            
        dbr1([ fa:fa-database PostgreSQL Replica 2023-07-31 ])

        app <==> |pgsql| dbw
        dbw -.-> |replication| dbr1

        automation --> |wireguard| container
        container --> |ghcr.io/changelog/changelog-runtime| registry
        container --> |ghcr.io/changelog/changelog-prod| registry

        metricsdb([ fa:fa-chart-line Prometheus ])
        metrics[ fa:fa-columns Grafana fly-metrics.net ]:::link
        click metrics "https://fly-metrics.net"
        metrics --- |promql| metricsdb
        metricsdb -.- |metrics| app
        metricsdb -.- |metrics| dbw
        metricsdb -.- |metrics| container
    end

    %% Secrets
    secrets(( fa:fa-key 1Password )):::link
    click secrets "https://changelog.1password.com/"
    secrets -.-> |secrets| app
    secrets -.-> |secrets| repo

    %% Search
    search(( fa:fa-magnifying-glass Typesense ))
    app -...-> |search| search

    %% Exceptions
    exceptions(( fa:fa-car-crash Sentry )):::link
    click exceptions "https://sentry.io/organizations/changelog-media/issues/?project=5668962"
    app -...-> |exceptions| exceptions

    %% CDN - https://manage.fastly.com/configure/services/7gKbcKSKGDyqU7IuDr43eG
    subgraph Fastly
        apex[ changelog.com ]:::link
        click apex "https://changelog.com"
        
        subgraph Ashburn
            cdn[ cdn.changelog.com ]
        end
    end

    subgraph AWS.S3
        logs[ fab:fa-aws changelog-logs ]
    end
    apex & cdn-.-> |logs| logs

    %% Observability
    observability(( fa:fa-bug Honeycomb )):::link
    click observability "https://ui.honeycomb.io/changelog/datasets/changelog_opentelemetry/home"
    app -....-> |traces| observability
    logs -.-> |logs| observability
    
    %% Object storage
    apex ==> |https| proxy
    subgraph Cloudflare.R2
        assets[ fab:fa-cloudflare changelog-assets changelog.place ]
    end
    cdn ==> |https| assets

    %% Monitoring
    subgraph BetterStack
        status[ fa:fa-layer-group status.changelog.com ]:::link
        click status "https://status.changelog.com"

        monitoring(( fa:fa-table-tennis Uptime )):::link
        click monitoring "https://uptime.betterstack.com/team/133302/monitors"
        monitoring -....-> |monitors| apex
        monitoring -.-> |monitors| cdn
        monitoring -.-> |monitors| proxy
        monitoring -.-> |monitors| status
    end

Let's dig into how all the above pieces fit together.

A three-tier monolith

TL;DR:

  • Front-end
    • Fastly
    • Fly.io Proxy
    • Cloudflare R2
  • Application
    • Elixir / Phoenix
  • Database
    • PostgreSQL

changelog.com is a monolithic Elixir application built with the Phoenix web framework. It uses PostgreSQL for persistence & Node.js to digest & compile static assets (CSS & JS).

Static assets, including all our mp3 episodes, are stored on Cloudflare R2. They are served via Fastly, specifically https://cdn.changelog.com. In summary:

Fastly (cdn.changelog.com)
↓
Cloudflare R2 (changelog.place)

The production instance of our application is running on Fly.io. All https://changelog.com requests are served via Fastly. Each Fastly request gets proxied to our application instance via the Fly.io Proxy. In summary:

Fastly (changelog.com)
↓
Fly.io Proxy
↓
Application (changelog-2022-03-13.fly.dev)
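
One way to observe this chain from the outside is to look at the response headers each hop leaves behind. The following is a minimal sketch, not part of our tooling: the function name `describe_hops` is hypothetical, and the header names it looks for (`x-served-by` for Fastly, `fly-request-id` for the Fly.io Proxy) are assumptions that are not stated in this document.

```shell
#!/bin/sh
# Read HTTP response headers on stdin and report which hops left a trace.
# ASSUMPTION: Fastly adds "x-served-by" and the Fly.io Proxy adds
# "fly-request-id"; neither header name comes from this document.
describe_hops() {
  # Strip carriage returns and lowercase everything so matching is simple
  headers="$(cat | tr -d '\r' | tr '[:upper:]' '[:lower:]')"

  case "$headers" in
    *x-served-by:*) echo "fastly: yes" ;;
    *) echo "fastly: no" ;;
  esac

  case "$headers" in
    *fly-request-id:*) echo "fly.io proxy: yes" ;;
    *) echo "fly.io proxy: no" ;;
  esac
}
```

It could be fed with something like `curl --silent --head https://changelog.com | describe_hops`.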

The production database - PostgreSQL - is running on Fly.io too. It is a replicated setup, with one leader & one replica. In summary:

Application (changelog-2022-03-13.fly.dev)
↓
PostgreSQL Leader
↓
PostgreSQL Replica
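
To tell the leader and the replica apart, PostgreSQL's built-in `pg_is_in_recovery()` function can help: it returns true on a standby and false on a primary. A minimal sketch follows; the function names `pg_role` and `check_pg_role` are hypothetical, and connecting to the default `postgres` database as the `postgres` user is an assumption carried over from the commands used elsewhere in this document.

```shell
#!/bin/sh
# Map the output of "SELECT pg_is_in_recovery();" to a role name.
# psql prints "t" for true (standby) and "f" for false (primary).
pg_role() {
  case "$1" in
    t) echo "replica" ;;
    f) echo "leader" ;;
    *) echo "unknown" ;;
  esac
}

# Ask a PostgreSQL host for its role.
# $1 = host to connect to (supplied by the caller)
check_pg_role() {
  result="$(psql --host "$1" --username postgres --no-align --tuples-only \
    --command 'SELECT pg_is_in_recovery();' postgres)" || true
  pg_role "$result"
}
```

For example, `check_pg_role localhost` could be run from a console session on one of the PostgreSQL machines.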

Production deploys

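Each commit made against our primary branch gets deployed straight into production. The "Ship It!" GitHub Actions workflow is responsible for this. From the perspective of workflow jobs, it is fairly standard: as the diagram above shows, the workflow runs the Dagger Go SDK automation in magefiles/magefiles.go, which connects over WireGuard to the Dagger Engine on Fly.io, publishes the runtime & production images to ghcr.io, and deploys the app via flyctl deploy.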

Secrets

All our secrets are stored in 1Password, in the Shared Vault. Currently, they are set manually in Fly.io via flyctl, and pasted manually into GitHub Actions secrets.
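
The manual steps could be wrapped in a small helper. This is only a sketch under stated assumptions, not how we actually do it: the function name `sync_secret` and the `DRY_RUN` switch are hypothetical, and it assumes the same secret is wanted in both Fly.io and GitHub Actions, which may not hold for every secret. It defaults to a dry run that prints the commands without revealing the value.

```shell
#!/bin/sh
# App and repository names are taken from this document.
FLY_APP="changelog-2022-03-13"
GH_REPO="thechangelog/changelog.com"

# Copy one secret to Fly.io and to GitHub Actions secrets.
# $1 = secret name, $2 = secret value
# DRY_RUN defaults to 1: commands are printed, not executed.
sync_secret() {
  name="$1"
  value="$2"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    # Never echo the secret value itself
    echo "flyctl secrets set $name=<redacted> --app $FLY_APP"
    echo "gh secret set $name --repo $GH_REPO --body <redacted>"
  else
    flyctl secrets set "$name=$value" --app "$FLY_APP"
    gh secret set "$name" --repo "$GH_REPO" --body "$value"
  fi
}
```

A call might look like `DRY_RUN=0 sync_secret DB_PASS "$(op read 'op://Shared/changelog/DB_PASS')"`, where the 1Password item path is purely illustrative.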

Metrics & observability

Since our application & database are running on Fly.io, we benefit from free infrastructure metrics: https://fly-metrics.net

All logs from Fastly are streamed into Honeycomb.io. This allows us to ask questions that we did not anticipate in advance about how various HTTP clients interact with our content. It also helps us explore how Fastly interacts with Fly.io.

We also send app traces via OpenTelemetry to Honeycomb.io.

App errors - e.g. Plug.Conn.InvalidQueryError - show up in Sentry.io.

BetterStack.com monitors our public HTTPS endpoints & alerts us when they become unhealthy.
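
For a quick manual check of the same endpoints, something like the sketch below can be used. It is not how BetterStack decides health; treating 2xx and 3xx responses as healthy is an assumption, and the function names `classify_status` and `check_endpoint` are hypothetical.

```shell
#!/bin/sh
# Classify an HTTP status code.
# ASSUMPTION: 2xx and 3xx count as healthy, everything else as unhealthy.
classify_status() {
  case "$1" in
    2[0-9][0-9]|3[0-9][0-9]) echo "healthy" ;;
    *) echo "unhealthy" ;;
  esac
}

# Fetch a URL and print "<url> <status> <verdict>".
check_endpoint() {
  url="$1"
  # curl reports 000 when no HTTP status was received (DNS failure, timeout)
  status="$(curl --silent --output /dev/null --max-time 10 \
    --write-out '%{http_code}' "$url")" || true
  echo "$url $status $(classify_status "$status")"
}
```

The endpoints from the diagram can then be checked in a loop, for example `for u in https://changelog.com https://cdn.changelog.com https://status.changelog.com; do check_endpoint "$u"; done`.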

Search

We use Typesense for search. It's near-instant & it just works.

What is missing?

The above is what we have so far. While we like to keep things simple, our setup is a constant work in progress. We keep making small improvements all the time, and we talk about them every 10 weeks in the context of our Ship It! Kaizen episodes. For example, this diagram and document were created in the context of 🎧 Kaizen 8: 24 improvements & a lot more. If you would prefer to stay in reading mode, check out GitHub discussion #433.

If anything on this page is missing, or could be clearer, please open an issue. Thank you very much!


How to upgrade our PostgreSQL instance running on Fly.io?

  1. Provision a new PostgreSQL instance
flyctl postgres create \
    --org changelog --region iad \
    --name changelog-postgres-2023-07-31 \
    --initial-cluster-size 2 \
    --vm-size performance-2x \
    --volume-size 10
  2. Connect to the newly created instance (we want to use the new pg_dump, with the latest improvements)
flyctl ssh console --app changelog-postgres-2023-07-31
  3. Create a new database
createdb changelog --host localhost --username postgres
  4. Dump the database from the old instance to a local file
pg_dump --host postgres-2022-03-12.internal --username postgres changelog > changelog.sql
  5. Restore the database from the local file
psql --host localhost --username postgres --single-transaction changelog < changelog.sql
psql --host localhost --username postgres --command 'ANALYZE VERBOSE;' changelog

Note If a previous restore failed, run dropdb --force --host localhost --username postgres changelog, then createdb ... again.

  6. Configure the app to use the new PostgreSQL instance
flyctl secrets set DB_HOST=changelog-postgres-2023-07-31.flycast DB_PASS=<NEW_DB_PASSWORD> --app changelog-2022-03-13
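
Before pointing the app at the new instance, it may be worth confirming that the restore brought over every table. The sketch below is one possible sanity check, not part of the procedure above: the function names `list_tables` and `compare_listings` are hypothetical, and it assumes that the application's tables live in the `public` schema and that `diff` is available on the machine.

```shell
#!/bin/sh
# List the tables in the public schema of the changelog database on a host.
# $1 = PostgreSQL host, e.g. localhost or postgres-2022-03-12.internal
list_tables() {
  psql --host "$1" --username postgres --no-align --tuples-only \
    --command "SELECT tablename FROM pg_tables WHERE schemaname = 'public' ORDER BY tablename;" \
    changelog
}

# Compare two table listings (files with one table name per line).
# Prints any differences and returns non-zero when the listings differ.
compare_listings() {
  if diff "$1" "$2"; then
    echo "table listings match"
  else
    echo "table listings differ" >&2
    return 1
  fi
}

# Possible usage, from the console of the new instance:
#   list_tables postgres-2022-03-12.internal > old.txt
#   list_tables localhost > new.txt
#   compare_listings old.txt new.txt
```

This only compares table names, so it catches a partial restore but says nothing about row counts or data integrity.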