- Getting a million users is infinitely harder than scaling a system to handle a million users. Most systems could run comfortably on a Raspberry Pi
- Setting up containers, load balancing, and service discovery on light hardware
- Ask HN: Any recommended resources to develop system thinking? (2018)
- Distributed Systems in One Lesson by Tim Berglund (2017)
- Træfik - Modern HTTP reverse proxy and load balancer that makes deploying microservices easy.
- Kit - Standard library for microservices written in Go.
- Fear and Loathing in Lock-Free Programming (2017)
- Reliable Systems Series: Model-Based Testing (2018)
- Awesome Distributed Systems
- Kong - Cloud-Native API Gateway & Service Mesh.
- Disque - Distributed message broker.
- Mesh - Tool for building distributed applications.
- Raft - Raft distributed consensus algorithm implemented in Rust.
- libp2p specification - Technical specifications for the libp2p networking stack.
- Class materials for a distributed systems lecture series
- Raft Consensus Algorithm
- Qri - Global dataset version control system (GDVCS) built on the distributed web.
- Project Oak - Meaningful control of data in distributed systems.
- mudb - Collection of modules for building realtime client-server networked applications.
- Verdi - Framework for formally verifying distributed systems implementations in Coq.
- PingCAP Talent Plan - Series of training courses about writing distributed systems in Go and Rust.
- Protocol Labs - Build protocols, systems, and tools to improve the internet.
- Dark Crystal - Open source R&D affinity. Exploring the potential of new and existing technologies in crypto-space to encourage horizontal group collaboration.
- Protozoa - Web developers, facilitators, crypto-engineers. Experts in Node.js & distributed systems.
- Akka - Build highly concurrent, distributed, and resilient message-driven applications on the JVM.
- Distributed Components - Provides reusable infrastructure for formally verifying distributed systems using the Coq proof assistant.
- Practical Networked Applications in Rust, Part 1: Non-Networked Key-Value Store (HN)
- LF - Fully decentralized, fully replicated key/value store.
- Awesome Consensus - Curated selection of artisanal consensus algorithms and hand-crafted distributed lock services.
- Rezolus - Tool for collecting detailed systems performance telemetry and exposing burst patterns through high-resolution telemetry.
- Cadence - Distributed, scalable, durable, and highly available orchestration engine for executing asynchronous, long-running business logic in a resilient way.
- Pilosa - Open source, distributed bitmap index that dramatically accelerates queries across multiple, massive data sets.
- Finagle - Fault-tolerant, protocol-agnostic RPC system.
- Distributed - Distributed task scheduler for Dask.
- How To Build A Modern Distributed Compute Platform (2018)
- Chaos Monkey - Resiliency tool that helps applications tolerate random instance failures.
- Faust - Python Stream Processing.
- "Consistency without consensus in production systems" by Peter Bourgon (2014)
- Distributed consensus reading list
- Titanoboa - Community version of a fully distributed, highly scalable, and fault-tolerant workflow orchestration platform for the JVM.
- Buoyant - Helps you deploy and run Linkerd, the fully open source, ultralight service mesh.
- Grappa - Runtime system for scaling irregular applications on commodity clusters.
- MIT Distributed Systems course (2020)
- Correctness proofs of distributed systems with Isabelle/HOL (2019)
- Apache Mesos - Cluster manager that provides efficient resource isolation and sharing across distributed applications, or frameworks.
- Gleam - Fast, efficient, and scalable distributed map/reduce system with DAG execution, in memory or on disk. Written in pure Go; runs standalone or distributed.
- Learning Distributed Systems - Cloud Native Podcast
- etcd - Distributed reliable key-value store for the most critical data of a distributed system.
- etcdadm - Command-line tool for operating an etcd cluster. It makes it easy to create a new cluster, add a member to, or remove a member from an existing cluster.
- Learning to build distributed systems (2019) (Lobsters)
- SwarmKit - Toolkit for orchestrating distributed systems at any scale. It includes primitives for node discovery, raft-based consensus, task scheduling and more.
- How to get started with infrastructure and distributed systems (2016)
- Advanced Napkin Math: Estimating System Performance from First Principles (2019) (Code)
- Golimit - Distributed, decentralized rate limiter based on Uber's Ringpop.
- System Design lectures (2020)
- Awesome Scalability - Patterns of Scalable, Reliable, and Performant Large-Scale Systems.
- LeetCode System Design Questions
- Grokking the System Design Interview
- Amazon Builders' Library - How Amazon builds and operates software.
- Distributed Systems Wiki (Code)
- Jepsen - Distributed Systems Safety Research.
- ION - Distributed RTC system written in pure Go and Flutter.
- Challenges with distributed systems (HN)
- Smallstep - End-to-End Encryption for Distributed Systems.
- Systems design for Advanced Beginners (2020)
- Performance Under Load (2018)
- Veneur - Distributed, fault-tolerant pipeline for runtime data.
- Going multi-region