Part of Distributed Algorithms
- gotwopc: Replica + Working
- committer: Hooks, More OOPs, Didn't read in detail
- 2PC-TextBook: Good
- Distributed Txn: Berkeley CS186 Class
- Participants send heartbeats periodically. This could be done using hashicorp/memberlist (i.e. the SWIM gossip protocol).
- If the coordinator detects the failure, then it can spin up a new participant that can read from the durable log (assuming the log is persisted) to create a clone.
- When the old participant comes back online, the coordinator asks it to recycle itself, since the clone has taken its place.
- For coordinator failure: 3PC, Paxos Commit.
- Distributed deadlock detection: each participant periodically sends its local wait-for graph to a designated deadlock master node. The master takes the union of these graphs to build a global wait-for graph and checks it for cycles; a cycle means a global deadlock.
- Also read about 2PL: Growing Phase, Shrinking Phase.
- For locking, each participant has a lock table on its own node/machine. For table-level locks, you can have a central node to track locks on the whole table/database. In MatrixOrigin, we add locks to the mo_tables or mo_locks table.
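
The deadlock-detection note above can be sketched roughly as follows. This is a minimal illustration, not how any of the listed projects implement it: the `WaitForGraph` type and the `Union` and `HasCycle` functions are hypothetical names, transactions are identified by plain strings, and the transport that carries each participant's graph to the master is left out.

```go
package main

import "fmt"

// WaitForGraph maps a waiting transaction to the transactions it waits on.
// (Hypothetical type for illustration.)
type WaitForGraph map[string][]string

// Union merges the local graphs reported by the participants
// into a single global wait-for graph on the deadlock master.
func Union(locals ...WaitForGraph) WaitForGraph {
	global := WaitForGraph{}
	for _, g := range locals {
		for waiter, holders := range g {
			global[waiter] = append(global[waiter], holders...)
		}
	}
	return global
}

// HasCycle runs a depth-first search and reports whether the graph
// contains a cycle, which corresponds to a deadlock.
func HasCycle(g WaitForGraph) bool {
	const (
		unvisited = iota
		inProgress
		done
	)
	state := map[string]int{} // missing keys default to unvisited
	var visit func(txn string) bool
	visit = func(txn string) bool {
		switch state[txn] {
		case inProgress:
			return true // reached a node on the current DFS path: cycle
		case done:
			return false
		}
		state[txn] = inProgress
		for _, next := range g[txn] {
			if visit(next) {
				return true
			}
		}
		state[txn] = done
		return false
	}
	for txn := range g {
		if visit(txn) {
			return true
		}
	}
	return false
}

func main() {
	// Participant 1 sees T1 waiting on T2; participant 2 sees T2 waiting on T1.
	p1 := WaitForGraph{"T1": {"T2"}}
	p2 := WaitForGraph{"T2": {"T1"}}
	fmt.Println(HasCycle(p1), HasCycle(p2), HasCycle(Union(p1, p2)))
}
```

Neither local graph has a cycle on its own; the deadlock only shows up once the master unions them, which is why the graphs have to be shipped to one node.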