Generalized Consensus: Ordering Decisions (multigres.com)

🤖 AI Summary
This piece lays out a framework for ordering leadership and coordination decisions in generalized consensus settings, where cohorts can be much larger than the traditional three- or five-node setups. It introduces external coordinators: agents that perform health checks and drive failovers or planned leader changes, so that cohort nodes do not all need to shoulder coordination. The write-up contrasts pragmatic lock-and-timeout approaches (used in systems like Vitess) with a theory-first, lock-free design that must not rely on elapsed time or perfect clocks. It frames the problem around one rule: a newer agent must be able to supersede previous ones.

Technically, the solution centers on Raft-style term numbers that are globally unique, monotonic, and persisted on cohort nodes. A node honors requests only from an agent whose term matches its own, rejects requests carrying a lower term, and can be recruited into a higher term. Sequencing comes from encounter (causal) ordering, established when agents visit overlapping sets of nodes, rather than from unreliable timestamps. A term number acts as a long-lived authority that spans many sub-requests (for example, log positions), so a new term reliably supersedes prior leadership.

Key implications: scalability (fewer coordinators, placed across AZs), safety without time assumptions (sidestepping the pitfalls highlighted by the FLP result), and the need for persistent term storage so that a node cannot regress to an older term after a restart.
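The term rules in the summary can be illustrated with a small sketch. Everything here is an assumption made for illustration, not the article's or Multigres's implementation: the `CohortNode` class, its `recruit` and `handle` methods, and the JSON file used for persistence are all hypothetical. The sketch shows a node that honors only matching-term requests, rejects lower terms, joins strictly higher terms, and reloads its term from disk after a restart.

```python
import json
import os
import tempfile


class CohortNode:
    """Hypothetical cohort node that persists the highest term it has joined."""

    def __init__(self, path):
        self.path = path
        self.term = self._load()

    def _load(self):
        # After a restart, recover the stored term so the node cannot regress.
        if os.path.exists(self.path):
            with open(self.path) as f:
                return json.load(f)["term"]
        return 0

    def _persist(self, term):
        # Write to a temp file, fsync, then rename over the old file,
        # and only then update the in-memory term.
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump({"term": term}, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.path)
        self.term = term

    def recruit(self, term):
        """Join a strictly higher term; refuse equal or lower terms."""
        if term <= self.term:
            return False
        self._persist(term)
        return True

    def handle(self, term, request):
        """Honor a request only if the agent's term matches the stored term."""
        return term == self.term


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "term.json")
    node = CohortNode(path)
    node.recruit(5)
    print(node.handle(5, "append"))  # matching term is honored
    node.recruit(7)
    print(node.handle(5, "append"))  # superseded term is rejected
```

In this sketch, an agent holding term 5 loses its authority as soon as any agent recruits the node into term 7, with no locks or timeouts involved. Because the term is written to disk before `recruit` returns, a restarted node constructed from the same file still refuses terms at or below 7.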