Add Raft State Management for Load Balancers #641

Merged
merged 9 commits into from
Dec 21, 2024

@sinadarbouy sinadarbouy commented Dec 19, 2024

Ticket(s)

N/A

Description

Add Raft State Management for Load Balancers

This PR enhances the distributed state management capabilities by integrating load balancer states into the Raft consensus system. Key changes include:

  • Refactored RoundRobin and WeightedRoundRobin load balancers to store state in Raft FSM
  • Updated ConsistentHash to use string-based hash keys instead of uint64
    • This change addresses a known JSON unmarshaling behavior: json.Unmarshal decodes JSON numbers as float64 when the target is an interface value, so large uint64 hash keys can lose precision
    • String-based keys avoid that precision loss and keep lookups fast, since they are used directly as map keys
  • Introduced new Raft commands for load balancer state management:
    • CommandAddRoundRobinNext
    • CommandUpdateWeightedRR
    • CommandUpdateWeightedRRBatch
  • Added proper state persistence and restoration in Raft snapshots
  • Updated tests to use Raft for state management
  • Improved thread safety with proper mutex usage

These changes ensure consistent load balancer behavior across the cluster by maintaining state through Raft consensus.
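
The float64 behavior mentioned for ConsistentHash can be reproduced with the standard library alone. The sketch below is an illustration, not code from this PR: `roundTripNumeric` and `roundTripString` are hypothetical helpers, and the `"hash"` field name is an assumption. It round-trips the same key once as a JSON number and once as a decimal string.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// roundTripNumeric sends a uint64 through JSON as a number and reads it back
// from an interface{} value, the way a generic payload would be decoded.
func roundTripNumeric(key uint64) (uint64, error) {
	raw, err := json.Marshal(map[string]interface{}{"hash": key})
	if err != nil {
		return 0, err
	}
	var decoded map[string]interface{}
	if err := json.Unmarshal(raw, &decoded); err != nil {
		return 0, err
	}
	// encoding/json stores JSON numbers as float64 in interface values.
	f, ok := decoded["hash"].(float64)
	if !ok {
		return 0, fmt.Errorf("unexpected type %T", decoded["hash"])
	}
	return uint64(f), nil
}

// roundTripString sends the same key as a decimal string instead.
func roundTripString(key uint64) (uint64, error) {
	raw, err := json.Marshal(map[string]interface{}{"hash": strconv.FormatUint(key, 10)})
	if err != nil {
		return 0, err
	}
	var decoded map[string]interface{}
	if err := json.Unmarshal(raw, &decoded); err != nil {
		return 0, err
	}
	s, ok := decoded["hash"].(string)
	if !ok {
		return 0, fmt.Errorf("unexpected type %T", decoded["hash"])
	}
	return strconv.ParseUint(s, 10, 64)
}

func main() {
	// 2^53 + 1 is the smallest positive integer float64 cannot represent exactly.
	key := uint64(1<<53 + 1)
	n, errN := roundTripNumeric(key)
	s, errS := roundTripString(key)
	fmt.Println("original:", key)
	fmt.Println("as number:", n, errN)
	fmt.Println("as string:", s, errS)
}
```

Real hash values usually occupy the full 64-bit range, so most of them fall outside the integers that float64 represents exactly. Keeping the key as a string sidesteps the problem without a custom decoder.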

Related PRs

Development Checklist

  • I have added a descriptive title to this PR.
  • I have squashed related commits together.
  • I have rebased my branch on top of the latest main branch.
  • I have performed a self-review of my own code.
  • I have commented on my code, particularly in hard-to-understand areas.
  • I have added docstring(s) to my code.
  • I have made corresponding changes to the documentation (docs).
  • I have updated docs using make gen-docs command.
  • I have added tests for my changes.
  • I have signed all the commits.

Legal Checklist

- Implemented round-robin state to enhance leader election and task distribution.
- Updated Raft state machine to incorporate round-robin logic.
- Added tests to ensure correct round-robin behavior in various scenarios.

This change improves the efficiency and fairness of task handling within the Raft cluster.
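
As a rough sketch of how round-robin state can live in a state machine rather than in the load balancer, the example below keeps a per-group counter that is only advanced by applying a command. Only the name `CommandAddRoundRobinNext` comes from this PR; its string value, the `command` envelope, the `fsm` type, and `nextProxy` are assumptions for illustration. In a real cluster the entry would be submitted through Raft and applied on every node, which this sketch skips by calling `apply` directly.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sync"
)

// CommandAddRoundRobinNext is named in the PR; the string value is assumed.
const CommandAddRoundRobinNext = "ADD_ROUND_ROBIN_NEXT"

// command is a hypothetical envelope for replicated log entries.
type command struct {
	Type      string `json:"type"`
	GroupName string `json:"groupName"`
}

// fsm is a stand-in for the Raft state machine; it owns the counters.
type fsm struct {
	mu              sync.Mutex
	roundRobinIndex map[string]uint32
}

func newFSM() *fsm {
	return &fsm{roundRobinIndex: make(map[string]uint32)}
}

// apply decodes one log entry, advances the group's counter, and returns
// the new counter value. Unknown command types are rejected.
func (f *fsm) apply(entry []byte) (uint32, error) {
	var cmd command
	if err := json.Unmarshal(entry, &cmd); err != nil {
		return 0, fmt.Errorf("decode command: %w", err)
	}
	if cmd.Type != CommandAddRoundRobinNext {
		return 0, fmt.Errorf("unknown command %q", cmd.Type)
	}
	f.mu.Lock()
	defer f.mu.Unlock()
	f.roundRobinIndex[cmd.GroupName]++
	return f.roundRobinIndex[cmd.GroupName], nil
}

// nextProxy picks a proxy from the counter held in the state machine,
// so the load balancer itself keeps no local index.
func nextProxy(f *fsm, group string, proxies []string) (string, error) {
	if len(proxies) == 0 {
		return "", fmt.Errorf("no proxies in group %q", group)
	}
	entry, err := json.Marshal(command{Type: CommandAddRoundRobinNext, GroupName: group})
	if err != nil {
		return "", fmt.Errorf("encode command: %w", err)
	}
	next, err := f.apply(entry)
	if err != nil {
		return "", err
	}
	// The counter starts at 1 after the first apply, so subtract one.
	return proxies[(next-1)%uint32(len(proxies))], nil
}

func main() {
	f := newFSM()
	proxies := []string{"proxy-a", "proxy-b", "proxy-c"}
	for i := 0; i < 4; i++ {
		p, err := nextProxy(f, "default", proxies)
		fmt.Println(p, err)
	}
}
```

Because every node applies the same entries in the same order, each node ends up with the same counter, which is what lets a new leader continue the rotation where the old one stopped.
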
- Introduced a mutex (`proxyStateMutex`) to synchronize access to proxy state checks in `testProxy` function.
- Modified the `testProxy` function to lock the mutex before checking the state of `AvailableConnections` and `busyConnections`.
- Updated the test logic to ensure that one of the proxies is in the expected state, preventing race conditions where the second goroutine could access the connection state prematurely.
- Removed the `proxy` parameter from `testProxy` function calls as it is no longer needed.

This change addresses a race condition that could cause the second goroutine to access connection states before they are properly synchronized, ensuring reliable test results.

Add support for storing weighted round-robin load balancer state in Raft FSM
to ensure consistency across cluster nodes. Changes include:

- Add WeightedProxy and WeightedRRPayload structs for state management
- Store proxy weights in Raft FSM using weightedRRStates map
- Update WeightedRoundRobin to use Raft for weight tracking
- Add new CommandUpdateWeightedRR command type
- Remove local weight tracking in favor of distributed state

This change ensures that proxy weights remain consistent across cluster nodes
during failover and leader changes.

Improve performance of WeightedRoundRobin.NextProxy by reducing Raft operations:
- Replace multiple individual Raft updates with a single batch operation
- Introduce new CommandUpdateWeightedRRBatch command type
- Collect all proxy weight updates in memory before applying
- Reduce number of Raft.Apply calls from N+1 to 1 (where N is number of proxies)

This change significantly reduces the number of Raft consensus operations
needed for weight updates in the weighted round-robin load balancer.
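
The batching idea can be sketched as follows. This is a minimal illustration that assumes a smooth weighted round-robin selection; the PR does not spell out the selection rule. The `WeightedProxy` name comes from the PR, but its fields, the `batchPayload` type, the `applier` interface, and `countingFSM` are hypothetical. All weight changes for one selection are collected in a map and submitted as a single command.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// WeightedProxy is named in the PR; the fields here are assumed.
type WeightedProxy struct {
	CurrentWeight   int `json:"currentWeight"`
	EffectiveWeight int `json:"effectiveWeight"`
}

// batchPayload is a hypothetical body for CommandUpdateWeightedRRBatch.
type batchPayload struct {
	GroupName string                   `json:"groupName"`
	Updates   map[string]WeightedProxy `json:"updates"`
}

// applier is a hypothetical stand-in for submitting a command to Raft.
type applier interface {
	Apply(cmd []byte) error
}

// countingFSM applies batches to an in-memory map and counts Apply calls.
type countingFSM struct {
	states  map[string]WeightedProxy
	applies int
}

func (f *countingFSM) Apply(cmd []byte) error {
	var p batchPayload
	if err := json.Unmarshal(cmd, &p); err != nil {
		return fmt.Errorf("decode batch: %w", err)
	}
	f.applies++
	for name, wp := range p.Updates {
		f.states[name] = wp
	}
	return nil
}

// nextProxy runs one smooth weighted round-robin step. It gathers every
// weight change in memory and submits them as one batch command.
func nextProxy(a applier, group string, names []string, states map[string]WeightedProxy) (string, error) {
	updates := make(map[string]WeightedProxy, len(names))
	total, selected := 0, ""
	for _, name := range names {
		wp := states[name]
		wp.CurrentWeight += wp.EffectiveWeight
		total += wp.EffectiveWeight
		updates[name] = wp
		if selected == "" || wp.CurrentWeight > updates[selected].CurrentWeight {
			selected = name
		}
	}
	if selected == "" {
		return "", errors.New("no proxies available")
	}
	// The chosen proxy pays back the total weight of the group.
	wp := updates[selected]
	wp.CurrentWeight -= total
	updates[selected] = wp

	cmd, err := json.Marshal(batchPayload{GroupName: group, Updates: updates})
	if err != nil {
		return "", fmt.Errorf("encode batch: %w", err)
	}
	if err := a.Apply(cmd); err != nil {
		return "", err
	}
	return selected, nil
}

func main() {
	fsm := &countingFSM{states: map[string]WeightedProxy{
		"a": {EffectiveWeight: 5}, "b": {EffectiveWeight: 1}, "c": {EffectiveWeight: 1},
	}}
	names := []string{"a", "b", "c"}
	for i := 0; i < 7; i++ {
		p, err := nextProxy(fsm, "default", names, fsm.states)
		fmt.Println(p, err)
	}
	fmt.Println("apply calls:", fsm.applies)
}
```

With weights 5, 1, and 1, seven selections pick `a` five times and `b` and `c` once each, and the state machine sees seven commands in total, one per selection rather than one per proxy per selection.
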
- Add JSON marshaling error handling in WeightedRoundRobin
- Simplify proxy state validation logic in server tests
- Clean up test formatting

The main changes improve error propagation and code clarity by:
- Properly handling JSON marshaling errors in NextProxy method
- Refactoring conditional logic in server tests to be more readable
- Removing unnecessary empty lines in test files

github-actions bot commented Dec 19, 2024

Overview

Image reference: ghcr.io/gatewayd-io/gatewayd:8fbdb8a | gatewaydio/gatewayd:latest
- digest: 3535e2168e31 | 383013efa302
- tag: 8fbdb8a | latest
- provenance: b6df86a
- vulnerabilities: critical: 0, high: 1, medium: 1, low: 0 | critical: 0, high: 1, medium: 1, low: 0
- platform: linux/amd64 | linux/amd64
- size: 20 MB | 18 MB (-2.3 MB)
- packages: 144 | 140 (-4)
Base Image: alpine:3 (also known as: 3.20, 3.20.3, latest) | alpine:3.20 (also known as: 3, 3.20.3, latest)
- vulnerabilities: critical: 0, high: 0, medium: 1, low: 0 | critical: 0, high: 0, medium: 1, low: 0
Packages and Vulnerabilities (5 package changes and 0 vulnerability changes)
  • ➖ 3 packages removed
  • ♾️ 2 packages changed
  • 134 packages unchanged
Changes for packages of type apk (3 changes)
Package | Version (ghcr.io/gatewayd-io/gatewayd:8fbdb8a) | Version (gatewaydio/gatewayd:latest)
ca-certificates 20240705-r0
openssl 3.3.2-r0
pax-utils 1.3.7-r2
Changes for packages of type golang (2 changes)
Package | Version (ghcr.io/gatewayd-io/gatewayd:8fbdb8a) | Version (gatewaydio/gatewayd:latest)
♾️ github.com/gatewayd-io/gatewayd | (devel) | 0.0.0-20241214123014-b6df86a6fe94
♾️ stdlib | go1.23.4 | 1.23.4

Add support for persisting and restoring weighted round-robin load balancer
states in the Raft FSM snapshots. This ensures the weighted round-robin
configuration survives cluster restarts and leader changes.

Changes:
- Add weightedRRStates to FSMSnapshot struct
- Update Snapshot() to copy weighted round-robin states
- Extend Restore() and Persist() to handle weighted round-robin data
Add commented example of grpcAddress field in peer configuration,
which specifies the gRPC endpoint for raft peer communication.

Add test cases covering:
- Weighted round-robin operations (single and batch updates)
- Round-robin index management
- Invalid command handling
- FSM snapshot restoration
- Node shutdown scenarios
@sinadarbouy sinadarbouy marked this pull request as ready for review December 19, 2024 16:53
@sinadarbouy sinadarbouy requested a review from mostafa December 19, 2024 16:53
Review comment on network/roundrobin.go (outdated, resolved)
Review comment on network/weightedroundrobin.go (resolved)
Replace usage of LeaderElectionTimeout with a new dedicated ApplyTimeout constant
(2 seconds) for Raft command applications across different load balancing
strategies (ConsistentHash, RoundRobin, WeightedRoundRobin).

This change provides better separation of concerns by using a more appropriate
timeout value for command applications rather than reusing the leader election
timeout.

@mostafa mostafa left a comment

LGTM! 🚀

@mostafa mostafa merged commit 86ec724 into main Dec 21, 2024
5 checks passed
@mostafa mostafa deleted the feature/raft-loadbalancer-state branch December 21, 2024 23:24