Optimizing for High Availability and Minimal Latency in Distributed Databases with Kubernetes and Calico Cluster Mesh
Calico Cluster mesh extends Kubernetes' inherent capabilities, providing seamless service discovery across multiple Kubernetes clusters. It allows Kubernetes services, including headless services, to discover and connect with each other across cluster boundaries without an additional control plane such as a service mesh.
This example outlines the setup of two AWS EKS clusters with cross-region connectivity. Each cluster is placed within its own Virtual Private Cloud (VPC), and these VPCs are connected to allow direct network communication between the clusters using VPC peering. The configuration ensures that EKS cluster nodes in one VPC can communicate with cluster nodes in the other VPC.
The EKS clusters are configured with Calico Cluster mesh, enabling direct, low-latency communication between clusters. This allows services in different clusters to discover and connect seamlessly, simplifying cross-cluster interactions and enhancing network efficiency without the need for additional networking layers or external routing mechanisms.
We'll use Terraform, an infrastructure-as-code tool, to deploy this reference architecture automatically. We'll walk you through the deployment process and then demonstrate how to use Calico Cluster mesh on AWS.
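Before diving in, it helps to see what the cross-region plumbing looks like. Below is a minimal Terraform sketch of VPC peering between two regions; the resource and provider alias names are illustrative assumptions, not the blueprint's actual code, which handles all of this for you:

# Illustrative sketch only; the blueprint's resource names differ.
resource "aws_vpc_peering_connection" "cross_region" {
  provider    = aws.region1            # assumed provider alias for us-east-1
  vpc_id      = aws_vpc.vpc1.id        # 10.0.0.0/16
  peer_vpc_id = aws_vpc.vpc2.id        # 10.1.0.0/16
  peer_region = "us-west-2"            # requester and accepter live in different regions
}

resource "aws_vpc_peering_connection_accepter" "cross_region" {
  provider                  = aws.region2   # assumed provider alias for us-west-2
  vpc_peering_connection_id = aws_vpc_peering_connection.cross_region.id
  auto_accept               = true
}

# Each VPC's route tables also need a route to the peer VPC's CIDR.
resource "aws_route" "vpc1_to_vpc2" {
  provider                  = aws.region1
  route_table_id            = aws_vpc.vpc1.main_route_table_id
  destination_cidr_block    = "10.1.0.0/16"
  vpc_peering_connection_id = aws_vpc_peering_connection.cross_region.id
}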
First, ensure that you have installed the following tools locally: git, Terraform, the AWS CLI, and kubectl.
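A quick way to confirm they are available:

git --version
terraform version
aws --version
kubectl version --client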
Make sure you have completed the prerequisites and then clone the Terraform blueprint:
git clone https://github.com/tigera-solutions/multi-cluster-stateful-workloads-with-cluster-mesh.git
Switch to the aws subdirectory:
cd multi-cluster-stateful-workloads-with-cluster-mesh/aws
Optional: Edit the terraform.tfvars file to customize the configuration. Its default values are:
region1 = "us-east-1"
region2 = "us-west-2"
vpc1_cidr = "10.0.0.0/16"
vpc2_cidr = "10.1.0.0/16"
cluster1_name = "iad"
cluster2_name = "pdx"
cluster_version = "1.27"
instance_type = "m5.xlarge"
desired_size = 3
ssh_keyname = "your-ssh-keyname"
pod_cidr1 = "192.168.1.0/24"
pod_cidr2 = "192.168.2.0/24"
calico_version = "v3.26.4"
calico_encap = "VXLAN"
Note: Make sure the ssh_keyname key pair exists in both region1 and region2 of your AWS account; the Terraform configuration assumes it already exists in both regions.
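One way to verify the key pair (and import it if missing), assuming your public key lives at ~/.ssh/id_rsa.pub:

aws ec2 describe-key-pairs --region us-east-1 --key-names your-ssh-keyname
aws ec2 describe-key-pairs --region us-west-2 --key-names your-ssh-keyname

# If either command errors, import the same public key into that region:
aws ec2 import-key-pair --region us-east-1 --key-name your-ssh-keyname --public-key-material fileb://~/.ssh/id_rsa.pub
aws ec2 import-key-pair --region us-west-2 --key-name your-ssh-keyname --public-key-material fileb://~/.ssh/id_rsa.pub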
Initialize and apply the Terraform configurations:
terraform init
terraform apply
Enter yes at the prompt to apply the changes.
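Terraform prints its outputs when the apply completes; you can reprint them at any time:

terraform output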
Update your kubeconfig with the EKS cluster credentials as indicated in the Terraform output:
aws eks --region <REGION1> update-kubeconfig --name <CLUSTER_NAME1> --alias <CLUSTER_NAME1>
aws eks --region <REGION2> update-kubeconfig --name <CLUSTER_NAME2> --alias <CLUSTER_NAME2>
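With the default terraform.tfvars values shown above, that is:

aws eks --region us-east-1 update-kubeconfig --name iad --alias iad
aws eks --region us-west-2 update-kubeconfig --name pdx --alias pdx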
Check the status of Calico in your EKS cluster:
kubectl --context iad get tigerastatus
kubectl --context pdx get tigerastatus
Join your EKS clusters to Calico Cloud as illustrated in the accompanying video (join-eks-to-calico-cloud.mp4).
To point kubectl at a specific cluster, switch contexts with one of the following commands:
kubectl config use-context iad
kubectl config use-context pdx
Check the cluster status:
kubectl --context iad get tigerastatus
kubectl --context pdx get tigerastatus
Set the flush intervals for flow, DNS, and L7 logs:
kubectl --context iad patch felixconfiguration default --type='merge' -p '{
  "spec": {
    "dnsLogsFlushInterval": "15s",
    "l7LogsFlushInterval": "15s",
    "flowLogsFlushInterval": "15s",
    "flowLogsFileAggregationKindForAllowed": 1,
    "flowLogsEnableHostEndpoint": true
  }
}'
kubectl --context pdx patch felixconfiguration default --type='merge' -p '{
  "spec": {
    "dnsLogsFlushInterval": "15s",
    "l7LogsFlushInterval": "15s",
    "flowLogsFlushInterval": "15s",
    "flowLogsFileAggregationKindForAllowed": 1,
    "flowLogsEnableHostEndpoint": true
  }
}'
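To confirm the patches took effect:

kubectl --context iad get felixconfiguration default -o yaml | grep -i flushinterval
kubectl --context pdx get felixconfiguration default -o yaml | grep -i flushinterval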
Run the setup-mesh.sh script:
cd ..
sh setup-mesh.sh
The setup-mesh.sh script automates the creation of a Calico Cluster mesh as outlined in the Tigera documentation, enabling secure and efficient connections between multiple Kubernetes clusters. Below is a breakdown of the specific Kubernetes resources it creates and configures:
- In the source cluster, it:
  - Applies Calico federation manifests to install the federation roles, role bindings, and service account needed for cross-cluster communication.
  - Creates a secret that stores the service account token. This token secures connections between clusters by providing authentication and authorization.
  - Generates a kubeconfig file using the service account token. This file contains all the details (such as the cluster API server address and credentials) needed for secure access to the source cluster.
- In the destination cluster, it:
  - Creates a secret containing the kubeconfig from the source cluster. This enables the destination cluster to securely communicate with the source cluster.
  - Configures a RemoteClusterConfiguration resource, which manages the mesh connection settings and policies (see the sketch below).
  - Applies specific RBAC roles and role bindings to give designated components access to the secret, ensuring they can establish and maintain secure cross-cluster communication.
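For reference, the RemoteClusterConfiguration the script creates is shaped roughly like the sketch below. The metadata and secret names here are illustrative assumptions (the script generates the real ones), and the Tigera documentation describes the full schema:

apiVersion: projectcalico.org/v3
kind: RemoteClusterConfiguration
metadata:
  name: pdx                           # assumed: the remote cluster being joined
spec:
  clusterAccessSecret:
    name: remote-cluster-secret-pdx   # assumed: secret holding the remote kubeconfig
    namespace: calico-system
    kind: Secret
  syncOptions:
    overlayRoutingMode: Enabled       # route pod traffic over the VXLAN overlay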
A recording of the script in action is available in the accompanying video (setup-mesh-dot-sh.mp4).
Check logs for remote cluster connection status:
kubectl --context iad logs deployment/calico-typha -n calico-system | grep "Sending in-sync update"
kubectl --context pdx logs deployment/calico-typha -n calico-system | grep "Sending in-sync update"
2024-02-27 01:51:06.156 [INFO][13] wrappedcallbacks.go 487: Sending in-sync update for RemoteClusterConfiguration(pdx)
2024-02-27 01:51:03.300 [INFO][13] wrappedcallbacks.go 487: Sending in-sync update for RemoteClusterConfiguration(iad)
You should see similar messages for each of the clusters in your Cluster mesh.
From the project root, apply the example manifests:
kubectl --context iad apply -f multi-cluster-rs-iad.yaml
kubectl --context iad apply -f netshoot.yaml
kubectl --context pdx apply -f multi-cluster-rs-pdx.yaml
kubectl --context pdx apply -f netshoot.yaml
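The netshoot.yaml manifest deploys a utility pod used for the connectivity tests that follow. The repo's manifest isn't reproduced here, but a minimal equivalent would look like this (the image and command are assumptions):

apiVersion: v1
kind: Pod
metadata:
  name: netshoot
spec:
  containers:
    - name: netshoot
      image: nicolaka/netshoot       # popular network troubleshooting image
      command: ["sleep", "infinity"] # keep the pod running for exec sessions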
Test the configuration of each Service:
kubectl --context pdx get svc
kubectl --context pdx exec -it netshoot -- ping -c 1 multi-cluster-rs-pdx
kubectl --context pdx exec -it netshoot -- ping -c 1 multi-cluster-rs-iad
kubectl --context iad get svc
kubectl --context iad exec -it netshoot -- ping -c 1 multi-cluster-rs-iad
kubectl --context iad exec -it netshoot -- ping -c 1 multi-cluster-rs-pdx
By accessing the headless service names within each cluster, we can observe that they resolve to endpoint addresses in both the local and remote clusters, confirming service discovery and connectivity across the mesh.
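Cross-cluster resolution of service names like these is typically implemented with Calico federated services. Per the Tigera documentation, a federated service carries the federation.tigera.io/serviceSelector annotation and defines no pod selector of its own; Calico then populates its endpoints from every matching service across the mesh. The sketch below illustrates the shape; the label, port, and repo-specific details are assumptions rather than the repo's actual values:

apiVersion: v1
kind: Service
metadata:
  name: multi-cluster-rs-iad
  annotations:
    # Calico populates this service's endpoints from all services,
    # local and remote, whose labels match this selector (assumed label).
    federation.tigera.io/serviceSelector: app == "multi-cluster-rs-iad"
spec:
  clusterIP: None        # headless, so DNS returns backing pod IPs directly
  ports:
    - port: 6379         # assumed port
  # Note: no spec.selector; endpoints come from the federation annotation.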
When we scale the StatefulSet in each cluster, each replica can be reached directly through a DNS name that follows the pattern ${podname}.${federated service name}.
kubectl --context iad patch sts multi-cluster-rs-iad --patch '{"spec":{"replicas":2}}'
kubectl --context pdx patch sts multi-cluster-rs-pdx --patch '{"spec":{"replicas":2}}'
kubectl --context pdx exec -it netshoot -- ping -c 1 multi-cluster-rs-iad-0.multi-cluster-rs-iad
kubectl --context pdx exec -it netshoot -- ping -c 1 multi-cluster-rs-iad-1.multi-cluster-rs-iad
kubectl --context iad exec -it netshoot -- ping -c 1 multi-cluster-rs-pdx-0.multi-cluster-rs-pdx
kubectl --context iad exec -it netshoot -- ping -c 1 multi-cluster-rs-pdx-1.multi-cluster-rs-pdx
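You can also inspect the DNS records directly, which separates name resolution from reachability (netshoot ships with dig; the +search flag makes it honor the pod's DNS search domains):

kubectl --context pdx exec -it netshoot -- dig +search +short multi-cluster-rs-iad-0.multi-cluster-rs-iad
kubectl --context iad exec -it netshoot -- dig +search +short multi-cluster-rs-pdx-0.multi-cluster-rs-pdx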
Because each StatefulSet pod is directly reachable via its DNS name across clusters, clients of database workloads can connect to a specific pod regardless of which cluster it is running in.
A demonstration of cross-cluster service discovery is available in the accompanying video (demo-cross-cluster-service-discovery.mp4).
In the Calico Cloud Service Graph, you can observe cross-cluster communication by visualizing the network traffic flows between different clusters. The Service Graph not only shows the existence of cross-cluster connectivity but also allows you to analyze the efficiency and behavior of the data flows, facilitating a deeper understanding of the network dynamics in a multi-cluster environment.
A demonstration of observing the Cluster mesh in Service Graph is available in the accompanying video (observe-cluster-mesh.mp4).
To tear down and remove the resources created in this example, first remove the Helm releases from the Terraform state (so the destroy doesn't attempt to uninstall Calico from clusters that are about to be deleted), then destroy the infrastructure:
terraform state rm helm_release.calico_cluster1
terraform state rm helm_release.calico_cluster2
terraform destroy --auto-approve