From e5c69f561e6d8becabc70e5782a1311e7ec5bb1c Mon Sep 17 00:00:00 2001 From: Jozef Kralik Date: Mon, 17 Jun 2024 15:28:03 +0000 Subject: [PATCH 01/13] disaster-recovery tutorial for HUB and DPS --- .../disaster-recovery.md | 2 +- .../disaster-recovery-replica-set.md | 681 ++++++++++++++++++ 2 files changed, 682 insertions(+), 1 deletion(-) create mode 100644 content/en/docs/tutorials/disaster-recovery-replica-set.md diff --git a/content/en/docs/features/monitoring-and-diagnostics/disaster-recovery.md b/content/en/docs/features/monitoring-and-diagnostics/disaster-recovery.md index c4be6514..f5278fce 100644 --- a/content/en/docs/features/monitoring-and-diagnostics/disaster-recovery.md +++ b/content/en/docs/features/monitoring-and-diagnostics/disaster-recovery.md @@ -74,7 +74,7 @@ To back up the database, two approaches can be used: ![active-backup-replica-set](/docs/features/monitoring-and-diagnostics/static/disaster-recovery-active-replica-set-backup.drawio.svg) - The primary and standby cluster MongoDB members are in the same MongoDB replica set. The standby cluster members are configured as [hidden](https://www.mongodb.com/docs/manual/core/replica-set-hidden-member), [delayed](https://www.mongodb.com/docs/manual/core/replica-set-delayed-member/), and with [zero priority](https://www.mongodb.com/docs/manual/core/replica-set-priority-0-member/). When the primary cluster goes down, the standby cluster MongoDB members are promoted to standby state—one of them will become primary by administrator. After the primary is back online, the primary cluster members will be demoted to hidden. For switching back, the primary cluster members will be promoted to secondary MongoDB members and standby cluster members will be demoted. **This approach is supported by the plgd hub helm chart because it complies with the MongoDB Community Server license.** For setup instructions, please refer to this [tutorial](). 
+ The primary and standby cluster MongoDB members are in the same MongoDB replica set. The standby cluster members are configured as [hidden](https://www.mongodb.com/docs/manual/core/replica-set-hidden-member), [delayed](https://www.mongodb.com/docs/manual/core/replica-set-delayed-member/), and with [zero priority](https://www.mongodb.com/docs/manual/core/replica-set-priority-0-member/). When the primary cluster goes down, the standby cluster MongoDB members are promoted to the secondary state, and the administrator promotes one of them to primary. After the primary cluster is back online, its members are demoted to hidden. For switching back, the primary cluster members are promoted to secondary MongoDB members and the standby cluster members are demoted. **This approach is supported by the plgd hub helm chart because it complies with the MongoDB Community Server license.** For setup instructions, please refer to this [tutorial](/docs/tutorials/disaster-recovery-replica-set/).

 * **Cluster to cluster synchronization**

diff --git a/content/en/docs/tutorials/disaster-recovery-replica-set.md b/content/en/docs/tutorials/disaster-recovery-replica-set.md
new file mode 100644
index 00000000..71bae8b2
--- /dev/null
+++ b/content/en/docs/tutorials/disaster-recovery-replica-set.md
@@ -0,0 +1,681 @@
+---
+title: 'Disaster Recovery via Replica Set'
+description: 'How to perform disaster recovery with a Replica Set?'
+date: '2024-06-17'
+categories: [architecture, d2c, provisioning, disaster-recovery]
+keywords: [architecture, d2c, provisioning, disaster-recovery]
+weight: 15
+---
+
+The plgd-hub Helm charts support disaster recovery via a MongoDB replica set because the source of truth is stored in the MongoDB database. Devices must have **device provisioning endpoints** configured for both clusters' device provisioning services. In this tutorial, we have two MicroK8s clusters: primary and standby.
Each of them uses three root CA certificates: + +- `external CA certificate pair`: Used for public APIs (CoAP, HTTPS, gRPC) and is the same for both clusters. +- `internal CA certificate pair`: Used for plgd services to communicate with each other, MongoDB, and NATs. Each cluster has its own internal CA certificate. +- `storage CA certificate pair`: Used for MongoDB. Each cluster has its own storage CA certificate. + +We also use an `authorization CA certificate` to communicate with the OAuth2 authorization server. In this tutorial, `mock-oauth-server` and its certificate are signed by the `external CA certificate pair`. Thus, we have only one `authorization CA certificate` for both clusters, which is the `external CA certificate`. + +The goal is to ensure that only MongoDBs from the primary and standby clusters can communicate with each other, while plgd services can only connect to the MongoDB in their respective clusters. All APIs will be available on the root domain `primary.plgd.cloud` for the primary cluster and `standby.plgd.cloud` for the standby cluster. Additionally, MongoDB members are exposed via the LoadBalancer service type, and each member needs its own DNS name. + +| For the primary cluster we have | For the standby cluster we have | +| --- | --- | +| `mongodb-0.primary.plgd.cloud` | `mongodb-0.standby.plgd.cloud` | +| `mongodb-1.primary.plgd.cloud` | `mongodb-1.standby.plgd.cloud` | +| `mongodb-2.primary.plgd.cloud` | `mongodb-2.standby.plgd.cloud` | +| `mongodb.primary.plgd.cloud` | `----` | + +The `mongodb.primary.plgd.cloud` is used for external access to the MongoDB replica set for the standby cluster. This DNS record is an alias for all members of the primary cluster. + +This DNS needs to be resolved to the external IP address of the LoadBalancer. The external IP address of the LoadBalancer is used to connect to the MongoDB replica set from the other cluster. 
For clouds, you can use the [external-dns](https://github.com/kubernetes-sigs/external-dns/) tool to create DNS records in AWS Route53 / Google Cloud DNS / Azure DNS. +In this tutorial, we show how to get the IPs of MongoDB services, and we will set them manually in /etc/hosts, then restart the dnsmasq daemon to load these changes on the computer with IP 192.168.1.1. + +{{< note >}} +It is also recommended to set up a firewall between clusters with source IP address filtering to mitigate DDOS attacks on MongoDB. The default port for MongoDB is 27017. Alternatively, use a VPN to interconnect clusters. +{{< /note >}} + +## Installation + +### MicroK8s Prerequisites + +The following addons are expected to be enabled on both clusters, with **Kubernetes v1.24+** installed. + +```yaml +addons: + enabled: + cert-manager # (core) Cloud-native certificate management + dns # (core) CoreDNS + ha-cluster # (core) Configure high availability on the current node + helm # (core) Helm - the package manager for Kubernetes + helm3 # (core) Helm 3 - the package manager for Kubernetes + hostpath-storage # (core) Storage class; allocates storage from host directory + ingress # (core) Ingress controller for external access + metallb # (core) Loadbalancer for your Kubernetes cluster +``` + +The [dns](https://microk8s.io/docs/addon-dns) addon is configured to use a DNS server that hosts all records for `primary.plgd.cloud` and `standby.plgd.cloud` domains. To configure DNS in MicroK8s, you can use the following command: + +```bash +microk8s disable dns +microk8s enable dns:192.168.1.1 +``` + +For [metallb](https://microk8s.io/docs/addon-metallb), we need to set up the IP address pool for the LoadBalancer service type. The IP address pool needs to be accessible from the network where the MicroK8s is running. It is important that the IP address is not used by any other device in the network and that the DHCP server is not assigning this IP address to any device. 
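
The two clusters must use disjoint MetalLB pools. As a quick sanity check, the following standalone sketch (the `ip2int` and `pools_overlap` helper names are ours, not part of MicroK8s or the plgd charts) verifies that two candidate ranges do not overlap, using the example pools from this tutorial:

```bash
#!/usr/bin/env bash
# Standalone sketch: check that two MetalLB address pools do not overlap.
# The example ranges below are the ones used in this tutorial.

ip2int() {
  # Convert a dotted-quad IPv4 address to an integer.
  local IFS=.
  set -- $1
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

pools_overlap() {
  # Usage: pools_overlap START1 END1 START2 END2 -> prints "yes" or "no"
  local s1 e1 s2 e2
  s1=$(ip2int "$1"); e1=$(ip2int "$2"); s2=$(ip2int "$3"); e2=$(ip2int "$4")
  if (( s1 <= e2 && s2 <= e1 )); then echo yes; else echo no; fi
}

# Primary pool vs. standby pool -> prints "no" for the tutorial's ranges.
pools_overlap 192.168.1.200 192.168.1.219 192.168.1.220 192.168.1.239
```

If it prints `yes`, shrink or move one of the ranges before enabling the addon.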
+ +Example for the primary cluster: + +```bash +microk8s disable metallb +microk8s enable metallb:192.168.1.200-192.168.1.219 +``` + +Example for the standby cluster: + +```bash +microk8s disable metallb +microk8s enable metallb:192.168.1.220-192.168.1.239 +``` + +### Creating Certificates + +To create certificates, you can use the cert-tool Docker image to generate root CA certificates for the services. + +1. Create the external CA certificate pair (same for both clusters): + + ```bash + mkdir -p .tmp/certs/external + docker run \ + --rm -v $(pwd)/.tmp/certs/external:/certs \ + --user $(id -u):$(id -g) \ + ghcr.io/plgd-dev/hub/cert-tool:vnext \ + --cmd.generateRootCA --outCert=/certs/tls.crt --outKey=/certs/tls.key \ + --cert.subject.cn=external.root.ca --cert.validFor=876000h + ``` + +2. Create the internal CA certificate pair for the primary cluster: + + ```bash + mkdir -p .tmp/primary/certs/internal + docker run \ + --rm -v $(pwd)/.tmp/primary/certs/internal:/certs \ + --user $(id -u):$(id -g) \ + ghcr.io/plgd-dev/hub/cert-tool:vnext \ + --cmd.generateRootCA --outCert=/certs/tls.crt --outKey=/certs/tls.key \ + --cert.subject.cn=primary.internal.root.ca --cert.validFor=876000h + ``` + +3. Create the storage CA certificate pair for the primary cluster: + + ```bash + mkdir -p .tmp/primary/certs/storage + docker run \ + --rm -v $(pwd)/.tmp/primary/certs/storage:/certs \ + --user $(id -u):$(id -g) \ + ghcr.io/plgd-dev/hub/cert-tool:vnext \ + --cmd.generateRootCA --outCert=/certs/tls.crt --outKey=/certs/tls.key \ + --cert.subject.cn=primary.storage.root.ca --cert.validFor=876000h + ``` + +4. 
Create the internal CA certificate pair for the standby cluster: + + ```bash + mkdir -p .tmp/standby/certs/internal + docker run \ + --rm -v $(pwd)/.tmp/standby/certs/internal:/certs \ + --user $(id -u):$(id -g) \ + ghcr.io/plgd-dev/hub/cert-tool:vnext \ + --cmd.generateRootCA --outCert=/certs/tls.crt --outKey=/certs/tls.key \ + --cert.subject.cn=standby.internal.root.ca --cert.validFor=876000h + ``` + +5. Create the storage CA certificate pair for the standby cluster: + + ```bash + mkdir -p .tmp/standby/certs/storage + docker run \ + --rm -v $(pwd)/.tmp/standby/certs/storage:/certs \ + --user $(id -u):$(id -g) \ + ghcr.io/plgd-dev/hub/cert-tool:vnext \ + --cmd.generateRootCA --outCert=/certs/tls.crt --outKey=/certs/tls.key \ + --cert.subject.cn=standby.storage.root.ca --cert.validFor=876000h + ``` + +### Setting up cert-manager on the Primary Cluster + +Ensure that you have cert-manager installed. + +1. Create an external TLS secret for issuing certificates: + + ```bash + kubectl -n cert-manager create secret tls external-plgd-ca-tls \ + --cert=.tmp/certs/external/tls.crt \ + --key=.tmp/certs/external/tls.key + ``` + +2. 
Create a ClusterIssuer that points to `external-plgd-ca-tls`: + + ```bash + cat < values.yaml +global: + domain: "$DOMAIN" + hubId: "$HUB_ID" + ownerClaim: "$OWNER_CLAIM" + standby: $STANDBY + extraCAPool: + authorization: | + $AUTHORIZATION_CA_IN_PEM + internal: | + $INTERNAL_CA_IN_PEM + $STORAGE_PRIMARY_CA_IN_PEM + $EXTERNAL_CA_IN_PEM + storage: | + $STORAGE_PRIMARY_CA_IN_PEM + $STORAGE_STANDBY_CA_IN_PEM + $INTERNAL_CA_IN_PEM +mockoauthserver: + enabled: true + oauth: + - name: "plgd.dps" + clientID: "test" + clientSecret: "test" + grantType: "clientCredentials" + redirectURL: "https://$DOMAIN/things" + scopes: ['openid'] + - name: "plgd.web" + clientID: "test" + clientSecret: "test" + redirectURL: "https://$DOMAIN/things" + scopes: ['openid'] + useInUi: true +mongodb: + tls: + extraDnsNames: + - "mongodb.$DOMAIN" + externalAccess: + enabled: true + service: + type: LoadBalancer + publicNames: + - "mongodb-0.$DOMAIN" + - "mongodb-1.$DOMAIN" + - "mongodb-2.$DOMAIN" + annotationsList: + - external-dns.alpha.kubernetes.io/hostname: "mongodb-0.$DOMAIN" + - external-dns.alpha.kubernetes.io/hostname: "mongodb-1.$DOMAIN" + - external-dns.alpha.kubernetes.io/hostname: "mongodb-2.$DOMAIN" +certmanager: + storage: + issuer: + kind: ClusterIssuer + name: storage-plgd-ca-issuer + internal: + issuer: + kind: ClusterIssuer + name: internal-plgd-ca-issuer + default: + ca: + issuerRef: + kind: ClusterIssuer + name: external-plgd-ca-issuer +httpgateway: + apiDomain: "$DOMAIN" +grpcgateway: + domain: "$DOMAIN" +certificateauthority: + domain: "$DOMAIN" +coapgateway: + service: + type: NodePort + nodePort: 15684 +resourcedirectory: + publicConfiguration: + coapGateway: "coaps+tcp://$DOMAIN:15684" +deviceProvisioningService: + apiDomain: "$DOMAIN" + service: + type: NodePort + image: + dockerConfigSecret: | + { + "auths": { + "ghcr.io": { + "auth": "$DOCKER_AUTH_TOKEN" + } + } + } + enrollmentGroups: + - id: "5db6ccde-05e1-480b-a522-c1591ad7dfd2" + owner: "1" + attestationMechanism: 
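        # Note (comment added for clarity): the x509 attestation below
        # authenticates devices by their manufacturer certificate chain,
        # substituted from $MANUFACTURER_CERTIFICATE_CA defined above.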
+ x509: + certificateChain: |- + $MANUFACTURER_CERTIFICATE_CA + hub: + coapGateway: "$DOMAIN:15684" + certificateAuthority: + grpc: + address: "$DOMAIN:443" + authorization: + provider: + name: "plgd.dps" + clientId: "test" + clientSecret: "test" + audience: "https://$DOMAIN" +EOF +helm upgrade -i -n plgd --create-namespace -f values.yaml hub plgd/plgd-hub +helm upgrade -i -n plgd --create-namespace -f values.yaml dps plgd/plgd-dps +``` + +Now we need to get the IP addresses of the MongoDB members and set them to the DNS. The external IP address of the LoadBalancer is used to connect to the MongoDB replica set from the other cluster. + +```bash +kubectl -n plgd get services | grep mongodb | grep LoadBalancer | awk '{print $1 ":" $4}' +mongodb-0-external:192.168.1.202 +mongodb-1-external:192.168.1.200 +mongodb-2-external:192.168.1.201 +``` + +Next, we need to set the DNS records for the primary cluster to the DNS server running on `192.168.1.1`. The `mongodb.primary.plgd.cloud` is an alias to all members of the primary cluster. + +```bash +echo " +192.168.1.202 mongodb-0.primary.plgd.cloud mongodb.primary.plgd.cloud +192.168.1.200 mongodb-1.primary.plgd.cloud mongodb.primary.plgd.cloud +192.168.1.201 mongodb-2.primary.plgd.cloud mongodb.primary.plgd.cloud +" | sudo tee -a /etc/hosts +sudo systemctl restart dnsmasq +``` + +After some time for the pods to start, you can access the Hub at `https://primary.plgd.cloud`. + +### Deploy plgd on Standby Cluster + +Deploying plgd to the standby cluster is similar to deploying it to the primary cluster. The differences are that the domain is `standby.plgd.cloud`, different internal and storage certificates are used, the standby flag is set to `true`, MongoDB is configured to use the master DB at `mongodb.primary.plgd.cloud`, and the `mongodb-standby-tool` job is enabled to configure the MongoDB replica set. 
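
Compared to the primary deployment, the standby-specific configuration boils down to a handful of values. This condensed fragment (key names as used in the full `values.yaml` below) shows the delta:

```yaml
global:
  standby: true                    # plgd pods are kept stopped until failover
mongodb:
  standbyTool:
    enabled: true                  # job that reconfigures the replica set members
  externalAccess:
    externalMaster:
      enabled: true
      host: "mongodb.primary.plgd.cloud"  # replica set master in the primary cluster
```
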
+ +```bash +# Set variables +DOMAIN="standby.plgd.cloud" +PRIMARY_MONGO_DB="mongodb.primary.plgd.cloud" +HUB_ID="d03a1bb4-0a77-428c-b78c-1c46efe6a38e" +OWNER_CLAIM="https://plgd.dev/owner" +STANDBY=true +DOCKER_AUTH_TOKEN="" + +# Read certificate files +AUTHORIZATION_CA_IN_PEM=$(cat .tmp/certs/external/tls.crt) +EXTERNAL_CA_IN_PEM=$(cat .tmp/certs/external/tls.crt) +INTERNAL_CA_IN_PEM=$(cat .tmp/standby/certs/internal/tls.crt) +STORAGE_PRIMARY_CA_IN_PEM=$(cat .tmp/primary/certs/storage/tls.crt) +STORAGE_STANDBY_CA_IN_PEM=$(cat .tmp/standby/certs/storage/tls.crt) +MANUFACTURER_CERTIFICATE_CA="" + +# Create values.yaml file +cat < values.yaml +global: + domain: "$DOMAIN" + hubId: "$HUB_ID" + ownerClaim: "$OWNER_CLAIM" + standby: $STANDBY + extraCAPool: + authorization: | + $AUTHORIZATION_CA_IN_PEM + internal: | + $INTERNAL_CA_IN_PEM + $STORAGE_STANDBY_CA_IN_PEM + $EXTERNAL_CA_IN_PEM + storage: | + $STORAGE_PRIMARY_CA_IN_PEM + $STORAGE_STANDBY_CA_IN_PEM + $INTERNAL_CA_IN_PEM +mockoauthserver: + enabled: true + oauth: + - name: "plgd.dps" + clientID: "test" + clientSecret: "test" + grantType: "clientCredentials" + redirectURL: "https://$DOMAIN/things" + scopes: ['openid'] + - name: "plgd.web" + clientID: "test" + clientSecret: "test" + redirectURL: "https://$DOMAIN/things" + scopes: ['openid'] + useInUi: true +mongodb: + standbyTool: + enabled: true + replicaSet: + standby: + members: + - "mongodb-0.$DOMAIN:27017" + - "mongodb-1.$DOMAIN:27017" + - "mongodb-2.$DOMAIN:27017" + externalAccess: + enabled: true + externalMaster: + enabled: true + host: "$PRIMARY_MONGO_DB" + service: + type: LoadBalancer + publicNames: + - "mongodb-0.$DOMAIN" + - "mongodb-1.$DOMAIN" + - "mongodb-2.$DOMAIN" + annotationsList: + - external-dns.alpha.kubernetes.io/hostname: "mongodb-0.$DOMAIN" + - external-dns.alpha.kubernetes.io/hostname: "mongodb-1.$DOMAIN" + - external-dns.alpha.kubernetes.io/hostname: "mongodb-2.$DOMAIN" +certmanager: + storage: + issuer: + kind: ClusterIssuer + name: 
storage-plgd-ca-issuer + internal: + issuer: + kind: ClusterIssuer + name: internal-plgd-ca-issuer + default: + ca: + issuerRef: + kind: ClusterIssuer + name: external-plgd-ca-issuer +httpgateway: + apiDomain: "$DOMAIN" +grpcgateway: + domain: "$DOMAIN" +certificateauthority: + domain: "$DOMAIN" +coapgateway: + service: + type: NodePort + nodePort: 15684 +resourcedirectory: + publicConfiguration: + coapGateway: "coaps+tcp://$DOMAIN:15684" +deviceProvisioningService: + apiDomain: "$DOMAIN" + service: + type: NodePort + image: + dockerConfigSecret: | + { + "auths": { + "ghcr.io": { + "auth": "$DOCKER_AUTH_TOKEN" + } + } + } + enrollmentGroups: + - id: "5db6ccde-05e1-480b-a522-c1591ad7dfd2" + owner: "1" + attestationMechanism: + x509: + certificateChain: |- + $MANUFACTURER_CERTIFICATE_CA + hub: + coapGateway: "$DOMAIN:15684" + certificateAuthority: + grpc: + address: "$DOMAIN:443" + authorization: + provider: + name: "plgd.dps" + clientId: "test" + clientSecret: "test" + audience: "https://$DOMAIN" +EOF +helm upgrade -i -n plgd --create-namespace -f values.yaml hub plgd/plgd-hub +helm upgrade -i -n plgd --create-namespace -f values.yaml dps plgd/plgd-dps +``` + +Next, we need to get the IP addresses of the MongoDB members and set them to the DNS server running on `192.168.1.1`, similar to the primary cluster. + +```bash +kubectl -n plgd get services | grep mongodb | grep LoadBalancer | awk '{print $1 ":" $4}' +echo " +192.168.1.222 mongodb-0.standby.plgd.cloud +192.168.1.220 mongodb-1.standby.plgd.cloud +192.168.1.221 mongodb-2.standby.plgd.cloud +" | sudo tee -a /etc/hosts +sudo systemctl restart dnsmasq +``` + +<< note >> +It is important that the `global.standby` flag is set to `true`, which means that plgd pods are not running on the standby cluster. +<< /note >> + +Once the MongoDB pods are running, we need to run the `mongodb-standby-tool` job to configure the MongoDB replica set. This configuration demotes the secondary members to hidden members. 
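
For orientation, once this job has run (it is resumed by the command that follows), a standby-cluster member in the output of `rs.conf()` is expected to look roughly like this. This is an illustrative fragment only; the exact member document (including vote and delay fields) is managed by the `mongodb-standby-tool`:

```json
{
  "host": "mongodb-0.standby.plgd.cloud:27017",
  "priority": 0,
  "hidden": true
}
```
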
+ +```bash +kubectl -n plgd patch job/$(kubectl -n standby-mock-plgd-cloud get jobs | grep mongodb-standby-tool | awk '{print $1}') --type=strategic --patch '{"spec":{"suspend":false}}' +``` + +Now the job will create the pod and configure the MongoDB replica set. + +## Disaster Recovery + +<< note >> +This steps could be used in case of planned maintenance. +<< /note >> + +### How to Switch to the Standby Cluster + +When the primary cluster is down, you need to switch to the standby cluster. + +#### Promote the Standby Cluster + +First, promote the hidden members to secondary members. To do this, upgrade the Helm chart with the `mongodb.standbyTool.mode` set to `active`. The active mode reconfigures the MongoDB replica set, promoting hidden members to secondary members and demoting the previous members to hidden. + +```bash +helm upgrade -i -n plgd --create-namespace -f values.yaml --set mongodb.standbyTool.mode=active hub plgd/plgd-hub +``` + +Next, delete the `mongodb-standby-tool` job and resume it to configure the MongoDB replica set. + +```bash +kubectl -n plgd delete job/$(kubectl -n plgd get jobs | grep mongodb-standby-tool | awk '{print $1}') +kubectl -n plgd patch job/$(kubectl -n plgd get jobs | grep mongodb-standby-tool | awk '{print $1}') --type=strategic --patch '{"spec":{"suspend":false}}' +``` + +The final step is to run plgd pods on the standby cluster. Set the `global.standby` flag to `false` and upgrade the Helm chart. + +```bash +helm upgrade -i -n plgd --create-namespace -f values.yaml --set global.standby=false hub plgd/plgd-hub +helm upgrade -i -n plgd --create-namespace -f values.yaml --set global.standby=false dps plgd/plgd-dps +``` + +After rotating the device provisioning endpoints, the devices will connect to the standby cluster. + +#### Turn Off plgd Pods on the Primary Cluster + +When the primary cluster is back up, set the `global.standby` flag to `true` and upgrade the Helm chart. 
+ +```bash +helm upgrade -i -n plgd --create-namespace -f values.yaml --set global.standby=true hub plgd/plgd-hub +helm upgrade -i -n plgd --create-namespace -f values.yaml --set global.standby=true dps plgd/plgd-dps +``` + +### How to Switch Back to the Primary Cluster + +When the primary cluster is ready for devices, switch back to the primary cluster. + +#### Demote the Standby Cluster + +First, promote the primary cluster's MongoDB hidden members to secondary members and demote the standby cluster's MongoDB secondary members to hidden. Upgrade the Helm chart with the `mongodb.standbyTool.mode` set to `standby`. + +```bash +helm upgrade -i -n plgd --create-namespace -f values.yaml --set mongodb.standbyTool.mode=standby hub plgd/plgd-hub +``` + +Next, delete the `mongodb-standby-tool` job and resume it to configure the MongoDB replica set. + +```bash +kubectl -n plgd delete job/$(kubectl -n plgd get jobs | grep mongodb-standby-tool | awk '{print $1}') +kubectl -n plgd patch job/$(kubectl -n plgd get jobs | grep mongodb-standby-tool | awk '{print $1}') --type=strategic --patch '{"spec":{"suspend":false}}' +``` + +The final step is to run plgd pods on the standby cluster. Set the `global.standby` flag to `true` and upgrade the Helm chart. + +```bash +helm upgrade -i -n plgd --create-namespace -f values.yaml --set global.standby=true hub plgd/plgd-hub +helm upgrade -i -n plgd --create-namespace -f values.yaml --set global.standby=true dps plgd/plgd-dps +``` + +#### Turn On plgd Pods on the Primary Cluster + +When the standby cluster is ready for devices, switch back to the primary cluster. Set the `global.standby` flag to `false` and upgrade the Helm chart. 
+ +```bash +helm upgrade -i -n plgd --create-namespace -f values.yaml --set global.standby=false hub plgd/plgd-hub +helm upgrade -i -n plgd --create-namespace -f values.yaml --set global.standby=false dps plgd/plgd-dps +``` + +After rotating the device provisioning endpoints, the devices will connect to the primary cluster. From 8dc045b598c0ba993edad18ad3b1fe19b4c6f925 Mon Sep 17 00:00:00 2001 From: Jozef Kralik Date: Wed, 19 Jun 2024 09:13:22 +0200 Subject: [PATCH 02/13] Update content/en/docs/tutorials/disaster-recovery-replica-set.md Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> --- .../tutorials/disaster-recovery-replica-set.md | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/content/en/docs/tutorials/disaster-recovery-replica-set.md b/content/en/docs/tutorials/disaster-recovery-replica-set.md index 71bae8b2..6acc1b55 100644 --- a/content/en/docs/tutorials/disaster-recovery-replica-set.md +++ b/content/en/docs/tutorials/disaster-recovery-replica-set.md @@ -29,9 +29,9 @@ The `mongodb.primary.plgd.cloud` is used for external access to the MongoDB repl This DNS needs to be resolved to the external IP address of the LoadBalancer. The external IP address of the LoadBalancer is used to connect to the MongoDB replica set from the other cluster. For clouds, you can use the [external-dns](https://github.com/kubernetes-sigs/external-dns/) tool to create DNS records in AWS Route53 / Google Cloud DNS / Azure DNS. In this tutorial, we show how to get the IPs of MongoDB services, and we will set them manually in /etc/hosts, then restart the dnsmasq daemon to load these changes on the computer with IP 192.168.1.1. -{{< note >}} +{{< warning >}} It is also recommended to set up a firewall between clusters with source IP address filtering to mitigate DDOS attacks on MongoDB. The default port for MongoDB is 27017. Alternatively, use a VPN to interconnect clusters. 
-{{< /note >}} +{{< /warning >}} ## Installation @@ -588,9 +588,9 @@ echo " sudo systemctl restart dnsmasq ``` -<< note >> +{{< note >}} It is important that the `global.standby` flag is set to `true`, which means that plgd pods are not running on the standby cluster. -<< /note >> +{{< /note >}} Once the MongoDB pods are running, we need to run the `mongodb-standby-tool` job to configure the MongoDB replica set. This configuration demotes the secondary members to hidden members. @@ -602,9 +602,9 @@ Now the job will create the pod and configure the MongoDB replica set. ## Disaster Recovery -<< note >> -This steps could be used in case of planned maintenance. -<< /note >> +{{< note >}} +These steps could be used in case of planned maintenance. +{{< /note >}} ### How to Switch to the Standby Cluster From 6dc7ef4ec284b1cfee05cabe755f83975bf3cb81 Mon Sep 17 00:00:00 2001 From: Jozef Kralik Date: Wed, 19 Jun 2024 11:36:44 +0000 Subject: [PATCH 03/13] update paths to new setup --- .../device-provisioning-service/troubleshooting.md | 2 +- content/en/docs/deployment/hub/advanced.md | 11 ++++++----- 2 files changed, 7 insertions(+), 6 deletions(-) diff --git a/content/en/docs/deployment/device-provisioning-service/troubleshooting.md b/content/en/docs/deployment/device-provisioning-service/troubleshooting.md index 5480eb56..74526249 100644 --- a/content/en/docs/deployment/device-provisioning-service/troubleshooting.md +++ b/content/en/docs/deployment/device-provisioning-service/troubleshooting.md @@ -82,5 +82,5 @@ If your device is unable to connect to the Hub, follow these steps: If your device can connect to the DPS service but is unable to retrieve certificates from the certificate authority or obtain an authorization code due to lack of trust, follow these steps: -- For the certificate authority, you need to append the certificate authority for that endpoint to the `global.authorizationCAPool` and set 
`deviceProvisioningService.enrollmentGroups[].hub.certificateAuthority.grpc.tls.caPool` to `/certs/extra/ca.crt` as described in the [Customize client certificates for DPS](/docs/deployment/device-provisioning-service/advanced#customize-client-certificates-for-dps) section. Alternatively, you can create an extra volume, mount it, and set the `deviceProvisioningService.enrollmentGroups[].hub.certificateAuthority.grpc.tls.caPool` field to the CA in that volume. +- For the certificate authority, you need to append the certificate authority for that endpoint to the `global.extraCAPool.authorization` and set `deviceProvisioningService.enrollmentGroups[].hub.certificateAuthority.grpc.tls.caPool` to `/certs/extra/ca.crt` as described in the [Customize client certificates for DPS](/docs/deployment/device-provisioning-service/advanced#customize-client-certificates-for-dps) section. Alternatively, you can create an extra volume, mount it, and set the `deviceProvisioningService.enrollmentGroups[].hub.certificateAuthority.grpc.tls.caPool` field to the CA in that volume. - For the authorization provider, follow similar steps as for the certificate authority, but set `enrollmentGroups.[].hub.authorization.provider.http.tls.caPool`. diff --git a/content/en/docs/deployment/hub/advanced.md b/content/en/docs/deployment/hub/advanced.md index bb5bfeef..658ad26f 100644 --- a/content/en/docs/deployment/hub/advanced.md +++ b/content/en/docs/deployment/hub/advanced.md @@ -47,11 +47,12 @@ used by plgd hub services. 
For including a custom authorization CA pool into the authorization CA pools used by plgd hub services, set the following values:

```yaml
global:
-  # -- Custom CA certificate for authorization endpoint in PEM format
-  authorizationCAPool: |-
-    -----BEGIN CERTIFICATE-----
-    your custom authorization CA pool in PEM format
-    -----END CERTIFICATE-----
+  extraCAPool:
+    # -- Custom CA certificate for authorization endpoint in PEM format
+    authorization: |-
+      -----BEGIN CERTIFICATE-----
+      your custom authorization CA pool in PEM format
+      -----END CERTIFICATE-----
```

{{< warning >}}

From fc40da407bbbc8e628a3b66e17937b5967abe272 Mon Sep 17 00:00:00 2001
From: Jozef Kralik
Date: Fri, 21 Jun 2024 14:45:55 +0000
Subject: [PATCH 04/13] fix tutorial

---
 .../disaster-recovery-replica-set.md | 111 ++++++++++--------
 1 file changed, 60 insertions(+), 51 deletions(-)

diff --git a/content/en/docs/tutorials/disaster-recovery-replica-set.md b/content/en/docs/tutorials/disaster-recovery-replica-set.md
index 6acc1b55..3e071aa3 100644
--- a/content/en/docs/tutorials/disaster-recovery-replica-set.md
+++ b/content/en/docs/tutorials/disaster-recovery-replica-set.md
@@ -19,6 +19,7 @@ The goal is to ensure that only MongoDBs from the primary and standby clusters c

 | For the primary cluster we have | For the standby cluster we have |
 | --- | --- |
+| `primary.plgd.cloud` | `standby.plgd.cloud` |
 | `mongodb-0.primary.plgd.cloud` | `mongodb-0.standby.plgd.cloud` |
 | `mongodb-1.primary.plgd.cloud` | `mongodb-1.standby.plgd.cloud` |
 | `mongodb-2.primary.plgd.cloud` | `mongodb-2.standby.plgd.cloud` |
@@ -139,6 +140,12 @@ To create certificates, you can use the cert-tool Docker image to generate root
    --cert.subject.cn=standby.storage.root.ca --cert.validFor=876000h
    ```

+### Preparing Device Provisioning Service dependencies
+
+The Device Provisioning Service (DPS) requires a certificate for the manufacturer. The certificate is used to authenticate the manufacturer when enrolling devices and must be stored in the file `.tmp/certs/manufacturer/tls.crt`.
+
+To download the proprietary Device Provisioning Service Docker image, you need a token for the GitHub Container Registry. The token must be stored in the file `.tmp/tokens/plgd-docker-auth-token.txt`.
+
 ### Setting up cert-manager on the Primary Cluster

 Ensure that you have cert-manager installed.
@@ -291,15 +298,15 @@ DOMAIN="primary.plgd.cloud"
 HUB_ID="d03a1bb4-0a77-428c-b78c-1c46efe6a38e"
 OWNER_CLAIM="https://plgd.dev/owner"
 STANDBY=false
-DOCKER_AUTH_TOKEN=""

 # Read certificate files
-AUTHORIZATION_CA_IN_PEM=$(cat .tmp/certs/external/tls.crt)
-INTERNAL_CA_IN_PEM=$(cat .tmp/primary/certs/internal/tls.crt)
-EXTERNAL_CA_IN_PEM=$(cat .tmp/certs/external/tls.crt)
-STORAGE_PRIMARY_CA_IN_PEM=$(cat .tmp/primary/certs/storage/tls.crt)
-STORAGE_STANDBY_CA_IN_PEM=$(cat .tmp/standby/certs/storage/tls.crt)
-MANUFACTURER_CERTIFICATE_CA=""
+AUTHORIZATION_CA_IN_PEM=.tmp/certs/external/tls.crt
+INTERNAL_CA_IN_PEM=.tmp/primary/certs/internal/tls.crt
+EXTERNAL_CA_IN_PEM=.tmp/certs/external/tls.crt
+STORAGE_PRIMARY_CA_IN_PEM=.tmp/primary/certs/storage/tls.crt
+STORAGE_STANDBY_CA_IN_PEM=.tmp/standby/certs/storage/tls.crt
+MANUFACTURER_CERTIFICATE_CA=.tmp/certs/manufacturer/tls.crt
+DOCKER_AUTH_TOKEN=.tmp/tokens/plgd-docker-auth-token.txt

 # Create values.yaml file
 cat < values.yaml
@@ -310,15 +317,15 @@ global:
   domain: "$DOMAIN"
   hubId: "$HUB_ID"
   ownerClaim: "$OWNER_CLAIM"
   standby: $STANDBY
   extraCAPool:
     authorization: |
-      $AUTHORIZATION_CA_IN_PEM
+$(sed 's/^/      /' $AUTHORIZATION_CA_IN_PEM)
     internal: |
-      $INTERNAL_CA_IN_PEM
-      $STORAGE_PRIMARY_CA_IN_PEM
-      $EXTERNAL_CA_IN_PEM
+$(sed 's/^/      /' $INTERNAL_CA_IN_PEM)
+$(sed 's/^/      /' $STORAGE_PRIMARY_CA_IN_PEM)
+$(sed 's/^/      /' $EXTERNAL_CA_IN_PEM)
     storage: |
-      $STORAGE_PRIMARY_CA_IN_PEM
-      $STORAGE_STANDBY_CA_IN_PEM
-      $INTERNAL_CA_IN_PEM
+$(sed 's/^/      /' $STORAGE_PRIMARY_CA_IN_PEM)
+$(sed 's/^/      /' $STORAGE_STANDBY_CA_IN_PEM)
+$(sed 's/^/      /' $INTERNAL_CA_IN_PEM)
 mockoauthserver:
   enabled: true
   oauth:
@@ -386,7 +393,7 @@ deviceProvisioningService:
       {
         "auths": {
           "ghcr.io": {
-            "auth": "$DOCKER_AUTH_TOKEN"
+ "auth": "$(cat $DOCKER_AUTH_TOKEN)" } } } @@ -396,7 +403,7 @@ deviceProvisioningService: attestationMechanism: x509: certificateChain: |- - $MANUFACTURER_CERTIFICATE_CA +$(sed 's/^/ /' $MANUFACTURER_CERTIFICATE_CA) hub: coapGateway: "$DOMAIN:15684" certificateAuthority: @@ -437,7 +444,7 @@ After some time for the pods to start, you can access the Hub at `https://primar ### Deploy plgd on Standby Cluster -Deploying plgd to the standby cluster is similar to deploying it to the primary cluster. The differences are that the domain is `standby.plgd.cloud`, different internal and storage certificates are used, the standby flag is set to `true`, MongoDB is configured to use the master DB at `mongodb.primary.plgd.cloud`, and the `mongodb-standby-tool` job is enabled to configure the MongoDB replica set. +Deploying plgd to the standby cluster is similar to deploying it to the primary cluster. The differences are that the domain is `standby.plgd.cloud`, different internal and storage certificates are used, the standby flag is set to `true`, NATs is disabled and MongoDB is configured to use the master DB at `mongodb.primary.plgd.cloud`, and the `mongodb-standby-tool` job is enabled to configure the MongoDB replica set. 
```bash # Set variables @@ -446,15 +453,15 @@ PRIMARY_MONGO_DB="mongodb.primary.plgd.cloud" HUB_ID="d03a1bb4-0a77-428c-b78c-1c46efe6a38e" OWNER_CLAIM="https://plgd.dev/owner" STANDBY=true -DOCKER_AUTH_TOKEN="" # Read certificate files -AUTHORIZATION_CA_IN_PEM=$(cat .tmp/certs/external/tls.crt) -EXTERNAL_CA_IN_PEM=$(cat .tmp/certs/external/tls.crt) -INTERNAL_CA_IN_PEM=$(cat .tmp/standby/certs/internal/tls.crt) -STORAGE_PRIMARY_CA_IN_PEM=$(cat .tmp/primary/certs/storage/tls.crt) -STORAGE_STANDBY_CA_IN_PEM=$(cat .tmp/standby/certs/storage/tls.crt) -MANUFACTURER_CERTIFICATE_CA="" +AUTHORIZATION_CA_IN_PEM=.tmp/certs/external/tls.crt +INTERNAL_CA_IN_PEM=.tmp/primary/certs/internal/tls.crt +EXTERNAL_CA_IN_PEM=.tmp/certs/external/tls.crt +STORAGE_PRIMARY_CA_IN_PEM=.tmp/primary/certs/storage/tls.crt +STORAGE_STANDBY_CA_IN_PEM=.tmp/standby/certs/storage/tls.crt +MANUFACTURER_CERTIFICATE_CA=.tmp/certs/manufacturer/tls.crt +DOCKER_AUTH_TOKEN=.tmp/tokens/plgd-docker-auth-token.txt # Create values.yaml file cat < values.yaml @@ -465,15 +472,15 @@ global: standby: $STANDBY extraCAPool: authorization: | - $AUTHORIZATION_CA_IN_PEM +$(sed 's/^/ /' $AUTHORIZATION_CA_IN_PEM) internal: | - $INTERNAL_CA_IN_PEM - $STORAGE_STANDBY_CA_IN_PEM - $EXTERNAL_CA_IN_PEM +$(sed 's/^/ /' $INTERNAL_CA_IN_PEM) +$(sed 's/^/ /' $STORAGE_STANDBY_CA_IN_PEM) +$(sed 's/^/ /' $EXTERNAL_CA_IN_PEM) storage: | - $STORAGE_PRIMARY_CA_IN_PEM - $STORAGE_STANDBY_CA_IN_PEM - $INTERNAL_CA_IN_PEM +$(sed 's/^/ /' $STORAGE_PRIMARY_CA_IN_PEM) +$(sed 's/^/ /' $STORAGE_STANDBY_CA_IN_PEM) +$(sed 's/^/ /' $INTERNAL_CA_IN_PEM) mockoauthserver: enabled: true oauth: @@ -513,6 +520,8 @@ mongodb: - external-dns.alpha.kubernetes.io/hostname: "mongodb-0.$DOMAIN" - external-dns.alpha.kubernetes.io/hostname: "mongodb-1.$DOMAIN" - external-dns.alpha.kubernetes.io/hostname: "mongodb-2.$DOMAIN" +nats: + enabled: false certmanager: storage: issuer: @@ -549,7 +558,7 @@ deviceProvisioningService: { "auths": { "ghcr.io": { - "auth": 
"$DOCKER_AUTH_TOKEN" + "auth": "$(cat $DOCKER_AUTH_TOKEN)" } } } @@ -559,7 +568,7 @@ deviceProvisioningService: attestationMechanism: x509: certificateChain: |- - $MANUFACTURER_CERTIFICATE_CA +$(sed 's/^/ /' $MANUFACTURER_CERTIFICATE_CA) hub: coapGateway: "$DOMAIN:15684" certificateAuthority: @@ -595,7 +604,7 @@ It is important that the `global.standby` flag is set to `true`, which means tha Once the MongoDB pods are running, we need to run the `mongodb-standby-tool` job to configure the MongoDB replica set. This configuration demotes the secondary members to hidden members. ```bash -kubectl -n plgd patch job/$(kubectl -n standby-mock-plgd-cloud get jobs | grep mongodb-standby-tool | awk '{print $1}') --type=strategic --patch '{"spec":{"suspend":false}}' +kubectl -n plgd patch job/$(kubectl -n plgd get jobs | grep mongodb-standby-tool | awk '{print $1}') --type=strategic --patch '{"spec":{"suspend":false}}' ``` Now the job will create the pod and configure the MongoDB replica set. @@ -612,35 +621,35 @@ When the primary cluster is down, you need to switch to the standby cluster. #### Promote the Standby Cluster -First, promote the hidden members to secondary members. To do this, upgrade the Helm chart with the `mongodb.standbyTool.mode` set to `active`. The active mode reconfigures the MongoDB replica set, promoting hidden members to secondary members and demoting the previous members to hidden. +First, promote the hidden members to secondary members. To do this, upgrade the Helm chart with the `mongodb.standbyTool.mode` set to `active`. The active mode reconfigures the MongoDB replica set, promoting hidden members to secondary members and demoting the previous members to hidden. To do that we need to delete the `mongodb-standby-tool` job and upgrade the Helm chart which will create a new job. 
```bash +kubectl -n plgd delete job/$(kubectl -n plgd get jobs | grep mongodb-standby-tool | awk '{print $1}') helm upgrade -i -n plgd --create-namespace -f values.yaml --set mongodb.standbyTool.mode=active hub plgd/plgd-hub ``` -Next, delete the `mongodb-standby-tool` job and resume it to configure the MongoDB replica set. +Next, resume the job to configure the MongoDB replica set. ```bash -kubectl -n plgd delete job/$(kubectl -n plgd get jobs | grep mongodb-standby-tool | awk '{print $1}') kubectl -n plgd patch job/$(kubectl -n plgd get jobs | grep mongodb-standby-tool | awk '{print $1}') --type=strategic --patch '{"spec":{"suspend":false}}' ``` -The final step is to run plgd pods on the standby cluster. Set the `global.standby` flag to `false` and upgrade the Helm chart. +The final step is to run plgd pods on the standby cluster. Set the `global.standby` flag to `false`, enable NATs via `nats.enabled=true` and upgrade the Helm chart. ```bash -helm upgrade -i -n plgd --create-namespace -f values.yaml --set global.standby=false hub plgd/plgd-hub -helm upgrade -i -n plgd --create-namespace -f values.yaml --set global.standby=false dps plgd/plgd-dps +helm upgrade -i -n plgd --create-namespace -f values.yaml --set mongodb.standbyTool.mode=active --set global.standby=false --set nats.enabled=true hub plgd/plgd-hub +helm upgrade -i -n plgd --create-namespace -f values.yaml --set mongodb.standbyTool.mode=active --set global.standby=false --set nats.enabled=true dps plgd/plgd-dps ``` After rotating the device provisioning endpoints, the devices will connect to the standby cluster. #### Turn Off plgd Pods on the Primary Cluster -When the primary cluster is back up, set the `global.standby` flag to `true` and upgrade the Helm chart. +When the primary cluster is back up, set the `global.standby` flag to `true`, disable NATs via `nats.enabled=false` and upgrade the Helm chart. 
```bash -helm upgrade -i -n plgd --create-namespace -f values.yaml --set global.standby=true hub plgd/plgd-hub -helm upgrade -i -n plgd --create-namespace -f values.yaml --set global.standby=true dps plgd/plgd-dps +helm upgrade -i -n plgd --create-namespace -f values.yaml --set global.standby=true --set nats.enabled=false hub plgd/plgd-hub +helm upgrade -i -n plgd --create-namespace -f values.yaml --set global.standby=true --set nats.enabled=false dps plgd/plgd-dps ``` ### How to Switch Back to the Primary Cluster @@ -649,33 +658,33 @@ When the primary cluster is ready for devices, switch back to the primary cluste #### Demote the Standby Cluster -First, promote the primary cluster's MongoDB hidden members to secondary members and demote the standby cluster's MongoDB secondary members to hidden. Upgrade the Helm chart with the `mongodb.standbyTool.mode` set to `standby`. +First, promote the primary cluster's MongoDB hidden members to secondary members and demote the standby cluster's MongoDB secondary members to hidden. Upgrade the Helm chart with the `mongodb.standbyTool.mode` set to `standby`. To do that we need to delete the `mongodb-standby-tool` job and upgrade the Helm chart which will create a new job. ```bash +kubectl -n plgd delete job/$(kubectl -n plgd get jobs | grep mongodb-standby-tool | awk '{print $1}') helm upgrade -i -n plgd --create-namespace -f values.yaml --set mongodb.standbyTool.mode=standby hub plgd/plgd-hub ``` Next, delete the `mongodb-standby-tool` job and resume it to configure the MongoDB replica set. ```bash -kubectl -n plgd delete job/$(kubectl -n plgd get jobs | grep mongodb-standby-tool | awk '{print $1}') kubectl -n plgd patch job/$(kubectl -n plgd get jobs | grep mongodb-standby-tool | awk '{print $1}') --type=strategic --patch '{"spec":{"suspend":false}}' ``` -The final step is to run plgd pods on the standby cluster. Set the `global.standby` flag to `true` and upgrade the Helm chart. 
+The final step is to run plgd pods on the standby cluster. Set the `global.standby` flag to `true`, disable NATs via `nats.enabled=false` and upgrade the Helm chart. ```bash -helm upgrade -i -n plgd --create-namespace -f values.yaml --set global.standby=true hub plgd/plgd-hub -helm upgrade -i -n plgd --create-namespace -f values.yaml --set global.standby=true dps plgd/plgd-dps +helm upgrade -i -n plgd --create-namespace -f values.yaml --set mongodb.standbyTool.mode=standby --set global.standby=true --set nats.enabled=false hub plgd/plgd-hub +helm upgrade -i -n plgd --create-namespace -f values.yaml --set mongodb.standbyTool.mode=standby --set global.standby=true --set nats.enabled=false dps plgd/plgd-dps ``` #### Turn On plgd Pods on the Primary Cluster -When the standby cluster is ready for devices, switch back to the primary cluster. Set the `global.standby` flag to `false` and upgrade the Helm chart. +When the standby cluster is ready for devices, switch back to the primary cluster. Set the `global.standby` flag to `false`, enable NATs via `nats.enabled=true` and upgrade the Helm chart. ```bash -helm upgrade -i -n plgd --create-namespace -f values.yaml --set global.standby=false hub plgd/plgd-hub -helm upgrade -i -n plgd --create-namespace -f values.yaml --set global.standby=false dps plgd/plgd-dps +helm upgrade -i -n plgd --create-namespace -f values.yaml --set global.standby=false --set nats.enabled=true hub plgd/plgd-hub +helm upgrade -i -n plgd --create-namespace -f values.yaml --set global.standby=false --set nats.enabled=true dps plgd/plgd-dps ``` After rotating the device provisioning endpoints, the devices will connect to the primary cluster. 
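The failover and switch-back commands above repeatedly recover the standby-tool job name with a `grep`/`awk` pipeline. That extraction can be sanity-checked offline against canned `kubectl -n plgd get jobs` output; note the job name `hub-mongodb-standby-tool` below is a hypothetical stand-in, as the real name depends on the Helm release:

```shell
#!/bin/sh
# Canned stand-in for `kubectl -n plgd get jobs`; in a real cluster the
# pipeline below would consume live kubectl output instead.
kubectl_get_jobs() {
  printf '%s\n' \
    'NAME                       COMPLETIONS   DURATION   AGE' \
    'hub-mongodb-standby-tool   0/1           10s        10s' \
    'hub-other-job              1/1           5s         1h'
}

# Same extraction used by the delete/patch commands in this tutorial:
# take the first column of the row whose name contains "mongodb-standby-tool".
JOB_NAME=$(kubectl_get_jobs | grep mongodb-standby-tool | awk '{print $1}')
echo "$JOB_NAME"   # hub-mongodb-standby-tool
```

If the pipeline prints nothing, the job does not exist yet (or was already deleted), and the subsequent `kubectl patch`/`kubectl delete` would fail with an empty job name.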
From 71075697990a98633bbe17d84ee539b6ce8d612d Mon Sep 17 00:00:00 2001 From: Jozef Kralik Date: Mon, 24 Jun 2024 08:28:34 +0000 Subject: [PATCH 05/13] update doc --- .../disaster-recovery-replica-set.md | 47 ++++++++++--------- 1 file changed, 24 insertions(+), 23 deletions(-) diff --git a/content/en/docs/tutorials/disaster-recovery-replica-set.md b/content/en/docs/tutorials/disaster-recovery-replica-set.md index 3e071aa3..120748f1 100644 --- a/content/en/docs/tutorials/disaster-recovery-replica-set.md +++ b/content/en/docs/tutorials/disaster-recovery-replica-set.md @@ -1,19 +1,21 @@ --- + title: 'Disaster Recovery via Replica Set' description: 'How to perform disaster recovery with a Replica Set?' date: '2024-06-17' categories: [architecture, d2c, provisioning, disaster-recovery] keywords: [architecture, d2c, provisioning, disaster-recovery] weight: 15 + --- -The plgd-hub Helm charts support disaster recovery via a MongoDB replica set because the source of truth is stored in the MongoDB database. It is required that devices have configured **device provisioning endpoints** for both clusters' device provisioning services. In this tutorial, we have two MicroK8s clusters: primary and standby. Each of them uses three root CA certificates: +The plgd-hub Helm charts support disaster recovery via a MongoDB replica set, as the source of truth is stored in the MongoDB database. Devices need to have configured **device provisioning endpoints** for both clusters' device provisioning services. In this tutorial, we have two MicroK8s clusters: primary and standby. Each cluster uses three root CA certificates: - `external CA certificate pair`: Used for public APIs (CoAP, HTTPS, gRPC) and is the same for both clusters. - `internal CA certificate pair`: Used for plgd services to communicate with each other, MongoDB, and NATs. Each cluster has its own internal CA certificate. - `storage CA certificate pair`: Used for MongoDB. Each cluster has its own storage CA certificate. 
-We also use an `authorization CA certificate` to communicate with the OAuth2 authorization server. In this tutorial, `mock-oauth-server` and its certificate are signed by the `external CA certificate pair`. Thus, we have only one `authorization CA certificate` for both clusters, which is the `external CA certificate`. +We also use an `authorization CA certificate` to communicate with the OAuth2 authorization server. In this tutorial, `mock-oauth-server` and its certificate are signed by the `external CA certificate pair`. Therefore, we have only one `authorization CA certificate` for both clusters, which is the `external CA certificate`. The goal is to ensure that only MongoDBs from the primary and standby clusters can communicate with each other, while plgd services can only connect to the MongoDB in their respective clusters. All APIs will be available on the root domain `primary.plgd.cloud` for the primary cluster and `standby.plgd.cloud` for the standby cluster. Additionally, MongoDB members are exposed via the LoadBalancer service type, and each member needs its own DNS name. @@ -27,8 +29,7 @@ The goal is to ensure that only MongoDBs from the primary and standby clusters c The `mongodb.primary.plgd.cloud` is used for external access to the MongoDB replica set for the standby cluster. This DNS record is an alias for all members of the primary cluster. -This DNS needs to be resolved to the external IP address of the LoadBalancer. The external IP address of the LoadBalancer is used to connect to the MongoDB replica set from the other cluster. For clouds, you can use the [external-dns](https://github.com/kubernetes-sigs/external-dns/) tool to create DNS records in AWS Route53 / Google Cloud DNS / Azure DNS. -In this tutorial, we show how to get the IPs of MongoDB services, and we will set them manually in /etc/hosts, then restart the dnsmasq daemon to load these changes on the computer with IP 192.168.1.1. 
+This DNS needs to be resolved to the external IP address of the LoadBalancer. The external IP address of the LoadBalancer is used to connect to the MongoDB replica set from the other cluster. For cloud environments, you can use the [external-dns](https://github.com/kubernetes-sigs/external-dns/) tool to create DNS records in AWS Route53, Google Cloud DNS, or Azure DNS. In this tutorial, we will show how to get the IPs of MongoDB services and manually set them in /etc/hosts. Then, we will restart the dnsmasq daemon to load these changes on a computer with the IP 192.168.1.1. {{< warning >}} It is also recommended to set up a firewall between clusters with source IP address filtering to mitigate DDOS attacks on MongoDB. The default port for MongoDB is 27017. Alternatively, use a VPN to interconnect clusters. @@ -38,7 +39,7 @@ It is also recommended to set up a firewall between clusters with source IP addr ### MicroK8s Prerequisites -The following addons are expected to be enabled on both clusters, with **Kubernetes v1.24+** installed. +The following addons should be enabled on both clusters, with **Kubernetes v1.24+** installed: ```yaml addons: @@ -53,14 +54,14 @@ addons: metallb # (core) Loadbalancer for your Kubernetes cluster ``` -The [dns](https://microk8s.io/docs/addon-dns) addon is configured to use a DNS server that hosts all records for `primary.plgd.cloud` and `standby.plgd.cloud` domains. To configure DNS in MicroK8s, you can use the following command: +The [dns](https://microk8s.io/docs/addon-dns) addon is configured to use a DNS server that hosts all records for the `primary.plgd.cloud` and `standby.plgd.cloud` domains. To configure DNS in MicroK8s, you can use the following commands: ```bash microk8s disable dns microk8s enable dns:192.168.1.1 ``` -For [metallb](https://microk8s.io/docs/addon-metallb), we need to set up the IP address pool for the LoadBalancer service type. 
The IP address pool needs to be accessible from the network where the MicroK8s is running. It is important that the IP address is not used by any other device in the network and that the DHCP server is not assigning this IP address to any device. +For [metallb](https://microk8s.io/docs/addon-metallb), we need to set up the IP address pool for the LoadBalancer service type. The IP address pool needs to be accessible from the network where MicroK8s is running. Ensure the IP address is not used by any other device in the network and that the DHCP server is not assigning this IP address to any device. Example for the primary cluster: @@ -78,7 +79,7 @@ microk8s enable metallb:192.168.1.220-192.168.1.239 ### Creating Certificates -To create certificates, you can use the cert-tool Docker image to generate root CA certificates for the services. +To create certificates, you can use the `cert-tool` Docker image to generate root CA certificates for the services. 1. Create the external CA certificate pair (same for both clusters): @@ -140,13 +141,13 @@ To create certificates, you can use the cert-tool Docker image to generate root --cert.subject.cn=standby.storage.root.ca --cert.validFor=876000h ``` -### Preparing Device Provisioning Service dependencies +### Preparing Device Provisioning Service Dependencies -The Device Provisioning Service (DPS) requires a certificate for the manufacturer. The certificate is used to authenticate the manufacturer when enrolling devices which need to stored in the file `.tmp/certs/manufacturer/tls.crt`. +The Device Provisioning Service (DPS) requires a certificate for the manufacturer. This certificate is used to authenticate the manufacturer when enrolling devices, and it needs to be stored in the file `.tmp/certs/manufacturer/tls.crt`. -To download proprietary device provisioning service docker image, you need to have a token for the GitHub Container Registry. The token need to stored in the file `.tmp/tokens/plgd-docker-auth-token.txt`. 
+To download the proprietary device provisioning service Docker image, you need to have a token for the GitHub Container Registry. This token needs to be stored in the file `.tmp/tokens/plgd-docker-auth-token.txt`. -### Setting up cert-manager on the Primary Cluster +### Setting Up cert-manager on the Primary Cluster Ensure that you have cert-manager installed. @@ -216,7 +217,7 @@ Ensure that you have cert-manager installed. EOF ``` -### Setting up cert-manager on the Standby Cluster +### Setting Up cert-manager on the Standby Cluster Ensure that you have cert-manager installed on the standby cluster as well. @@ -286,7 +287,7 @@ Ensure that you have cert-manager installed on the standby cluster as well. EOF ``` -### Deploy plgd on Primary Cluster +### Deploy plgd on the Primary Cluster The primary cluster will deploy the Hub with all APIs exposed on the `primary.plgd.cloud` domain. The CoAP gateway listens on NodePort `15684`, and the device provisioning service listens on NodePort `5684`. The MongoDB replica set is exposed via a LoadBalancer service type, requiring a client certificate (mTLS) to connect to MongoDB. @@ -442,9 +443,9 @@ sudo systemctl restart dnsmasq After some time for the pods to start, you can access the Hub at `https://primary.plgd.cloud`. -### Deploy plgd on Standby Cluster +### Deploy plgd on the Standby Cluster -Deploying plgd to the standby cluster is similar to deploying it to the primary cluster. The differences are that the domain is `standby.plgd.cloud`, different internal and storage certificates are used, the standby flag is set to `true`, NATs is disabled and MongoDB is configured to use the master DB at `mongodb.primary.plgd.cloud`, and the `mongodb-standby-tool` job is enabled to configure the MongoDB replica set. +Deploying plgd to the standby cluster is similar to deploying it to the primary cluster. 
The differences are that the domain is `standby.plgd.cloud`, different internal and storage certificates are used, the standby flag is set to `true`, NATs is disabled, and MongoDB is configured to use the master DB at `mongodb.primary.plgd.cloud`. Additionally, the `mongodb-standby-tool` job is enabled to configure the MongoDB replica set. ```bash # Set variables @@ -621,7 +622,7 @@ When the primary cluster is down, you need to switch to the standby cluster. #### Promote the Standby Cluster -First, promote the hidden members to secondary members. To do this, upgrade the Helm chart with the `mongodb.standbyTool.mode` set to `active`. The active mode reconfigures the MongoDB replica set, promoting hidden members to secondary members and demoting the previous members to hidden. To do that we need to delete the `mongodb-standby-tool` job and upgrade the Helm chart which will create a new job. +First, promote the hidden members to secondary members. To do this, upgrade the Helm chart with the `mongodb.standbyTool.mode` set to `active`. The active mode reconfigures the MongoDB replica set, promoting hidden members to secondary members and demoting the previous members to hidden. To do this, delete the `mongodb-standby-tool` job and upgrade the Helm chart, which will create a new job. ```bash kubectl -n plgd delete job/$(kubectl -n plgd get jobs | grep mongodb-standby-tool | awk '{print $1}') @@ -634,7 +635,7 @@ Next, resume the job to configure the MongoDB replica set. kubectl -n plgd patch job/$(kubectl -n plgd get jobs | grep mongodb-standby-tool | awk '{print $1}') --type=strategic --patch '{"spec":{"suspend":false}}' ``` -The final step is to run plgd pods on the standby cluster. Set the `global.standby` flag to `false`, enable NATs via `nats.enabled=true` and upgrade the Helm chart. +The final step is to run plgd pods on the standby cluster. Set the `global.standby` flag to `false`, enable NATs via `nats.enabled=true`, and upgrade the Helm chart. 
```bash helm upgrade -i -n plgd --create-namespace -f values.yaml --set mongodb.standbyTool.mode=active --set global.standby=false --set nats.enabled=true hub plgd/plgd-hub @@ -645,7 +646,7 @@ After rotating the device provisioning endpoints, the devices will connect to th #### Turn Off plgd Pods on the Primary Cluster -When the primary cluster is back up, set the `global.standby` flag to `true`, disable NATs via `nats.enabled=false` and upgrade the Helm chart. +When the primary cluster is back up, set the `global.standby` flag to `true`, disable NATs via `nats.enabled=false`, and upgrade the Helm chart. ```bash helm upgrade -i -n plgd --create-namespace -f values.yaml --set global.standby=true --set nats.enabled=false hub plgd/plgd-hub @@ -658,20 +659,20 @@ When the primary cluster is ready for devices, switch back to the primary cluste #### Demote the Standby Cluster -First, promote the primary cluster's MongoDB hidden members to secondary members and demote the standby cluster's MongoDB secondary members to hidden. Upgrade the Helm chart with the `mongodb.standbyTool.mode` set to `standby`. To do that we need to delete the `mongodb-standby-tool` job and upgrade the Helm chart which will create a new job. +First, promote the primary cluster's MongoDB hidden members to secondary members and demote the standby cluster's MongoDB secondary members to hidden. Upgrade the Helm chart with the `mongodb.standbyTool.mode` set to `standby`. To do this, delete the `mongodb-standby-tool` job and upgrade the Helm chart, which will create a new job. ```bash kubectl -n plgd delete job/$(kubectl -n plgd get jobs | grep mongodb-standby-tool | awk '{print $1}') helm upgrade -i -n plgd --create-namespace -f values.yaml --set mongodb.standbyTool.mode=standby hub plgd/plgd-hub ``` -Next, delete the `mongodb-standby-tool` job and resume it to configure the MongoDB replica set. +Next, patch the `mongodb-standby-tool` job to resume it and configure the MongoDB replica set. 
```bash
kubectl -n plgd patch job/$(kubectl -n plgd get jobs | grep mongodb-standby-tool | awk '{print $1}') --type=strategic --patch '{"spec":{"suspend":false}}'
```
 
-The final step is to run plgd pods on the standby cluster. Set the `global.standby` flag to `true`, disable NATs via `nats.enabled=false` and upgrade the Helm chart.
+The final step is to stop the plgd pods on the standby cluster. Set the `global.standby` flag to `true`, disable NATs via `nats.enabled=false`, and upgrade the Helm chart.
 
 ```bash
 helm upgrade -i -n plgd --create-namespace -f values.yaml --set mongodb.standbyTool.mode=standby --set global.standby=true --set nats.enabled=false hub plgd/plgd-hub
@@ -680,7 +681,7 @@ helm upgrade -i -n plgd --create-namespace -f values.yaml --set mongodb.standbyT
 
 #### Turn On plgd Pods on the Primary Cluster
 
-When the standby cluster is ready for devices, switch back to the primary cluster. Set the `global.standby` flag to `false`, enable NATs via `nats.enabled=true` and upgrade the Helm chart.
+When the standby cluster is ready for devices, switch back to the primary cluster. Set the `global.standby` flag to `false`, enable NATs via `nats.enabled=true`, and upgrade the Helm chart.
```bash helm upgrade -i -n plgd --create-namespace -f values.yaml --set global.standby=false --set nats.enabled=true hub plgd/plgd-hub From e341efe06182b5728c54d6d95231b9b4086143ea Mon Sep 17 00:00:00 2001 From: Jozef Kralik Date: Mon, 24 Jun 2024 18:54:15 +0000 Subject: [PATCH 06/13] fix scripts --- .../tutorials/disaster-recovery-replica-set.md | 17 +++++++++++------ 1 file changed, 11 insertions(+), 6 deletions(-) diff --git a/content/en/docs/tutorials/disaster-recovery-replica-set.md b/content/en/docs/tutorials/disaster-recovery-replica-set.md index 120748f1..71831d3d 100644 --- a/content/en/docs/tutorials/disaster-recovery-replica-set.md +++ b/content/en/docs/tutorials/disaster-recovery-replica-set.md @@ -25,9 +25,9 @@ The goal is to ensure that only MongoDBs from the primary and standby clusters c | `mongodb-0.primary.plgd.cloud` | `mongodb-0.standby.plgd.cloud` | | `mongodb-1.primary.plgd.cloud` | `mongodb-1.standby.plgd.cloud` | | `mongodb-2.primary.plgd.cloud` | `mongodb-2.standby.plgd.cloud` | -| `mongodb.primary.plgd.cloud` | `----` | +| `mongodb.primary.plgd.cloud` | `mongodb.standby.plgd.cloud` | -The `mongodb.primary.plgd.cloud` is used for external access to the MongoDB replica set for the standby cluster. This DNS record is an alias for all members of the primary cluster. +The `mongodb.primary.plgd.cloud` and `mongodb.standby.plgd.cloud` are aliases for all members of the primary and standby clusters, respectively. Also the `mongodb.primary.plgd.cloud` is used for external access to the MongoDB replica set for the standby cluster. This DNS needs to be resolved to the external IP address of the LoadBalancer. The external IP address of the LoadBalancer is used to connect to the MongoDB replica set from the other cluster. For cloud environments, you can use the [external-dns](https://github.com/kubernetes-sigs/external-dns/) tool to create DNS records in AWS Route53, Google Cloud DNS, or Azure DNS. 
In this tutorial, we will show how to get the IPs of MongoDB services and manually set them in /etc/hosts. Then, we will restart the dnsmasq daemon to load these changes on a computer with the IP 192.168.1.1. @@ -316,6 +316,7 @@ global: hubId: "$HUB_ID" ownerClaim: "$OWNER_CLAIM" standby: $STANDBY + mongoUri: "mongodb://mongodb.$DOMAIN:27017/?replicaSet=rs0" extraCAPool: authorization: | $(sed 's/^/ /' $AUTHORIZATION_CA_IN_PEM) @@ -457,7 +458,7 @@ STANDBY=true # Read certificate files AUTHORIZATION_CA_IN_PEM=.tmp/certs/external/tls.crt -INTERNAL_CA_IN_PEM=.tmp/primary/certs/internal/tls.crt +INTERNAL_CA_IN_PEM=.tmp/standby/certs/internal/tls.crt EXTERNAL_CA_IN_PEM=.tmp/certs/external/tls.crt STORAGE_PRIMARY_CA_IN_PEM=.tmp/primary/certs/storage/tls.crt STORAGE_STANDBY_CA_IN_PEM=.tmp/standby/certs/storage/tls.crt @@ -471,6 +472,7 @@ global: hubId: "$HUB_ID" ownerClaim: "$OWNER_CLAIM" standby: $STANDBY + mongoUri: "mongodb://mongodb.$DOMAIN:27017/?replicaSet=rs0" extraCAPool: authorization: | $(sed 's/^/ /' $AUTHORIZATION_CA_IN_PEM) @@ -498,6 +500,9 @@ mockoauthserver: scopes: ['openid'] useInUi: true mongodb: + tls: + extraDnsNames: + - "mongodb.$DOMAIN" standbyTool: enabled: true replicaSet: @@ -591,9 +596,9 @@ Next, we need to get the IP addresses of the MongoDB members and set them to the ```bash kubectl -n plgd get services | grep mongodb | grep LoadBalancer | awk '{print $1 ":" $4}' echo " -192.168.1.222 mongodb-0.standby.plgd.cloud -192.168.1.220 mongodb-1.standby.plgd.cloud -192.168.1.221 mongodb-2.standby.plgd.cloud +192.168.1.222 mongodb-0.standby.plgd.cloud mongodb.standby.plgd.cloud +192.168.1.220 mongodb-1.standby.plgd.cloud mongodb.standby.plgd.cloud +192.168.1.221 mongodb-2.standby.plgd.cloud mongodb.standby.plgd.cloud " | sudo tee -a /etc/hosts sudo systemctl restart dnsmasq ``` From c56882331b8df0509c8420c307fe9f071f6feaa3 Mon Sep 17 00:00:00 2001 From: Jozef Kralik Date: Tue, 25 Jun 2024 06:47:01 +0000 Subject: [PATCH 07/13] getting ip and set --- 
content/en/docs/tutorials/disaster-recovery-replica-set.md | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/content/en/docs/tutorials/disaster-recovery-replica-set.md b/content/en/docs/tutorials/disaster-recovery-replica-set.md index 71831d3d..9fdef9f5 100644 --- a/content/en/docs/tutorials/disaster-recovery-replica-set.md +++ b/content/en/docs/tutorials/disaster-recovery-replica-set.md @@ -595,6 +595,12 @@ Next, we need to get the IP addresses of the MongoDB members and set them to the ```bash kubectl -n plgd get services | grep mongodb | grep LoadBalancer | awk '{print $1 ":" $4}' +mongodb-0-external:192.168.1.222 +mongodb-1-external:192.168.1.220 +mongodb-2-external:192.168.1.221 +``` + +```bash echo " 192.168.1.222 mongodb-0.standby.plgd.cloud mongodb.standby.plgd.cloud 192.168.1.220 mongodb-1.standby.plgd.cloud mongodb.standby.plgd.cloud From fe8ede2bd9d78119b5d7be3537d203a6763a092d Mon Sep 17 00:00:00 2001 From: Jozef Kralik Date: Tue, 25 Jun 2024 07:51:41 +0000 Subject: [PATCH 08/13] fix sentences --- content/en/docs/tutorials/disaster-recovery-replica-set.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/content/en/docs/tutorials/disaster-recovery-replica-set.md b/content/en/docs/tutorials/disaster-recovery-replica-set.md index 9fdef9f5..6fe0f82f 100644 --- a/content/en/docs/tutorials/disaster-recovery-replica-set.md +++ b/content/en/docs/tutorials/disaster-recovery-replica-set.md @@ -9,7 +9,7 @@ weight: 15 --- -The plgd-hub Helm charts support disaster recovery via a MongoDB replica set, as the source of truth is stored in the MongoDB database. Devices need to have configured **device provisioning endpoints** for both clusters' device provisioning services. In this tutorial, we have two MicroK8s clusters: primary and standby. Each cluster uses three root CA certificates: +The plgd-hub Helm charts support disaster recovery via a MongoDB replica set, as the source of truth is stored in the MongoDB database. 
In this tutorial, we have two MicroK8s clusters: primary and standby. Devices need to have configured **device provisioning endpoints** for both clusters' device provisioning services (`coaps+tcp://primary.plgd.cloud:5684` and `coaps+tcp://standby.plgd.cloud:5684`). Each cluster uses three root CA certificates:
 
 - `external CA certificate pair`: Used for public APIs (CoAP, HTTPS, gRPC) and is the same for both clusters.
 - `internal CA certificate pair`: Used for plgd services to communicate with each other, MongoDB, and NATs. Each cluster has its own internal CA certificate.
@@ -27,7 +27,7 @@ The goal is to ensure that only MongoDBs from the primary and standby clusters c
 | `mongodb-2.primary.plgd.cloud` | `mongodb-2.standby.plgd.cloud` |
 | `mongodb.primary.plgd.cloud` | `mongodb.standby.plgd.cloud` |
 
-The `mongodb.primary.plgd.cloud` and `mongodb.standby.plgd.cloud` are aliases for all members of the primary and standby clusters, respectively. Also the `mongodb.primary.plgd.cloud` is used for external access to the MongoDB replica set for the standby cluster.
+The `mongodb.primary.plgd.cloud` and `mongodb.standby.plgd.cloud` are aliases for all members of the primary and standby clusters, respectively. Also, the `mongodb.primary.plgd.cloud` is used for external access to the MongoDB replica set for the standby cluster.
 
 This DNS needs to be resolved to the external IP address of the LoadBalancer. The external IP address of the LoadBalancer is used to connect to the MongoDB replica set from the other cluster. For cloud environments, you can use the [external-dns](https://github.com/kubernetes-sigs/external-dns/) tool to create DNS records in AWS Route53, Google Cloud DNS, or Azure DNS. In this tutorial, we will show how to get the IPs of MongoDB services and manually set them in /etc/hosts. Then, we will restart the dnsmasq daemon to load these changes on a computer with the IP 192.168.1.1.
@@ -633,7 +633,7 @@ When the primary cluster is down, you need to switch to the standby cluster. #### Promote the Standby Cluster -First, promote the hidden members to secondary members. To do this, upgrade the Helm chart with the `mongodb.standbyTool.mode` set to `active`. The active mode reconfigures the MongoDB replica set, promoting hidden members to secondary members and demoting the previous members to hidden. To do this, delete the `mongodb-standby-tool` job and upgrade the Helm chart, which will create a new job. +First, promote the hidden members to secondary members by upgrading the Helm chart with the `mongodb.standbyTool.mode` set to `active`. The active mode reconfigures the MongoDB replica set, promoting hidden members to secondary members and demoting the previous members to hidden. Begin by deleting the `mongodb-standby-tool` job, then upgrade the Helm chart to initiate a new job. ```bash kubectl -n plgd delete job/$(kubectl -n plgd get jobs | grep mongodb-standby-tool | awk '{print $1}') From 73ba52345531ff3d183f7f7df0ce92a21c804d9d Mon Sep 17 00:00:00 2001 From: Jozef Kralik Date: Tue, 25 Jun 2024 07:58:12 +0000 Subject: [PATCH 09/13] update table --- .../tutorials/disaster-recovery-replica-set.md | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/content/en/docs/tutorials/disaster-recovery-replica-set.md b/content/en/docs/tutorials/disaster-recovery-replica-set.md index 6fe0f82f..602e8c5c 100644 --- a/content/en/docs/tutorials/disaster-recovery-replica-set.md +++ b/content/en/docs/tutorials/disaster-recovery-replica-set.md @@ -19,13 +19,13 @@ We also use an `authorization CA certificate` to communicate with the OAuth2 aut The goal is to ensure that only MongoDBs from the primary and standby clusters can communicate with each other, while plgd services can only connect to the MongoDB in their respective clusters. 
All APIs will be available on the root domain `primary.plgd.cloud` for the primary cluster and `standby.plgd.cloud` for the standby cluster. Additionally, MongoDB members are exposed via the LoadBalancer service type, and each member needs its own DNS name. -| For the primary cluster we have | For the standby cluster we have | -| --- | --- | -| `primary.plgd.cloud` | `standby.plgd.cloud` | -| `mongodb-0.primary.plgd.cloud` | `mongodb-0.standby.plgd.cloud` | -| `mongodb-1.primary.plgd.cloud` | `mongodb-1.standby.plgd.cloud` | -| `mongodb-2.primary.plgd.cloud` | `mongodb-2.standby.plgd.cloud` | -| `mongodb.primary.plgd.cloud` | `mongodb.standby.plgd.cloud` | +| For the primary cluster we have | For the standby cluster we have | Purpose | +| --- | --- | --- | +| `primary.plgd.cloud` | `standby.plgd.cloud` | Public access to the cluster | +| `mongodb-0.primary.plgd.cloud` | `mongodb-0.standby.plgd.cloud` | The MongoDB member | +| `mongodb-1.primary.plgd.cloud` | `mongodb-1.standby.plgd.cloud` | The MongoDB member | +| `mongodb-2.primary.plgd.cloud` | `mongodb-2.standby.plgd.cloud` | The MongoDB member | +| `mongodb.primary.plgd.cloud` | `mongodb.standby.plgd.cloud` | Alias for all MongoDB members | The `mongodb.primary.plgd.cloud` and `mongodb.standby.plgd.cloud` are aliases for all members of the primary and standby clusters, respectively. Also, the `mongodb.primary.plgd.cloud` is used for external access to the MongoDB replica set for the standby cluster. 
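The DNS layout in the table above can be sanity-checked with a small sketch. The hostnames are the tutorial's; the replica set name `rs0` and port `27017` are illustrative assumptions, since neither value appears in this excerpt:

```shell
# Build a MongoDB client connection string from the per-member DNS names in
# the table above. "rs0" and port 27017 are assumptions for illustration,
# not values taken from the Helm chart.
members="mongodb-0 mongodb-1 mongodb-2"
hosts=""
for m in $members; do
  hosts="${hosts}${m}.primary.plgd.cloud:27017,"
done
# Strip the trailing comma and assemble the URI.
conn="mongodb://${hosts%,}/?replicaSet=rs0&tls=true"
echo "$conn"
```

Clients can alternatively connect through the `mongodb.primary.plgd.cloud` alias alone, since it resolves to all members; enumerating the per-member names simply makes seed discovery independent of a single DNS record.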
From 48cba1e865cd4470c195a67fa60b39f3e9cd962e Mon Sep 17 00:00:00 2001
From: Jozef Kralik
Date: Tue, 25 Jun 2024 08:04:46 +0000
Subject: [PATCH 10/13] add link how to configure multiple endpoint to the device

---
 content/en/docs/tutorials/disaster-recovery-replica-set.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/content/en/docs/tutorials/disaster-recovery-replica-set.md b/content/en/docs/tutorials/disaster-recovery-replica-set.md
index 602e8c5c..62acd043 100644
--- a/content/en/docs/tutorials/disaster-recovery-replica-set.md
+++ b/content/en/docs/tutorials/disaster-recovery-replica-set.md
@@ -9,7 +9,7 @@ weight: 15

 ---

-The plgd-hub Helm charts support disaster recovery via a MongoDB replica set, as the source of truth is stored in the MongoDB database. In this tutorial, we have two MicroK8s clusters: primary and standby. Devices need to have configured **device provisioning endpoints** for both clusters' device provisioning services(`coaps+tcp://primary.plgd.cloud:5684` and `coaps+tcp://standby.plgd.cloud:5684`). Each cluster uses three root CA certificates:
+The plgd-hub Helm charts support disaster recovery via a MongoDB replica set, as the source of truth is stored in the MongoDB database. In this tutorial, we have two MicroK8s clusters: primary and standby. Devices need to have [configured device provisioning endpoints](/docs/services/device-provisioning-service/client-library/#set-dps-endpoint) for both clusters' device provisioning services (`coaps+tcp://primary.plgd.cloud:5684` and `coaps+tcp://standby.plgd.cloud:5684`). Each cluster uses three root CA certificates:

 - `external CA certificate pair`: Used for public APIs (CoAP, HTTPS, gRPC) and is the same for both clusters.
 - `internal CA certificate pair`: Used for plgd services to communicate with each other, MongoDB, and NATs. Each cluster has its own internal CA certificate.
From 8af56f87a74022b9a09399a4f2c25e1a3c1c871a Mon Sep 17 00:00:00 2001 From: Jozef Kralik Date: Tue, 25 Jun 2024 08:08:16 +0000 Subject: [PATCH 11/13] fix sentence --- content/en/docs/tutorials/disaster-recovery-replica-set.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/en/docs/tutorials/disaster-recovery-replica-set.md b/content/en/docs/tutorials/disaster-recovery-replica-set.md index 62acd043..cdd9969c 100644 --- a/content/en/docs/tutorials/disaster-recovery-replica-set.md +++ b/content/en/docs/tutorials/disaster-recovery-replica-set.md @@ -624,7 +624,7 @@ Now the job will create the pod and configure the MongoDB replica set. ## Disaster Recovery {{< note >}} -These steps could be used in case of planned maintenance. +These steps can be used in case of planned maintenance. {{< /note >}} ### How to Switch to the Standby Cluster From 86421ae3854f6ae8ad5ee43abe781c554ff55bd3 Mon Sep 17 00:00:00 2001 From: Jozef Kralik Date: Wed, 26 Jun 2024 11:34:45 +0000 Subject: [PATCH 12/13] add mongodb-standby-tool cfg --- .../{oauth-server.md => mock-oauth-server.md} | 0 .../configuration/mongodb-standby-tool.md | 69 +++++++++++++++++++ .../disaster-recovery-replica-set.md | 10 +-- 3 files changed, 74 insertions(+), 5 deletions(-) rename content/en/docs/configuration/{oauth-server.md => mock-oauth-server.md} (100%) create mode 100644 content/en/docs/configuration/mongodb-standby-tool.md diff --git a/content/en/docs/configuration/oauth-server.md b/content/en/docs/configuration/mock-oauth-server.md similarity index 100% rename from content/en/docs/configuration/oauth-server.md rename to content/en/docs/configuration/mock-oauth-server.md diff --git a/content/en/docs/configuration/mongodb-standby-tool.md b/content/en/docs/configuration/mongodb-standby-tool.md new file mode 100644 index 00000000..77e333cc --- /dev/null +++ b/content/en/docs/configuration/mongodb-standby-tool.md @@ -0,0 +1,69 @@ +--- + +title: 'MongoDB Standby Tool' +description: 
'MongoDB Standby Tool overview' +date: '2024-06-26' +categories: [configuration, deployment, disaster recovery] +keywords: [configuration, mongodb, disaster recovery] +weight: 11 +--- + +MongoDB Standby Tool is used to reconfigure some of the MongoDB replica set members to be in a hidden and secondary state. For example, you can have a replica set where members are running in multiple regions for disaster recovery purposes. + +## Docker Image + +```bash +docker pull ghcr.io/plgd-dev/hub/mongodb-standby-tool:latest +``` + +## YAML Configuration + +A configuration template is available at [tools/mongodb/standby-tool/config.yaml](https://github.com/plgd-dev/hub/blob/main/tools/mongodb/standby-tool/config.yaml). + +### Logging + +| Property | Type | Description | Default | +|----------|------|-------------|---------| +| `log.level` | string | `Logging enabled from this level.` | `"info"` | +| `log.encoding` | string | `Logging format. Supported values are: "json", "console".` | `"json"` | +| `log.stacktrace.enabled` | bool | `Log stacktrace.` | `false` | +| `log.stacktrace.level` | string | `Stacktrace from this level.` | `"warn"` | +| `log.encoderConfig.timeEncoder` | string | `Time format for logs. Supported values are: "rfc3339nano", "rfc3339".` | `"rfc3339nano"` | + +## Mode + +When the tool is started, it is in standby mode by default. In this mode, all members in the replica set are available, and members in the standby list are set as hidden, without votes, priority, and with a delay. All other members are reconfigured as secondary members. You can change the mode to active by setting the `mode` property to `active`. In this mode, standby members are set as secondary members with votes, priority, and without delay, and other members are set as hidden, without votes, priority, and with a delay. + +| Property | Type | Description | Default | +|----------|------|-------------|---------| +| `mode` | string | `Set the running mode of the tool. 
Supported modes are "standby" and "active".` | `"standby"` | + +### Replica Set + +| Property | Type | Description | Default | +|----------|------|-------------|---------| +| `replicaSet.forceUpdate` | bool | `Update the replica set configuration with the force flag. More info [here](https://www.mongodb.com/docs/manual/reference/method/rs.reconfig/#std-label-rs-reconfig-method-force).` | `false` | +| `replicaSet.maxReadyWaits` | int | `Set the maximum number of retries for members to be ready.` | `10` | +| `replicaSet.standby.members` | []string | `List of the MongoDB members in the replica set which are used as hidden (mode == "standby") or secondary (mode == "active") members. All other members are reconfigured to the opposite.` | `[]` | +| `replicaSet.standby.delays` | string | `Set the delay for syncing the hidden members with the secondary/primary members.` | `6m` | +| `replicaSet.secondary.priority` | int | `Used to configure the secondary members' priority.` | `10` | +| `replicaSet.secondary.votes` | int | `Set the number of votes for the secondary members.` | `1` | + +### MongoDB + +The tool uses the first member in the standby list (`replicaSet.standby.members`) to connect to MongoDB. + +| Property | Type | Description | Default | +|----------|------|-------------|---------| +| `clients.storage.mongoDB.timeout` | string | `The maximum duration for the entire request operation.` | `"20s"` | +| `clients.storage.mongoDB.tls.enabled` | bool | `If true, use TLS for the connection.` | `false` | +| `clients.storage.mongoDB.tls.caPool` | []string | `File paths to the root certificates in PEM format. 
The file may contain multiple certificates.` | `[]` | +| `clients.storage.mongoDB.tls.keyFile` | string | `File path to the private key in PEM format.` | `""` | +| `clients.storage.mongoDB.tls.certFile` | string | `File path to the certificate in PEM format.` | `""` | +| `clients.storage.mongoDB.tls.useSystemCAPool` | bool | `If true, use the system certification pool.` | `false` | + +{{< note >}} + +Note that string types related to time (i.e. timeout, idleConnTimeout, expirationTime) are decimal numbers, each with an optional fraction and a unit suffix, such as "300ms", "1.5h", or "2h45m". Valid time units are "ns", "us", "ms", "s", "m", "h". + +{{< /note >}} diff --git a/content/en/docs/tutorials/disaster-recovery-replica-set.md b/content/en/docs/tutorials/disaster-recovery-replica-set.md index cdd9969c..ac960448 100644 --- a/content/en/docs/tutorials/disaster-recovery-replica-set.md +++ b/content/en/docs/tutorials/disaster-recovery-replica-set.md @@ -446,7 +446,7 @@ After some time for the pods to start, you can access the Hub at `https://primar ### Deploy plgd on the Standby Cluster -Deploying plgd to the standby cluster is similar to deploying it to the primary cluster. The differences are that the domain is `standby.plgd.cloud`, different internal and storage certificates are used, the standby flag is set to `true`, NATs is disabled, and MongoDB is configured to use the master DB at `mongodb.primary.plgd.cloud`. Additionally, the `mongodb-standby-tool` job is enabled to configure the MongoDB replica set. +Deploying plgd to the standby cluster is similar to deploying it to the primary cluster. The differences are that the domain is `standby.plgd.cloud`, different internal and storage certificates are used, the standby flag is set to `true`, NATs is disabled, and MongoDB is configured to use the master DB at `mongodb.primary.plgd.cloud`. 
Additionally, the [mongodb-standby-tool](/docs/configuration/mongodb-standby-tool) job is enabled to configure the MongoDB replica set. ```bash # Set variables @@ -613,7 +613,7 @@ sudo systemctl restart dnsmasq It is important that the `global.standby` flag is set to `true`, which means that plgd pods are not running on the standby cluster. {{< /note >}} -Once the MongoDB pods are running, we need to run the `mongodb-standby-tool` job to configure the MongoDB replica set. This configuration demotes the secondary members to hidden members. +Once the MongoDB pods are running, we need to run the [mongodb-standby-tool](/docs/configuration/mongodb-standby-tool) job to configure the MongoDB replica set. This configuration demotes the secondary members to hidden members. ```bash kubectl -n plgd patch job/$(kubectl -n plgd get jobs | grep mongodb-standby-tool | awk '{print $1}') --type=strategic --patch '{"spec":{"suspend":false}}' @@ -633,7 +633,7 @@ When the primary cluster is down, you need to switch to the standby cluster. #### Promote the Standby Cluster -First, promote the hidden members to secondary members by upgrading the Helm chart with the `mongodb.standbyTool.mode` set to `active`. The active mode reconfigures the MongoDB replica set, promoting hidden members to secondary members and demoting the previous members to hidden. Begin by deleting the `mongodb-standby-tool` job, then upgrade the Helm chart to initiate a new job. +First, promote the hidden members to secondary members by upgrading the Helm chart with the `mongodb.standbyTool.mode` set to `active`. The active mode reconfigures the MongoDB replica set, promoting hidden members to secondary members and demoting the previous members to hidden. Begin by deleting the [mongodb-standby-tool](/docs/configuration/mongodb-standby-tool) job, then upgrade the Helm chart to initiate a new job. 
```bash kubectl -n plgd delete job/$(kubectl -n plgd get jobs | grep mongodb-standby-tool | awk '{print $1}') @@ -670,14 +670,14 @@ When the primary cluster is ready for devices, switch back to the primary cluste #### Demote the Standby Cluster -First, promote the primary cluster's MongoDB hidden members to secondary members and demote the standby cluster's MongoDB secondary members to hidden. Upgrade the Helm chart with the `mongodb.standbyTool.mode` set to `standby`. To do this, delete the `mongodb-standby-tool` job and upgrade the Helm chart, which will create a new job. +First, promote the primary cluster's MongoDB hidden members to secondary members and demote the standby cluster's MongoDB secondary members to hidden. Upgrade the Helm chart with the `mongodb.standbyTool.mode` set to `standby`. To do this, delete the [mongodb-standby-tool](/docs/configuration/mongodb-standby-tool) job and upgrade the Helm chart, which will create a new job. ```bash kubectl -n plgd delete job/$(kubectl -n plgd get jobs | grep mongodb-standby-tool | awk '{print $1}') helm upgrade -i -n plgd --create-namespace -f values.yaml --set mongodb.standbyTool.mode=standby hub plgd/plgd-hub ``` -Next, patch the `mongodb-standby-tool` job to resume it and configure the MongoDB replica set. +Next, patch the [mongodb-standby-tool](/docs/configuration/mongodb-standby-tool) job to resume it and configure the MongoDB replica set. 
```bash kubectl -n plgd patch job/$(kubectl -n plgd get jobs | grep mongodb-standby-tool | awk '{print $1}') --type=strategic --patch '{"spec":{"suspend":false}}' From 0e2f9498613f37e2b779d475e097a1fcd1c46e48 Mon Sep 17 00:00:00 2001 From: Jozef Kralik Date: Thu, 27 Jun 2024 07:27:34 +0000 Subject: [PATCH 13/13] update doc --- content/en/docs/configuration/mongodb-standby-tool.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/en/docs/configuration/mongodb-standby-tool.md b/content/en/docs/configuration/mongodb-standby-tool.md index 77e333cc..09c54902 100644 --- a/content/en/docs/configuration/mongodb-standby-tool.md +++ b/content/en/docs/configuration/mongodb-standby-tool.md @@ -43,7 +43,7 @@ When the tool is started, it is in standby mode by default. In this mode, all me | Property | Type | Description | Default | |----------|------|-------------|---------| | `replicaSet.forceUpdate` | bool | `Update the replica set configuration with the force flag. More info [here](https://www.mongodb.com/docs/manual/reference/method/rs.reconfig/#std-label-rs-reconfig-method-force).` | `false` | -| `replicaSet.maxReadyWaits` | int | `Set the maximum number of retries for members to be ready.` | `10` | +| `replicaSet.maxWaitsForReady` | int | `Set the maximum number of retries for members to be ready.` | `30` | | `replicaSet.standby.members` | []string | `List of the MongoDB members in the replica set which are used as hidden (mode == "standby") or secondary (mode == "active") members. All other members are reconfigured to the opposite.` | `[]` | | `replicaSet.standby.delays` | string | `Set the delay for syncing the hidden members with the secondary/primary members.` | `6m` | | `replicaSet.secondary.priority` | int | `Used to configure the secondary members' priority.` | `10` |