Merge branch 'main' into lukas-docs-search-algolia
jkralik committed Mar 12, 2024
2 parents f99db12 + ae4d0c9 commit 45e35a8
Showing 10 changed files with 1,158 additions and 4 deletions.
93 changes: 93 additions & 0 deletions content/en/docs/deployment/hub/advanced.md
@@ -96,6 +96,99 @@ global:
...
```

## Configuring Custom Certificate Authority for PLGD Hub Services

PLGD utilizes four types of service certificates:

- **External Services:** (e.g., gRPC Gateway, HTTP Gateway, Certificate Authority) are exposed to the internet.
- **Internal Services:** (e.g., MongoDB, NATS, Resource Directory, etc.) communicate internally.
- **CoAP Gateway:** Communicates with devices. The Root CA of the certificate must be the same as the Root CA used by the Certificate Authority Signer.
- **Certificate Authority Signer:** Used for signing certificates for devices. The Root CA used to sign the certificate is propagated to devices to trust the CoAP Gateway certificate.

The following steps use a single issuer for all service types. For your specific needs, you can use a different issuer for each type of service, such as Let's Encrypt for external services. To customize the issuer for PLGD Hub services, follow these steps:

### Add Custom CA to Kubernetes Secret

First, store the custom CA with its key pair in a Kubernetes secret. For a ClusterIssuer, create it in the `cert-manager` namespace.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: plgd-ca-secret
  namespace: cert-manager # or the Issuer's namespace when using a namespaced Issuer
type: kubernetes.io/tls
data:
  ca.crt: <RootCA.crt encoded in base64> # Root CA
  tls.crt: <CA.crt encoded in base64> # Root CA or Intermediate CA
  tls.key: <CA.key encoded in base64> # Associated private key
```
Apply the secret to the Kubernetes cluster:
```sh
kubectl apply -f plgd-ca-secret.yaml
```
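
Alternatively, the same secret can be created directly from the PEM files — a sketch, assuming the files are named `RootCA.crt`, `CA.crt`, and `CA.key`:

```sh
kubectl create secret generic plgd-ca-secret \
  --type=kubernetes.io/tls \
  --from-file=ca.crt=RootCA.crt \
  --from-file=tls.crt=CA.crt \
  --from-file=tls.key=CA.key \
  --namespace cert-manager
```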

### Configure Issuer to Use Custom CA

Next, configure the issuer to use the custom CA:

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer # or Issuer for namespace issuer
metadata:
  name: plgd-ca-issuer
spec:
  ca:
    secretName: plgd-ca-secret
```
Apply the issuer configuration to the Kubernetes cluster:
```sh
kubectl apply -f plgd-ca-issuer.yaml
```
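
Before continuing, you can check that cert-manager has picked up the CA — the `READY` column should report `True`:

```sh
kubectl get clusterissuer plgd-ca-issuer
```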

### Configure PLGD Hub Helm Chart

Finally, configure the PLGD Hub Helm chart to use the custom CA. Adjust the certificate duration according to your needs:

```yaml
certmanager:
  external: # external services
    cert:
      duration: 8760h # 1 year for external services
    issuer:
      kind: "ClusterIssuer" # or "Issuer"
      name: "plgd-ca-issuer"
      group: cert-manager.io
  internal: # internal services
    cert:
      duration: 8760h # 1 year for internal services
    issuer:
      kind: "ClusterIssuer" # or "Issuer"
      name: "plgd-ca-issuer"
      group: cert-manager.io
  coap: # CoAP Gateway
    cert:
      duration: 8760h # 1 year for CoAP Gateway
    issuer:
      kind: "ClusterIssuer" # or "Issuer"
      name: "plgd-ca-issuer"
      group: cert-manager.io
  default: # used when internal, external, or coap is not specified
    cert:
      duration: 876000h # 100 years for the intermediate CA used to sign device certificates
    ca: # CA used to sign services (under default) and device certificates
      issuerRef:
        kind: "ClusterIssuer" # or "Issuer"
        name: "plgd-ca-issuer"
        group: cert-manager.io
```
Apply the Helm values configuration to the Kubernetes cluster.
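
As a sketch — assuming the chart registry was added under the name `plgd` and the release is called `hub` — the values can be applied with:

```sh
helm upgrade -i hub plgd/plgd-hub --namespace plgd -f values.yaml
```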

## Troubleshooting

### Issue: Unable to fetch data from the /.well-known endpoint in browser
14 changes: 13 additions & 1 deletion content/en/docs/features/control-plane/entity-tag.md
@@ -2,7 +2,7 @@
title: 'Entity-tag (ETAG)'
description: 'What is ETAG?'
docsOthersDisplay: true
date: '2024-01-29'
categories: [features]
keywords: [twin, cache, history]
weight: 31
@@ -16,6 +16,12 @@ For more information about ETAG, refer to the [RFC7252 Section 5.10.6](https://datatracker.ietf.org/doc/html/rfc7252#section-5.10.6).

## IoTivity-lite

{{< note >}}

To enable the ETAG feature in IoTivity-lite, use the CMake option `-DOC_ETAG_ENABLED=ON`.

{{< /note >}}
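
A configure-and-build sketch with this option, assuming a fresh `build` directory:

```sh
cmake -B build -DOC_ETAG_ENABLED=ON .
cmake --build build
```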

### Definitions

- **ETAG**: An ETAG is an 8-byte opaque value that represents the state of a resource. It is generated by the device and used to detect changes in resources.
@@ -178,6 +184,12 @@ oc_main_shutdown();

In order to monitor resource changes and determine if a resource has been modified on the device, the CoAP gateway utilizes the Entity Tag (ETAG) mechanism.

{{< note >}}

To enable the use of ETAGs in the plgd CoAP gateway, activate it by setting the value in the Helm chart: `.coapgateway.deviceTwin.useETags: true`.

{{< /note >}}
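
Expressed as Helm chart values, the setting from the note above corresponds to:

```yaml
coapgateway:
  deviceTwin:
    useETags: true
```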

For **Batch Observation**, the ETAG is associated with the overall state of resources. Prior to initiating resource observation, the CoAP gateway retrieves the latest ETAGs for a number of resources (the `N-latest ETAGs`) among all device resources from the Hub database. When initiating the resource observation, the CoAP gateway sends the ETAGs to the device with the query `incChanges`. If the highest received ETAG matches the highest ETAG among the device resources, the device responds with the code `VALID`. However, if the received ETAG does not match, the device responds with the code `CONTENT` and includes the current ETAG. Consequently, when a resource changes, the device sends the updated ETAG back to the CoAP gateway via a notification. The CoAP gateway transmits the ETAGs together with the content to the resource-aggregate using the `NotifyResourceChanged` method. This command is then converted into a `ResourceChanged` event, which is saved in the database and distributed through the event bus.
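
A simplified sketch of this handshake — the endpoint, query placement, and ETAG values are illustrative, not byte-accurate:

```
CoAP GW -> Device: GET /oic/res?if=oic.if.b&incChanges   (known ETAGs: 0x0B, 0x0A, ...)
Device -> CoAP GW: 2.03 VALID                            (highest ETAG matches; no payload)
-- or, after a change --
Device -> CoAP GW: 2.05 CONTENT (ETAG: 0x0C)             (batch payload with the current state)
```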

A special database query efficiently retrieves the N-latest ETAGs from all device resources without loading the complete set of data. This optimized query retrieves only the required ETAGs, excluding any additional information.
@@ -1,7 +1,7 @@
---
title: 'Disaster Recovery'
description: 'Getting back online and in-sync'
date: '2024-02-06'
categories: [features]
keywords: ['disaster recovery', 'data reconciliation' , 'jetstream']
weight: 70
@@ -51,3 +51,69 @@ Having JetStream as an EventBus gives you the possibility to read stored events
{{< warning >}}
plgd hub doesn't guarantee delivery of all events to the EventBus. It guarantees that all events are stored in the EventStore in the correct order. If there is a JetStream / NATS failure and plgd hub was not able to publish some events, they won't be published again, and your service has to fall back to reconciliation using the plgd gRPC Gateway anyway.
{{< /warning >}}

## Data Management and Failover Strategies

The plgd hub is a stateful event-driven system, meaning that data is stored in the EventStore, which serves as the authoritative source of truth and is implemented using MongoDB. In this section, we will describe how to back up and restore data in a scenario where two clusters are running in different locations (e.g., East US / West US). The first cluster is utilized for normal operations, while the secondary cluster serves as a backup for disaster recovery.

### Backup Databases

To back up the database, two approaches can be used:

* **Passive Backup**

![passive-backup](/docs/features/monitoring-and-diagnostics/static/disaster-recovery-passive-backup.drawio.svg)

The database is regularly backed up to a different location and can be used in case of failure. Although this approach is simple and requires fewer resources, the data may become outdated, and the restoration process may take some time. For MongoDB, utilize the `mongodump` tool to create an export of the database contents, store it securely, and use it in case of failure (a command sketch follows this list). Regular backups are essential to keep the data up-to-date. For more details on this approach, refer to the [MongoDB documentation](https://www.mongodb.com/docs/database-tools/mongodump/).

* **Active Backup**

![active-backup](/docs/features/monitoring-and-diagnostics/static/disaster-recovery-active-backup.drawio.svg)

The database actively synchronizes data between two clusters in real time. This approach is more complex and resource-intensive but is more reliable for disaster recovery. For MongoDB, use the Cluster-to-Cluster Sync feature to synchronize data between two MongoDB clusters. For more details on this approach, refer to the [MongoDB documentation](https://www.mongodb.com/docs/cluster-to-cluster-sync/current/connecting/onprem-to-onprem/).
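
A minimal `mongodump`/`mongorestore` sketch for the passive approach — the connection strings are placeholders:

```sh
# Dump the primary cluster's data into a compressed archive.
mongodump --uri="mongodb://primary.example.com:27017" --gzip --archive=plgd-backup.gz

# Restore the archive into the secondary cluster when failing over.
mongorestore --uri="mongodb://secondary.example.com:27017" --gzip --archive=plgd-backup.gz
```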

{{< warning >}}

**Using a backup from the EventBus (JetStream) is not recommended in a restored cluster**, as it is not the source of truth for the plgd hub. This could result in data inconsistency because backup creation is not synchronized between the EventStore and the EventBus (JetStream): the EventBus can contain events that are not stored in the EventStore, and vice versa.

{{< /warning >}}

### OAuth2 Server

Devices connected to the hub have access tokens used to authorize device access. The access tokens are generated by the OAuth2 server, and its database needs to be backed up regularly. In case of OAuth2 server failure, devices won't connect to the hub. To prevent this, regularly back up the OAuth2 server database as described in the [Backup Databases](#backup-databases) section.

### Certificates

![certificates](/docs/features/monitoring-and-diagnostics/static/disaster-recovery-certificates.drawio.svg)

The CoAP-Gateway and Device Provisioning Service depend on certificates validated by devices, and their certificate chains must be signed by the same Root CAs. It is crucial that the Root CAs used for the primary and secondary clusters are identical. Additionally, the hub ID configured through the [plgd helm chart](https://github.com/plgd-dev/hub/blob/4c4861a4bc483ba4080a1d448063da392eff4026/charts/plgd-hub/values.yaml#L6) must remain consistent.
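
For example, both clusters' Helm values would pin the same hub ID — the UUID below is hypothetical:

```yaml
global:
  hubId: "d03a1bb4-0a3a-4053-aeb6-8d8b243e1a2b" # must be identical on the primary and secondary clusters
```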

### Devices

Devices are set up with a single endpoint (an IP address or a DNS name) to link with either the CoAP-Gateway or the Device Provisioning Service. If a primary cluster failure occurs and you cannot dynamically modify the endpoint on the devices, they will be unable to establish a connection with the hub. To guarantee connectivity to the secondary cluster, adopt one of the provided options:

* **DNS Address as endpoint**

In case of primary cluster failure, update the DNS record on the DNS server. It is recommended to set the time to live (TTL) of the DNS record to a low value, e.g., 30 minutes.

* **IP Address as endpoint**

![load-balancer](/docs/features/monitoring-and-diagnostics/static/disaster-recovery-load-balancer.drawio.svg)

Changing the IP address can be challenging in case of primary cluster failure, as the public IP address is often assigned by the Internet Service Provider (ISP). However, using an IP load balancer near the devices allows switching the load balancer's target to the secondary cluster. For this, you can use HAProxy, which supports layer 4 load balancing (a minimal configuration sketch follows at the end of this section). For more information, refer to the [HAProxy documentation](https://www.haproxy.com/documentation/haproxy-configuration-tutorials/load-balancing/tcp/) and [Failover & Worst Case Management With HAProxy](https://www.haproxy.com/blog/failover-and-worst-case-management-with-haproxy).

* **Update Device Provisioning Service endpoint**

Under these circumstances, you have the option to update the DPS endpoint to the secondary cluster by utilizing the DHCP server to supply the devices with the updated endpoint. The device retrieves a new configuration from the DPS service, obtaining updated:

* Time (optional)
* Owner
* Credentials - identity certificate, root CA certificate, and pre-shared key (optional)
* Access control lists (ACLs)
* Cloud configuration - Authorization code, Hub ID, Hub URL, etc.

Subsequently, the module connects to the cloud, performing sign-up (self-registration) as its first operation.

**From the Hub perspective:**

The Hub detects that the module has already been registered (from the restored database) because the DeviceID and owner haven't changed, indicating no factory reset occurred. Consequently, the device events will continue from the restored state. If your application relies on event versions, please **be mindful that the version may be in the past**, depending on when the backup was performed.
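
Returning to the IP-address option above, a minimal HAProxy layer-4 failover sketch — the hostnames are placeholders and the CoAP gateway port is assumed to be `5684`:

```
frontend coap_gateway
    mode tcp
    bind :5684
    default_backend plgd_clusters

backend plgd_clusters
    mode tcp
    # All traffic goes to the primary cluster; the secondary is used
    # only when the primary's health check fails.
    server primary   primary.example.com:5684   check
    server secondary secondary.example.com:5684 check backup
```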
@@ -17,7 +17,7 @@ This example shows open telemetry tracing in action:

## plgd & Open Telemetry

The plgd hub services emit telemetry to collectors, secured using TLS and supporting OTLP encoding. The open telemetry integration can be enabled globally for all services in the [plgd hub helm chart](/docs/deployment/#register-plgd-helm-chart-registry). Read further for more information on how to enable open telemetry in the plgd hub helm chart.
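
A sketch of enabling the exporter globally via Helm values — the keys below are an assumption, so verify them against the chart's `values.yaml`:

```yaml
global:
  openTelemetryExporter:
    enabled: true # assumed key; check the chart's values.yaml
    address: "otel-collector.observability.svc.cluster.local:4317" # OTLP/gRPC collector endpoint (placeholder)
```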

> The request content is included in the gRPC as well as the CoAP Gateway spans. As the HTTP Gateway is a proxy of the gRPC Gateway, the request content can be found in the gRPC Gateway spans.