From 9d73b03d18ed5fffc377d1aea1496b8d8ca6d26b Mon Sep 17 00:00:00 2001
From: Paulo Pires
Date: Sun, 17 Dec 2017 12:58:03 +0000
Subject: [PATCH] Update documentation according to latest changes

Fixes #7
Fixes #10
Fixes #11

Signed-off-by: Paulo Pires
---
 README.md | 182 ++++++++++++++++++++++++++++--------------------------
 1 file changed, 93 insertions(+), 89 deletions(-)

diff --git a/README.md b/README.md
index 4c6c69e..e23b5d7 100644
--- a/README.md
+++ b/README.md
@@ -1,85 +1,117 @@
 # kubernetes-nats-cluster
 
 NATS cluster on top of Kubernetes made easy.
 
-[![Docker Repository on Quay](https://quay.io/repository/pires/docker-nats/status "Docker Repository on Quay")](https://quay.io/repository/pires/docker-nats)
+**NOTE:** This repository provides a configurable way to deploy secure, available
+and scalable NATS clusters. However, [a _smarter_ solution](https://github.com/pires/nats-operator)
+is on the way (see [#5](https://github.com/pires/kubernetes-nats-cluster/issues/5)).
 
 ## Pre-requisites
 
-* Kubernetes cluster, tested with v1.5.1 on top of:
-  * [Vagrant + CoreOS](https://github.com/pires/kubernetes-vagrant-coreos-cluster)
-  * Google Container Engine
+* Kubernetes cluster v1.8+ - tested with v1.9.0 on top of [Vagrant + CoreOS](https://github.com/pires/kubernetes-vagrant-coreos-cluster)
+* At least 3 nodes available (see [Pod anti-affinity](#pod-anti-affinity))
 * `kubectl` configured to access your cluster master API Server
-* Optionally, OpenSSL for TLS certificate generation
+* `openssl` for TLS certificate generation
 
-## Building the image
-
-### `gnatsd` (NATS server)
+## Deploy
 
-First, one needs to download the version of the `gnatsd` binary that runs on the official Docker image. This is available at https://github.com/nats-io/nats-docker:
+We will be deploying a cluster of 3 NATS instances, with the following set-up:
+- TLS on
+- NATS client credentials: `nats_client_user:nats_client_pwd`
+- NATS route/cluster credentials: `nats_route_user:nats_route_pwd`
+- Logging: `debug:false`, `trace:true`, `logtime:true`
 
-```
-curl -o artifacts/gnatsd https://raw.githubusercontent.com/nats-io/nats-docker/master/gnatsd
-chmod a+x artifacts/gnatsd
+First, make sure to change `nats.conf` according to your needs.
+Then create a Kubernetes ConfigMap to store it:
+```bash
+kubectl create configmap nats-config --from-file nats.conf
 ```
 
-Alternatively one can build the binary locally:
-
-```
-go get github.com/nats-io/gnatsd
-cd $GOPATH/src/github.com/nats-io/gnatsd
-git checkout v1.0.2
-GOARCH=amd64 GOOS=linux CGO_ENABLED=0 go build -v -a -tags netgo -installsuffix netgo -ldflags "-s -w -X github.com/nats-io/gnatsd/version.GITCOMMIT=`git rev-parse --short HEAD`"
+Next, we need to generate valid TLS artifacts:
+```bash
+openssl genrsa -out ca-key.pem 2048
+openssl req -x509 -new -nodes -key ca-key.pem -days 10000 -out ca.pem -subj "/CN=kube-ca"
+openssl genrsa -out nats-key.pem 2048
+openssl req -new -key nats-key.pem -out nats.csr -subj "/CN=kube-nats" -config ssl.cnf
+openssl x509 -req -in nats.csr -CA ca.pem -CAkey ca-key.pem -CAcreateserial -out nats.pem -days 3650 -extensions v3_req -extfile ssl.cnf
 ```
 
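+Optionally, one can sanity-check the generated artifacts before moving on. The commands below
+are only an illustrative verification - they assume the file names used above and an `ssl.cnf`
+that sets the expected `subjectAltName` entries - and are not required for the deployment:
+```bash
+# The server certificate should verify against the freshly generated CA.
+openssl verify -CAfile ca.pem nats.pem
+
+# Inspect subject, validity dates and the SAN entries picked up from ssl.cnf.
+openssl x509 -in nats.pem -noout -subject -dates
+openssl x509 -in nats.pem -noout -text | grep -A 1 "Subject Alternative Name"
+```
+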
-Then one needs to copy the resulting `gnatsd` binary to this repository's `artifacts` directory.
-
-### Route checker
-
-Because of issue #2, it was decided to produce an app that makes sure that:
-
-* there's more than once instance of NATS available in the cluster, and if positive
-* at least one route is established
-
-```
-cd route_checker/
-GOARCH=amd64 GOOS=linux CGO_ENABLED=0 go build -v -a -tags netgo -installsuffix netgo
-mv route_checker ../artifacts
+Then, it's time to create a couple of Kubernetes secrets to store the TLS artifacts:
+- `tls-nats-server` for the NATS server TLS setup
+- `tls-nats-client` for NATS client apps - one will need it to validate the self-signed certificate
+used to secure the NATS server
+```bash
+kubectl create secret generic tls-nats-server --from-file nats.pem --from-file nats-key.pem --from-file ca.pem
+kubectl create secret generic tls-nats-client --from-file ca.pem
+```
+
+**ATTENTION:** Using self-signed certificates, and re-using the same certificate to secure both
+client and cluster connections, is a significant security compromise. It is done here only for
+the sake of showing how it can be done. In an ideal scenario, there should be:
+- One centralized PKI/CA
+- One certificate for securing NATS route/cluster connections
+- One certificate for securing NATS client connections
+- Enforced TLS route/cluster authentication, i.e. one TLS certificate per route/cluster peer
+- Enforced TLS client authentication, i.e. one TLS certificate per client
+
+And finally, we deploy NATS:
+```bash
+kubectl create -f nats.yml
+```
+
+Logs should be enough to make sure everything is working as expected:
+```
+$ kubectl logs -f nats-0
+[1] 2017/12/17 12:38:37.801139 [INF] Starting nats-server version 1.0.4
+[1] 2017/12/17 12:38:37.801449 [INF] Starting http monitor on 0.0.0.0:8222
+[1] 2017/12/17 12:38:37.801580 [INF] Listening for client connections on 0.0.0.0:4242
+[1] 2017/12/17 12:38:37.801772 [INF] TLS required for client connections
+[1] 2017/12/17 12:38:37.801778 [INF] Server is ready
+[1] 2017/12/17 12:38:37.802078 [INF] Listening for route connections on 0.0.0.0:6222
+[1] 2017/12/17 12:38:38.874497 [TRC] 10.244.1.3:33494 - rid:1 - ->> [CONNECT {"verbose":false,"pedantic":false,"user":"nats_route_user","pass":"nats_route_pwd","tls_required":true,"name":"KGMPnL89We3gFLEjmp8S5J"}]
+[1] 2017/12/17 12:38:38.956806 [TRC] 10.244.74.2:46018 - rid:3 - ->> [CONNECT {"verbose":false,"pedantic":false,"user":"nats_route_user","pass":"nats_route_pwd","tls_required":true,"name":"Skc5mx9enWrGPIQhyE7uzR"}]
+[1] 2017/12/17 12:38:39.951160 [TRC] 10.244.1.4:46242 - rid:4 - ->> [CONNECT {"verbose":false,"pedantic":false,"user":"nats_route_user","pass":"nats_route_pwd","tls_required":true,"name":"0kaCfF3BU8g92snOe34251"}]
+[1] 2017/12/17 12:40:38.956203 [TRC] 10.244.74.2:46018 - rid:3 - <<- [PING]
+[1] 2017/12/17 12:40:38.958279 [TRC] 10.244.74.2:46018 - rid:3 - ->> [PING]
+[1] 2017/12/17 12:40:38.958300 [TRC] 10.244.74.2:46018 - rid:3 - <<- [PONG]
+[1] 2017/12/17 12:40:38.961791 [TRC] 10.244.74.2:46018 - rid:3 - ->> [PONG]
+[1] 2017/12/17 12:40:39.951421 [TRC] 10.244.1.4:46242 - rid:4 - <<- [PING]
+[1] 2017/12/17 12:40:39.952578 [TRC] 10.244.1.4:46242 - rid:4 - ->> [PONG]
+[1] 2017/12/17 12:40:39.952594 [TRC] 10.244.1.4:46242 - rid:4 - ->> [PING]
+[1] 2017/12/17 12:40:39.952598 [TRC] 10.244.1.4:46242 - rid:4 - <<- [PONG]
 ```
 
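+Another optional check is gnatsd's HTTP monitoring endpoint, which the logs above show listening
+on port 8222. The commands below are only a sketch - they assume `kubectl port-forward` access to
+the pods and the default pod names (`nats-0`, `nats-1`, ...):
+```bash
+# Forward the monitoring port of one of the pods to localhost.
+kubectl port-forward nats-0 8222:8222 &
+
+# /routez lists the routes this server has established to its peers;
+# with 3 instances, each server should report 2 routes.
+curl -s http://127.0.0.1:8222/routez
+```
+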
-### Kubernetes Deployment
+### Route checker
 
-One must change `deployment.yaml` accordingly, commit everything and proceed to push a new tag that will trigger an automatic build:
-```
-git tag 1.0.2
-git push
-git push --tags
-```
+Because of issue #2, I decided to produce an app that makes sure that:
 
-## Deploy
+* there's more than one instance of NATS available in the cluster, and, if positive,
+* one route to each of the other agents is set up.
 
-```
-kubectl create -f svc.yaml
-kubectl create -f deployment.yaml
-```
+However
 
 ## Scale
 
+**WARNING:** Due to the [Pod anti-affinity](#pod-anti-affinity) rule, scaling up to _n_ NATS
+instances requires _n_ available Kubernetes nodes.
+
 ```
-kubectl scale deployment nats --replicas 3
+kubectl scale statefulsets nats --replicas 5
 ```
 
 Did it work?
 ```
 $ kubectl get svc,pods
-NAME         CLUSTER_IP   EXTERNAL_IP   PORT(S)                      SELECTOR         AGE
-kubernetes   10.100.0.1   <none>        443/TCP                      <none>           58m
-nats         None         <none>        4222/TCP,6222/TCP,8222/TCP   component=nats   23m
-
-NAME                   READY     STATUS    RESTARTS   AGE
-nats-651427393-5zmb7   1/1       Running   0          23m
-nats-651427393-dn3rk   1/1       Running   0          21m
-nats-651427393-gmc5n   1/1       Running   0          21m
+NAME             TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)                      AGE
+svc/kubernetes   ClusterIP   10.100.0.1   <none>        443/TCP                      1h
+svc/nats         ClusterIP   None         <none>        4222/TCP,6222/TCP,8222/TCP   4m
+
+NAME        READY     STATUS    RESTARTS   AGE
+po/nats-0   1/1       Running   0          4m
+po/nats-1   1/1       Running   0          4m
+po/nats-2   1/1       Running   0          4m
+po/nats-3   1/1       Running   0          7s
+po/nats-4   1/1       Running   0          6s
 ```
 
 ## Access the service
 
@@ -93,42 +125,14 @@ Just point your client apps to:
 nats:4222
 ```
 
-## TLS
-
-First, we need to generate a valid TLS certificate:
-```
-openssl genrsa -out ca-key.pem 2048
-openssl req -x509 -new -nodes -key ca-key.pem -days 10000 -out ca.pem -subj "/CN=kube-ca"
-openssl genrsa -out nats-key.pem 2048
-openssl req -new -key nats-key.pem -out nats.csr -subj "/CN=kube-nats" -config ssl.cnf
-openssl x509 -req -in nats.csr -CA ca.pem -CAkey ca-key.pem -CAcreateserial -out nats.pem -days 3650 -extensions v3_req -extfile ssl.cnf
-```
-
-Now, it's time to create a Kubernetes secret to store the certificate files:
-```
-kubectl create secret generic tls-nats --from-file nats.pem --from-file nats-key.pem
-```
-
-Finally, deploy a secured NATS cluster:
-```
-kubectl create -f deployment-tls.yaml
-kubectl scale deployment nats --replicas 3
-```
-
-## Other configurations
-
-One can configure `gnatsd` through environment variables, to be set on pod descriptors, as listed below:
-
-**SVC** - the headless service name used to discover NATS instances. Defaults to `nats`.
-
-**USER** - the username to authenticate with. Defaults to empty.
-
-**PASS** - the password to authenticate with. Defaults to empty.
-
-**TLS** - whether to enable TLS. Defaults to `false`.
-
-**TLSCERT** - the certificate to use for TLS. Defaults to empty.
-**TLSKEY** - the certificate key to use for TLS. Defaults to empty.
-
-**EXTRA** - extra arguments to pass to `gnatsd`, e.g. `-DV`. Defaults to empty.
+## Pod anti-affinity
+
+One of the main advantages of running NATS on top of Kubernetes is how resilient the cluster becomes,
+particularly during node restarts. However, if all NATS pods are scheduled onto the same node(s), this
+advantage decreases significantly and may even result in service downtime.
+
+It is therefore **highly recommended** that one adopts [pod anti-affinity](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#inter-pod-affinity-and-anti-affinity-beta-feature)
+in order to increase availability. This is enabled by default (see `nats.yml`).
\ No newline at end of file
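+To see the anti-affinity rule in action, one can also check that every NATS pod was scheduled onto
+a different node. Again, this is only an illustrative check, assuming the pod names shown earlier
+(`nats-0`, `nats-1`, ...):
+```bash
+# The NODE column should list a different node for each NATS pod.
+kubectl get pods -o wide
+
+# Print the anti-affinity rule carried by one of the pods.
+kubectl get pod nats-0 -o jsonpath='{.spec.affinity.podAntiAffinity}'
+```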