Connect Agent to Cloud

You can securely connect a Netdata Agent, running on a distributed node, to Netdata Cloud. A Space's administrator creates a claiming token, which is used to add an Agent to their Space via the Agent-Cloud link (ACLK).

Are you just starting out with Netdata Cloud? See our get started with Cloud guide for a walkthrough of the process and simplified instructions.

When connecting an agent (also referred to as a node) to Netdata Cloud, you must complete a verification process that proves you have some level of authorization to manage the node itself. This verification is a security feature that helps prevent unauthorized users from seeing the data on your node.

Only the administrators of a Space in Netdata Cloud can view the claiming token and accompanying script generated by Netdata Cloud.

The connection process ensures no third party can add your node, and then view your node's metrics, in a Cloud account, Space, or War Room that you did not authorize.

By connecting a node, you opt-in to sending data from your Agent to Netdata Cloud via the ACLK. This data is encrypted by TLS while it is in transit. We use the RSA keypair created during the connection process to authenticate the identity of the Netdata Agent when it connects to the Cloud. While the data does flow through Netdata Cloud servers on its way from Agents to the browser, we do not store or log it.

You can connect a node during the Netdata Cloud onboarding process, or after you created a Space by clicking on Connect Nodes in the Spaces management area.

There are two important notes regarding connecting nodes:

  • You can only connect any given node in a single Space. You can, however, add that connected node to multiple War Rooms within that one Space.
  • You must repeat the connection process on every node you want to add to Netdata Cloud.

How to connect a node

There are three main flows from which you can connect a node to Netdata Cloud:

  • when you are in a War Room and want to connect your first node
  • when you are in the Manage Space area and select Connect Nodes, coming from Manage Space or Manage War Room
  • when you are on the Nodes view page and want to connect a node (this follows the Manage Space flow)

Please note that only the administrators of a Space in Netdata Cloud can view the claiming token and accompanying script, generated by Netdata Cloud, to trigger the connection process.

Empty War Room

Whether at your first sign-in or a later one, when you enter Netdata Cloud and land in a War Room that has no nodes added to it, you can:

  • connect a new node to Netdata Cloud and add it to the War Room
  • add a previously connected node to the War Room

To connect a new node and add it to the War Room, tell us what environment the node is running on (Linux, Docker, macOS, Kubernetes), and we will provide a script to initiate the connection process. Copy and paste it into your node's terminal. See one of the following sections, depending on your environment:

  • Connect an agent running in Linux
  • Connect an agent running in Docker
  • Connect an agent running in macOS
  • Connect a Kubernetes cluster's parent Netdata pod

Repeat this process with every node you want to add to Netdata Cloud during onboarding. You can also add more nodes once you've finished onboarding.

Manage Space or War Room

To connect a node, select which War Rooms you want to add this node to with the dropdown, then copy and paste the script given by Netdata Cloud into your node's terminal.

When you come from the Nodes view page, the room parameter is already set to the current War Room.

Connect an agent running in Linux

If you want to connect a node running in a Linux environment, Netdata Cloud provides the kickstart script, which installs the Netdata Agent on your node if it isn't already installed and connects the node to Netdata Cloud. It should look similar to:

wget -O /tmp/netdata-kickstart.sh https://my-netdata.io/kickstart.sh && sh /tmp/netdata-kickstart.sh --claim-token TOKEN --claim-rooms ROOM1,ROOM2 --claim-url https://app.netdata.cloud

The script should return Agent was successfully claimed. If the connection process returns errors, or if you don't see the node in your Space after 60 seconds, see the troubleshooting information.

To run the script, you need either root privileges or to run it as the user that is running the Agent. See the Connect an agent without root privileges section for more details.

For more details on the extra parameters claim-token, claim-rooms, and claim-url, refer to Connect node to Netdata Cloud during installation.

Connect an agent without root privileges

If you don't want to run the installation script to connect your nodes to Netdata Cloud with root privileges, you can discover which user is running the Agent, switch to that user, and run the script.

Use grep to search your netdata.conf file, which is typically located at /etc/netdata/netdata.conf, for the run as user setting. For example:

grep "run as user" /etc/netdata/netdata.conf 
    # run as user = netdata

The default user is netdata. Yours may be different, so pay attention to the output from grep. Switch to that user and run the script.

wget -O /tmp/netdata-kickstart.sh https://my-netdata.io/kickstart.sh && sh /tmp/netdata-kickstart.sh --claim-token TOKEN --claim-rooms ROOM1,ROOM2 --claim-url https://app.netdata.cloud
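The lookup above can also be scripted. The following is a minimal sketch, assuming the run as user setting appears in netdata.conf in the form shown by the grep output (commented out or not). netdata_run_user is a hypothetical helper name, not something shipped with Netdata.

```shell
# netdata_run_user [CONF]: print the user named by the first "run as user"
# line in CONF (default: /etc/netdata/netdata.conf). Falls back to "netdata",
# the documented default, when the file or the setting is missing.
netdata_run_user() {
  conf="${1:-/etc/netdata/netdata.conf}"
  user=""
  if [ -r "$conf" ]; then
    # Accept both "run as user = X" and the commented default "# run as user = X".
    user=$(sed -n 's/^[[:space:]]*#\{0,1\}[[:space:]]*run as user[[:space:]]*=[[:space:]]*\([^[:space:]]*\).*$/\1/p' "$conf" | head -n 1)
  fi
  printf '%s\n' "${user:-netdata}"
}
```

You could then run the downloaded kickstart script as that user, for example with sudo -u "$(netdata_run_user)" sh /tmp/netdata-kickstart.sh followed by the same claim options as above.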

Connect an agent running in Docker

To connect an instance of the Netdata Agent running inside of a Docker container, it is recommended that you follow the instructions and use the commands provided either in the Nodes tab of an empty War Room on Netdata Cloud or in the shelf that appears when you click Connect Nodes and select Docker.

However, you can also connect a new node by setting environment variables in the container so that it is automatically connected on startup or restart.

For the connection process to work, the contents of /var/lib/netdata must be preserved across container restarts using a persistent volume. See our recommended docker run and Docker Compose examples for details.

Known issues on older hosts with seccomp enabled

Nodes running on hosts with the following configuration cannot be claimed:

  • libseccomp version less than v2.3.3.
  • Docker version less than v18.04.0-ce.
  • The kernel is configured with CONFIG_SECCOMP enabled.

To check if your kernel supports seccomp:

# grep CONFIG_SECCOMP= /boot/config-$(uname -r) 2>/dev/null || zgrep CONFIG_SECCOMP  /proc/config.gz 2>/dev/null
CONFIG_SECCOMP=y
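To compare an installed version against the limits listed above, a small version comparison helper can be useful. This is a sketch only: version_lt is a hypothetical name, and it assumes sort -V (version sort, as found in GNU coreutils) is available on the host.

```shell
# version_lt A B: succeeds when version A sorts strictly before version B.
# Relies on `sort -V` (version sort), which is assumed to be available.
version_lt() {
  [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Example (not run here): compare the Docker server version with the limit
# listed above, dropping any suffix such as "-ce" first.
#   docker_version=$(docker version --format '{{.Server.Version}}')
#   version_lt "${docker_version%%-*}" 18.04.0 && echo "Docker is older than v18.04.0-ce"
```

The same helper can be applied to the libseccomp version, however your distribution's package manager reports it.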

To resolve the issue, do one of the following actions:

  • Update to a newer version of Docker and libseccomp (recommended).
  • Create a custom profile and pass it for the container.
  • Run without the default seccomp profile (unsafe, not recommended).
See how to create a custom profile
  1. Download the moby default seccomp profile and change defaultAction to SCMP_ACT_TRACE on line 2.

    sudo wget https://raw.githubusercontent.com/moby/moby/master/profiles/seccomp/default.json -O /etc/docker/seccomp.json
    sudo sed -i '2s/SCMP_ACT_ERRNO/SCMP_ACT_TRACE/' /etc/docker/seccomp.json
  2. Specify the new policy for the container explicitly.

    • When using docker run:
    docker run -d --name=netdata \
      --security-opt=seccomp=/etc/docker/seccomp.json \
      ...
    • When using docker-compose:

    ⚠️ The security_opt option is ignored when deploying a stack in swarm mode.

    version: '3'
    services:
      netdata:
        security_opt:
          - seccomp:/etc/docker/seccomp.json
        ...
    • When using docker stack deploy:

    Change the default profile globally by adding --seccomp-profile=/etc/docker/seccomp.json to the options passed to dockerd on startup.

Using environment variables

The Netdata Docker container looks for the following environment variables on startup:

  • NETDATA_CLAIM_TOKEN
  • NETDATA_CLAIM_URL
  • NETDATA_CLAIM_ROOMS
  • NETDATA_CLAIM_PROXY

If the token and URL are specified in their corresponding variables and the container is not already connected, it will use these values to attempt to connect the container, automatically adding the node to the specified War Rooms. If a proxy is specified, it will be used for the connection process and for connecting to Netdata Cloud.

These variables can be specified using any mechanism supported by your container tooling for setting environment variables inside containers.
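As an illustration of the rule described above (both the token and the URL must be set before an automatic connection is attempted), a pre-flight check in a deployment script might look like the following. claim_env_ready is a hypothetical helper for your own tooling; it is not the logic the container image itself runs.

```shell
# claim_env_ready: succeeds only when both variables required for the
# automatic connection are set and non-empty in the current environment.
claim_env_ready() {
  [ -n "${NETDATA_CLAIM_TOKEN:-}" ] && [ -n "${NETDATA_CLAIM_URL:-}" ]
}

# Example (not run here):
#   claim_env_ready || echo "Set NETDATA_CLAIM_TOKEN and NETDATA_CLAIM_URL first" >&2
```

NETDATA_CLAIM_ROOMS and NETDATA_CLAIM_PROXY are deliberately not checked, since the text above describes only the token and URL as required.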

When using the docker run command, if you already have an agent container running, expect a short period of downtime while the agent container is recreated.

The command to connect a new node to Netdata Cloud is:

docker run -d --name=netdata \
  -p 19999:19999 \
  -v netdataconfig:/etc/netdata \
  -v netdatalib:/var/lib/netdata \
  -v netdatacache:/var/cache/netdata \
  -v /etc/passwd:/host/etc/passwd:ro \
  -v /etc/group:/host/etc/group:ro \
  -v /proc:/host/proc:ro \
  -v /sys:/host/sys:ro \
  -v /etc/os-release:/host/etc/os-release:ro \
  --restart unless-stopped \
  --cap-add SYS_PTRACE \
  --security-opt apparmor=unconfined \
  -e NETDATA_CLAIM_TOKEN=TOKEN \
  -e NETDATA_CLAIM_URL="https://app.netdata.cloud" \
  -e NETDATA_CLAIM_ROOMS=ROOM1,ROOM2 \
  -e NETDATA_CLAIM_PROXY=PROXY \
  netdata/netdata

Note: This command is suggested for connecting a new container. Using it for an existing container recreates the container, though data and configuration of the old container may be preserved. If you are claiming an existing container that cannot be recreated, you can add the container by going to Netdata Cloud, clicking the Nodes tab, clicking Connect Nodes, selecting Docker, and following the instructions and commands provided, or by following the instructions in an empty War Room.

The output that would be seen from the connection process when using other methods will be present in the container logs.
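One way to check those logs for the success message is sketched below. claim_succeeded is a hypothetical helper that reads log text on standard input; the message it searches for is the one quoted elsewhere in this document, and the container name netdata matches the docker run example above.

```shell
# claim_succeeded: reads log text on stdin and succeeds if it contains the
# success message printed by the connection process.
claim_succeeded() {
  grep -q 'Agent was successfully claimed'
}

# Example (not run here): inspect the logs of a container named "netdata".
#   docker logs netdata 2>&1 | claim_succeeded && echo "node is connected"
```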

Using the environment variables like this to handle the connection process is the preferred method of connecting Docker containers as it works in the widest variety of situations and simplifies configuration management.

Using Docker compose

If you use Docker Compose, you can copy the config provided by Netdata Cloud, which should be the same as the one below:

version: '3'
services:
  netdata:
    image: netdata/netdata
    container_name: netdata
    hostname: example.com # set to fqdn of host
    ports:
      - 19999:19999
    restart: unless-stopped
    cap_add:
      - SYS_PTRACE
    security_opt:
      - apparmor:unconfined
    volumes:
      - netdataconfig:/etc/netdata
      - netdatalib:/var/lib/netdata
      - netdatacache:/var/cache/netdata
      - /etc/passwd:/host/etc/passwd:ro
      - /etc/group:/host/etc/group:ro
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /etc/os-release:/host/etc/os-release:ro
    environment:
      - NETDATA_CLAIM_TOKEN=TOKEN
      - NETDATA_CLAIM_URL="https://app.netdata.cloud"
      - NETDATA_CLAIM_ROOMS=ROOM1,ROOM2

volumes:
  netdataconfig:
  netdatalib:
  netdatacache:

Then run the following command in the same directory as the docker-compose.yml file to start the container.

docker-compose up -d

Using docker exec

To connect a running Netdata Agent container without recreating it, append the script offered by Netdata Cloud to a docker exec ... command, replacing netdata with the name of your running container:

docker exec -it netdata netdata-claim.sh -token=TOKEN -rooms=ROOM1,ROOM2 -url=https://app.netdata.cloud

The values for ROOM1,ROOM2 can be found by going to Netdata Cloud, clicking the Nodes tab, clicking Connect Nodes, selecting Docker, and copying the rooms= value in the command provided.

The script should return Agent was successfully claimed. If the connection process returns errors, or if you don't see the node in your Space after 60 seconds, see the troubleshooting information.

Connect an agent running in macOS

To connect a node running in a macOS environment, Netdata Cloud provides the kickstart script, which installs the Netdata Agent on your node if it isn't already installed and connects the node to Netdata Cloud. It should look similar to:

curl https://my-netdata.io/kickstart.sh > /tmp/netdata-kickstart.sh && sh /tmp/netdata-kickstart.sh --install /usr/local/ --claim-token TOKEN --claim-rooms ROOM1,ROOM2 --claim-url https://app.netdata.cloud

The script should return Agent was successfully claimed. If the connection process returns errors, or if you don't see the node in your Space after 60 seconds, see the troubleshooting information.

Connect a Kubernetes cluster's parent Netdata pod

Read our Kubernetes installation guide for details on connecting a parent Netdata pod.

Connect through a proxy

A Space's administrator can connect a node through an HTTP(S) proxy.

You should first configure the proxy in the [cloud] section of netdata.conf. The proxy settings you specify here will also be used to tunnel the ACLK. The default proxy setting is none.

[cloud]
    proxy = none

The proxy setting can take one of the following values:

  • none: Do not use a proxy, even if the system is configured otherwise.
  • env: Try to read proxy settings from the http_proxy environment variable, if it is set.
  • http://[user:pass@]host:port: The ACLK and connection process will use the specified HTTP(S) proxy.

For example, an HTTP proxy setting may look like the following:

[cloud]
    proxy = http://203.0.113.0:1080       # With an IP address
    proxy = http://proxy.example.com:1080 # With a URL
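Before writing a value into netdata.conf, you may want to sanity-check it against the accepted forms. The following sketch is an assumption-laden illustration: valid_cloud_proxy is a hypothetical helper, and it accepts only none, env, and an http:// URL with an optional user:pass@ prefix and a numeric port after the host. It does not reflect how the Agent itself parses the setting.

```shell
# valid_cloud_proxy VALUE: succeeds when VALUE is "none", "env", or an
# http:// URL with an optional user:pass@ prefix, a host, and a numeric port.
# Illustrative only; the Agent's own parsing may accept more forms.
valid_cloud_proxy() {
  case "$1" in
    none|env) return 0 ;;
  esac
  printf '%s\n' "$1" | grep -Eq '^http://([^:@/]+:[^@/]*@)?[^:@/]+:[0-9]+$'
}

# Example (not run here):
#   valid_cloud_proxy "http://proxy.example.com:1080" || echo "unexpected proxy value" >&2
```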

You can now move on to connecting. When you connect with the kickstart script, add the --claim-proxy parameter followed by the same proxy setting you added to netdata.conf.

wget -O /tmp/netdata-kickstart.sh https://my-netdata.io/kickstart.sh && sh /tmp/netdata-kickstart.sh --claim-token TOKEN --claim-rooms ROOM1,ROOM2 --claim-url https://app.netdata.cloud --claim-proxy http://[user:pass@]host:port

Hit Enter. The script should return Agent was successfully claimed. If the connection process returns errors, or if you don't see the node in your Space after 60 seconds, see the troubleshooting information.

Troubleshooting

If you're having trouble connecting a node, this may be because the ACLK cannot connect to Cloud.

With the Netdata Agent running, visit http://NODE:19999/api/v1/info in your browser, replacing NODE with the IP address or hostname of your Agent. The returned JSON contains four keys that will be helpful to diagnose any issues you might be having with the ACLK or connection process.

	"cloud-enabled"
	"cloud-available"
	"agent-claimed"
	"aclk-available"
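If you prefer the command line to a browser, the four keys can be pulled out of the response with standard tools. This is a rough sketch that assumes the keys carry true/false values; aclk_info_keys is a hypothetical helper, and it matches text rather than parsing JSON properly.

```shell
# aclk_info_keys: reads the /api/v1/info JSON on stdin and prints only the
# four Cloud-related keys together with their true/false values.
aclk_info_keys() {
  grep -Eo '"(cloud-enabled|cloud-available|agent-claimed|aclk-available)"[[:space:]]*:[[:space:]]*(true|false)'
}

# Example (not run here), replacing NODE as described above:
#   curl -s http://NODE:19999/api/v1/info | aclk_info_keys
```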

On Netdata Agent version 1.32 and newer (run netdata -v to find your version), the netdata -W aclk-state command provides diagnostic information about the ACLK. Sample output:

ACLK Available: Yes
ACLK Implementation: Next Generation
New Cloud Protocol Support: Yes
Claimed: Yes
Claimed Id: 53aa76c2-8af5-448f-849a-b16872cc4ba1
Online: Yes
Used Cloud Protocol: New
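For scripted checks, a single field can be extracted from output shaped like the sample above. This is a sketch that assumes the "Name: value" layout shown in the sample; aclk_state_field is a hypothetical helper, and the field name is used as a plain pattern, so it should contain only letters and spaces.

```shell
# aclk_state_field NAME: reads aclk-state output on stdin and prints the
# value of the first line that starts with "NAME:".
aclk_state_field() {
  sed -n "s/^$1:[[:space:]]*//p" | head -n 1
}

# Example (not run here):
#   netdata -W aclk-state | aclk_state_field "Online"
```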

Use these keys and the information below to troubleshoot the ACLK.

kickstart: unsupported Netdata installation

If you run the kickstart script and get the following error, you most probably installed Netdata using an unsupported package:

Existing install appears to be handled manually or through the system package manager.

If you are using an unsupported package, such as a third-party .deb/.rpm package provided by your distribution, please remove that package and reinstall using our recommended kickstart script.

kickstart: Failed to write new machine GUID

If you run the kickstart script without the privileges required for the actions performed during the connection process, you will get the following error:

Failed to write new machine GUID. Please make sure you have rights to write to /var/lib/netdata/registry/netdata.public.unique.id.

For a successful execution, run the script with root privileges or as the user that is running the Agent. See the Connect an agent without root privileges section for more details.

bash: netdata-claim.sh: command not found

If you run the claiming script and see a command not found error, you either installed Netdata in a non-standard location or are using an unsupported package. If you installed Netdata in a non-standard path using the --install option, you need to update your $PATH or run netdata-claim.sh using the full path. For example, if you installed Netdata to /opt/netdata, use /opt/netdata/bin/netdata-claim.sh to run the claiming script.

If you are using an unsupported package, such as a third-party .deb/.rpm package provided by your distribution, please remove that package and reinstall using our recommended kickstart script.

Connecting on older distributions (Ubuntu 14.04, Debian 8, CentOS 6)

If you're running an older Linux distribution or one that has reached EOL, such as Ubuntu 14.04 LTS, Debian 8, or CentOS 6, your Agent may not be able to securely connect to Netdata Cloud due to an outdated version of OpenSSL. These old versions of OpenSSL cannot perform hostname validation, which helps securely encrypt SSL connections.

We recommend you reinstall Netdata with a static build, which uses an up-to-date version of OpenSSL with hostname validation enabled.

If you choose to continue using the outdated version of OpenSSL, your node will still connect to Netdata Cloud, albeit with hostname verification disabled. Without verification, your Netdata Cloud connection could be vulnerable to man-in-the-middle attacks.

cloud-enabled is false

If cloud-enabled is false, you probably ran the installer with the --disable-cloud option.

Additionally, check that the enabled setting in /var/lib/netdata/cloud.d/cloud.conf is set to true:

[global]
    enabled = true

To fix this issue, reinstall Netdata using your preferred method and do not add the --disable-cloud option.
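Before reinstalling, the check of the enabled setting can be scripted. The sketch below assumes the setting is written as enabled = VALUE on its own line, and treats both true (as in the example above) and yes (the default listed in the cloud.conf reference) as enabled. cloud_enabled is a hypothetical helper name.

```shell
# cloud_enabled [CONF]: succeeds when the first "enabled = ..." line in CONF
# (default: /var/lib/netdata/cloud.d/cloud.conf) is set to true or yes.
cloud_enabled() {
  conf="${1:-/var/lib/netdata/cloud.d/cloud.conf}"
  value=""
  if [ -r "$conf" ]; then
    value=$(sed -n 's/^[[:space:]]*enabled[[:space:]]*=[[:space:]]*\([^[:space:]]*\).*$/\1/p' "$conf" | head -n 1)
  fi
  case "$value" in
    true|yes) return 0 ;;
    *) return 1 ;;
  esac
}

# Example (not run here):
#   cloud_enabled || echo "Cloud is disabled or cloud.conf is missing" >&2
```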

cloud-available is false / ACLK Available: No

If cloud-available is false after you verified Cloud is enabled in the previous step, the most likely issue is that Cloud features failed to build during installation.

If Cloud features fail to build, the installer continues and finishes the process without Cloud functionality as opposed to failing the installation altogether. We do this to ensure the Agent will always finish installing.

If you can't see an explicit error in the installer's output, you can run the installer with the --require-cloud option. This option causes the installation to fail if Cloud functionality can't be built and enabled, and the installer's output should give you more error details.

You may see one of the following error messages during installation:

  • Failed to build libmosquitto. The install process will continue, but you will not be able to connect this node to Netdata Cloud.
  • Unable to fetch sources for libmosquitto. The install process will continue, but you will not be able to connect this node to Netdata Cloud.
  • Failed to build libwebsockets. The install process will continue, but you may not be able to connect this node to Netdata Cloud.
  • Unable to fetch sources for libwebsockets. The install process will continue, but you may not be able to connect this node to Netdata Cloud.
  • Could not find cmake, which is required to build libwebsockets. The install process will continue, but you may not be able to connect this node to Netdata Cloud.
  • Could not find cmake, which is required to build JSON-C. The install process will continue, but Netdata Cloud support will be disabled.
  • Failed to build JSON-C. Netdata Cloud support will be disabled.
  • Unable to fetch sources for JSON-C. Netdata Cloud support will be disabled.

One common cause of the installer failing to build Cloud features is not having one of the following dependencies on your system: cmake, json-c and OpenSSL, including corresponding devel packages.

You can also look for error messages in /var/log/netdata/error.log. Try one of the following two commands to search for ACLK-related errors.

less /var/log/netdata/error.log
grep -i ACLK /var/log/netdata/error.log

If the installer's output does not help you enable Cloud features, contact us by creating an issue on GitHub with details about your system and relevant output from error.log.

agent-claimed is false / Claimed: No

You must connect your node.

aclk-available is false / Online: No

If aclk-available is false and all other keys are true, your Agent is having trouble connecting to the Cloud through the ACLK. Please check your system's firewall.

If your Agent needs to use a proxy to access the internet, you must set up a proxy for connecting.

If you are certain firewall and proxy settings are not the issue, you should consult the Agent's error.log at /var/log/netdata/error.log and contact us by creating an issue on GitHub with details about your system and relevant output from error.log.

Remove and reconnect a node

To remove a node from your Space in Netdata Cloud, delete the cloud.d/ directory in your Netdata library directory.

cd /var/lib/netdata   # Replace with your Netdata library directory, if not /var/lib/netdata/
sudo rm -rf cloud.d/

This node no longer has access to the credentials it used when connecting to Netdata Cloud via the ACLK. You will still be able to see this node in your War Rooms in an unreachable state.

If you want to reconnect this node, you need to create a new identity by adding -id=$(uuidgen) to the claiming script parameters (not yet supported on the kickstart script). Make sure the uuidgen command is available; on Debian and Ubuntu it is provided by the uuid-runtime package. For example:

Claiming script

sudo netdata-claim.sh -token=TOKEN -rooms=ROOM1,ROOM2 -url=https://app.netdata.cloud -id=$(uuidgen)

The agent must be restarted after this change.
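If uuidgen is not installed, a new identifier can still be generated on Linux. The following sketch uses a hypothetical helper, new_agent_id, which prefers uuidgen and otherwise falls back to the kernel's /proc/sys/kernel/random/uuid interface. The fallback is an addition for illustration, not part of the documented procedure.

```shell
# new_agent_id: print a freshly generated UUID, preferring uuidgen and
# falling back to the Linux kernel's random UUID interface.
new_agent_id() {
  if command -v uuidgen >/dev/null 2>&1; then
    uuidgen
  elif [ -r /proc/sys/kernel/random/uuid ]; then
    cat /proc/sys/kernel/random/uuid
  else
    echo "no UUID source found; install uuidgen" >&2
    return 1
  fi
}

# Example (not run here):
#   sudo netdata-claim.sh -token=TOKEN -rooms=ROOM1,ROOM2 -url=https://app.netdata.cloud -id="$(new_agent_id)"
```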

Connecting reference

In the sections below, you can find reference material for the kickstart script, claiming script, connecting via the Agent's command line tool, and details about the files found in cloud.d.

The cloud.conf file

This section defines how and whether your Agent connects to Netdata Cloud using the ACLK.

| setting | default | info |
|---|---|---|
| cloud base url | https://app.netdata.cloud | The URL for the Netdata Cloud web application. You should not change this. If you want to disable Cloud, change the enabled setting. |
| enabled | yes | The runtime option to disable the Agent-Cloud link and prevent your Agent from connecting to Netdata Cloud. |

kickstart script

The best way to install Netdata and connect your nodes to Netdata Cloud is with our automatic one-line installation script, kickstart. This script will install the Netdata Agent, in case it isn't already installed, and connect your node to Netdata Cloud.

This works with Linux and macOS.

For details on how to run this script please check How to connect a node and choose your environment.

If the Netdata Agent is already installed and you run this script to connect a node to Netdata Cloud, it will not upgrade your Agent automatically. If you also want to upgrade the Agent installation, you'll need to run the script again without the connection options.

We suggest first running kickstart with the command below to upgrade your Agent, and then following the steps in How to connect a node.

Linux

wget -O /tmp/netdata-kickstart.sh https://my-netdata.io/kickstart.sh && sh /tmp/netdata-kickstart.sh

macOS

curl https://my-netdata.io/kickstart.sh > /tmp/netdata-kickstart.sh && sh /tmp/netdata-kickstart.sh --install /usr/local/

Claiming script

A Space's administrator can also connect an Agent by directly calling the netdata-claim.sh script either with root privileges using sudo, or as the user running the Agent (typically netdata), and passing the following arguments:

-token=TOKEN
    where TOKEN is the Space's claiming token.
-rooms=ROOM1,ROOM2,...
    where ROOMX is the War Room this node should be added to. This list is optional.
-url=URL_BASE
    where URL_BASE is the Netdata Cloud endpoint base URL. By default, this is https://app.netdata.cloud.
-id=AGENT_ID
    where AGENT_ID is the unique identifier of the Agent. This is the Agent's MACHINE_GUID by default.
-hostname=HOSTNAME
    where HOSTNAME is the result of the hostname command by default.
-proxy=PROXY_URL
    where PROXY_URL is the endpoint of a HTTP or HTTPS proxy.

For example, the following command connects an Agent and adds it to rooms room1 and room2:

netdata-claim.sh -token=MYTOKEN1234567 -rooms=room1,room2

You should then notify the netdata service of the result with netdatacli:

netdatacli reload-claiming-state

This reloads the Agent connection state from disk.
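When the claiming command is assembled by automation, it can help to build it from variables and review it before running it. The sketch below only prints the command and never executes it. build_claim_command is a hypothetical helper, and it covers just the -token, -rooms, and -url arguments described above.

```shell
# build_claim_command TOKEN [ROOMS] [URL]: print a netdata-claim.sh command
# line. ROOMS is optional; URL defaults to https://app.netdata.cloud.
# Nothing is executed; the command is only printed for review.
build_claim_command() {
  token="$1"
  rooms="${2:-}"
  url="${3:-https://app.netdata.cloud}"
  cmd="netdata-claim.sh -token=$token"
  if [ -n "$rooms" ]; then
    cmd="$cmd -rooms=$rooms"
  fi
  printf '%s -url=%s\n' "$cmd" "$url"
}

# Example (not run here):
#   build_claim_command MYTOKEN1234567 room1,room2
```

Printing first keeps the claiming token visible in one place, so you can check it before pasting the command into a root shell.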

Our recommendation is to trigger the connection process using the kickstart whenever possible.

Netdata Agent command line

If a Netdata Agent is running, the Space's administrator can connect a node using the netdata service binary with additional command line parameters:

-W "claim -token=TOKEN -rooms=ROOM1,ROOM2"

For example:

/usr/sbin/netdata -D -W "claim -token=MYTOKEN1234567 -rooms=room1,room2"

If need be, you can override the Agent's defaults by providing additional arguments like those described in the Claiming script section.

Connection directory

Netdata stores the Agent's connection-related state in the Netdata library directory under cloud.d. For a default installation, this directory exists at /var/lib/netdata/cloud.d. The directory and its files should be owned by the user that runs the Agent, which is typically the netdata user.

The cloud.d/token file should contain the claiming-token and the cloud.d/rooms file should contain the list of War Rooms you added that node to.

The user can also put the Cloud endpoint's full certificate chain in cloud.d/cloud_fullchain.pem so that the Agent can trust the endpoint if necessary.
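To inspect the connection directory quickly, something like the following could be used. It is a sketch only: cloud_dir_report is a hypothetical helper that prints the numeric owner of the directory and whether the token and rooms files described above exist. It does not validate their contents.

```shell
# cloud_dir_report [DIR]: print the numeric owner UID of DIR (default:
# /var/lib/netdata/cloud.d) and whether the token and rooms files exist.
cloud_dir_report() {
  dir="${1:-/var/lib/netdata/cloud.d}"
  # Third field of `ls -ldn` is the numeric UID of the owner.
  printf 'owner uid: %s\n' "$(ls -ldn "$dir" | awk '{print $3}')"
  for name in token rooms; do
    if [ -e "$dir/$name" ]; then
      printf '%s: present\n' "$name"
    else
      printf '%s: missing\n' "$name"
    fi
  done
}

# Example (not run here): compare the reported UID with the Agent's user.
#   cloud_dir_report /var/lib/netdata/cloud.d
#   id -u netdata
```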