
Service Discovery (beta)

Overview

Starting with the 5.8 release, the Datadog Agent includes support for service discovery. This feature works for Docker containers and can run on platforms such as Kubernetes, Docker Swarm, and Amazon ECS.

Service discovery makes it possible to define configuration templates for a specific image in a configuration store. The Agent will then use these configuration templates combined with container metadata to enable, disable and reconfigure checks dynamically in response to container start/stop events.

How does it work?

The service discovery module listens to the Docker Events API, watching for container creation, deletion, start, and stop events. When such an event occurs, the Agent tries to identify which services are running in the affected containers and loads the corresponding configuration objects if available.
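These are the same events you can watch from the Docker CLI; for instance, the following command (run on the Docker host) streams the kind of start and stop events the module reacts to:

docker events --filter 'event=start' --filter 'event=stop'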

The configuration happens in a few steps:

  • First, the Agent looks in the configuration store for a user-supplied configuration template.
  • If no template is found in the configuration store, the Agent tries to match the container image against a list of auto-configurable checks (mostly the simplest ones).
  • If a configuration template is found, the service discovery module replaces the template variables with data pulled from the Docker API (host IP address, port, and tags for now; see the example after this list), and aggregates a list of instances covering every container of that image this particular Agent has access to.
  • If no match is found, the service discovery process ends there and the container is left unmonitored. Manually provided YAML configuration files still apply, of course.
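To illustrate the kind of data being pulled, the host IP address that replaces the %%host%% variable is the one the Docker API reports for the container. You can read it yourself with docker inspect (the container name here is hypothetical):

docker inspect --format '{{ .NetworkSettings.IPAddress }}' my-nginx-container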

Dependencies

To run this feature, the only required component on top of the Datadog Agent is a key/value store where the configuration templates are defined.

Both etcd and consul are supported for this.

How do I configure it?

The first thing to do is to populate the configuration store. The structure of the configuration should look like this:

/datadog/
  check_configs/
    docker_image_0/
      - check_name: check_name_0
      - init_config: {init_config}
      - instance: {instance_config}
    docker_image_1/
      - check_name: check_name_1
      - init_config: {init_config}
      - instance: {instance_config}
    docker_image_2/
      - check_name: check_name_2
      - init_config: {init_config}
      - instance: {instance_config}
    ...

Let's take the example of monitoring nginx with Datadog. The default NGINX image doesn't have the nginx_status endpoint enabled, so we build a new image named custom-nginx that configures this endpoint.

If several NGINX instances are running in the environment, or if you are using a platform like Kubernetes, there is no elegant way to configure the right Agents to monitor each NGINX instance: the host where an NGINX container will run is not known in advance.

Enter service discovery. Now the only requirement is to set up a configuration template in the form of a few keys in a key/value store the Agent can reach. Here is an example using etcd:

./etcdctl mkdir /datadog/check_configs/custom-nginx
./etcdctl set /datadog/check_configs/custom-nginx/check_name 'nginx'
./etcdctl set /datadog/check_configs/custom-nginx/init_config '{}'
./etcdctl set /datadog/check_configs/custom-nginx/instance '{"nginx_status_url": "http://%%host%%/nginx_status/", "tags": ["env:production"]}'
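You can then verify what was stored, still with etcdctl (assuming the etcd v2 command set used above):

./etcdctl ls --recursive /datadog/check_configs/custom-nginx
./etcdctl get /datadog/check_configs/custom-nginx/instance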

or with curl:

curl -L -X PUT http://127.0.0.1:4001/v2/keys/datadog/check_configs/custom-nginx/check_name -d value="nginx"
curl -L -X PUT http://127.0.0.1:4001/v2/keys/datadog/check_configs/custom-nginx/init_config -d value="{}"
curl -L -X PUT http://127.0.0.1:4001/v2/keys/datadog/check_configs/custom-nginx/instance -d value='{"nginx_status_url": "http://%25%25host%25%25/nginx_status/", "tags": ["env:production"]}'

If the Agent is configured to use consul instead (note that consul stores the PUT request body verbatim, so the value is passed raw, without the form encoding and URL escaping that etcd's v2 API requires):

curl -L -X PUT http://127.0.0.1:8500/v1/kv/datadog/check_configs/custom-nginx/check_name -d 'nginx'
curl -L -X PUT http://127.0.0.1:8500/v1/kv/datadog/check_configs/custom-nginx/init_config -d '{}'
curl -L -X PUT http://127.0.0.1:8500/v1/kv/datadog/check_configs/custom-nginx/instance -d '{"nginx_status_url": "http://%%host%%/nginx_status/", "tags": ["env:production"]}'
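To double-check what the Agent will read back from consul, the KV endpoint accepts a raw flag that returns the stored value without the base64/JSON wrapping:

curl -L http://127.0.0.1:8500/v1/kv/datadog/check_configs/custom-nginx/instance?raw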

Notice the format of template variables: %%host%%. For now host and port are supported on every platform. Kubernetes users can also use the tags variable that collects relevant tags like the pod name and node name from the Kubernetes API. Support for more variables and platforms is planned, and feature requests are welcome.

Then you need to configure all the Agents in the environment to enable service discovery using this store as a backend. To do so, simply edit the datadog.conf file to modify these options:

# For now only docker is supported so you just need to un-comment this line.
# service_discovery_backend: docker
#
# Define which key/value store must be used to look for configuration templates.
# Default is etcd. Consul is also supported.
# sd_config_backend: etcd

# Settings for connecting to the backend. These are the default, edit them if you run a different config.
# sd_backend_host: 127.0.0.1
# sd_backend_port: 4001

# By default, the agent will look for the configuration templates under the
# `/datadog/check_configs` key in the back-end.
# If you wish otherwise, uncomment this option and modify its value.
# sd_template_dir: /datadog/check_configs
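For instance, a minimal set of uncommented options for an Agent using a local consul agent as the backend (8500 being consul's default HTTP port) could look like this:

service_discovery_backend: docker
sd_config_backend: consul
sd_backend_host: 127.0.0.1
sd_backend_port: 8500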

Now every Agent will be able to detect NGINX instances running on its host and set up a check for them automatically. There is no need to restart the Agent when a container starts or stops, and no other configuration file to modify.

Template variables

To automate the resolution of parameters like the host IP address or its port, the Agent uses template variables in this format: %%variable%%.

When a variable can resolve to several values and selecting a specific one is necessary, suffix the variable name with an index: %%variable_index%%. If no index is provided, the values are sorted in ascending order and the last one is used.

Take the port variable as an example: a RabbitMQ container with the management module enabled has 6 exposed ports by default (the Docker image with the management module enabled by default is rabbitmq:3-management). The list of ports as seen by the Agent is [4369, 5671, 5672, 15671, 15672, 25672]. Notice the order: the Agent always sorts values in ascending order.

The default management port for the rabbitmq image is 15672 with index 4 in the list (starting from 0), so the template variable needs to look like %%port_4%%.
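Putting it together, a template for this image could be stored as follows. The key path assumes the image is simply named rabbitmq, and the instance uses the rabbitmq_api_url parameter from the Datadog RabbitMQ check:

./etcdctl set /datadog/check_configs/rabbitmq/check_name 'rabbitmq'
./etcdctl set /datadog/check_configs/rabbitmq/init_config '{}'
./etcdctl set /datadog/check_configs/rabbitmq/instance '{"rabbitmq_api_url": "http://%%host%%:%%port_4%%/api/"}'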

Running and configuring the Agent in a container

The above settings can be passed to the dd-agent container through the following environment variables:

SD_BACKEND <-> service_discovery_backend
SD_CONFIG_BACKEND <-> sd_config_backend
SD_BACKEND_HOST <-> sd_backend_host
SD_BACKEND_PORT <-> sd_backend_port
SD_TEMPLATE_DIR <-> sd_template_dir

Available tags:

datadog/docker-dd-agent:sd-beta (has the Docker check preconfigured)
datadog/docker-dd-agent:sd-kubernetes-beta (has the Docker and Kubernetes checks preconfigured)

Example:

docker run -d --name dd-agent -h `hostname` \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v /proc/:/host/proc/:ro \
  -v /sys/fs/cgroup/:/host/sys/fs/cgroup:ro \
  -e API_KEY=[YOUR_API_KEY] \
  -e SD_CONFIG_BACKEND=etcd \
  -e SD_BACKEND=docker \
  -e SD_BACKEND_HOST=localhost \
  -e SD_BACKEND_PORT=4001 \
  datadog/docker-dd-agent:sd-kubernetes-beta

Monitoring your custom container

Service discovery works with any image. One important note, though: for the %%port%% variable to be interpolated, the current version requires the container to expose the targeted port. See the NGINX Dockerfile for reference.
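To check which ports an image exposes, and therefore which values %%port%% can be interpolated from, you can inspect it (the image name here is hypothetical):

docker inspect --format '{{ .Config.ExposedPorts }}' custom-nginx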

Kubernetes users

Service discovery is particularly useful on container platforms like Kubernetes, where by default the user doesn't choose the node on which a container will be scheduled. With service discovery you can simply deploy the Agent container with a DaemonSet and declare your configuration templates for all the containers you plan to launch in the same cluster. To deploy the Agent, simply follow the instructions from the install page for Kubernetes.

Additionally, installing an etcd cluster on Kubernetes can be done fairly easily. The most important part is to set up a service that is accessible from the Datadog Agent. Instructions to install a simple, 3-node cluster can be found in the etcd repository.
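Once the cluster is up, the service IP address and port to point the Agent at can be retrieved with kubectl. The service name etcd below is an assumption, use whatever name your manifests define:

kubectl get svc etcd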

Once the cluster is running, simply use the K/V store service IP address and port as sd_backend_host and sd_backend_port in datadog.conf (passing the corresponding environment variables to the container makes this easier; see the mapping above, or entrypoint.sh for reference).

Then write your configuration templates, and let the Agent detect your running pods and take care of re-configuring checks.

Examples

Following is an example of how to set up templates for an NGINX/PostgreSQL stack. The example uses etcd as the configuration store and assumes that the etcd cluster is deployed as a service in Kubernetes with the IP address 10.0.65.98.

NGINX

The default NGINX image doesn't have a /nginx_status/ endpoint enabled, so the first step is to enable it, as described in the Datadog NGINX tile (click on "Configuration"), in a new image, which we will name custom-nginx in this example. Once the image is built, the configuration template can be defined this way:

curl -L -X PUT http://10.0.65.98:4001/v2/keys/datadog/check_configs/custom-nginx/check_name -d value="nginx"
curl -L -X PUT http://10.0.65.98:4001/v2/keys/datadog/check_configs/custom-nginx/init_config -d value="{}"
curl -L -X PUT http://10.0.65.98:4001/v2/keys/datadog/check_configs/custom-nginx/instance -d value='{"nginx_status_url": "http://%25%25host%25%25/nginx_status/", "tags": %25%25tags%25%25}'

The %%tags%% variable will add metadata about the replication controller, the pod name, etc.

PostgreSQL

Next comes the PostgreSQL configuration. The steps to connect Postgres to Datadog are, as usual, described in the integration tile. To ease the deployment process, we'll assume these steps are automated in a script executed from a Dockerfile based on the official postgres Docker image, resulting in a new custom-postgres image.

The configuration template is thus defined like this:

curl -L -X PUT http://10.0.65.98:4001/v2/keys/datadog/check_configs/custom-postgres/check_name -d value="postgres"
curl -L -X PUT http://10.0.65.98:4001/v2/keys/datadog/check_configs/custom-postgres/init_config -d value="{}"
curl -L -X PUT http://10.0.65.98:4001/v2/keys/datadog/check_configs/custom-postgres/instance -d value='{"host": "%25%25host%25%25", "port": "%25%25port%25%25", "tags": %25%25tags%25%25}'

The postgres image only exposes the default port, so appending an index to the port variable is unnecessary.

Now the Agent can be deployed following the Kubernetes instructions, passing the right environment variables to enable service discovery as covered earlier. Whenever a Postgres or NGINX container is deployed, the Agents will detect it and update the check configurations accordingly.

What's next?

This feature is still under active development. Here are the next expected improvements:

  • making the config reload smarter by only reloading the affected checks (those whose config template changed or whose containers started or stopped)
  • support for other key/value stores
  • auto tagging support for more platforms