
Service Discovery

Ilan Rabinovitch edited this page Feb 13, 2016 · 14 revisions

Service Discovery (beta)

Overview

Starting with the 5.7 release, the Datadog Agent includes support for service discovery. This feature works for Docker containers and can run on platforms such as Kubernetes, Docker Swarm, and Amazon ECS.

Service Discovery makes it possible to define configuration templates for a specific image in a configuration store. The agent will then use these configuration templates combined with container metadata to enable, disable and reconfigure checks dynamically in response to container start/stop events.

How does it work?

The service discovery module listens to the Docker Events API, looking for events related to container creation, deletion, start or stop. When such events are found, the agent tries to identify which services are running in the new containers and loads the corresponding configuration objects, if available.
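As a rough illustration of the filtering described above, here is a minimal sketch in Python. It works on plain dictionaries shaped like Docker event payloads (a `status` and a container `id`) rather than on a live Docker connection. The exact set of statuses the agent watches is an assumption, and `WATCHED_STATUSES` and `containers_to_refresh` are hypothetical names, not part of the agent.

```python
# Statuses assumed to signal that the set of running containers changed.
# This list is an illustration, not the agent's actual list.
WATCHED_STATUSES = {"create", "destroy", "start", "stop", "die"}


def containers_to_refresh(events):
    """Return the IDs of containers whose state changed, in first-seen order."""
    seen = []
    for event in events:
        container_id = event.get("id")
        if container_id is None:
            continue  # not a container event we can act on
        if event.get("status") in WATCHED_STATUSES and container_id not in seen:
            seen.append(container_id)
    return seen


sample_events = [
    {"status": "create", "id": "abc123", "from": "custom-nginx"},
    {"status": "start", "id": "abc123", "from": "custom-nginx"},
    {"status": "pull", "id": "redis:latest"},
    {"status": "die", "id": "def456", "from": "redis"},
]
print(containers_to_refresh(sample_events))  # ['abc123', 'def456']
```

The `pull` event is ignored because it does not change which containers are running, and the two events for `abc123` collapse into a single refresh.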

The configuration reload is done in a few steps:

  • First, the agent looks in the configuration store for a user-supplied configuration template.
  • If no template is found in the configuration store, the agent tries to match the container image against a list of auto-configurable checks (mostly the simplest ones).
  • If a configuration template is found, the service discovery module replaces its template variables with data pulled from the Docker API (host IP and port for now), and aggregates a list of instances covering every instance of the service that this particular agent can access.
  • If no match is found, the service discovery process ends here and the container is left unmonitored. Manually provided YAML configuration files still apply, of course.
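The lookup order in the steps above can be sketched as follows. This is only an illustration under stated assumptions: the two dictionaries stand in for the configuration store and the agent's built-in auto-configuration list, `resolve_template` is a hypothetical name, and the images listed are examples rather than the agent's real list.

```python
def resolve_template(image, store_templates, auto_conf_templates):
    """Return (source, template) for an image, or (None, None) if nothing matches."""
    # 1. A user-supplied template in the configuration store wins.
    if image in store_templates:
        return "store", store_templates[image]
    # 2. Otherwise fall back to the auto-configurable checks.
    if image in auto_conf_templates:
        return "auto", auto_conf_templates[image]
    # 3. No match: the container is left unmonitored by service discovery.
    return None, None


# Hypothetical contents, for illustration only.
store_templates = {"custom-nginx": {"check_name": "nginx"}}
auto_conf_templates = {"redis": {"check_name": "redisdb"}}

for image in ("custom-nginx", "redis", "busybox"):
    print(image, resolve_template(image, store_templates, auto_conf_templates))
```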

Dependencies

To run this feature, the only required component on top of the datadog-agent is a Key/Value store where the configuration templates are defined.

Both etcd and Consul are supported for this; support for ZooKeeper is coming soon.

How do I configure it?

The first thing to do is to populate the configuration store. The structure should look like this:

/datadog/
  check_configs/
    docker_image_0/
      - check_name: check_name_0
      - init_config: {init_config}
      - instance: {instance_config}
    docker_image_1/
      - check_name: check_name_1
      - init_config: {init_config}
      - instance: {instance_config}
    docker_image_2/
      - check_name: check_name_2
      - init_config: {init_config}
      - instance: {instance_config}
    ...
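To make the layout concrete, here is a small Python sketch that builds the three keys for one image. It only computes paths and values; writing them to etcd or Consul is left out. `template_keys` is a hypothetical helper, the instance content is made up for illustration, and storing `init_config` and `instance` as JSON strings is an assumption based on the layout above.

```python
import json


def template_keys(image, check_name, init_config, instance,
                  template_dir="/datadog/check_configs"):
    """Return a {key path: value} mapping for one image's configuration template."""
    base = "%s/%s" % (template_dir.rstrip("/"), image)
    return {
        base + "/check_name": check_name,
        # init_config and instance are serialized as JSON strings.
        base + "/init_config": json.dumps(init_config),
        base + "/instance": json.dumps(instance),
    }


# Hypothetical instance configuration using the placeholder names from the layout.
keys = template_keys(
    "docker_image_0",
    "check_name_0",
    init_config={},
    instance={"host": "%%host%%", "port": "%%port%%"},
)
for path in sorted(keys):
    print(path, "=", keys[path])
```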

Let's take the example of monitoring nginx with Datadog. The default nginx image doesn't have the nginx_status endpoint enabled, so we build a new image named custom-nginx that configures this endpoint.

Now, if several nginx instances are running in the environment, or if you are using a platform like Kubernetes, there is no elegant way to point the right agent at each nginx instance, because the host on which an nginx container will run is not known in advance.

Enter service discovery. Now the only requirement is to set up a configuration template in the form of a few keys in a key/value store the agent can reach. Here is an example using etcd:

./etcdctl mkdir /datadog/check_configs/custom-nginx
./etcdctl set /datadog/check_configs/custom-nginx/check_name 'nginx'
./etcdctl set /datadog/check_configs/custom-nginx/init_config '{}'
./etcdctl set /datadog/check_configs/custom-nginx/instance '{"nginx_status_url": "http://%%host%%/nginx_status/", "tags": ["env:production"]}'

or with curl:

curl -L -X PUT http://127.0.0.1:4001/v2/keys/datadog/check_configs/custom-nginx/check_name -d value="nginx"
curl -L -X PUT http://127.0.0.1:4001/v2/keys/datadog/check_configs/custom-nginx/init_config -d value="{}"
curl -L -X PUT http://127.0.0.1:4001/v2/keys/datadog/check_configs/custom-nginx/instance -d value='{"nginx_status_url": "http://%25%25host%25%25/nginx_status/", "tags": ["env:production"]}'

Or if the agent is configured to use consul instead:

curl -L http://127.0.0.1:8500/v1/kv/datadog/check_configs/custom-nginx/check_name -XPUT -d 'nginx'
curl -L http://127.0.0.1:8500/v1/kv/datadog/check_configs/custom-nginx/init_config -XPUT -d '{}'
curl -L http://127.0.0.1:8500/v1/kv/datadog/check_configs/custom-nginx/instance -XPUT -d '{"nginx_status_url": "http://%%host%%/nginx_status/", "tags": ["env:production"]}'
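In the etcd curl commands above, %%host%% is written as %25%25host%25%25 because the value travels as form-encoded data, where a literal percent sign must be escaped as %25. The short Python sketch below shows that round trip with the standard library. It only illustrates the encoding and does not contact etcd.

```python
from urllib.parse import parse_qs, urlencode

template = '{"nginx_status_url": "http://%%host%%/nginx_status/", "tags": ["env:production"]}'

# Form-encode the value the way an HTTP client would before sending it.
body = urlencode({"value": template})
print("%25%25host%25%25" in body)  # True

# Decoding the form body gives back the original template, %% markers included.
decoded = parse_qs(body)["value"][0]
print(decoded == template)  # True
```

`urlencode` escapes more characters than the hand-written curl commands do (quotes and braces, for example), but the decoded value is the same. With curl, the `--data-urlencode` option performs this escaping for you.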

Notice the format of template variables: %%host%%. For now host and port are supported on every platform. Kubernetes users can also use the tags variable that collects relevant tags like the pod name and node name from the Kubernetes API. Support of more variables and platforms is planned, and feature requests are welcome.
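To show what interpolating a template variable amounts to, here is a minimal sketch of the substitution. It is not the agent's implementation: `render` is a hypothetical function, the IP address is made up, and raising an error on an unknown variable is a design choice for this example only.

```python
import re

# Matches %%name%% and captures the variable name.
TEMPLATE_VAR = re.compile(r"%%(\w+)%%")


def render(template, values):
    """Replace every %%name%% in template with values[name]."""
    def substitute(match):
        name = match.group(1)
        if name not in values:
            raise KeyError("no value for template variable %r" % name)
        return str(values[name])
    return TEMPLATE_VAR.sub(substitute, template)


instance = '{"nginx_status_url": "http://%%host%%/nginx_status/", "tags": ["env:production"]}'
rendered = render(instance, {"host": "172.17.0.21", "port": 80})
print(rendered)
# {"nginx_status_url": "http://172.17.0.21/nginx_status/", "tags": ["env:production"]}
```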

Then configure all the agents of the environment to run the service discovery using this store as the backend. To do so, simply edit your datadog.conf file to modify these options:

# For now only docker is supported so you can leave this line as-is.
# service_discovery_backend: docker
#
# Define which key/value store must be used to look for configuration templates.
# Default is etcd. Consul is also supported.
# sd_config_backend: etcd

# Settings for connecting to the backend. These are the defaults; edit them if your setup differs.
# sd_backend_host: 127.0.0.1
# sd_backend_port: 4001
#
# By default, the agent will look for the configuration templates under the
# `/datadog/check_configs` key in the back-end.
# If you wish otherwise, uncomment this option and modify its value.
# sd_template_dir: /datadog/check_configs

Now each agent will be able to detect an nginx instance running on its host and set up a check for it automatically. There is no need to restart the agent every time the container starts or stops, and no other configuration file to modify.

Running and configuring the Agent in a container

The above settings can be passed to the dd-agent container through the following environment variables:

SD_BACKEND <-> service_discovery_backend
SD_CONFIG_BACKEND <-> sd_config_backend
SD_BACKEND_HOST <-> sd_backend_host
SD_BACKEND_PORT <-> sd_backend_port
SD_TEMPLATE_DIR <-> sd_template_dir
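The mapping above can be read as a simple lookup table. The Python sketch below translates whichever of these environment variables are set into their datadog.conf option names. It illustrates the mapping only and is not how the container's entrypoint script is written; `options_from_env` is a hypothetical name.

```python
import os

# Environment variable -> datadog.conf option, as listed above.
ENV_TO_OPTION = {
    "SD_BACKEND": "service_discovery_backend",
    "SD_CONFIG_BACKEND": "sd_config_backend",
    "SD_BACKEND_HOST": "sd_backend_host",
    "SD_BACKEND_PORT": "sd_backend_port",
    "SD_TEMPLATE_DIR": "sd_template_dir",
}


def options_from_env(environ=None):
    """Return the datadog.conf options implied by the environment variables that are set."""
    environ = os.environ if environ is None else environ
    return {
        option: environ[var]
        for var, option in ENV_TO_OPTION.items()
        if var in environ
    }


example_env = {"SD_BACKEND": "docker", "SD_CONFIG_BACKEND": "etcd", "SD_BACKEND_PORT": "4001"}
print(options_from_env(example_env))
```

Variables that are not set are simply omitted, so the defaults from datadog.conf continue to apply.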

Available tags:

datadog/docker-dd-agent:sd-beta (has the Docker check preconfigured)
datadog/docker-dd-agent:sd-kubernetes-beta (has the Docker and Kubernetes checks preconfigured)

Example:

docker run -d --name dd-agent -h `hostname` -v /var/run/docker.sock:/var/run/docker.sock -v /proc/:/host/proc/:ro -v /sys/fs/cgroup/:/host/sys/fs/cgroup:ro -e API_KEY=[YOUR_API_KEY] -e SD_CONFIG_BACKEND=etcd -e SD_BACKEND=docker -e SD_BACKEND_HOST=localhost -e SD_BACKEND_PORT=4001 datadog/docker-dd-agent:sd-kubernetes-beta

Monitoring your custom container

Service discovery works with any image. One important note, though: for the %%port%% variable to be interpolated, the current version needs the container to expose the targeted port. See the nginx Dockerfile for reference.

Kubernetes users

Service discovery is particularly useful for container platforms like Kubernetes where, by default, the user doesn't choose the node on which a container will be scheduled. With service discovery you can simply deploy the agent container with a DaemonSet and declare your configuration templates for all the containers you plan to launch in the same cluster. To deploy the agent, follow the instructions from the install page for Kubernetes.

Additionally, installing an etcd cluster on Kubernetes can be done fairly easily. The most important part is to set up a service that is accessible from the Datadog agent. Instructions to install a simple, 3-node cluster can be found in the etcd repository.

Once the cluster is running, simply use the K/V store service IP address and port as sd_backend_host and sd_backend_port in datadog.conf (passing the corresponding environment variables to the container makes this easier, see entrypoint.sh).

Then write your configuration templates, and let the agent detect your running pods and take care of re-configuring checks.

What is coming next?

This feature is still under active development. Here are the next expected improvements:

  • making the config reload smarter by only reloading concerned checks (for which the config template was changed or a container was started/stopped)
  • support for other Key/Value stores
  • adding auto tagging support for more platforms