Service Discovery

Service Discovery (beta)

Overview

Starting with the 5.8 release, the Datadog Agent ships with service discovery. This feature works for Docker containers and can run on platforms such as Kubernetes, Docker Swarm, and Amazon ECS.

Service discovery lets you define configuration templates for a Docker image in a configuration store and apply them dynamically to new containers on the fly. The Agent then combines these configuration templates with container metadata to enable, disable, and reconfigure checks in response to container start/stop events.

HOWTO: migrate from the previous configuration template format

If you were using the beta version prior to 7 April 2016, you need to migrate to the new template format before pulling the new tag. The newest versions of the feature are available under the service-discovery and service-discovery-k8s tags.

As a reminder, here is what the previous template looked like:

/datadog/
  check_configs/
    docker_image_0/
      - check_name: "check_name_0"
      - init_config: {init_config}
      - instance: {instance_config}
    docker_image_1/
      - check_name: "check_name_1"
      - init_config: {init_config}
      - instance: {instance_config}
    docker_image_2/
      - check_name: "check_name_2"
      - init_config: {init_config}
      - instance: {instance_config}
    ...

And this is what the new one looks like:

/datadog/
  check_configs/
    docker_image_0/
      - check_names: ["check_name_0"]
      - init_configs: [{init_config}]
      - instances: [{instance_config}]
    docker_image_1/
      - check_names: ["check_name_1"]
      - init_configs: [{init_config}]
      - instances: [{instance_config}]
    docker_image_2/
      - check_names: ["check_name_2"]
      - init_configs: [{init_config}]
      - instances: [{instance_config}]
    ...

This change allows defining several checks for each image; see "Configuring multiple checks for the same image" below.

For example, if you used these commands to create an nginx template:

curl -L -X PUT http://127.0.0.1:4001/v2/keys/datadog/check_configs/custom-nginx/check_name -d value="nginx"
curl -L -X PUT http://127.0.0.1:4001/v2/keys/datadog/check_configs/custom-nginx/init_config -d value="{}"
curl -L -X PUT http://127.0.0.1:4001/v2/keys/datadog/check_configs/custom-nginx/instance -d value='{"nginx_status_url": "http://%25%25host%25%25/nginx_status/", "tags": ["env:production"]}'

You now need to use these instead:

curl -L -X PUT http://127.0.0.1:4001/v2/keys/datadog/check_configs/custom-nginx/check_names -d value='["nginx"]'
curl -L -X PUT http://127.0.0.1:4001/v2/keys/datadog/check_configs/custom-nginx/init_configs -d value="[{}]"
curl -L -X PUT http://127.0.0.1:4001/v2/keys/datadog/check_configs/custom-nginx/instances -d value='[{"nginx_status_url": "http://%25%25host%25%25/nginx_status/", "tags": ["env:production"]}]'

To perform the upgrade smoothly, we recommend first creating the keys/values for your new templates, then deleting the Agent container and creating the new one, and finally removing the old template keys once the upgrade is over, as sketched below.
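A minimal sketch of that sequence with etcd, reusing the key names from the example above (adapt the image name, etcd address, and Agent run options to your setup):

# 1. Create the new-format keys alongside the old ones
#    (the three check_names/init_configs/instances curl commands shown above)

# 2. Replace the Agent container with one running the new tag
docker rm -f dd-agent
docker run -d --name dd-agent -h `hostname` \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -e API_KEY=[YOUR_API_KEY] -e SD_BACKEND=docker -e SD_CONFIG_BACKEND=etcd \
  -e SD_BACKEND_HOST=localhost -e SD_BACKEND_PORT=4001 \
  datadog/docker-dd-agent:service-discovery

# 3. Once the new Agent is reporting, delete the old-format keys
curl -L -X DELETE http://127.0.0.1:4001/v2/keys/datadog/check_configs/custom-nginx/check_name
curl -L -X DELETE http://127.0.0.1:4001/v2/keys/datadog/check_configs/custom-nginx/init_config
curl -L -X DELETE http://127.0.0.1:4001/v2/keys/datadog/check_configs/custom-nginx/instance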

How does it work?

The service discovery module listens to the Docker Events API, watching for container creation, deletion, start, and stop events. When such an event occurs, the Agent tries to identify which services are running in the affected containers and loads the corresponding configuration templates if any are available.

The configuration happens in a few steps:

  • First, the Agent looks into the configuration store for a user-supplied configuration template
  • If no template was found in the user-supplied configuration store, the Agent will try to match the container image with a list of auto-configurable checks (the simplest ones, mostly)
  • If a configuration template is found, the service discovery module replaces template variables with data pulled from the Docker API (host IP address, port, and tags for now), and builds a list of instances with one entry for every container of that Docker image this particular Agent has access to (see the sketch after this list).
  • If no match is found, the service discovery process ends here and the container is left unmonitored. Of course manually-provided YAML configuration files still apply.
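
For illustration, here is roughly what that resolution produces for the nginx template used throughout this page, assuming a custom-nginx container whose IP address resolves to 172.17.0.3 (an address made up for this example):

# instance template stored in the configuration store
[{"nginx_status_url": "http://%%host%%/nginx_status/", "tags": ["env:production"]}]

# instance generated by the Agent for that container
[{"nginx_status_url": "http://172.17.0.3/nginx_status/", "tags": ["env:production"]}]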

Dependencies

To enable this feature, the only required components on top of the Datadog Agent are Docker (service discovery only works for containers at the moment) and a key/value store where the configuration templates are defined.

Both etcd and consul are supported for this.

Note that enabling service discovery without setting up a configuration store still partially works: the Agent will try to apply the simple configuration templates it ships with to containers it recognizes. This is the auto-configuration mode.

How do I configure it?

The first thing to do is to populate the configuration store. The structure of the configuration should look like this:

/datadog/
  check_configs/
    docker_image_0/
      - check_names: ["check_name_0"]
      - init_configs: [{init_config}]
      - instances: [{instance_config}]
    docker_image_1/
      - check_names: ["check_name_1"]
      - init_configs: [{init_config}]
      - instances: [{instance_config}]
    docker_image_2/
      - check_names: ["check_name_2"]
      - init_configs: [{init_config}]
      - instances: [{instance_config}]
    ...

Example

Let's take the example of monitoring nginx with Datadog. The default NGINX image doesn't have the nginx_status endpoint enabled, so we build a new image named custom-nginx that configures this endpoint.

Now if several NGINX instances are running in the environment, or if you are using a platform like Kubernetes, there is no satisfying way to configure the right Agents to monitor each NGINX instance, because the host where an NGINX container will run is not known in advance.

Enter service discovery. Now the only requirement is to set up a configuration template in the form of a few keys in a key/value store the Agent can reach. Here is an example using etcd:

./etcdctl mkdir /datadog/check_configs/custom-nginx
./etcdctl set /datadog/check_configs/custom-nginx/check_names '["nginx"]'
./etcdctl set /datadog/check_configs/custom-nginx/init_configs '[{}]'
./etcdctl set /datadog/check_configs/custom-nginx/instances '[{"nginx_status_url": "http://%%host%%/nginx_status/", "tags": ["env:production"]}]'

or with curl:

curl -L -X PUT http://127.0.0.1:4001/v2/keys/datadog/check_configs/custom-nginx/check_names -d value='["nginx"]'
curl -L -X PUT http://127.0.0.1:4001/v2/keys/datadog/check_configs/custom-nginx/init_configs -d value="[{}]"
curl -L -X PUT http://127.0.0.1:4001/v2/keys/datadog/check_configs/custom-nginx/instances -d value='[{"nginx_status_url": "http://%25%25host%25%25/nginx_status/", "tags": ["env:production"]}]'

If the Agent is configured to use consul instead:

curl -L http://127.0.0.1:8500/v1/kv/datadog/check_configs/custom-nginx/check_names -XPUT -d '["nginx"]'
curl -L http://127.0.0.1:8500/v1/kv/datadog/check_configs/custom-nginx/init_configs -XPUT -d '[{}]'
curl -L http://127.0.0.1:8500/v1/kv/datadog/check_configs/custom-nginx/instances -XPUT -d '[{"nginx_status_url": "http://%%host%%/nginx_status/", "tags": ["env:production"]}]'

Notice the format of template variables: %%host%%. For now host and port are supported on every platform. Kubernetes users can also use the tags variable that collects relevant tags like the pod name and node name from the Kubernetes API. Support for more variables and platforms is planned, and feature requests are welcome.

Finally, you need to configure every Agent in the environment to enable service discovery using this store as a backend. To do so, edit the following options in the datadog.conf file:

# For now only docker is supported so you just need to un-comment this line.
# service_discovery_backend: docker
#
# Define which key/value store must be used to look for configuration templates.
# Default is etcd. Consul is also supported.
# sd_config_backend: etcd

# Settings for connecting to the backend. These are the default, edit them if you run a different config.
# sd_backend_host: 127.0.0.1
# sd_backend_port: 4001

# By default, the agent will look for the configuration templates under the
# `/datadog/check_configs` key in the back-end.
# If you wish otherwise, uncomment this option and modify its value.
# sd_template_dir: /datadog/check_configs

Now every Agent will be able to detect an nginx instance running on its host and setup a check for it automatically. No need to restart the Agent every time the container starts or stops, and no other configuration file to modify.
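
For reference, here is roughly what those options look like once uncommented for a Consul backend (a sketch; 8500 is Consul's default HTTP port, adjust the host and port to your deployment):

service_discovery_backend: docker
sd_config_backend: consul
sd_backend_host: 127.0.0.1
sd_backend_port: 8500
sd_template_dir: /datadog/check_configs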

Template variables

To automate the resolution of parameters like the host IP address or its port, the agent uses template variables in this format: %%variable%%.

The variable name can be suffixed with an index when the variable resolves to a list of values and a specific one must be selected: %%variable_index%%. If no index is provided, the last value of the list, sorted in ascending order, is used.

Let's take the example of the port variable: a RabbitMQ container with the management module enabled (the rabbitmq:3-management image enables it by default) exposes 6 ports by default. The list of ports as seen by the Agent is [4369, 5671, 5672, 15671, 15672, 25672]. Notice the order: the Agent always sorts values in ascending order.

The default management port for the rabbitmq image is 15672 with index 4 in the list (starting from 0), so the template variable needs to look like %%port_4%%.
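
A template using this variable could be created like the etcdctl example above (a sketch, assuming the template key matches the rabbitmq image name and that the management API uses the default guest/guest credentials; rabbitmq_api_url, rabbitmq_user, and rabbitmq_pass are parameters of the Datadog rabbitmq check):

./etcdctl mkdir /datadog/check_configs/rabbitmq
./etcdctl set /datadog/check_configs/rabbitmq/check_names '["rabbitmq"]'
./etcdctl set /datadog/check_configs/rabbitmq/init_configs '[{}]'
./etcdctl set /datadog/check_configs/rabbitmq/instances '[{"rabbitmq_api_url": "http://%%host%%:%%port_4%%/api/", "rabbitmq_user": "guest", "rabbitmq_pass": "guest"}]'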

Configuring multiple checks for the same image

Sometimes enabling several checks on a single container is needed. For instance if you run a Java service that provides an HTTP API, using the HTTP check and the JMX integration at the same time makes perfect sense. To declare that in templates, simply add elements to the check_names, init_configs and instances lists. These elements will be matched together based on their index in their respective lists.

Example

In the previous example of the custom nginx image, adding http_check would look like this:

curl -L -X PUT \
  http://127.0.0.1:4001/v2/keys/datadog/check_configs/custom-nginx/check_names \
  -d value='["nginx", "http_check"]'
curl -L -X PUT \
  http://127.0.0.1:4001/v2/keys/datadog/check_configs/custom-nginx/init_configs \
  -d value="[{}, {}]"
curl -L -X PUT \
    http://127.0.0.1:4001/v2/keys/datadog/check_configs/custom-nginx/instances \
    -d value='[{"nginx_status_url": "http://%25%25host%25%25/nginx_status/", "tags": ["env:production"]}, {"name": "Test service", "url": "http://%25%25host%25%25/test_endpoint", "timeout": 1}]'

Running and configuring the Agent in a container

The above settings can be passed to the dd-agent container through the following environment variables:

SD_BACKEND <-> service_discovery_backend
SD_CONFIG_BACKEND <-> sd_config_backend
SD_BACKEND_HOST <-> sd_backend_host
SD_BACKEND_PORT <-> sd_backend_port
SD_TEMPLATE_DIR <-> sd_template_dir

Available tags:

datadog/docker-dd-agent:service-discovery (has the Docker check preconfigured)
datadog/docker-dd-agent:service-discovery-k8s (has the Docker and Kubernetes checks preconfigured)

Example:

docker run -d --name dd-agent -h `hostname` \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v /proc/:/host/proc/:ro \
  -v /sys/fs/cgroup/:/host/sys/fs/cgroup:ro \
  -e API_KEY=[YOUR_API_KEY] \
  -e SD_CONFIG_BACKEND=etcd \
  -e SD_BACKEND=docker \
  -e SD_BACKEND_HOST=localhost \
  -e SD_BACKEND_PORT=4001 \
  datadog/docker-dd-agent:service-discovery-k8s

Monitoring your custom container

Service discovery works with any image. One important note though: for the %%port%% variable to be interpolated, the current version needs the container to expose the targeted port. See the NGINX Dockerfile for reference.
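
A quick way to check which ports an image exposes (here with the custom-nginx image used in the examples above):

docker inspect -f '{{.Config.ExposedPorts}}' custom-nginx
# expected output looks like: map[80/tcp:{}]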

Kubernetes users

Service discovery is particularly useful on container platforms like Kubernetes, where by default the user doesn't choose the node on which a container will be scheduled. With service discovery you can simply deploy the Agent container with a DaemonSet and declare your configuration templates for all the containers you plan to launch in the same cluster. To deploy the Agent, simply follow the instructions from the install page for Kubernetes.

Additionally, installing an etcd cluster on Kubernetes can be done fairly easily. The most important part is to set up a service that is accessible from the Datadog Agent. Instructions to install a simple, 3-node cluster can be found in the etcd repository.

Once the cluster is running, simply use the K/V store service IP address and port as sd_backend_host and sd_backend_port in datadog.conf (passing the corresponding environment variables to the container makes this easier; see the mapping above, or entrypoint.sh for reference).
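
For example, assuming the etcd client service is called etcd-client (the actual name depends on how the cluster was deployed), its address can be retrieved with kubectl and passed to the Agent container:

kubectl get svc etcd-client
# use the CLUSTER-IP and PORT from the output, e.g.:
# -e SD_BACKEND_HOST=<cluster ip> -e SD_BACKEND_PORT=<port>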

Then write your configuration templates, and let the Agent detect your running pods and take care of re-configuring checks.

Examples

Following is an example of how to set up templates for an NGINX and PostgreSQL stack. The example uses etcd as the configuration store and assumes that the etcd cluster is deployed as a service in Kubernetes with the IP address 10.0.65.98.

NGINX

The default NGINX image doesn't have the /nginx_status/ endpoint enabled, so the first step is to enable it, as described in the Datadog NGINX tile (click on "Configuration"), in a new image that we will name custom-nginx in this example. Once the image is built, the configuration template can be defined this way:

curl -L -X PUT http://10.0.65.98:4001/v2/keys/datadog/check_configs/custom-nginx/check_names -d value='["nginx"]'
curl -L -X PUT http://10.0.65.98:4001/v2/keys/datadog/check_configs/custom-nginx/init_configs -d value="[{}]"
curl -L -X PUT http://10.0.65.98:4001/v2/keys/datadog/check_configs/custom-nginx/instances -d value='[{"nginx_status_url": "http://%25%25host%25%25/nginx_status/", "tags": %25%25tags%25%25}]'

The %%tags%% variable will add metadata about the replication controller, the pod name, etc.

PostgreSQL

Next comes the PostgreSQL configuration. Steps to connect Postgres to Datadog are, as usual, described in the integration tile. To ease the deployment process we'll assume these steps are automated in a script that is executed in a Dockerfile based on the official postgres Docker image, resulting in a new custom-postgres image.

The configuration template is thus defined like this:

curl -L -X PUT http://10.0.65.98:4001/v2/keys/datadog/check_configs/custom-postgres/check_names -d value='["postgres"]'
curl -L -X PUT http://10.0.65.98:4001/v2/keys/datadog/check_configs/custom-postgres/init_configs -d value="[{}]"
curl -L -X PUT http://10.0.65.98:4001/v2/keys/datadog/check_configs/custom-postgres/instances -d value='[{"host": "%25%25host%25%25", "port": "%25%25port%25%25", "tags": %25%25tags%25%25}]'

The postgres image only exposes the default port, so appending an index to the port variable is unnecessary.

Now the Agent can be deployed following the Kubernetes instructions, passing the right environment variables to enable service discovery as covered earlier. Whenever a Postgres or NGINX container is deployed, Agents will detect it and update the check configurations accordingly.

What's next?

This feature is still under active development; here are the next expected improvements:

  • making the config reload smarter by only reloading the affected checks (those whose config template was changed or whose containers were started/stopped)
  • support for other key/value stores
  • auto tagging support for more platforms