How to monitor Kubernetes + Docker with Datadog for the Keep-Network project.

PurDasha
Oct 29, 2020

Collect Kubernetes and Docker metrics

First, you will need to deploy the Datadog Agent [https://github.com/DataDog/datadog-agent] to collect key resource metrics and events from Kubernetes and Docker for monitoring in Datadog. In this section, we will show you one way to install the containerized Datadog Agent as a DaemonSet on every node in your Kubernetes cluster. Or, if you only want to install it on a specific subset of nodes, you can add a nodeSelector field to your pod configuration.
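For example, adding a nodeSelector under the DaemonSet's pod template restricts the Agent to nodes that carry a matching label. The monitoring label below is hypothetical; apply it to the nodes you want with kubectl label nodes <NODE_NAME> monitoring=true:

spec:
  template:
    spec:
      nodeSelector:
        # Hypothetical label; only nodes labeled monitoring=true will run the Agent
        monitoring: "true"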

If your Kubernetes cluster uses role-based access control (RBAC), you can deploy the Datadog Agent’s RBAC manifest (rbac-agent.yaml) to grant it the necessary permissions to operate in your cluster. Doing this creates a ClusterRole, ClusterRoleBinding, and ServiceAccount for the Agent.

kubectl create -f "https://raw.githubusercontent.com/DataDog/datadog-agent/master/Dockerfiles/manifests/cluster-agent/rbac/rbac-agent.yaml"
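If you want to confirm that those objects exist before moving on (the datadog-agent names below assume the defaults in the linked manifest), you can list them with kubectl:

kubectl get clusterrole datadog-agent
kubectl get clusterrolebinding datadog-agent
kubectl get serviceaccount datadog-agent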

Next, copy the following manifest to a local file and save it as datadog-agent.yaml.

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: datadog-agent
  namespace: default
spec:
  selector:
    matchLabels:
      app: datadog-agent
  template:
    metadata:
      labels:
        app: datadog-agent
      name: datadog-agent
    spec:
      serviceAccountName: datadog-agent
      containers:
      - image: datadog/agent:latest
        imagePullPolicy: Always
        name: datadog-agent
        ports:
        - containerPort: 3919
          name: dogstatsdport
          protocol: UDP
        - containerPort: 3920
          name: traceport
          protocol: TCP
        env:
        - name: DD_API_KEY
          value: <YOUR_API_KEY>
        - name: DD_COLLECT_KUBERNETES_EVENTS
          value: "true"
        - name: DD_LEADER_ELECTION
          value: "true"
        - name: KUBERNETES
          value: "true"
        - name: DD_HEALTH_PORT
          value: "5555"
        - name: DD_KUBELET_TLS_VERIFY
          value: "false"
        - name: DD_KUBERNETES_KUBELET_HOST
          valueFrom:
            fieldRef:
              fieldPath: status.hostIP
        - name: DD_APM_ENABLED
          value: "true"
        resources:
          requests:
            memory: "256Mi"
            cpu: "200m"
          limits:
            memory: "256Mi"
            cpu: "200m"
        volumeMounts:
        - name: dockersocket
          mountPath: /var/run/docker.sock
        - name: procdir
          mountPath: /host/proc
          readOnly: true
        - name: cgroups
          mountPath: /host/sys/fs/cgroup
          readOnly: true
        livenessProbe:
          httpGet:
            path: /health
            port: 5555
          initialDelaySeconds: 15
          periodSeconds: 15
          timeoutSeconds: 5
          successThreshold: 1
          failureThreshold: 3
      volumes:
      - hostPath:
          path: /var/run/docker.sock
        name: dockersocket
      - hostPath:
          path: /proc
        name: procdir
      - hostPath:
          path: /sys/fs/cgroup
        name: cgroups

Replace <YOUR_API_KEY> with an API key from your Datadog account. Then run the following command to deploy the Agent as a DaemonSet:

kubectl create -f datadog-agent.yaml
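A side note on the API key: rather than leaving it in plain text in the manifest, you can store it in a Kubernetes Secret and reference it from the DaemonSet. This is a minimal sketch; the Secret name datadog-secret and key api-key are arbitrary choices, not part of the original walkthrough:

kubectl create secret generic datadog-secret --from-literal=api-key=<YOUR_API_KEY>

Then swap the DD_API_KEY entry in datadog-agent.yaml for a secretKeyRef:

        - name: DD_API_KEY
          valueFrom:
            secretKeyRef:
              # References the hypothetical Secret created above
              name: datadog-secret
              key: api-key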

Now you can verify that the Agent is collecting Docker and Kubernetes metrics by running the Agent’s status command. To do that, you first need to get the list of running pods so you can run the command on one of the Datadog Agent pods:

# Get the list of running pods
$ kubectl get pods
NAME                  READY   STATUS    RESTARTS   AGE
datadog-agent-krrmd   1/1     Running   0          17d
...

# Use the pod name returned above to run the Agent's 'status' command
$ kubectl exec -it datadog-agent-krrmd -- agent status

In the output you should see sections resembling the following, indicating that Kubernetes and Docker metrics are being collected:

kubelet (4.1.0)
---------------
Instance ID: kubelet:d884b5186b651429 [OK]
Configuration Source: file:/etc/datadog-agent/conf.d/kubelet.d/conf.yaml.default
Total Runs: 35
Metric Samples: Last Run: 378, Total: 14,191
Events: Last Run: 0, Total: 0
Service Checks: Last Run: 4, Total: 140
Average Execution Time : 817ms
Last Execution Date : 2020-10-22 15:20:37.000000 UTC
Last Successful Execution Date : 2020-10-22 15:20:37.000000 UTC

docker
------
Instance ID: docker [OK]
Configuration Source: file:/etc/datadog-agent/conf.d/docker.d/conf.yaml.default
Total Runs: 35
Metric Samples: Last Run: 290, Total: 15,537
Events: Last Run: 1, Total: 4
Service Checks: Last Run: 1, Total: 35
Average Execution Time : 101ms
Last Execution Date : 2020-10-22 15:20:30.000000 UTC
Last Successful Execution Date : 2020-10-22 15:20:30.000000 UTC

Now you can glance at your built-in Datadog dashboards for Kubernetes and Docker to see what those metrics look like.

And if you’re running a large-scale production deployment, you can also install the Datadog Cluster Agent alongside the node-based Agent as a centralized, streamlined way to collect cluster-level data for deeper visibility into your infrastructure.
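If you go that route, one common installation path is Datadog's public Helm chart. This is a sketch; exact chart values can vary between chart versions:

helm repo add datadog https://helm.datadoghq.com
helm repo update

# Deploys the node-based Agent and, with this flag, the Cluster Agent as well
helm install datadog-agent datadog/datadog \
  --set datadog.apiKey=<YOUR_API_KEY> \
  --set clusterAgent.enabled=true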

Add more Kubernetes metrics with kube-state-metrics

By default, the Kubernetes Agent check reports a handful of basic system metrics to Datadog, covering CPU, network, disk, and memory usage. You can easily expand on the data collected from Kubernetes by deploying the kube-state-metrics [https://github.com/kubernetes/kube-state-metrics] add-on to your cluster, which provides much more detailed metrics on the state of the cluster itself.

kube-state-metrics listens to the Kubernetes API and generates metrics about the state of Kubernetes logical objects: node status, node capacity (CPU and memory), the number of desired/available/unavailable/updated replicas per deployment, pod status (e.g., waiting, running, ready), and so on. The full list of metrics that Datadog collects from kube-state-metrics is available in the Datadog documentation.
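For reference, here are a few of those metrics as they are named in Datadog (a small representative sample, not the full list):

kubernetes_state.node.cpu_capacity               # CPU capacity per node
kubernetes_state.deployment.replicas_available   # available replicas per deployment
kubernetes_state.pod.status_phase                # pod counts by phase (Pending, Running, ...)
kubernetes_state.container.restarts              # container restart counts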

To deploy kube-state-metrics as a Kubernetes service, use the standard manifests from the project repository [https://github.com/kubernetes/kube-state-metrics/tree/master/examples/standard]. Note that the Service manifest alone is not enough: the directory also contains the ServiceAccount, ClusterRole, ClusterRoleBinding, and Deployment that kube-state-metrics needs to run. One straightforward approach is to clone the repository and apply the whole directory:

git clone https://github.com/kubernetes/kube-state-metrics.git
kubectl apply -f kube-state-metrics/examples/standard/
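You can then check that everything came up. The standard manifests create the objects in the kube-system namespace with the label app.kubernetes.io/name=kube-state-metrics; adjust if yours differ:

kubectl get deployment,service -n kube-system -l app.kubernetes.io/name=kube-state-metrics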

Within minutes, you should see metrics with the prefix kubernetes_state. streaming into your Datadog account.
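You can also verify from the Agent side. The Agent's Autodiscovery picks up the kube-state-metrics endpoint and runs the kubernetes_state check against it, so the check should appear in the status output (using the pod name from the earlier example):

$ kubectl exec -it datadog-agent-krrmd -- agent status | grep -A 4 kubernetes_state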

I compiled these instructions from the Datadog website: https://www.datadoghq.com/
