# enable-ssi
Enable APM on Kubernetes via Single Step Instrumentation
Before doing anything else: fully resolve all variables in the **Context to resolve before acting** section below. Do not begin Step 0 until every variable has a concrete value.
## Triggers
Invoke this skill when the user expresses intent to:
- Enable APM on a Kubernetes cluster
- Instrument Kubernetes applications with Datadog tracing
- Set up Single Step Instrumentation (SSI)
Do NOT invoke this skill if:
- The Datadog Agent is not yet installed — run `agent-install` first
- The user wants to verify SSI after setup — use `verify-ssi`
- The user wants to enable Profiler, AppSec, or Data Streams — use `dd-apm-k8s-sdk-features`
## Prerequisites
These are not a reading exercise — actively verify each one before proceeding.
### Environment
- Datadog Agent is installed and healthy — `agent-install` complete
- Kubernetes v1.20+
- Linux node pools only — Windows pods require explicit namespace exclusion
- Cluster is not ECS Fargate — unsupported
- Not a hardened SELinux environment — unsupported
- Not a very small VM instance (e.g. t2.micro) — SSI can hit init timeouts
- No PodSecurity `baseline` or `restricted` policy enforced (spot-checks below)
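A minimal spot-check sketch for the items above — it assumes the current kubectl context points at the target cluster, and uses the standard `pod-security.kubernetes.io/enforce` namespace label to detect PodSecurity enforcement:

```bash
kubectl version                                   # server version must be v1.20+
kubectl get nodes -o wide                         # OS-IMAGE column: Linux-only node pools
kubectl get ns <APP_NAMESPACE> --show-labels \
  | grep pod-security.kubernetes.io/enforce \
  && echo "WARNING: PodSecurity enforcement detected — see prerequisites" \
  || echo "No PodSecurity enforcement label on namespace"
```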
Base image — verify before proceeding:
Claude runs
```bash
# kubectl exec does not accept label selectors, so target the Deployment directly
kubectl exec -n <APP_NAMESPACE> deploy/<DEPLOYMENT_NAME> -- ldd --version 2>&1 | head -1
```
If the output contains `glibc`, `GLIBC`, or `GNU libc` — proceed.
ERROR: Output contains `musl` — stop. SSI's injector requires glibc and is ABI-incompatible with musl libc. The injector will load but silently abort injection, and no traces will be sent. Switch the base image to a glibc-based equivalent (e.g. `python:X-slim`, `node:X-bookworm-slim`, any Debian/Ubuntu/UBI image), then rebuild, reload, restart the pod, and rerun this check before continuing.
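A quick way to find the offending base image before rebuilding — a sketch, assuming `<DOCKERFILE_PATH>` from the context table; the `alpine` tag test is a heuristic, not a guarantee:

```bash
grep -in '^FROM' <DOCKERFILE_PATH>
# A tag containing "alpine" is musl-based — swap to a glibc tag
# (e.g. python:3.12-alpine -> python:3.12-slim), rebuild, and rerun the ldd check.
```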
### Language and runtime
- Application language is one of: Java, Python, Ruby, Node.js, .NET, PHP
- Runtime version is within SSI's supported range — verify against the SSI compatibility matrix
- Node.js app is not using ESM — SSI does not support ESM
- Java app is not already using a `-javaagent` JVM flag (spot-check below)
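A hedged spot-check for a pre-existing `-javaagent` flag (and the equivalent Node.js hooks) on the live workload — the env vars grepped here are the usual carriers of such flags, not an exhaustive list:

```bash
kubectl get deployment <DEPLOYMENT_NAME> -n <APP_NAMESPACE> -o yaml \
  | grep -E "javaagent|JAVA_TOOL_OPTIONS|JAVA_OPTS|NODE_OPTIONS" \
  || echo "No conflicting agent flags found"
```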
Existing instrumentation — verify before proceeding:
Claude runs
```bash
# Check source files for manual tracer imports
grep -r "import ddtrace\|from ddtrace\|require 'ddtrace'\|require(\"dd-trace\")\|opentelemetry\|tracer\.trace(" <SOURCE_DIR> 2>/dev/null || echo "No manual instrumentation found"

# Check dependency manifests
grep -rE "ddtrace|dd-trace|opentelemetry" requirements.txt package.json Gemfile go.mod pom.xml 2>/dev/null || echo "No tracer dependency found"
```
ERROR: Any match found — remove the import/package before continuing (see Step 0). SSI silently disables itself when existing instrumentation is detected.
If no matches — proceed.
## Context to resolve before acting

| Variable | How to resolve |
|---|---|
| `AGENT_NAMESPACE` | Same namespace used in `agent-install` (e.g. `datadog`) |
| `APP_NAMESPACE` | Ask the user which namespace their application runs in |
| `TARGET_LANGUAGES` | Identify from repo — check Dockerfiles, package manifests, or ask the user |
| `DEPLOYMENT_NAME` | Identify from repo or ask the user |
| `APP_LABEL` | Check `spec.selector.matchLabels.app` in the Deployment manifest |
| `CLUSTER_NAME` | Check `spec.global.clusterName` in `datadog-agent.yaml`, or `kubectl config current-context` — needed for kind clusters in Step 0 |
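Most of these can be resolved from the cluster directly — a sketch, assuming an Operator-managed Agent and a standard Deployment:

```bash
kubectl get datadogagent -A                      # AGENT_NAMESPACE
kubectl get deployments -A                       # candidate APP_NAMESPACE / DEPLOYMENT_NAME
kubectl get deployment <DEPLOYMENT_NAME> -n <APP_NAMESPACE> \
  -o jsonpath='{.spec.selector.matchLabels.app}' # APP_LABEL
kubectl config current-context                   # CLUSTER_NAME fallback
```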
## Step 0 (only if existing instrumentation detected): Remove Manual Instrumentation
Scan all source files for: `import ddtrace`, `from ddtrace`, `require 'ddtrace'`, `require("dd-trace")`, `opentelemetry`, `tracer.trace(`
Also check dependency manifests for `ddtrace` / `dd-trace` / OTel SDK packages.
If found — remove the import/package, then rebuild and reload:
Claude runs
```bash
docker build -f <DOCKERFILE_PATH> -t <IMAGE_NAME> <BUILD_CONTEXT>
```
[DECISION: how does this cluster get local images?]
Check the repo's setup script (e.g. `create.sh`, `Makefile`, `justfile`) for how images are loaded — do not guess from the cluster name or context. Common patterns:
| What you find in the setup script | Load command |
|---|---|
| `minikube image load` or `minikube cache add` | `minikube -p <PROFILE> image load <IMAGE_NAME>` — profile is the `-p` flag value in the script, NOT necessarily the kubectl context name |
| `kind load docker-image` | `kind load docker-image <IMAGE_NAME> --name <CLUSTER_NAME>` |
| `docker push` to a registry | Push the new image; the cluster will pull on restart — skip local load |
| `k3d image import` | `k3d image import <IMAGE_NAME> -c <CLUSTER_NAME>` |
| No image load step (cloud cluster, always pulls from registry) | Skip — image will be pulled on next deployment |
If the setup script is ambiguous, run the load command it uses exactly as written.
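To locate the load step without reading the whole script — a sketch; the file names here are common conventions and may not all exist in the repo:

```bash
grep -nE "minikube .*image load|minikube cache add|kind load docker-image|k3d image import|docker push" \
  create.sh Makefile justfile 2>/dev/null
```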
Claude runs
```bash
kubectl rollout restart deployment/<DEPLOYMENT_NAME> -n <APP_NAMESPACE>
kubectl wait --for=condition=Ready pod \
  -l app=<APP_LABEL> \
  -n <APP_NAMESPACE> \
  --timeout=120s
```
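Optionally confirm the pods came back on the rebuilt image — a sketch:

```bash
kubectl get pods -n <APP_NAMESPACE> -l app=<APP_LABEL> \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].image}{"\n"}{end}'
```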
## Step 1: Extend the DatadogAgent Manifest with APM
SSI is configured on the existing `DatadogAgent` resource — do not create a separate manifest.
[DECISION: targeting scope — ask the user if unclear]
- Cluster-wide: `enabled: true` with no `targets` or `enabledNamespaces`
- Specific namespaces: `enabledNamespaces`
- Specific pods: `targets` with `podSelector`
- Excluding namespaces: `disabledNamespaces`
Recommended `ddTraceVersions`: `java: "1"`, `python: "2"`, `js: "5"`, `dotnet: "3"`, `ruby: "2"`, `php: "1"`
Option A — Target specific workloads (recommended for production):
```yaml
features:
  apm:
    instrumentation:
      enabled: true
      targets:
        - name: <TARGET_NAME>
          namespaceSelector:
            matchNames:
              - <APP_NAMESPACE>
          ddTraceVersions:
            <LANGUAGE>: "<MAJOR_VERSION>"
```
Option B — Specific namespaces only:
```yaml
features:
  apm:
    instrumentation:
      enabled: true
      enabledNamespaces:
        - <APP_NAMESPACE>
```
Option C — Cluster-wide with exclusions:
```yaml
features:
  apm:
    instrumentation:
      enabled: true
      disabledNamespaces:
        - jenkins
        - kube-system
```
Claude runs
```bash
kubectl apply -f datadog-agent.yaml
```
If the output is `datadogagent.datadoghq.com/datadog configured` — continue to Step 2.
ERROR: Validation error — check the YAML. `enabledNamespaces` and `disabledNamespaces` cannot both be set.
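To confirm the instrumentation block landed on the applied resource — a sketch, assuming the `DatadogAgent` is named `datadog` (adjust if `agent-install` used a different name):

```bash
kubectl get datadogagent datadog -n <AGENT_NAMESPACE> \
  -o jsonpath='{.spec.features.apm.instrumentation}'
```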
## Step 2: Configure Unified Service Tags on Application Workloads
Add UST labels to the Deployment under both `metadata.labels` and `spec.template.metadata.labels`:
```yaml
metadata:
  labels:
    tags.datadoghq.com/env: "<ENV>"
    tags.datadoghq.com/service: "<SERVICE_NAME>"
    tags.datadoghq.com/version: "<VERSION>"
spec:
  template:
    metadata:
      labels:
        tags.datadoghq.com/env: "<ENV>"
        tags.datadoghq.com/service: "<SERVICE_NAME>"
        tags.datadoghq.com/version: "<VERSION>"
```
Claude runs
```bash
kubectl apply -f <your-app-deployment.yaml>
```
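If editing the manifest file is not practical, the same labels can be applied in place — a sketch using `kubectl patch`; note that `kubectl label` alone is not sufficient because it never touches the pod template:

```bash
kubectl patch deployment <DEPLOYMENT_NAME> -n <APP_NAMESPACE> --type merge -p '{
  "metadata": {"labels": {
    "tags.datadoghq.com/env": "<ENV>",
    "tags.datadoghq.com/service": "<SERVICE_NAME>",
    "tags.datadoghq.com/version": "<VERSION>"}},
  "spec": {"template": {"metadata": {"labels": {
    "tags.datadoghq.com/env": "<ENV>",
    "tags.datadoghq.com/service": "<SERVICE_NAME>",
    "tags.datadoghq.com/version": "<VERSION>"}}}}
}'
```

Patching the pod template triggers a rollout on its own, so the Step 3 restart becomes a no-op for this Deployment.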
## Step 3: Restart Application Pods
Claude runs
```bash
kubectl rollout restart deployment/<DEPLOYMENT_NAME> -n <APP_NAMESPACE>
kubectl wait --for=condition=Ready pod \
  -l app=<APP_LABEL> \
  -n <APP_NAMESPACE> \
  --timeout=120s
```
If pods restart cleanly, init containers named `datadog-lib-<language>-init` will be visible in the pod spec (sketch below).
ERROR: Pods crash-looping — check for existing custom instrumentation. See `troubleshoot-ssi`.
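A sketch to confirm the injected init containers are present:

```bash
kubectl get pods -n <APP_NAMESPACE> -l app=<APP_LABEL> \
  -o jsonpath='{.items[0].spec.initContainers[*].name}'
# Expect a name like datadog-lib-<language>-init in the output
```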
## Done
Exit when ALL of the following are true:
- `features.apm.instrumentation` is present in the applied `DatadogAgent` manifest
- Application pods have been restarted and are Running
- UST labels are present on the Deployment and pod template
- Scope confirmed: which workloads are instrumented, which were skipped and why
Automatically proceed to `verify-ssi` now — do not ask the user for permission.
## Security constraints
- Never write a raw API key into any file or chat message
- Never use namespace `default` for Datadog resources
- Never modify `admissionController` settings directly — SSI manages this via the Operator
- Do not add APM config to application manifests — configure only via `DatadogAgent`
- Exception: UST labels (`tags.datadoghq.com/*`) on application Deployments are required and intentional
- Never run `kubectl delete` without user confirmation
- `docker push` to a registry always requires user confirmation