Datadog Logs

Search, process, and archive logs with cost awareness.

Prerequisites

Datadog Pup (dd-pup/pup) must be installed. If it is not, install it with Go:

go install github.com/datadog-labs/pup@latest

Quick Start

pup auth login

Search Logs

# Basic search
pup logs search --query="status:error" --from="1h"

# With filters
pup logs search --query="service:api status:error" --from="1h" --limit 100

# JSON output
pup logs search --query="@http.status_code:>=500" --from="1h" --json

Search Syntax

Query	Meaning
`error`	Full-text search
`status:error`	Tag equals
`@http.status_code:500`	Attribute equals
`@http.status_code:>=400`	Numeric range
`service:api AND env:prod`	Boolean
`@message:*timeout*`	Wildcard
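The rows above combine into longer queries. As a minimal sketch of how the pieces fit together, the helper below assembles a query string from free text, tags, and `@` attributes. `build_query` is a hypothetical name introduced for illustration; it is not part of pup or of any Datadog library, and it covers only the AND case.

```python
def _quote(value) -> str:
    """Wrap values containing spaces in double quotes so they stay one term."""
    text = str(value)
    return f'"{text}"' if " " in text else text


def build_query(text=None, tags=None, attributes=None) -> str:
    """Join free text, tag filters, and @attribute filters with AND.

    Hypothetical helper: it only formats a string using the syntax from
    the table above and sends nothing to Datadog.
    """
    parts = []
    if text:
        parts.append(_quote(text))
    for key, value in (tags or {}).items():
        parts.append(f"{key}:{_quote(value)}")
    for key, value in (attributes or {}).items():
        # Attributes are prefixed with "@", tags are not.
        parts.append(f"@{key}:{_quote(value)}")
    return " AND ".join(parts)


query = build_query(
    tags={"service": "api", "env": "prod"},
    attributes={"http.status_code": ">=500"},
)
print(query)
```

The resulting string can be passed as the `--query` value of `pup logs search`.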

Pipelines

Process logs before indexing:

# List pipelines
pup logs pipelines list

# Create pipeline (JSON)
pup logs pipelines create --json @pipeline.json

Common Processors

{
  "name": "API Logs",
  "filter": {"query": "service:api"},
  "processors": [
    {
      "type": "grok-parser",
      "name": "Parse nginx",
      "source": "message",
      "grok": {"match_rules": "parse_access %{ipOrHost:client_ip} %{data:method} %{data:path} %{number:status}"}
    },
    {
      "type": "status-remapper",
      "name": "Set severity",
      "sources": ["level", "severity"]
    },
    {
      "type": "attribute-remapper",
      "name": "Remap user_id",
      "sources": ["user_id"],
      "target": "usr.id"
    }
  ]
}
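Before creating a pipeline, it can help to check the JSON file for obvious gaps. The sketch below is a local sanity check only: the fields it requires (`name`, `filter.query`, and a `type` and `name` on each processor) are taken from the example above, not from the full Datadog schema, and `validate_pipeline` is a hypothetical helper.

```python
import json

# Keys every processor in the example above carries.
REQUIRED_PROCESSOR_KEYS = {"type", "name"}


def validate_pipeline(pipeline: dict) -> list:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    if not pipeline.get("name"):
        problems.append("pipeline has no name")
    if not pipeline.get("filter", {}).get("query"):
        problems.append("pipeline has no filter.query")
    for index, processor in enumerate(pipeline.get("processors", [])):
        missing = REQUIRED_PROCESSOR_KEYS - processor.keys()
        for key in sorted(missing):
            problems.append(f"processor {index} is missing '{key}'")
    return problems


example = json.loads("""
{
  "name": "API Logs",
  "filter": {"query": "service:api"},
  "processors": [
    {"type": "status-remapper", "name": "Set severity", "sources": ["level"]},
    {"type": "attribute-remapper", "sources": ["user_id"], "target": "usr.id"}
  ]
}
""")

for problem in validate_pipeline(example):
    print(problem)
```

Here the second processor has no `name`, so the check reports it before the file is ever sent to the API.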

⚠️ Exclusion Filters (Cost Control)

Index only what matters. Logs that match an enabled exclusion filter are still ingested but are not indexed:

{
  "name": "Drop debug logs",
  "filter": {"query": "status:debug"},
  "is_enabled": true
}

High-Volume Exclusions

# Find noisiest log sources
pup logs search --query="*" --from="1h" --json | jq 'group_by(.service) | map({service: .[0].service, count: length}) | sort_by(-.count)[:10]'

Exclude	Query
Health checks	`@http.url:"/health" OR @http.url:"/ready"`
Debug logs	`status:debug`
Static assets	`@http.url:*.css OR @http.url:*.js`
Heartbeats	`@message:heartbeat`
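To decide which exclusions are worth adding, it helps to see each service's share of a sample, not just its raw count. The sketch below does the same grouping as the jq command above in Python. It makes the same assumption as that command: the JSON output is an array of objects with a top-level `service` field. `top_services` is a hypothetical helper.

```python
from collections import Counter


def top_services(logs, limit=10):
    """Return (service, count, share) tuples for the noisiest services.

    `logs` is assumed to be a list of dicts with a top-level "service"
    key, mirroring what the jq command above expects.
    """
    counts = Counter(log.get("service", "unknown") for log in logs)
    total = sum(counts.values())
    return [
        (service, count, count / total)
        for service, count in counts.most_common(limit)
    ]


# Hand-written sample records, for illustration only.
sample = (
    [{"service": "api"}] * 6
    + [{"service": "worker"}] * 3
    + [{"service": "cron"}]
)

for service, count, share in top_services(sample):
    print(f"{service}: {count} logs ({share:.0%} of sample)")
```

A service that dominates the sample is the first place to look for health checks, debug output, or heartbeats to exclude.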

Log-Based Metrics

Create metrics from logs (cheaper than indexing):

# Count API errors per endpoint
pup logs metrics create \
  --name "api.errors.count" \
  --query "service:api status:error" \
  --group-by "endpoint"

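⚠️ Cardinality warning: Group by bounded values only (for example, service or endpoint). Unbounded values such as user IDs or request IDs produce one time series per distinct value.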
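One way to judge whether a field is bounded is to count its distinct values in a sample of logs before using it as a group-by. The sketch below assumes logs are available as a list of dicts; the field names and the threshold of 100 are arbitrary illustrations, not Datadog limits.

```python
def cardinality(logs, field):
    """Count distinct values of `field` across the logs that carry it."""
    return len({log[field] for log in logs if field in log})


def is_bounded(logs, field, max_values=100):
    """Treat a field as safe to group by if it stays under `max_values`.

    The default threshold is an illustrative assumption; pick one that
    matches your own metric budget.
    """
    return cardinality(logs, field) <= max_values


# Synthetic sample: 3 endpoints, but a unique request_id on every log.
sample = [
    {"endpoint": f"/v1/items/{i % 3}", "request_id": f"req-{i}"}
    for i in range(500)
]

print("endpoint:", cardinality(sample, "endpoint"), is_bounded(sample, "endpoint"))
print("request_id:", cardinality(sample, "request_id"), is_bounded(sample, "request_id"))
```

In this sample `endpoint` has 3 distinct values and `request_id` has 500, so only `endpoint` is a reasonable group-by.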

Sensitive Data

Scrubbing Rules

{
  "type": "hash-remapper",
  "name": "Hash emails",
  "sources": ["email", "user.email"]
}
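Hashing replaces a value with a fixed-length digest, so logs from the same user can still be correlated without storing the address itself. The sketch below illustrates that idea in application code. It is not how Datadog implements the rule above: the choice of keyed SHA-256 (HMAC) and the `HASH_KEY` value are assumptions made for the example.

```python
import hashlib
import hmac

# Hypothetical secret. In practice, load it from a secret store, not source code.
HASH_KEY = b"replace-with-a-real-secret"


def hash_email(email: str, key: bytes = HASH_KEY) -> str:
    """Return a stable keyed digest of a normalized email address."""
    normalized = email.strip().lower().encode("utf-8")
    # A keyed hash is harder to reverse by guessing common addresses
    # than a plain, unkeyed hash of the same value.
    return hmac.new(key, normalized, hashlib.sha256).hexdigest()


first = hash_email("Alice@example.com")
second = hash_email(" alice@example.com ")
print(first == second)  # same address after normalization, same digest
```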

Never Log

# In your app - sanitize before sending
import re

def sanitize_log(message: str) -> str:
    # Remove credit cards
    message = re.sub(r'\b\d{4}[-\s]?\d{4}[-\s]?\d{4}[-\s]?\d{4}\b', '[REDACTED]', message)
    # Remove SSNs
    message = re.sub(r'\b\d{3}-\d{2}-\d{4}\b', '[REDACTED]', message)
    return message
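Calling a sanitizer by hand at every log site is easy to forget. One option is to attach it to the standard `logging` module as a filter, so every record is scrubbed before it is emitted. The sketch below repeats the two patterns from above so it runs on its own. `SanitizingFilter` and the `payments` logger name are hypothetical.

```python
import logging
import re

# Same patterns as above, compiled once.
CARD_PATTERN = re.compile(r"\b\d{4}[-\s]?\d{4}[-\s]?\d{4}[-\s]?\d{4}\b")
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def sanitize_log(message: str) -> str:
    """Replace card numbers and SSNs with a placeholder."""
    message = CARD_PATTERN.sub("[REDACTED]", message)
    return SSN_PATTERN.sub("[REDACTED]", message)


class SanitizingFilter(logging.Filter):
    """Scrub the fully formatted message of each record."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Format first so values passed as arguments are scrubbed too.
        record.msg = sanitize_log(record.getMessage())
        record.args = ()
        return True  # keep the record, now sanitized


logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
logger = logging.getLogger("payments")
logger.addFilter(SanitizingFilter())

logger.info("charge failed for card %s", "4111 1111 1111 1111")
```

A filter added to a logger applies only to records logged through that logger. To cover records from every logger, attach the filter to the handler instead.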

Troubleshooting

Problem	Fix
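Logs not appearing	Check that the Agent is running and that pipeline filters match the logs
High costs	Add exclusion filters for high-volume, low-value logs
Search slow	Narrow the time range and restrict the search to specific indexes
Missing attributes	Check that the grok parser rule matches the raw message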
