skills/akin-ozer/cc-devops-skills/loki-config-generator

loki-config-generator

SKILL.md

Loki Configuration Generator

Overview

Generate production-ready Grafana Loki server configurations with best practices. Supports monolithic, simple scalable, and microservices deployment modes with S3, GCS, Azure, or filesystem storage.

Current Stable: Loki 3.6.2 (November 2025) Important: Promtail deprecated in 3.4 - use Grafana Alloy instead. See examples/grafana-alloy.yaml for log collection configuration.

When to Use

Invoke when: deploying Loki, creating configs from scratch, migrating to Loki, implementing multi-tenant logging, configuring storage backends, or optimizing existing deployments.


Generation Methods

Method 1: Script Generation (Recommended)

Use scripts/generate_config.py for consistent, validated configurations:

# Simple Scalable with S3 (production)
python3 scripts/generate_config.py \
  --mode simple-scalable \
  --storage s3 \
  --bucket my-loki-bucket \
  --region us-east-1 \
  --retention-days 30 \
  --otlp-enabled \
  --output loki-config.yaml

# Monolithic with filesystem (development)
python3 scripts/generate_config.py \
  --mode monolithic \
  --storage filesystem \
  --no-auth-enabled \
  --output loki-dev.yaml

# Production with Thanos storage (Loki 3.4+)
python3 scripts/generate_config.py \
  --mode simple-scalable \
  --storage s3 \
  --thanos-storage \
  --otlp-enabled \
  --time-sharding \
  --output loki-thanos.yaml

Script Options:

Option Description
--mode monolithic, simple-scalable, microservices
--storage filesystem, s3, gcs, azure
--auth-enabled / --no-auth-enabled Explicitly enable/disable auth
--otlp-enabled Enable OTLP ingestion configuration
--thanos-storage Use Thanos object storage client (3.4+, cloud backends)
--time-sharding Enable out-of-order ingestion (simple-scalable)
--ruler Enable alerting/recording rules (not monolithic)
--horizontal-compactor main/worker mode (simple-scalable, 3.6+)
--zone-awareness Enable multi-AZ placement safeguards
--limits-dry-run Log limit rejections without enforcing

Method 2: Manual Configuration

Follow the staged workflow below when script generation doesn't meet specific requirements or when learning the configuration structure.

Output Formats

For Kubernetes deployments, generate BOTH formats:

  1. Native Loki config (loki-config.yaml) - For ConfigMap or direct use
  2. Helm values (values.yaml) - For Helm chart deployments

See examples/kubernetes-helm-values.yaml for Helm format.


Documentation Lookup

When to Use Context7/Web Search

REQUIRED - Use Context7 MCP for:

  • Configuring features from Loki 3.4+ (Thanos storage, time sharding)
  • Configuring features from Loki 3.6+ (horizontal compactor, enforced labels)
  • Bloom filter configuration (complex, experimental)
  • Custom OTLP attribute mappings beyond standard patterns
  • Troubleshooting configuration errors

OPTIONAL - Skip documentation lookup for:

  • Standard deployment modes (monolithic, simple-scalable)
  • Basic storage configuration (S3, GCS, Azure, filesystem)
  • Default limits and component settings
  • Configurations covered in references/ directory

Context7 MCP (preferred)

resolve-library-id: "grafana loki"
get-library-docs: /websites/grafana_loki, topic: [component]

Example topics: storage_config, limits_config, otlp, compactor, ruler, bloom

Web Search Fallback

Use when Context7 unavailable: "Grafana Loki 3.6 [component] configuration documentation site:grafana.com"


Configuration Workflow

Stage 1: Gather Requirements

Deployment Mode:

Mode Scale Use Case
Monolithic <100GB/day Testing, development
Simple Scalable 100GB-1TB/day Production
Microservices >1TB/day Large-scale, multi-tenant

Storage Backend: S3, GCS, Azure Blob, Filesystem, MinIO

Key Questions: Expected log volume? Retention period? Multi-tenancy needed? High availability requirements? Kubernetes deployment?

Ask the user directly if required information is missing.

Stage 2: Schema Configuration (CRITICAL)

For all new deployments (Loki 2.9+), use TSDB with v13 schema:

schema_config:
  configs:
    - from: "2025-01-01"  # Use deployment date
      store: tsdb
      object_store: s3     # s3, gcs, azure, filesystem
      schema: v13
      index:
        prefix: loki_index_
        period: 24h

Key: Schema cannot change after deployment without migration.

Stage 3: Storage Configuration

S3:

common:
  storage:
    s3:
      s3: s3://us-east-1/loki-bucket
      s3forcepathstyle: false

GCS: gcs: { bucket_name: loki-bucket } Azure: azure: { container_name: loki-container, account_name: ${AZURE_ACCOUNT_NAME} } Filesystem: filesystem: { chunks_directory: /loki/chunks, rules_directory: /loki/rules }

Stage 4: Component Configuration

Ingester:

ingester:
  chunk_encoding: snappy
  chunk_idle_period: 30m
  max_chunk_age: 2h
  chunk_target_size: 1572864  # 1.5MB
  lifecycler:
    ring:
      replication_factor: 3  # 3 for production

Querier:

querier:
  max_concurrent: 4
  query_timeout: 1m

Compactor:

compactor:
  working_directory: /loki/compactor
  compaction_interval: 10m
  retention_enabled: true
  retention_delete_delay: 2h

Stage 5: Limits Configuration

limits_config:
  ingestion_rate_mb: 10
  ingestion_burst_size_mb: 20
  max_streams_per_user: 10000
  max_entries_limit_per_query: 5000
  max_query_length: 721h
  retention_period: 30d
  allow_structured_metadata: true
  volume_enabled: true

Stage 6: Server & Auth

server:
  http_listen_port: 3100
  grpc_listen_port: 9096
  log_level: info

auth_enabled: true  # false for single-tenant

Stage 7: OTLP Ingestion (Loki 3.0+)

Native OpenTelemetry ingestion - use otlphttp exporter (NOT deprecated lokiexporter):

limits_config:
  allow_structured_metadata: true
  otlp_config:
    resource_attributes:
      attributes_config:
        - action: index_label  # Low-cardinality only!
          attributes: [service.name, service.namespace, deployment.environment]
        - action: structured_metadata  # High-cardinality
          attributes: [k8s.pod.name, service.instance.id]

Actions: index_label (searchable, low-cardinality), structured_metadata (queryable), drop

⚠️ NEVER use k8s.pod.name as index_label - use structured_metadata instead.

OTel Collector:

exporters:
  otlphttp:
    endpoint: http://loki:3100/otlp

Stage 8: Caching

chunk_store_config:
  chunk_cache_config:
    memcached_client:
      host: memcached-chunks
      timeout: 500ms

query_range:
  cache_results: true
  results_cache:
    cache:
      memcached_client:
        host: memcached-results

Stage 9: Advanced Features

Pattern Ingester (3.0+):

pattern_ingester:
  enabled: true

Bloom Filters (Experimental, 3.3+): Only for >75TB/month deployments. Works on structured metadata only. See examples/ for config.

Time Sharding (3.4+): For out-of-order ingestion:

limits_config:
  shard_streams:
    time_sharding_enabled: true

Thanos Storage (3.4+): New storage client, opt-in now, default later:

storage_config:
  use_thanos_objstore: true
  object_store:
    s3:
      bucket_name: my-bucket
      endpoint: s3.us-west-2.amazonaws.com

Stage 10: Ruler (Alerting)

ruler:
  storage:
    type: s3
    s3: { bucket_name: loki-ruler }
  alertmanager_url: http://alertmanager:9093
  enable_api: true
  enable_sharding: true

Stage 11: Loki 3.6 Features

  • Horizontally Scalable Compactor: horizontal_scaling_mode: main|worker
  • Policy-Based Enforced Labels: enforced_labels: [service.name]
  • FluentBit v4: structured_metadata parameter support

Stage 12: Validate Configuration (REQUIRED)

Always validate before deployment:

# Syntax and parameter validation
loki -config.file=loki-config.yaml -verify-config

# Print resolved configuration (shows defaults)
loki -config.file=loki-config.yaml -print-config-stderr 2>&1 | head -100

# Dry-run with Docker (if Loki not installed locally)
docker run --rm -v $(pwd)/loki-config.yaml:/etc/loki/config.yaml \
  grafana/loki:3.6.2 -config.file=/etc/loki/config.yaml -verify-config

Validation Checklist:

  • No syntax errors from -verify-config
  • Schema uses tsdb and v13
  • replication_factor: 3 for production
  • auth_enabled: true if multi-tenant
  • Storage credentials/IAM configured
  • Retention period matches requirements

Production Checklist

High Availability Requirements

Zone-Aware Replication (CRITICAL for production multi-AZ deployments):

When using replication_factor: 3, ALWAYS enable zone-awareness for multi-AZ deployments:

ingester:
  lifecycler:
    ring:
      replication_factor: 3
      zone_awareness_enabled: true  # CRITICAL for multi-AZ

# Set zone via environment variable or config
# Each pod should set its zone based on node topology
common:
  instance_availability_zone: ${AVAILABILITY_ZONE}

Why: Without zone-awareness, all 3 replicas may land in the same AZ. If that AZ fails, you lose data.

Kubernetes Implementation:

# In Helm values or pod spec
env:
  - name: AVAILABILITY_ZONE
    valueFrom:
      fieldRef:
        fieldPath: metadata.labels['topology.kubernetes.io/zone']

TLS Configuration (Production Required)

Enable TLS for all inter-component and client communication:

server:
  http_tls_config:
    cert_file: /etc/loki/tls/tls.crt
    key_file: /etc/loki/tls/tls.key
    client_ca_file: /etc/loki/tls/ca.crt  # For mTLS
  grpc_tls_config:
    cert_file: /etc/loki/tls/tls.crt
    key_file: /etc/loki/tls/tls.key
    client_ca_file: /etc/loki/tls/ca.crt

See examples/production-tls.yaml for complete TLS configuration.

Production Checklist Summary

Requirement Setting Required For
replication_factor: 3 common block All production
zone_awareness_enabled: true ingester.lifecycler.ring Multi-AZ
auth_enabled: true root level Multi-tenant
TLS enabled server block All production
IAM roles (not keys) storage config Cloud storage
Caching enabled chunk_store_config, query_range Performance
Pattern ingester pattern_ingester.enabled Observability
Retention configured compactor + limits_config Cost control

Monitoring Recommendations

Key Metrics to Monitor

Configure Prometheus to scrape Loki metrics and alert on these critical indicators:

# Prometheus scrape config
- job_name: 'loki'
  static_configs:
    - targets: ['loki:3100']

Critical Alerts

groups:
  - name: loki-critical
    rules:
      # Ingestion failures
      - alert: LokiIngestionFailures
        expr: sum(rate(loki_distributor_ingester_append_failures_total[5m])) > 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Loki ingestion failures detected"

      # High stream cardinality (performance killer)
      - alert: LokiHighStreamCardinality
        expr: loki_ingester_memory_streams > 100000
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "High stream cardinality - review labels"

      # Compaction not running (retention broken)
      - alert: LokiCompactionStalled
        expr: time() - loki_compactor_last_successful_run_timestamp_seconds > 7200
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Loki compaction stalled - retention not enforced"

      # Query latency
      - alert: LokiSlowQueries
        expr: histogram_quantile(0.99, sum(rate(loki_request_duration_seconds_bucket{route=~"loki_api_v1_query.*"}[5m])) by (le)) > 30
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Loki query P99 latency > 30s"

      # Ingester memory pressure
      - alert: LokiIngesterMemoryHigh
        expr: container_memory_usage_bytes{container="ingester"} / container_spec_memory_limit_bytes{container="ingester"} > 0.8
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Loki ingester memory usage > 80%"

Key Metrics Reference

Metric Description Action Threshold
loki_ingester_memory_streams Active streams in memory >100k: review cardinality
loki_distributor_ingester_append_failures_total Ingestion failures >0: investigate immediately
loki_request_duration_seconds Query latency P99 >30s: add caching/queriers
loki_ingester_chunks_flushed_total Chunk flush rate Low rate: check ingester health
loki_compactor_last_successful_run_timestamp_seconds Last compaction >2h ago: compaction broken

Grafana Dashboard

Import official Loki dashboards:

  • Dashboard ID: 13407 - Loki Logs
  • Dashboard ID: 14055 - Loki Operational

Log Collection with Grafana Alloy

Promtail is deprecated (support ends Feb 2026). Use Grafana Alloy for new deployments.

Basic Alloy Configuration

See examples/grafana-alloy.yaml for complete configuration.

// Kubernetes log discovery
discovery.kubernetes "pods" {
  role = "pod"
}

// Relabeling for Kubernetes metadata
discovery.relabel "pods" {
  targets = discovery.kubernetes.pods.targets

  rule {
    source_labels = ["__meta_kubernetes_namespace"]
    target_label  = "namespace"
  }
  rule {
    source_labels = ["__meta_kubernetes_pod_name"]
    target_label  = "pod"
  }
  rule {
    source_labels = ["__meta_kubernetes_pod_container_name"]
    target_label  = "container"
  }
}

// Log collection
loki.source.kubernetes "pods" {
  targets    = discovery.relabel.pods.output
  forward_to = [loki.write.default.receiver]
}

// Send to Loki
loki.write "default" {
  endpoint {
    url = "http://loki-gateway.loki.svc.cluster.local/loki/api/v1/push"

    // For multi-tenant
    tenant_id = "default"
  }
}

Migration from Promtail

# Convert Promtail config to Alloy
alloy convert --source-format=promtail --output=alloy-config.alloy promtail.yaml

Complete Examples

See examples/ directory for full configurations:

  • monolithic-filesystem.yaml - Development/testing
  • simple-scalable-s3.yaml - Production with S3
  • microservices-s3.yaml - Large-scale distributed
  • multi-tenant.yaml - Multi-tenant with per-tenant limits
  • production-tls.yaml - TLS-enabled production config
  • grafana-alloy.yaml - Log collection with Alloy
  • kubernetes-helm-values.yaml - Helm chart values

Minimal Monolithic:

auth_enabled: false
server:
  http_listen_port: 3100

common:
  path_prefix: /loki
  storage:
    filesystem:
      chunks_directory: /loki/chunks
      rules_directory: /loki/rules
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory

schema_config:
  configs:
    - from: 2025-01-01
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: loki_index_
        period: 24h

limits_config:
  retention_period: 30d
  allow_structured_metadata: true

compactor:
  working_directory: /loki/compactor
  retention_enabled: true

Helm Deployment

helm repo add grafana https://grafana.github.io/helm-charts
helm install loki grafana/loki -f values.yaml

Generate both native config and Helm values for Kubernetes deployments.

# values.yaml
deploymentMode: SimpleScalable

loki:
  schemaConfig:
    configs:
      - from: "2025-01-01"
        store: tsdb
        object_store: s3
        schema: v13
        index:
          prefix: loki_index_
          period: 24h
  limits_config:
    retention_period: 30d
    allow_structured_metadata: true
  # Zone awareness for HA
  ingester:
    lifecycler:
      ring:
        zone_awareness_enabled: true

backend:
  replicas: 3
  # Spread across zones
  topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: DoNotSchedule
read:
  replicas: 3
write:
  replicas: 3

Best Practices

Performance:

  • chunk_encoding: snappy, chunk_target_size: 1572864
  • Enable caching (chunks, results)
  • parallelise_shardable_queries: true

Security:

  • auth_enabled: true with reverse proxy auth
  • IAM roles for cloud storage (never hardcode keys)
  • TLS for all communications (see Production Checklist)

Reliability:

  • replication_factor: 3 for production
  • zone_awareness_enabled: true for multi-AZ (see Production Checklist)
  • Persistent volumes for ingesters
  • Monitor ingestion rate and query latency (see Monitoring section)

Limits: Set ingestion_rate_mb, max_streams_per_user to prevent overload


Common Issues

Issue Solution
High ingester memory Reduce max_streams_per_user, lower chunk_idle_period
Slow queries Increase max_concurrent, enable parallelization, add caching
Ingestion failures Check ingestion_rate_mb, verify storage connectivity
Storage growing fast Enable retention, check compression, review cardinality
Data loss in AZ failure Enable zone_awareness_enabled: true
Config validation fails Run loki -verify-config, check YAML syntax

Deprecated (Migrate Away)

  • boltdb-shippertsdb
  • lokiexporterotlphttp
  • Promtail → Grafana Alloy (support ends Feb 2026)

Resources

scripts/generate_config.py - Generate configs programmatically (RECOMMENDED) examples/ - Complete configuration examples for all modes references/ - Full parameter reference and best practices

Related Skills

  • logql-generator - LogQL query generation
  • fluentbit-generator - Log collection to Loki
Weekly Installs
9
GitHub Stars
112
First Seen
12 days ago
Installed on
opencode9
gemini-cli8
claude-code8
github-copilot8
codex8
amp8