langfuse

SKILL.md

Langfuse Observability

Langfuse is the LLM observability layer for joelclaw. Every LLM call produces a Langfuse trace with nested hierarchy, I/O, usage, cost, and attribution.

Architecture

joelclaw has two Langfuse integration points:

1. Pi-session extension (langfuse-cost)

  • Source: pi/extensions/langfuse-cost/index.ts (canonical, git-tracked in this repo)
  • Runtime: loaded as a pi extension from the same source tree
  • What it traces: Every gateway + interactive pi session LLM call
  • How: Hooks into pi session events (session_start, message_start, message_end, tool_call, tool_result, session_shutdown)
  • Dedup: globalThis.__langfuse_cost_loaded__ guard prevents duplicate extension instances
  • Optional dependency behavior: langfuse is lazily loaded (no top-level hard import). Missing module must disable telemetry, not crash extension import. Regression test: pi/extensions/langfuse-cost/index.test.ts
  • Runtime dependency location: because the extension is loaded from pi/extensions/ at repo root instead of a workspace package, the langfuse npm package must be available from the repo root package.json. If root install drift drops it, gateway/session telemetry silently degrades to the optional-dependency warning again.

2. System-bus OTEL bridge (langfuse.ts)

  • Source: packages/system-bus/src/lib/langfuse.ts
  • What it traces: All Inngest function LLM calls (reflect, triage, email cleanup, docs ingest)
  • How: @langfuse/otel LangfuseSpanProcessor + @langfuse/tracing startObservation()
  • Produces: joelclaw.inference traces with generation children

Current Trace Hierarchy (pi-session)

The langfuse-cost extension produces a 4-level nested span hierarchy:

joelclaw.session (trace)
  └── session (span) — entire session lifetime
        └── turn-1 (span) — user message → final assistant response
        │     ├── tool:bash (span) — individual tool execution
        │     ├── tool:read (span)
        │     └── llm.call (generation) — the LLM API call with usage/cost
        └── turn-2 (span)
              ├── tool:edit (span)
              ├── tool:bash (span)
              └── llm.call (generation)

What each level captures

Level Created on Ended on Contains
joelclaw.session trace session_start session_shutdown userId, sessionId, tags, turn count
session span session_start session_shutdown Channel, session type, turn count
turn-N span message_start[user] message_end[assistant] with text output User input (clean), sourceChannel metadata
tool:name span tool_call event tool_result event Tool input, output (truncated 500 chars)
llm.call generation message_end[assistant] immediate Model, usage, cache tokens, cost, I/O

Channel header stripping

User messages from Telegram arrive with a ---\nChannel:...\n--- header. The extension:

  1. Strips the header from trace input (clean user text only)
  2. Parses known keys (channel, date, platform_capabilities) into sourceChannel metadata
  3. Skips multi-line values (e.g. formatting_guide)

Credentials

Langfuse creds in agent-secrets:

  • langfuse_public_keypk-lf-cb8b...
  • langfuse_secret_keysk-lf-c86f...
  • langfuse_base_urlhttps://us.cloud.langfuse.com

Gateway gets them via gateway-start.sh env exports. System-bus resolves via env → secrets lease fallback.

Trace Conventions

Naming

  • Pi-session: joelclaw.session (trace) → sessionturn-Ntool:namellm.call
  • System-bus: joelclaw.inference (trace) → generation children

Required Attributes

Every trace MUST have:

  • userId: "joel"
  • sessionId — pi session ID for grouping
  • tags — minimum: ["joelclaw", "pi-session"]
  • Dynamic tags: provider:anthropic, model:anthropic/claude-opus-4-6, channel:central, session:central

Metadata Shape (flat, filterable)

{
  channel: "central",           // GATEWAY_ROLE env
  sessionType: "central",       // "gateway" | "interactive" | "codex" | "central"
  component: "pi-session",
  model: "anthropic/claude-opus-4-6",
  provider: "anthropic",
  stopReason: "toolUse",        // or "endTurn"
  turnCount: 5,                 // Updated on each turn
  sourceChannel: {              // Only on first user message per turn
    channel: "telegram",
    date: "...",
    platform_capabilities: "..."
  },
  tools: ["bash", "read"],      // Tool names used this turn
}

Generation usageDetails

{
  input: 1,                      // Non-cached input tokens
  output: 97,                    // Output tokens
  total: 68195,                  // Total tokens
  cache_read_input_tokens: 67877, // 90% discount
  cache_write_input_tokens: 220,  // 25% premium (NOT priced by Langfuse — known gap)
}

Pi session guardrails (alert-only)

Long-running pi sessions can dominate Langfuse spend. The extension now tracks per-session totals and emits warnings only on first threshold breach per guardrail type:

  • JOELCLAW_LANGFUSE_ALERT_MAX_LLM_CALLS (default: 120)
  • JOELCLAW_LANGFUSE_ALERT_MAX_TOTAL_TOKENS (default: 1200000)
  • JOELCLAW_LANGFUSE_ALERT_MAX_COST_USD (default: 20)

Behavior:

  • no automatic model switch
  • no forced compaction
  • no stop/interruption
  • emits console.warn(...) with session ID + current counters
  • records breach flags and first breach turn index in trace metadata (guardrails)

Model/provider normalization

Both the pi-session extension and system-bus Langfuse bridge normalize provider/model before writing tags, trace metadata, and generation model fields. This keeps provider:* + model:* tags aligned with metadata after model switches and for provider-prefixed IDs such as:

  • anthropic/claude-opus-4-6
  • openai-codex/gpt-5.4

Normalization is fail-open: tracing continues even if normalization cannot resolve a value.

Output-contract + usage-coverage signals (2026-03-02)

System-bus inference now emits explicit coverage/output-contract metadata so low-yield calls are queryable:

  • usageCoverage: "present"|"missing"
  • usageCaptured: boolean
  • jsonRequested, jsonParsed, outputChars
  • warning OTEL event: model_router.usage_missing

For strict machine-readable paths, callers can require output contracts:

  • requireJson: true — parse failure becomes inference failure
  • requireTextOutput: true — empty text becomes inference failure

Recall rewrite traces now include rewriteReason in addition to strategy (disabled|skipped|haiku|openai|fallback) to separate deliberate skips from failure fallbacks.

Known Gaps

Issue Severity Notes
cache_write_input_tokens not priced Medium Langfuse platform limitation — no cache write rate in their pricing table
No completionStartTime on first turn Low lastAssistantStartTime not set before first message_start[assistant]
tool_result matching Low Relies on toolCallId — if pi changes the field name, spans won't close

Debugging

Check recent traces

LF_PK=$(secrets lease langfuse_public_key --ttl 5m)
LF_SK=$(secrets lease langfuse_secret_key --ttl 5m)
curl -s -u "$LF_PK:$LF_SK" "https://us.cloud.langfuse.com/api/public/traces?limit=5" \
  | jq '[.data[] | {name, ts: .timestamp[:19], obs: (.observations | length), output: (.output // "" | tostring | .[0:60])}]'

Check nested observations on a trace

TRACE_ID="<id>"
curl -s -u "$LF_PK:$LF_SK" "https://us.cloud.langfuse.com/api/public/observations?traceId=$TRACE_ID" \
  | jq '[.data[] | {name, type, model, startTime: .startTime[:19], endTime: .endTime[:19]}]'

Common Issues

Symptom Cause Fix
Double traces Extension loaded twice via symlink/realpath split globalThis dedup guard (already fixed)
[toolUse] output instead of tool names tool_call events not firing Check pi version, verify toolName field on event
No traces at all Langfuse creds missing Check LANGFUSE_PUBLIC_KEY/LANGFUSE_SECRET_KEY env
channel:interactive on gateway GATEWAY_ROLE not set Must be in gateway-start.sh
Stale extension code Gateway/interactive session not reloaded after change Restart gateway and start a fresh interactive session
OTEL emit errors in gateway system-bus-worker port-forward down kubectl port-forward -n joelclaw svc/system-bus-worker 3111:3111

Key Files

  • Pi extension: pi/extensions/langfuse-cost/index.ts
  • Pi extension tests: pi/extensions/langfuse-cost/index.test.ts
  • System-bus bridge: packages/system-bus/src/lib/langfuse.ts
  • Gateway ops notes: docs/gateway.md

Deployment Workflow

After editing the pi extension:

  1. Commit changes in this repo (source of truth).
  2. Restart gateway so the updated extension is loaded.
  3. Start a new interactive pi session (or reload) so per-session tracing uses the new code.

ADRs

  • ADR-0146: Inference Cost Monitoring and Control — shipped
  • ADR-0147: Named Agent Profiles (trace attribution by role)
Weekly Installs
23
GitHub Stars
49
First Seen
Feb 27, 2026
Installed on
gemini-cli23
opencode23
codebuddy23
github-copilot23
codex23
kimi-cli23