langfuse
Langfuse Observability
Langfuse is the LLM observability layer for joelclaw. Every LLM call produces a Langfuse trace with nested hierarchy, I/O, usage, cost, and attribution.
Architecture
joelclaw has two Langfuse integration points:
1. Pi-session extension (langfuse-cost)
- Source: pi/extensions/langfuse-cost/index.ts (canonical, git-tracked in this repo)
- Runtime: loaded as a pi extension from the same source tree
- What it traces: Every gateway + interactive pi session LLM call
- How: Hooks into pi session events (session_start, message_start, message_end, tool_call, tool_result, session_shutdown)
- Dedup: globalThis.__langfuse_cost_loaded__ guard prevents duplicate extension instances
- Optional dependency behavior: langfuse is lazily loaded (no top-level hard import). Missing module must disable telemetry, not crash extension import. Regression test: pi/extensions/langfuse-cost/index.test.ts
- Runtime dependency location: because the extension is loaded from pi/extensions/ at repo root instead of a workspace package, the langfuse npm package must be available from the repo root package.json. If root install drift drops it, gateway/session telemetry silently degrades to the optional-dependency warning again.
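The dedup guard and the lazy optional import described above could look roughly like the sketch below. Only the `__langfuse_cost_loaded__` flag name comes from this document; `claimExtensionSlot`, `loadOptional`, and the injectable `importer` and `warn` parameters are hypothetical names chosen for illustration, and the real extension may be structured differently.

```typescript
// Hypothetical sketch: only the __langfuse_cost_loaded__ flag name is from the doc.
type Importer = (specifier: string) => Promise<unknown>;

const GUARD_KEY = "__langfuse_cost_loaded__";

/** Returns true the first time it is called for a given scope, false afterwards. */
function claimExtensionSlot(
  scope: Record<string, unknown> = globalThis as unknown as Record<string, unknown>,
): boolean {
  if (scope[GUARD_KEY]) return false; // another instance already registered
  scope[GUARD_KEY] = true;
  return true;
}

/**
 * Lazily loads an optional module. A missing module resolves to null so the
 * caller can disable telemetry instead of crashing at import time.
 */
async function loadOptional<T = unknown>(
  specifier: string,
  importer: Importer = (s) => import(s), // default: dynamic import, no top-level import
  warn: (msg: string) => void = () => {},
): Promise<T | null> {
  try {
    return (await importer(specifier)) as T;
  } catch {
    warn(`[langfuse-cost] optional dependency "${specifier}" not found; telemetry disabled`);
    return null;
  }
}
```

An extension entry point would call `claimExtensionSlot()` first and return early on `false`, then call `loadOptional("langfuse")` and skip all tracing when it resolves to `null`.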
2. System-bus OTEL bridge (langfuse.ts)
- Source: packages/system-bus/src/lib/langfuse.ts
- What it traces: All Inngest function LLM calls (reflect, triage, email cleanup, docs ingest)
- How: @langfuse/otel LangfuseSpanProcessor + @langfuse/tracing startObservation()
- Produces: joelclaw.inference traces with generation children
Current Trace Hierarchy (pi-session)
The langfuse-cost extension produces a 4-level nested span hierarchy:
joelclaw.session (trace)
└── session (span): entire session lifetime
    ├── turn-1 (span): user message → final assistant response
    │   ├── tool:bash (span): individual tool execution
    │   ├── tool:read (span)
    │   └── llm.call (generation): the LLM API call with usage/cost
    └── turn-2 (span)
        ├── tool:edit (span)
        ├── tool:bash (span)
        └── llm.call (generation)
What each level captures
| Level | Created on | Ended on | Contains |
|---|---|---|---|
| joelclaw.session trace | session_start | session_shutdown | userId, sessionId, tags, turn count |
| session span | session_start | session_shutdown | Channel, session type, turn count |
| turn-N span | message_start[user] | message_end[assistant] with text output | User input (clean), sourceChannel metadata |
| tool:name span | tool_call event | tool_result event | Tool input, output (truncated 500 chars) |
| llm.call generation | message_end[assistant] | immediate | Model, usage, cache tokens, cost, I/O |
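The tool span lifecycle in the table (opened on tool_call, closed on tool_result, output truncated to 500 characters) can be sketched with an in-memory tracker. The event shapes and the `ToolSpanTracker` class are assumptions for illustration; this document only confirms that the events carry `toolCallId` and `toolName`. The sketch records plain objects, not real Langfuse spans.

```typescript
// Hypothetical event shapes; the doc only confirms toolCallId and toolName.
interface ToolCallEvent { toolCallId: string; toolName: string; input: unknown }
interface ToolResultEvent { toolCallId: string; output: string }

interface ToolSpan {
  name: string;     // "tool:<toolName>"
  input: unknown;
  output?: string;  // truncated to MAX_OUTPUT_CHARS
  startedAt: number;
  endedAt?: number;
}

const MAX_OUTPUT_CHARS = 500;

class ToolSpanTracker {
  private open = new Map<string, ToolSpan>();
  readonly closed: ToolSpan[] = [];

  /** Opens a span keyed by toolCallId. */
  onToolCall(e: ToolCallEvent, now: number = Date.now()): void {
    this.open.set(e.toolCallId, { name: `tool:${e.toolName}`, input: e.input, startedAt: now });
  }

  /** Closes the matching span. Returns false when no open span has that ID. */
  onToolResult(e: ToolResultEvent, now: number = Date.now()): boolean {
    const span = this.open.get(e.toolCallId);
    if (!span) return false;
    span.output = e.output.slice(0, MAX_OUTPUT_CHARS);
    span.endedAt = now;
    this.open.delete(e.toolCallId);
    this.closed.push(span);
    return true;
  }
}
```

A `false` return from `onToolResult` corresponds to the failure mode where a renamed ID field leaves spans open.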
Channel header stripping
User messages from Telegram arrive with a ---\nChannel:...\n--- header. The extension:
- Strips the header from trace input (clean user text only)
- Parses known keys (channel, date, platform_capabilities) into sourceChannel metadata
- Skips multi-line values (e.g. formatting_guide)
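A minimal sketch of the header handling described above. The exact header layout is an assumption: it treats the header as `key: value` lines between two `---` delimiters, with keys matched case-insensitively. `stripChannelHeader` and `StrippedMessage` are hypothetical names, not the extension's actual API.

```typescript
// Assumed layout: "---\n<key: value lines>\n---\n<user text>".
const KNOWN_KEYS = new Set(["channel", "date", "platform_capabilities"]);

interface StrippedMessage {
  text: string;                           // clean user text for trace input
  sourceChannel?: Record<string, string>; // only set when a header was found
}

function stripChannelHeader(raw: string): StrippedMessage {
  const match = /^---\n([\s\S]*?)\n---\n?/.exec(raw);
  const header = match?.[1];
  // No header, or a "---" block that does not start with a Channel key: leave as-is.
  if (!match || header === undefined || !/^channel:/i.test(header)) return { text: raw };

  const sourceChannel: Record<string, string> = {};
  for (const line of header.split("\n")) {
    const kv = /^([A-Za-z_]+):\s*(\S.*)$/.exec(line);
    if (!kv) continue; // blank lines and continuation lines of multi-line values
    const key = (kv[1] ?? "").toLowerCase();
    // Unknown keys (such as formatting_guide) are dropped.
    if (KNOWN_KEYS.has(key)) sourceChannel[key] = (kv[2] ?? "").trim();
  }
  return { text: raw.slice(match[0].length).trim(), sourceChannel };
}
```

Messages without a header pass through unchanged, so the same function can run on interactive input.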
Credentials
Langfuse creds in agent-secrets:
- langfuse_public_key: pk-lf-cb8b...
- langfuse_secret_key: sk-lf-c86f...
- langfuse_base_url: https://us.cloud.langfuse.com
Gateway gets them via gateway-start.sh env exports. System-bus resolves via env → secrets lease fallback.
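The env → secrets lease fallback can be sketched as below. `resolveCredential` and the injected `lease` callback are hypothetical; in practice the callback would wrap the `secrets lease <name> --ttl 5m` command shown under Debugging. How system-bus actually performs the lease is not documented here.

```typescript
// Hypothetical helper: env first, then a leased secret, else undefined.
type Lease = (secretName: string) => Promise<string>;

async function resolveCredential(
  envName: string,                          // e.g. "LANGFUSE_PUBLIC_KEY"
  secretName: string,                       // e.g. "langfuse_public_key"
  env: Record<string, string | undefined>,
  lease: Lease,
): Promise<string | undefined> {
  const fromEnv = env[envName]?.trim();
  if (fromEnv) return fromEnv;              // env wins when set and non-empty
  try {
    const leased = (await lease(secretName)).trim();
    return leased || undefined;
  } catch {
    return undefined;                       // caller treats this as "telemetry disabled"
  }
}
```

Returning `undefined` instead of throwing keeps credential problems from failing the inference call itself.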
Trace Conventions
Naming
- Pi-session: joelclaw.session (trace) → session → turn-N → tool:name → llm.call
- System-bus: joelclaw.inference (trace) → generation children
Required Attributes
Every trace MUST have:
- userId: "joel"
- sessionId: pi session ID for grouping
- tags: minimum ["joelclaw", "pi-session"]
- Dynamic tags: provider:anthropic, model:anthropic/claude-opus-4-6, channel:central, session:central
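One way to assemble the tag list from the required minimum plus the dynamic tags is sketched here. `buildTraceTags` and `TagInput` are hypothetical names; the tag formats are taken from the examples above.

```typescript
// Hypothetical helper that builds the tag array for a pi-session trace.
interface TagInput {
  provider?: string;
  model?: string;
  channel?: string;
  session?: string;
}

function buildTraceTags(input: TagInput): string[] {
  const tags = ["joelclaw", "pi-session"]; // required minimum
  for (const key of ["provider", "model", "channel", "session"] as const) {
    const value = input[key];
    if (value) tags.push(`${key}:${value}`); // skip unknown or empty values
  }
  return tags;
}
```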
Metadata Shape (flat, filterable)
{
  channel: "central",            // GATEWAY_ROLE env
  sessionType: "central",        // "gateway" | "interactive" | "codex" | "central"
  component: "pi-session",
  model: "anthropic/claude-opus-4-6",
  provider: "anthropic",
  stopReason: "toolUse",         // or "endTurn"
  turnCount: 5,                  // Updated on each turn
  sourceChannel: {               // Only on first user message per turn
    channel: "telegram",
    date: "...",
    platform_capabilities: "..."
  },
  tools: ["bash", "read"],       // Tool names used this turn
}
Generation usageDetails
{
  input: 1,                         // Non-cached input tokens
  output: 97,                       // Output tokens
  total: 68195,                     // Total tokens
  cache_read_input_tokens: 67877,   // 90% discount
  cache_write_input_tokens: 220,    // 25% premium (NOT priced by Langfuse; known gap)
}
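The figures above are internally consistent: 1 + 97 + 67877 + 220 = 68195. The sketch below checks that sum and estimates cost, assuming the cache discount and premium apply to the base input price. `Rates`, `sumTokens`, and `estimateCostUsd` are hypothetical, and the per-token prices are placeholders to be replaced with the real rates for the model.

```typescript
interface UsageDetails {
  input: number;
  output: number;
  total: number;
  cache_read_input_tokens: number;
  cache_write_input_tokens: number;
}

// Hypothetical per-token prices in USD; not taken from any real price list.
interface Rates { inputPerToken: number; outputPerToken: number }

/** Sum of all token buckets; should equal usage.total. */
function sumTokens(u: UsageDetails): number {
  return u.input + u.output + u.cache_read_input_tokens + u.cache_write_input_tokens;
}

/** Cost estimate that includes cache writes, which the doc says Langfuse leaves unpriced. */
function estimateCostUsd(u: UsageDetails, r: Rates): number {
  const cacheRead = u.cache_read_input_tokens * r.inputPerToken * 0.1;    // 90% discount
  const cacheWrite = u.cache_write_input_tokens * r.inputPerToken * 1.25; // 25% premium
  return u.input * r.inputPerToken + u.output * r.outputPerToken + cacheRead + cacheWrite;
}
```

Comparing this estimate with the cost Langfuse reports gives a rough size for the cache-write pricing gap on a given trace.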
Pi session guardrails (alert-only)
Long-running pi sessions can dominate Langfuse spend. The extension now tracks per-session totals and emits warnings only on first threshold breach per guardrail type:
- JOELCLAW_LANGFUSE_ALERT_MAX_LLM_CALLS (default: 120)
- JOELCLAW_LANGFUSE_ALERT_MAX_TOTAL_TOKENS (default: 1200000)
- JOELCLAW_LANGFUSE_ALERT_MAX_COST_USD (default: 20)

Behavior:
- no automatic model switch
- no forced compaction
- no stop/interruption
- emits console.warn(...) with session ID + current counters
- records breach flags and first breach turn index in trace metadata (guardrails)
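The alert-only behavior can be sketched as a small per-session counter. The environment variable names and defaults come from the list above; `readThresholds`, `SessionGuardrails`, the injected `warn` callback, and the choice to treat a breach as "total exceeds the limit" are assumptions for illustration.

```typescript
type GuardrailKind = "llmCalls" | "totalTokens" | "costUsd";
type Counters = Record<GuardrailKind, number>;

/** Reads thresholds from an env-like object, falling back to the documented defaults. */
function readThresholds(env: Record<string, string | undefined>): Counters {
  const num = (name: string, fallback: number): number => {
    const parsed = Number(env[name]);
    return Number.isFinite(parsed) && parsed > 0 ? parsed : fallback;
  };
  return {
    llmCalls: num("JOELCLAW_LANGFUSE_ALERT_MAX_LLM_CALLS", 120),
    totalTokens: num("JOELCLAW_LANGFUSE_ALERT_MAX_TOTAL_TOKENS", 1_200_000),
    costUsd: num("JOELCLAW_LANGFUSE_ALERT_MAX_COST_USD", 20),
  };
}

class SessionGuardrails {
  private totals: Counters = { llmCalls: 0, totalTokens: 0, costUsd: 0 };
  /** Turn index of the first breach per guardrail type, for trace metadata. */
  readonly breaches: Partial<Counters> = {};
  private sessionId: string;
  private limits: Counters;
  private warn: (msg: string) => void;

  constructor(sessionId: string, limits: Counters, warn: (msg: string) => void) {
    this.sessionId = sessionId;
    this.limits = limits;
    this.warn = warn; // e.g. (m) => console.warn(m)
  }

  /** Records one LLM call. Warns only on the first breach of each guardrail; never interrupts. */
  recordCall(turnIndex: number, tokens: number, costUsd: number): void {
    this.totals.llmCalls += 1;
    this.totals.totalTokens += tokens;
    this.totals.costUsd += costUsd;
    for (const kind of ["llmCalls", "totalTokens", "costUsd"] as const) {
      if (this.breaches[kind] !== undefined) continue; // already warned for this type
      if (this.totals[kind] > this.limits[kind]) {
        this.breaches[kind] = turnIndex;
        this.warn(`[langfuse-cost] ${kind} guardrail breached session=${this.sessionId} ${JSON.stringify(this.totals)}`);
      }
    }
  }
}
```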
Model/provider normalization
Both the pi-session extension and system-bus Langfuse bridge normalize provider/model before writing tags, trace metadata, and generation model fields. This keeps provider:* + model:* tags aligned with metadata after model switches and for provider-prefixed IDs such as:
- anthropic/claude-opus-4-6
- openai-codex/gpt-5.4
Normalization is fail-open: tracing continues even if normalization cannot resolve a value.
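A fail-open normalizer might look like the sketch below. The rules are assumptions, not the documented implementation: a provider prefix in the model ID takes precedence, and a bare model ID is prefixed with the provider when one is known. `normalizeModelRef` is a hypothetical name.

```typescript
interface ModelRef { provider?: string; model?: string }

function normalizeModelRef(rawProvider?: string, rawModel?: string): ModelRef {
  try {
    const model = rawModel?.trim() || undefined;
    const provider = rawProvider?.trim().toLowerCase() || undefined;
    if (model && model.includes("/")) {
      // Provider-prefixed ID such as "anthropic/claude-opus-4-6": the prefix wins.
      return { provider: model.slice(0, model.indexOf("/")).toLowerCase(), model };
    }
    // Bare model ID: prefix it with the provider when one is known.
    return { provider, model: model && provider ? `${provider}/${model}` : model };
  } catch {
    // Fail-open: hand back the raw values so tracing can continue.
    return { provider: rawProvider, model: rawModel };
  }
}
```

Both the tags and the metadata would then be derived from the same `ModelRef`, which is what keeps them aligned after a model switch.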
Output-contract + usage-coverage signals (2026-03-02)
System-bus inference now emits explicit coverage/output-contract metadata so low-yield calls are queryable:
- usageCoverage: "present" | "missing"
- usageCaptured: boolean
- jsonRequested, jsonParsed, outputChars
- warning OTEL event: model_router.usage_missing

For strict machine-readable paths, callers can require output contracts:
- requireJson: true (parse failure becomes inference failure)
- requireTextOutput: true (empty text becomes inference failure)
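The two output contracts could be enforced along these lines. `checkOutputContract`, `OutputContractError`, and the mapping of `requireJson` onto the `jsonRequested` signal are assumptions for illustration; the system-bus implementation may record these fields differently.

```typescript
interface InferenceOptions { requireJson?: boolean; requireTextOutput?: boolean }

interface OutputSignals {
  jsonRequested: boolean;
  jsonParsed: boolean;
  outputChars: number;
}

class OutputContractError extends Error {}

/** Returns metadata signals, or throws when a required contract is violated. */
function checkOutputContract(text: string, opts: InferenceOptions): OutputSignals {
  const signals: OutputSignals = {
    jsonRequested: Boolean(opts.requireJson),
    jsonParsed: false,
    outputChars: text.length,
  };
  if (opts.requireTextOutput && text.trim() === "") {
    throw new OutputContractError("requireTextOutput: model returned empty text");
  }
  if (opts.requireJson) {
    try {
      JSON.parse(text);
      signals.jsonParsed = true;
    } catch {
      throw new OutputContractError("requireJson: output is not valid JSON");
    }
  }
  return signals;
}
```

Without either flag the function never throws, so low-yield calls stay queryable through the signals instead of failing.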
Recall rewrite traces now include rewriteReason in addition to strategy (disabled|skipped|haiku|openai|fallback) to separate deliberate skips from failure fallbacks.
Known Gaps
| Issue | Severity | Notes |
|---|---|---|
| cache_write_input_tokens not priced | Medium | Langfuse platform limitation: no cache write rate in their pricing table |
| No completionStartTime on first turn | Low | lastAssistantStartTime not set before first message_start[assistant] |
| tool_result matching | Low | Relies on toolCallId; if pi changes the field name, spans won't close |
Debugging
Check recent traces
LF_PK=$(secrets lease langfuse_public_key --ttl 5m)
LF_SK=$(secrets lease langfuse_secret_key --ttl 5m)
curl -s -u "$LF_PK:$LF_SK" "https://us.cloud.langfuse.com/api/public/traces?limit=5" \
| jq '[.data[] | {name, ts: .timestamp[:19], obs: (.observations | length), output: (.output // "" | tostring | .[0:60])}]'
Check nested observations on a trace
TRACE_ID="<id>"
curl -s -u "$LF_PK:$LF_SK" "https://us.cloud.langfuse.com/api/public/observations?traceId=$TRACE_ID" \
| jq '[.data[] | {name, type, model, startTime: .startTime[:19], endTime: .endTime[:19]}]'
Common Issues
| Symptom | Cause | Fix |
|---|---|---|
| Double traces | Extension loaded twice via symlink/realpath split | globalThis dedup guard (already fixed) |
| [toolUse] output instead of tool names | tool_call events not firing | Check pi version, verify toolName field on event |
| No traces at all | Langfuse creds missing | Check LANGFUSE_PUBLIC_KEY/LANGFUSE_SECRET_KEY env |
| channel:interactive on gateway | GATEWAY_ROLE not set | Must be in gateway-start.sh |
| Stale extension code | Gateway/interactive session not reloaded after change | Restart gateway and start a fresh interactive session |
| OTEL emit errors in gateway | system-bus-worker port-forward down | kubectl port-forward -n joelclaw svc/system-bus-worker 3111:3111 |
Key Files
- Pi extension: pi/extensions/langfuse-cost/index.ts
- Pi extension tests: pi/extensions/langfuse-cost/index.test.ts
- System-bus bridge: packages/system-bus/src/lib/langfuse.ts
- Gateway ops notes: docs/gateway.md
Deployment Workflow
After editing the pi extension:
- Commit changes in this repo (source of truth).
- Restart gateway so the updated extension is loaded.
- Start a new interactive pi session (or reload) so per-session tracing uses the new code.
ADRs
- ADR-0146: Inference Cost Monitoring and Control (shipped)
- ADR-0147: Named Agent Profiles (trace attribution by role)