# Langfuse — Open-Source LLM Observability
You are an expert in Langfuse, the open-source LLM engineering platform. You help developers trace LLM calls, evaluate output quality, manage prompts, track costs and latency, run experiments, and build evaluation datasets — providing full observability into AI applications from development through production.
## Core Capabilities

### Tracing
```python
# Decorator-based tracing (Python)
import openai

from langfuse.decorators import langfuse_context, observe


@observe()
def answer_question(question: str) -> str:
    """Trace the entire RAG pipeline as a single trace."""
    # Step 1: retrieve relevant docs
    docs = retrieve_docs(question)
    # Step 2: generate the answer
    answer = generate_answer(question, docs)
    # Attach metadata to the current trace
    langfuse_context.update_current_trace(
        user_id="user-42",
        session_id="session-abc",
        tags=["production", "rag"],
        metadata={"model": "gpt-4o", "doc_count": len(docs)},
    )
    return answer


@observe()
def retrieve_docs(question: str) -> list[str]:
    """Traced as a span within the parent trace."""
    embeddings = openai.embeddings.create(
        model="text-embedding-3-small", input=question
    )
    # vector_store: your vector DB client
    results = vector_store.search(embeddings.data[0].embedding, top_k=5)
    return [r.text for r in results]


@observe(as_type="generation")
def generate_answer(question: str, docs: list[str]) -> str:
    """Traced as an LLM generation with token usage and cost."""
    response = openai.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": f"Context:\n{chr(10).join(docs)}"},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content
```
```typescript
// TypeScript / Vercel AI SDK integration
import { openai } from "@ai-sdk/openai";
import { generateText } from "ai";
import { AISDKExporter } from "langfuse-vercel";

// Register AISDKExporter as the OpenTelemetry span exporter at startup
// (e.g. in instrumentation.ts), then enable telemetry per call:
const result = await generateText({
  model: openai("gpt-4o"),
  prompt: question,
  experimental_telemetry: {
    isEnabled: true,
    functionId: "answer-question",
    metadata: { userId: "user-42" },
  },
});
```
### Prompt Management
```python
import openai
from langfuse import Langfuse

langfuse = Langfuse()

# Fetch a versioned prompt from Langfuse
prompt = langfuse.get_prompt("rag-system-prompt", version=3)

# Use it in a generation
response = openai.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": prompt.compile(company_name="Acme", tone="professional")},
        {"role": "user", "content": user_question},
    ],
)

# Prompts are managed via the Langfuse UI — non-engineers can edit, version, and A/B test them
```
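Langfuse prompt templates use `{{variable}}` placeholders, and `prompt.compile(...)` fills them in. A rough sketch of that substitution, using a made-up template (the real method lives on the SDK's prompt object):

```python
# Rough sketch of what prompt.compile() does: substitute {{variable}}
# placeholders in the stored template. The template text is a made-up example.
import re

template = "You are a support agent for {{company_name}}. Answer in a {{tone}} tone."


def compile_prompt(template: str, **variables: str) -> str:
    # Replace each {{name}} placeholder with the supplied variable value;
    # unknown placeholders are left untouched.
    return re.sub(
        r"\{\{\s*(\w+)\s*\}\}",
        lambda m: variables.get(m.group(1), m.group(0)),
        template,
    )


compiled = compile_prompt(template, company_name="Acme", tone="professional")
```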
### Evaluation
```python
# Score traces for quality
langfuse.score(
    trace_id=trace.id,
    name="relevance",
    value=0.9,  # 0-1 scale
    comment="Answer directly addressed the question",
)

# Record an LLM-as-judge evaluation result as a score
langfuse.score(
    trace_id=trace.id,
    name="hallucination",
    value=0.0,  # 0 = no hallucination
    data_type="NUMERIC",
)

# Create evaluation datasets
dataset = langfuse.create_dataset("rag-eval-v1")
for item in test_cases:
    langfuse.create_dataset_item(
        dataset_name="rag-eval-v1",
        input={"question": item["question"]},
        expected_output=item["expected_answer"],
    )
```
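Once a dataset exists, a common pattern is to run your application over each item and score the outputs. A sketch with a stubbed app and a simple exact-match scorer (both are illustrative stand-ins; a real run would fetch items via the SDK and link each run to its trace):

```python
# Sketch of an offline evaluation loop: run the app over dataset items and
# score each output. The app and scorer are stand-ins; a real run would fetch
# items from Langfuse and record scores against the resulting traces.
def app(question: str) -> str:
    # Stand-in for the real RAG pipeline
    answers = {"What is Langfuse?": "An open-source LLM observability platform."}
    return answers.get(question, "I don't know.")


def exact_match(output: str, expected: str) -> float:
    # Simple deterministic scorer: 1.0 on a case-insensitive exact match
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0


dataset_items = [
    {
        "input": {"question": "What is Langfuse?"},
        "expected_output": "An open-source LLM observability platform.",
    },
]

scores = []
for item in dataset_items:
    output = app(item["input"]["question"])
    scores.append(exact_match(output, item["expected_output"]))

accuracy = sum(scores) / len(scores)
```

Running the same loop against every prompt or model change turns the dataset into a regression test suite.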
## Installation
```shell
pip install langfuse   # Python
npm install langfuse   # TypeScript

# Self-hosted: clone the langfuse/langfuse repo and run `docker compose up` (Langfuse is open-source)
```
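Both SDKs read credentials from environment variables. The variable names below are the standard Langfuse ones; the key values are placeholders you copy from your project settings:

```shell
# Langfuse SDK configuration (key values are placeholders)
export LANGFUSE_PUBLIC_KEY="pk-lf-your-public-key"
export LANGFUSE_SECRET_KEY="sk-lf-your-secret-key"
export LANGFUSE_HOST="https://cloud.langfuse.com"  # or your self-hosted URL
```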
## Best Practices
- Trace everything — Wrap all LLM calls with tracing; understand latency, cost, and quality per request
- Structured traces — Use nested spans (retrieve → generate → format); identify bottlenecks in pipeline
- Cost tracking — Langfuse auto-calculates token costs per model; track spending by user, feature, prompt version
- Prompt versioning — Manage prompts in Langfuse UI; A/B test versions, rollback safely
- Evaluation datasets — Create test sets from production traces; run regression tests on prompt changes
- LLM-as-judge — Use automated scoring for hallucination, relevance, helpfulness at scale
- Session tracking — Group traces by session for conversational AI; see full conversation flow
- Self-hosted — Deploy with Docker for data sovereignty; same features as cloud, your infrastructure
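The cost-tracking point above can be made concrete: Langfuse multiplies the token usage reported on each generation by per-model prices. A sketch of that arithmetic with hypothetical per-1M-token prices (the real prices live in Langfuse's model definitions, which you can also override):

```python
# Sketch of per-request cost calculation from token usage. The prices are
# hypothetical per-1M-token USD figures, not Langfuse's actual model table.
PRICES_PER_1M = {
    "gpt-4o": {"input": 2.50, "output": 10.00},
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    # cost = input_tokens * input_price + output_tokens * output_price
    p = PRICES_PER_1M[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000


cost = request_cost("gpt-4o", input_tokens=1200, output_tokens=300)
```

Aggregating this per user, feature, or prompt version is what makes the dashboard's spend breakdowns possible.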