# Build AI That Takes Actions (`ai-taking-actions`)

Guide the user through building AI that reasons and takes actions — calling APIs, using tools, and completing multi-step tasks. Uses DSPy's ReAct and CodeAct agent modules.
## Step 1: Understand the use case
Ask the user:
- What should the AI do? (answer questions, call APIs, perform calculations, search, etc.)
- What tools does it need? (calculator, search, database, APIs, file system, etc.)
- How many steps might it take? (simple tool call vs. multi-step reasoning)
## Step 2: Define tools
Tools are Python functions with type hints and docstrings. DSPy uses these to tell the AI what's available:

```python
import dspy

def search(query: str) -> str:
    """Search the web for information."""
    # Your search implementation
    return "search results..."

def calculate(expression: str) -> float:
    """Evaluate a mathematical expression."""
    return dspy.PythonInterpreter({}).execute(expression)

def lookup_database(table: str, query: str) -> str:
    """Query the database for records matching the query."""
    # Your database logic
    return "query results..."
```
Tool requirements:

- Type hints on all parameters and the return type
- A docstring explaining what the tool does
- Return a string (or something that converts cleanly to a string)
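These requirements exist because DSPy builds the tool schema through standard Python introspection. A minimal sketch (plain stdlib, no DSPy required) of what a framework can recover from a function like `search`:

```python
import inspect
from typing import get_type_hints

def search(query: str) -> str:
    """Search the web for information."""
    return "search results..."

# What an agent framework can recover from the function alone:
hints = get_type_hints(search)                       # parameter and return types
doc = inspect.getdoc(search)                         # the docstring shown to the AI
params = list(inspect.signature(search).parameters)  # parameter names
print(hints, doc, params)
```

If a hint or docstring is missing, there is simply less information for the AI to decide when and how to call the tool.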
## Step 3: Build the AI

### ReAct (Reasoning + Acting) — start here

The standard choice. Alternates between thinking and acting:
```python
import dspy

agent = dspy.ReAct(
    "question -> answer",
    tools=[search, calculate],
    max_iters=5,  # max steps before stopping
)

result = agent(question="What is the population of France divided by 3?")
print(result.answer)
```
### CodeAct — for code-heavy tasks

For tasks where writing and executing code is more natural:
```python
agent = dspy.CodeAct(
    "question -> answer",
    tools=[search, calculate],
    max_iters=5,
)

result = agent(question="Calculate the compound interest on $1000 at 5% for 10 years")
print(result.answer)
```
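For reference, the compound-interest question above reduces to a couple of lines of plain Python, roughly what a CodeAct agent would generate and execute (the figures come from the prompt, not from DSPy):

```python
# Compound interest: amount = principal * (1 + rate) ** years
principal = 1000.0
rate = 0.05
years = 10

amount = principal * (1 + rate) ** years
interest = amount - principal
print(round(interest, 2))  # 628.89
```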
### Custom AI with state

```python
class ResearchBot(dspy.Module):
    def __init__(self):
        super().__init__()
        self.agent = dspy.ReAct(
            "question, context -> answer",
            tools=[search, lookup_database],
            max_iters=8,
        )

    def forward(self, question):
        # Add initial context or pre-processing
        context = "Use search for general questions, database for specific records."
        return self.agent(question=question, context=context)
```
## Step 4: Test the quality

```python
def action_metric(example, prediction, trace=None):
    # Check if the final answer is correct
    return prediction.answer.strip().lower() == example.answer.strip().lower()
```

For open-ended tasks, use an AI judge:

```python
class JudgeResult(dspy.Signature):
    """Judge if the AI's answer correctly addresses the question."""

    question: str = dspy.InputField()
    expected: str = dspy.InputField()
    actual: str = dspy.InputField()
    is_correct: bool = dspy.OutputField()

def judge_metric(example, prediction, trace=None):
    judge = dspy.Predict(JudgeResult)
    result = judge(
        question=example.question,
        expected=example.answer,
        actual=prediction.answer,
    )
    return result.is_correct
```
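Exact string equality is brittle for free-text answers. A common refinement, sketched here in plain Python with no DSPy dependency, normalizes case, punctuation, and articles before comparing:

```python
import re
import string

def normalize(text: str) -> str:
    """Lowercase, strip punctuation, drop articles, collapse whitespace."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def normalized_metric(example, prediction, trace=None):
    # Drop-in replacement for action_metric, tolerant of surface differences
    return normalize(example.answer) == normalize(prediction.answer)

print(normalize("The Eiffel Tower."))  # eiffel tower
```

This catches answers that differ only in formatting ("The Eiffel Tower." vs "eiffel tower") without needing an AI judge.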
## Step 5: Improve accuracy

```python
# Optimize the AI's reasoning prompts
optimizer = dspy.BootstrapFewShot(metric=action_metric, max_bootstrapped_demos=4)
optimized = optimizer.compile(agent, trainset=trainset)
```
For action-taking AI, MIPROv2 often works better since it can optimize the reasoning instructions:
```python
optimizer = dspy.MIPROv2(metric=action_metric, auto="medium")
optimized = optimizer.compile(agent, trainset=trainset)
```
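Measure accuracy the same way before and after optimizing so the comparison is fair. The loop itself is framework-free; this sketch uses a hypothetical `agent_fn` and dict-based examples just to show its shape:

```python
def evaluate(agent_fn, examples, metric):
    """Return the fraction of examples where metric(...) is truthy."""
    correct = sum(bool(metric(ex, agent_fn(ex))) for ex in examples)
    return correct / len(examples)

# Toy stand-ins to show the shape of the loop:
examples = [{"question": "2+2?", "answer": "4"}, {"question": "3+3?", "answer": "6"}]
fake_agent = lambda ex: {"answer": "4"}
metric = lambda ex, pred: ex["answer"] == pred["answer"]
print(evaluate(fake_agent, examples, metric))  # 0.5
```

In practice, DSPy's `dspy.Evaluate` utility plays this role with your real agent, trainset, and metric.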
## Using LangChain tools
LangChain has 100+ pre-built tools (search engines, Wikipedia, SQL databases, web scrapers, etc.). Convert any of them to DSPy tools with one line:
```python
import dspy
from langchain_community.tools import DuckDuckGoSearchRun, WikipediaQueryRun
from langchain_community.utilities import WikipediaAPIWrapper

# Convert LangChain tools to DSPy tools
search = dspy.Tool.from_langchain(DuckDuckGoSearchRun())
wikipedia = dspy.Tool.from_langchain(WikipediaQueryRun(api_wrapper=WikipediaAPIWrapper()))

# Use in any DSPy agent
agent = dspy.ReAct(
    "question -> answer",
    tools=[search, wikipedia],
    max_iters=5,
)
```
When to use LangChain tools vs writing your own:
| Use LangChain tools when... | Write your own when... |
|---|---|
| There's an existing tool for it (search, Wikipedia, SQL) | You need custom business logic |
| You want quick prototyping | You need tight error handling |
| The tool wraps a standard API | You're wrapping an internal API |
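The "tight error handling" column is the main reason to hand-write a tool: returning failures as strings lets the agent read the error and correct itself on the next step. A sketch, with a hypothetical internal endpoint stubbed out:

```python
def lookup_order(order_id: str) -> str:
    """Fetch an order's status from the internal orders API."""
    if not order_id.isdigit():
        # Validate before calling out; the agent can self-correct from this message
        return f"Error: order_id must be numeric, got {order_id!r}"
    try:
        # The real implementation would call the internal API, e.g.:
        #   resp = requests.get(f"https://orders.internal/api/{order_id}", timeout=5)
        #   resp.raise_for_status()
        #   return resp.json()["status"]
        return "shipped"  # stub standing in for the real call
    except Exception as exc:
        # Surface failures as strings instead of raising, so the agent loop continues
        return f"Error: {exc}"

print(lookup_order("abc"))    # Error: order_id must be numeric, got 'abc'
print(lookup_order("12345"))  # shipped
```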
Install the tools you need:

```shell
pip install langchain-community  # DuckDuckGo, Wikipedia, requests, etc.
```
For the full LangChain/LangGraph API reference, see `docs/langchain-langgraph-reference.md`.
## Key patterns
- Start with ReAct — it's the most general-purpose action module
- Keep tools simple — each tool should do one thing well
- Set `max_iters` to prevent infinite loops (the default is usually fine)
- Use descriptive docstrings — the AI uses them to decide when to call each tool
- Test without optimization first — action AI often works well zero-shot
- Add assertions for safety — use `dspy.Assert` to prevent dangerous tool calls
## Additional resources

- For worked examples (calculator, search, APIs), see `examples.md`
- Need multiple agents working together (not just one)? Use `/ai-coordinating-agents`
- Next: `/ai-improving-accuracy` to measure and improve your AI
- Install `/ai-do` if you do not have it — it routes any AI problem to the right skill and is the fastest way to work: `npx skills add lebsral/DSPy-Programming-not-prompting-LMs-skills --skill ai-do`