RLM CLI

Recursive Language Models (RLM) CLI - enables LLMs to handle near-infinite context by recursively decomposing inputs and calling themselves over parts. Supports files, directories, URLs, and stdin.

Installation

pip install rlm-cli    # or: pipx install rlm-cli
uvx rlm-cli ask ...    # run without installing

Set an API key for your backend (OpenRouter is the default):

export OPENROUTER_API_KEY=...  # default backend
export OPENAI_API_KEY=...      # for --backend openai
export ANTHROPIC_API_KEY=...   # for --backend anthropic

Commands

ask - Query with context

rlm ask <inputs> -q "question"

Inputs (combinable):

Type       Example                             Notes
Directory  rlm ask . -q "..."                  Recursive, respects .gitignore
File       rlm ask main.py -q "..."            Single file
URL        rlm ask https://x.com -q "..."      Auto-converts to markdown
stdin      git diff | rlm ask - -q "..."       "-" reads from pipe
Literal    rlm ask "text" -q "..." --literal   Treat as raw text
Multiple   rlm ask a.py b.py -q "..."          Combine any types

Options:

Flag                  Description
-q "..."              Question/prompt (required)
--backend             Provider: openrouter (default), openai, anthropic
--model NAME          Model override (format: provider/model or just model)
--json                Machine-readable output
--output-format       Output format: text, json, or json-tree
--summary             Show execution summary with depth statistics
--extensions .py .ts  Filter by extension
--include/--exclude   Glob patterns
--max-iterations N    Limit REPL iterations (default: 30)
--max-depth N         Recursive RLM depth (default: 1 = no recursion)
--max-budget N.NN     Spending limit in USD (requires OpenRouter)
--max-timeout N       Time limit in seconds
--max-tokens N        Total token limit (input + output)
--max-errors N        Consecutive error limit before stopping
--no-index            Skip auto-indexing
--exa                 Enable Exa web search (requires EXA_API_KEY)
--inject-file FILE    Execute Python code between iterations
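
When driving rlm from a script, it can help to assemble the argument list programmatically instead of concatenating shell strings. The following is a minimal sketch that uses only flags listed in the table above; the helper name `build_ask_argv` and its keyword names are illustrative assumptions and are not part of rlm-cli.

```python
from typing import List, Optional, Sequence


def build_ask_argv(
    inputs: Sequence[str],
    question: str,
    *,
    backend: Optional[str] = None,
    max_depth: Optional[int] = None,
    max_budget: Optional[float] = None,
    json_output: bool = False,
) -> List[str]:
    """Assemble an `rlm ask` argument list (hypothetical helper)."""
    if not question:
        raise ValueError("a question (-q) is required")
    argv = ["rlm", "ask", *inputs, "-q", question]
    if backend is not None:
        argv += ["--backend", backend]
    if max_depth is not None:
        argv += ["--max-depth", str(max_depth)]
    if max_budget is not None:
        argv += ["--max-budget", str(max_budget)]
    if json_output:
        argv.append("--json")
    return argv


argv = build_ask_argv(
    ["src/"], "How does auth work?", max_depth=2, max_budget=0.5, json_output=True
)
print(argv)
```

The resulting list can be passed to `subprocess.run(argv, ...)` without shell quoting concerns.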

JSON output structure:

{"ok": true, "exit_code": 0, "result": {"response": "..."}, "stats": {...}}
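
A caller consuming this envelope might check `ok` and `exit_code` before reading the response. The sketch below relies only on the four top-level fields shown above; the function name `read_response` is an assumption, and the layout of a failed payload is not documented here, so the failure branch only reports the exit code.

```python
import json


def read_response(raw: str) -> str:
    """Return result.response from an rlm JSON envelope, or raise on failure."""
    payload = json.loads(raw)
    if not payload.get("ok") or payload.get("exit_code", 0) != 0:
        raise RuntimeError(f"rlm failed with exit code {payload.get('exit_code')}")
    return payload["result"]["response"]


# Sample payload in the documented shape (values are made up for illustration)
sample = '{"ok": true, "exit_code": 0, "result": {"response": "42"}, "stats": {}}'
print(read_response(sample))
```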

JSON-tree output (--output-format=json-tree): Adds execution tree showing nested RLM calls:

{
  "result": {
    "response": "...",
    "tree": {
      "depth": 0,
      "model": "openai/gpt-4",
      "duration": 2.3,
      "cost": 0.05,
      "iterations": [...],
      "children": [...]
    }
  }
}
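
One way to consume the tree is a recursive walk that aggregates cost and duration per depth. This is a sketch under two assumptions that the source does not confirm: each entry in `children` is a node with the same keys as its parent, and a node's `cost` and `duration` cover only its own calls, not those of its children. The sample values are invented.

```python
from collections import defaultdict


def stats_by_depth(node, totals=None):
    """Aggregate call count, cost, and duration per recursion depth."""
    if totals is None:
        totals = defaultdict(lambda: {"calls": 0, "cost": 0.0, "duration": 0.0})
    bucket = totals[node["depth"]]
    bucket["calls"] += 1
    bucket["cost"] += node.get("cost") or 0.0
    bucket["duration"] += node.get("duration") or 0.0
    # Assumed: children are nodes of the same shape
    for child in node.get("children") or []:
        stats_by_depth(child, totals)
    return totals


tree = {
    "depth": 0, "model": "openai/gpt-4", "duration": 13.94, "cost": 0.0047,
    "iterations": [],
    "children": [
        {"depth": 1, "model": "openai/gpt-4", "duration": 1.50, "cost": 0.0003,
         "iterations": [], "children": []},
        {"depth": 1, "model": "openai/gpt-4", "duration": 1.94, "cost": 0.0004,
         "iterations": [], "children": []},
    ],
}

per_depth = stats_by_depth(tree)
for depth in sorted(per_depth):
    s = per_depth[depth]
    print(f"Depth {depth}: {s['calls']} call(s) (${s['cost']:.4f}, {s['duration']:.2f}s)")
```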

Summary output (--summary): Shows depth-wise statistics after completion:

  • JSON mode: adds summary field to stats
  • Text mode: prints summary to stderr

=== RLM Execution Summary ===
Total depth: 2 | Nodes: 3 | Cost: $0.0054 | Duration: 17.38s
Depth 0: 1 call(s) ($0.0047, 13.94s)
Depth 1: 2 call(s) ($0.0007, 3.44s)

complete - Query without context

rlm complete "prompt text"
rlm complete "Generate SQL" --json --backend openai

search - Search indexed files

rlm search "query" [options]

Flag               Description
--limit N          Max results (default: 20)
--language python  Filter by language
--paths-only       Output file paths only
--json             JSON output

Auto-indexes on first use. Manual index: rlm index .

index - Build search index

rlm index .              # Index current dir
rlm index ./src --force  # Force full reindex

doctor - Check setup

rlm doctor       # Check config, API keys, deps
rlm doctor --json

Workflows

Git diff review:

git diff | rlm ask - -q "Review for bugs"
git diff --cached | rlm ask - -q "Ready to commit?"
git diff HEAD~3 | rlm ask - -q "Summarize changes"

Codebase analysis:

rlm ask . -q "Explain architecture"
rlm ask src/ -q "How does auth work?" --extensions .py

Search + analyze:

rlm search "database" --paths-only
rlm ask src/db.py -q "How is connection pooling done?"

Compare files:

rlm ask old.py new.py -q "What changed?"

Configuration

Precedence: CLI flags > env vars > config file > defaults

Config locations: ./rlm.yaml, ./.rlm.yaml, ~/.config/rlm/config.yaml

backend: openrouter
model: google/gemini-3-flash-preview
max_iterations: 30

Environment variables:

  • RLM_BACKEND - Default backend
  • RLM_MODEL - Default model
  • RLM_CONFIG - Config file path
  • RLM_JSON=1 - Always output JSON
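
The precedence rule (CLI flags > env vars > config file > defaults) can be pictured as a layered lookup. The sketch below illustrates the ordering with `collections.ChainMap`; it is not rlm's actual loader. The function name `resolve_settings` is an assumption, and only the `RLM_BACKEND` and `RLM_MODEL` variables and the documented defaults are modeled.

```python
from collections import ChainMap

# Defaults stated in this document
DEFAULTS = {"backend": "openrouter", "max_iterations": 30}


def resolve_settings(cli_flags, environ, config_file):
    """Merge settings so that earlier layers win: CLI > env > config > defaults."""
    env = {}
    if "RLM_BACKEND" in environ:
        env["backend"] = environ["RLM_BACKEND"]
    if "RLM_MODEL" in environ:
        env["model"] = environ["RLM_MODEL"]
    # Flags the user did not pass (None) must not mask lower layers
    cli = {key: value for key, value in cli_flags.items() if value is not None}
    return dict(ChainMap(cli, env, config_file, DEFAULTS))


settings = resolve_settings(
    cli_flags={"backend": "openai", "model": None},
    environ={"RLM_BACKEND": "anthropic", "RLM_MODEL": "model-from-env"},
    config_file={"model": "model-from-file", "max_iterations": 10},
)
print(settings)
```

Here the backend comes from the CLI flag, the model from the environment, and `max_iterations` from the config file.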

Recursion and Budget Limits

Recursive RLM (--max-depth)

Enable recursive llm_query() calls where child RLMs process sub-tasks:

# 2 levels of recursion
rlm ask . -q "Research thoroughly" --max-depth 2

# With budget cap
rlm ask . -q "Analyze codebase" --max-depth 3 --max-budget 0.50

Budget Control (--max-budget)

Limit spending per completion. Raises BudgetExceededError when exceeded:

# Cap at $1.00
rlm complete "Complex task" --max-budget 1.00

# Very low budget (will likely exceed)
rlm ask . -q "Analyze everything" --max-budget 0.001

Requirements: OpenRouter backend (returns cost data in responses).
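
The budget check depends on per-response cost data, which is why a backend that reports cost is required. A minimal sketch of such a check follows; `BudgetTracker` is a hypothetical class, and the `BudgetExceededError` defined here is a local stand-in, not the exception imported from rlm.

```python
class BudgetExceededError(RuntimeError):
    """Local stand-in for the error rlm raises; not imported from rlm."""


class BudgetTracker:
    """Accumulate reported costs and stop once spending exceeds the cap."""

    def __init__(self, max_budget: float) -> None:
        self.max_budget = max_budget
        self.spent = 0.0

    def record(self, cost: float) -> None:
        self.spent += cost
        if self.spent > self.max_budget:
            raise BudgetExceededError(
                f"spent ${self.spent:.4f}, budget ${self.max_budget:.4f}"
            )


tracker = BudgetTracker(max_budget=0.01)
tracker.record(0.004)  # within budget
tracker.record(0.004)  # still within budget
try:
    tracker.record(0.004)  # pushes the total past the cap
except BudgetExceededError as exc:
    print("stopped:", exc)
```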

Other Limits

Timeout (--max-timeout) - Stop after N seconds:

rlm complete "Complex task" --max-timeout 30

Token limit (--max-tokens) - Stop after N total tokens:

rlm ask . -q "Analyze" --max-tokens 10000

Error threshold (--max-errors) - Stop after N consecutive code errors:

rlm complete "Write code" --max-errors 3

Stop Conditions

RLM execution stops when any of these occur:

  1. Final answer - LLM calls FINAL_VAR("variable_name") with the NAME of a variable (as a string)
  2. Max iterations - Exceeds --max-iterations (exit code 0, graceful - forces final answer)

FINAL_VAR usage (common mistake - pass variable NAME, not value):

# CORRECT:
result = {"answer": "hello", "score": 42}
FINAL_VAR("result")  # pass the variable NAME as a string

# WRONG:
FINAL_VAR(result)  # passing the dict directly causes AttributeError
  3. Max budget exceeded - Spending > --max-budget (exit code 20, error)
  4. Max timeout exceeded - Time > --max-timeout (exit code 20, error with partial answer)
  5. Max tokens exceeded - Tokens > --max-tokens (exit code 20, error with partial answer)
  6. Max errors exceeded - Consecutive errors > --max-errors (exit code 20, error with partial answer)
  7. User cancellation - Ctrl+C or SIGUSR1 (exit code 0, returns partial answer as success)
  8. Max depth reached - Child RLM at depth 0 cannot recurse further
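
To see why FINAL_VAR needs a name and not a value, consider a lookup-by-name helper. The sketch below is a guess at the mechanism, not rlm's implementation: `final_var` and `repl_namespace` are hypothetical stand-ins, and calling a string method on the argument is one plausible way a dict argument would produce an AttributeError.

```python
# Hypothetical stand-in for the REPL's variable namespace
repl_namespace = {"result": {"answer": "hello", "score": 42}}


def final_var(name):
    """Look up a variable by NAME in the namespace (illustrative only)."""
    key = name.strip()  # str method; a dict has no .strip()
    return repl_namespace[key]


print(final_var("result"))  # lookup by name succeeds

try:
    final_var(repl_namespace["result"])  # passing the value instead of the name
except AttributeError as exc:
    print("AttributeError:", exc)
```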

Note on max iterations: This is a soft limit. When exceeded, RLM prompts the LLM one more time to provide a final answer. Modern LLMs typically complete in 1-2 iterations.
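
The soft-limit behavior can be sketched as a bounded loop followed by one forced final call. Everything here is illustrative: `run_with_soft_limit`, `step`, and `force_final` are hypothetical names, and the real REPL loop is not shown in this document.

```python
def run_with_soft_limit(step, force_final, max_iterations=30):
    """Run step() up to max_iterations times, then force one final answer."""
    for iteration in range(max_iterations):
        answer = step(iteration)
        if answer is not None:
            return answer  # the model produced a final answer on its own
    # Limit reached: ask once more for a final answer instead of failing
    return force_final()


calls = []


def never_finishes(iteration):
    calls.append(iteration)
    return None


result = run_with_soft_limit(
    never_finishes, lambda: "forced final answer", max_iterations=3
)
print(result, calls)
```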

Partial answers: When timeout, tokens, or errors stop execution, the error includes partial_answer if any response was generated before stopping.

Early exit (Ctrl+C): Pressing Ctrl+C (or sending SIGUSR1) returns the partial answer as success (exit code 0) with early_exit: true in the result.

Inject File (--inject-file)

Update REPL variables mid-run by modifying an inject file:

# Create inject file
echo 'focus = "authentication"' > inject.py

# Run with inject file
rlm ask . -q "Analyze based on 'focus'" --inject-file inject.py

# In another terminal, update mid-run
echo 'focus = "authorization"' > inject.py

The file is checked before each iteration and executed if modified.
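
A check-and-execute step of this kind could be built on the file's modification time. The sketch below is one possible approach under that assumption; the `InjectFile` class is hypothetical, and rlm may detect changes differently. The demo bumps the mtime explicitly so that the second write is always seen as a change.

```python
import os
import tempfile


class InjectFile:
    """Re-execute a Python file into a namespace whenever its mtime changes."""

    def __init__(self, path):
        self.path = path
        self._last_mtime = None

    def apply_if_modified(self, namespace):
        try:
            mtime = os.stat(self.path).st_mtime_ns
        except FileNotFoundError:
            return False
        if mtime == self._last_mtime:
            return False  # unchanged since the last check
        with open(self.path, encoding="utf-8") as fh:
            exec(compile(fh.read(), self.path, "exec"), namespace)
        self._last_mtime = mtime
        return True


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "inject.py")
    with open(path, "w", encoding="utf-8") as fh:
        fh.write('focus = "authentication"\n')

    repl_vars = {}
    inject = InjectFile(path)
    first = inject.apply_if_modified(repl_vars)      # file is new: executed
    unchanged = inject.apply_if_modified(repl_vars)  # no change: skipped
    focus_before = repl_vars["focus"]

    with open(path, "w", encoding="utf-8") as fh:
        fh.write('focus = "authorization"\n')
    st = os.stat(path)
    # Force a distinct mtime so the demo does not depend on clock granularity
    os.utime(path, ns=(st.st_atime_ns, st.st_mtime_ns + 1_000_000_000))
    second = inject.apply_if_modified(repl_vars)     # modified: executed again

print(focus_before, "->", repl_vars["focus"])
```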

Exit Codes

Code  Meaning
0     Success
2     CLI usage error
10    Input error (file not found)
11    Config error (missing API key)
20    Backend/API error (includes budget exceeded)
30    Runtime error
40    Index/search error
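
A wrapper script can branch on these codes through the child process's return code. In the sketch below, the mapping is copied from the table above and `describe_exit` is a hypothetical helper. The child process is a stand-in Python one-liner that exits with code 11, so the example runs without rlm installed; in real use it would be an rlm command.

```python
import subprocess
import sys

# Meanings copied from the exit code table
EXIT_MEANINGS = {
    0: "Success",
    2: "CLI usage error",
    10: "Input error (file not found)",
    11: "Config error (missing API key)",
    20: "Backend/API error (includes budget exceeded)",
    30: "Runtime error",
    40: "Index/search error",
}


def describe_exit(code: int) -> str:
    """Translate an exit code into its documented meaning."""
    return EXIT_MEANINGS.get(code, f"Unknown exit code {code}")


# Stand-in child process; replace with an rlm invocation in real use
proc = subprocess.run([sys.executable, "-c", "raise SystemExit(11)"])
print(proc.returncode, describe_exit(proc.returncode))
```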

LLM Search Tools

When rlm ask runs on a directory, the LLM gets search tools:

Tool          Cost  Privacy  Use For
rg.search()   Free  Local    Exact patterns, function names, imports
tv.search()   Free  Local    Topics, concepts, related files
exa.search()  $     API      Web search (requires --exa flag)
pi.*          $$$   API      Hierarchical PDF/document navigation

Free Local Tools (auto-loaded)

  • rg.search(pattern, paths, globs) - ripgrep for exact patterns
  • tv.search(query, limit) - Tantivy BM25 for concepts

Exa Web Search (--exa flag, Costs Money)

⚠️ Opt-in: Requires --exa flag and EXA_API_KEY environment variable.

Setup:

export EXA_API_KEY=...  # Get from https://exa.ai

Usage in REPL:

from rlm_cli.tools_search import exa, web

# Basic search
results = exa.search(query="Python async patterns", limit=5)
for r in results:
    print(f"{r['title']}: {r['url']}")

# With highlights (relevant excerpts)
results = exa.search(
    query="error handling best practices",
    limit=3,
    include_highlights=True
)

# Semantic alias
results = web(query="machine learning tutorial", limit=5)

# Find similar pages
results = exa.find_similar(url="https://example.com/article", limit=5)

exa.search() parameters:

Param               Default   Description
query               required  Search query
limit               10        Max results
search_type         "auto"    "auto", "neural", or "keyword"
include_domains     None      Only these domains
exclude_domains     None      Exclude these domains
include_text        False     Include full page text
include_highlights  True      Include relevant excerpts
category            None      "company", "research paper", "news", etc.

When to use exa.search() / web():

  • Finding external documentation, tutorials, articles
  • Researching topics beyond the local codebase
  • Finding similar pages to a reference URL

PageIndex (pi.* - Opt-in, Costs Money)

⚠️ WARNING: PageIndex sends document content to LLM APIs and costs money.

Only use when:

  1. User explicitly requests document/PDF analysis
  2. Document has hierarchical structure (reports, manuals)
  3. User accepts cost/privacy tradeoffs

Prerequisites:

  • OPENROUTER_API_KEY (or other backend key) must be set in environment
  • PageIndex submodule must be initialized
  • Run within rlm-cli's virtual environment (has required dependencies)

Setup (REQUIRED before any pi.* operation):

import sys
sys.path.insert(0, "/path/to/rlm-cli/rlm")        # rlm submodule
sys.path.insert(0, "/path/to/rlm-cli/pageindex")  # pageindex submodule

from rlm.clients import get_client
from rlm_cli.tools_pageindex import pi

# Configure with existing rlm backend
client = get_client(backend="openrouter", backend_kwargs={"model_name": "google/gemini-2.0-flash-001"})
pi.configure(client)

Indexing (costs $$$):

# Build tree index - THIS COSTS MONEY (no caching, re-indexes each call)
tree = pi.index(path="report.pdf")
# Returns: PITree object with doc_name, nodes, doc_description, raw

Viewing structure (free after indexing):

# Display table of contents
print(pi.toc(tree))

# Get section by node_id (IDs are "0000", "0001", "0002", etc.)
section = pi.get_section(tree, "0003")
# Returns: PINode with title, node_id, start_index, end_index, summary, children
# Returns: None if not found

if section:
    print(f"{section.title}: pages {section.start_index}-{section.end_index}")

Finding node IDs: Node IDs are assigned sequentially ("0000", "0001", ...) in tree traversal order. To see all node IDs, access the raw tree structure:

import json
print(json.dumps(tree.raw["structure"], indent=2))
# Each node has: title, node_id, start_index, end_index

pi.* API Reference:

Method                         Cost  Returns         Description
pi.configure(client)           Free  None            Set rlm backend (REQUIRED first)
pi.status()                    Free  dict            Check availability, config, warning
pi.index(path=str)             $$$   PITree          Build tree from PDF
pi.toc(tree, max_depth=3)      Free  str             Formatted table of contents
pi.get_section(tree, node_id)  Free  PINode or None  Get section by ID
pi.available()                 Free  bool            Check if PageIndex installed
pi.configured()                Free  bool            Check if client configured

PITree attributes: doc_name, nodes (list of PINode), doc_description, raw (dict)

PINode attributes: title, node_id, start_index, end_index, summary (may be None), children (may be None)

Notes:

  • summary is only populated if add_summaries=True in pi.index()
  • children is None for leaf nodes (sections with no subsections)
  • tree.raw["structure"] is a flat list; hierarchy is in PINode.children
  • PageIndex extracts document structure (TOC), not content. Use page numbers to locate sections in the original PDF.
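
Because children is None for leaf nodes, code that walks the hierarchy has to guard against it. The sketch below uses a local `Node` dataclass as a stand-in that mirrors the documented PINode attributes; it does not import PageIndex. The `walk` helper is hypothetical, and the sample titles and page ranges are invented.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional, Tuple


@dataclass
class Node:
    """Stand-in mirroring the documented PINode attributes."""
    title: str
    node_id: str
    start_index: int
    end_index: int
    summary: Optional[str] = None
    children: Optional[List["Node"]] = None


def walk(nodes: List[Node], depth: int = 0) -> Iterator[Tuple[int, Node]]:
    """Yield (depth, node) pairs in tree traversal order."""
    for node in nodes:
        yield depth, node
        # children is None for leaf nodes, so fall back to an empty list
        yield from walk(node.children or [], depth + 1)


nodes = [
    Node("Executive Summary", "0000", 1, 5),
    Node("Financial Overview", "0001", 6, 20, children=[
        Node("Revenue", "0002", 6, 10),
        Node("Expenses", "0003", 11, 15),
    ]),
    Node("Risk Factors", "0004", 21, 35),
]

for depth, node in walk(nodes):
    indent = "  " * depth
    print(f"{indent}{node.node_id} {node.title} (p.{node.start_index}-{node.end_index})")
```

With a real PITree, the same walk would start from `tree.nodes`.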

Example output from pi.toc():

📄 annual_report.pdf

• Executive Summary (p.1-5)
• Financial Overview (p.6-20)
  • Revenue (p.6-10)
  • Expenses (p.11-15)
  • Projections (p.16-20)
• Risk Factors (p.21-35)