portlang Skill
Core Concept
portlang treats agent behavior as search through a conditioned space. You don't script loops—you declare the search space:
- Boundaries: What the agent cannot do (enforced by sandbox)
- Verifiers: What success looks like (runtime reward signals)
- Context budget: Hard token ceiling
- Environment: What the agent can observe
The runtime executes the search. Every run produces a trajectory (complete event log).
Prerequisites
portlang currently runs only on Apple devices. Before running portlang fields:
- Install portlang:
brew tap portofcontext/homebrew-tap
brew install portlang
- Set API key (choose one):
export ANTHROPIC_API_KEY=sk-ant-...
export OPENROUTER_API_KEY=sk-or-v1-...
- Verify installation:
portlang init # Check container support
Model naming by provider:
- Anthropic API: anthropic/claude-sonnet-4.6, anthropic/claude-opus-4.5
- OpenRouter: anthropic/claude-3.5-sonnet, anthropic/claude-3-opus, or anything on OpenRouter that supports tool calling
- Provider is auto-detected from the API key
field.toml Structure
All sections are optional unless marked (required). Fields marked "inherit" pull their value from a parent field.toml one directory up (auto-detected if ../field.toml exists).
name = "my-task" # (required) identifier, used in trajectory storage
description = "..." # human-readable summary
[vars] # optional; declare {{ var_name }} template variables
# customer_id = { required = true, description = "Salesforce account ID" }
# region = { required = false, default = "us-east-1", description = "AWS region" }
[model] # (required), or: model = "inherit"
name = "anthropic/claude-sonnet-4.6" # (required)
temperature = 0.5 # default: 0.5
[prompt] # (required)
goal = "..." # (required) initial task; supports {{ var }} templates
system = "..." # optional system prompt; supports {{ var }} templates
re_observation = ["echo '=== workspace ===' && ls -1", ...] # refresh context each step
[environment] # optional; all fields have defaults
root = "./workspace" # working dir (maps to /workspace in container)
packages = ["nodejs"] # apt packages to install; list "uv" to get uv/pip
dockerfile = "./Dockerfile" # custom Dockerfile (overrides packages)
image = "custom:tag" # pre-built image (overrides dockerfile)
[boundary] # optional, or: boundary = "inherit"
allow_write = ["*.py"] # glob patterns for writable paths; default: none
network = "deny" # "deny" | "allow"; default: allow
max_tokens = 150000 # hard ceiling on total context tokens
max_cost = "$2.00" # hard ceiling on total cost (must be quoted string with $)
max_steps = 30 # hard ceiling on agent steps
bash = true # enable built-in bash tool; default: true
output_schema = """{ ... }""" # optional; JSON schema string for structured output
tools = "inherit" # optional; inherit [[tool]] list from parent instead of defining inline
[[tool]] # repeatable; type = "python" | "shell" | "mcp"; bash/glob/write are built-in defaults
# Shell verifier (default when type is omitted):
[[verifier]]
type = "shell" # default; type can be omitted for shell verifiers
name = "..." # (required)
command = "..." # shell command; exit 0 = pass, nonzero = fail; supports {{ var }} templates
trigger = "on_stop" # "on_stop" | "always" | "on_tool:<tool_name>"; default: on_stop
description = "..." # injected into context on failure
# Levenshtein verifier (normalized edit distance):
[[verifier]]
type = "levenshtein"
name = "..."
file = "output.txt" # optional; omit to use output_schema structured output
expected = "..." # reference string; supports {{ var }} templates
threshold = 0.9 # similarity [0.0–1.0] required to pass; default: 1.0
# Semantic similarity verifier (cosine via embeddings):
[[verifier]]
type = "semantic"
name = "..."
file = "output.txt" # optional; omit to use output_schema structured output
expected = "..." # reference string to embed and compare; supports {{ var }} templates
threshold = 0.85 # cosine similarity [0.0–1.0]; default: 0.8
embedding_model = "bge-small-en-v1.5" # local model (~67 MB, downloaded once)
# embedding_url = "https://..." # use OpenAI-compatible endpoint instead
# Tool call verifier (inspect or require a specific tool call):
[[verifier]]
type = "tool_call"
name = "..."
tool = "bash" # (required for on_stop) assert this tool was called
field = "/input/path" # optional; JSON pointer into {input: {...}, output: "..."}
matches = "^[a-z]+" # optional; regex the field value must match
not_matches = "^/etc" # optional; regex the field value must NOT match
Minimal field.toml
name = "my-task"
[model]
name = "anthropic/claude-sonnet-4.6"
[prompt]
goal = "Create hello.py that prints 'Hello, World!'"
[environment]
root = "./workspace"
[boundary]
allow_write = ["hello.py"]
max_tokens = 80000
max_cost = "$1.00"
max_steps = 10
[[verifier]]
name = "works"
command = "python hello.py 2>&1 | grep -q 'Hello, World!'"
trigger = "on_stop"
description = "Must print 'Hello, World!'"
Essential Commands
portlang new field.toml # Scaffold a new field.toml (configure via flags)
portlang run field.toml # Execute once
portlang run field.toml --var k=v # Pass a template variable (repeatable)
portlang run field.toml --vars p.json # Pass variables from a JSON file
portlang run field.toml --input ./data.csv # Stage a file into the workspace before the agent starts
portlang run field.toml --input '{"id":"123"}' # Stage inline JSON as portlang_input.json
portlang check field.toml # Validate configuration
portlang converge field.toml -n 10 # Run N times, measure reliability
portlang eval ./examples/ # Run all fields in a directory
portlang eval ./examples/ --resume # Resume a previous eval, skipping fields that already passed
portlang list trajectories [field] # List trajectories (--converged, --failed, --limit)
portlang list evals [dir] # List eval runs (--limit)
portlang replay <id> # Step through a trajectory (q=quit, n=next, p=prev)
portlang diff <id-a> <id-b> # Compare two trajectories
portlang report <field-name> # Adaptation analysis across runs
portlang view trajectory <id> # Open trajectory as interactive HTML
portlang view eval <id-or-dir> # Open eval results dashboard (by run ID or directory)
portlang view diff <id-a> <id-b> # Open trajectory comparison HTML
portlang view field <field-name> # Open field adaptation report HTML
Add --html to replay/diff for HTML output. Add --no-open to any view command to skip opening the browser. See reference/CLI.md for full flag details.
Key Patterns
1. Field Inheritance (shared model/boundary/tools across a suite)
If ../field.toml exists, a child field can inherit from it automatically:
# parent/field.toml — shared config for all child fields
name = "parent"
[model]
name = "anthropic/claude-sonnet-4.6"
temperature = 0.5
[boundary]
network = "deny"
max_tokens = 100000
max_cost = "$1.00"
max_steps = 20
[[tool]]
type = "python"
file = "./tools/shared_utils.py"
# parent/task-a/field.toml — inherits model, boundary, and tools
name = "task-a"
model = "inherit"
boundary = "inherit"
tools = "inherit"
[prompt]
goal = "Do task A using the shared tools."
Inheritance eliminates duplication across eval suites. Override any section by defining it inline.
2. Template Variables (parameterize a field for reuse)
Declare variables in [vars], use {{ name }} anywhere in goal/system/re_observation/verifier commands, supply at runtime with --var:
[vars]
currency = { required = false, default = "usd", description = "Currency to report" }
[prompt]
goal = "Get the account balance and return amounts in {{ currency }}."
[[verifier]]
name = "correct-currency"
type = "tool_call"
tool = "bash"
trigger = "on_stop"
description = "Agent must have run bash"
portlang run field.toml --var currency=gbp
portlang run field.toml --vars params.json # bulk vars from file
portlang run field.toml --input ./data.csv # stage input file into workspace
--input with a file copies it to the workspace root. --input '{"key":"val"}' writes portlang_input.json. Use re_observation to surface the file contents to the agent each step.
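A --vars file is plain JSON; assuming a flat key-to-value map (see reference/CLI.md for the authoritative format), a params.json for the field above might be:

```json
{
  "currency": "gbp"
}
```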
3. Structured Output (agent produces validated JSON)
Define output_schema inside [boundary] as a JSON string. Schema validation is automatic — no separate verifier needed:
[boundary]
allow_write = ["output.json"]
output_schema = '''
{
"type": "object",
"required": ["status", "count"],
"properties": {
"status": {"type": "string", "enum": ["success", "failure"]},
"count": {"type": "integer", "minimum": 0}
}
}
'''
portlang validates the output against the schema, writes output.json to /workspace, and reports schema violations as failures. Add [[verifier]] entries only for additional business logic checks beyond schema conformance. Typed verifiers (levenshtein, semantic) can omit file to validate against the structured output directly.
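As a rough illustration of what the automatic check does, here is a minimal sketch covering only the keywords used in the example schema above; portlang's real validator is assumed to be a full JSON Schema implementation:

```python
# Minimal sketch of JSON-schema checking for the example schema above.
# Covers only "required", "type", "enum", and "minimum"; the actual
# portlang validator is an assumption and is more complete than this.

SCHEMA = {
    "type": "object",
    "required": ["status", "count"],
    "properties": {
        "status": {"type": "string", "enum": ["success", "failure"]},
        "count": {"type": "integer", "minimum": 0},
    },
}

TYPES = {"string": str, "integer": int}

def violations(instance: dict, schema: dict) -> list[str]:
    """Return human-readable schema violations (empty list = valid)."""
    errors = []
    for key in schema.get("required", []):
        if key not in instance:
            errors.append(f"missing required field: {key}")
    for key, sub in schema.get("properties", {}).items():
        if key not in instance:
            continue
        value = instance[key]
        if not isinstance(value, TYPES[sub["type"]]):
            errors.append(f"{key}: expected {sub['type']}")
            continue  # skip value checks when the type is already wrong
        if "enum" in sub and value not in sub["enum"]:
            errors.append(f"{key}: not one of {sub['enum']}")
        if "minimum" in sub and value < sub["minimum"]:
            errors.append(f"{key}: below minimum {sub['minimum']}")
    return errors
```

On failure, messages like these are surfaced back to the agent as verifier feedback (the exact wording is portlang's, not this sketch's).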
4. Multi-Layer Verifiers (fail fast with precise feedback)
Layer verifiers from coarse to fine — each one assumes the previous passed:
[[verifier]]
name = "compiled"
command = "python script.py 2>/dev/null"
trigger = "on_stop"
description = "script.py must run without errors"
[[verifier]]
name = "correct-output"
type = "levenshtein"
file = "output.txt"
expected = "42"
threshold = 1.0
trigger = "on_stop"
description = "output.txt must contain exactly '42'"
Verifiers run in order and stop at the first failure. Use output_schema instead of json verifiers when the agent produces structured JSON output.
5. Smart Verifier Types
Prefer typed verifiers. They run in the portlang runtime — no packages required, no container dependencies. Fall back to shell verifiers only for logic that can't be expressed with a typed verifier, and only use tools guaranteed in the container baseline (see section 8).
Trigger modes: on_stop (default) runs after the agent finishes. always runs after every step. on_tool:<tool_name> runs after each call to a specific tool — useful for incremental checks, e.g. trigger = "on_tool:write" to validate files as they're written.
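For example, an on_tool trigger pairs naturally with the tool_call verifier type to gate writes as they happen (the name and regex below are illustrative):

```toml
# Illustrative: reject writes that target /etc, checked after every write call
[[verifier]]
type = "tool_call"
name = "workspace-writes-only"
tool = "write"
field = "/input/path"
not_matches = "^/etc"
trigger = "on_tool:write"
description = "Writes must stay inside the workspace"
```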
# Fuzzy text match (tolerates minor differences)
[[verifier]]
type = "levenshtein"
name = "close-enough"
file = "output.txt"
expected = "The answer is 42."
threshold = 0.9
trigger = "on_stop"
description = "Output must be at least 90% similar to expected"
# Semantic match (meaning, not exact text)
[[verifier]]
type = "semantic"
name = "right-idea"
file = "summary.txt"
expected = "The model achieved high accuracy on the test set."
threshold = 0.85
trigger = "on_stop"
description = "Summary must convey the correct conclusion"
Local embedding model downloaded automatically (~67 MB, no API key required).
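For intuition about the levenshtein threshold, here is a minimal sketch of a normalized similarity score; the runtime's exact normalization is an assumption (a common choice is 1 - distance / max(len(a), len(b))):

```python
# Sketch: normalized Levenshtein similarity in [0.0, 1.0].
# portlang's exact normalization is an assumption; this shows the
# common definition so thresholds like 0.9 have concrete meaning.

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (two-row variant)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def similarity(a: str, b: str) -> float:
    """1.0 = identical; 0.0 = nothing in common."""
    if not a and not b:
        return 1.0
    return 1 - levenshtein(a, b) / max(len(a), len(b))
```

Under this definition, "The answer is 42!" scores about 0.94 against the expected "The answer is 42.", so it passes a 0.9 threshold but would fail a threshold of 1.0.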
6. Scoped Boundaries
[boundary]
allow_write = ["output.json", "logs/*.txt"]
network = "deny"
max_tokens = 100000
max_cost = "$1.00"
max_steps = 20
7. Re-observation (keep context fresh)
[prompt]
goal = "..."
re_observation = [
"echo '=== workspace ===' && ls -1 *.py *.txt 2>/dev/null | cat",
"echo '=== tests ===' && python -m pytest --tb=no -q 2>&1 | tail -5",
]
Commands run before each agent step, injecting fresh state into context.
8. Custom Environment
[environment]
root = "./workspace"
packages = ["nodejs", "npm"] # Install apt packages
# Or use a custom Dockerfile:
dockerfile = "./Dockerfile"
# Or a pre-built image:
image = "myregistry/myimage:latest"
Default container baseline: The container is minimal. Available by default: standard POSIX shell builtins, bash, curl, wc, grep, cat, ls, find. Not available unless added to packages or a custom image: python3, node, jq, git, and most other tools.
Shell verifiers run inside the container and are subject to the same constraints. Prefer typed verifiers (type = "json", "levenshtein", "semantic") over shell verifiers whenever possible — they run natively in the portlang runtime and require nothing installed. Only use shell verifiers for checks that require container-side execution, and only invoke tools you've declared in packages.
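When a shell verifier is unavoidable, keep it inside the baseline; for example, a check built only on grep (the file name is illustrative):

```toml
# grep is in the default container baseline, so this needs no packages
[[verifier]]
name = "mentions-total"
command = "grep -q 'total' report.txt"
trigger = "on_stop"
description = "report.txt must mention 'total'"
```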
9. Custom Tools
Default tools (always available, no [[tool]] entry needed):
- bash — run shell commands in the container
- glob — find files by pattern
- write — write files to allowed paths
Define [[tool]] entries only to add capabilities beyond these three.
Shell tool:
[[tool]]
type = "shell"
name = "word_count"
description = "Count words in a file"
command = "wc {path}"
input_schema = '{"type": "object", "properties": {"path": {"type": "string"}}, "required": ["path"]}'
Python tool (auto-schema from type hints):
# tools/calculator.py
# /// script
# dependencies = ["requests"]
# ///
# uv auto-installs dependencies — no packages needed in field.toml
def execute(expression: str) -> dict:
    """Evaluate a math expression and return the result."""
    return {"result": eval(expression)}
[[tool]]
type = "python"
file = "./tools/calculator.py" # relative to field.toml, not workspace
function = "execute" # schema auto-extracted from type hints; omit to expose all functions
Python tool rules:
- Each tool file runs in isolation — tool files cannot import each other. Put all related logic in one file.
- Declare third-party dependencies with a # /// script PEP 723 block at the top; uv installs them automatically.
- File paths in file = are relative to field.toml, not the workspace root.
Tool-first design for complex tasks: For tasks involving multi-step API calls, data aggregation, or web scraping, write Python tools that encapsulate that logic before writing the field. The agent's goal should then be: call the tool, write the output file. This keeps steps under 5, cost under $0.05, and allow_write naturally minimal. Agents that try to do complex work through raw shell commands (curl pipes, temp files, bash scripts) burn budget and fail more often.
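A hypothetical tool file following this pattern; fetch_page here is a stand-in for a real paginated API call (e.g. via requests), so the agent never touches curl pipes or temp files:

```python
# tools/report.py — hypothetical example of tool-first design:
# the multi-step aggregation lives in the tool, so the agent's job
# reduces to "call the tool, write the output file".

def fetch_page(page: int) -> list[dict]:
    """Stand-in for a real paginated API call (swap in requests here)."""
    data = {1: [{"amount": 10}, {"amount": 5}], 2: [{"amount": 7}]}
    return data.get(page, [])

def execute(max_pages: int = 10) -> dict:
    """Aggregate amounts across all pages and return a summary."""
    total, count = 0, 0
    for page in range(1, max_pages + 1):
        rows = fetch_page(page)
        if not rows:
            break  # an empty page marks the end of the data
        total += sum(r["amount"] for r in rows)
        count += len(rows)
    return {"total": total, "count": count}
```

With this in place, the field's goal can be as simple as "Call the report tool and write its result to output.json".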
MCP server (stdio):
[[tool]]
type = "mcp"
name = "filesystem"
command = "npx"
args = ["-y", "@modelcontextprotocol/server-filesystem", "/workspace"]
transport = "stdio"
MCP server (HTTP/SSE):
[[tool]]
type = "mcp"
name = "stripe"
url = "https://mcp.stripe.com"
transport = "http"
headers = { Authorization = "Bearer ${STRIPE_KEY}" }
10. Batch Evaluation
portlang eval ./examples/
portlang view eval ./examples/ # Open interactive HTML dashboard
Useful for regression testing after changes.
Debugging Workflow
- Run fails → portlang replay <id> to see what happened
- Find failure point → check which verifier failed and at which step
- Non-determinism → portlang diff <id-a> <id-b> to find divergence
- Visual debugging → portlang view trajectory <id> for HTML view
- Optimize → portlang converge -n 10 to measure reliability
- Patterns → portlang report <field-name> for adaptation analysis
Common Issues
Budget exhausted:
- Start conservative: max_cost = "$0.25" for simple tasks, "$1.00" for network-heavy tasks; increase after profiling
- Increase max_tokens or reduce max_steps in [boundary]
- Simplify re_observation commands
- Check for tool error loops
- Move complex logic into Python tools so the agent does orchestration, not implementation
Low convergence rate (<70%):
- Strengthen verifiers (make expectations explicit)
- Tighten boundaries (restrict file access)
- Clarify the goal in [prompt]
- Lower temperature (e.g., temperature = 0.0)
Verifier always passes/fails:
- Weak signal (>95% or <10% pass rate) — adjust verifier command
Structured output not valid:
- Add an explicit [[verifier]] to check output.json
- Ensure [prompt].goal names the required fields explicitly
- Use temperature = 0.0 for consistent JSON output
Reference Documentation
- reference/CLI.md - Full CLI reference (all commands and flags)
- reference/verifier_patterns.md - 20 real-world verifier examples
- reference/custom_tools.md - Shell, Python, MCP guides
- reference/trajectory_analysis.md - Advanced debugging
- reference/field_recipes.md - 8 complete field.toml examples
Core Principles
- Boundaries are topology, not policy - Make bad actions impossible, not discouraged
- Verifiers are runtime reward signals - Not post-hoc checks, they steer behavior
- Context is finite - Hard ceiling, no magic compression
- Trajectories are data - Replay, diff, analyze distributions
- Engineer the space, not the searcher - Agent policy is opaque, environment is yours