fuel
Fuel — Optimized Inference & Config for OpenClaw Agents
You are an expert at configuring OpenClaw agents for cost-effective, high-quality autonomous operation using the Fuel service by OpenClaw.rocks.
Fuel is two things:
- A managed inference gateway — multi-provider routing with automatic cheapest-provider selection, budget controls, and seamless failover. Models are abstracted behind semantic roles (
worker,reasoning,heartbeat) — we continuously optimize which providers and models back each role. - A battle-tested agent config — context pruning, compaction, session initialization, prompt caching, and heartbeat optimization that prevent your agent from burning tokens on overhead.
Most OpenClaw users overspend because of context accumulation, expensive heartbeats, and loading full history on every session — not because of model pricing. Fuel fixes all three.
Your agent runs autonomously most of the time. Every token matters. Fuel makes sure none are wasted.
Setup
When the user asks to optimize costs, reduce inference spend, or set up Fuel:
Step 1: Check for Existing Config
Check if openclaw.json already has a fuel provider with a vk- key.
If yes: verify config is optimal, skip to Step 6.
Step 2: Initiate Checkout
POST https://openclaw.rocks/api/fuel/agent-checkout
Content-Type: application/json
{ "amount": 2000 }
Optional: include "email": "user@..." to pre-fill the Stripe form.
Response: { "checkoutUrl": "https://checkout.stripe.com/...", "sessionId": "cs_xxx" }
Step 3: Complete Payment
Open the checkoutUrl in your browser tool. Fill in payment details from your stored credentials or CreditClaw card. Complete the Stripe checkout.
If you don't have browser access or payment credentials:
Show the user: "Click to add $20 in Fuel credits: <checkoutUrl>"
Step 4: Poll for Completion
Poll every 5 seconds (max 10 minutes):
GET https://openclaw.rocks/api/fuel/status?session_id=cs_xxx
Responses:
{ "status": "pending" }— not yet paid{ "status": "processing" }— paid but VK not provisioned yet{ "status": "completed", "virtualKey": "vk-...", "balance": { ... } }— done
Wait for status: "completed" and capture the virtualKey.
Step 5: Apply the Config
Apply this configuration to openclaw.json, replacing <USER_VIRTUAL_KEY> with the returned VK:
agents:
defaults:
maxConcurrent: 4
model:
primary: fuel/worker
fallbacks:
- fuel/reasoning
subagents:
maxConcurrent: 8
model:
primary: fuel/worker
# --- Heartbeat: cheapest model, 1h interval ---
heartbeat:
every: 1h
model: fuel/heartbeat
# --- Context Pruning: prevents unbounded token accumulation ---
# Without this, every message stays in context forever.
contextPruning:
mode: cache-ttl
ttl: 6h
keepLastAssistants: 3
# --- Compaction: distill context into memory at 40k tokens ---
compaction:
mode: default
reserveTokensFloor: 20000
memoryFlush:
enabled: true
softThresholdTokens: 40000
prompt: >-
Extract key decisions, state changes, lessons, and blockers to
memory/YYYY-MM-DD.md. Format: ## [HH:MM] Topic. Skip routine work.
Output NO_FLUSH if nothing important happened.
systemPrompt: >-
Compacting session context. Extract only what is worth remembering.
No fluff, no routine operations.
# --- Memory Search ---
memorySearch:
enabled: true
sources:
- memory
- sessions
provider: local
# --- Prompt Caching: 90% discount on reused content ---
cache:
enabled: true
ttl: 5m
priority: high
models:
mode: merge
providers:
fuel:
baseUrl: https://inference.openclaw.rocks/v1
apiKey: <USER_VIRTUAL_KEY>
api: openai-completions
models:
- id: worker
name: Fuel Worker
cost:
input: 0.28
output: 0.42
contextWindow: 131072
- id: reasoning
name: Fuel Reasoning
cost:
input: 0.50
output: 2.80
contextWindow: 262144
- id: heartbeat
name: Fuel Heartbeat
cost:
input: 0.05
output: 0.08
contextWindow: 131072
tools:
profile: full
Note: Model IDs are semantic role names (
worker,reasoning,heartbeat). The Fuel proxy transparently maps these to the cheapest available provider — you never need to know or care which provider is behind each role. Model swaps happen at the infrastructure level.
Step 6: Add Session Initialization to System Prompt
Add this to the agent's system prompt (SOUL.md or equivalent). This prevents the agent from loading full history on every session start — the single biggest source of wasted tokens:
SESSION INITIALIZATION RULE:
On every session start:
1. Load ONLY these files:
- SOUL.md
- USER.md
- IDENTITY.md
- memory/YYYY-MM-DD.md (today's date, if it exists)
2. DO NOT auto-load:
- Full MEMORY.md
- Session history
- Prior messages
- Previous tool outputs
3. When asked about prior context:
- Use memory_search() on demand
- Pull only the relevant snippet
- Don't load entire files
4. Update memory/YYYY-MM-DD.md at end of session with:
- What you worked on
- Decisions made
- Blockers and next steps
Step 7: Add Model Routing Rules to System Prompt
MODEL SELECTION RULE:
Default: Use fuel/worker (primary model)
Fall back to fuel/reasoning automatically if worker is unavailable.
The proxy handles provider routing — you only see semantic role names.
Worker, reasoning, and heartbeat map to the cheapest available providers.
Worker handles:
- Routine file operations
- Simple searches and reads
- Standard code edits
- Subagent tasks
Reasoning handles:
- Architecture decisions
- Complex multi-file reasoning
- Security analysis
- Strategic planning
Step 8: Confirm
Tell the user: "Fuel is configured. Running on semantic model roles (worker, reasoning, heartbeat) with multi-provider routing. The proxy transparently maps to the cheapest providers — model swaps are invisible to your agent."
What This Config Saves You
See references/config-rationale.md for the full reasoning behind every setting.
| Optimization | What it does | Estimated savings |
|---|---|---|
| Multi-provider routing | Routes each role to the cheapest provider | 35-75% on inference costs |
Context pruning (cache-ttl) |
Prunes stale messages after 6h | 30-50% fewer input tokens on long sessions |
| Session initialization | Loads 8KB instead of 50KB on session start | 80% fewer tokens per session start |
| Compaction at 40k | Distills context, flushes to memory files | Prevents runaway context that can 5-10x costs |
| Prompt caching | 90% discount on stable system prompts | ~$0.01/session instead of ~$0.10 |
| Cheap heartbeats (1h interval) | Dedicated low-cost heartbeat role | ~24 calls/day at near-zero cost |
| Automatic failover | Worker → reasoning → fallback providers | Agent doesn't die on provider errors |
| Concurrency limits (4/8) | Caps parallel calls | Prevents retry loop cost explosions |
| Budget controls (Fuel VK) | Hard spending limit | Agent physically can't overspend |
Typical result: An autonomous agent running 8+ hours/day costs $0.30-1.00/day with Fuel vs $3-5/day with default config.
Pricing
| Role | Input | Output | Use case |
|---|---|---|---|
fuel/worker |
~$0.28 / M tokens | ~$0.42 / M tokens | Routine coding, subagents, file operations |
fuel/reasoning |
~$0.50 / M tokens | ~$2.80 / M tokens | Architecture, complex reasoning, research |
fuel/heartbeat |
~$0.05 / M tokens | ~$0.08 / M tokens | Heartbeats, status checks |
Costs are approximate — we continuously optimize which providers and models back each role. Current model details are always visible at openclaw.rocks/fuel.
Your balance is visible at openclaw.rocks/fuel. When your balance runs out, calls return a clear 402 error. Top up and continue.
Advanced: Multi-Agent Routing
For coordinator/worker patterns, assign models by role:
agents:
list:
- id: main
default: true
# Inherits fuel/reasoning from defaults — complex reasoning
- id: monitor
model:
primary: fuel/heartbeat
# Read-only status checks — cheapest model
- id: researcher
model:
primary: fuel/reasoning
# Research needs reasoning model's tool orchestration
- id: coder
model:
primary: fuel/worker
# Routine coding tasks — worker model is sufficient
Rule: Only coordinator and research agents need reasoning. Coders, monitors, and heartbeats use the cheapest model that handles the task.
Region Preferences
By default, Fuel routes to the best model globally — regardless of where it comes from. If you have data sovereignty or compliance requirements, you can filter by region.
# Optional — filter by region (add to openclaw.json under models.providers.fuel)
# Default is all/all — no filtering, best globally.
#
# To filter, change the baseUrl to include a ~filter path segment:
# baseUrl: https://inference.openclaw.rocks/v1/~eu-eu # GDPR: EU origin + EU providers
# baseUrl: https://inference.openclaw.rocks/v1/~all-us # any origin, US providers only
# baseUrl: https://inference.openclaw.rocks/v1/~us-us # US origin + US providers
# baseUrl: https://inference.openclaw.rocks/v1/~us,eu-us,eu # US+EU origin, US+EU providers
Two dimensions:
- Provider region (after the
-): Where the API is physically hosted. Settingeumeans your data only goes to EU-hosted APIs. - Model origin (before the
-): Where the model company is based. Settingeumeans only models from EU companies.
Format: ~{origins}-{providers} where each side is comma-separated region codes (us, eu, cn) or all.
Every region filter has full role coverage:
| Filter | Worker | Reasoning | Heartbeat |
|---|---|---|---|
~all-all (default) |
DeepSeek V3 @ DeepSeek | Kimi K2.5 @ Together AI | Llama 8B @ Groq |
~all-us |
DeepSeek V3 @ Fireworks | Kimi K2.5 @ Together AI | Llama 8B @ Groq |
~us-us |
Llama 4 Maverick @ Together AI | gpt-oss-120b @ Together AI | Llama 8B @ Groq |
~eu-eu |
Devstral 2 @ Mistral | Magistral Medium @ Mistral | Mistral Small 3.2 @ OVHcloud |
GDPR compliance
Use ~eu-eu for fully GDPR-compliant inference:
- EU-sovereign providers only: Mistral (Paris), OVHcloud (Roubaix) — all EU-headquartered
- Signed DPAs: All EU providers offer Data Processing Agreements
- EU data residency: Prompts and responses never leave the EEA
- No CLOUD Act: No US-headquartered companies in the data path
- Stateless proxy: We don't log, store, or retain prompts or responses
See references/config-rationale.md for the full reasoning behind region filtering and GDPR compliance.
FAQ
Why semantic model names instead of real provider IDs? Decouples your agent config from infrastructure. When we swap providers or models, your config stays the same. No instance updates needed. We're free to A/B test new models, do gradual rollouts, or switch providers — all invisible to your agent.
How do you pick which models back each role? We continuously evaluate models on quality (SWE-Bench, tool-use, context handling), cost, latency, and reliability. When a better option appears, we roll it out gradually behind the same semantic name. Current model details are always visible at openclaw.rocks/fuel.
Is this just a proxy? The inference endpoint is a managed gateway with multi-provider routing, budget controls, and a model abstraction layer. But the real value is the config — context pruning, session initialization, compaction, and prompt caching are what save you the most money.
Can I use my own API keys instead? For inference, yes. The config optimizations (context pruning, compaction, session init) work with any provider. Fuel exists so you don't have to manage provider config, model selection, and budget limits yourself.
What if a provider goes down?
Automatic failover kicks in — each role has fallback providers. If all providers fail, your agent gets a clear API error. Swap baseUrl back to your own provider key as a last resort.
My agent seems to forget things after compaction.
Increase reserveTokensFloor to 30000 or keepLastAssistants to 5. Check memory/ for flush files — the compaction prompt should capture important context.
Handling 402: Credits Exhausted
When Fuel credits run out, the inference endpoint returns HTTP 402 Payment Required. Fuel uses the x402 protocol to make this response machine-readable.
What the agent should do on 402
Add this to the agent's system prompt or error handling:
402 HANDLING RULE:
When you receive HTTP 402 from the inference endpoint:
1. Stop making LLM calls immediately — they will all fail.
2. Save the current work state to memory.
3. Attempt autonomous top-up:
a. POST https://openclaw.rocks/api/fuel/agent-checkout { "amount": 2000 }
b. Open checkoutUrl in browser tool and complete payment.
c. Poll GET https://openclaw.rocks/api/fuel/status?session_id=cs_xxx
d. On "completed": resume work.
4. If autonomous top-up is not possible:
Tell the user: "Fuel credits exhausted. Top up at https://openclaw.rocks/fuel"
Wait for the user to confirm they've topped up before resuming.
5. Do NOT retry the failed request until credits are confirmed available.
Balance check API
GET https://openclaw.rocks/api/fuel/balance
Authorization: Bearer <supabase_session>
# 200 OK (has credits):
{
"active": true,
"budgetLimit": 20.0,
"budgetUsed": 12.50,
"remaining": 7.50,
"remainingFormatted": "$7.50"
}
# 402 Payment Required (exhausted):
{
"error": "Fuel credits exhausted",
"balance": { "budgetLimit": 20.0, "budgetUsed": 20.0, "remaining": 0 },
"topup": "https://openclaw.rocks/fuel"
}
# Also includes PAYMENT-REQUIRED header (x402 v2 compatible)
x402 protocol compatibility
The 402 response includes a PAYMENT-REQUIRED header with base64-encoded payment info following the x402 v2 spec. x402-aware agents and clients can parse this header to understand what payment is needed and where to pay.
Current scheme: fiat-redirect via Stripe (agent notifies user to top up).
Troubleshooting
| Problem | Fix |
|---|---|
401 Unauthorized |
Check your virtual key. It should start with vk-. |
429 Too Many Requests |
Hit rate limits. Wait a moment or upgrade your plan. |
402 Budget Exceeded |
Credits exhausted. Top up at openclaw.rocks/fuel. See Handling 402 above. |
| Agent not using Fuel | Verify models.providers.fuel in config. Model IDs must start with fuel/. |
| Context growing too fast | Verify contextPruning is set. Add session init rules to system prompt. |
| Still loading full history | Session init rules missing from system prompt. Add the SESSION INITIALIZATION RULE. |
| Worker unavailable | Fallback to reasoning should be automatic. Check model.fallbacks in config. |
| Heartbeats too expensive | Verify heartbeat.model points to fuel/heartbeat. |
Built by OpenClaw.rocks. Your AI agent. Live in seconds.