Restatedev

SKILL.md

Restate Skill

Product summary

Restate is a durable execution engine that adds automatic failure recovery and exactly-once execution to your services without requiring separate orchestration infrastructure. Services embed the Restate SDK and run as regular applications (containers, serverless, VMs, Kubernetes). Restate records execution progress in a journal. After a failure it replays the journal, so steps that already completed return their recorded results instead of running again. It supports three service types: Basic Services (stateless handlers), Virtual Objects (stateful entities with single-writer consistency), and Workflows (multi-step processes with exactly-once guarantees). SDKs are available for TypeScript, Java, Kotlin, Python, Go, and Rust. Primary docs: https://docs.restate.dev. CLI commands: restate deployments register, restate invocations ls, restate kv clear. Configuration: restate.toml for the server; per-service options such as timeouts and retention can be set through the SDK.
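The journal-and-replay idea can be sketched in plain TypeScript. This is a toy model, not the Restate SDK: `ToyContext`, `step`, and `executeDurably` are hypothetical names, the journal is an in-memory array, and a crash is simulated by throwing. It shows that a step already recorded in the journal is not re-executed when the handler is retried.

```typescript
// Toy model of journal-based durable execution (NOT the Restate SDK API).
type Journal = unknown[];

class ToyContext {
  private cursor = 0;
  private readonly journal: Journal;

  constructor(journal: Journal) {
    this.journal = journal;
  }

  // Runs `action` only if this step has no recorded result yet.
  step<T>(action: () => T): T {
    if (this.cursor < this.journal.length) {
      // Replay: reuse the recorded result and skip the side effect.
      return this.journal[this.cursor++] as T;
    }
    const result = action();
    this.journal.push(result); // "persist" the result before moving on
    this.cursor++;
    return result;
  }
}

// Re-invokes the handler with the same journal after each failure,
// the way an engine would retry after a crash (no backoff in this toy).
function executeDurably<T>(handler: (ctx: ToyContext) => T, maxAttempts = 5): T {
  const journal: Journal = [];
  let lastError: unknown = undefined;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return handler(new ToyContext(journal));
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}

// Example: the second step fails once; the first step's side effect runs once.
let chargeCalls = 0;
let failNextEmail = true;

const chargeId = executeDurably((ctx) => {
  const id = ctx.step(() => {
    chargeCalls++; // side effect that must not repeat
    return `charge-${chargeCalls}`;
  });
  ctx.step(() => {
    if (failNextEmail) {
      failNextEmail = false;
      throw new Error("simulated crash");
    }
    return "email-sent";
  });
  return id;
});
```

A failed step is not journaled, so it is retried, while the completed charge step is answered from the journal.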

When to use

Reach for Restate when:

  • Building services that must survive crashes and resume exactly where they left off (durable execution)
  • Implementing workflows with multiple steps that need exactly-once semantics
  • Orchestrating microservices with automatic retries and resilient communication
  • Processing Kafka events with stateful handlers and zero consumer management
  • Building AI agents that need fault tolerance, state persistence, and observability
  • Implementing sagas with compensating actions for distributed transactions
  • Needing idempotent operations without manual deduplication logic
  • Running long-lived processes that may suspend and resume (timers, awakeables, external events)

Restate is usually unnecessary for: simple stateless HTTP endpoints (though it works fine there), real-time streaming that doesn't need durability, or systems that already handle every failure scenario themselves.

Quick reference

Service types and when to use each

Service Type | State | Concurrency | Best For
--- | --- | --- | ---
Basic Service | None | Unlimited parallel | ETL, sagas, background jobs, API calls, task parallelization
Virtual Object | Isolated per key | Single writer + concurrent readers | User accounts, shopping carts, agents, state machines, stateful event processing
Workflow | Isolated per ID | Single run handler + concurrent signals/queries | Approvals, onboarding, multi-step flows, exactly-once processes

Handler context types

Context | Service Type | Capabilities
--- | --- | ---
Context | Basic Service | Durable steps, service calls, timers, awakeables
ObjectContext | Virtual Object | State read/write, durable steps, service calls (exclusive)
ObjectSharedContext | Virtual Object | State read-only, concurrent access (no blocking)
WorkflowContext | Workflow | State, durable promises, durable steps, service calls (run handler only)
WorkflowSharedContext | Workflow | State read-only, signal/query handlers (concurrent)
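A rough sketch of why shared handlers cannot write state: if the read-only context is a separate type, a stray `set` call fails to compile. The interfaces below (`ToySharedContext`, `ToyExclusiveContext`) and the cart handlers are hypothetical stand-ins, not the SDK's `ObjectContext`/`ObjectSharedContext` definitions.

```typescript
// Hypothetical toy types, NOT the Restate SDK definitions.
interface ToySharedContext {
  get(key: string): string | undefined;
}

interface ToyExclusiveContext extends ToySharedContext {
  set(key: string, value: string): void;
}

// Backs both context types with one in-memory map.
function makeContext(state: Map<string, string>): ToyExclusiveContext {
  return {
    get: (key) => state.get(key),
    set: (key, value) => {
      state.set(key, value);
    },
  };
}

// Exclusive handler: may read and write state.
function addToCart(ctx: ToyExclusiveContext, item: string): void {
  const current = ctx.get("items") ?? "";
  ctx.set("items", current ? `${current},${item}` : item);
}

// Shared handler: typed with the read-only context, so ctx.set(...) would not compile.
function viewCart(ctx: ToySharedContext): string[] {
  const items = ctx.get("items");
  return items ? items.split(",") : [];
}

const cartState = new Map<string, string>();
const cartCtx = makeContext(cartState);
addToCart(cartCtx, "apple");
addToCart(cartCtx, "pear");
```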

Essential CLI commands

# Register a service endpoint
restate deployments register http://localhost:9080

# List invocations
restate invocations ls

# Kill stuck invocations
restate invocations kill <INVOCATION_ID>

# Clear state
restate kv clear <VIRTUAL_OBJECT_NAME>
restate kv clear <VIRTUAL_OBJECT_NAME>/<KEY>

# View service config
restate service config view <SERVICE_NAME>

# Start the Restate server locally
restate-server

Durable operations (must wrap non-deterministic code)

  • ctx.run() - Wrap database calls, HTTP requests, UUID generation, random numbers
  • ctx.sleep() / ctx.timer() - Durable timers
  • ctx.awakeable() - Wait for external events
  • ctx.call() - Service-to-service calls (automatic retries)
  • ctx.get() / ctx.set() - State operations (Virtual Objects, Workflows only)
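Deterministic randomness can be illustrated by seeding a generator from the invocation ID, so a replay of the same invocation draws the same values. This is a toy under stated assumptions (an FNV-1a hash feeding the mulberry32 PRNG), not Restate's implementation; in real handlers use the SDK's random helpers or wrap the call in ctx.run().

```typescript
// Toy sketch, NOT Restate's implementation: a PRNG seeded from the
// invocation ID so a replayed invocation sees identical "random" values.

// FNV-1a hash of a string to an unsigned 32-bit seed.
function seedFromId(invocationId: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < invocationId.length; i++) {
    h ^= invocationId.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0;
}

// mulberry32: small 32-bit PRNG returning floats in [0, 1).
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

function drawNumbers(invocationId: string, count: number): number[] {
  const next = mulberry32(seedFromId(invocationId));
  return Array.from({ length: count }, () => next());
}

const firstRun = drawNumbers("inv_123", 3);
const replay = drawNumbers("inv_123", 3); // same ID, e.g. after a crash
```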

Service configuration options

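# Server config (restate.toml)
# Top-level keys must come before any [section] header
tracing-endpoint = "http://localhost:4317"  # OTLP endpoint for OpenTelemetry

[ingress]
bind-address = "0.0.0.0:8080"  # where clients invoke handlers

[admin]
bind-address = "0.0.0.0:9070"  # admin API and UI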

SDK-level config (illustrative builder in Java style; option names and duration types differ per SDK):

conf.invocationRetryPolicy(...)
  .abortTimeout(Duration.ofMinutes(15))
  .inactivityTimeout(Duration.ofMinutes(15))
  .idempotencyRetention(Duration.ofDays(3))
  .workflowRetention(Duration.ofDays(10))

Decision guidance

When to use Virtual Objects vs Workflows

Aspect | Virtual Object | Workflow
--- | --- | ---
State model | Persistent K/V store per key | Isolated per workflow ID
Concurrency | Single writer per key | Single run handler
Lifetime | Long-lived, reactive | Bounded, request-driven
Use case | User accounts, counters, agents | Approvals, onboarding, multi-step flows
Signals | Not applicable | Durable promises for signaling
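The Workflow column can be modeled in a few lines: the run logic starts at most once per workflow ID, and a signal resolves a durable promise the run is waiting on. This is a synchronous, in-memory toy with hypothetical names (`ToyWorkflowHost`, `submit`, `signal`, `query`); a real workflow suspends on the promise and Restate persists its state.

```typescript
// Toy model, NOT the Restate SDK API.
class ToyDurablePromise {
  private value: string | undefined = undefined;
  private readonly waiters: Array<(v: string) => void> = [];

  resolve(v: string): void {
    if (this.value !== undefined) return; // this toy ignores a second resolve
    this.value = v;
    for (const waiter of this.waiters) waiter(v);
  }

  onResolve(cb: (v: string) => void): void {
    if (this.value !== undefined) cb(this.value);
    else this.waiters.push(cb);
  }
}

interface ToyWorkflowRun {
  promises: Map<string, ToyDurablePromise>;
  output?: string;
}

class ToyWorkflowHost {
  private readonly runs = new Map<string, ToyWorkflowRun>();
  runStarts = 0;

  private promiseFor(run: ToyWorkflowRun, name: string): ToyDurablePromise {
    let p = run.promises.get(name);
    if (p === undefined) {
      p = new ToyDurablePromise();
      run.promises.set(name, p);
    }
    return p;
  }

  // Like the run handler: executes once per workflow ID; repeats are no-ops.
  submit(workflowId: string): void {
    if (this.runs.has(workflowId)) return;
    const run: ToyWorkflowRun = { promises: new Map() };
    this.runs.set(workflowId, run);
    this.runStarts++;
    // The run logic waits for an "approval" signal before producing output.
    this.promiseFor(run, "approval").onResolve((decision) => {
      run.output = `${workflowId}: ${decision}`;
    });
  }

  // Like a shared signal handler: resolves a named durable promise.
  signal(workflowId: string, name: string, value: string): void {
    const run = this.runs.get(workflowId);
    if (run !== undefined) this.promiseFor(run, name).resolve(value);
  }

  // Like a shared query handler: reads the run's output, if any.
  query(workflowId: string): string | undefined {
    return this.runs.get(workflowId)?.output;
  }
}

const host = new ToyWorkflowHost();
host.submit("onboard-42");
host.submit("onboard-42"); // duplicate submission: no second run
const beforeSignal = host.query("onboard-42");
host.signal("onboard-42", "approval", "approved");
```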

When to use eager vs lazy state loading

Scenario | Approach
--- | ---
Small state, handler uses all entries | Eager (default)
Large state, handler uses few entries | Lazy
AWS Lambda or FaaS with replay cost concerns | Lazy (avoids replay overhead)
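A toy comparison of the two loading modes, counting how many state entries are transferred to the handler. The classes are hypothetical and ignore the real protocol; they only show why lazy loading pays off when a handler touches a few keys of a large state.

```typescript
// Toy comparison, NOT Restate's protocol.
type StateStore = Map<string, string>;

// Eager: every entry is shipped with the invocation up front.
class EagerState {
  readonly transferred: number;
  private readonly snapshot: StateStore;

  constructor(store: StateStore) {
    this.snapshot = new Map(store);
    this.transferred = store.size;
  }

  get(key: string): string | undefined {
    return this.snapshot.get(key);
  }
}

// Lazy: an entry is fetched on its first get() and cached afterwards.
class LazyState {
  transferred = 0;
  private readonly store: StateStore;
  private readonly cache = new Map<string, string | undefined>();

  constructor(store: StateStore) {
    this.store = store;
  }

  get(key: string): string | undefined {
    if (!this.cache.has(key)) {
      this.transferred++; // one fetch per first access to a key
      this.cache.set(key, this.store.get(key));
    }
    return this.cache.get(key);
  }
}

const userState: StateStore = new Map(
  Array.from({ length: 1000 }, (_, i): [string, string] => [`item-${i}`, `value-${i}`]),
);
const eager = new EagerState(userState);
const lazy = new LazyState(userState);
eager.get("item-5");
lazy.get("item-5");
lazy.get("item-5"); // cached: no second fetch
```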

When to use HTTP vs Kafka invocation

Scenario | Use
--- | ---
Synchronous request-response | HTTP
Event-driven, asynchronous processing | Kafka
Ordered processing per message key | Kafka (Virtual Objects use the message key as the object key)
Parallel independent events | Kafka (Basic Services ignore the key)
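The per-key behavior in the Kafka rows can be sketched as a dispatch plan: a Virtual Object target gets one sequential lane per key, while a Basic Service target treats every record independently. `planDispatch` and the record shape are hypothetical; Restate manages the consumers and this ordering itself.

```typescript
// Toy illustration, NOT Restate's implementation.
interface KafkaRecord {
  key: string;
  offset: number;
  value: string;
}

type TargetKind = "virtualObject" | "basicService";

// Returns lanes of work: records in one lane are handled sequentially,
// different lanes may run in parallel.
function planDispatch(records: KafkaRecord[], target: TargetKind): KafkaRecord[][] {
  if (target === "basicService") {
    return records.map((r) => [r]); // key ignored: every record stands alone
  }
  const lanes = new Map<string, KafkaRecord[]>();
  // Same key means same partition, so offsets give the per-key order.
  const ordered = records.slice().sort((a, b) => a.offset - b.offset);
  for (const r of ordered) {
    const lane = lanes.get(r.key) ?? [];
    lane.push(r);
    lanes.set(r.key, lane);
  }
  return Array.from(lanes.values());
}

const batch: KafkaRecord[] = [
  { key: "cart-1", offset: 0, value: "add apple" },
  { key: "cart-2", offset: 1, value: "add pear" },
  { key: "cart-1", offset: 2, value: "checkout" },
];
const objectLanes = planDispatch(batch, "virtualObject");
const serviceLanes = planDispatch(batch, "basicService");
```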

Workflow

Typical task: Build a durable service

  1. Choose service type: Start with Basic Service unless you need state (Virtual Object) or exactly-once multi-step execution (Workflow).

  2. Define handlers: Write regular functions with Restate context as first parameter. Wrap non-deterministic operations in ctx.run().

  3. Wrap external calls: All database queries, HTTP requests, and UUID generation must go through ctx.run() to ensure deterministic replay on failure.

  4. Use durable operations: Replace setTimeout with ctx.sleep(), use ctx.call() for service-to-service communication, use ctx.awakeable() for external events.

  5. Register endpoint: Deploy your service and register it with Restate:

    restate deployments register http://your-service:9080
    
  6. Invoke handlers: Via HTTP, Kafka, or typed SDK clients. Restate handles retries and exactly-once semantics automatically.

  7. Monitor: Check the UI at port 9070 for execution traces, state, and invocation status.
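Step 4's point about replacing setTimeout can be shown with a toy timer journal: the wake-up deadline is recorded once, so a retried handler waits only for the remaining time. This is not the SDK; `ToyTimerJournal` and `remainingMs` are hypothetical, and the clock is passed in explicitly so the example is deterministic.

```typescript
// Toy model, NOT the Restate SDK: a durable sleep stores its deadline.
class ToyTimerJournal {
  private readonly deadlines = new Map<string, number>();

  // Returns how many ms the named sleep still has to wait at `nowMs`.
  remainingMs(sleepId: string, durationMs: number, nowMs: number): number {
    let deadline = this.deadlines.get(sleepId);
    if (deadline === undefined) {
      deadline = nowMs + durationMs; // first execution: record the deadline
      this.deadlines.set(sleepId, deadline);
    }
    return Math.max(0, deadline - nowMs);
  }
}

const timers = new ToyTimerJournal();
// First attempt at t=1s asks to sleep 60s.
const firstWait = timers.remainingMs("reminder", 60_000, 1_000);
// The process crashes; the retried handler reaches the same sleep at t=31s.
const afterRestart = timers.remainingMs("reminder", 60_000, 31_000);
// A plain setTimeout would have restarted the full 60s wait instead.
```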

Typical task: Implement a saga with compensation

  1. Build a list of compensating actions as you execute steps.
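  2. On a terminal error, run the compensations in reverse order, then rethrow the error so the invocation fails.
  3. The list itself lives in handler memory, but because each step is journaled, replay rebuilds the same list after a crash, so compensation survives failures.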
  4. Use ctx.run() to wrap each step and compensation.
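A minimal saga sketch in plain TypeScript, with hypothetical step names. In a Restate handler each `action` and `compensate` call would sit inside ctx.run() and the triggering failure would be a terminal error; here plain functions and a plain `Error` keep the control flow easy to test. Each compensation is registered before its action, so a step that fails partway is also undone.

```typescript
// Plain TypeScript sketch, NOT the Restate SDK.
type SagaStep = { name: string; action: () => void; compensate: () => void };

function runSaga(steps: SagaStep[], log: string[]): void {
  const compensations: Array<() => void> = [];
  try {
    for (const step of steps) {
      compensations.push(step.compensate); // register the undo first
      step.action();
      log.push(`done ${step.name}`);
    }
  } catch (err) {
    // Undo in reverse order, then surface the original failure.
    for (const undo of compensations.reverse()) undo();
    throw err;
  }
}

const log: string[] = [];
const steps: SagaStep[] = [
  { name: "flight", action: () => {}, compensate: () => log.push("cancel flight") },
  { name: "hotel", action: () => {}, compensate: () => log.push("cancel hotel") },
  {
    name: "payment",
    action: () => {
      throw new Error("card declined");
    },
    compensate: () => log.push("refund payment"),
  },
];

let sagaFailed = false;
try {
  runSaga(steps, log);
} catch {
  sagaFailed = true;
}
```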

Typical task: Connect Kafka events to handlers

  1. Start Kafka and create a topic.
  2. Configure Restate with Kafka cluster details in restate.toml.
  3. Register your service endpoint.
  4. Create a subscription via Admin API:
    curl localhost:9070/subscriptions --json '{
      "source": "kafka://cluster-name/topic-name",
      "sink": "service://ServiceName/handlerName",
      "options": {"auto.offset.reset": "earliest"}
    }'
    
  5. Restate invokes the handler for each event, managing consumers and retries.
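The curl call in step 4 can also be issued from TypeScript with the built-in fetch (Node 18+). The hypothetical helper below only builds the request; the cluster, topic, service, and handler names are placeholders and must match your restate.toml and registered services.

```typescript
// Sketch: same endpoint and body as the curl example; helper name is hypothetical.
interface SubscriptionRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

function kafkaSubscriptionRequest(
  adminUrl: string,
  cluster: string,
  topic: string,
  service: string,
  handler: string,
): SubscriptionRequest {
  return {
    url: `${adminUrl}/subscriptions`,
    init: {
      method: "POST",
      headers: { "content-type": "application/json" },
      body: JSON.stringify({
        source: `kafka://${cluster}/${topic}`,
        sink: `service://${service}/${handler}`,
        options: { "auto.offset.reset": "earliest" },
      }),
    },
  };
}

// Placeholder names; adjust to your setup.
const req = kafkaSubscriptionRequest("http://localhost:9070", "my-cluster", "orders", "OrderProcessor", "process");
// Against a running server: await fetch(req.url, req.init);
```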

Common gotchas

  • Non-determinism errors on replay: Wrap all non-deterministic operations in ctx.run(). Don't read random numbers, UUIDs, or timestamps directly; use the SDK's deterministic helpers instead (such as ctx.random() or ctx.now(); exact names vary by SDK).

  • Changing handler logic during in-flight invocations: Use versioning. Deploy new code to a new endpoint and register it. Existing invocations continue on old code; new ones use new code. Don't modify handler signatures mid-flight.

  • State schema changes: Virtual Object state persists across versions. Ensure backward compatibility when adding/removing fields. Test state migrations.

  • Forgetting to wrap service calls: Use ctx.call() for service-to-service communication, not direct HTTP. Direct calls won't be retried or journaled.

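  • Blocking operations outside ctx.run(): Don't await arbitrary promises (fetch, database clients), call sleep(), or do blocking I/O outside durable operations. Awaiting Restate's own primitives (ctx.run(), ctx.sleep(), service calls) is fine; anything else breaks deterministic replay.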

  • Concurrent state access in Virtual Objects: Only exclusive handlers (ObjectContext) can write state. Shared handlers (ObjectSharedContext) can only read. Don't try to write from shared handlers.

  • Idempotency key retention: Default is 24 hours. If you retry after that window, Restate re-executes. Tune via service configuration if needed.

  • Workflow state after run completes: Workflow state remains accessible for 24 hours (default) after the run handler completes. Other handlers can still query it. Adjust via workflowRetention.

  • Kafka key semantics: For Virtual Objects, Kafka message key becomes the object key. For Basic Services, the key is ignored. Plan your partitioning accordingly.
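The idempotency retention gotcha can be sketched with a toy dedup store: a repeated key inside the window returns the stored result, and after the window the work runs again. This is not Restate's implementation; `ToyIdempotencyStore` is hypothetical and the 24-hour default is taken from the text above.

```typescript
// Toy dedup store, NOT Restate's implementation.
const DAY_MS = 24 * 60 * 60 * 1000;

class ToyIdempotencyStore<T> {
  private readonly entries = new Map<string, { result: T; expiresAt: number }>();
  private readonly retentionMs: number;
  executions = 0;

  constructor(retentionMs: number = DAY_MS) {
    this.retentionMs = retentionMs;
  }

  invoke(key: string, nowMs: number, work: () => T): T {
    const hit = this.entries.get(key);
    if (hit !== undefined && nowMs < hit.expiresAt) return hit.result; // deduplicated
    this.executions++;
    const result = work();
    this.entries.set(key, { result, expiresAt: nowMs + this.retentionMs });
    return result;
  }
}

const store = new ToyIdempotencyStore<string>();
const r1 = store.invoke("order-7", 0, () => "charged");
const r2 = store.invoke("order-7", 2 * 60 * 60 * 1000, () => "charged"); // 2h later: deduplicated
const r3 = store.invoke("order-7", DAY_MS + 1, () => "charged again"); // window expired: runs again
```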

Verification checklist

Before submitting work with Restate services:

  • All non-deterministic operations (DB calls, HTTP, UUID, random, timestamps) wrapped in ctx.run()
  • Service calls use ctx.call(), not direct HTTP
  • No awaited non-Restate promises, sleeps, or I/O outside ctx.run() or ctx.sleep()
  • Correct service type chosen (Basic, Virtual Object, or Workflow)
  • Handler signatures match context type (Context, ObjectContext, WorkflowContext, etc.)
  • State schema is backward compatible if updating Virtual Objects
  • Idempotency keys used for critical operations (if needed)
  • Deployment registered with Restate before invoking
  • Kafka subscriptions created if using event-driven invocation
  • Tested with UI at http://localhost:9070 to verify execution traces
  • Compensation logic in place for sagas (if applicable)
  • Timeouts and retention policies configured appropriately

Resources


For additional documentation and navigation, see: https://docs.restate.dev/llms.txt

First Seen
Feb 28, 2026