Restate Skill

Product summary

Restate is a durable execution engine that adds automatic failure recovery and exactly-once execution to your services without requiring separate orchestration infrastructure. Services embed the Restate SDK and run as regular applications (containers, serverless, VMs, Kubernetes). Restate persists execution progress in a journal and, after a failure, replays that journal so the handler resumes from the last completed step. It supports three service types: Basic Services (stateless handlers), Virtual Objects (stateful entities with single-writer consistency), and Workflows (multi-step processes with exactly-once guarantees). SDKs are available for TypeScript, Java, Kotlin, Python, Go, and Rust. Primary docs: https://docs.restate.dev. CLI commands: restate deployments register, restate invocations ls, restate kv clear. Config files: restate.toml (server); per-service configuration is set through the SDK.

When to use

Reach for Restate when:

Building services that must survive crashes and resume exactly where they left off (durable execution)
Implementing workflows with multiple steps that need exactly-once semantics
Orchestrating microservices with automatic retries and resilient communication
Processing Kafka events with stateful handlers and zero consumer management
Building AI agents that need fault tolerance, state persistence, and observability
Implementing sagas with compensating actions for distributed transactions
Making operations idempotent without manual deduplication logic
Running long-lived processes that may suspend and resume (timers, awakeables, external events)

Do not use Restate for: simple stateless HTTP endpoints (it works, but adds little), real-time streaming that doesn't need durability, or systems where every failure scenario is already handled by other means.

Quick reference

Service types and when to use each

Service Type	State	Concurrency	Best For
Basic Service	None	Unlimited parallel	ETL, sagas, background jobs, API calls, task parallelization
Virtual Object	Isolated per key	Single writer + concurrent readers	User accounts, shopping carts, agents, state machines, stateful event processing
Workflow	Isolated per ID	Single run handler + concurrent signals/queries	Approvals, onboarding, multi-step flows, exactly-once processes

Handler context types

Context	Service Type	Capabilities
`Context`	Basic Service	Durable steps, service calls, timers, awakeables
`ObjectContext`	Virtual Object	State read/write, durable steps, service calls (exclusive)
`ObjectSharedContext`	Virtual Object	State read-only, concurrent access (no blocking)
`WorkflowContext`	Workflow	State, durable promises, durable steps, service calls (run handler only)
`WorkflowSharedContext`	Workflow	State read-only, signal/query handlers (concurrent)
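
The split between exclusive and shared contexts in the table above can be illustrated with plain TypeScript interfaces. This is a minimal sketch, not the SDK's own types: `ToySharedContext`, `ToyObjectContext`, and `makeContext` are hypothetical names, and state is reduced to numbers in an in-memory record.

```typescript
// Hypothetical stand-ins for the SDK context types, reduced to state access.
interface ToySharedContext {
  get(key: string): number | undefined;
}

interface ToyObjectContext extends ToySharedContext {
  set(key: string, value: number): void;
}

type ToyState = { [key: string]: number };

// Builds a context over one object's state (assumption: one record per key).
function makeContext(state: ToyState): ToyObjectContext {
  return {
    get: (key) => state[key],
    set: (key, value) => {
      state[key] = value;
    },
  };
}

// Exclusive handler: may read and write state.
function add(ctx: ToyObjectContext, amount: number): number {
  const next = (ctx.get("count") ?? 0) + amount;
  ctx.set("count", next);
  return next;
}

// Shared handler: read-only. Calling ctx.set here would not compile,
// because ToySharedContext declares no such method.
function current(ctx: ToySharedContext): number {
  return ctx.get("count") ?? 0;
}

const counterState: ToyState = {};
const counterCtx = makeContext(counterState);
add(counterCtx, 2);
add(counterCtx, 3);
const seen = current(counterCtx);
```

Passing the richer context to a handler that only asks for the shared one is allowed, while the reverse is rejected by the compiler, which mirrors the read-only rule for shared handlers.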

Essential CLI commands

# Register a service endpoint
restate deployments register http://localhost:9080

# List invocations
restate invocations ls

# Kill stuck invocations
restate invocations kill <INVOCATION_ID>

# Clear state
restate kv clear <VIRTUAL_OBJECT_NAME>
restate kv clear <VIRTUAL_OBJECT_NAME>/<KEY>

# View service config
restate service config view <SERVICE_NAME>

# Start dev server
restate up

Durable operations (must wrap non-deterministic code)

ctx.run() - Wrap database calls, HTTP requests, UUID generation, random numbers
ctx.sleep() / ctx.timer() - Durable timers
ctx.awakeable() - Wait for external events
ctx.call() - Service-to-service calls (automatic retries)
ctx.get() / ctx.set() - State operations (Virtual Objects, Workflows only)
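
To make the role of ctx.run() concrete, here is a toy model of journaled execution. It is a sketch under simplifying assumptions (synchronous steps, an in-memory journal) and is not the Restate SDK: `ToyContext` and `JournalEntry` are hypothetical names.

```typescript
// Toy model of journaled execution. NOT the Restate SDK.
type JournalEntry = { name: string; value: unknown };

class ToyContext {
  private cursor = 0;

  // The journal outlives a single attempt, so it is passed in from outside.
  constructor(private readonly journal: JournalEntry[]) {}

  // Runs `action` once; later attempts return the recorded result instead.
  run<T>(name: string, action: () => T): T {
    const recorded = this.journal[this.cursor];
    if (recorded !== undefined) {
      if (recorded.name !== name) {
        // The code took a different path than the journal: non-determinism.
        throw new Error(`journal mismatch: expected "${recorded.name}", got "${name}"`);
      }
      this.cursor++;
      return recorded.value as T;
    }
    const value = action();
    this.journal.push({ name, value });
    this.cursor++;
    return value;
  }
}

let idCalls = 0;
let chargeCalls = 0;
let crashOnce = true;

function handler(ctx: ToyContext): string {
  const orderId = ctx.run("generate id", () => `order-${++idCalls}`);
  if (crashOnce) {
    crashOnce = false;
    throw new Error("simulated crash after step 1");
  }
  ctx.run("charge", () => {
    chargeCalls++;
    return true;
  });
  return orderId;
}

const journal: JournalEntry[] = [];
let result = "";
for (let attempt = 0; attempt < 2; attempt++) {
  try {
    result = handler(new ToyContext(journal));
    break;
  } catch {
    // A real runtime would retry with backoff; this sketch simply loops.
  }
}
```

On the second attempt the ID step is served from the journal rather than regenerated, so the handler sees the same order ID it saw before the crash. Code that produced the ID outside `run` would yield a different value on each attempt.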

Service configuration options

# Server config (restate.toml)
[worker]
bind-address = "0.0.0.0:8080"
http-port = 8080
admin-port = 9070

[tracing]
endpoint = "http://localhost:4317"  # OTLP endpoint for OpenTelemetry

SDK-level config (Java-style example):

conf.invocationRetryPolicy(...)
  .abortTimeout(Duration.ofMinutes(15))
  .inactivityTimeout(Duration.ofMinutes(15))
  .idempotencyRetention(Duration.ofDays(3))
  .workflowRetention(Duration.ofDays(10))

Decision guidance

When to use Virtual Objects vs Workflows

Aspect	Virtual Object	Workflow
State model	Persistent K/V store per key	Isolated per workflow ID
Concurrency	Single writer per key	Single run handler
Lifetime	Long-lived, reactive	Bounded, request-driven
Use case	User accounts, counters, agents	Approvals, onboarding, multi-step flows
Signals	Not applicable	Durable promises for signaling

When to use eager vs lazy state loading

Scenario	Approach
Small state, handler uses all entries	Eager (default)
Large state, handler uses few entries	Lazy
AWS Lambda or FaaS with replay cost concerns	Lazy (avoids replay overhead)
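
The trade-off in the table above comes down to how many state entries travel to the handler. The following sketch counts them in a toy store; `ToyStateStore`, `loadAll`, and `loadOne` are hypothetical names for illustration, not SDK APIs.

```typescript
// Toy model of eager vs lazy state loading. NOT the Restate SDK.
type State = { [key: string]: string };

class ToyStateStore {
  transferred = 0; // number of entries shipped to the handler

  constructor(private readonly data: State) {}

  // Eager: every entry is attached to the invocation up front.
  loadAll(): State {
    const copy: State = {};
    for (const key of Object.keys(this.data)) {
      const value = this.data[key];
      if (value !== undefined) {
        copy[key] = value;
        this.transferred++;
      }
    }
    return copy;
  }

  // Lazy: each read is a separate fetch of a single entry.
  loadOne(key: string): string | undefined {
    this.transferred++;
    return this.data[key];
  }
}

const data: State = {};
for (let i = 0; i < 1000; i++) data[`k${i}`] = `v${i}`;

const eagerStore = new ToyStateStore(data);
const eagerValue = eagerStore.loadAll()["k7"];

const lazyStore = new ToyStateStore(data);
const lazyValue = lazyStore.loadOne("k7");
```

Both approaches read the same value, but the eager store ships all 1000 entries while the lazy store ships one. Lazy loading pays instead with a round trip per read, which is why it suits handlers that touch only a few entries of a large state.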

When to use HTTP vs Kafka invocation

Scenario	Use
Synchronous request-response	HTTP
Event-driven, asynchronous processing	Kafka
Exactly-once per message key	Kafka (Virtual Objects use key as object ID)
Parallel independent events	Kafka (Basic Services ignore key)

Workflow

Typical task: Build a durable service

Choose service type: Start with Basic Service unless you need state (Virtual Object) or exactly-once multi-step execution (Workflow).
Define handlers: Write regular functions with Restate context as first parameter. Wrap non-deterministic operations in ctx.run().
Wrap external calls: All database queries, HTTP requests, and UUID generation must go through ctx.run() so that replay after a failure reuses the recorded results instead of re-executing them.
Use durable operations: Replace setTimeout with ctx.sleep(), use ctx.call() for service-to-service communication, use ctx.awakeable() for external events.
Register endpoint: Deploy your service and register it with Restate:
```
restate deployments register http://your-service:9080
```
Invoke handlers: Via HTTP, Kafka, or typed SDK clients. Restate handles retries and exactly-once semantics automatically.
Monitor: Check the UI at port 9070 for execution traces, state, and invocation status.

Typical task: Implement a saga with compensation

Build a list of compensating actions as you execute steps.
On a terminal error, run the compensations in reverse order of registration.
Because every step is journaled, replay rebuilds the same compensation list, so it survives crashes.
Use ctx.run() to wrap each step and compensation.
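
The saga steps above can be sketched in plain TypeScript. This is a minimal illustration with synchronous, in-memory steps: `Step`, `terminalError`, `isTerminal`, and `runSaga` are hypothetical names, and in a real handler each `run` and `undo` would be wrapped in ctx.run(). The sketch registers each compensation before running its step, which is one possible design choice and means an undo must tolerate a step that never took effect.

```typescript
// Hypothetical step shape: an action plus the action that undoes it.
type Step = { name: string; run: () => void; undo: () => void };

// Marks an error as non-retryable (assumption: identified by its name).
function terminalError(message: string): Error {
  const err = new Error(message);
  err.name = "TerminalError";
  return err;
}

function isTerminal(err: unknown): boolean {
  return err instanceof Error && err.name === "TerminalError";
}

function runSaga(steps: Step[], log: string[]): boolean {
  const compensations: Array<() => void> = [];
  try {
    for (const step of steps) {
      // Register the undo first so a partially applied step is also reverted.
      compensations.push(step.undo);
      step.run();
      log.push(`done:${step.name}`);
    }
    return true;
  } catch (err) {
    // Transient errors propagate so the runtime can retry the invocation.
    if (!isTerminal(err)) throw err;
    // Terminal error: undo in reverse order of registration.
    for (const undo of compensations.reverse()) undo();
    return false;
  }
}

const log: string[] = [];
const steps: Step[] = [
  { name: "flight", run: () => {}, undo: () => log.push("undo:flight") },
  { name: "car", run: () => {}, undo: () => log.push("undo:car") },
  {
    name: "payment",
    run: () => {
      throw terminalError("card declined");
    },
    undo: () => log.push("undo:payment"),
  },
];
const ok = runSaga(steps, log);
```

Here the payment step fails terminally, so the payment, car, and flight compensations run in that order and `runSaga` reports failure.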

Typical task: Connect Kafka events to handlers

Start Kafka and create a topic.
Configure Restate with Kafka cluster details in restate.toml.
Register your service endpoint.

Create a subscription via Admin API:

curl localhost:9070/subscriptions --json '{
  "source": "kafka://cluster-name/topic-name",
  "sink": "service://ServiceName/handlerName",
  "options": {"auto.offset.reset": "earliest"}
}'

Restate invokes the handler for each event, managing consumers and retries.
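
If subscriptions are created from code rather than curl, the request body can be assembled with a small helper. This is a sketch only: `kafkaSubscription` and `SubscriptionRequest` are hypothetical names, and the field layout simply mirrors the curl example above.

```typescript
// Hypothetical helper mirroring the JSON body of the curl example.
interface SubscriptionRequest {
  source: string;
  sink: string;
  options: { [key: string]: string };
}

function kafkaSubscription(
  cluster: string,
  topic: string,
  service: string,
  handler: string
): SubscriptionRequest {
  return {
    source: `kafka://${cluster}/${topic}`,
    sink: `service://${service}/${handler}`,
    options: { "auto.offset.reset": "earliest" },
  };
}

const subscriptionBody = JSON.stringify(
  kafkaSubscription("cluster-name", "topic-name", "ServiceName", "handlerName")
);
```

The resulting string can be sent as the JSON body of a POST to the /subscriptions endpoint on the Admin API (localhost:9070 in the example above).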

Common gotchas

Non-determinism errors on replay: Wrap all non-deterministic operations in ctx.run(). Don't generate random numbers, UUIDs, or timestamps directly in handler code. Use the SDK's deterministic helpers instead, such as ctx.random() or ctx.now() (exact names vary by SDK).
Changing handler logic during in-flight invocations: Use versioning. Deploy new code to a new endpoint and register it. Existing invocations continue on old code; new ones use new code. Don't modify handler signatures mid-flight.
State schema changes: Virtual Object state persists across versions. Ensure backward compatibility when adding/removing fields. Test state migrations.
Forgetting to wrap service calls: Use ctx.call() for service-to-service communication, not direct HTTP. Direct calls won't be retried or journaled.
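Blocking operations outside ctx.run(): Don't await non-Restate promises, call a native sleep(), or perform blocking I/O outside durable operations. This breaks deterministic replay. Awaiting Restate's own durable primitives (ctx.run(), ctx.sleep(), ctx.call()) is fine and is the intended pattern.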
Concurrent state access in Virtual Objects: Only exclusive handlers (ObjectContext) can write state. Shared handlers (ObjectSharedContext) can only read. Don't try to write from shared handlers.
Idempotency key retention: Default is 24 hours. If you retry after that window, Restate re-executes. Tune via service configuration if needed.
Workflow state after run completes: Workflow state remains accessible for 24 hours (default) after the run handler completes. Other handlers can still query it. Adjust via workflowRetention.
Kafka key semantics: For Virtual Objects, Kafka message key becomes the object key. For Basic Services, the key is ignored. Plan your partitioning accordingly.
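
The idempotency retention gotcha above can be illustrated with a toy deduplication store. This is a sketch, not Restate's implementation: `ToyIdempotencyStore` is a hypothetical name, storage is in memory, and time is passed in explicitly so the behavior is reproducible.

```typescript
// Toy model of idempotency-key retention. NOT the Restate SDK.
const DAY_MS = 24 * 60 * 60 * 1000;

type Stored = { result: string; completedAt: number };

class ToyIdempotencyStore {
  private entries: { [key: string]: Stored } = {};

  constructor(private readonly retentionMs: number) {}

  invoke(key: string, now: number, handler: () => string): string {
    const hit = this.entries[key];
    if (hit !== undefined && now - hit.completedAt < this.retentionMs) {
      return hit.result; // deduplicated: the handler does not run again
    }
    const result = handler();
    this.entries[key] = { result, completedAt: now };
    return result;
  }
}

let executions = 0;
const store = new ToyIdempotencyStore(DAY_MS);
const charge = () => `charge-${++executions}`;

// First call executes the handler.
const first = store.invoke("req-1", 0, charge);
// Retry inside the retention window returns the stored result.
const retry = store.invoke("req-1", DAY_MS - 1, charge);
// Retry after the window has passed executes the handler again.
const late = store.invoke("req-1", DAY_MS + 1, charge);
```

The late retry produces a second charge, which is the failure mode to watch for when clients may retry after the retention window has elapsed.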

Verification checklist

Before submitting work with Restate services:

Resources

Comprehensive navigation: https://docs.restate.dev/llms.txt — Full page-by-page listing for agent navigation
Quickstart: https://docs.restate.dev/quickstart — Get your first service running in minutes
Foundations: https://docs.restate.dev/foundations/key-concepts — Core concepts (durable execution, services, handlers, invocations)
SDK guides: https://docs.restate.dev/develop/ts/services (or /java/, /python/, /go/) — Language-specific implementation details
