Restate Skill
Product summary
Restate is a durable execution engine that adds automatic failure recovery and exactly-once execution to your services without requiring separate orchestration infrastructure. Services embed the Restate SDK and run as regular applications (containers, serverless, VMs, Kubernetes). Restate persists execution progress in a journal and, on failure, replays from the last checkpoint. It supports three service types: Basic Services (stateless handlers), Virtual Objects (stateful entities with single-writer consistency), and Workflows (multi-step processes with exactly-once guarantees). SDKs are available for TypeScript, Java, Kotlin, Python, Go, and Rust. Primary docs: https://docs.restate.dev. CLI commands: restate deployments register, restate invocations ls, restate kv clear. Config files: restate.toml (server); service configuration is set through the SDK.
When to use
Reach for Restate when:
- Building services that must survive crashes and resume exactly where they left off (durable execution)
- Implementing workflows with multiple steps that need exactly-once semantics
- Orchestrating microservices with automatic retries and resilient communication
- Processing Kafka events with stateful handlers and zero consumer management
- Building AI agents that need fault tolerance, state persistence, and observability
- Implementing sagas with compensating actions for distributed transactions
- Needing idempotent operations without manual deduplication logic
- Running long-lived processes that may suspend and resume (timers, awakeables, external events)
Do not use Restate for: simple stateless HTTP endpoints (though it works fine), real-time streaming that doesn't need durability, or systems where you control all failure scenarios.
Quick reference
Service types and when to use each
| Service Type | State | Concurrency | Best For |
|---|---|---|---|
| Basic Service | None | Unlimited parallel | ETL, sagas, background jobs, API calls, task parallelization |
| Virtual Object | Isolated per key | Single writer + concurrent readers | User accounts, shopping carts, agents, state machines, stateful event processing |
| Workflow | Isolated per ID | Single run handler + concurrent signals/queries | Approvals, onboarding, multi-step flows, exactly-once processes |
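The "single writer" guarantee for Virtual Objects can be pictured with a toy model. The sketch below is not the Restate SDK: `ToyVirtualObjects`, `call_exclusive`, and `add_item` are hypothetical names, and an `asyncio.Lock` per key stands in for Restate's per-key queueing. It only illustrates why serialized access per key keeps read-modify-write handlers safe while different keys proceed independently.

```python
import asyncio
from collections import defaultdict


class ToyVirtualObjects:
    """Toy model: one lock and one state dict per object key."""

    def __init__(self):
        self._locks = defaultdict(asyncio.Lock)  # one writer at a time per key
        self.state = defaultdict(dict)           # isolated state per key

    async def call_exclusive(self, key, handler):
        # Calls for the same key run one after another; other keys are unaffected.
        async with self._locks[key]:
            return await handler(self.state[key])


async def add_item(state):
    count = state.get("count", 0)
    await asyncio.sleep(0)  # yield control; without the lock, updates would be lost
    state["count"] = count + 1
    return state["count"]


async def main():
    objects = ToyVirtualObjects()
    calls = [objects.call_exclusive("cart-1", add_item) for _ in range(5)]
    calls.append(objects.call_exclusive("cart-2", add_item))
    await asyncio.gather(*calls)
    return objects.state


final_state = asyncio.run(main())
```

In this model the five concurrent calls to `cart-1` each see the previous call's write, which is the behavior an exclusive handler relies on.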
Handler context types
| Context | Service Type | Capabilities |
|---|---|---|
| `Context` | Basic Service | Durable steps, service calls, timers, awakeables |
| `ObjectContext` | Virtual Object | State read/write, durable steps, service calls (exclusive) |
| `ObjectSharedContext` | Virtual Object | State read-only, concurrent access (no blocking) |
| `WorkflowContext` | Workflow | State, durable promises, durable steps, service calls (run handler only) |
| `WorkflowSharedContext` | Workflow | State read-only, signal/query handlers (concurrent) |
Essential CLI commands
# Register a service endpoint
restate deployments register http://localhost:9080
# List invocations
restate invocations ls
# Kill stuck invocations
restate invocations kill <INVOCATION_ID>
# Clear state
restate kv clear <VIRTUAL_OBJECT_NAME>
restate kv clear <VIRTUAL_OBJECT_NAME>/<KEY>
# View service config
restate service config view <SERVICE_NAME>
# Start dev server
restate up
Durable operations (must wrap non-deterministic code)
- `ctx.run()` - Wrap database calls, HTTP requests, UUID generation, random numbers
- `ctx.sleep()` / `ctx.timer()` - Durable timers
- `ctx.awakeable()` - Wait for external events
- `ctx.call()` - Service-to-service calls (automatic retries)
- `ctx.get()` / `ctx.set()` - State operations (Virtual Objects, Workflows only)
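To see why non-deterministic code has to go through a durable step, here is a minimal sketch of journal-based replay. It is a toy model, not the Restate SDK: `ToyContext`, its positional journal, and the `fail_after_first_step` flag are assumptions made for illustration. The point is that a retried handler reuses the recorded result of a completed step instead of generating a new one.

```python
import uuid


class ToyContext:
    """Toy stand-in for a durable context: journals step results by position."""

    def __init__(self, journal=None):
        # The journal outlives a single attempt, like a persisted log would.
        self.journal = journal if journal is not None else []
        self._cursor = 0  # index of the next step within this attempt

    def run(self, action):
        if self._cursor < len(self.journal):
            result = self.journal[self._cursor]  # replay: reuse the recorded result
        else:
            result = action()                    # first execution: run and record
            self.journal.append(result)
        self._cursor += 1
        return result


def handler(ctx, fail_after_first_step=False):
    order_id = ctx.run(lambda: str(uuid.uuid4()))  # non-deterministic step
    if fail_after_first_step:
        raise RuntimeError("simulated crash")
    receipt = ctx.run(lambda: f"charged:{order_id}")
    return order_id, receipt


journal = []
try:
    handler(ToyContext(journal), fail_after_first_step=True)
except RuntimeError:
    pass  # the journal still holds the first step's result

# The retry replays step 1 from the journal, then executes step 2 for the first time.
order_id, receipt = handler(ToyContext(journal))
```

Had the UUID been generated outside `run`, the retry would have produced a different `order_id` than the one the first attempt used.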
Service configuration options
# Server config (restate.toml)
[worker]
bind-address = "0.0.0.0:8080"
http-port = 8080
admin-port = 9070 # Admin API and UI
[tracing]
endpoint = "http://localhost:4317" # OTLP endpoint for OpenTelemetry
SDK-level config (Java example):
conf.invocationRetryPolicy(...)
.abortTimeout(Duration.ofMinutes(15))
.inactivityTimeout(Duration.ofMinutes(15))
.idempotencyRetention(Duration.ofDays(3))
.workflowRetention(Duration.ofDays(10))
Decision guidance
When to use Virtual Objects vs Workflows
| Aspect | Virtual Object | Workflow |
|---|---|---|
| State model | Persistent K/V store per key | Isolated per workflow ID |
| Concurrency | Single writer per key | Single run handler |
| Lifetime | Long-lived, reactive | Bounded, request-driven |
| Use case | User accounts, counters, agents | Approvals, onboarding, multi-step flows |
| Signals | Not applicable | Durable promises for signaling |
When to use eager vs lazy state loading
| Scenario | Approach |
|---|---|
| Small state, handler uses all entries | Eager (default) |
| Large state, handler uses few entries | Lazy |
| AWS Lambda or FaaS with replay cost concerns | Lazy (avoids replay overhead) |
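The trade-off in the table can be illustrated by counting how many state entries reach the handler under each approach. This is a toy model under stated assumptions: `ToyStore`, `load_all`, and `load_one` are hypothetical and do not reflect Restate's actual protocol or SDK options.

```python
class ToyStore:
    """Hypothetical state store that counts how many entries it ships to a handler."""

    def __init__(self, entries):
        self._entries = dict(entries)
        self.entries_sent = 0

    def load_all(self):
        # Eager: every entry is delivered up front.
        self.entries_sent += len(self._entries)
        return dict(self._entries)

    def load_one(self, name):
        # Lazy: one entry is fetched when the handler asks for it.
        self.entries_sent += 1
        return self._entries[name]


def handle_eager(store):
    state = store.load_all()
    return state["status"]


def handle_lazy(store):
    return store.load_one("status")


entries = {f"field{i}": i for i in range(99)}
entries["status"] = "active"

eager_store = ToyStore(entries)
lazy_store = ToyStore(entries)
eager_result = handle_eager(eager_store)
lazy_result = handle_lazy(lazy_store)
```

Both handlers return the same value, but the eager store ships all 100 entries while the lazy store ships one. With small state that the handler reads in full, the single up-front transfer is usually the simpler choice.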
When to use HTTP vs Kafka invocation
| Scenario | Use |
|---|---|
| Synchronous request-response | HTTP |
| Event-driven, asynchronous processing | Kafka |
| Exactly-once per message key | Kafka (Virtual Objects use key as object ID) |
| Parallel independent events | Kafka (Basic Services ignore key) |
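The last two rows describe how the Kafka message key is treated. A small sketch of that routing rule follows; `Invocation`, `route`, and the service names are hypothetical and only model the rule stated in the table, not Restate's internals.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Invocation:
    service: str
    handler: str
    object_key: Optional[str]  # None when the target is a Basic Service


def route(message_key, service, handler, is_virtual_object):
    # Virtual Objects: the Kafka key selects the object, so events for one key
    # are processed one at a time. Basic Services: the key is ignored.
    key = message_key if is_virtual_object else None
    return Invocation(service, handler, key)


to_object = route("user-17", "UserProfile", "onEvent", is_virtual_object=True)
to_service = route("user-17", "Enricher", "process", is_virtual_object=False)
```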
Workflow
Typical task: Build a durable service
- Choose service type: Start with Basic Service unless you need state (Virtual Object) or exactly-once multi-step execution (Workflow).
- Define handlers: Write regular functions with the Restate context as the first parameter. Wrap non-deterministic operations in `ctx.run()`.
- Wrap external calls: All database queries, HTTP requests, and UUID generation must use `ctx.run()` to ensure deterministic replay on failure.
- Use durable operations: Replace `setTimeout` with `ctx.sleep()`, use `ctx.call()` for service-to-service communication, and use `ctx.awakeable()` for external events.
- Register endpoint: Deploy your service and register it with Restate: `restate deployments register http://your-service:9080`
- Invoke handlers: Via HTTP, Kafka, or typed SDK clients. Restate handles retries and exactly-once semantics automatically.
- Monitor: Check the UI at port 9070 for execution traces, state, and invocation status.
Typical task: Implement a saga with compensation
- Build a list of compensating actions as you execute steps.
- On terminal error, run the compensations in reverse order of registration.
- Restate journals each completed step, so on replay the compensation list is rebuilt deterministically and survives crashes.
- Use `ctx.run()` to wrap each step and compensation.
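The saga steps above can be sketched in plain Python. This is a minimal illustration, not the Restate SDK: `TerminalError` here is a locally defined marker, the booking functions are hypothetical, and in a real handler each action and compensation would be wrapped in `ctx.run()`. The sketch registers each compensation before running its action, which is a design choice and assumes compensations tolerate undoing a step that never completed.

```python
class TerminalError(Exception):
    """Hypothetical marker for a failure that should not be retried."""


log = []  # records what ran, in order


def book_flight():
    log.append("book_flight")


def cancel_flight():
    log.append("cancel_flight")


def book_hotel():
    log.append("book_hotel")


def cancel_hotel():
    log.append("cancel_hotel")


def charge_card():
    raise TerminalError("card declined")


def refund_card():
    log.append("refund_card")


def run_saga(steps):
    compensations = []
    try:
        for action, compensation in steps:
            compensations.append(compensation)  # register the undo first
            action()
    except TerminalError:
        # Undo in reverse order of registration.
        for compensation in reversed(compensations):
            compensation()
        return "rolled_back"
    return "completed"


result = run_saga([
    (book_flight, cancel_flight),
    (book_hotel, cancel_hotel),
    (charge_card, refund_card),
])
```

Because the failing step's compensation is already registered, `refund_card` runs too. Registering after the action instead would skip it, at the cost of missing cleanup for an action that partially succeeded.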
Typical task: Connect Kafka events to handlers
- Start Kafka and create a topic.
- Configure Restate with Kafka cluster details in `restate.toml`.
- Register your service endpoint.
- Create a subscription via Admin API: `curl localhost:9070/subscriptions --json '{ "source": "kafka://cluster-name/topic-name", "sink": "service://ServiceName/handlerName", "options": {"auto.offset.reset": "earliest"} }'`
- Restate invokes the handler for each event, managing consumers and retries.
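For scripted setups, the same subscription request can be assembled in Python. The sketch below only builds the request and does not send it; `build_subscription_request` is a hypothetical helper, and the endpoint and payload fields are copied from the curl command above.

```python
import json
import urllib.request


def build_subscription_request(cluster, topic, service, handler,
                               admin_url="http://localhost:9070"):
    """Build the POST request that the curl command above would send."""
    payload = {
        "source": f"kafka://{cluster}/{topic}",
        "sink": f"service://{service}/{handler}",
        "options": {"auto.offset.reset": "earliest"},
    }
    return urllib.request.Request(
        f"{admin_url}/subscriptions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_subscription_request(
    "cluster-name", "topic-name", "ServiceName", "handlerName"
)
# To send it against a running Restate server:
# urllib.request.urlopen(request)
```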
Common gotchas
- Non-determinism errors on replay: Wrap all non-deterministic operations in `ctx.run()`. Don't use random numbers, UUIDs, or timestamps directly; use `ctx.random()` or `ctx.now()`.
- Changing handler logic during in-flight invocations: Use versioning. Deploy new code to a new endpoint and register it. Existing invocations continue on old code; new ones use new code. Don't modify handler signatures mid-flight.
- State schema changes: Virtual Object state persists across versions. Ensure backward compatibility when adding or removing fields. Test state migrations.
- Forgetting to wrap service calls: Use `ctx.call()` for service-to-service communication, not direct HTTP. Direct calls won't be retried or journaled.
- Blocking operations outside `ctx.run()`: Don't call a native `sleep()`, do blocking I/O, or await non-Restate promises outside durable operations. Their results are not journaled, which breaks deterministic replay. Always use Restate's durable primitives.
- Concurrent state access in Virtual Objects: Only exclusive handlers (`ObjectContext`) can write state. Shared handlers (`ObjectSharedContext`) can only read. Don't try to write from shared handlers.
- Idempotency key retention: Default is 24 hours. If you retry after that window, Restate re-executes. Tune via service configuration if needed.
- Workflow state after run completes: Workflow state remains accessible for 24 hours (default) after the run handler completes. Other handlers can still query it. Adjust via `workflowRetention`.
- Kafka key semantics: For Virtual Objects, the Kafka message key becomes the object key. For Basic Services, the key is ignored. Plan your partitioning accordingly.
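The idempotency retention gotcha above is easy to misjudge, so here is a toy model of it. `ToyIdempotencyStore` and its explicit `now` argument are hypothetical; only the 24-hour default comes from the text. The sketch shows a retry inside the window returning the stored result and a retry after the window re-executing the handler.

```python
class ToyIdempotencyStore:
    """Toy model: remembers results per idempotency key for a retention window."""

    def __init__(self, retention_seconds=24 * 60 * 60):
        self.retention_seconds = retention_seconds
        self._results = {}  # key -> (completed_at, result)

    def invoke(self, key, handler, now):
        cached = self._results.get(key)
        if cached is not None and now - cached[0] < self.retention_seconds:
            return cached[1]  # within retention: deduplicated
        result = handler()    # unknown or expired key: execute again
        self._results[key] = (now, result)
        return result


calls = []


def charge():
    calls.append("charge")
    return f"receipt-{len(calls)}"


store = ToyIdempotencyStore()
first = store.invoke("order-42", charge, now=0)
retry_within = store.invoke("order-42", charge, now=60 * 60)      # 1 hour later
retry_after = store.invoke("order-42", charge, now=25 * 60 * 60)  # 25 hours later
```

If clients may retry later than the default window, a longer retention is the safer setting; the cost is that completed results are kept around for longer.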
Verification checklist
Before submitting work with Restate services:
- All non-deterministic operations (DB calls, HTTP, UUID, random, timestamps) wrapped in `ctx.run()`
- Service calls use `ctx.call()`, not direct HTTP
- No native sleeps, blocking I/O, or awaits on non-Restate promises outside `ctx.run()` or `ctx.sleep()`
- Correct service type chosen (Basic, Virtual Object, or Workflow)
- Handler signatures match context type (`Context`, `ObjectContext`, `WorkflowContext`, etc.)
- State schema is backward compatible if updating Virtual Objects
- Idempotency keys used for critical operations (if needed)
- Deployment registered with Restate before invoking
- Kafka subscriptions created if using event-driven invocation
- Tested with UI at http://localhost:9070 to verify execution traces
- Compensation logic in place for sagas (if applicable)
- Timeouts and retention policies configured appropriately
Resources
- Comprehensive navigation: https://docs.restate.dev/llms.txt — Full page-by-page listing for agent navigation
- Quickstart: https://docs.restate.dev/quickstart — Get your first service running in minutes
- Foundations: https://docs.restate.dev/foundations/key-concepts — Core concepts (durable execution, services, handlers, invocations)
- SDK guides: https://docs.restate.dev/develop/ts/services (or /java/, /python/, /go/) — Language-specific implementation details