rust-systems

Installation
SKILL.md

Rust Systems & Services

Covers modern application-layer Rust (edition 2024): CLIs, web services, libraries. Not no_std/embedded.

Tooling

Tool Purpose
cargo Build, dep management, script runner
clippy Lint (cargo clippy --workspace --all-targets -- -D warnings)
rustfmt Formatter (cargo fmt --all)
cargo-nextest Test runner, noticeably faster than cargo test, better isolation
cargo-deny License + advisory + duplicate-dep checks
cargo-machete Find unused dependencies
  • Pin rust-toolchain.toml per repo so every contributor and CI uses the same compiler.
  • cargo update -p <crate> for single-package upgrades. cargo update rewrites everything — avoid in PR diffs.
  • Cargo.lock goes in version control for binaries and libraries (modern guidance; reproducibility wins).

Workspaces

Multi-crate projects use a workspace with layered crates. Dependencies point inward only.

Cargo.toml                  # [workspace] members + [workspace.dependencies]
crates/
  protocol/    # Shared types, no deps on other workspace crates
  storage/     # Persistence, depends on protocol
  service/    # Business logic, depends on protocol + storage
  cli/        # Binary, depends on everything
  • Centralize versions in [workspace.dependencies], reference as foo = { workspace = true } in members.

  • Keep the leaf-most crate (protocol / types) dependency-free so every other crate can depend on it without cycles.

  • Feature flags belong on the crate that introduces the dependency, not re-exported through the workspace root.

  • Library crates expose one stable facade: a thin lib.rs with a //! module doc comment stating purpose, followed by pub use re-exports of the public surface. Consumers learn one import path per concept; internal module layout can be reorganized without breaking callers.

  • Feature gates must error, never silently degrade. If runtime config requests a capability the binary wasn't compiled with (e.g. device = "gpu" on a non-CUDA build), fail at startup with a clear error. Silent fallback produces different behavior from what the operator configured, often without anyone noticing.

  • Centralize lints at the workspace root with [workspace.lints.*]. Every member crate inherits the same ruleset — no drift between crates, no per-crate #![deny(...)] stacks. Example:

    [workspace.lints.rust]
    unsafe_code = "warn"
    missing_docs = "warn"
    
    [workspace.lints.clippy]
    all = { level = "warn", priority = -1 }
    pedantic = { level = "warn", priority = -1 }
    nursery = { level = "warn", priority = -1 }
    module_name_repetitions = "allow"
    must_use_candidate = "allow"
    

    Each member crate opts in with [lints] workspace = true in its own Cargo.toml. Changing a lint in one place updates every crate.

Build Profiles

When tuning Cargo build profiles (release LTO, release-dbg symbols, release-min for distributable binaries) or adding dev-machine speedups (mold linker, target-cpu=native, share-generics), load build-profiles.md.

Error Handling

Split by crate role:

  • Libraries / lower crates: define typed errors with thiserror. Consumers can pattern-match.
  • Binaries / top-level crates: use anyhow::Result with .context("what was being attempted"). Human-readable error chains.
  • Never return Box<dyn Error> from library APIs — it erases variant information.
  • Use ? liberally. Never .unwrap() or .expect() outside tests and main. An expect("...") is acceptable only when the invariant is provably upheld and the message explains why.
  • Convert at boundaries: #[from] on thiserror variants for auto-conversion; .map_err(MyError::from) when explicit.
  • bail!("...") / ensure!(cond, "...") in application code for early exits.
  • Prefer Result<T, E> over panics for any recoverable error. Panics are for programmer bugs (broken invariants), not runtime failures.
  • #[must_use] on fallible APIs: annotate functions returning Result or newtype-wrapped results that callers frequently ignore. Catches let _ = validate(x); at compile time instead of shipping a silently-dropped error.

Ownership Discipline

  • Take &str over &String, &[T] over &Vec<T> in function signatures — accepts more call sites for free.

  • Return owned (String, Vec<T>) from constructors and public APIs. Borrow in hot paths where lifetimes are obvious.

  • Reach for Arc<T> only when sharing across threads. Single-threaded sharing uses Rc<T> or references.

  • Cow<'_, str> when a function sometimes allocates and sometimes borrows (e.g. normalization).

  • Lifetime elision handles 90% of cases. If you're writing 'a in more than one signature, reconsider whether that type should own its data instead.

  • bytes::Bytes for zero-copy slicing of shared immutable buffers — network parsers, frame decoders, protocol handlers. BytesMut for building buffers that split_to / split_off into Bytes without reallocation. Prefer Bytes over Arc<Vec<u8>> when slicing is the dominant access pattern.

  • Reduce hot-path heap allocations with stack-or-inline collections when the typical size is small and known:

    • smallvec::SmallVec<[T; N]> — inline for ≤N items, spills to heap beyond. Good for "usually 1-8 items" cases like parsed tag lists, lookup keys, small event batches.
    • arrayvec::ArrayVec<T, CAP> — fixed capacity, never heap-allocates. Returns an error when full. Good for bounded message buffers or per-request scratch space.
    • String interning for repeatedly-seen strings (enum-like values parsed from config, tenant IDs, route keys): dashmap::DashMap<String, &'static str> with Box::leak on miss gives &'static str comparisons without per-call allocations.

    These are optimizations — profile first. Vec/String on a cold path isn't the bottleneck.

Async with Tokio

  • Default runtime: #[tokio::main] with features = ["full"] for apps; features = ["rt", "macros", "sync"] for libraries that need to stay slim.
  • tokio::spawn for independent tasks. JoinSet for a dynamic group you'll await together with cancellation.
  • tokio::select! for racing futures (timeouts, cancellation, first-wins).
  • Never block the runtime: tokio::task::spawn_blocking for sync CPU work or blocking I/O libs.
  • tokio::sync::Mutex only when the guard must be held across .await. Otherwise std::sync::Mutex is faster.
  • tokio::sync::RwLock when reads dominate writes (config snapshots, route tables, hot caches). Many readers proceed in parallel; Mutex serializes them. For snapshot-swap semantics (rarely-updated config), arc-swap::ArcSwap is faster still — no lock on the read path.
  • Cancellation: CancellationToken (from tokio-util) propagates shutdown. Long-running tasks must check it.
  • Backpressure via bounded mpsc channels — unbounded channels hide memory growth until OOM.
  • Semaphore for hard concurrency limits on spawn paths that don't fit a channel model (e.g. "at most 50 concurrent outbound HTTP calls"). let _permit = sem.acquire().await?; inside the task; dropping the permit releases the slot. Pair with Arc<Semaphore> shared across spawners.
  • Don't mix async runtimes. Pick tokio and stick with it; async-std and smol don't interop cleanly.

CLI Tools (clap)

  • Use the derive API: #[derive(Parser)] + #[derive(Subcommand)]. Less boilerplate, types drive the help text.
  • One enum Commands variant per subcommand; flatten shared flags into a #[command(flatten)] struct CommonArgs.
  • --json flag on query commands for agent/pipe consumption. Emit via serde_json::to_string(&value)?.
  • Exit codes: 0 success, 1 for errors main returned, 2 for argparse (clap handles this), reserve 3+ for domain meanings documented in --help.
  • Provide --version automatically via #[command(version)].

See cli-tools.md for config layering, logging setup, progress reporting, and shell completions.

HTTP Services (axum)

  • Framework default: axum (tokio-native, tower middleware, extractor-based handlers). Pick actix-web only if an existing codebase uses it.
  • Handlers return Result<impl IntoResponse, AppError>. Implement IntoResponse for AppError to centralize error → status mapping.
  • Validate input at the boundary: axum::extract::Json<T> where T: Deserialize + Validate (use validator crate). Internal services trust input was validated.
  • Share state via State<Arc<AppState>> — not globals, not lazy_static.
  • Middleware via tower::ServiceBuilder: tracing → timeout → auth → CORS → handler. Order matters.
  • Resilience layer stack (outbound HTTP clients and shared services): ServiceBuilder::new().layer(TimeoutLayer).layer(RateLimitLayer).layer(ConcurrencyLimitLayer).layer(LoadShedLayer).layer(RetryLayer).service(client). Name each layer explicitly — LoadShedLayer sheds excess load, ConcurrencyLimitLayer caps in-flight requests, RateLimitLayer bounds request rate, RetryLayer retries classified transient errors. Combining LoadShedLayer + ConcurrencyLimitLayer produces proper backpressure instead of unbounded queueing.

See axum-service.md for project layout, extractors, error types, graceful shutdown, and OpenAPI generation.

Concurrency

Workload Approach
Independent async I/O tokio::spawn + JoinSet or futures::join!
Data-parallel CPU work rayon with par_iter
Shared mutable state across threads Arc<Mutex<T>> or Arc<RwLock<T>>, smallest scope possible
Single-producer pipelines tokio::sync::mpsc (async) or std::sync::mpsc (sync)
Broadcast / fan-out tokio::sync::broadcast

rayon and tokio coexist — use tokio::task::spawn_blocking to call a rayon pool from async code. Never call .block_on() from inside a tokio task; it deadlocks the runtime.

Testing

  • Built-in #[test]. Prefer cargo nextest run --workspace over cargo test — it runs tests in parallel processes with proper isolation.
  • Unit tests live in mod tests { ... } at the bottom of the file (access to private items).
  • Integration tests in tests/ directory. One file per public surface area.
  • #[tokio::test] for async tests. Add flavor = "multi_thread" when the code under test spawns tasks.
  • rstest for parametrized tests and fixtures. proptest / quickcheck for property-based tests on pure logic.
  • insta for snapshot testing CLI output, serialization, large structs. Review diffs with cargo insta review.
  • assert_cmd + predicates for CLI integration tests (invokes the binary, asserts on stdout/stderr/exit code).
  • Assert on error variants with matches!: assert!(matches!(result.unwrap_err(), MyError::Validation(_))). Cleaner than match arms when the test only cares whether the error is the right kind, and doesn't force updates when unrelated variants are added.
  • Coverage: cargo llvm-cov --workspace --html. Target 70%+ on application code, higher on library crates.
  • Fuzzing for parsers: cargo fuzz + libfuzzer-sys on any code that parses untrusted input (file formats, protocols, query languages). A short nightly fuzz run surfaces the panics and UB that unit tests miss.

For generic test discipline (anti-patterns, mock rules, rationalization resistance), see the ia-writing-tests skill.

Unsafe Discipline

  • Default: no unsafe. If clippy flags it, don't #[allow] it — refactor.
  • Every unsafe block gets a // SAFETY: comment above it explaining why each invariant holds. No comment = reviewer rejects.
  • Keep unsafe blocks minimal — wrap in a safe abstraction at module boundary, mark the module pub(crate).
  • Use miri (cargo +nightly miri test) on any crate containing unsafe or raw pointer arithmetic — catches UB that optimizers mask.
  • Prefer bytemuck, zerocopy, bytes over hand-rolled transmutes for zero-copy patterns.

Production Resilience

When productionizing a service (config validation, /health + /ready endpoints, graceful shutdown, retries/timeouts/jitter, connection pools, diagnostic secret redaction), load production-resilience.md.

Observability

For logging (tracing + tracing-subscriber with init recipe), #[instrument] spans, correlation IDs, metrics, and distributed tracing patterns, load observability.md. Never use println! or log:: in new code.

CI

General CI design lives with the ia-infrastructure-engineer agent. For Rust-specific callouts (rustsec/audit-check, cargo-llvm-cov, Swatinem/rust-cache, taiki-e/install-action, matrix coverage guidance, doc-test step), load ci-pipeline.md.

Discipline

  • Simplicity first — every change as simple as possible, impact minimal code.
  • Only touch what's necessary — avoid unrelated changes in a PR.
  • No #[allow(clippy::...)] as a shortcut — fix the underlying issue. Document exceptions with a rationale.
  • Before adding a trait or generic, verify it's used in 3+ places. Otherwise a concrete type is clearer.
  • Verify: see Verify section — pass all checks with zero warnings before declaring done.

Verify

  • cargo fmt --all -- --check passes with zero diffs
  • cargo clippy --workspace --all-targets --all-features -- -D warnings passes
  • cargo nextest run --workspace (or cargo test --workspace) passes with zero failures
  • cargo deny check passes (licenses, advisories, duplicates) for any crate going to production
  • No new unsafe without // SAFETY: comment

References

  • cli-tools.md — clap patterns, config layering, tracing setup, progress, shell completions
  • axum-service.md — project layout, extractors, error types, graceful shutdown, testing
  • build-profiles.md — release/release-dbg/release-min profiles, mold linker, dev compile speedups
  • ci-pipeline.md — Rust-specific CI steps (cargo audit, llvm-cov, rust-cache, matrix strategy, doc tests)
  • production-resilience.md — fail-fast config, health/ready endpoints, graceful shutdown, retries, timeouts, connection pools
  • observability.md — tracing init recipe, span instrumentation, correlation IDs, metrics, distributed tracing
Related skills
Installs
11
GitHub Stars
9
First Seen
Apr 15, 2026