Rust Engineering Guide
Patterns for building reliable Rust systems that handle file-backed data, external process integration, and cross-language boundaries.
Core Philosophy
Conservative by Default
Inputs from files, subprocesses, and external systems are potentially untrusted (corrupt, half-written, out of date). Rust code should be:
- Conservative: Prefer false negatives over false positives
- Deterministic: Same input → same output
- Resilient: Never panic on user machines due to bad input
Canonical Model Ownership
If Rust is the source of truth, treat the Rust model as canonical. Everything else adapts to it:
- Internal domain model: Expressive, ergonomic for your logic
- FFI DTOs: Boring, stable, language-friendly
- File format model: Stable, versioned, round-trippable
- External input model: Strictly validated, never trusted blindly
Data Modeling
Strong Types Over Strings
Strings for status/effort/triage lead to case-mismatch bugs and invalid values. Prefer enums:
use serde::{Deserialize, Serialize};
#[derive(Clone, Debug, Serialize, Deserialize)]
#[serde(rename_all = "snake_case")]
pub enum Status { Open, InProgress, Done, Dismissed }
#[derive(Clone, Debug, Serialize, Deserialize)]
#[serde(rename_all = "snake_case")]
pub enum Priority { P0, P1, P2, P3 }
#[derive(Clone, Debug, Serialize, Deserialize)]
#[serde(rename_all = "snake_case")]
pub enum Effort { Small, Medium, Large, Xl }
Versioned Data With Serde
When adding fields to serialized structures, old data won't have them. Model explicitly:
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct LockInfo {
pub pid: u32,
pub path: String,
#[serde(default)]
pub proc_started: Option<u64>, // new field - absent in old data
#[serde(default, alias = "started")]
pub created: Option<u64>, // supports old field name
}
Key patterns:
- #[serde(default)] ensures missing fields deserialize cleanly
- alias = "old_name" reads older formats without rewriting them
- Never repurpose field meanings in-place; add new fields instead
Sentinel Values
If the spec uses sentinel strings (e.g., Related: None) but your type is Option<String>:
fn parse_optional_field(raw: &str) -> Option<String> {
let t = raw.trim();
if t.is_empty() || t.eq_ignore_ascii_case("none") {
None
} else {
Some(t.to_string())
}
}
UniFFI Boundaries
UniFFI works best with flat, stable types:
- String for IDs and timestamps (convert Uuid/DateTime at the boundary)
- Flat enums (no associated data) or string representations
- Vec<T> and Option<T> where T is FFI-friendly
- Records with stable fields
#[derive(Clone, Debug, uniffi::Record)]
pub struct IdeaDto {
pub id: String,
pub created_at_ms: i64,
pub status: String, // "open" | "in_progress" | "done" | "dismissed"
pub priority: Option<String>,
}
Why strings for enums in DTOs? Extremely stable across languages and avoids edge cases when adding variants.
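One way to keep the enum canonical while exposing strings over FFI is to convert at the boundary. The sketch below is only an illustration: the Idea struct, the as_str and parse helpers, and the hex rendering of the ID are assumptions, and the uniffi::Record derive is left out so the block builds with the standard library alone.

```rust
/// Canonical internal enum (mirrors the Status enum shown earlier).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Status { Open, InProgress, Done, Dismissed }

impl Status {
    /// Stable wire representation used in DTOs.
    pub fn as_str(self) -> &'static str {
        match self {
            Status::Open => "open",
            Status::InProgress => "in_progress",
            Status::Done => "done",
            Status::Dismissed => "dismissed",
        }
    }

    /// Returns None for unknown strings instead of guessing a variant.
    pub fn parse(raw: &str) -> Option<Status> {
        match raw.trim().to_ascii_lowercase().as_str() {
            "open" => Some(Status::Open),
            "in_progress" => Some(Status::InProgress),
            "done" => Some(Status::Done),
            "dismissed" => Some(Status::Dismissed),
            _ => None,
        }
    }
}

/// Hypothetical internal model with expressive types.
pub struct Idea { pub id: u128, pub created_at_ms: i64, pub status: Status }

/// Flat, FFI-facing record (uniffi derive omitted in this sketch).
pub struct IdeaDto { pub id: String, pub created_at_ms: i64, pub status: String }

impl From<&Idea> for IdeaDto {
    fn from(idea: &Idea) -> Self {
        IdeaDto {
            id: format!("{:032x}", idea.id), // assumed ID rendering: fixed-width hex
            created_at_ms: idea.created_at_ms,
            status: idea.status.as_str().to_string(),
        }
    }
}
```

Keeping as_str and parse next to the enum means a new variant only needs a change in one place, and the foreign side sees nothing but a new string value.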
String & Text Safety
UTF-8 Slicing Can Panic
In Rust, String/&str are UTF-8. Indexing with byte offsets (&s[..60]) panics if the index isn't on a character boundary.
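When the limit really is in bytes (for example a fixed-size field), a panic-free variant can back off to the nearest character boundary. This is a minimal sketch using only std; truncate_bytes is a hypothetical helper name.

```rust
/// Truncate to at most `max_bytes` bytes without splitting a UTF-8 sequence.
fn truncate_bytes(s: &str, max_bytes: usize) -> &str {
    if s.len() <= max_bytes {
        return s;
    }
    // Walk back from the limit until we land on a char boundary.
    // Index 0 is always a boundary, so the loop terminates.
    let mut end = max_bytes;
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    &s[..end]
}
```

For one-off checks, s.get(..n) is the non-panicking counterpart of &s[..n]: it returns None when n is out of range or falls inside a multi-byte character.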
Safe Truncation by Characters
fn truncate_chars(s: &str, max_chars: usize) -> String {
let out: String = s.chars().take(max_chars).collect();
if s.chars().count() > max_chars {
format!("{out}...")
} else {
out
}
}
Emoji-Safe Truncation (Grapheme Clusters)
For UI-facing text, use unicode-segmentation:
use unicode_segmentation::UnicodeSegmentation;
fn truncate_graphemes(s: &str, max_graphemes: usize) -> String {
let graphemes: Vec<&str> = s.graphemes(true).collect();
if graphemes.len() > max_graphemes {
format!("{}...", graphemes[..max_graphemes].concat())
} else {
s.to_string()
}
}
Path Normalization
Path strings arrive from many sources with inconsistent formatting. Use a single normalizer for all comparisons and hashing:
fn normalize_path(path: &str) -> String {
let trimmed = path.trim_end_matches('/');
if trimmed.is_empty() {
"/".to_string()
} else {
trimmed.to_string()
}
}
Special-case root for child-of logic:
fn child_prefix(query: &str) -> String {
let q = normalize_path(query);
if q == "/" { "/".to_string() } else { format!("{}/", q) }
}
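Putting the two helpers together, a child-of check might look like the sketch below. The is_child_of function is a hypothetical name; normalize_path and child_prefix are repeated from above so the block is self-contained.

```rust
fn normalize_path(path: &str) -> String {
    let trimmed = path.trim_end_matches('/');
    if trimmed.is_empty() { "/".to_string() } else { trimmed.to_string() }
}

fn child_prefix(query: &str) -> String {
    let q = normalize_path(query);
    if q == "/" { "/".to_string() } else { format!("{}/", q) }
}

/// True when `candidate` is strictly below `parent` (a path is not its own child).
fn is_child_of(candidate: &str, parent: &str) -> bool {
    let c = normalize_path(candidate);
    // The trailing slash in the prefix stops "/ab" from matching parent "/a".
    c != normalize_path(parent) && c.starts_with(&child_prefix(parent))
}
```

Comparing against the prefix with its trailing slash, rather than the bare parent, is what prevents sibling directories with a shared name prefix from being treated as children.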
Case Normalization
If spec says values are case-insensitive, normalize early:
fn norm_key(k: &str) -> String {
k.trim().to_ascii_lowercase()
}
fn norm_value(v: &str) -> String {
v.trim().to_ascii_lowercase()
}
Parsing & Serialization
State Machine Parsing
For markdown-like formats with predictable structure, a state machine beats fragile regex:
States:
- OutsideBlock
- InBlockHeader
- InMetadataBlock
- InDescription
Rules:
- Match block start only on the heading line (e.g., ### [#idea-<id>] <title>)
- Parse metadata only within a contiguous region after the heading
- Treat everything else as content until delimiter (---) or next heading
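The rules above can be sketched as a small line-oriented state machine. The metadata syntax is an assumption made for illustration (Key: value lines directly under the heading, with a single-word key); the Block struct and the helper names are hypothetical and not the real format's parser.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum State { OutsideBlock, InBlockHeader, InMetadataBlock, InDescription }

#[derive(Debug, Default, PartialEq)]
struct Block { id: String, title: String, fields: Vec<(String, String)>, body: String }

/// Assumed heading shape: `### [#idea-<id>] <title>`.
fn parse_heading(line: &str) -> Option<(String, String)> {
    let rest = line.strip_prefix("### [#idea-")?;
    let (id, title) = rest.split_once(']')?;
    if id.is_empty() { return None; }
    Some((id.to_string(), title.trim().to_string()))
}

/// Assumed metadata shape: `Key: value` with a single-word key.
fn parse_field(line: &str) -> Option<(String, String)> {
    let (k, v) = line.split_once(':')?;
    let k = k.trim();
    if k.is_empty() || k.contains(' ') { return None; }
    Some((k.to_string(), v.trim().to_string()))
}

fn parse_blocks(input: &str) -> Vec<Block> {
    let mut state = State::OutsideBlock;
    let mut blocks = Vec::new();
    let mut current: Option<Block> = None;

    for line in input.lines() {
        // A heading always starts a new block, whatever the current state.
        if let Some((id, title)) = parse_heading(line) {
            blocks.extend(current.take());
            current = Some(Block { id, title, ..Default::default() });
            state = State::InBlockHeader;
            continue;
        }
        // The delimiter closes the current block.
        if line.trim() == "---" {
            blocks.extend(current.take());
            state = State::OutsideBlock;
            continue;
        }
        // Text outside any block is ignored.
        let Some(block) = current.as_mut() else { continue };
        match state {
            State::InBlockHeader | State::InMetadataBlock => match parse_field(line) {
                Some(field) => {
                    block.fields.push(field);
                    state = State::InMetadataBlock;
                }
                None => {
                    // First non-field line ends the metadata region for good.
                    state = State::InDescription;
                    if !line.trim().is_empty() {
                        block.body.push_str(line);
                        block.body.push('\n');
                    }
                }
            },
            State::InDescription => {
                block.body.push_str(line);
                block.body.push('\n');
            }
            State::OutsideBlock => {}
        }
    }
    blocks.extend(current.take());
    blocks
}
```

Because the metadata region can only be entered from the heading, a description line that happens to look like Status: done is stored as body text and never reinterpreted as a field.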
Anchored Updates
Problem: line.contains("[#idea-123]") can match references in descriptions, causing silent corruption.
Solution: Anchor on the heading line with precise patterns:
use once_cell::sync::Lazy;
use regex::Regex;
static IDEA_HEADING_RE: Lazy<Regex> = Lazy::new(|| {
Regex::new(r"^### \[#idea-([0-9A-HJKMNP-TV-Z]{26})\]\s*(.*)$").unwrap()
});
Update logic:
- Scan lines for heading that exactly matches target ID → in_target_block = true
- While in block, update only the metadata key you care about
- Stop when you hit delimiter (---) or new heading
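The update logic above could be sketched as follows. To stay dependency-free, this version matches the heading with a plain string prefix instead of the regex shown earlier, and it assumes Key: value metadata lines and LF line endings; update_field is a hypothetical name.

```rust
/// Replace the value of `key` in the metadata region of the block whose
/// heading carries exactly `target_id`. Returns None if nothing was updated.
fn update_field(input: &str, target_id: &str, key: &str, new_value: &str) -> Option<String> {
    // Including the closing bracket makes the ID match exact: "12" will not match "123".
    let heading_prefix = format!("### [#idea-{target_id}]");
    let field_prefix = format!("{key}:");
    let mut in_target_metadata = false;
    let mut updated = false;
    let mut out: Vec<String> = Vec::new();

    for line in input.lines() {
        if line.starts_with("### ") {
            // Any heading ends the previous block; only the target heading opens ours.
            in_target_metadata = line.starts_with(&heading_prefix);
        } else if line.trim() == "---" || !line.contains(':') {
            // Delimiter or first non-field line: the metadata region is over.
            in_target_metadata = false;
        } else if in_target_metadata && !updated && line.starts_with(&field_prefix) {
            out.push(format!("{key}: {new_value}"));
            updated = true;
            continue;
        }
        out.push(line.to_string());
    }

    if !updated {
        return None;
    }
    let mut result = out.join("\n");
    if input.ends_with('\n') {
        result.push('\n'); // keep the trailing newline if the input had one
    }
    Some(result)
}
```

References such as see [#idea-2] inside another block's description are left untouched, because only a line starting with the heading prefix can open the target region.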
Round-Trip Preservation
Parse into an AST that preserves formatting for lossless round-trips:
pub struct ParsedFile {
pub format_version: u32,
pub sections: Vec<Section>,
pub trailing_text: String, // preserve unknown text/comments
}
pub struct Block {
pub id: String,
pub header_line: String, // preserve original formatting
pub fields: Vec<(String, String)>,
pub body: String,
pub separator: String, // e.g., "\n---\n"
}
Version Markers
If the format spec mandates a version marker, parsing should be strict:
#[derive(thiserror::Error, Debug)]
pub enum ParseError {
#[error("Unsupported format. Expected {expected}, found {found:?}")]
UnsupportedFormat { expected: String, found: Option<String> },
#[error("Item not found: {0}")]
NotFound(String),
}
Duplicate ID Handling
External edits can create duplicate IDs. Pick a deterministic policy:
use std::collections::HashSet;
fn dedupe_by_id<T: HasId>(items: Vec<T>) -> Vec<T> {
let mut seen = HashSet::new();
items.into_iter()
.filter(|item| seen.insert(item.id().to_string()))
.collect()
}
File I/O & Persistence
Durable-First Invariant
Never gate persistence on validation or subprocess output:
- Write raw data to storage (e.g., status: pending)
- Return control immediately
- Async enrichment updates later (or not at all)
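As a rough illustration of the capture step, the sketch below appends the raw input with a pending status before anything else runs. The one-record-per-line layout, the tab separator, and the capture name are assumptions made for this example only.

```rust
use std::fs::OpenOptions;
use std::io::Write;
use std::path::Path;

/// Append the raw capture with a pending status. No validation and no
/// subprocess call happens here, so this step cannot be blocked by either.
fn capture(store: &Path, raw: &str) -> std::io::Result<()> {
    let mut file = OpenOptions::new().create(true).append(true).open(store)?;
    // Escape backslashes and newlines so each record stays on one line
    // and the original text can be recovered later.
    let escaped = raw.replace('\\', "\\\\").replace('\n', "\\n");
    writeln!(file, "pending\t{}", escaped)?;
    file.sync_all()
}
```

Enrichment would later rewrite the status of a record; if it never runs, the pending record is still on disk.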
Atomic Writes
Write to temp file, then rename (atomic on same filesystem):
use std::{io::Write, path::Path};
use tempfile::NamedTempFile;
fn atomic_write(path: &Path, contents: &str) -> std::io::Result<()> {
let dir = path.parent().unwrap_or_else(|| Path::new("."));
let mut tmp = NamedTempFile::new_in(dir)?;
tmp.write_all(contents.as_bytes())?;
tmp.flush()?;
tmp.as_file().sync_all()?; // flush() does not fsync; sync before the rename if durability matters
tmp.persist(path).map(|_| ()).map_err(|e| e.error)
}
Concurrency Control
Two things might write: your app and external processes (Claude, human edits).
In-process: Mutex works
Out-of-process: Make writes merge-friendly:
- Re-read file before write
- Apply patch to latest parsed model
- Write back atomically
Advisory locking (fs2::FileExt) is an option for stronger guarantees.
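The re-read, patch, write-back sequence might look like this std-only sketch. update_file is a hypothetical helper, and the fixed .tmp sibling name is a simplification; NamedTempFile, as in the atomic write above, avoids temp-name collisions between concurrent writers.

```rust
use std::fs;
use std::io::Write;
use std::path::Path;

/// Re-read the file, apply `patch` to the latest content, and write back atomically.
fn update_file<F>(path: &Path, patch: F) -> std::io::Result<()>
where
    F: FnOnce(&str) -> String,
{
    // 1. Re-read the latest content; a missing file counts as empty.
    let latest = match fs::read_to_string(path) {
        Ok(s) => s,
        Err(e) if e.kind() == std::io::ErrorKind::NotFound => String::new(),
        Err(e) => return Err(e),
    };

    // 2. Apply the change to what is on disk now, not to a stale in-memory copy.
    let next = patch(&latest);

    // 3. Write to a sibling temp file, then rename over the target.
    let tmp = path.with_extension("tmp");
    {
        let mut file = fs::File::create(&tmp)?;
        file.write_all(next.as_bytes())?;
        file.sync_all()?;
    }
    fs::rename(&tmp, path)
}
```

This narrows the window for lost updates but does not close it: another writer can still slip in between the read and the rename, which is the case advisory locking addresses.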
File Watching
Use debounced file watchers for external edit detection:
- Watcher thread pushes events into a channel
- Model thread parses and computes diffs
- UI receives "data changed for X" (avoid huge payloads)
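Debouncing on the receiving end of the channel can be sketched with std::sync::mpsc alone. The event type (a bare PathBuf) and the next_batch name are assumptions; a real watcher crate would deliver richer events.

```rust
use std::collections::BTreeSet;
use std::path::PathBuf;
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Duration;

/// Block until at least one event arrives, then keep collecting until the
/// channel has been quiet for `quiet`. Returns None once every sender is
/// gone and nothing is left to deliver.
fn next_batch(rx: &Receiver<PathBuf>, quiet: Duration) -> Option<BTreeSet<PathBuf>> {
    let mut batch = BTreeSet::new();
    batch.insert(rx.recv().ok()?);
    loop {
        match rx.recv_timeout(quiet) {
            // Duplicate paths collapse in the set, so a burst of saves
            // for one file triggers a single re-parse.
            Ok(path) => {
                batch.insert(path);
            }
            Err(RecvTimeoutError::Timeout) | Err(RecvTimeoutError::Disconnected) => {
                return Some(batch);
            }
        }
    }
}
```

The model thread would call next_batch in a loop, re-parse each path in the batch, and send the UI only the identifiers of what changed.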
Process & System Integration
PID Liveness vs Identity
kill(pid, 0) detects if a PID exists, not if it's the same process. PID reuse creates "ghost" sessions.
Solution: Store and verify process start time:
fn is_pid_alive_verified(pid: u32, expected_start: Option<u64>) -> bool {
let Some(expected) = expected_start else {
return is_pid_alive_legacy(pid);
};
match get_process_start_time(pid) {
Some(actual) => actual == expected,
None => false,
}
}
Process Start Time With sysinfo
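Cache sysinfo::System to avoid repeated expensive allocations. The calls below appear to match the sysinfo 0.30 API; later releases changed several of these signatures, so check them against the version you pin: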
use sysinfo::{Pid, ProcessRefreshKind, RefreshKind, System};
thread_local! {
static SYSTEM_CACHE: std::cell::RefCell<Option<System>> =
std::cell::RefCell::new(None);
}
pub fn get_process_start_time(pid: u32) -> Option<u64> {
SYSTEM_CACHE.with(|cache| {
let mut cache = cache.borrow_mut();
let sys = cache.get_or_insert_with(|| {
System::new_with_specifics(
RefreshKind::new().with_processes(ProcessRefreshKind::new()),
)
});
sys.refresh_processes_specifics(ProcessRefreshKind::new());
sys.process(Pid::from(pid as usize)).map(|p| p.start_time())
})
}
Legacy Mitigation
For legacy data without process verification:
- PID exists (kill(pid, 0))
- Process identity heuristic (e.g., "claude" in process name)
- Age expiry (e.g., 24h) for unverified entries
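Combined, the three mitigations could be expressed as one pure function. The LegacyCheck struct, its field names, and the choice to reject entries with no timestamp are assumptions for this sketch; the PID and name lookups are expected to happen elsewhere and be passed in.

```rust
/// 24h expiry for entries that cannot be verified by start time.
const LEGACY_MAX_AGE_SECS: u64 = 24 * 60 * 60;

/// Facts gathered about a legacy entry before deciding (hypothetical shape).
struct LegacyCheck<'a> {
    pid_exists: bool,
    process_name: Option<&'a str>,
    created_secs: Option<u64>,
}

/// Conservative: every mitigation must pass, and unknowns count as failures.
fn legacy_entry_is_live(check: &LegacyCheck, now_secs: u64) -> bool {
    if !check.pid_exists {
        return false;
    }
    let name_ok = check
        .process_name
        .map_or(false, |name| name.to_ascii_lowercase().contains("claude"));
    if !name_ok {
        return false;
    }
    match check.created_secs {
        // saturating_sub: a timestamp in the future yields age 0 instead of underflowing.
        Some(created) => now_secs.saturating_sub(created) <= LEGACY_MAX_AGE_SECS,
        None => false,
    }
}
```

Keeping the decision free of I/O makes each combination of inputs easy to cover in unit tests.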
Subprocess Integration
Treat subprocess output as hostile input:
use std::process::{Command, Stdio};
use std::io::Write;
pub fn run_subprocess(prompt: &str, stdin_payload: &str) -> anyhow::Result<String> {
let mut child = Command::new("claude")
.args(["--print", "--output-format", "json", prompt])
.stdin(Stdio::piped())
.stdout(Stdio::piped())
.stderr(Stdio::piped())
.spawn()?;
{
// take() moves the handle out, so it is dropped (closing the pipe) at the end of this scope
let mut stdin = child.stdin.take()
.ok_or_else(|| anyhow::anyhow!("no stdin"))?;
stdin.write_all(stdin_payload.as_bytes())?;
}
let output = child.wait_with_output()?;
if !output.status.success() {
return Err(anyhow::anyhow!(
"subprocess failed: {}",
String::from_utf8_lossy(&output.stderr)
));
}
Ok(String::from_utf8(output.stdout)?)
}
Validation: Parse JSON strictly with serde_json. If invalid, mark as failed but keep data intact.
Timeouts: Enforce timeouts for background operations. On timeout, kill process and record failure.
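A timeout can be enforced with the standard library by polling try_wait against a deadline. This is a minimal sketch: run_with_timeout and RunError are hypothetical names, and output is discarded here because capturing it while polling would need the pipes drained on separate threads.

```rust
use std::process::{Command, ExitStatus, Stdio};
use std::thread;
use std::time::{Duration, Instant};

#[derive(Debug)]
enum RunError {
    Io(std::io::Error),
    Timeout(Duration),
}

/// Run `cmd`, killing it if it has not exited within `timeout`.
fn run_with_timeout(mut cmd: Command, timeout: Duration) -> Result<ExitStatus, RunError> {
    let mut child = cmd
        .stdin(Stdio::null())
        .stdout(Stdio::null())
        .stderr(Stdio::null())
        .spawn()
        .map_err(RunError::Io)?;

    let deadline = Instant::now() + timeout;
    loop {
        match child.try_wait().map_err(RunError::Io)? {
            Some(status) => return Ok(status),
            None if Instant::now() >= deadline => {
                // Kill, then wait so the child is reaped and does not linger as a zombie.
                let _ = child.kill();
                let _ = child.wait();
                return Err(RunError::Timeout(timeout));
            }
            None => thread::sleep(Duration::from_millis(10)),
        }
    }
}
```

The caller can map RunError::Timeout to a recorded failure while leaving the already-persisted data untouched.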
Timestamp Handling
The number-one timestamp bug: unit mismatch (seconds vs milliseconds).
fn normalize_epoch_to_secs(v: u64) -> u64 {
if v >= 1_000_000_000_000 { v / 1000 } else { v }
}
fn normalize_epoch_to_ms(v: u64) -> u64 {
if v < 1_000_000_000_000 { v * 1000 } else { v }
}
Use saturating_sub for age computations to prevent underflow:
let age = now.saturating_sub(created);
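An age check that tolerates both unit mismatch and clock skew could combine the two ideas. is_expired is a hypothetical helper; normalize_epoch_to_secs is repeated from above so the block stands alone.

```rust
fn normalize_epoch_to_secs(v: u64) -> u64 {
    if v >= 1_000_000_000_000 { v / 1000 } else { v }
}

/// True when the entry is older than `max_age_secs`.
/// Both timestamps may be in seconds or milliseconds.
fn is_expired(now: u64, created: u64, max_age_secs: u64) -> bool {
    let now = normalize_epoch_to_secs(now);
    let created = normalize_epoch_to_secs(created);
    // A `created` value in the future (clock skew) gives age 0 rather than wrapping.
    now.saturating_sub(created) > max_age_secs
}
```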
Parse ISO timestamps for legacy data:
fn parse_rfc3339_to_secs(s: &str) -> Option<u64> {
chrono::DateTime::parse_from_rfc3339(s)
.ok()
.and_then(|dt| u64::try_from(dt.timestamp()).ok()) // pre-1970 values become None instead of wrapping
}
Performance
Cache Compiled Regex
Compiling regex during every parse is wasteful:
use once_cell::sync::Lazy;
use regex::Regex;
static HEADING_RE: Lazy<Regex> = Lazy::new(|| {
Regex::new(r"^### \[#item-([A-Z0-9]+)\]").unwrap()
});
Cache Parse Results
Re-parsing files repeatedly hits scaling issues. Cache by (mtime, size) or content hash:
use std::time::SystemTime;
struct CachedParse {
mtime: SystemTime,
size: u64,
result: ParsedData,
}
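A lookup keyed on (mtime, size) might be wired up as in the sketch below. ParseCache and get_or_parse are hypothetical names, and the result type is made generic here so the example does not depend on ParsedData.

```rust
use std::collections::HashMap;
use std::fs;
use std::path::{Path, PathBuf};
use std::time::SystemTime;

struct CachedParse<T> {
    mtime: SystemTime,
    size: u64,
    result: T,
}

struct ParseCache<T> {
    entries: HashMap<PathBuf, CachedParse<T>>,
}

impl<T: Clone> ParseCache<T> {
    fn new() -> Self {
        ParseCache { entries: HashMap::new() }
    }

    /// Return the cached result if mtime and size are unchanged; otherwise re-parse.
    fn get_or_parse(&mut self, path: &Path, parse: impl FnOnce(&str) -> T) -> std::io::Result<T> {
        let meta = fs::metadata(path)?;
        let (mtime, size) = (meta.modified()?, meta.len());

        if let Some(hit) = self.entries.get(path) {
            if hit.mtime == mtime && hit.size == size {
                return Ok(hit.result.clone());
            }
        }

        let result = parse(&fs::read_to_string(path)?);
        self.entries.insert(
            path.to_path_buf(),
            CachedParse { mtime, size, result: result.clone() },
        );
        Ok(result)
    }
}
```

An edit that keeps the size identical and lands within the filesystem's mtime granularity would be missed by this key; a content hash closes that gap at the cost of reading the file every time.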
Deterministic Selection
Directory iteration order is unstable. When selecting from multiple candidates, be deterministic:
fn prefer_newer(a: &Info, b: &Info) -> bool {
let a_c = a.created.unwrap_or(0);
let b_c = b.created.unwrap_or(0);
a_c > b_c || (a_c == b_c && a.path > b.path) // tie-breaker
}
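The same rule can be written as a total ordering, which makes it usable with max_by regardless of the order in which candidates were discovered. The Info struct below is a reduced stand-in with only the two fields the comparison needs, and pick_newest is a hypothetical name.

```rust
use std::cmp::Ordering;

struct Info {
    path: String,
    created: Option<u64>,
}

/// Newer `created` wins; a missing timestamp counts as oldest; path breaks ties.
fn selection_order(a: &Info, b: &Info) -> Ordering {
    a.created
        .unwrap_or(0)
        .cmp(&b.created.unwrap_or(0))
        .then_with(|| a.path.cmp(&b.path))
}

/// Pick the preferred candidate independent of input order.
fn pick_newest(candidates: &[Info]) -> Option<&Info> {
    candidates.iter().max_by(|a, b| selection_order(a, b))
}
```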
Avoid N×Filter Patterns
Maintain indexed structures for frequent lookups:
// Instead of filtering Vec on every access
HashMap<ProjectPath, Vec<ItemId>>
Update incrementally on change events.
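An incrementally maintained index might look like the following sketch. ItemIndex and its methods are hypothetical, and ProjectPath and ItemId are modelled as plain String aliases for the example.

```rust
use std::collections::HashMap;

type ProjectPath = String;
type ItemId = String;

#[derive(Default)]
struct ItemIndex {
    by_project: HashMap<ProjectPath, Vec<ItemId>>,
}

impl ItemIndex {
    /// Called on an "item added" event; inserting twice is a no-op.
    fn insert(&mut self, project: &str, id: &str) {
        let ids = self.by_project.entry(project.to_string()).or_default();
        if !ids.iter().any(|existing| existing.as_str() == id) {
            ids.push(id.to_string());
        }
    }

    /// Called on an "item removed" event; drops the project key once it is empty.
    fn remove(&mut self, project: &str, id: &str) {
        let now_empty = match self.by_project.get_mut(project) {
            Some(ids) => {
                ids.retain(|existing| existing.as_str() != id);
                ids.is_empty()
            }
            None => false,
        };
        if now_empty {
            self.by_project.remove(project);
        }
    }

    /// Lookup by key, without scanning every item.
    fn items_for(&self, project: &str) -> &[ItemId] {
        self.by_project.get(project).map(Vec::as_slice).unwrap_or(&[])
    }
}
```

Project keys should pass through the shared path normalizer before they reach the index, so that /a and /a/ do not end up as separate entries.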
Error Handling
Never Panic on Bad Input
File I/O and JSON parsing must never panic on user machines:
- If metadata can't be parsed → ignore that entry (don't crash, don't guess)
- If timestamps are missing → treat conservatively (older for selection, unverified for safety)
- Use Option and early returns liberally
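In practice this often reduces to a parser that returns Option plus a filter_map over the entries. The line format below (a PID, optionally followed by a comma and a creation time in seconds) is invented for this sketch, as are the Entry struct and function names.

```rust
#[derive(Debug, PartialEq)]
struct Entry {
    pid: u32,
    created: Option<u64>,
}

/// Assumed line format for illustration: `<pid>` or `<pid>,<created_secs>`.
fn parse_entry(line: &str) -> Option<Entry> {
    let mut parts = line.trim().splitn(2, ',');
    let pid = parts.next()?.trim().parse::<u32>().ok()?;
    let created = match parts.next() {
        // Present but malformed: reject the whole entry rather than guess.
        Some(raw) => Some(raw.trim().parse::<u64>().ok()?),
        // Absent: keep the entry and let callers treat it as unverified.
        None => None,
    };
    Some(Entry { pid, created })
}

/// Unparseable lines are skipped; nothing here can panic on bad input.
fn parse_entries(input: &str) -> Vec<Entry> {
    input.lines().filter_map(parse_entry).collect()
}
```

Note the difference between a missing timestamp, which is kept as None, and a malformed one, which drops the entry: only the first is a state the rest of the code is prepared to handle conservatively.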
Explicit Error Types
Define a single error type for the module with thiserror:
#[derive(thiserror::Error, Debug)]
pub enum DataError {
#[error("I/O error: {0}")]
Io(#[from] std::io::Error),
#[error("Parse error: {0}")]
Parse(String),
#[error("Not found: {0}")]
NotFound(String),
#[error("Subprocess failed: {0}")]
SubprocessFailed(String),
#[error("Timeout after {0}ms")]
Timeout(u64),
#[error("Unsupported format: expected {expected}, found {found:?}")]
UnsupportedFormat { expected: String, found: Option<String> },
}
Graceful Degradation
Errors should degrade to "core functionality still works":
- Capture succeeds even if validation fails
- Data is preserved even if enrichment times out
Testing
Unit Tests
- Round-trip tests: parse → serialize → parse again → same data
- Mutation tests: updates only touch intended fields
- Normalization tests: case-insensitive values parse equivalently
Golden Files
Store sample fixtures in tests/fixtures/ and assert exact output after transformations.
Contract Tests
Create fixture files representing real-world scenarios:
- Verified data with all fields
- Legacy data with old field names
- Missing timestamps
- Corrupted JSON
- Unit mismatches (ms vs seconds)
Property Tests
Use proptest to generate random inputs and ensure:
- Parser doesn't panic
- IDs are preserved through round-trips
Subprocess Integration Tests
Don't depend on real external processes in CI:
- Provide a fake executable in PATH that returns deterministic output
- Verify timeout and parsing logic
Essential Test Cases
- Version marker: missing/wrong → error
- Empty file: initializes properly
- Metadata injection: description resembling metadata stays in description
- Anchored updates: update affects only intended block
- Duplicate IDs: deterministic dedupe enforced
- PID correlation: mismatched PID rejected
Quick Reference
Do
- ✅ Truncate strings with chars() or graphemes
- ✅ Anchor identification to heading lines (^### [#item-...])
- ✅ Parse metadata only within defined regions
- ✅ Normalize keys/values early per spec
- ✅ Cache compiled regex and parse results
- ✅ Write files atomically
- ✅ Verify PID identity with process start time
- ✅ Use saturating_sub for time arithmetic
- ✅ Use explicit error variants
- ✅ Test edge cases and known failure modes
Don't
- ❌ Slice strings with &s[..N] without checking char boundaries
- ❌ Use .contains() to decide which block to update
- ❌ Treat metadata patterns anywhere as real metadata
- ❌ Ignore version markers if spec requires enforcement
- ❌ Recompile regex each parse
- ❌ Assume IDs are unique with external edits
- ❌ Trust subprocess output without validation
- ❌ Mix timestamp units without normalization
- ❌ Panic on malformed input
Change Checklist
When modifying these systems, verify:
Schema / Serde
- New fields are Option + #[serde(default)]
- Old field names supported via alias
alias - No field meaning repurposed in-place
Paths
- All comparisons use shared normalizer
- Root handled explicitly (no accidental //)
- Hashing uses normalized paths
PID Safety
- Existence ≠ identity unless legacy mode
- Verified entries check proc_started
- Legacy mode has mitigations + age expiry
Timestamps
- Units consistent or normalized on read
- Selection deterministic on ties
- Age checks immune to unit mismatch
Robustness
- No panics on file I/O or parse errors
- Unreadable data ignored, not guessed
- Performance stable under frequent refresh