Testing Expected Results
Run real commands and verify they produce the ACTUAL side effects and outputs you expect, not just "exit code 0." This catches the dangerous cases where commands "succeed" but don't do what they claim.
When to use me
Use this skill when:
- A command returns 0 but you're not sure it actually worked
- You need to verify side effects (files created, data changed, services running)
- Exit code checking gives false confidence
- "It ran without error" isn't enough proof
- Commands have complex side effects across multiple systems
- You're debugging "why did the deploy succeed but the app is down?"
What I do
1. Capture Pre-State
Before running the command, capture:
- Filesystem state (files, directories, permissions)
- Database state (records, schema)
- Process state (running services)
- Network state (ports, connections)
- Environment variables
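A minimal sketch of what this pre-state capture could look like in shell. The `capture_state` name and snapshot layout are illustrative assumptions, not part of verify.sh, and GNU find is assumed for `-printf`:

```shell
# Illustrative sketch, not verify.sh's actual implementation.
# Captures filesystem and process state into plain-text snapshots
# that can later be diffed against a post-state capture.
capture_state() {
  local label="$1" target_dir="$2" out_dir="$3"
  mkdir -p "$out_dir"
  # Filesystem: path, permissions, owner, size, mtime (GNU find)
  find "$target_dir" -type f -printf '%p %m %u %s %T@\n' 2>/dev/null \
    | sort > "$out_dir/$label.files"
  # Processes: command names of everything currently running
  ps -eo comm= | sort > "$out_dir/$label.procs"
}
```

Running `capture_state pre /data /tmp/state` before the command and `capture_state post /data /tmp/state` after lets a plain `diff` surface every changed file.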
2. Run the Command
Execute the actual command with:
- Timeout protection
- Resource limits
- Security sandboxing
- Output capture (stdout, stderr)
- Exit code capture
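In plain shell, the timeout and capture parts of this step reduce to something like the sketch below. The 300-second limit and the `run_captured` name are assumptions; sandboxing and resource limits would need `ulimit`, containers, or cgroups on top:

```shell
# Illustrative: run a command with a time limit, capturing
# stdout, stderr, and the exit code separately.
run_captured() {
  local cmd="$1" out="$2" err="$3"
  timeout 300 bash -c "$cmd" >"$out" 2>"$err"
  echo $?   # timeout(1) reports 124 when the time limit was hit
}
```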
3. Capture Post-State
After the command completes, capture the same state.
4. Smart Comparison
Compare actual vs expected with intelligence:
- Exact match - For deterministic output
- Pattern match - For variable content (timestamps, UUIDs)
- Range match - For numeric values (response time, file size)
- Structure match - For JSON/XML (ignore key order)
- Semantic match - For content meaning (not just bytes)
- Existence check - For "should exist" / "should not exist"
- Delta check - For "should have changed by X"
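The first three strategies can be sketched as tiny shell predicates (function names are illustrative; structure and semantic matching need real parsers such as `jq`):

```shell
# Illustrative matchers: exact string, regex pattern, numeric range.
match_exact()   { [ "$1" = "$2" ]; }
match_pattern() { printf '%s' "$1" | grep -Eq "$2"; }
match_range()   { awk -v v="$1" -v lo="$2" -v hi="$3" 'BEGIN { exit !(v >= lo && v <= hi) }'; }
```

For example, `match_range "$response_ms" 0 500` passes only when the measured value falls inside the expected window.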
5. Side Effect Verification
Verify specific side effects:
- Filesystem - File created/modified/deleted, permissions changed
- Database - Records inserted/updated, schema migrated
- Processes - Service started/stopped/restarted
- Network - Port bound, connection made, API called
- External - Cloud resources created, messages queued
6. Async/Delayed Effect Handling
For commands with eventual consistency:
- Poll with configurable intervals
- Wait for specific conditions
- Timeout handling
- Retry logic
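A polling loop with all four behaviors fits in a few lines; this hedged sketch (the `wait_for` name and the defaults are assumptions) reruns an arbitrary check until it passes or the deadline expires:

```shell
# Illustrative: poll a shell check until success or timeout.
wait_for() {
  local check="$1" interval="${2:-5}" timeout="${3:-120}" elapsed=0
  until bash -c "$check"; do
    sleep "$interval"
    elapsed=$((elapsed + interval))
    if [ "$elapsed" -ge "$timeout" ]; then
      return 1   # deadline hit before the condition held
    fi
  done
}
```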
Examples
# Verify a backup actually created a valid backup
bash scripts/verify.sh \
  --command "./backup.sh --source=/data --dest=/backups" \
  --expected "file_exists:/backups/backup-$(date +%Y%m%d).tar.gz" \
  --expected 'file_size:>100MB' \
  --expected 'file_integrity:sha256' \
  --negative 'file_modified:/data' \
  --timeout 300
# Verify a deployment actually started the service
bash scripts/verify.sh \
  --command "./deploy.sh --version=v2.0.0" \
  --expected 'process_running:my-service' \
  --expected 'port_listening:8080' \
  --expected 'http_healthy:http://localhost:8080/health' \
  --poll-interval 5 --timeout 120

# Verify a database migration actually changed the schema
bash scripts/verify.sh \
  --command "./migrate.sh up" \
  --expected 'db_table_exists:new_table' \
  --expected 'db_column_exists:new_table.new_column' \
  --expected 'db_constraint:unique_on_email' \
  --db-connection "postgresql://localhost/mydb"

# Verify an export actually produced correct data
bash scripts/verify.sh \
  --command "./export.sh --format=csv --output=/exports/users.csv" \
  --expected 'file_exists:/exports/users.csv' \
  --expected 'file_contains:"user_id,email,name"' \
  --expected 'line_count:>1000' \
  --expected 'csv_valid:yes' \
  --negative 'file_contains:ERROR'

# Verify negative side effects (what shouldn't happen)
bash scripts/verify.sh \
  --command "./cleanup.sh --days=30" \
  --expected 'file_deleted:/tmp/old_stuff' \
  --negative 'file_exists:/important/data' \
  --negative 'file_deleted:/critical/config'
Verification Types
Filesystem Effects
file_exists:
  path: /path/to/file
  optional:
    - min_size: 100MB        # File must be at least this big
    - max_size: 1GB          # File must be at most this big
    - permissions: 644       # Specific permissions
    - owner: appuser         # Specific owner
    - modified_after: now    # Modified after command started
    - content_type: text     # MIME type or magic number

file_contains:
  path: /path/to/file
  pattern: "string or regex"
  optional:
    - count: 1               # Must appear exactly N times
    - line_number: 5         # Must be on specific line

file_hash:
  path: /path/to/file
  algorithm: sha256          # sha256, md5, sha512
  expected: abc123...        # Expected hash (optional: omit to only verify the hash can be computed)

directory_structure:
  path: /path/to/dir
  expected: |
    dir/
    dir/file1.txt
    dir/subdir/
    dir/subdir/file2.txt
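As a rough illustration, the core of `file_exists` with a size floor is a two-line shell check (the function name and byte-based `min_size` are assumptions of this sketch; GNU `stat -c` assumed):

```shell
# Illustrative: file must exist and be at least min_size bytes.
check_file_exists() {
  local path="$1" min_size="${2:-0}"
  [ -f "$path" ] || return 1
  [ "$(stat -c %s "$path")" -ge "$min_size" ]
}
```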
Database Effects
db_table_exists:
  name: users
  connection: ${DB_URL}

db_column_exists:
  table: users
  column: email
  type: varchar(255)
  nullable: false

db_row_count:
  table: users
  where: "created_at > NOW() - INTERVAL '1 day'"
  expected: 100
  tolerance: +/- 10          # Allow 90-110

db_query_result:
  query: "SELECT COUNT(*) FROM users WHERE active = true"
  expected: "> 1000"
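Most of these reduce to one SQL query plus a comparison; the only non-obvious part is the `tolerance` logic in `db_row_count`, which could be sketched standalone (function name is illustrative):

```shell
# Illustrative: pass when |actual - expected| <= tolerance.
within_tolerance() {
  awk -v a="$1" -v e="$2" -v t="$3" 'BEGIN { d = a - e; if (d < 0) d = -d; exit !(d <= t) }'
}
```

So `within_tolerance 95 100 10` passes (within 90-110) while `within_tolerance 80 100 10` fails.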
Process Effects
process_running:
  name: my-service
  optional:
    - user: appuser
    - cpu_percent: < 50
    - memory_mb: < 1024
    - uptime_seconds: > 60

port_listening:
  port: 8080
  protocol: tcp              # tcp, udp
  optional:
    - interface: 0.0.0.0     # Specific bind address
    - process_name: app      # Must be owned by this process
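On Linux, the basic forms of both checks map onto standard tools (`pgrep` from procps, `ss` from iproute2); a hedged sketch with the optional attributes left out:

```shell
# Illustrative: is a process with this exact name running?
check_process_running() { pgrep -x "$1" > /dev/null; }
# Illustrative: is anything listening on this TCP port?
check_port_listening() { ss -ltn 2>/dev/null | grep -q ":$1 "; }
```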
Network Effects
http_request:
  url: http://localhost:8080/health
  method: GET
  expected_status: 200
  optional:
    - timeout: 5
    - expected_body: '{"status": "healthy"}'
    - expected_headers: 'Content-Type: application/json'
    - retry: 3

tcp_connect:
  host: localhost
  port: 5432
  timeout: 5
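Both network checks have natural one-liner approximations: `curl -w '%{http_code}'` for `http_request`, and bash's `/dev/tcp` redirection for `tcp_connect`. Function names are illustrative; a real implementation would also handle `expected_body`, headers, and retries:

```shell
# Illustrative: HTTP status check with a 5-second budget.
check_http() {
  local url="$1" want="${2:-200}"
  [ "$(curl -s -o /dev/null -m 5 -w '%{http_code}' "$url")" = "$want" ]
}
# Illustrative: raw TCP reachability via bash's /dev/tcp.
check_tcp() { timeout "${3:-5}" bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null; }
```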
Content Verification
csv_valid:
  file: /path/to/file.csv
  expected_columns: id,name,email
  row_count: "> 100"

json_valid:
  file: /path/to/file.json
  schema: /path/to/schema.json   # JSON Schema validation
  required_paths:
    - $.status
    - $.data.users[0].name
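The column and row-count parts of `csv_valid` are simple enough to sketch in shell (names are illustrative; proper CSV quoting and JSON Schema validation belong in a real parser such as `jq` or a CSV library):

```shell
# Illustrative: header row must match the expected column list exactly.
check_csv_header() { [ "$(head -n 1 "$1")" = "$2" ]; }
# Illustrative: file must have strictly more than N lines.
check_min_lines() { [ "$(wc -l < "$1")" -gt "$2" ]; }
```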
Comparison Strategies
Handling Non-Determinism
Timestamps:
# Match any ISO8601 timestamp
--expected 'file_contains:{{TIMESTAMP}}'
# Match timestamp within range
--expected 'file_contains:{{TIMESTAMP_RANGE:2024-01-01,2024-12-31}}'
UUIDs:
# Match any UUID
--expected 'file_contains:{{UUID}}'
# Match UUID pattern but validate it
--expected 'file_contains:{{UUID_FORMAT}}'
Order-Independent:
# For JSON arrays, sets, etc.
--expected 'json_path:$.data.items contains [1,2,3] (any order)'
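The placeholders presumably expand to regular expressions internally; plausible definitions, usable directly with `grep -E`, would be:

```shell
# Illustrative regexes the {{TIMESTAMP}} and {{UUID}} placeholders might expand to.
TS_RE='[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}'
UUID_RE='[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}'
matches() { printf '%s' "$1" | grep -Eq "$2"; }
```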
Partial Matching
# File must contain ALL these patterns
--expected 'file_contains_all:["success", "completed", "exit 0"]'
# File must contain AT LEAST ONE of these
--expected 'file_contains_any:["success", "done", "finished"]'
# File must contain pattern EXACTLY N times
--expected 'file_contains:"ERROR" count:0' # No errors
Security
Sandboxing:
# Run in container
bash scripts/verify.sh --sandbox container ...
# Run with limited permissions
bash scripts/verify.sh --sandbox chroot --chroot-dir /tmp/sandbox ...
# Resource limits
bash scripts/verify.sh --max-memory 1GB --max-cpu 50% --timeout 300 ...
Secret Masking:
# Automatically mask common secret patterns in output
bash scripts/verify.sh --mask-secrets ...
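As a guess at what `--mask-secrets` might do under the hood: a sed pass over common `key=value` / `key: value` secret patterns (GNU sed's case-insensitive `I` flag assumed; the key list is illustrative, not exhaustive):

```shell
# Illustrative: redact values following common secret-ish keys.
mask_secrets() {
  sed -E 's/((password|token|secret|api_key)[=:][[:space:]]*)[^[:space:]]+/\1***/Ig'
}
```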
Output Format
Verification Report
===================
Command:   ./backup.sh --source=/data --dest=/backups
Exit Code: 0
Duration:  45.2s

Pre-State Captured:
  Files: 1,247
  Database tables: 23
  Processes: 12

Post-State Captured:
  Files: 1,248 (+1)
  Database tables: 23 (unchanged)
  Processes: 12 (unchanged)

Expected Results Verification:
  ✅ file_exists:/backups/backup-20240308.tar.gz
     - Path exists: yes
     - Size: 1.2GB (expected: >100MB) ✅
     - Permissions: 644 ✅
     - Created: 2024-03-08T10:30:15Z (after command start) ✅
     - Hash (sha256): abc123... ✅
  ❌ file_integrity (custom check)
     - Can extract archive: yes
     - Can restore from backup: FAILED
     - Error: "table users has wrong schema version"

Negative Assertions:
  ✅ file_modified:/data - No changes detected
  ✅ file_deleted:/important - No deletions detected

Async Effects:
  ✅ service_health (after 30s polling)
     - Service responsive: yes
     - Health check passed: yes

Result: FAILED

Discrepancy Analysis:
  The backup file was created with the correct size and permissions,
  but the integrity check reveals it cannot be restored. The schema
  version mismatch suggests the backup captured incompatible data.

Recommendations:
  1. Run schema migration before backup
  2. Add schema version check to backup script
  3. Include a test restore in backup verification

Timing:
  Pre-state capture: 0.5s
  Command execution: 42.1s
  Post-state capture: 0.4s
  Verification: 2.2s
  Total: 45.2s
Limitations
What we CAN'T verify:
- In-memory state changes (caches, variables)
- Browser/client-side state
- Side effects in systems we can't access
- Changes that happen after verification timeout
- Non-deterministic behavior that changes between runs
- Effects in distributed systems with eventual consistency (we can poll but may miss)
Known Issues:
- Time-of-check-time-of-use (TOCTOU) race conditions
- State changes between capture and verification
- Verification is only as good as the expected results definition
Trust But Verify
This skill implements "trust but verify" - we trust the command ran, but we verify it did what it claimed. Always remember:
- Exit code 0 doesn't mean success
- Success doesn't mean correctness
- Correctness doesn't mean safety
- Safety doesn't mean completeness
Use multiple verification layers for critical operations.
Notes
- Verification adds overhead. Use selectively for critical commands.
- Define expected results exhaustively - partial verification gives false confidence.
- Include negative assertions (what shouldn't happen) alongside positive ones.
- For long-running commands, use async verification with polling.
- Always include timeout to prevent hanging on failed commands.
- Use semantic matchers (structure, pattern) over exact string comparison when possible.
- Document in your expected results WHY each check matters - future you will thank you.