Testing Expected Results

Run real commands and verify that they produce the ACTUAL side effects and outputs you expect, not just "exit code 0". This catches the dangerous cases where a command "succeeds" but doesn't do what it claims.

When to use me

Use this skill when:

  • A command returns 0 but you're not sure it actually worked
  • You need to verify side effects (files created, data changed, services running)
  • Exit code checking gives false confidence
  • "It ran without error" isn't enough proof
  • Commands have complex side effects across multiple systems
  • You're debugging "why did the deploy succeed but the app is down?"

What I do

1. Capture Pre-State

Before running the command, capture:

  • Filesystem state (files, directories, permissions)
  • Database state (records, schema)
  • Process state (running services)
  • Network state (ports, connections)
  • Environment variables
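
The filesystem part of this capture could look roughly like the sketch below. It assumes GNU `find`, `sort`, `xargs`, and `sha256sum` are available; `snapshot_dir` is a hypothetical helper name, not something `scripts/verify.sh` is known to provide, and it records content hashes only (not permissions or ownership).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of scripts/verify.sh): write one
# "<sha256>  <path>" line per regular file under a directory, sorted by path
# so that two snapshots can be compared line by line.
snapshot_dir() {
  local dir="$1" out="$2"
  ( cd "$dir" && find . -type f -print0 | sort -z | xargs -0 -r sha256sum ) > "$out"
}
```

Write the snapshot file outside the directory being captured, otherwise the snapshot itself shows up as a change.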

2. Run the Command

Execute the actual command with:

  • Timeout protection
  • Resource limits
  • Security sandboxing
  • Output capture (stdout, stderr)
  • Exit code capture
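
As a rough illustration of the timeout and capture items above, the sketch below wraps a command with GNU coreutils `timeout` and stores stdout, stderr, and the exit code in separate files. `run_captured` and the file names are hypothetical; resource limits and sandboxing are not covered here.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of scripts/verify.sh): run a command string
# under a time limit and keep stdout, stderr, and the exit code for later checks.
run_captured() {
  local limit="$1" cmd="$2" outdir="$3" rc
  timeout "$limit" bash -c "$cmd" >"$outdir/stdout" 2>"$outdir/stderr"
  rc=$?
  # GNU timeout exits with 124 when the time limit was hit.
  echo "$rc" >"$outdir/exit_code"
  return 0
}
```

The wrapper always returns 0 so that a failing command does not abort the verification run; the recorded exit code is one more value to check, not the verdict.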

3. Capture Post-State

After the command completes, capture the same state.
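
Once both states exist, the difference between them is what the later checks work on. The sketch below assumes each snapshot is a text file with one `<sha256>  <path>` line per file (a hypothetical format chosen for illustration) and reports added, removed, and changed paths; `diff_snapshots` is likewise a made-up name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare two snapshot files whose lines look like
# "<sha256>  <path>" and print "added", "removed", or "changed" per path.
diff_snapshots() {
  local pre="$1" post="$2"
  awk '
    # Split every line into the hash and the path (path may contain spaces).
    { hash = $1; path = substr($0, length($1) + 3) }
    # First file: remember the pre-state hash of each path.
    FILENAME == ARGV[1] { before[path] = hash; next }
    # Second file: classify each path against the pre-state.
    !(path in before) { print "added " path }
    (path in before) && before[path] != hash { print "changed " path }
    { seen[path] = 1 }
    END { for (p in before) if (!(p in seen)) print "removed " p }
  ' "$pre" "$post"
}
```

The order of "removed" lines is not guaranteed, so pipe the output through `sort` when a stable report is needed.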

4. Smart Comparison

Compare actual against expected using the match type that fits the data:

  • Exact match - For deterministic output
  • Pattern match - For variable content (timestamps, UUIDs)
  • Range match - For numeric values (response time, file size)
  • Structure match - For JSON/XML (ignore key order)
  • Semantic match - For content meaning (not just bytes)
  • Existence check - For "should exist" / "should not exist"
  • Delta check - For "should have changed by X"
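
Three of these match types are simple enough to sketch in plain bash. The function names below are hypothetical and the numeric helpers only handle integers; structure and semantic matching need real parsers and are not shown.

```shell
#!/usr/bin/env bash
# Hypothetical matchers; each returns 0 on match and 1 otherwise.

# Pattern match: the whole value must match an extended regular expression.
pattern_match() {
  local value="$1" regex="$2"
  [[ "$value" =~ ^(${regex})$ ]]
}

# Range match: an integer value must lie in [min, max].
range_match() {
  local value="$1" min="$2" max="$3"
  (( value >= min && value <= max ))
}

# Delta check: post must differ from pre by exactly the expected amount.
delta_match() {
  local pre="$1" post="$2" expected="$3"
  (( post - pre == expected ))
}
```

For example, `pattern_match "$name" 'backup-[0-9]{8}\.tar\.gz'` accepts any dated backup name without pinning the date, and `delta_match 1247 1248 1` expresses "exactly one new file".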

5. Side Effect Verification

Verify specific side effects:

  • Filesystem - File created/modified/deleted, permissions changed
  • Database - Records inserted/updated, schema migrated
  • Processes - Service started/stopped/restarted
  • Network - Port bound, connection made, API called
  • External - Cloud resources created, messages queued

6. Async/Delayed Effect Handling

For commands with eventual consistency:

  • Poll with configurable intervals
  • Wait for specific conditions
  • Timeout handling
  • Retry logic
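
A polling loop of this kind can be sketched in a few lines of bash. `wait_for` is a hypothetical helper, not a documented part of `scripts/verify.sh`; it re-runs any check command until it succeeds or the deadline passes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: re-run a check until it succeeds or the deadline passes.
# Usage: wait_for <timeout_seconds> <interval_seconds> <command> [args...]
wait_for() {
  local timeout_s="$1" interval_s="$2"
  shift 2
  # SECONDS is a bash builtin counting seconds since the shell started.
  local deadline=$(( SECONDS + timeout_s ))
  while true; do
    if "$@"; then
      return 0
    fi
    if (( SECONDS >= deadline )); then
      return 1
    fi
    sleep "$interval_s"
  done
}
```

The check runs once before any waiting, so effects that are already visible cost no delay, and a check that never succeeds returns 1 instead of hanging.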

Examples

# Verify a backup actually created a valid backup
bash scripts/verify.sh \
  --command "./backup.sh --source=/data --dest=/backups" \
  --expected "file_exists:/backups/backup-$(date +%Y%m%d).tar.gz" \
  --expected 'file_size:>100MB' \
  --expected 'file_integrity:sha256' \
  --negative 'file_modified:/data' \
  --timeout 300

# Verify a deployment actually started the service
bash scripts/verify.sh \
  --command "./deploy.sh --version=v2.0.0" \
  --expected 'process_running:my-service' \
  --expected 'port_listening:8080' \
  --expected 'http_healthy:http://localhost:8080/health' \
  --poll-interval 5 --timeout 120

# Verify a database migration actually changed the schema
bash scripts/verify.sh \
  --command "./migrate.sh up" \
  --expected 'db_table_exists:new_table' \
  --expected 'db_column_exists:new_table.new_column' \
  --expected 'db_constraint:unique_on_email' \
  --db-connection "postgresql://localhost/mydb"

# Verify an export actually produced correct data
bash scripts/verify.sh \
  --command "./export.sh --format=csv --output=/exports/users.csv" \
  --expected 'file_exists:/exports/users.csv' \
  --expected 'file_contains:"user_id,email,name"' \
  --expected 'line_count:>1000' \
  --expected 'csv_valid:yes' \
  --negative 'file_contains:ERROR'

# Verify negative side effects (what shouldn't happen)
bash scripts/verify.sh \
  --command "./cleanup.sh --days=30" \
  --expected 'file_deleted:/tmp/old_stuff' \
  --negative 'file_exists:/important/data' \
  --negative 'file_deleted:/critical/config'

Verification Types

Filesystem Effects

file_exists:
  path: /path/to/file
  optional:
    - min_size: 100MB      # File must be at least this big
    - max_size: 1GB        # File must be at most this big
    - permissions: 644     # Specific permissions
    - owner: appuser       # Specific owner
    - modified_after: now  # Modified after command started
    - content_type: text   # MIME type or magic number

file_contains:
  path: /path/to/file
  pattern: "string or regex"
  optional:
    - count: 1             # Must appear exactly N times
    - line_number: 5       # Must be on specific line

file_hash:
  path: /path/to/file
  algorithm: sha256        # sha256, md5, sha512
  expected: abc123...      # Expected hash value (optional; if omitted, only checks that a hash can be computed)

directory_structure:
  path: /path/to/dir
  expected: |
    dir/
    dir/file1.txt
    dir/subdir/
    dir/subdir/file2.txt
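
To make the `file_exists` options concrete, here is a minimal sketch of the existence, minimum-size, and permissions checks. It assumes GNU `stat` (the `-c` format option); `check_file` is a hypothetical name, and sizes are given in bytes rather than the `100MB` style shown above.

```shell
#!/usr/bin/env bash
# Hypothetical check: the file exists, is at least min_bytes long, and
# (optionally) has the given octal mode. Assumes GNU stat.
check_file() {
  local path="$1" min_bytes="$2" mode="${3:-}"
  [[ -f "$path" ]] || { echo "missing: $path"; return 1; }
  local size
  size=$(stat -c %s "$path")
  (( size >= min_bytes )) || { echo "too small: $size < $min_bytes bytes"; return 1; }
  if [[ -n "$mode" && "$(stat -c %a "$path")" != "$mode" ]]; then
    echo "wrong mode: expected $mode"
    return 1
  fi
  return 0
}
```

Printing the reason for a failure, rather than only returning 1, keeps the eventual report readable when several checks run together.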

Database Effects

db_table_exists:
  name: users
  connection: ${DB_URL}

db_column_exists:
  table: users
  column: email
  type: varchar(255)
  nullable: false

db_row_count:
  table: users
  where: "created_at > NOW() - INTERVAL '1 day'"
  expected: 100
  tolerance: +/- 10        # Allow 90-110

db_query_result:
  query: "SELECT COUNT(*) FROM users WHERE active = true"
  expected: "> 1000"
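
The `tolerance` option on `db_row_count` reduces to an absolute-difference test once the count is known. The sketch below shows only that comparison; `within_tolerance` is a hypothetical helper, and how the count is obtained depends on the database client.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed when actual is within expected +/- tolerance.
# All three arguments must be integers.
within_tolerance() {
  local actual="$1" expected="$2" tolerance="$3"
  local diff=$(( actual - expected ))
  if (( diff < 0 )); then
    diff=$(( -diff ))
  fi
  (( diff <= tolerance ))
}
```

With PostgreSQL, for instance, the count could come from something like `psql "$DB_URL" -At -c "SELECT COUNT(*) FROM users"` and then be passed as `within_tolerance "$count" 100 10`.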

Process Effects

process_running:
  name: my-service
  optional:
    - user: appuser
    - cpu_percent: "< 50"
    - memory_mb: "< 1024"
    - uptime_seconds: "> 60"

port_listening:
  port: 8080
  protocol: tcp           # tcp, udp
  optional:
    - interface: 0.0.0.0   # Specific bind address
    - process_name: app   # Must be owned by this process

Network Effects

http_request:
  url: http://localhost:8080/health
  method: GET
  expected_status: 200
  optional:
    - timeout: 5
    - expected_body: '{"status": "healthy"}'
    - expected_headers: 'Content-Type: application/json'
    - retry: 3

tcp_connect:
  host: localhost
  port: 5432
  timeout: 5
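
A `tcp_connect`-style check can be approximated without extra tools by using bash's `/dev/tcp` redirection together with GNU `timeout`. This is a sketch under those assumptions (bash built with network redirections, which most distributions enable); the function below is hypothetical and only tests that a connection opens, not that the service behind it is healthy.

```shell
#!/usr/bin/env bash
# Hypothetical check: succeed if a TCP connection to host:port opens within
# the time limit. Relies on bash's /dev/tcp redirection, so it needs bash, not sh.
tcp_connect() {
  local host="$1" port="$2" limit="${3:-5}"
  # Inside the inner shell, $0 is the host and $1 is the port.
  timeout "$limit" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}
```

A successful connect proves only that something is listening; pair it with an HTTP or protocol-level check before treating the service as up.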

Content Verification

csv_valid:
  file: /path/to/file.csv
  expected_columns: id,name,email
  row_count: "> 100"

json_valid:
  file: /path/to/file.json
  schema: /path/to/schema.json  # JSON Schema validation
  required_paths:
    - $.status
    - $.data.users[0].name
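
For simple CSV files, the header and row-count parts of `csv_valid` can be sketched as below. `check_csv` is a hypothetical name, and the approach is deliberately naive: it counts lines, so quoted fields that contain newlines would be miscounted.

```shell
#!/usr/bin/env bash
# Hypothetical check: the first line equals the expected header and the file
# has more than min_rows data rows. Does not parse quoted multi-line fields.
check_csv() {
  local file="$1" header="$2" min_rows="$3"
  local actual_header rows
  # read fails on a last line without a trailing newline but still sets the variable.
  IFS= read -r actual_header < "$file" || [[ -n "$actual_header" ]] || return 1
  actual_header="${actual_header%$'\r'}"   # tolerate CRLF line endings
  [[ "$actual_header" == "$header" ]] || return 1
  rows=$(( $(wc -l < "$file") - 1 ))
  (( rows > min_rows ))
}
```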

Comparison Strategies

Handling Non-Determinism

Timestamps:

# Match any ISO8601 timestamp
--expected 'file_contains:{{TIMESTAMP}}'

# Match timestamp within range
--expected 'file_contains:{{TIMESTAMP_RANGE:2024-01-01,2024-12-31}}'

UUIDs:

# Match any UUID
--expected 'file_contains:{{UUID}}'

# Match the UUID pattern and also validate the value
--expected 'file_contains:{{UUID_FORMAT}}'

Order-Independent:

# For JSON arrays, sets, etc.
--expected 'json_path:$.data.items contains [1,2,3] (any order)'
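
Placeholders such as `{{UUID}}` and `{{TIMESTAMP}}` presumably expand to regular expressions. The patterns below are one plausible expansion, assumed for illustration rather than taken from the tool, and `file_matches` is a hypothetical helper built on `grep -E`.

```shell
#!/usr/bin/env bash
# Assumed regex expansions for the placeholders; the real tool may differ.
UUID_RE='[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}'
TIMESTAMP_RE='[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}(\.[0-9]+)?(Z|[+-][0-9]{2}:[0-9]{2})'

# Succeed if any line of the file contains a match for the regex.
file_matches() {
  local file="$1" regex="$2"
  grep -Eq -- "$regex" "$file"
}
```

These patterns check shape only: they accept a timestamp like month 13, so a range or calendar check still needs real date parsing.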

Partial Matching

# File must contain ALL these patterns
--expected 'file_contains_all:["success", "completed", "exit 0"]'

# File must contain AT LEAST ONE of these
--expected 'file_contains_any:["success", "done", "finished"]'

# File must contain pattern EXACTLY N times
--expected 'file_contains:"ERROR" count:0'  # No errors
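
The all/any/count semantics map directly onto `grep`. The helpers below are a hypothetical sketch using fixed-string matching (`-F`), so patterns are treated as literal text rather than regular expressions.

```shell
#!/usr/bin/env bash
# Hypothetical helpers using fixed-string grep (-F): patterns are literal text.

# Succeed only if every pattern occurs somewhere in the file.
file_contains_all() {
  local file="$1" pattern
  shift
  for pattern in "$@"; do
    grep -Fq -- "$pattern" "$file" || return 1
  done
  return 0
}

# Succeed if at least one pattern occurs in the file.
file_contains_any() {
  local file="$1" pattern
  shift
  for pattern in "$@"; do
    grep -Fq -- "$pattern" "$file" && return 0
  done
  return 1
}

# Print how many lines contain the pattern (grep -c counts lines, not occurrences).
file_count_lines() {
  grep -Fc -- "$2" "$1" || true
}
```

A "no errors" assertion then reads `[ "$(file_count_lines "$log" "ERROR")" = "0" ]`.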

Security

Sandboxing:

# Run in container
bash scripts/verify.sh --sandbox container ...

# Run inside a chroot (restricted filesystem view)
bash scripts/verify.sh --sandbox chroot --chroot-dir /tmp/sandbox ...

# Resource limits
bash scripts/verify.sh --max-memory 1GB --max-cpu 50% --timeout 300 ...

Secret Masking:

# Automatically mask common secret patterns in output
bash scripts/verify.sh --mask-secrets ...
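
What counts as a "common secret pattern" is not specified here. As a rough sketch, masking could be a `sed` filter over captured output; the key names below are assumptions chosen for illustration, and the case-insensitive `I` flag requires GNU sed.

```shell
#!/usr/bin/env bash
# Hypothetical filter: replace the value after key names such as password,
# token, secret, or api_key with ****. The key list is an assumption, not the
# tool's actual pattern set. Reads stdin, writes stdout. Requires GNU sed.
mask_secrets() {
  sed -E 's/((password|passwd|token|secret|api[_-]?key)[[:space:]]*[=:][[:space:]]*)[^[:space:]]+/\1****/gI'
}
```

For example, `./deploy.sh 2>&1 | mask_secrets` keeps key names visible for debugging while hiding their values. Pattern-based masking misses secrets that appear without a recognizable key name, so treat it as a safety net rather than a guarantee.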

Output Format

Verification Report
===================
Command: ./backup.sh --source=/data --dest=/backups
Exit Code: 0
Duration: 45.2s

Pre-State Captured:
  Files: 1,247
  Database tables: 23
  Processes: 12

Post-State Captured:
  Files: 1,248 (+1)
  Database tables: 23 (unchanged)
  Processes: 12 (unchanged)

Expected Results Verification:
  ✅ file_exists:/backups/backup-20240308.tar.gz
     - Path exists: yes
     - Size: 1.2GB (expected: >100MB) ✅
     - Permissions: 644 ✅
     - Created: 2024-03-08T10:30:15Z (after command start) ✅
     - Hash (sha256): abc123...
  ❌ file_integrity (custom check)
     - Can extract archive: yes
     - Can restore from backup: FAILED
     - Error: "table users has wrong schema version"
     
Negative Assertions:
  ✅ file_modified:/data - No changes detected
  ✅ file_deleted:/important - No deletions detected

Async Effects:
  ✅ service_health (after 30s polling)
     - Service responsive: yes
     - Health check passed: yes

Result: FAILED

Discrepancy Analysis:
  The backup file was created with correct size and permissions,
  but integrity check reveals it cannot be restored. The schema
  version mismatch suggests the backup captured incompatible data.

Recommendations:
  1. Run schema migration before backup
  2. Add schema version check to backup script
  3. Include test restore in backup verification

Commands Run:
  Pre-state capture: 0.5s
  Command execution: 42.1s
  Post-state capture: 0.4s
  Verification: 2.2s
  Total: 45.2s

Limitations

What we CAN'T verify:

  • In-memory state changes (caches, variables)
  • Browser/client-side state
  • Side effects in systems we can't access
  • Changes that happen after verification timeout
  • Non-deterministic behavior that changes between runs
  • Effects in distributed systems with eventual consistency (polling helps, but can still miss changes that land late or only briefly)

Known Issues:

  • Time-of-check-time-of-use (TOCTOU) race conditions
  • State changes between capture and verification
  • Verification is only as good as the expected results definition

Trust But Verify

This skill implements "trust but verify": we trust that the command ran, but we verify that it did what it claimed. Always remember:

  • Exit code 0 doesn't mean success
  • Success doesn't mean correctness
  • Correctness doesn't mean safety
  • Safety doesn't mean completeness

Use multiple verification layers for critical operations.

Notes

  • Verification adds overhead. Use selectively for critical commands.
  • Define expected results exhaustively - partial verification gives false confidence.
  • Include negative assertions (what shouldn't happen) alongside positive ones.
  • For long-running commands, use async verification with polling.
  • Always set a timeout so that verification cannot hang on a stuck command.
  • Use semantic matchers (structure, pattern) over exact string comparison when possible.
  • Document in your expected results WHY each check matters - future you will thank you.