Production Smoke Suite
Production smoke testing is the last line of defense between a deployment and a broken user experience. Unlike comprehensive end-to-end suites that may run for 30 minutes or more, a smoke suite must complete in under 2 minutes and verify that the application's critical paths are functional. This skill covers the philosophy behind production smoke testing, the architecture of a reliable smoke suite, concrete implementation patterns with Playwright and raw HTTP clients, integration with deployment pipelines, and strategies for handling third-party dependencies, authentication flows, and flaky network conditions in real production environments.
Core Principles
1. Speed Over Coverage
A smoke suite is not a regression suite. Its sole purpose is to answer one question: "Did this deployment break anything critical?" If the suite takes more than 2 minutes, it is too slow. Every test must justify its inclusion by covering a revenue-critical or user-critical path. Ruthlessly prune tests that do not protect high-value user journeys.
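One way to keep the budget and the inclusion rule honest is to audit the suite's inventory before adding a test. The sketch below is illustrative only: the `SmokeCheck` shape, the `auditSuite` helper, and the sample durations are hypothetical, and it assumes checks run sequentially so their typical durations add up.

```typescript
// Hypothetical inventory entry; the field names are illustrative.
interface SmokeCheck {
  name: string;
  // The revenue- or user-critical journey this check protects.
  // An empty string means nobody has justified the check yet.
  protects: string;
  // Typical wall-clock duration, in milliseconds.
  typicalMs: number;
}

interface AuditResult {
  totalMs: number;
  overBudget: boolean;
  unjustified: string[];
}

// The 2-minute ceiling described above.
const BUDGET_MS = 2 * 60 * 1000;

// Sums the typical durations (assumes sequential execution) and lists
// every check that does not name the journey it protects.
function auditSuite(checks: SmokeCheck[], budgetMs: number = BUDGET_MS): AuditResult {
  const totalMs = checks.reduce((sum, check) => sum + check.typicalMs, 0);
  const unjustified = checks
    .filter((check) => check.protects.trim() === "")
    .map((check) => check.name);
  return { totalMs, overBudget: totalMs > budgetMs, unjustified };
}

// Made-up sample inventory.
const audit = auditSuite([
  { name: "homepage loads", protects: "landing", typicalMs: 4000 },
  { name: "login works", protects: "authentication", typicalMs: 9000 },
  { name: "footer links", protects: "", typicalMs: 15000 },
]);
```

In this sample the suite fits the budget, but `audit.unjustified` flags "footer links" as a candidate for pruning because it protects no named journey.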
2. Production-Safe Execution
Smoke tests run against real production infrastructure. They must never create permanent data, modify user accounts, trigger billing events, or send real notifications. Every test must be read-only or use dedicated smoke-test accounts with sandboxed permissions. A smoke test that accidentally charges a customer or sends spam emails is worse than having no smoke tests at all.
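The read-only rule can be enforced in code rather than by convention. The following is a minimal sketch, not a prescribed implementation: `assertProductionSafe`, the `SafetyPolicy` shape, the account name, and the path prefix are all hypothetical. It also assumes the application honors HTTP safe-method semantics, meaning `GET`, `HEAD`, and `OPTIONS` do not change state.

```typescript
// Methods treated as read-only, assuming the application honors
// HTTP safe-method semantics.
const READ_ONLY_METHODS = ["GET", "HEAD", "OPTIONS"];

interface SmokeRequest {
  method: string;
  path: string;
  account: string; // identity the request is made as
}

// Hypothetical policy: which accounts may write, and where.
interface SafetyPolicy {
  smokeAccounts: string[];
  writablePathPrefixes: string[];
}

// Throws before a request is sent if it could mutate real production data.
function assertProductionSafe(req: SmokeRequest, policy: SafetyPolicy): void {
  if (READ_ONLY_METHODS.includes(req.method.toUpperCase())) {
    return; // read-only requests are always allowed
  }
  const isSmokeAccount = policy.smokeAccounts.includes(req.account);
  const isSandboxedPath = policy.writablePathPrefixes.some((prefix) =>
    req.path.startsWith(prefix),
  );
  if (!isSmokeAccount || !isSandboxedPath) {
    throw new Error(
      `Unsafe smoke request blocked: ${req.method} ${req.path} as ${req.account}`,
    );
  }
}

// Illustrative values only.
const smokePolicy: SafetyPolicy = {
  smokeAccounts: ["smoke-bot@example.com"],
  writablePathPrefixes: ["/api/sandbox/"],
};

// Allowed: a read, and a write by the smoke account inside the sandbox.
assertProductionSafe({ method: "GET", path: "/api/orders", account: "anyone" }, smokePolicy);
assertProductionSafe(
  { method: "POST", path: "/api/sandbox/cart", account: "smoke-bot@example.com" },
  smokePolicy,
);
```

Calling the guard from a single shared request helper means a newly added test cannot bypass it by accident. A write to a path such as `/api/billing/charge` would throw even when made by the smoke account.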
3. Deterministic Assertions
Production environments experience variable latency, CDN caching, and third-party service delays, so an assertion that is too strict will fail intermittently even when the deployment is healthy. To keep results deterministic, smoke tests must use generous timeouts, bounded retry logic, and assertions that tolerate minor variations. Check that a response contains the expected fields rather than matching exact strings, and verify that a status code falls within an acceptable range rather than asserting a single value.
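The two assertion styles described here, field presence and status ranges, can be sketched as a small helper. The `SmokeResponse` shape, the `checkResponse` name, and the sample payloads are hypothetical, and the default range of 200 to 299 is an assumption that a given endpoint may need to widen. Timeouts and retries are left out of this sketch.

```typescript
// Minimal view of a response; a real client would map its own type onto this.
interface SmokeResponse {
  status: number;
  body: Record<string, unknown>;
}

// Returns a list of human-readable problems. An empty list means the
// response passed. Field values are deliberately not compared, only presence.
function checkResponse(
  res: SmokeResponse,
  requiredFields: string[],
  acceptableStatus: [number, number] = [200, 299],
): string[] {
  const problems: string[] = [];
  const [min, max] = acceptableStatus;
  if (res.status < min || res.status > max) {
    problems.push(`status ${res.status} outside ${min}-${max}`);
  }
  for (const field of requiredFields) {
    if (!(field in res.body)) {
      problems.push(`missing field "${field}"`);
    }
  }
  return problems;
}

// Passes even though the price and timestamp differ from run to run.
const healthy = checkResponse(
  { status: 200, body: { id: "p-1", price: 19.99, generatedAt: "2024-01-01T00:00:00Z" } },
  ["id", "price"],
);

// Fails on both counts: bad status and a missing field.
const broken = checkResponse({ status: 503, body: {} }, ["id"]);
```

Returning a list of problems instead of throwing on the first one means a single failed run reports everything that is wrong with the response, which shortens diagnosis when a deployment has to be rolled back quickly.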