visual-testing
Discovery Questions
Before implementing visual testing, gather context. Check .agents/qa-project-context.md first -- if it exists, use it and skip questions already answered there.
Tool Selection
- Playwright built-in or dedicated tool? Playwright's toHaveScreenshot is free and requires no external service. Dedicated tools (Chromatic, Percy, Argos) add review workflows, browser rendering farms, and historical tracking. Choose based on team size and review needs.
- Storybook in the project? If yes, Chromatic is the natural fit -- it captures every story as a visual test. If no Storybook, Playwright or Percy are better options.
- CI platform? Visual testing generates large artifacts (screenshots, diffs). Ensure CI has storage and the pipeline can handle the extra time.
Scope
- Full-page or component screenshots? Full-page catches layout issues but is sensitive to unrelated changes. Component-level screenshots are more stable and focused.
- Which pages/components are visually critical? Not everything needs visual testing. Focus on user-facing pages, marketing pages, design system components, and complex layouts.
- Which viewports? Desktop, tablet, mobile? Define the viewport matrix upfront.
Dynamic Content
- What content changes between runs? Dates, timestamps, user-generated content, analytics IDs, randomized content, advertisements, avatars. All must be masked or frozen.
- Are there animations or transitions? These cause false positives if not disabled or waited for.
- Does the page load external resources? Fonts, images from CDNs, third-party widgets can vary between runs.
Core Principles
1. Visual Tests Catch What Functional Tests Miss
Functional tests assert behavior: "clicking Submit shows a success message." Visual tests assert appearance: "the success message is green, correctly positioned, and does not overlap the form." Both are needed. Visual tests complement functional tests; they do not replace them.
2. Baseline Management Is the Hard Part
Taking screenshots is easy. Managing baselines -- updating them when design changes intentionally, reviewing diffs, coordinating approvals across a team -- is the real challenge. Invest in the review workflow early.
3. Dynamic Content Causes False Positives
Any content that changes between runs (timestamps, avatars, ads, random IDs) produces pixel differences that are not real regressions. Aggressively mask or freeze dynamic content. A visual test suite with a 10% false positive rate will be ignored within a month.
4. Threshold Tuning Is Iterative
The right diff threshold depends on the specific component, rendering engine, and what you consider "visually different." Start strict (zero tolerance), observe false positives, and loosen thresholds per-component as needed. Document why each threshold was chosen.
5. Screenshots Are Artifacts, Not Test Results
The screenshot file itself is evidence. Store it, version it, and make it accessible for review. A test that says "visual diff detected" without showing the diff is useless.
Playwright Visual Comparisons
Playwright's built-in toHaveScreenshot and toMatchSnapshot provide visual regression testing without external services.
Basic Screenshot Comparison
import { test, expect } from '@playwright/test';
test('dashboard matches baseline', async ({ page }) => {
await page.goto('/dashboard');
// Wait for all data to load before capturing
await expect(page.getByRole('heading', { name: 'Dashboard' })).toBeVisible();
await expect(page.getByTestId('chart-container')).toBeVisible();
await expect(page).toHaveScreenshot('dashboard.png');
});
On first run, this creates the baseline screenshot. On subsequent runs, it compares against the baseline and fails if pixels differ beyond the threshold.
Configuration Options
// Comparison with explicit thresholds
await expect(page).toHaveScreenshot('dashboard.png', {
maxDiffPixels: 100, // Allow up to 100 pixels to differ
// maxDiffPixelRatio: 0.01, // OR: allow up to 1% of pixels to differ
threshold: 0.2, // Per-pixel color difference tolerance (0-1)
animations: 'disabled', // Freeze CSS animations and transitions
caret: 'hide', // Hide blinking cursor
timeout: 15000, // Wait up to 15s for stable screenshot
});
When to use which threshold:
| Option | Use When |
|---|---|
| maxDiffPixels: 0 | Pixel-perfect components (icons, logos, design system atoms) |
| maxDiffPixels: 50-100 | Full-page layouts where antialiasing varies slightly |
| maxDiffPixelRatio: 0.01 | Full-page screenshots where absolute pixel count varies with viewport |
| threshold: 0.2 | Cross-browser testing where color rendering differs slightly |
playwright.config.ts Visual Settings
import { defineConfig } from '@playwright/test';
export default defineConfig({
expect: {
toHaveScreenshot: {
maxDiffPixelRatio: 0.005, // Global default: 0.5% tolerance
animations: 'disabled',
caret: 'hide',
},
toMatchSnapshot: {
maxDiffPixelRatio: 0.005,
},
},
projects: [
{
name: 'visual-desktop',
use: {
viewport: { width: 1280, height: 720 },
colorScheme: 'light',
},
testMatch: /.*visual.*\.spec\.ts/,
},
{
name: 'visual-mobile',
use: {
viewport: { width: 375, height: 667 },
colorScheme: 'light',
isMobile: true,
},
testMatch: /.*visual.*\.spec\.ts/,
},
],
});
Masking Dynamic Regions
test('profile page visual test', async ({ page }) => {
await page.goto('/profile');
await expect(page.getByRole('heading', { name: 'Profile' })).toBeVisible();
await expect(page).toHaveScreenshot('profile.png', {
mask: [
page.getByTestId('user-avatar'), // User-specific image
page.getByTestId('last-login-time'), // Timestamp
page.getByTestId('activity-feed'), // Dynamic content
],
maskColor: '#FF00FF', // Visible mask color for debugging
});
});
Freezing Dynamic Content Before Capture
test('dashboard with frozen data', async ({ page }) => {
// Freeze time to eliminate timestamp differences
await page.clock.install({ time: new Date('2026-01-15T10:00:00Z') });
// Stub API to return deterministic data
await page.route('**/api/dashboard', async (route) => {
await route.fulfill({
json: {
stats: { users: 1234, revenue: 56789 },
chart: [10, 20, 30, 40, 50],
},
});
});
// Block web font downloads so fallback fonts render deterministically
await page.route('**/*.woff2', (route) => route.abort());
await page.goto('/dashboard');
await expect(page.getByTestId('chart-container')).toBeVisible();
// Wait for animations to complete
await page.evaluate(() => {
document.getAnimations().forEach((a) => a.finish());
});
await expect(page).toHaveScreenshot('dashboard-frozen.png', {
animations: 'disabled',
});
});
Handling Animations
Two options: use Playwright's built-in animations: 'disabled' in toHaveScreenshot (preferred), or inject a style tag that zeros out animation-duration and transition-duration for all elements. Always wait for the element to be visible before capturing.
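The style-injection option can be sketched as a small helper -- the exact rule set below is an illustrative assumption, not a built-in Playwright API:

```typescript
// Returns a CSS rule set that zeroes every animation and transition,
// so elements render in their final state when the screenshot is taken.
function disableAnimationsCss(): string {
  return `
    *, *::before, *::after {
      animation-duration: 0s !important;
      animation-delay: 0s !important;
      transition-duration: 0s !important;
      transition-delay: 0s !important;
    }
  `;
}

// Usage in a test, before the toHaveScreenshot call:
//   await page.addStyleTag({ content: disableAnimationsCss() });
```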
Component-Level Screenshots
test('data table renders correctly with various states', async ({ page }) => {
await page.goto('/admin/users');
await expect(page.getByRole('table')).toBeVisible();
// Screenshot just the table component, not the full page
const table = page.getByRole('table', { name: 'Users' });
await expect(table).toHaveScreenshot('users-table.png');
});
test('empty state renders correctly', async ({ page }) => {
await page.route('**/api/users', (route) => route.fulfill({ json: { users: [] } }));
await page.goto('/admin/users');
const emptyState = page.getByTestId('empty-state');
await expect(emptyState).toHaveScreenshot('users-empty-state.png');
});
test('error state renders correctly', async ({ page }) => {
await page.route('**/api/users', (route) => route.fulfill({ status: 500 }));
await page.goto('/admin/users');
const errorState = page.getByTestId('error-state');
await expect(errorState).toHaveScreenshot('users-error-state.png');
});
Updating Baselines
# Update all baselines (when design intentionally changes)
npx playwright test --update-snapshots
# Update baselines for specific tests only
npx playwright test visual-dashboard --update-snapshots
# Review what changed before committing
git diff --stat # See which baseline files changed
# Open the test report to visually review each change
npx playwright show-report
Baseline update workflow:
- Design change is implemented
- Run visual tests -- they fail with expected diffs
- Review each diff: is the change intentional?
- Update baselines: npx playwright test --update-snapshots
- Commit updated baselines with a descriptive message referencing the design change
- PR reviewers verify the baseline changes look correct
Dedicated Visual Testing Tools
Tool Comparison
| Tool | Best When | Integration | Key Feature |
|---|---|---|---|
| Chromatic | Project uses Storybook | Every story = a visual test | Review/approval UI, cross-browser |
| Percy | No Storybook, need multi-browser | Any test framework via SDK | Multi-width captures, CSS overrides |
| Argos CI | Open-source preference, budget-conscious | Playwright reporter | Self-hosting option, generous free tier |
Chromatic (Storybook)
# GitHub Actions
- uses: chromaui/action@latest
with:
projectToken: ${{ secrets.CHROMATIC_PROJECT_TOKEN }}
exitZeroOnChanges: true # Changes go to review, not CI failure
onlyChanged: true # Only test stories affected by code changes
Workflow: push code, CI captures screenshots, reviewers approve/reject in Chromatic UI, PR merges after approval.
Percy (Any Framework)
import percySnapshot from '@percy/playwright';
test('checkout page visual', async ({ page }) => {
await page.goto('/checkout');
await percySnapshot(page, 'Checkout Page', {
widths: [375, 768, 1280],
percyCSS: `.ad-banner { display: none !important; }`,
});
});
// CI: npx percy exec -- npx playwright test --grep @visual
Argos CI (Open Source)
import { argosScreenshot } from '@argos-ci/playwright';
test('pricing page visual', async ({ page }) => {
await page.goto('/pricing');
await argosScreenshot(page, 'pricing-page', { viewports: ['macbook-16', 'iphone-x'] });
});
Responsive Visual Testing
Test at breakpoints where layout changes, not at every possible viewport. Define a viewport matrix based on analytics data.
const VISUAL_VIEWPORTS = [
{ name: 'mobile', width: 375, height: 667, isMobile: true },
{ name: 'tablet', width: 768, height: 1024, isMobile: false },
{ name: 'desktop', width: 1280, height: 720, isMobile: false },
] as const;
for (const vp of VISUAL_VIEWPORTS) {
test.describe(`Visual @ ${vp.name}`, () => {
test.use({ viewport: { width: vp.width, height: vp.height }, isMobile: vp.isMobile });
test('homepage layout', async ({ page }) => {
await page.goto('/');
await expect(page.getByRole('main')).toBeVisible();
await expect(page).toHaveScreenshot(`homepage-${vp.name}.png`, {
fullPage: true,
animations: 'disabled',
});
});
});
}
Alternatively, use Playwright projects (in playwright.config.ts) to define viewport configurations and run all visual tests across them automatically.
Baseline Management
Git-Stored Baselines
Playwright stores baselines alongside test files by default.
e2e/
tests/
visual/
dashboard.visual.spec.ts
dashboard.visual.spec.ts-snapshots/
dashboard-chromium-linux.png # Platform-specific baselines
dashboard-chromium-darwin.png
dashboard-firefox-linux.png
Pros: Baselines are versioned with the code, reviewed in PRs, and available offline.
Cons: Repository size grows. Large baseline files bloat git history.
Use Git LFS (.gitattributes: *.png filter=lfs diff=lfs merge=lfs -text) to prevent repository bloat. Customize snapshot paths with snapshotPathTemplate in playwright.config.ts.
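For example, snapshotPathTemplate can route every baseline into a single folder that the Git LFS pattern and CI cache can target -- a sketch where the __screenshots__ folder name is an arbitrary choice and the {...} placeholders are Playwright's built-in template tokens:

```typescript
// playwright.config.ts (excerpt)
import { defineConfig } from '@playwright/test';

export default defineConfig({
  // {arg} is the name passed to toHaveScreenshot(); {projectName} and
  // {platform} keep per-browser, per-OS baselines separate.
  snapshotPathTemplate:
    '{testDir}/__screenshots__/{projectName}/{platform}/{arg}{ext}',
});
```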
Platform-Specific Baselines
Playwright renders differently across operating systems. Use Docker in CI for consistency:
jobs:
visual-tests:
runs-on: ubuntu-latest
container:
image: mcr.microsoft.com/playwright:v1.50.0-noble
steps:
- uses: actions/checkout@v4
- run: npm ci
- run: npx playwright test --grep @visual
Generate baselines in CI (not locally) so they always match the CI rendering environment.
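To regenerate baselines locally without drifting from CI, run the update inside the same container image -- a sketch, assuming the image tag matches the CI config above:

```shell
# Mount the project into the official Playwright image and update snapshots
# there, so new baselines match CI's fonts and rendering.
docker run --rm -v "$PWD":/work -w /work \
  mcr.microsoft.com/playwright:v1.50.0-noble \
  npx playwright test --grep @visual --update-snapshots
```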
Review and Approval Workflow
- CI detects visual diff, uploads expected/actual/diff images as artifacts
- PR reviewer examines diffs
- Intentional change: update baselines (--update-snapshots), re-commit
- Unintentional regression: fix the code, re-run tests
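The first step above only works if the pipeline uploads the images. A minimal GitHub Actions sketch (artifact name, path, and retention are illustrative; the path assumes Playwright's default test-results output):

```yaml
# Upload expected/actual/diff images even when the test job fails
- uses: actions/upload-artifact@v4
  if: failure()
  with:
    name: visual-diffs
    path: test-results/
    retention-days: 14
```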
Anti-Patterns
Full-Page Screenshots Without Masking
Capturing entire pages without masking dynamic content (timestamps, user avatars, live data). Every run produces diffs that are not real regressions. The team stops trusting visual tests and ignores them. Always mask dynamic regions and freeze time-dependent content.
No Artifact Storage in CI
Running visual tests in CI without uploading screenshot artifacts. When a test fails, there is no way to see the actual vs. expected image. The developer has to reproduce locally, which may produce different results due to platform rendering differences. Always upload screenshots, diffs, and test reports as CI artifacts.
No Review Process for Baseline Updates
Running --update-snapshots and committing without reviewing the changes. Regressions get baked into baselines and become invisible. Every baseline update should go through code review. Reviewers must look at the before/after images, not just the file diff.
Testing Visual Stability of Unstable Components
Writing visual tests for components that change frequently by design (A/B tests, personalized content, frequently updated marketing banners). These tests fail constantly with intentional changes, creating noise. Either exclude these components from visual testing or stub their content.
Pixel-Perfect Thresholds on Full Pages
Setting maxDiffPixels: 0 on full-page screenshots. Sub-pixel rendering differences across browser versions, OS updates, and font rendering changes produce false positives. Use maxDiffPixelRatio: 0.005 (0.5%) for full pages. Reserve zero tolerance for small, critical components like logos and icons.
No Consistent Rendering Environment
Running visual tests on developer machines (macOS, Windows, various displays) and expecting baselines to match. Font rendering, antialiasing, and scaling differ across platforms. Run visual tests in a consistent CI environment (Docker) and generate baselines there.
Skipping Animation Handling
Not disabling animations before taking screenshots. CSS transitions and JavaScript animations captured mid-frame produce random diffs. Use animations: 'disabled' in Playwright or inject CSS to zero out animation durations.
Done When
- Baseline screenshots captured in CI (not locally) and committed to the repository.
- Diff threshold configured per component type (e.g., maxDiffPixels: 0 for icons, maxDiffPixelRatio: 0.005 for full pages).
- Dynamic content masked or frozen before capture (timestamps, user avatars, live API data).
- CI pipeline blocks merge when a visual diff exceeds the configured threshold.
- Review workflow defined: who reviews diffs, how intentional changes get baseline updates, and PR reviewers sign off on baseline commits.
Related Skills
- playwright-automation -- The foundation for Playwright-based visual tests; Page Object Model, fixtures, and test structure apply to visual tests too.
- ci-cd-integration -- Pipeline configuration for running visual tests, uploading artifacts, and integrating review workflows.
- cross-browser-testing -- Visual tests across browsers catch rendering differences; viewport matrix and browser project configuration overlap.
- qa-project-context -- The project context file captures which pages are visually critical and what dynamic content exists.