Phoenix Playwright Test Writing

Write end-to-end tests for Phoenix using Playwright. Tests live in app/tests/ and follow established patterns.

Timeout Policy

Do not pass timeout args in test code under app/tests.
Tune timing centrally in app/playwright.config.ts (global timeout, expect.timeout, use.navigationTimeout, and webServer.timeout).
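
As a rough sketch of what centralizing these settings could look like, the fragment below sets the four knobs named above in one place. Every number, the webServer command, and the URL are placeholders chosen for illustration (assumptions), not Phoenix's actual configuration.

```typescript
// app/playwright.config.ts (illustrative sketch, not the real Phoenix config)
const config = {
  // Upper bound for a single test, in milliseconds (placeholder value)
  timeout: 60_000,
  // Default timeout for expect(...) assertions (placeholder value)
  expect: { timeout: 10_000 },
  use: {
    // Default timeout for navigations such as page.goto() and page.waitForURL()
    // (placeholder value)
    navigationTimeout: 30_000,
  },
  webServer: {
    // Hypothetical command and URL; substitute whatever starts the app under test
    command: "pnpm run start:test",
    url: "http://localhost:6006",
    // How long to wait for the server to become available (placeholder value)
    timeout: 120_000,
  },
};

export default config;
```

With the values in one file, a slow CI runner can be accommodated by changing a single number instead of editing individual tests.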

Quick Start

import { expect, test } from "@playwright/test";
import { randomUUID } from "crypto";

test.describe("Feature Name", () => {
  test.beforeEach(async ({ page }) => {
    await page.goto(`/login`);
    await page.getByLabel("Email").fill("admin@localhost");
    await page.getByLabel("Password").fill("admin123");
    await page.getByRole("button", { name: "Log In", exact: true }).click();
    await page.waitForURL("**/projects");
  });

  test("can do something", async ({ page }) => {
    // Test implementation
  });
});

Test Credentials

User	Email	Password	Role
Admin	admin@localhost	admin123	admin
Member	member@localhost.com	member123	member
Viewer	viewer@localhost.com	viewer123	viewer
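
One possible way to reuse these credentials across spec files is a small typed lookup plus a login helper, sketched below. The names TEST_USERS, PageLike, and loginAs are hypothetical, and the helper assumes every role lands on /projects after login, as the admin does in the Quick Start. PageLike is a minimal structural slice of the Playwright Page methods used here, so the sketch compiles without importing Playwright; a real Page should be passable in its place.

```typescript
type Role = "admin" | "member" | "viewer";

interface Credentials {
  email: string;
  password: string;
}

// Values copied from the Test Credentials table
const TEST_USERS: Record<Role, Credentials> = {
  admin: { email: "admin@localhost", password: "admin123" },
  member: { email: "member@localhost.com", password: "member123" },
  viewer: { email: "viewer@localhost.com", password: "viewer123" },
};

// Hypothetical minimal interfaces covering only the calls loginAs makes
interface LocatorLike {
  fill(value: string): Promise<void>;
  click(): Promise<void>;
}

interface PageLike {
  goto(url: string): Promise<unknown>;
  getByLabel(text: string): LocatorLike;
  getByRole(role: "button", options: { name: string; exact?: boolean }): LocatorLike;
  waitForURL(url: string): Promise<void>;
}

// Same steps as the Quick Start beforeEach, parameterized by role
async function loginAs(page: PageLike, role: Role): Promise<void> {
  const { email, password } = TEST_USERS[role];
  await page.goto("/login");
  await page.getByLabel("Email").fill(email);
  await page.getByLabel("Password").fill(password);
  await page.getByRole("button", { name: "Log In", exact: true }).click();
  // Assumption: all roles are redirected to the projects page
  await page.waitForURL("**/projects");
}
```

In a spec file this would be called as `await loginAs(page, "viewer")` inside test.beforeEach.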

Selector Patterns (Priority Order)

Role selectors (most robust):

page.getByRole("button", { name: "Save" });
page.getByRole("link", { name: "Datasets" });
page.getByRole("tab", { name: /Evaluators/i });
page.getByRole("menuitem", { name: "Edit" });
page.getByRole("cell", { name: "my-item" });
page.getByRole("heading", { name: "Title" });
page.getByRole("dialog");
page.getByRole("textbox", { name: "Name" });
page.getByRole("combobox", { name: /mapping/i });

Label selectors:

page.getByLabel("Email");
page.getByLabel("Dataset Name");
page.getByLabel("Description");

Text selectors:

page.getByText("No evaluators added");
page.getByPlaceholder("Search...");

Test IDs (when available):
```
page.getByTestId("modal");
```

CSS locators (last resort):

page.locator('button:has-text("Save")');

Common UI Patterns

Dropdown Menus

// Click button to open dropdown
await page.getByRole("button", { name: "New Dataset" }).click();
// Select menu item
await page.getByRole("menuitem", { name: "New Dataset" }).click();

Nested Menus (Submenus)

// Open menu, hover over submenu trigger, click submenu item
await page.getByRole("button", { name: "Add evaluator" }).click();
await page
  .getByRole("menuitem", { name: "Use LLM evaluator template" })
  .hover();
await page.getByRole("menuitem", { name: /correctness/i }).click();

// IMPORTANT: Always use getByRole("menuitem") for submenu items, not getByText()
// Playwright's auto-waiting handles the submenu appearance timing
// ❌ BAD - flaky in CI:
// await page.getByText("ExactMatch").first().click();
// ✅ GOOD - reliable:
// await page.getByRole("menuitem", { name: /ExactMatch/i }).click();

Dialogs/Modals

// Wait for dialog
await expect(page.getByRole("dialog")).toBeVisible();
// Fill form in dialog
await page.getByLabel("Name").fill("test-name");
// Submit
await page.getByRole("button", { name: "Create" }).click();
// Wait for close
await expect(page.getByRole("dialog")).not.toBeVisible();

Tables with Row Actions

// Find row by cell content
const row = page.getByRole("row").filter({
  has: page.getByRole("cell", { name: "item-name" }),
});
// Click action button in row (usually last button)
await row.getByRole("button").last().click();
// Select action from menu
await page.getByRole("menuitem", { name: "Edit" }).click();

Tabs

await page.getByRole("tab", { name: /Evaluators/i }).click();
await page.waitForURL("**/evaluators");
await expect(page.getByRole("tab", { name: /Evaluators/i })).toHaveAttribute(
  "aria-selected",
  "true",
);

Form Inputs in Sections

// When multiple textboxes exist, scope to section
const systemSection = page.locator('button:has-text("System")');
const systemTextbox = systemSection
  .locator("..")
  .locator("..")
  .getByRole("textbox");
await systemTextbox.fill("content");

Serial Tests (Shared State)

Use test.describe.serial when tests depend on each other:

test.describe.serial("Workflow", () => {
  const itemName = `item-${randomUUID()}`;

  test("step 1: create item", async ({ page }) => {
    // Creates itemName
  });

  test("step 2: edit item", async ({ page }) => {
    // Uses itemName from previous test
  });

  test("step 3: verify edits", async ({ page }) => {
    // Verifies itemName was edited
  });
});

Assertions

// Visibility
await expect(element).toBeVisible();
await expect(element).not.toBeVisible();

// Text content
await expect(element).toHaveText("expected");
await expect(element).toContainText("partial");

// Attributes
await expect(element).toHaveAttribute("aria-selected", "true");

// Input values
await expect(input).toHaveValue("expected value");

// URL
await page.waitForURL("**/datasets/**/examples");

Navigation Patterns

// Direct navigation
await page.goto("/datasets");
await page.waitForURL("**/datasets");

// Click navigation
await page.getByRole("link", { name: "Datasets" }).click();
await page.waitForURL("**/datasets");

// Extract ID from URL
const url = page.url();
const match = url.match(/datasets\/([^/]+)/);
const datasetId = match ? match[1] : "";

// Navigate with query params
await page.goto(`/playground?datasetId=${datasetId}`);
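
The inline extraction above falls back to an empty string, which would then navigate to /playground?datasetId= without any error. A stricter variant might fail fast instead. The helper name extractDatasetId below is hypothetical, and the sample ID in the usage comment is made up.

```typescript
// Pull the dataset ID out of a URL such as the one returned by page.url().
// Throws instead of returning "" so a missing ID surfaces at the point of failure.
function extractDatasetId(url: string): string {
  // Stop at "/", "?" or "#" so query strings and fragments are not captured
  const match = url.match(/\/datasets\/([^/?#]+)/);
  const id = match?.[1];
  if (!id) {
    throw new Error(`No dataset ID found in URL: ${url}`);
  }
  return id;
}

// Usage inside a test (made-up ID):
// const datasetId = extractDatasetId(page.url());
// await page.goto(`/playground?datasetId=${datasetId}`);
```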

Running Tests

Before running Playwright tests, build the app so E2E runs against the latest frontend changes:

pnpm run build

# Run specific test file
pnpm exec playwright test tests/server-evaluators.spec.ts --project=chromium

# Run with UI mode
pnpm exec playwright test --ui

# Run specific test by name
pnpm exec playwright test -g "can create"

# Debug mode
pnpm exec playwright test --debug

Avoiding Interactive Report Server

When the HTML reporter is enabled, Playwright serves the report after a failing run and waits for Ctrl+C, which can cause command timeouts. Use these options to avoid this:

# Use list reporter (no interactive server)
pnpm exec playwright test tests/example.spec.ts --project=chromium --reporter=list

# Use dot reporter for minimal output
pnpm exec playwright test tests/example.spec.ts --project=chromium --reporter=dot

# Set CI mode to disable interactive features
CI=1 pnpm exec playwright test tests/example.spec.ts --project=chromium

Recommended for automation: Always use --reporter=list or CI=1 when running tests programmatically to ensure the command exits cleanly after tests complete.
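
The same effect can also be achieved in the config file rather than on the command line. The sketch below is one way to do it, assuming the project is free to choose its reporters this way; pickReporter is a hypothetical helper, while `open: "never"` is the HTML reporter option that stops the report from being served.

```typescript
// A reporter entry in the [name, options] tuple form used by the `reporter` option
type ReporterEntry = [string, Record<string, unknown>];

// Choose reporters that never block the terminal after a run
function pickReporter(isCI: boolean): ReporterEntry[] {
  if (isCI) {
    // Minimal output for CI logs
    return [["dot", {}]];
  }
  // Locally: readable console output, plus an HTML report that is written
  // to disk but never served automatically
  return [
    ["list", {}],
    ["html", { open: "never" }],
  ];
}

// In app/playwright.config.ts this might be wired up as:
//   reporter: pickReporter(Boolean(process.env.CI)),
```

The saved report can still be opened on demand with `pnpm exec playwright show-report`.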

Phoenix-Specific Pages

Page	URL Pattern	Key Elements
Datasets	`/datasets`	Table, "New Dataset" button
Dataset Detail	`/datasets/{id}/examples`	Tabs (Experiments, Examples, Evaluators, Versions)
Dataset Evaluators	`/datasets/{id}/evaluators`	"Add evaluator" button, evaluators table
Playground	`/playground`	Prompts section, Experiment section
Playground + Dataset	`/playground?datasetId={id}`	Dataset selector, Evaluators button
Prompts	`/prompts`	"New Prompt" button, prompts table
Settings	`/settings/general`	"Add User" button, users table

UI Exploration with agent-browser

When selectors are unclear, use agent-browser to explore the Phoenix UI. For detailed agent-browser usage, invoke the /agent-browser skill.

Quick Reference for Phoenix

# Open Phoenix page (dev server runs on port 6006)
agent-browser open "http://localhost:6006/datasets"

# Get interactive snapshot with element refs
agent-browser snapshot -i

# Click using refs from snapshot
agent-browser click @e5

# Fill form fields
agent-browser fill @e2 "test value"

# Get element text
agent-browser get text @e1

Discovering Selectors Workflow

Open the page: agent-browser open "http://localhost:6006/datasets"
Get snapshot: agent-browser snapshot -i
Find element refs in output (e.g., @e1 [button] "New Dataset")
Interact: agent-browser click @e1
Re-snapshot after navigation/DOM changes: agent-browser snapshot -i

Translating to Playwright

agent-browser output	Playwright selector
`@e1 [button] "Save"`	`page.getByRole("button", { name: "Save" })`
`@e2 [link] "Datasets"`	`page.getByRole("link", { name: "Datasets" })`
`@e3 [textbox] "Name"`	`page.getByRole("textbox", { name: "Name" })`
`@e4 [menuitem] "Edit"`	`page.getByRole("menuitem", { name: "Edit" })`
`@e5 [tab] "Evaluators 0"`	`page.getByRole("tab", { name: /Evaluators/i })`
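
The mapping in this table is mechanical enough to sketch as a function. The version below assumes snapshot lines have exactly the shape shown in the table (ref, role in square brackets, quoted name). The function name toPlaywrightSelector is hypothetical. It always emits an exact name, so labels with dynamic parts such as "Evaluators 0" still need to be turned into a regex by hand.

```typescript
// Convert one agent-browser snapshot line into Playwright selector source text.
// Returns null when the line does not match the assumed format.
function toPlaywrightSelector(line: string): string | null {
  // Expected shape: @e<number> [<role>] "<accessible name>"
  const match = line.trim().match(/^@e\d+\s+\[([a-z]+)\]\s+"(.*)"$/);
  if (!match) return null;
  const [, role, name] = match;
  // JSON.stringify adds the quotes and escapes any special characters
  return `page.getByRole(${JSON.stringify(role)}, { name: ${JSON.stringify(name)} })`;
}

// Usage:
// toPlaywrightSelector('@e4 [menuitem] "Edit"')
// gives: page.getByRole("menuitem", { name: "Edit" })
```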

File Naming

Feature tests: {feature-name}.spec.ts
Access control: {role}-access.spec.ts
Rate limiting: {feature}.rate-limit.spec.ts (runs last)
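
If the naming convention ever needs to be checked automatically, for example in a lint script, the three patterns can be told apart by suffix. This is only a sketch; classifySpec and the file names in the usage comments are hypothetical.

```typescript
type SpecKind = "rate-limit" | "access-control" | "feature";

// Classify a spec file name according to the conventions listed above.
// Returns null for files that are not Playwright specs at all.
function classifySpec(fileName: string): SpecKind | null {
  if (!fileName.endsWith(".spec.ts")) return null;
  // Check the most specific suffixes first
  if (fileName.endsWith(".rate-limit.spec.ts")) return "rate-limit";
  if (fileName.endsWith("-access.spec.ts")) return "access-control";
  return "feature";
}

// Usage (hypothetical file names):
// classifySpec("viewer-access.spec.ts")    -> "access-control"
// classifySpec("login.rate-limit.spec.ts") -> "rate-limit"
```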

Common Gotchas

Dialog not closing: Wait for a deterministic post-action signal (e.g., dialog hidden + success row visible)
Multiple elements: Use .first(), .last(), or .nth(n)
Dynamic content: Use regex in name: { name: /pattern/i }
Flaky waits: Prefer waitForURL over waitForTimeout
Menu not appearing: Wait for specific menu state/element visibility

Debugging Flaky Tests

Critical Lessons Learned

Don't assume parallelism is the problem
- Phoenix tests run with 7 parallel workers without issues
- The app handles concurrent logins, database operations, and session management properly
- If tests fail with parallelism, it's usually a test timing issue, not infrastructure
- Playwright's browser context isolation is robust - each test gets a fresh browser context with its own cookies and storage

waitForTimeout is almost always wrong

- page.waitForTimeout() is the #1 cause of flakiness in Phoenix tests
- Arbitrary timeouts race against rendering and network speed

Always replace with state-based waits:

// ❌ BAD - flaky, races against rendering
await page.waitForTimeout(500);
await element.click();

// ✅ GOOD - waits for actual state
await element.waitFor({ state: "visible" });
await element.click();

Test the actual failure before fixing
- Run tests with parallelism enabled to see what actually fails
- Check error messages - they often point to the real issue
- Don't optimize prematurely (e.g., caching auth state) if it's not the problem

Phoenix test infrastructure is solid
- In-memory SQLite works fine with parallel tests
- No need for per-worker databases
- No need for auth state caching
- Tests use randomUUID() for data isolation - this works well

Debugging Workflow

When tests are flaky:

Run with parallelism multiple times to catch intermittent failures:

for i in 1 2 3 4 5; do
  pnpm exec playwright test --project=chromium --reporter=dot
done

Look for waitForTimeout usage - replace with proper waits:
```
grep -r "waitForTimeout" app/tests/
```

Check for race conditions in element interactions:
- Wait for element visibility before interacting
- Avoid page.waitForLoadState("networkidle"); it is flaky in CI, so wait for a specific element or URL instead
- Use waitForURL after navigation actions

Verify selectors are stable:
- Avoid CSS selectors that depend on DOM structure
- Use role/label selectors that match ARIA attributes
- Check that selectors don't break when the UI updates

Run with tracing to see what happened. The on-first-retry mode records a trace only when a failed test is retried, so retries must be enabled:

pnpm exec playwright test --trace on-first-retry

Common Flaky Patterns and Fixes

Flaky Pattern	Root Cause	Fix
Submenu item not found	Using `getByText()` instead of `getByRole()`	Use `getByRole("menuitem", { name: /pattern/i })` for submenu items
Menu click fails	Menu not fully rendered	`await menu.waitFor({ state: "visible" })` before click
Dialog assertion fails	Dialog animation not complete	Assert specific completion signal (hidden dialog + next-state element)
Navigation timeout	Page still loading	Remove `waitForLoadState("networkidle")` - it's flaky in CI
Element not found	Dynamic content loading	Wait for element visibility, not arbitrary timeout
Stale element	Re-render between locate and click	Store locator, not element handle

Test Stability Best Practices

Use proper waits:

// Wait for element state: "visible", "hidden", "attached", or "detached"
await element.waitFor({ state: "visible" });

// Wait for a load state: "load" or "domcontentloaded" (avoid "networkidle", which is flaky in CI)
await page.waitForLoadState("domcontentloaded");

// Wait for URL change
await page.waitForURL("**/expected-path");

Use unique test data:

const uniqueName = `test-${randomUUID()}`;

Prefer role selectors - they're less brittle:

page.getByRole("button", { name: "Save" }) // ✅ Good
page.locator('button.save-btn') // ❌ Brittle

Don't fight animations - wait for them:

await expect(dialog).not.toBeVisible();

Verify URL changes after navigation:
```
await page.waitForURL("**/datasets");
```
