phoenix-playwright-tests
Phoenix Playwright Test Writing
Write end-to-end tests for Phoenix using Playwright. Tests live in app/tests/ and follow established patterns.
Timeout Policy
- Do not pass timeout args in test code under app/tests.
- Tune timing centrally in app/playwright.config.ts (global timeout, expect.timeout, use.navigationTimeout, and webServer.timeout).
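As a rough illustration of where those four settings live, a config file could look like the sketch below. The numeric values, the `command`, and the `url` are placeholders chosen for this example and are not taken from Phoenix's actual configuration.

```typescript
import { defineConfig } from "@playwright/test";

// Illustrative values only; the real numbers belong in app/playwright.config.ts.
export default defineConfig({
  timeout: 60_000, // global per-test timeout (ms)
  expect: { timeout: 10_000 }, // default timeout for expect() assertions (ms)
  use: { navigationTimeout: 30_000 }, // timeout for navigation actions such as page.goto (ms)
  webServer: {
    command: "pnpm run start", // hypothetical command, an assumption for this sketch
    url: "http://localhost:6006", // assumed server URL (port 6006)
    timeout: 120_000, // how long to wait for the server to come up (ms)
  },
});
```

With timing kept here, individual tests can call `expect(...)` and `page.goto(...)` without any per-call timeout argument.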
Quick Start
import { expect, test } from "@playwright/test";
import { randomUUID } from "crypto";
test.describe("Feature Name", () => {
  test.beforeEach(async ({ page }) => {
    await page.goto(`/login`);
    await page.getByLabel("Email").fill("admin@localhost");
    await page.getByLabel("Password").fill("admin123");
    await page.getByRole("button", { name: "Log In", exact: true }).click();
    await page.waitForURL("**/projects");
  });
  test("can do something", async ({ page }) => {
    // Test implementation
  });
});
Test Credentials
| User | Email | Password | Role |
|---|---|---|---|
| Admin | admin@localhost | admin123 | admin |
| Member | member@localhost.com | member123 | member |
| Viewer | viewer@localhost.com | viewer123 | viewer |
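If several spec files log in as different roles, the table above could be mirrored in a small typed lookup. This is only a sketch: `Role`, `TestUser`, `TEST_USERS`, and `credentialsFor` are hypothetical names, not existing helpers in app/tests.

```typescript
type Role = "admin" | "member" | "viewer";

interface TestUser {
  email: string;
  password: string;
}

// Values copied from the credentials table; the constant name is an assumption.
const TEST_USERS: Record<Role, TestUser> = {
  admin: { email: "admin@localhost", password: "admin123" },
  member: { email: "member@localhost.com", password: "member123" },
  viewer: { email: "viewer@localhost.com", password: "viewer123" },
};

// Returns the login credentials for the given role.
function credentialsFor(role: Role): TestUser {
  return TEST_USERS[role];
}
```

A login step could then fill the Email and Password fields from `credentialsFor("viewer")` instead of repeating string literals.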
Selector Patterns (Priority Order)
- Role selectors (most robust):
  page.getByRole("button", { name: "Save" });
  page.getByRole("link", { name: "Datasets" });
  page.getByRole("tab", { name: /Evaluators/i });
  page.getByRole("menuitem", { name: "Edit" });
  page.getByRole("cell", { name: "my-item" });
  page.getByRole("heading", { name: "Title" });
  page.getByRole("dialog");
  page.getByRole("textbox", { name: "Name" });
  page.getByRole("combobox", { name: /mapping/i });
- Label selectors:
  page.getByLabel("Email");
  page.getByLabel("Dataset Name");
  page.getByLabel("Description");
- Text selectors:
  page.getByText("No evaluators added");
  page.getByPlaceholder("Search...");
- Test IDs (when available):
  page.getByTestId("modal");
- CSS locators (last resort):
  page.locator('button:has-text("Save")');
Common UI Patterns
Dropdown Menus
// Click button to open dropdown
await page.getByRole("button", { name: "New Dataset" }).click();
// Select menu item
await page.getByRole("menuitem", { name: "New Dataset" }).click();
Nested Menus (Submenus)
// Open menu, hover over submenu trigger, click submenu item
await page.getByRole("button", { name: "Add evaluator" }).click();
await page
  .getByRole("menuitem", { name: "Use LLM evaluator template" })
  .hover();
await page.getByRole("menuitem", { name: /correctness/i }).click();
// IMPORTANT: Always use getByRole("menuitem") for submenu items, not getByText()
// Playwright's auto-waiting handles the submenu appearance timing
// ❌ BAD - flaky in CI:
// await page.getByText("ExactMatch").first().click();
// ✅ GOOD - reliable:
// await page.getByRole("menuitem", { name: /ExactMatch/i }).click();
Dialogs/Modals
// Wait for dialog
await expect(page.getByRole("dialog")).toBeVisible();
// Fill form in dialog
await page.getByLabel("Name").fill("test-name");
// Submit
await page.getByRole("button", { name: "Create" }).click();
// Wait for close
await expect(page.getByRole("dialog")).not.toBeVisible();
Tables with Row Actions
// Find row by cell content
const row = page.getByRole("row").filter({
  has: page.getByRole("cell", { name: "item-name" }),
});
// Click action button in row (usually last button)
await row.getByRole("button").last().click();
// Select action from menu
await page.getByRole("menuitem", { name: "Edit" }).click();
Tabs
await page.getByRole("tab", { name: /Evaluators/i }).click();
await page.waitForURL("**/evaluators");
await expect(page.getByRole("tab", { name: /Evaluators/i })).toHaveAttribute(
  "aria-selected",
  "true",
);
Form Inputs in Sections
// When multiple textboxes exist, scope to section
const systemSection = page.locator('button:has-text("System")');
const systemTextbox = systemSection
  .locator("..")
  .locator("..")
  .getByRole("textbox");
await systemTextbox.fill("content");
Serial Tests (Shared State)
Use test.describe.serial when tests depend on each other:
test.describe.serial("Workflow", () => {
  const itemName = `item-${randomUUID()}`;
  test("step 1: create item", async ({ page }) => {
    // Creates itemName
  });
  test("step 2: edit item", async ({ page }) => {
    // Uses itemName from previous test
  });
  test("step 3: verify edits", async ({ page }) => {
    // Verifies itemName was edited
  });
});
Assertions
// Visibility
await expect(element).toBeVisible();
await expect(element).not.toBeVisible();
// Text content
await expect(element).toHaveText("expected");
await expect(element).toContainText("partial");
// Attributes
await expect(element).toHaveAttribute("aria-selected", "true");
// Input values
await expect(input).toHaveValue("expected value");
// URL
await page.waitForURL("**/datasets/**/examples");
Navigation Patterns
// Direct navigation
await page.goto("/datasets");
await page.waitForURL("**/datasets");
// Click navigation
await page.getByRole("link", { name: "Datasets" }).click();
await page.waitForURL("**/datasets");
// Extract ID from URL
const url = page.url();
const match = url.match(/datasets\/([^/]+)/);
const datasetId = match ? match[1] : "";
// Navigate with query params
await page.goto(`/playground?datasetId=${datasetId}`);
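The ID extraction and query-param navigation above can be wrapped in two small pure functions. This is a sketch under assumptions: `extractDatasetId` and `playgroundUrl` are hypothetical names, and unlike the inline regex above, the pattern here also stops at `?` and `#` so a query string is not captured as part of the ID.

```typescript
// Hypothetical helper: returns the path segment after "datasets/", or null if there is none.
function extractDatasetId(url: string): string | null {
  const match = url.match(/datasets\/([^/?#]+)/);
  return match ? match[1] : null;
}

// Hypothetical helper: builds the playground URL for a dataset, encoding the ID for the query string.
function playgroundUrl(datasetId: string): string {
  return `/playground?datasetId=${encodeURIComponent(datasetId)}`;
}
```

In a test, the result would typically be passed straight to `page.goto(playgroundUrl(id))` after checking that `extractDatasetId(page.url())` did not return null.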
Running Tests
Before running Playwright tests, build the app so E2E runs against the latest frontend changes:
pnpm run build
# Run specific test file
pnpm exec playwright test tests/server-evaluators.spec.ts --project=chromium
# Run with UI mode
pnpm exec playwright test --ui
# Run specific test by name
pnpm exec playwright test -g "can create"
# Debug mode
pnpm exec playwright test --debug
Avoiding Interactive Report Server
By default, Playwright serves an HTML report after tests finish and waits for Ctrl+C, which can cause command timeouts. Use these options to avoid this:
# Use list reporter (no interactive server)
pnpm exec playwright test tests/example.spec.ts --project=chromium --reporter=list
# Use dot reporter for minimal output
pnpm exec playwright test tests/example.spec.ts --project=chromium --reporter=dot
# Set CI mode to disable interactive features
CI=1 pnpm exec playwright test tests/example.spec.ts --project=chromium
Recommended for automation: Always use --reporter=list or CI=1 when running tests programmatically to ensure the command exits cleanly after tests complete.
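If the HTML report is still wanted for later inspection, one possible alternative to the CLI flags is to configure the reporter so it never opens a server. This is a sketch, not Phoenix's actual configuration; it relies on the HTML reporter's `open` option.

```typescript
import { defineConfig } from "@playwright/test";

export default defineConfig({
  // In CI use the list reporter; locally write the HTML report to disk
  // but never start the interactive report server.
  reporter: process.env.CI ? "list" : [["html", { open: "never" }]],
});
```

The saved report can then be opened on demand with `pnpm exec playwright show-report`.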
Phoenix-Specific Pages
| Page | URL Pattern | Key Elements |
|---|---|---|
| Datasets | /datasets | Table, "New Dataset" button |
| Dataset Detail | /datasets/{id}/examples | Tabs (Experiments, Examples, Evaluators, Versions) |
| Dataset Evaluators | /datasets/{id}/evaluators | "Add evaluator" button, evaluators table |
| Playground | /playground | Prompts section, Experiment section |
| Playground + Dataset | /playground?datasetId={id} | Dataset selector, Evaluators button |
| Prompts | /prompts | "New Prompt" button, prompts table |
| Settings | /settings/general | "Add User" button, users table |
UI Exploration with agent-browser
When selectors are unclear, use agent-browser to explore the Phoenix UI. For detailed agent-browser usage, invoke the /agent-browser skill.
Quick Reference for Phoenix
# Open Phoenix page (dev server runs on port 6006)
agent-browser open "http://localhost:6006/datasets"
# Get interactive snapshot with element refs
agent-browser snapshot -i
# Click using refs from snapshot
agent-browser click @e5
# Fill form fields
agent-browser fill @e2 "test value"
# Get element text
agent-browser get text @e1
Discovering Selectors Workflow
- Open the page: agent-browser open "http://localhost:6006/datasets"
- Get snapshot: agent-browser snapshot -i
- Find element refs in output (e.g., @e1 [button] "New Dataset")
- Interact: agent-browser click @e1
- Re-snapshot after navigation/DOM changes: agent-browser snapshot -i
Translating to Playwright
| agent-browser output | Playwright selector |
|---|---|
| @e1 [button] "Save" | page.getByRole("button", { name: "Save" }) |
| @e2 [link] "Datasets" | page.getByRole("link", { name: "Datasets" }) |
| @e3 [textbox] "Name" | page.getByRole("textbox", { name: "Name" }) |
| @e4 [menuitem] "Edit" | page.getByRole("menuitem", { name: "Edit" }) |
| @e5 [tab] "Evaluators 0" | page.getByRole("tab", { name: /Evaluators/i }) |
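The mapping in this table is mechanical enough to sketch as a function. The code below assumes snapshot lines always have the shape `@ref [role] "name"` shown in the examples, which may not cover everything agent-browser prints; `toPlaywrightSelector` is a hypothetical helper, not part of either tool.

```typescript
// Assumed snapshot line shape: @e1 [button] "Save"
const SNAPSHOT_LINE = /^@\w+\s+\[(\w+)\]\s+"(.*)"$/;

// Escapes characters that have special meaning inside a regex literal.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\\/]/g, "\\$&");
}

// Converts one snapshot line into Playwright selector source text, or null if the line does not match.
function toPlaywrightSelector(line: string): string | null {
  const match = SNAPSHOT_LINE.exec(line.trim());
  if (!match) return null;
  const [, role, name] = match;
  // A trailing count such as "Evaluators 0" changes over time,
  // so match only the stable prefix with a case-insensitive regex.
  const counted = /^(.*\S)\s+\d+$/.exec(name);
  if (counted) {
    return `page.getByRole("${role}", { name: /${escapeRegExp(counted[1])}/i })`;
  }
  return `page.getByRole("${role}", { name: ${JSON.stringify(name)} })`;
}
```

Treating a trailing number as a dynamic count is a design choice for this sketch; names that legitimately end in a number would need the exact-string form written by hand.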
File Naming
- Feature tests: {feature-name}.spec.ts
- Access control: {role}-access.spec.ts
- Rate limiting: {feature}.rate-limit.spec.ts (runs last)
Common Gotchas
- Dialog not closing: Wait for a deterministic post-action signal (e.g., dialog hidden + success row visible)
- Multiple elements: Use .first(), .last(), or .nth(n)
- Dynamic content: Use regex in name: { name: /pattern/i }
- Flaky waits: Prefer waitForURL over waitForTimeout
- Menu not appearing: Wait for specific menu state/element visibility
Debugging Flaky Tests
Critical Lessons Learned
- Don't assume parallelism is the problem
  - Phoenix tests run with 7 parallel workers without issues
  - The app handles concurrent logins, database operations, and session management properly
  - If tests fail with parallelism, it's usually a test timing issue, not infrastructure
  - Playwright's browser context isolation is robust - each worker gets isolated cookies/sessions
- waitForTimeout is almost always wrong
  - page.waitForTimeout() is the #1 cause of flakiness in Phoenix tests
  - Arbitrary timeouts race against rendering and network speed
  - Always replace with state-based waits:
    // ❌ BAD - flaky, races against rendering
    await page.waitForTimeout(500);
    await element.click();
    // ✅ GOOD - waits for actual state
    await element.waitFor({ state: "visible" });
    await element.click();
- Test the actual failure before fixing
  - Run tests with parallelism enabled to see what actually fails
  - Check error messages - they often point to the real issue
  - Don't optimize prematurely (e.g., caching auth state) if it's not the problem
- Phoenix test infrastructure is solid
  - In-memory SQLite works fine with parallel tests
  - No need for per-worker databases
  - No need for auth state caching
  - Tests use randomUUID() for data isolation - this works well
Debugging Workflow
When tests are flaky:
- Run with parallelism multiple times to catch intermittent failures:
  for i in 1 2 3 4 5; do
    pnpm exec playwright test --project=chromium --reporter=dot
  done
- Look for waitForTimeout usage and replace it with proper waits:
  grep -r "waitForTimeout" app/tests/
- Check for race conditions in element interactions:
  - Wait for element visibility before interacting
  - Wait for network idle only when no element or URL signal is available, since it can be flaky in CI: page.waitForLoadState("networkidle")
  - Use waitForURL after navigation actions
- Verify selectors are stable:
  - Avoid CSS selectors that depend on DOM structure
  - Use role/label selectors that match ARIA attributes
  - Check that selectors don't break when the UI updates
- Run with trace on failure to see what happened:
  pnpm exec playwright test --trace on-first-retry
Common Flaky Patterns and Fixes
| Flaky Pattern | Root Cause | Fix |
|---|---|---|
| Submenu item not found | Using getByText() instead of getByRole() | Use getByRole("menuitem", { name: /pattern/i }) for submenu items |
| Menu click fails | Menu not fully rendered | await menu.waitFor({ state: "visible" }) before click |
| Dialog assertion fails | Dialog animation not complete | Assert specific completion signal (hidden dialog + next-state element) |
| Navigation timeout | Page still loading | Remove waitForLoadState("networkidle") - it's flaky in CI |
| Element not found | Dynamic content loading | Wait for element visibility, not arbitrary timeout |
| Stale element | Re-render between locate and click | Store locator, not element handle |
Test Stability Best Practices
- Use proper waits:
  // Wait for element state ("visible", "hidden", "attached", or "detached")
  await element.waitFor({ state: "visible" });
  // Wait for a load state ("load", "domcontentloaded", or "networkidle")
  await page.waitForLoadState("load");
  // Wait for URL change
  await page.waitForURL("**/expected-path");
- Use unique test data:
  const uniqueName = `test-${randomUUID()}`;
- Prefer role selectors - they're less brittle:
  page.getByRole("button", { name: "Save" }); // ✅ Good
  page.locator("button.save-btn"); // ❌ Brittle
- Don't fight animations - wait for them:
  await expect(dialog).not.toBeVisible();
- Verify URL changes after navigation:
  await page.waitForURL("**/datasets");
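For the unique test data practice, a tiny helper keeps name generation consistent across spec files. This is a minimal sketch; `uniqueName` is a hypothetical function, not an existing utility in app/tests.

```typescript
import { randomUUID } from "crypto";

// Hypothetical helper: appends a fresh UUID so names created by parallel workers do not collide.
function uniqueName(prefix: string): string {
  return `${prefix}-${randomUUID()}`;
}
```

A test could call `uniqueName("dataset")` once per describe block and reuse the value to locate the row it created.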