skills/arize-ai/phoenix/phoenix-playwright-tests

phoenix-playwright-tests

SKILL.md

Phoenix Playwright Test Writing

Write end-to-end tests for Phoenix using Playwright. Tests live in app/tests/ and follow established patterns.

Timeout Policy

  • Do not pass timeout args in test code under app/tests.
  • Tune timing centrally in app/playwright.config.ts (global timeout, expect.timeout, use.navigationTimeout, and webServer.timeout).

Quick Start

import { expect, test } from "@playwright/test";
import { randomUUID } from "crypto";

test.describe("Feature Name", () => {
  test.beforeEach(async ({ page }) => {
    await page.goto(`/login`);
    await page.getByLabel("Email").fill("admin@localhost");
    await page.getByLabel("Password").fill("admin123");
    await page.getByRole("button", { name: "Log In", exact: true }).click();
    await page.waitForURL("**/projects");
  });

  test("can do something", async ({ page }) => {
    // Test implementation
  });
});

Test Credentials

User Email Password Role
Admin admin@localhost admin123 admin
Member member@localhost.com member123 member
Viewer viewer@localhost.com viewer123 viewer

Selector Patterns (Priority Order)

  1. Role selectors (most robust):

    page.getByRole("button", { name: "Save" });
    page.getByRole("link", { name: "Datasets" });
    page.getByRole("tab", { name: /Evaluators/i });
    page.getByRole("menuitem", { name: "Edit" });
    page.getByRole("cell", { name: "my-item" });
    page.getByRole("heading", { name: "Title" });
    page.getByRole("dialog");
    page.getByRole("textbox", { name: "Name" });
    page.getByRole("combobox", { name: /mapping/i });
    
  2. Label selectors:

    page.getByLabel("Email");
    page.getByLabel("Dataset Name");
    page.getByLabel("Description");
    
  3. Text selectors:

    page.getByText("No evaluators added");
    page.getByPlaceholder("Search...");
    
  4. Test IDs (when available):

    page.getByTestId("modal");
    
  5. CSS locators (last resort):

    page.locator('button:has-text("Save")');
    

Common UI Patterns

Dropdown Menus

// Click button to open dropdown
await page.getByRole("button", { name: "New Dataset" }).click();
// Select menu item
await page.getByRole("menuitem", { name: "New Dataset" }).click();

Nested Menus (Submenus)

// Open menu, hover over submenu trigger, click submenu item
await page.getByRole("button", { name: "Add evaluator" }).click();
await page
  .getByRole("menuitem", { name: "Use LLM evaluator template" })
  .hover();
await page.getByRole("menuitem", { name: /correctness/i }).click();

// IMPORTANT: Always use getByRole("menuitem") for submenu items, not getByText()
// Playwright's auto-waiting handles the submenu appearance timing
// ❌ BAD - flaky in CI:
// await page.getByText("ExactMatch").first().click();
// ✅ GOOD - reliable:
// await page.getByRole("menuitem", { name: /ExactMatch/i }).click();

Dialogs/Modals

// Wait for dialog
await expect(page.getByRole("dialog")).toBeVisible();
// Fill form in dialog
await page.getByLabel("Name").fill("test-name");
// Submit
await page.getByRole("button", { name: "Create" }).click();
// Wait for close
await expect(page.getByRole("dialog")).not.toBeVisible();

Tables with Row Actions

// Find row by cell content
const row = page.getByRole("row").filter({
  has: page.getByRole("cell", { name: "item-name" }),
});
// Click action button in row (usually last button)
await row.getByRole("button").last().click();
// Select action from menu
await page.getByRole("menuitem", { name: "Edit" }).click();

Tabs

await page.getByRole("tab", { name: /Evaluators/i }).click();
await page.waitForURL("**/evaluators");
await expect(page.getByRole("tab", { name: /Evaluators/i })).toHaveAttribute(
  "aria-selected",
  "true",
);

Form Inputs in Sections

// When multiple textboxes exist, scope to section
const systemSection = page.locator('button:has-text("System")');
const systemTextbox = systemSection
  .locator("..")
  .locator("..")
  .getByRole("textbox");
await systemTextbox.fill("content");

Serial Tests (Shared State)

Use test.describe.serial when tests depend on each other:

test.describe.serial("Workflow", () => {
  const itemName = `item-${randomUUID()}`;

  test("step 1: create item", async ({ page }) => {
    // Creates itemName
  });

  test("step 2: edit item", async ({ page }) => {
    // Uses itemName from previous test
  });

  test("step 3: verify edits", async ({ page }) => {
    // Verifies itemName was edited
  });
});

Assertions

// Visibility
await expect(element).toBeVisible();
await expect(element).not.toBeVisible();

// Text content
await expect(element).toHaveText("expected");
await expect(element).toContainText("partial");

// Attributes
await expect(element).toHaveAttribute("aria-selected", "true");

// Input values
await expect(input).toHaveValue("expected value");

// URL
await page.waitForURL("**/datasets/**/examples");

Navigation Patterns

// Direct navigation
await page.goto("/datasets");
await page.waitForURL("**/datasets");

// Click navigation
await page.getByRole("link", { name: "Datasets" }).click();
await page.waitForURL("**/datasets");

// Extract ID from URL
const url = page.url();
const match = url.match(/datasets\/([^/]+)/);
const datasetId = match ? match[1] : "";

// Navigate with query params
await page.goto(`/playground?datasetId=${datasetId}`);

Running Tests

Before running Playwright tests, build the app so E2E runs against the latest frontend changes:

pnpm run build
# Run specific test file
pnpm exec playwright test tests/server-evaluators.spec.ts --project=chromium

# Run with UI mode
pnpm exec playwright test --ui

# Run specific test by name
pnpm exec playwright test -g "can create"

# Debug mode
pnpm exec playwright test --debug

Avoiding Interactive Report Server

By default, Playwright serves an HTML report after tests finish and waits for Ctrl+C, which can cause command timeouts. Use these options to avoid this:

# Use list reporter (no interactive server)
pnpm exec playwright test tests/example.spec.ts --project=chromium --reporter=list

# Use dot reporter for minimal output
pnpm exec playwright test tests/example.spec.ts --project=chromium --reporter=dot

# Set CI mode to disable interactive features
CI=1 pnpm exec playwright test tests/example.spec.ts --project=chromium

Recommended for automation: Always use --reporter=list or CI=1 when running tests programmatically to ensure the command exits cleanly after tests complete.

Phoenix-Specific Pages

Page URL Pattern Key Elements
Datasets /datasets Table, "New Dataset" button
Dataset Detail /datasets/{id}/examples Tabs (Experiments, Examples, Evaluators, Versions)
Dataset Evaluators /datasets/{id}/evaluators "Add evaluator" button, evaluators table
Playground /playground Prompts section, Experiment section
Playground + Dataset /playground?datasetId={id} Dataset selector, Evaluators button
Prompts /prompts "New Prompt" button, prompts table
Settings /settings/general "Add User" button, users table

UI Exploration with agent-browser

When selectors are unclear, use agent-browser to explore the Phoenix UI. For detailed agent-browser usage, invoke the /agent-browser skill.

Quick Reference for Phoenix

# Open Phoenix page (dev server runs on port 6006)
agent-browser open "http://localhost:6006/datasets"

# Get interactive snapshot with element refs
agent-browser snapshot -i

# Click using refs from snapshot
agent-browser click @e5

# Fill form fields
agent-browser fill @e2 "test value"

# Get element text
agent-browser get text @e1

Discovering Selectors Workflow

  1. Open the page: agent-browser open "http://localhost:6006/datasets"
  2. Get snapshot: agent-browser snapshot -i
  3. Find element refs in output (e.g., @e1 [button] "New Dataset")
  4. Interact: agent-browser click @e1
  5. Re-snapshot after navigation/DOM changes: agent-browser snapshot -i

Translating to Playwright

agent-browser output Playwright selector
@e1 [button] "Save" page.getByRole("button", { name: "Save" })
@e2 [link] "Datasets" page.getByRole("link", { name: "Datasets" })
@e3 [textbox] "Name" page.getByRole("textbox", { name: "Name" })
@e4 [menuitem] "Edit" page.getByRole("menuitem", { name: "Edit" })
@e5 [tab] "Evaluators 0" page.getByRole("tab", { name: /Evaluators/i })

File Naming

  • Feature tests: {feature-name}.spec.ts
  • Access control: {role}-access.spec.ts
  • Rate limiting: {feature}.rate-limit.spec.ts (runs last)

Common Gotchas

  1. Dialog not closing: Wait for a deterministic post-action signal (e.g., dialog hidden + success row visible)
  2. Multiple elements: Use .first(), .last(), or .nth(n)
  3. Dynamic content: Use regex in name: { name: /pattern/i }
  4. Flaky waits: Prefer waitForURL over waitForTimeout
  5. Menu not appearing: Wait for specific menu state/element visibility

Debugging Flaky Tests

Critical Lessons Learned

  1. Don't assume parallelism is the problem

    • Phoenix tests run with 7 parallel workers without issues
    • The app handles concurrent logins, database operations, and session management properly
    • If tests fail with parallelism, it's usually a test timing issue, not infrastructure
    • Playwright's browser context isolation is robust - each worker gets isolated cookies/sessions
  2. waitForTimeout is almost always wrong

    • page.waitForTimeout() is the #1 cause of flakiness in Phoenix tests
    • Arbitrary timeouts race against rendering and network speed
    • Always replace with state-based waits:
      // ❌ BAD - flaky, races against rendering
      await page.waitForTimeout(500);
      await element.click();
      
      // ✅ GOOD - waits for actual state
      await element.waitFor({ state: "visible" });
      await element.click();
      
  3. Test the actual failure before fixing

    • Run tests with parallelism enabled to see what actually fails
    • Check error messages - they often point to the real issue
    • Don't optimize prematurely (e.g., caching auth state) if it's not the problem
  4. Phoenix test infrastructure is solid

    • In-memory SQLite works fine with parallel tests
    • No need for per-worker databases
    • No need for auth state caching
    • Tests use randomUUID() for data isolation - this works well

Debugging Workflow

When tests are flaky:

  1. Run with parallelism multiple times to catch intermittent failures:

    for i in 1 2 3 4 5; do
      pnpm exec playwright test --project=chromium --reporter=dot
    done
    
  2. Look for waitForTimeout usage - replace with proper waits:

    grep -r "waitForTimeout" app/tests/
    
  3. Check for race conditions in element interactions:

    • Wait for element visibility before interacting
    • Wait for network idle when needed: page.waitForLoadState("networkidle")
    • Use waitForURL after navigation actions
  4. Verify selectors are stable:

    • Avoid CSS selectors that depend on DOM structure
    • Use role/label selectors that match ARIA attributes
    • Test selectors don't break when UI updates
  5. Run with trace on failure to see what happened:

    pnpm exec playwright test --trace on-first-retry
    

Common Flaky Patterns and Fixes

Flaky Pattern Root Cause Fix
Submenu item not found Using getByText() instead of getByRole() Use getByRole("menuitem", { name: /pattern/i }) for submenu items
Menu click fails Menu not fully rendered await menu.waitFor({ state: "visible" }) before click
Dialog assertion fails Dialog animation not complete Assert specific completion signal (hidden dialog + next-state element)
Navigation timeout Page still loading Remove waitForLoadState("networkidle") - it's flaky in CI
Element not found Dynamic content loading Wait for element visibility, not arbitrary timeout
Stale element Re-render between locate and click Store locator, not element handle

Test Stability Best Practices

  1. Use proper waits:

    // Wait for element state
    await element.waitFor({ state: "visible" | "hidden" | "attached" })
    
    // Wait for network
    await page.waitForLoadState("networkidle" | "domcontentloaded" | "load")
    
    // Wait for URL change
    await page.waitForURL("**/expected-path")
    
  2. Use unique test data:

    const uniqueName = `test-${randomUUID()}`;
    
  3. Prefer role selectors - they're less brittle:

    page.getByRole("button", { name: "Save" }) // ✅ Good
    page.locator('button.save-btn') // ❌ Brittle
    
  4. Don't fight animations - wait for them:

    await expect(dialog).not.toBeVisible();
    
  5. Verify URL changes after navigation:

    await page.waitForURL("**/datasets");
    
Weekly Installs
49
GitHub Stars
8.9K
First Seen
Feb 17, 2026
Installed on
opencode49
kilo49
antigravity49
claude-code49
github-copilot49
codex49