qa-dashboard

Installation
SKILL.md

Discovery Questions

  1. What tool do you use for test reporting today? Console output, JUnit XML, HTML reports, or a dedicated platform? Identify the starting point.
  2. Who will look at the dashboard? Developers need failure details and traces. QA leads need trends and flakiness. Leadership needs release confidence and defect rates. Each audience needs a different view.
  3. What decisions should the dashboard drive? "Is this build safe to release?" "Which tests need fixing?" "Is quality improving sprint over sprint?" Dashboards without a decision context become shelfware.
  4. Where do test results live? GitHub Actions artifacts, S3, a database? The storage location determines which dashboard tool is practical.
  5. What CI platform? GitHub Actions, GitLab CI, Jenkins? Each has different artifact and reporting integrations.
  6. Check .agents/qa-project-context.md first. Respect existing reporting conventions and infrastructure.

Core Principles

1. Dashboards answer questions, they do not display numbers. Every panel must map to a question someone actually asks. "What is the flakiness rate?" is a question. "Total test count" is trivia.

2. Different audiences need different views. A developer debugging a CI failure needs stack traces, screenshots, and traces. A VP needs a single number: "Are we ready to release?" Do not force both through the same dashboard.

3. Real-time for CI, trends for leadership. CI dashboards update on every pipeline run. Leadership dashboards aggregate weekly or per-sprint. Mixing cadences confuses both audiences.

4. Drill-down to action. Every red indicator must link to the specific failing test, the specific flaky test, or the specific coverage gap. A dashboard that shows "5 failures" but does not link to the failures is a notification, not a tool.

5. Automate report generation. Reports that require manual effort (running scripts, copying data, formatting slides) will not survive the first busy sprint. Generate reports from CI pipelines automatically.


Allure Report

Allure generates rich HTML reports from test results with history, categories, and retries built in. It works with Playwright, Jest, Vitest, pytest, and most test frameworks.

Allure with Playwright

npm i -D allure-playwright
// playwright.config.ts
import { defineConfig } from "@playwright/test";

export default defineConfig({
  reporter: [
    ["list"],
    ["allure-playwright", {
      outputFolder: "allure-results",
      detail: true,
      suiteTitle: true,
      environmentInfo: {
        Browser: "Chromium",
        Environment: process.env.TEST_ENV ?? "local",
        BaseURL: process.env.BASE_URL ?? "http://localhost:3000",
      },
    }],
  ],
});

Adding metadata to tests:

import { test, expect } from "@playwright/test";
import { allure } from "allure-playwright";

test.describe("Checkout Flow", () => {
  test("should complete purchase with valid card", async ({ page }) => {
    await allure.severity("critical");
    await allure.feature("Checkout");
    await allure.story("Payment Processing");
    await allure.tag("smoke");

    // Attach custom data to report
    await allure.attachment("Test Config", JSON.stringify({
      paymentProvider: "stripe-test",
      currency: "USD",
    }), "application/json");

    await page.goto("/checkout");
    await page.fill('[data-testid="card-number"]', "4242424242424242");
    await page.fill('[data-testid="card-expiry"]', "12/28");
    await page.fill('[data-testid="card-cvc"]', "123");
    await page.click('[data-testid="pay-button"]');

    await expect(page.locator('[data-testid="confirmation"]')).toBeVisible();
  });
});

Allure with Jest/Vitest

# Jest
npm i -D jest-allure2-reporter allure-jest

# Vitest
npm i -D allure-vitest

Vitest configuration:

// vitest.config.ts
import { defineConfig } from "vitest/config";

export default defineConfig({
  test: {
    reporters: [
      "default",
      ["allure-vitest/reporter", {
        resultsDir: "allure-results",
        environmentInfo: {
          Node: process.version,
          OS: process.platform,
        },
      }],
    ],
    setupFiles: ["allure-vitest/setup"],
  },
});

Generating Allure Reports

# Install Allure CLI
brew install allure  # macOS
# or: npm i -D allure-commandline

# Generate HTML report from results
npx allure generate allure-results --clean -o allure-report

# Open report in browser
npx allure open allure-report

# Serve report (for CI artifact viewing)
npx allure serve allure-results

History and Trends

Allure tracks test history across runs when you preserve the allure-report/history directory.

# GitHub Actions: preserve Allure history across runs
- name: Download previous Allure history
  uses: actions/download-artifact@v4
  with:
    name: allure-history
    path: allure-history
  continue-on-error: true  # First run has no history

- name: Run tests
  run: npx playwright test

- name: Copy history to results
  run: |
    mkdir -p allure-results/history
    cp -r allure-history/history/* allure-results/history/ 2>/dev/null || true

- name: Generate Allure report
  run: npx allure generate allure-results --clean -o allure-report

- name: Upload Allure report
  uses: actions/upload-artifact@v4
  with:
    name: allure-report
    path: allure-report/
    retention-days: 30

- name: Upload Allure history
  uses: actions/upload-artifact@v4
  with:
    name: allure-history
    path: allure-report/history/
    retention-days: 90

Custom Categories

Define categories to group failures by type instead of showing a flat list.

// allure-results/categories.json
[
  {
    "name": "Product Bugs",
    "matchedStatuses": ["failed"],
    "messageRegex": ".*Expected.*but received.*"
  },
  {
    "name": "Test Infrastructure",
    "matchedStatuses": ["broken"],
    "messageRegex": ".*(ECONNREFUSED|timeout|navigation).*"
  },
  {
    "name": "Flaky Tests",
    "matchedStatuses": ["failed"],
    "messageRegex": ".*(intermittent|race condition|retry).*"
  },
  {
    "name": "Missing Test Data",
    "matchedStatuses": ["broken"],
    "messageRegex": ".*(seed|fixture|not found in database).*"
  }
]

Grafana Dashboards

Grafana provides real-time dashboards with alerting. Best for teams that already use Grafana for infrastructure monitoring and want to add test metrics alongside production metrics.

Data Pipeline: CI to Grafana

CI Pipeline Run
  ├── Test results (JUnit XML)
  ├── Coverage report (JSON)
  └── Timing data (JSON)
  Parser script (post-test CI step)
  Time-series DB (InfluxDB / Prometheus pushgateway)
  Grafana queries + panels

Pushing Test Metrics to InfluxDB

// scripts/push-test-metrics.ts
import { InfluxDB, Point } from "@influxdata/influxdb-client";

interface TestResult {
  name: string;
  suite: string;
  status: "passed" | "failed" | "skipped";
  duration: number;
  retries: number;
}

async function pushMetrics(results: TestResult[], runId: string, branch: string) {
  const client = new InfluxDB({
    url: process.env.INFLUXDB_URL!,
    token: process.env.INFLUXDB_TOKEN!,
  });

  const writeApi = client.getWriteApi("qa", "test-results", "ms");

  for (const result of results) {
    const point = new Point("test_execution")
      .tag("suite", result.suite)
      .tag("test_name", result.name)
      .tag("status", result.status)
      .tag("branch", branch)
      .tag("run_id", runId)
      .floatField("duration_ms", result.duration)
      .intField("retries", result.retries)
      .intField("passed", result.status === "passed" ? 1 : 0)
      .intField("failed", result.status === "failed" ? 1 : 0);

    writeApi.writePoint(point);
  }

  // Push summary metrics
  const total = results.length;
  const passed = results.filter((r) => r.status === "passed").length;
  const failed = results.filter((r) => r.status === "failed").length;
  const avgDuration = results.reduce((sum, r) => sum + r.duration, 0) / total;

  const summary = new Point("test_run_summary")
    .tag("branch", branch)
    .tag("run_id", runId)
    .intField("total", total)
    .intField("passed", passed)
    .intField("failed", failed)
    .floatField("pass_rate", (passed / total) * 100)
    .floatField("avg_duration_ms", avgDuration);

  writeApi.writePoint(summary);
  await writeApi.close();
}
# GitHub Actions: push metrics after tests
- name: Push test metrics to Grafana
  if: always()
  env:
    INFLUXDB_URL: ${{ secrets.INFLUXDB_URL }}
    INFLUXDB_TOKEN: ${{ secrets.INFLUXDB_TOKEN }}
  run: |
    npx tsx scripts/push-test-metrics.ts \
      --results-file test-results/results.json \
      --run-id "${{ github.run_id }}" \
      --branch "${{ github.head_ref || github.ref_name }}"

Recommended Grafana Panels

Panel 1: Pass Rate Trend (time series)

Query: SELECT mean("pass_rate") FROM "test_run_summary"
       WHERE "branch" = 'main'
       GROUP BY time(1d)
Visualization: Time series, thresholds at 95% (yellow) and 99% (green)

Panel 2: Flakiness Top 10 (table)

Query: SELECT "test_name", count("retries") as retry_count
       FROM "test_execution"
       WHERE "retries" > 0 AND time > now() - 14d
       GROUP BY "test_name"
       ORDER BY retry_count DESC
       LIMIT 10
Visualization: Table with sortable columns

Additional panels: Test Duration Distribution (histogram, buckets at 1s/5s/10s/30s/60s), Coverage Trend (line + branch coverage over time), CI Duration Trend (with target line at 600s).

Grafana Alerting

Set up alerts for: pass rate below 95% (warning, 10m window), CI duration above 15 min (warning), and coverage drop of more than 2% in a week (info). Route to the QA team's Slack channel.


ReportPortal

ReportPortal is a self-hosted test reporting platform with AI-powered failure analysis, test result aggregation across frameworks, and real-time dashboards.

Setup with Docker Compose

# ReportPortal provides an official docker-compose
curl -LO https://raw.githubusercontent.com/reportportal/reportportal/master/docker-compose.yml
docker compose up -d
# Access at http://localhost:8080 (default: superadmin/erebus)

Integration with Playwright

npm i -D @reportportal/agent-js-playwright
// playwright.config.ts
import { defineConfig } from "@playwright/test";

export default defineConfig({
  reporter: [
    ["list"],
    ["@reportportal/agent-js-playwright", {
      apiKey: process.env.RP_API_KEY,
      endpoint: process.env.RP_ENDPOINT ?? "http://localhost:8080/api/v1",
      project: "my-project",
      launch: `E2E Tests - ${process.env.CI ? "CI" : "local"}`,
      attributes: [
        { key: "branch", value: process.env.GITHUB_HEAD_REF ?? "local" },
        { key: "build", value: process.env.GITHUB_RUN_ID ?? "dev" },
      ],
      description: `Playwright E2E suite run on ${new Date().toISOString()}`,
    }],
  ],
});

ReportPortal Features

Feature What It Does
Auto-analysis ML-based failure classification: product bug, test bug, system issue, or to investigate
Defect type mapping Custom defect categories with sub-types for your project
Flaky test detection Identifies tests that flip between pass/fail across launches
Merge launches Combine results from sharded CI runs into one unified view
Quality gates Define pass/fail criteria for launches (max failures, min pass rate)
Comparison Side-by-side comparison of two launches to spot regressions

ReportPortal quality gates can be queried via API after test completion (GET /api/v1/$PROJECT/launch/$LAUNCH_ID/quality-gate) and used as a CI gate -- fail the pipeline if the gate status is not PASSED.


Stakeholder Reports

Weekly QA Summary -- Automate via scheduled CI job. Include: pass rate + trend, new vs fixed failures, top 5 flaky tests, coverage delta, avg CI duration. Classify health: STABLE (>= 98%), NEEDS ATTENTION (>= 95%), CRITICAL (< 95%). Post to Slack automatically.

Release Quality Report -- Generate before each release. Gate on: E2E pass rate >= 99%, unit pass rate 100%, branch coverage >= 80%, zero critical bugs, major bugs <= 2, performance budget (LCP < 2500ms, FID < 100ms, CLS < 0.1). Output a READY/NOT READY verdict with per-gate pass/fail breakdown.


Recommended Dashboard Panels

A practical set of panels that cover the most common questions teams ask.

Panel Question It Answers Data Source Audience
Pass/Fail Trend Is quality improving or degrading? CI test results over time Everyone
Flakiness Top 10 Which tests waste the most time? Tests with retries in last 14 days Developers, QA
Coverage Heatmap Where are we blind? Coverage by module/directory Developers
Defect Escape Trend Are bugs reaching production? Production incidents tagged as test escapes QA leads, Leadership
CI Duration Is the pipeline getting slower? Pipeline duration over time DevOps, Developers
Test Velocity Are we writing tests proportional to features? New tests added per sprint QA leads
Failure Categories Are failures product bugs or test infra? Categorized failure reasons QA leads
Release Readiness Can we ship? Composite score from all gates Leadership

Anti-Patterns

Dashboard with 30 panels. No one reads a dashboard with 30 panels. Start with 5-6 panels that answer the most urgent questions. Add panels only when someone asks a question the dashboard cannot answer.

Metrics without context. "Pass rate: 97%" means nothing without "target: 99%" and "last week: 98.5%." Every metric needs a target and a trend to be actionable.

Manual report generation. If generating the weekly QA summary requires someone to SSH into a server, run queries, and paste into a slide deck, it will stop happening by week 3. Automate everything into the CI pipeline.

Same dashboard for developers and leadership. Developers need failure details, stack traces, and reproduction steps. Leadership needs a single traffic light: green/yellow/red. Build separate views.

Reporting test counts as progress. "We added 200 tests this sprint" says nothing about quality. Report coverage of critical paths, defect escape rate, and mean time to detect regressions instead.

No alerting on regressions. A dashboard that no one checks is useless. Set up Grafana alerts or Slack notifications for pass rate drops, coverage decreases, and CI duration increases. Dashboards are for investigation; alerts are for detection.

Allure without history. A single Allure report is a snapshot. Without history, you cannot see trends, identify intermittent failures, or measure improvement. Always preserve the history/ directory across CI runs.


Done When

  • Dashboard is deployed and accessible to the full team without requiring local setup or manual report generation.
  • Test execution trends (pass rate, failure count, duration) are visible over at least 2 weeks of historical data.
  • Flakiness trend chart is configured showing the top flaky tests with retry counts over a rolling 14-day window.
  • Stakeholder-facing report template is created and generating automatically (weekly summary or per-release quality report with a clear READY/NOT READY verdict).
  • Alert is configured to notify the team (via Slack or equivalent) when the main branch pass rate drops by more than 2 percentage points in a single day.

Related Skills

  • qa-metrics -- Defining quality KPIs, measurement frameworks, and metric interpretation.
  • ci-cd-integration -- Pipeline configuration for automated report generation and artifact management.
  • ai-bug-triage -- AI-powered failure classification that feeds into dashboard categories.
Weekly Installs
11
GitHub Stars
4
First Seen
Apr 1, 2026
Installed on
amp10
cline10
opencode10
cursor10
kimi-cli10
warp10