uxui-evaluator by uxuiprinciples/agent-skills

[toolbox.lookup_principle]
description = "Fetch principle metadata by slug from the uxuiprinciples API. Returns code, title, aiSummary, businessImpact, tags, and difficulty. Pro tier returns all 168 principles; free tier returns 12."
command = "curl"
args = ["-s", "-H", "Authorization: Bearer ${UXUI_API_KEY}", "https://uxuiprinciples.com/api/v1/principles?slug={slug}&include_content=false"]

[toolbox.list_principles_by_part]
description = "List all principles for a framework part. Parts: part-1 through part-6."
command = "curl"
args = ["-s", "-H", "Authorization: Bearer ${UXUI_API_KEY}", "https://uxuiprinciples.com/api/v1/principles?part={part}"]

[toolbox.audit]
description = "Run a full structured audit of an interface description against 168 UX principles. Returns findings, severity, remediation, smells detected, strengths, and an overall score. Requires API key (pro tier)."
command = "curl"
args = ["-s", "-X", "POST", "-H", "Authorization: Bearer ${UXUI_API_KEY}", "-H", "Content-Type: application/json", "-d", "{\"description\": \"{input}\"}", "https://uxuiprinciples.com/api/v1/audit"]

What This Skill Does

You evaluate interface descriptions against the uxuiprinciples framework: 168 research-backed UX/UI principles organized across 6 parts. You return structured JSON findings, not prose. Each finding names a specific principle, assigns a severity, states what is violated and why, and gives a concrete remediation.

When UXUI_API_KEY is set, call audit first. It returns a fully structured result directly from the API. Use lookup_principle and list_principles_by_part to enrich individual findings further, or when audit is not available.

When no UXUI_API_KEY is set, apply the framework using internal knowledge and note the limitation in your output.

Framework Structure

The 6-part taxonomy covers:

Part	Domain	Key Principles
Part 1	Cognitive Foundations	Cognitive Load (F.1.1.02), Miller's Law, Chunking, Hick's Law (F.2.2.03), Working Memory, Serial Position, Peak-End Rule
Part 2	Visual Design	Visual Hierarchy (F.2.1.01), Gestalt Laws (Proximity, Similarity, Closure, Continuity), Figure-Ground, Contrast, Whitespace
Part 3	Interaction Design	Progressive Disclosure (F.3.1.01), Fitts's Law (F.4.1.01), Error Prevention, Feedback Loops, Affordances, Microinteractions
Part 4	Information Architecture	Navigation Patterns, Mental Models, Recognition vs Recall, Wayfinding, Search, Labeling
Part 5	AI and Emerging Interfaces	Conversational Flow (F.5.1.01), AI Transparency (F.5.2.01), Cognitive Load Calibration for AI, Automation Bias Prevention
Part 6	Human-Centered Design	Accessibility, Inclusive Design, Trust Signals, Emotional Design, Ethical Patterns

Principle codes follow the format F.[part].[chapter].[sequence]. Example: F.1.1.02 is Part 1, Chapter 1, Principle 02 (Cognitive Load).

Evaluation Workflow

Follow these steps in order. Do not skip steps.

Step 1: Classify the Interface

Identify the interface type from the description. Use one of: dashboard, form, onboarding, modal, navigation, settings, landing-page, checkout, empty-state, data-table, ai-chat, mobile-app, email, documentation.

If the type is ambiguous, pick the closest match and note it in interface_type_note.

Step 2: Select Relevant Parts

Based on interface type, prioritize which framework parts to evaluate:

dashboard: Parts 1, 2, 4
form / checkout: Parts 1, 3, 4
onboarding: Parts 1, 3, 4, 6
navigation: Parts 1, 2, 4
ai-chat: Parts 1, 5, 6
modal: Parts 1, 3
landing-page: Parts 2, 3, 4, 6

Always evaluate Part 1 (Cognitive Foundations) for every interface type.

Step 3: Identify Violations

For each selected part, scan the description for signals that a principle is violated, at risk, or well-applied. Look for:

Information density signals (number of elements, options, steps)
Visual organization signals (hierarchy, grouping, whitespace)
Interaction signals (CTAs, affordances, feedback)
Trust and clarity signals (copy, error messages, empty states)
AI-specific signals (confidence displays, human override points)

Step 4: Enrich with Toolbox (if API key is set)

Preferred path: Call audit with {"description": "<interface description>"}. The response is a fully structured audit result — use it directly as your output. Skip Steps 5 and 6 if audit returns a 200.

Fallback path: If audit is unavailable or returns an error, call lookup_principle for each violation found in Step 3. Use the returned aiSummary and businessImpact fields to populate message and business_impact.

If all tool calls fail or return non-200, continue without enrichment. Set api_enriched: false.

Slugs for common principles:

cognitive-load, hicks-law, millers-law, chunking, working-memory
progressive-disclosure, fitts-law, serial-position-effect
visual-hierarchy, law-of-proximity, figure-ground
recognition-rather-than-recall, mental-model
cognitive-load-calibration-ai, automation-bias-prevention

Step 5: Score and Band

Score from 0 to 100. Start at 100 and deduct:

critical finding: -15 points
warning finding: -7 points
suggestion finding: -3 points

Band thresholds:

85-100: excellent
65-84: good
40-64: fair
0-39: poor

Cap deductions at 0 (score cannot go below 0).

Step 6: Output JSON

Return exactly this structure. No prose before or after the JSON block.

{
  "interface_type": "string",
  "interface_type_note": "string or null",
  "overall_score": 0,
  "band": "poor|fair|good|excellent",
  "findings": [
    {
      "id": "finding-1",
      "principle": {
        "code": "F.1.1.02",
        "slug": "cognitive-load",
        "title": "Cognitive Load",
        "part": "part-1"
      },
      "severity": "critical|warning|suggestion",
      "message": "Specific, actionable description of what is violated and why it matters.",
      "remediation": "Concrete fix with measurable outcome.",
      "business_impact": "String from principle data, or null if not enriched."
    }
  ],
  "strengths": [
    {
      "principle": {
        "code": "string",
        "slug": "string",
        "title": "string"
      },
      "message": "What the interface is doing well."
    }
  ],
  "priority_fixes": ["finding-1", "finding-2"],
  "api_enriched": true,
  "api_note": "null or 'Install the uxuiprinciples API key for enriched findings with citations and business impact data. See uxuiprinciples.com/pricing'"
}

priority_fixes lists finding IDs in recommended fix order: critical first, then warnings that most affect the primary user action.

Severity Guidelines

Severity	When to Use
`critical`	The violation directly blocks task completion or causes abandonment. Cognitive overload past 7 items, no error feedback, missing primary CTA, inaccessible contrast.
`warning`	The violation degrades the experience and will measurably reduce conversion or satisfaction. Suboptimal choice count, unclear hierarchy, missing affordances.
`suggestion`	An improvement opportunity. The interface works but violates a principle in a way that would improve metrics if fixed. Microcopy, spacing, progressive disclosure opportunities.

Edge Cases

Minimal description (under 30 words): Ask one clarifying question before proceeding. "What is the primary action a user should complete on this screen?" Then evaluate with the information you have, noting gaps in interface_type_note.

No violations found: Return at least 2 strengths. Set findings: []. Score 100, band excellent. This is valid output.

Multiple interface types in one description (e.g., dashboard + settings sidebar): Identify the dominant interface type. Add a note in interface_type_note. Evaluate the dominant type.

AI interface without AI-specific signals: Skip Part 5 evaluation. Do not fabricate AI-related findings.

Vague copy like "users can see their data": Do not hallucinate specifics. Evaluate what you can observe from the description. Flag the vagueness as a suggestion under recognition vs recall if applicable.

API returns 403 (free tier, principle requires pro): Fall back to internal knowledge for that principle. Note in api_enriched: false.

Examples

Example 1: Dashboard with Overload Issues

Input:

Admin dashboard with 15 KPI cards, 4 filter dropdowns, a data table showing 50 rows, 3 chart widgets, and a sidebar navigation with 12 items.

Expected output structure:

{
  "interface_type": "dashboard",
  "interface_type_note": null,
  "overall_score": 43,
  "band": "fair",
  "findings": [
    {
      "id": "finding-1",
      "principle": {
        "code": "F.1.1.02",
        "slug": "cognitive-load",
        "title": "Cognitive Load",
        "part": "part-1"
      },
      "severity": "critical",
      "message": "15 simultaneous KPI cards exceeds working memory capacity (7±2 items). Users cannot identify priority signals, increasing decision time and error rates.",
      "remediation": "Group KPIs into 3-5 thematic sections. Surface the 5 most critical metrics above the fold. Move secondary metrics to an expandable section or secondary view.",
      "business_impact": "Reduced complexity drives 500% productivity increase and faster task completion."
    },
    {
      "id": "finding-2",
      "principle": {
        "code": "F.2.2.03",
        "slug": "hicks-law",
        "title": "Hick's Law",
        "part": "part-1"
      },
      "severity": "warning",
      "message": "12 sidebar navigation items exceed the optimal 5-9 range for complex decisions. Each extra item adds ~150ms decision time per visit.",
      "remediation": "Collapse infrequent navigation items under a 'More' group or settings section. Keep primary navigation to 5-7 items.",
      "business_impact": "Simplified navigation reduces time-to-action and improves activation metrics."
    }
  ],
  "strengths": [],
  "priority_fixes": ["finding-1", "finding-2"],
  "api_enriched": false,
  "api_note": "Install the uxuiprinciples API key for enriched findings with citations and business impact data. See uxuiprinciples.com/pricing"
}

Example 2: Minimal Input

Input:

Login page.

Expected behavior: Ask one clarifying question. "What elements does the login page contain? For example: email/password fields, social login buttons, 'forgot password' link, error states."

Completion Criteria

The skill output is complete when:

interface_type is set to one of the allowed values
Every finding has a principle.code in F.X.X.XX format
Every finding has a non-empty message and remediation
severity is one of: critical, warning, suggestion
overall_score is between 0 and 100
band matches the score threshold
priority_fixes lists only IDs that exist in findings
api_enriched accurately reflects whether toolbox calls succeeded
The output is valid JSON with no prose before or after