software-ux-research
Software UX Research Skill — Quick Reference
Use this skill to identify problems/opportunities and de-risk decisions. Use software-ui-ux-design to implement UI patterns, component changes, and design system updates.
Mar 2026 Baselines (Core)
- Human-centred design: Iterative design + evaluation grounded in evidence (ISO 9241-210:2019) https://www.iso.org/standard/77520.html
- Usability definition: Effectiveness, efficiency, satisfaction in context (ISO 9241-11:2018) https://www.iso.org/standard/63500.html
- Accessibility baseline: WCAG 2.2 is a W3C Recommendation (12 Dec 2024) https://www.w3.org/TR/WCAG22/
- WCAG 3.0 preview: Working Draft published Sep 2025; introduces Bronze/Silver/Gold conformance tiers and enhanced cognitive accessibility; not expected before 2028-2030 https://www.w3.org/WAI/standards-guidelines/wcag/wcag3-intro/
- EU shipping note: European Accessibility Act applies to covered products/services after 28 Jun 2025 (Directive (EU) 2019/882) https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX:32019L0882
When to Use This Skill
- Discovery: user needs, JTBD, opportunity sizing, mental models.
- Validation: concepts, prototypes, onboarding/first-run success.
- Evaluative: usability tests, heuristic evaluation, cognitive walkthroughs.
- Quant/behavioral: funnels, cohorts, instrumentation gaps, guardrails.
- Research Ops: intake, prioritization, repository/taxonomy, consent/PII handling.
- Demographic research: recruitment of age-diverse, cross-cultural, and accessibility-focused participants.
- A/B testing: Experiment design, sample size, analysis, pitfalls.
- Non-technical user research: Digital literacy assessment, simplified-flow validation, low-tech-confidence usability testing.
When NOT to Use This Skill
- UI implementation → Use software-ui-ux-design for components, patterns, code
- Analytics instrumentation → Use marketing-product-analytics for tracking plans and qa-observability for implementation patterns
- Accessibility compliance audit → Use accessibility-specific checklists (WCAG conformance)
- Marketing research → Use marketing-social-media or related marketing skills
- A/B test platform setup → Use experimentation platforms (Statsig, GrowthBook, LaunchDarkly)
Operating Mode (Core)
If inputs are missing, ask for:
- Decision to unblock (what will change based on this research).
- Target roles/segments and top tasks.
- Platforms and contexts (web/mobile/desktop; remote/on-site; assisted tech).
- Existing evidence (analytics, tickets, reviews, recordings, prior studies).
- Constraints (timeline, recruitment access, compliance, budget).
Default outputs (pick what the user asked for):
- Research plan + output contract (prefer ../software-clean-code-standard/assets/checklists/ux-research-plan-template.md; use assets/research-plan-template.md for skill-specific detail)
- Study protocol (tasks/script + success metrics + recruitment plan)
- Findings report (issues + severity + evidence + recommendations + confidence)
- Decision brief (options + tradeoffs + recommendation + measurement plan)
Required Output Sections
Every research output — plans, protocols, evaluations, reports — must include these sections. They represent the skill's core value beyond standard UX knowledge: governance, confidence calibration, and ethical research practice.
- Method Justification: Name the chosen method AND explain why alternatives were rejected. Do not just describe the method; explain why it was selected over at least 2 alternatives given the specific context (stage, timeline, sample, question type).
- Confidence & Triangulation Assessment: Tag every recommendation or finding with a confidence level (a minimal tagging sketch follows this list):

  | Confidence | Evidence requirement | Use for |
  |---|---|---|
  | High | Multiple methods or sources agree | High-impact decisions |
  | Medium | Strong signal from one method + supporting indicators | Prioritization |
  | Low | Single source / small sample | Exploratory hypotheses only |

- Consent & Data Handling: Include a PII/consent section in every plan or protocol. Research that involves participants requires explicit attention to:
  - Minimum PII collection
  - Identity stored separately from study data
  - Name/email redaction before broad sharing
  - Recording access restricted to need-to-know
  - Consent, purpose, retention, and opt-out documented
- Decision Framework: For evaluations and analysis outputs, provide a structured decision table with options, confidence levels, timelines, and risks — not just a single recommendation.
- Pre-Decision Checklist: For experiment evaluations (A/B tests, etc.), include a verification checklist of confounds and data quality checks to complete before any ship/kill decision.
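A minimal sketch of how a findings log could tag confidence consistently with the table above. The field names, the two-method threshold, and the example finding are illustrative assumptions, not a prescribed schema.

```python
# Sketch: derive a confidence tier for a finding from its evidence,
# mirroring the Confidence & Triangulation table above.

def confidence_tier(methods_agreeing: int, has_supporting_indicators: bool) -> str:
    """Map triangulation strength to the High/Medium/Low tiers above."""
    if methods_agreeing >= 2:
        return "High"      # multiple methods or sources agree
    if methods_agreeing == 1 and has_supporting_indicators:
        return "Medium"    # strong single-method signal + supporting indicators
    return "Low"           # single source / small sample

finding = {
    "statement": "Users abandon setup at the integrations step",
    "evidence": ["usability test (n=8)", "funnel drop-off at step 3"],
    "confidence": confidence_tier(methods_agreeing=2, has_supporting_indicators=True),
    "use_for": "High-impact decisions",
}
print(finding["confidence"])  # -> High
```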
Method Chooser (Core)
Decision Tree (Fast)
What do you need?
├─ WHY / needs / context → interviews, contextual inquiry, diary
├─ HOW / usability → moderated usability test, cognitive walkthrough, heuristic eval
├─ WHAT / scale → analytics/logs + targeted qual follow-ups
└─ WHICH / causal → experiments (if feasible) or preference tests
When selecting a method, always justify the choice by explaining why 2+ alternatives were rejected given the user's specific context. This is a key differentiator — generic "we'll do interviews" without justification is insufficient.
Research by Product Stage
Stage Framework (What to Do When)
| Stage | Decisions | Primary Methods | Secondary Methods | Output |
|---|---|---|---|---|
| Discovery | What to build and for whom | Interviews, field/diary, journey mapping | Competitive analysis, feedback mining | Opportunity brief + JTBD + Forces of Progress |
| Concept/MVP | Does the concept work? | Concept test, prototype usability | First-click/tree test | MVP scope + onboarding plan |
| Launch | Is it usable + accessible? | Usability testing, accessibility review | Heuristic eval, session replay | Launch blockers + fixes |
| Growth | What drives adoption/value? | Segmented analytics + qual follow-ups | Churn interviews, surveys | Retention drivers + friction |
| Maturity | What to optimize/deprecate? | Experiments, longitudinal tracking | Unmoderated tests | Incremental roadmap |
Discovery Outputs: Beyond Basic JTBD
Discovery research should produce more than job statements. Include:
- Forces of Progress diagram: Map the four forces acting on switching behavior — Push (current pain), Pull (new solution appeal), Anxiety (fear of change), Habit (inertia). These forces explain why users do or don't adopt, which directly informs positioning and onboarding.
- Pain Point Severity Matrix: Score each pain point by Frequency × Impact × Breadth to prioritize objectively. A pain that affects 3 roles weekly outranks one that affects 1 role monthly, even if the single-role pain feels more dramatic in interviews.
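A minimal scoring sketch for the severity matrix, assuming 1-5 scales for each factor; the example pain points are illustrative.

```python
# Sketch: Frequency x Impact x Breadth severity scoring (1-5 scales assumed).
pain_points = [
    {"pain": "Manual CSV re-upload after sync failure", "frequency": 4, "impact": 3, "breadth": 3},
    {"pain": "Confusing billing proration",             "frequency": 1, "impact": 5, "breadth": 1},
]

for p in pain_points:
    p["severity"] = p["frequency"] * p["impact"] * p["breadth"]

for p in sorted(pain_points, key=lambda p: p["severity"], reverse=True):
    print(f'{p["severity"]:>3}  {p["pain"]}')
# The frequent, multi-role pain (36) outranks the dramatic but rare one (5).
```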
Research for Complex Systems (Workflows, Admin, Regulated)
Complexity Indicators
| Indicator | Example | Research Implication |
|---|---|---|
| Multi-step workflows | Draft → approve → publish | Task analysis + state mapping |
| Multi-role permissions | Admin vs editor vs viewer | Test each role + transitions |
| Data dependencies | Requires integrations/sync | Error-path + recovery testing |
| High stakes | Finance, healthcare | Safety checks + confirmations |
| Expert users | Dev tools, analytics | Recruit real experts (not proxies) |
Evaluation Methods (Core)
- Contextual inquiry: observe real work and constraints.
- Task analysis: map goals → steps → failure points.
- Cognitive walkthrough: evaluate learnability and signifiers.
- Error-path testing: timeouts, offline, partial data, permission loss, retries.
- Multi-role walkthrough: simulate handoffs (creator → reviewer → admin).
Multi-Role Coverage Checklist
- Role-permission matrix documented.
- “No access” UX defined (request path, least-privilege defaults).
- Cross-role handoffs tested (notifications, state changes, audit history).
- Error recovery tested for each role (retry, undo, escalation).
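One way to turn the checklist above into concrete walkthrough cases is to enumerate them from the role-permission matrix. The roles, permissions, and handoff pairs below are illustrative assumptions.

```python
# Sketch: enumerate multi-role walkthrough cases from a role-permission matrix.
from itertools import permutations

matrix = {
    "admin":  {"create", "review", "publish", "manage_users"},
    "editor": {"create", "review"},
    "viewer": set(),
}

# "No access" UX cases: every (role, permission) the role lacks needs a
# defined request path and least-privilege default.
all_permissions = set().union(*matrix.values())
no_access_cases = [(role, perm) for role, perms in matrix.items()
                   for perm in sorted(all_permissions - perms)]

# Cross-role handoff cases: each ordered role pair sharing a workflow
# (notifications, state changes, audit history).
handoff_cases = list(permutations(matrix, 2))

print(len(no_access_cases), "no-access cases;", len(handoff_cases), "handoff cases")
```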
Research Ops & Governance (Core)
Intake (Make Requests Comparable)
Minimum required fields:
- Decision to unblock and deadline.
- Research questions (primary + secondary).
- Target users/segments and recruitment constraints.
- Existing evidence and links.
- Deliverable format + audience.
Prioritization (Simple Scoring)
Use a lightweight score to avoid backlog paralysis:
- Decision impact
- Knowledge gap
- Timing urgency
- Feasibility (recruitment + time)
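A minimal sketch of the lightweight score: each factor rated 1-5 and summed. The equal weighting and example requests are assumptions; adjust weights to your team's norms.

```python
# Sketch: lightweight intake prioritization score.
def priority_score(decision_impact, knowledge_gap, timing_urgency, feasibility):
    """Each factor scored 1-5; higher total = schedule sooner."""
    return decision_impact + knowledge_gap + timing_urgency + feasibility

requests = {
    "Onboarding drop-off study": priority_score(5, 4, 5, 3),
    "Rename settings tab":       priority_score(2, 1, 2, 5),
}
for name, score in sorted(requests.items(), key=lambda kv: kv[1], reverse=True):
    print(score, name)  # 17 Onboarding drop-off study / 10 Rename settings tab
```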
Repository & Taxonomy
- Store each study with: method, date, product area, roles, tasks, key findings, raw evidence links.
- Tag for reuse: problem type (navigation/forms/performance), component/pattern, funnel step.
- Prefer “atomic” findings (one insight per card) to enable recombination [Inference].
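A sketch of what an "atomic" finding card with the storage and tagging fields above might look like. The field names, tag vocabulary, and URL are illustrative assumptions.

```python
# Sketch: one insight per record, tagged for reuse and recombination.
finding_card = {
    "insight": "Bulk-edit is undiscoverable from the table view",
    "method": "moderated usability test",
    "date": "2026-02-14",
    "product_area": "catalog",
    "roles": ["editor"],
    "tasks": ["update 50 SKUs"],
    "evidence_links": ["https://repo.example/clips/123"],  # hypothetical link
    "tags": {
        "problem_type": "navigation",
        "component": "data-table",
        "funnel_step": "activation",
    },
}
```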
Consent, PII, and Access Control
Follow applicable privacy laws; GDPR is a primary reference for EU processing https://eur-lex.europa.eu/eli/reg/2016/679/oj
PII handling checklist:
- Collect minimum PII needed for scheduling and incentives.
- Store identity/contact separately from study data.
- Redact names/emails from transcripts before broad sharing.
- Restrict raw recordings to need-to-know access.
- Document consent, purpose, retention, and opt-out path.
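A minimal redaction sketch for the name/email item in the checklist above. The regex and pseudonym mapping are assumptions; the known participant names come from the separately stored scheduling record, not from the study data itself.

```python
# Sketch: redact emails and known participant names before broad sharing.
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def redact(transcript: str, participant_names: dict[str, str]) -> str:
    text = EMAIL.sub("[email]", transcript)
    for real_name, pseudonym in participant_names.items():
        text = text.replace(real_name, pseudonym)
    return text

raw = "Jane Doe (jane.doe@example.com) said the export button was hidden."
print(redact(raw, {"Jane Doe": "P3"}))
# -> P3 ([email]) said the export button was hidden.
```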
Research Democratization (2026 Trend)
Research democratization is a recurring 2026 trend: non-researchers increasingly run their own studies. Enable this carefully, with the guardrails below.
| Approach | Guardrails | Risk Level |
|---|---|---|
| Templated usability tests | Script + task templates provided | Low |
| Customer interviews by PMs | Training + review required | Medium |
| Survey design by anyone | Central review + standard questions | Medium |
| Unsupervised research | Not recommended | High |
Guardrails for non-researchers:
- Pre-approved research templates only
- Central review of findings before action
- No direct participant recruitment without ops approval
- Mandatory bias awareness training
- Clear escalation path for unexpected findings
Researching Non-Technical User Segments (2026)
Quick checklist for research involving users with low digital literacy or low tech confidence. Full guidance in references/non-technical-user-research.md.
- Assess digital literacy tier (excluded → dependent → hesitant → capable → confident)
- Recruit via offline-first channels (community centers, libraries, phone outreach)
- Use plain-language screening questions (no jargon, no self-rating scales)
- Adapt methods: moderated-only testing, shorter sessions (30-40 min), read tasks aloud
- Measure: unassisted task completion (>=80%), time-to-first-value (<2 min), error recovery rate
- Frame findings as "inclusion improvements," not "dumbing down"
- Cross-reference with simplification audit template
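A minimal sketch of computing the measurement thresholds in the checklist above from session records. The field names, session data, and the 120-second translation of "<2 min" are illustrative assumptions.

```python
# Sketch: unassisted completion, time-to-first-value, and error recovery rate.
sessions = [
    {"completed_unassisted": True,  "seconds_to_first_value": 95,  "errors": 1, "errors_recovered": 1},
    {"completed_unassisted": True,  "seconds_to_first_value": 140, "errors": 0, "errors_recovered": 0},
    {"completed_unassisted": False, "seconds_to_first_value": 300, "errors": 2, "errors_recovered": 1},
]

completion_rate = sum(s["completed_unassisted"] for s in sessions) / len(sessions)
avg_ttfv = sum(s["seconds_to_first_value"] for s in sessions) / len(sessions)
total_errors = sum(s["errors"] for s in sessions)
recovery_rate = (sum(s["errors_recovered"] for s in sessions) / total_errors) if total_errors else 1.0

print(f"unassisted completion {completion_rate:.0%} (target >= 80%)")
print(f"avg time-to-first-value {avg_ttfv:.0f}s (target < 120s)")
print(f"error recovery rate {recovery_rate:.0%}")
```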
Measurement & Decision Quality (Core)
Research ROI Quick Reference
| Research Activity | Proxy Metric | Calculation |
|---|---|---|
| Usability testing finding | Prevented dev rework | Hours saved × $150/hr |
| Discovery interview | Prevented build-wrong-thing | Sprint cost × risk reduction % |
| A/B test conclusive result | Improved conversion | (ΔConversion × Traffic × LTV) - Test cost |
| Heuristic evaluation | Early defect detection | Defects found × Cost-to-fix-later |
Rules of thumb:
- 1 usability finding that prevents 40 hours of rework = $6,000 value
- 1 discovery insight that prevents 1 wasted sprint = $50,000-100,000 value
- Research that improves conversion 0.5% on 100k visitors × $50 LTV = $25,000/month
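The rules of thumb above, written out as explicit arithmetic. The $150/hr, $50 LTV, and 100k-visitor figures are the document's own example inputs; the test cost is an added assumption.

```python
# Sketch: research ROI rules of thumb as arithmetic.
rework_value = 40 * 150                   # hours saved x $150/hr = $6,000
conversion_value = 0.005 * 100_000 * 50   # 0.5% lift x 100k visitors x $50 LTV = $25,000/month

test_cost = 8_000                         # assumed cost of running the study/test
net = conversion_value - test_cost
print(rework_value, conversion_value, net)  # 6000 25000.0 17000.0
```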
When NOT to Run A/B Tests
| Situation | Why it fails | Better method |
|---|---|---|
| Low power/traffic | Inconclusive results | Usability tests + trends |
| Many variables change | Attribution impossible | Prototype tests → staged rollout |
| Need “why” | Experiments don’t explain | Interviews + observation |
| Ethical constraints | Harmful denial | Phased rollout + holdouts |
| Long-term effects | Short tests miss delayed impact | Longitudinal + retention analysis |
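For the "low power/traffic" row above, a rough per-arm sample size check before committing to an experiment. This is the standard two-proportion approximation at two-sided alpha = 0.05 and 80% power; the baseline rate and minimum detectable effect are assumptions.

```python
# Sketch: rough per-arm sample size for a two-proportion A/B test.
def approx_n_per_arm(baseline_rate: float, absolute_mde: float) -> int:
    z_alpha, z_beta = 1.96, 0.8416            # standard normal quantiles
    p_bar = baseline_rate + absolute_mde / 2  # pooled rate approximation
    variance = p_bar * (1 - p_bar)
    n = 2 * (z_alpha + z_beta) ** 2 * variance / absolute_mde ** 2
    return int(round(n))

# Detecting a 0.5 pp lift on a 5% baseline needs roughly 31k users per arm;
# if weekly traffic is far below that, prefer usability tests + trends.
print(approx_n_per_arm(0.05, 0.005))
```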
Common Confounds (Call Out Early)
Always check for these in experiment evaluations. List each relevant confound with its risk level and how to verify — do not just name them:
- Selection bias (only power users respond) — check segment composition.
- Survivorship bias (you miss churned users) — compare with cohort-level data.
- Novelty effect (short-term lift) — plot daily metrics to check for trend decay.
- Instrumentation changes mid-test (metrics drift) — confirm no concurrent deployments.
- Sample ratio mismatch (SRM) — run chi-square on assignment counts (a minimal check is sketched after this list).
- Peeking / multiple looks — confirm test was not checked before pre-set end date.
- Feature interaction — check if other experiments ran concurrently on same surface.
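A minimal SRM check, assuming a 50/50 intended split and using the conventional p < 0.001 flag (chi-square > 10.83 for 1 degree of freedom). The assignment counts are illustrative.

```python
# Sketch: sample ratio mismatch via chi-square goodness-of-fit on assignment counts.
def srm_chi_square(control_n: int, treatment_n: int, expected_split: float = 0.5) -> float:
    total = control_n + treatment_n
    expected_control = total * expected_split
    expected_treatment = total * (1 - expected_split)
    return ((control_n - expected_control) ** 2 / expected_control
            + (treatment_n - expected_treatment) ** 2 / expected_treatment)

chi2 = srm_chi_square(50_841, 49_230)
print(f"chi-square = {chi2:.2f}",
      "-> SRM, investigate" if chi2 > 10.83 else "-> split looks OK")
```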
Optional: AI/Automation Research Considerations
Use only when researching automation/AI-powered features. Skip for traditional software UX.
2026 benchmark: Trend reports consistently highlight AI-assisted analysis. Use AI for speed while keeping humans responsible for strategy and interpretation. Example reference: https://www.lyssna.com/blog/ux-research-trends/
Key Questions
| Dimension | Question | Methods |
|---|---|---|
| Mental model | What do users think the system can/can’t do? | Interviews, concept tests |
| Trust calibration | When do users over/under-rely? | Scenario tests, log review |
| Explanation usefulness | Does “why” help decisions? | A/B explanation variants, interviews |
| Failure recovery | Do users recover and finish tasks? | Failure-path usability tests |
Error Taxonomy (User-Visible)
| Failure type | Typical impact | What to measure |
|---|---|---|
| Wrong output | Rework, lost trust | Verification + override rate |
| Missing output | Manual fallback | Fallback completion rate |
| Unclear output | Confusion | Clarification requests |
| Non-recoverable failure | Blocked flow | Time-to-recovery, support contact |
Optional: AI-Assisted Research Ops (Guardrailed)
- Use automation for transcription/tagging only after PII redaction.
- Maintain an audit trail: every theme links back to raw quotes/clips.
Synthetic Users: When Appropriate (2026)
Trend reports frequently mention synthetic/AI participants. Use with clear boundaries. Example reference: https://www.lyssna.com/blog/ux-research-trends/
| Use Case | Appropriate? | Why |
|---|---|---|
| Early concept brainstorming | Supplement only | Generate edge cases, not validation |
| Scenario/edge case expansion | Yes | Broaden coverage before real testing |
| Moderator training/practice | Yes | Practice without participant burden |
| Hypothesis generation | Yes | Explore directions to test with real users |
| Validation/go-no-go decisions | Never | Cannot substitute lived experience |
| Usability findings as evidence | Never | Real behavior required |
| Quotes in reports | Never | Fabricated quotes damage credibility |
Critical rule: Synthetic outputs are hypotheses, not evidence. Always validate with real users before shipping.
Navigation
Resources
Core Research Methods:
- references/research-frameworks.md — JTBD, Kano, Double Diamond, Service Blueprint, opportunity mapping
- references/ux-audit-framework.md — Heuristic evaluation, cognitive walkthrough, severity rating
- references/usability-testing-guide.md — Task design, facilitation, analysis
- references/ux-metrics-framework.md — Task metrics, SUS/HEART, measurement guidance
- references/customer-journey-mapping.md — Journey mapping and service blueprints
- references/pain-point-extraction.md — Feedback-to-themes method
- references/review-mining-playbook.md — B2B/B2C review mining
Demographic & Quantitative Research:
- references/demographic-research-methods.md — Inclusive research for seniors, children, cultures, disabilities
- references/non-technical-user-research.md — Research methods for non-technical and low-digital-literacy users
- references/ab-testing-implementation.md — A/B testing deep-dive (sample size, analysis, pitfalls)
Competitive UX Analysis & Flow Patterns:
- references/competitive-ux-analysis.md — Step-by-step flow patterns from industry leaders (Wise, Revolut, Shopify, Notion, Linear, Stripe) + benchmarking methodology
Research Operations & Methods:
- references/research-repository-management.md — Repository architecture, taxonomy, atomic research, PII handling, adoption metrics
- references/survey-design-guide.md — Question types, bias prevention, sampling, sample size, distribution, platform comparison
- references/remote-research-patterns.md — Moderated remote, unmoderated testing, async methods, recruitment, tool comparison
Feedback Collection & Analysis:
- references/bigtech-feedback-patterns.md — How top companies collect and act on user feedback
- references/feedback-tools-guide.md — Feedback collection tool setup guides and selection matrix
Evaluative Iteration:
- references/evaluative-research-loop.md — Prototype-parity polishing loop (two-surface audit, drift classification, fast iteration)
Data & Sources:
- data/sources.json — Curated external references
Domain-Specific UX Benchmarking
IMPORTANT: When designing UX flows for a specific domain, you MUST use WebSearch to find and suggest best-practice patterns from industry leaders.
Trigger Conditions
- "We're designing [flow type] for [domain]"
- "What's the best UX for [feature] in [industry]?"
- "How do [Company A, Company B] handle [flow]?"
- "Benchmark our [feature] against competitors"
- Any UX design task with identifiable domain context
Domain → Leader Lookup Table
| Domain | Industry Leaders to Check | Key Flows |
|---|---|---|
| Fintech/Banking | Wise, Revolut, Monzo, N26, Chime, Mercury | Onboarding/KYC, money transfer, card management, spend analytics |
| E-commerce | Shopify, Amazon, Stripe Checkout | Checkout, cart, product pages, returns |
| SaaS/B2B | Linear, Notion, Figma, Slack, Airtable | Onboarding, settings, collaboration, permissions |
| Developer Tools | Stripe, Vercel, GitHub, Supabase | Docs, API explorer, dashboard, CLI |
| Consumer Apps | Spotify, Airbnb, Uber, Instagram | Discovery, booking, feed, social |
| Healthcare | Oscar, One Medical, Calm, Headspace | Appointment booking, records, compliance flows |
| EdTech | Duolingo, Coursera, Khan Academy | Onboarding, progress, gamification |
Required Searches
When user specifies a domain, execute:
- Search: "[domain] UX best practices 2026"
- Search: "[leader company] [flow type] UX"
- Search: "[leader company] app review UX" site:mobbin.com OR site:pageflows.com
- Search: "[domain] onboarding flow examples"
What to Report
After searching, provide:
- Pattern examples: Screenshots/flows from 2-3 industry leaders
- Key patterns identified: What they do well (with specifics)
- Applicable to your flow: How to adapt patterns
- Differentiation opportunity: Where you could improve on leaders
Example Output Format
DOMAIN: Fintech (Money Transfer)
BENCHMARKED: Wise, Revolut
WISE PATTERNS:
- Upfront fee transparency (shows exact fee before recipient input)
- Mid-transfer rate lock (shows countdown timer)
- Delivery time estimate per payment method
- Recipient validation (bank account check before send)
REVOLUT PATTERNS:
- Instant send to Revolut users (P2P first)
- Currency conversion preview with rate comparison
- Scheduled/recurring transfers prominent
APPLY TO YOUR FLOW:
1. Add fee transparency at step 1 (not step 3)
2. Show delivery estimate per payment rail
3. Consider rate lock feature for FX transfers
DIFFERENTIATION OPPORTUNITY:
- Neither shows a historical rate chart — add "is now a good time?" context
Trend Awareness Protocol
IMPORTANT: When users ask recommendation questions about UX research, you MUST use WebSearch to check current trends before answering.
Tool/Trend Triggers
- "What's the best UX research tool for [use case]?"
- "What should I use for [usability testing/surveys/analytics]?"
- "What's the latest in UX research?"
- "Current best practices for [user interviews/A/B testing/accessibility]?"
- "Is [research method] still relevant in 2026?"
- "What research tools should I use?"
- "Best approach for [remote research/unmoderated testing]?"
Tool/Trend Searches
- Search: "UX research trends 2026"
- Search: "UX research tools best practices 2026"
- Search: "[Maze/Hotjar/UserTesting] comparison 2026"
- Search: "AI in UX research 2026"
Tool/Trend Report Format
After searching, provide:
- Current landscape: What research methods/tools are popular NOW
- Emerging trends: New techniques or tools gaining traction
- Deprecated/declining: Methods that are losing effectiveness
- Recommendation: Based on fresh data and current practices
Example Topics (verify with fresh search)
- AI-powered research tools (Maze AI, Looppanel)
- Unmoderated testing platforms evolution
- Voice of Customer (VoC) platforms
- Analytics and behavioral tools (Hotjar, FullStory)
- Accessibility testing tools and standards
- Research repository and insight management
Templates
- Shared plan template: ../software-clean-code-standard/assets/checklists/ux-research-plan-template.md — Product-agnostic research plan template (core + optional AI)
- assets/research-plan-template.md — UX research plan template
- assets/testing/usability-test-plan.md — Usability test plan
- assets/testing/usability-testing-checklist.md — Usability testing checklist
- assets/audits/heuristic-evaluation-template.md — Heuristic evaluation
- assets/audits/ux-audit-report-template.md — Audit report
Evaluative Research Loop
For prototype-parity polishing (fast iteration when product is "almost ideal"), see references/evaluative-research-loop.md. Covers: two-surface audit, drift classification (layout/density/control/content/state), friction-based prioritization, banner/loading guardrails, localization-readiness checks, and fast iteration cadence.
Fact-Checking
- Use web search/web fetch to verify current external facts, versions, pricing, deadlines, regulations, or platform behavior before final answers.
- Prefer primary sources; report source links and dates for volatile information.
- If web access is unavailable, state the limitation and mark guidance as unverified.