Usability Frameworks
Comprehensive frameworks and methodologies for planning, conducting, and analyzing usability tests to improve user experience.
When to Use This Skill
Auto-loaded by agents:
- research-ops: For usability testing and heuristic evaluation
Use when you need:
- Planning usability tests
- Conducting user testing sessions
- Evaluating interface designs
- Identifying usability problems
- Testing prototypes or live products
- Applying Nielsen's heuristics
- Measuring usability metrics
Core Concepts
What is Usability Testing?
Usability testing is a method for evaluating a product by testing it with representative users. Users attempt to complete typical tasks while observers watch, listen, and take notes.
Purpose: Identify usability problems, discover opportunities for improvement, and learn about user behavior and preferences.
When to use:
- Before development (testing prototypes)
- During development (iterative testing)
- After launch (validation and optimization)
- Before major redesigns
The Five Usability Quality Components (Jakob Nielsen)
- Learnability: How easy is it for users to accomplish basic tasks the first time?
- Efficiency: How quickly can users perform tasks once they've learned the design?
- Memorability: Can users remember how to use it after time away?
- Errors: How many errors do users make, how severe, and how easily can they recover?
- Satisfaction: How pleasant is it to use the design?
Usability Testing Methodologies
1. Moderated Testing
Setup: Researcher guides participants through tasks in real-time
Location: In-person or remote (video call)
Best for:
- Early-stage prototypes needing clarification
- Complex products requiring guidance
- Exploring "why" behind user behavior
- Uncovering emotional reactions
Process:
- Welcome and set expectations
- Pre-task questions (background, experience)
- Task scenarios with think-aloud protocol
- Post-task questions and discussion
- Wrap-up and thank you
Advantages:
- Rich qualitative insights
- Can probe deeper into issues
- Observe non-verbal cues
- Clarify misunderstandings immediately
Limitations:
- More time-intensive (30-60 min per session)
- Researcher bias possible
- Smaller sample sizes
- Scheduling logistics
2. Unmoderated Testing
Setup: Participants complete tasks independently, recorded for later review
Location: Remote, on participant's own schedule
Best for:
- Mature products with clear tasks
- Large sample sizes needed
- Quick turnaround required
- Benchmarking and metrics
Process:
- Automated instructions and consent
- Participants record screen/audio while completing tasks
- Automated post-task surveys
- Researcher reviews recordings later
Advantages:
- Faster data collection
- Larger sample sizes
- More natural environment
- Lower cost per participant
Limitations:
- Can't probe or clarify
- May miss nuanced insights
- Technical issues harder to resolve
- Participants may skip think-aloud
3. Hybrid Approaches
Combination methods:
- Moderated first impressions + unmoderated task completion
- Unmoderated testing + follow-up interviews with interesting cases
- Moderated pilot + unmoderated scale testing
Nielsen's 10 Usability Heuristics
Quick reference for evaluating interfaces. See references/nielsens-10-heuristics.md for detailed explanations and examples.
- Visibility of system status - Keep users informed
- Match between system and real world - Speak users' language
- User control and freedom - Provide escape hatches
- Consistency and standards - Follow platform conventions
- Error prevention - Prevent problems before they occur
- Recognition rather than recall - Minimize memory load
- Flexibility and efficiency of use - Accelerators for experts
- Aesthetic and minimalist design - Remove irrelevant information
- Help users recognize, diagnose, and recover from errors - Plain language error messages
- Help and documentation - Provide when needed
Think-Aloud Protocol
What It Is
Participants verbalize their thoughts while completing tasks, providing real-time insight into their mental model.
Types
Concurrent think-aloud: Speak while performing tasks
- More natural thought flow
- May affect task performance slightly
Retrospective think-aloud: Review recording and explain thinking after
- Doesn't disrupt natural behavior
- May forget or rationalize thoughts
Facilitating Think-Aloud
Prompts to use:
- "What are you thinking right now?"
- "What are you looking for?"
- "What would you expect to happen?"
- "Is this what you expected?"
Don't:
- Ask leading questions
- Provide hints or solutions
- Interrupt natural flow too often
- Make participants feel tested
See references/think-aloud-protocol-guide.md for detailed facilitation techniques.
Task Scenario Design
Good task scenarios are critical to meaningful usability test results.
Characteristics of Good Task Scenarios
- Realistic: Based on actual user goals
- Specific: Clear endpoint/success criteria
- Self-contained: Provide all necessary context
- Actionable: Clear starting point
- Not prescriptive: Don't tell them how to do it
Example Transformation
Poor: "Click on the 'My Account' link and change your password"
- Too prescriptive, tells them exactly where to click
Good: "You've heard about recent security breaches and want to make your account more secure. Update your account to use a stronger password."
- Realistic motivation, clear goal, doesn't prescribe path
Task Complexity Levels
Simple tasks (1-2 steps): Establish baseline usability
Medium tasks (3-5 steps): Test core workflows
Complex tasks (6+ steps): Evaluate overall experience and error recovery
See assets/task-scenario-template.md for ready-to-use templates.
Severity Rating Framework
Not all usability issues are equal. Prioritize fixes based on severity.
Three-Factor Severity Rating
Frequency: How often does this issue occur?
- High: > 50% of users encounter
- Medium: 10-50% encounter
- Low: < 10% encounter
Impact: When it occurs, how badly does it affect users?
- High: Prevents task completion / causes data loss
- Medium: Causes frustration or delays
- Low: Minor annoyance
Persistence: Do users overcome it with experience?
- High: Problem doesn't go away
- Medium: Users learn to avoid/work around
- Low: One-time problem only
Combined Severity Ratings
Critical (P0): High frequency + High impact
Serious (P1): High frequency + Medium impact, OR Medium frequency + High impact
Moderate (P2): High frequency + Low impact, OR Medium frequency + Medium impact, OR Low frequency + High impact
Minor (P3): Everything else
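The frequency-by-impact mapping above can be sketched as a small lookup function. The level names ("high"/"medium"/"low") and the function name are illustrative; persistence can serve as a tie-breaker within a bucket.

```python
def severity(frequency: str, impact: str) -> str:
    """Map frequency x impact to a priority bucket (P0-P3)."""
    rank = {"high": 2, "medium": 1, "low": 0}
    f, i = rank[frequency], rank[impact]
    if f == 2 and i == 2:
        return "P0 (Critical)"
    if (f == 2 and i == 1) or (f == 1 and i == 2):
        return "P1 (Serious)"
    if (f == 2 and i == 0) or (f == 1 and i == 1) or (f == 0 and i == 2):
        return "P2 (Moderate)"
    return "P3 (Minor)"

# A rarely-seen but task-blocking issue still lands in P2:
print(severity("low", "high"))  # P2 (Moderate)
```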
See assets/severity-rating-guide.md for detailed rating criteria and examples.
Usability Metrics
Quantitative Metrics
Task Success Rate: % of participants who complete task successfully
- Binary: Did they complete it? (yes/no)
- Partial credit: Did they complete most of it?
Time on Task: How long to complete (for successful completions)
- Compare to baseline or competitor benchmarks
Error Rate: Number of errors per task
- Define what counts as an error for each task
Clicks/Taps to Task Completion: Efficiency measure
- More relevant for well-defined tasks
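A minimal sketch of how the metrics above fall out of raw session data. The session records here are hypothetical; note that time on task is averaged over successful completions only, as described above.

```python
from statistics import mean

# Hypothetical session records: (completed, seconds, error_count).
sessions = [
    (True, 95, 0),
    (True, 140, 2),
    (False, 210, 4),
    (True, 88, 1),
    (False, 180, 3),
]

success_rate = sum(done for done, _, _ in sessions) / len(sessions)
# Time on task uses successful completions only.
time_on_task = mean(t for done, t, _ in sessions if done)
error_rate = mean(e for _, _, e in sessions)

print(f"Success rate: {success_rate:.0%}")        # 60%
print(f"Mean time on task: {time_on_task:.0f}s")  # 108s
print(f"Errors per task: {error_rate:.1f}")       # 2.0
```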
Standardized Questionnaires
SUS (System Usability Scale):
- 10 questions, 5-point Likert scale
- Score 0-100 (industry avg ~68)
- Quick, reliable, easy to administer
- Good for comparing versions or benchmarking
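SUS scoring follows a fixed rule: positive (odd-numbered) items contribute response minus 1, negative (even-numbered) items contribute 5 minus response, and the sum is scaled by 2.5 to reach 0-100. A sketch:

```python
def sus_score(responses):
    """Score one SUS questionnaire: ten answers on a 1-5 scale.

    Odd items (positively worded) contribute response - 1;
    even items (negatively worded) contribute 5 - response;
    the total is multiplied by 2.5 to give a 0-100 score.
    """
    assert len(responses) == 10
    total = sum(r - 1 if i % 2 == 0 else 5 - r
                for i, r in enumerate(responses))
    return total * 2.5

# Strong agreement with positives, strong disagreement with negatives:
print(sus_score([5, 1] * 5))  # 100.0
```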
UMUX (Usability Metric for User Experience):
- 4 questions, lighter than SUS
- Similar reliability
- Faster for participants
SEQ (Single Ease Question):
- "Overall, how difficult or easy was the task to complete?" (1-7)
- One question per task
- Immediate subjective difficulty rating
Other scales:
- SUPR-Q (for websites)
- PSSUQ (post-study)
- NASA-TLX (cognitive load)
Qualitative Insights
Observed behaviors:
- Hesitations and confusion
- Error patterns
- Unexpected paths
- Verbal frustrations
Verbalized thoughts (think-aloud):
- Mental model mismatches
- Expectation violations
- Pleasantly surprising discoveries
Sample Size Guidelines
For Qualitative Insights
Nielsen's recommendation: 5 users finds ~85% of usability problems
- Diminishing returns after 5
- Run 3+ small rounds instead of 1 large round
- Iterate between rounds
Reality check:
- 5 is a minimum, not ideal
- Complex products may need 8-10
- Multiple user types need 5 each
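The "5 users find ~85%" figure comes from Nielsen's discovery model: if each user detects a given problem with probability L (about 0.31 in his data), n users uncover a proportion 1 - (1 - L)^n of the problems. A quick sketch of the curve:

```python
# Nielsen's problem-discovery model with per-user detection rate L.
L = 0.31
for n in (1, 3, 5, 10):
    found = 1 - (1 - L) ** n
    print(f"{n:>2} users -> {found:.0%} of problems")
```

The diminishing returns are visible directly: going from 5 to 10 users adds far less than going from 1 to 5, which is why several small iterative rounds beat one large round.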
For Quantitative Metrics
Benchmarking: 20+ users per user group
A/B testing: Depends on effect size and desired confidence
Statistical significance: Use power analysis calculators
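When benchmarking task success with samples this small, a plain proportion is misleading without an interval. One common approach for small usability samples is the adjusted-Wald (Agresti-Coull) interval, sketched below; the example numbers are hypothetical.

```python
import math

def adjusted_wald_ci(successes: int, n: int, z: float = 1.96):
    """Adjusted-Wald (Agresti-Coull) confidence interval for a
    task success rate; behaves well for small samples."""
    p = (successes + z * z / 2) / (n + z * z)
    half = z * math.sqrt(p * (1 - p) / (n + z * z))
    return max(0.0, p - half), min(1.0, p + half)

low, high = adjusted_wald_ci(16, 20)
print(f"16/20 successes: {low:.0%}-{high:.0%} at ~95% confidence")
```

Reporting "80% success (95% CI roughly 58-93%)" makes clear how wide the uncertainty still is at n=20, which motivates the 20+ users per group guideline.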
Planning Your Usability Test
1. Define Objectives
What decisions will this research inform?
- Redesign priorities?
- Feature cut decisions?
- Success of recent changes?
2. Identify User Segments
Who needs to be tested?
- New vs. experienced users?
- Different roles or use cases?
- Different devices or contexts?
3. Select Tasks
What tasks represent success?
- Most critical user goals
- Most frequent tasks
- Recently changed features
- Known problem areas
4. Choose Methodology
Moderated, unmoderated, or hybrid?
- Consider timeline, budget, research questions
5. Create Test Script
See assets/usability-test-script-template.md for a ready-to-use structure including:
- Welcome and consent
- Background questions
- Task instructions
- Probing questions
- Wrap-up
6. Recruit Participants
- Define screening criteria
- Aim for 5-10 per user segment
- Plan for no-shows (recruit 20% extra)
- Offer appropriate incentives
7. Conduct Pilot Test
- Test with colleague or friend
- Validate timing
- Check recording setup
- Refine unclear tasks
8. Run Sessions
- Stay neutral and encouraging
- Observe without interfering
- Take detailed notes
- Record if permitted
9. Analyze and Synthesize
- Code issues by severity
- Identify patterns across participants
- Link issues to heuristics violated
- Quantify task success and time
10. Report and Recommend
- Prioritized issue list
- Video clips of critical issues
- Recommendations with rationale
- Quick wins vs. strategic fixes
Integration with Product Development
When to Test
Discovery phase: Test competitors or analogous products
Concept phase: Test paper prototypes or wireframes
Design phase: Test high-fidelity mockups
Development phase: Test working builds iteratively
Pre-launch: Validate before release
Post-launch: Identify optimization opportunities
Continuous Usability Testing
Build it into your process:
- Weekly or bi-weekly test sessions
- Rotating focus (new features, established flows, mobile vs. desktop)
- Standing recruiting panel
- Lightweight reporting to team
Ready-to-Use Templates
We provide templates to accelerate your usability testing:
In assets/:
- usability-test-script-template.md: Complete moderator script structure
- task-scenario-template.md: Framework for creating effective task scenarios
- severity-rating-guide.md: Detailed criteria for rating usability issues
In references/:
- nielsens-10-heuristics.md: Deep dive into each heuristic with examples
- think-aloud-protocol-guide.md: Advanced facilitation techniques and troubleshooting
Common Pitfalls to Avoid
- Leading participants: "Was that easy?" → "How would you describe that experience?"
- Testing the wrong tasks: Tasks that aren't real user goals
- Over-explaining: Let users struggle and discover issues naturally
- Ignoring severity: Fixing cosmetic issues while critical issues remain
- Testing too late: After it's expensive to change
- Not iterating: One-and-done testing instead of continuous improvement
- Confusing usability with preference: "I like green" ≠ usability issue
- Sample bias: Testing only power users or only complete novices
Further Learning
Books:
- "Rocket Surgery Made Easy" by Steve Krug
- "Handbook of Usability Testing" by Jeffrey Rubin
- "Moderating Usability Tests" by Joseph Dumas
Online resources:
- Nielsen Norman Group articles
- Usability.gov
- Baymard Institute research
Troubleshooting
"Participants aren't finding real issues": Your tasks are too easy or too guided. Write tasks as goals ("Find a flight under $300 to Chicago"), not instructions ("Click the search button, then enter Chicago"). Let participants struggle -- that's where the insights are.
"Stakeholders dismiss usability findings": Quantify severity. Use Nielsen's severity scale (0-4) and pair each finding with task success rate data. "3 of 5 users failed to complete checkout" is harder to dismiss than "checkout is confusing."
"We only have 3 users available for testing": That's enough for a guerrilla test. Nielsen's research shows 5 users find ~85% of issues, but 3 users still find the majority. Run the test, fix the top issues, test again with 3 more.
Related Skills
- interview-frameworks: User interview techniques for deeper qualitative research
- synthesis-frameworks: Synthesizing usability findings into actionable insights
- user-research-techniques: Broader research methods beyond usability testing
- validation-frameworks: Validating solutions after usability improvements