# Usability Testing Guide

**Purpose:** Complete guide to testing designs with real users to uncover usability problems, validate decisions, and measure user experience.

**Principle:** Testing with one user early is better than testing with 100 users too late. Usability testing is not optional — it's the difference between guessing and knowing.

---

## 1. Why Usability Testing Matters

### The Business Case

**Principle:** Testing reduces risk, saves money, and creates better products.

**Impact:**
- **ROI:** Every $1 spent on usability returns $10-100 in benefits
- **Cost savings:** Fixing problems after launch costs 100x more than during design
- **Conversion:** Usability testing can increase conversion rates by 50-100%
- **Support:** Good usability reduces support tickets by 25-40%

**The reality:**
You are not your user. Your team is not your user. Without testing, you're designing based on assumptions — and assumptions are often wrong.

### When to Test

**Test early, test often:**

1. **Discovery (Before Design):** Understand user needs and current pain points
2. **Exploration (During Design):** Validate design directions and prototypes
3. **Validation (Before Launch):** Ensure usability and effectiveness
4. **Iteration (After Launch):** Monitor performance and uncover improvements

**Golden rule:** If you haven't tested, you don't know if it works.

---

## 2. Types of Usability Testing

### Moderated vs. Unmoderated

**Moderated Testing:**
- **What it is:** Facilitator guides participant through tasks in real-time
- **When to use:** Exploratory research, complex tasks, need rich feedback
- **Pros:** Deeper insights, can probe with follow-up questions, observe body language
- **Cons:** More expensive, time-consuming, facilitator bias risk
- **Sample size:** 5-8 participants per user group

**Unmoderated Testing:**
- **What it is:** Participant completes tasks alone using online platform
- **When to use:** Large-scale validation, benchmarking, simple tasks
- **Pros:** Scalable, faster, cheaper, geographic flexibility
- **Cons:** Limited insights, no follow-up questions, higher dropout
- **Sample size:** 20+ participants for statistical significance

### In-Person vs. Remote

**In-Person:**
- **Pros:** Rich observational data (body language, facial expressions), controlled environment, can test physical products
- **Cons:** Expensive, geographic limitations, logistics overhead
- **Best for:** Early exploratory research, physical products, sensitive topics

**Remote Moderated:**
- **Pros:** Lower cost, geographic diversity, convenient for participants
- **Cons:** Less observational data, technical issues possible
- **Best for:** Most software testing, iterative validation

**Remote Unmoderated:**
- **Pros:** Highly scalable, fast results, asynchronous
- **Cons:** No probing, surface-level insights, higher no-show rates
- **Best for:** A/B testing, large-sample validation, benchmarking

### Qualitative vs. Quantitative

**Qualitative:**
- **Purpose:** Understand "why" users behave the way they do
- **Output:** Insights, patterns, quotes, video clips
- **Sample size:** 5-10 participants per user group (uncovers 80% of problems)
- **Analysis:** Thematic analysis, affinity diagramming

**Quantitative:**
- **Purpose:** Measure "what" users do at scale
- **Output:** Metrics, percentages, statistical significance
- **Sample size:** 20+ participants for metrics, 100+ for statistical power
- **Analysis:** Descriptive statistics, confidence intervals, significance testing

**Best practice:** Combine both approaches for the sharpest insights.

---

## 3. Planning a Usability Test

### Step 1: Define Research Questions

**Start with clear objectives.**

Good research questions are:
- **Specific:** Not "is it usable?" but "can users complete checkout in under 2 minutes?"
- **Answerable:** Can be answered with the chosen method
- **Actionable:** Results will inform design decisions
- **Focused:** 3-5 questions per study (not 20)

**Examples:**
- ❌ "Is the design good?"
- ✅ "Can new users create an account without help?"
- ✅ "Where do users get stuck in the checkout flow?"
- ✅ "Which navigation structure helps users find products faster?"

### Step 2: Choose the Method

**Match methods to questions:**

| Research Goal | Best Method |
|---------------|-------------|
| Understand why users struggle | Moderated testing (qualitative) |
| Measure task completion rates | Unmoderated testing (quantitative) |
| Compare two designs | A/B testing |
| Explore new product space | Moderated field studies or diary studies |
| Benchmark over time | Unmoderated recurring testing |
| Test with many users quickly | Unmoderated large-N study |
| Deep dive into specific issues | Moderated interviews with tasks |

**Consider constraints:**
- Timeline (moderated takes longer)
- Budget (moderated costs more)
- Access to participants (some user groups are hard to recruit)
- Tools and expertise (do you have a lab? testing platform?)

### Step 3: Write Tasks

**Principle:** Tasks should be realistic, specific, and actionable.

**Task template:**
```
[Scenario context]
[Action to take]
[Success criteria]
```

**Examples:**

✅ **Good task:**
```
"You're planning a weekend trip to San Francisco.
Find a hotel in downtown San Francisco for under $200/night,
and book it for Friday and Saturday nights."

Success: User completes booking without errors
```

❌ **Bad task:**
```
"Book a hotel"
```
(Too vague — no context, no criteria)

❌ **Bad task:**
```
"Click the 'Search' button, then enter 'San Francisco',
then select the dates from the calendar..."
```
(Too leading — tells user exactly what to do)

**Task best practices:**
- Use realistic scenarios (not "test the search feature")
- Provide context, not instructions
- Avoid leading language ("use the filter to find...")
- Test one thing per task
- Keep tasks under 5 minutes each
- 5-8 tasks per session (60 minutes)

### Step 4: Create Test Materials

**Essential materials:**

1. **Test Plan:**
   - Research questions and objectives
   - Method and timeline
   - Participant criteria (screening)
   - Tasks and scenarios
   - Success metrics

2. **Screening Questionnaire:**
   - Demographics (age, location, role)
   - Experience level (novice vs. expert)
   - Usage patterns (frequency, features used)
   - Technical setup (device, browser, internet)
   - Exclusion criteria (competitors, industry)

3. **Discussion Guide (for moderated):**
   - Introduction (2-3 minutes)
   - Warm-up questions (build rapport)
   - Tasks (30-40 minutes)
   - Debrief questions (10-15 minutes)
   - Closing (2-3 minutes)

4. **Consent Form:**
   - Purpose of research
   - What will be recorded (audio, video, screen)
   - How data will be used
   - Confidentiality guarantees
   - Right to withdraw
   - Contact information

5. **Prototype or Test Environment:**
   - Figma prototype (moderated)
   - Live staging site (unmoderated)
   - Working build (beta testing)
   - Paper sketches (early concept testing)

### Step 5: Recruit Participants

**Principle:** Recruit users who represent your target audience.

**Recruitment channels:**
- **User database:** Existing customers or users
- **Recruitment agencies:** UserResearch.com, UserInterviews.com
- **Social media:** Targeted ads and posts
- **Referrals:** Current participants refer others
- **Intercept recruiting:** Approaching users in context (website popup)

**Screening criteria:**
- **Demographics:** Age, location, role, income (if relevant)
- **Experience:** Novice vs. expert users
- **Usage patterns:** Frequency, features used, workflows
- **Technical setup:** Device, browser, internet speed
- **Exclusions:** Competitors, industry, recent participants

**Incentives:**
- **Monetary:** $50-150 per session (varies by length and user type)
- **Gift cards:** Amazon, Visa, etc.
- **Product discounts:** Free months, credits
- **Early access:** Beta features, previews
- **Charity donation:** Donate to participant's choice

**Sample size guidance:**
- **Qualitative:** 5 participants per user group (uncovers ~80% of problems; see the sketch below)
- **Quantitative:** 20+ participants for metrics, 100+ for statistical significance
- **A/B testing:** 1,000+ participants per variant for statistical power
- **Card sorting:** 15-30 participants
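
The "5 users uncover ~80% of problems" heuristic traces back to the Nielsen-Landauer problem-discovery model, P(n) = 1 - (1 - L)^n, where L ≈ 0.31 is the average probability that one participant encounters a given problem. A minimal Python sketch of the arithmetic (treat L as a rule of thumb, not a constant):

```python
# Nielsen-Landauer problem-discovery model: P(n) = 1 - (1 - L)^n
L = 0.31  # average per-participant discovery rate reported by Nielsen & Landauer

for n in (1, 3, 5, 10, 15):
    found = 1 - (1 - L) ** n
    print(f"{n:2d} participants -> ~{found:.0%} of problems found")

# 5 participants -> ~84%, the source of the "80% of problems" heuristic
```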

**Overschedule:** 20-30% no-show rate is typical. Schedule 6-8 participants to get 5.

---

## 4. Conducting the Test

### Moderated Testing Session

**Session structure (60 minutes):**

1. **Introduction (5 min):**
   - Welcome and build rapport
   - Explain the process
   - Obtain consent
   - Set expectations ("you can't do anything wrong")

2. **Warm-up (5 min):**
   - Background questions
   - Current behaviors/solutions
   - Get comfortable talking

3. **Tasks (40 min):**
   - Present tasks one at a time
   - Use "think-aloud" method
   - Observe behavior, not just words
   - Probe with follow-up questions ("why did you click there?")

4. **Debrief (10 min):**
   - Overall impressions
   - Likes and dislikes
   - Suggestions for improvement
   - Rank frustrations

**Facilitator best practices:**
- **Stay neutral:** Don't lead participants to answers
- **Probe deeper:** Ask "why" and "tell me more"
- **Watch body language:** Confusion, frustration, delight
- **Record everything:** Audio, video, notes (with consent)
- **Be flexible:** Follow interesting threads
- **Respect time:** End on schedule, even if tasks aren't finished

**Think-aloud method:**
Ask participants to narrate their thoughts:
> "I'm looking for the search bar... I don't see it... maybe it's in the menu? Oh, there it is. I'll search for 'San Francisco'..."

**Probing questions:**
- "What did you expect to happen?"
- "What are you looking for right now?"
- "Tell me more about that choice."
- "What would make this easier?"

### Unmoderated Testing Setup

**Platform options:**
- **UserTesting.com:** Large participant pool, video recordings
- **Maze:** Prototype testing, easy setup
- **Lookback:** Moderated and unmoderated options
- **Optimal Workshop:** Card sorting, tree testing
- **UserZoom:** Enterprise platform with many question types

**Test setup:**
1. Write tasks (same as moderated, but clearer instructions)
2. Set up the prototype or URL
3. Configure screening questions
4. Write post-task questions (SUS, NPS, open-ended)
5. Launch and monitor (check first 2-3 completions)
6. Close when target sample reached

**Unmoderated best practices:**
- Pilot test first (run through yourself)
- Include screening questions
- Keep tasks simple (no follow-up probing possible)
- Use video recordings (richer data than clicks)
- Include open-ended questions after each task
- Set a fair time limit (don't let participants struggle forever)

---

## 5. Measuring Usability

### Core Metrics

**Task Success Rate:**
```
Task Success Rate = (Number who completed task / Total who attempted) × 100%
```
- **Binary:** Complete vs. incomplete (strict)
- **Success levels:** Complete, partial success, failure (lenient)

**Time on Task:**
```
Average Time = Sum of all task times / Number who completed
```
- Measure from task start to success/failure
- Report median (not mean — outliers skew)
- Compare to expert time or benchmark

**Error Rate:**
```
Error Rate = (Number of errors / Number of opportunities for error) × 100%
```
- Click errors (wrong clicks)
- Recovery errors (couldn't recover from error)
- Task abandonment (gave up)

**Subjective Satisfaction:**
- **SUS (System Usability Scale):** 10-question survey, 0-100 scale
- **NPS (Net Promoter Score):** "How likely to recommend?" 0-10
- **CSAT (Customer Satisfaction):** "How satisfied?" 1-5 scale
- **Custom ratings:** "Easy to use?" 1-5 scale
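
A minimal Python sketch of the three behavioral metrics above, computed from per-participant task results (the result records and the error-opportunity count are hypothetical, for illustration only):

```python
from statistics import median

# Hypothetical per-participant results for one task (assumed schema)
results = [
    {"completed": True,  "seconds": 74,  "errors": 1},
    {"completed": True,  "seconds": 102, "errors": 0},
    {"completed": False, "seconds": 180, "errors": 4},
    {"completed": True,  "seconds": 88,  "errors": 2},
    {"completed": True,  "seconds": 95,  "errors": 0},
]
ERROR_OPPORTUNITIES = 6  # hypothetical: error-prone steps in this task

attempted = len(results)
completed = [r for r in results if r["completed"]]

success_rate = len(completed) / attempted * 100
median_time = median(r["seconds"] for r in completed)  # median: outliers skew the mean
error_rate = sum(r["errors"] for r in results) / (attempted * ERROR_OPPORTUNITIES) * 100

print(f"Success: {success_rate:.0f}% | Median time: {median_time}s | Errors: {error_rate:.0f}%")
```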

### Task Success Benchmarks

**Industry benchmarks (percent success):**
| Task Type | Excellent | Good | Acceptable | Poor |
|-----------|-----------|------|------------|------|
| Simple task (1 step) | 100% | 95%+ | 90%+ | <90% |
| Moderate task (2-3 steps) | 95%+ | 85%+ | 75%+ | <75% |
| Complex task (4+ steps) | 85%+ | 70%+ | 50%+ | <50% |

**Time on task benchmarks:**
- Users should complete tasks in ~2x expert time
- If experts take 30 seconds, users should take <60 seconds

**SUS benchmarks:**
- **80+:** Excellent
- **68-80:** Good
- **50-68:** OK (needs improvement)
- **<50:** Poor (major problems)
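
SUS scoring is standardized: odd-numbered items contribute (response - 1), even-numbered items contribute (5 - response), and the raw 0-40 sum is multiplied by 2.5 to land on the 0-100 scale. A short sketch, with band cutoffs mirroring the benchmarks above (the sample responses are hypothetical):

```python
def sus_score(responses: list[int]) -> float:
    """Score one participant's 10 SUS answers (each 1-5; odd items positively worded)."""
    assert len(responses) == 10 and all(1 <= r <= 5 for r in responses)
    raw = sum((r - 1) if i % 2 == 1 else (5 - r)
              for i, r in enumerate(responses, start=1))
    return raw * 2.5  # maps the 0-40 raw sum onto the 0-100 SUS scale

def sus_band(score: float) -> str:
    if score >= 80: return "Excellent"
    if score >= 68: return "Good"
    if score >= 50: return "OK (needs improvement)"
    return "Poor (major problems)"

answers = [4, 2, 5, 1, 4, 2, 5, 2, 4, 1]  # hypothetical participant
print(sus_score(answers), sus_band(sus_score(answers)))  # 85.0 Excellent
```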

---

## 6. Analyzing Results

### Quantitative Analysis

**Descriptive statistics:**
- Task success rates (percentage)
- Average time on task (median, range)
- Error rates (percentage)
- Satisfaction scores (mean, distribution)

**Comparative statistics (for A/B tests):**
- **Chi-square:** Compare success rates between designs
- **T-test:** Compare time on task between designs
- **Confidence intervals:** Report 95% CI for all metrics
- **Statistical significance:** p < 0.05 means the observed difference is unlikely to be due to chance alone

**Example reporting:**
> "Design A had a 78% success rate (CI: 72-84%) compared to Design B's 65% (CI: 59-71%), a statistically significant difference (χ²=4.2, p<0.05)."

### Qualitative Analysis

**Affinity Diagramming:**
1. Write each observation/quote on a sticky note
2. Group notes by theme (patterns emerge)
3. Label themes with concise descriptors
4. Identify insights and opportunities

**Thematic Analysis:**
1. **Open coding:** Tag data points with codes (e.g., "navigation confusion")
2. **Pattern recognition:** Group codes into themes
3. **Insight synthesis:** Identify "so what?" — what does this mean for design?
4. **Illustrate with quotes:** Support insights with direct quotes

**Common themes to look for:**
- Navigation problems ("I couldn't find...")
- Confusion about terminology ("What does X mean?")
- Missing features ("I wish I could...")
- Workflow issues ("I expected to...")
- Emotional reactions (frustration, delight, surprise)

### Prioritizing Findings

**Impact vs. Effort Matrix:**
```
High Impact, Low Effort → Fix immediately
High Impact, High Effort → Plan for next iteration
Low Impact, Low Effort → Quick wins (if time)
Low Impact, High Effort → Ignore (or deprioritize)
```

**Severity rating for usability issues:**
- **Critical:** Blocks task completion, affects all users
- **Serious:** Causes errors, frustration, workarounds
- **Minor:** Annoying but doesn't block task
- **Cosmetic:** Visual preference, no functional impact

**Frequency × Severity Matrix:**
- Fix issues that are critical + frequent first
- Then fix serious + frequent
- Then consider critical + rare (edge cases)
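
That triage order is easy to mechanize once each finding carries a severity label and a frequency count. A minimal sketch (the findings themselves are hypothetical):

```python
SEVERITY_RANK = {"critical": 0, "serious": 1, "minor": 2, "cosmetic": 3}

# Hypothetical findings: (issue, severity, number of participants affected)
findings = [
    ("Low-contrast footer links", "cosmetic", 2),
    ("Checkout button hidden below the fold", "critical", 7),
    ("Ambiguous 'Submit' label causes wrong clicks", "serious", 5),
    ("Coupon field rejects valid codes", "critical", 1),
]

# Severity first, then frequency descending: critical + frequent floats to the top
findings.sort(key=lambda f: (SEVERITY_RANK[f[1]], -f[2]))
for issue, severity, freq in findings:
    print(f"[{severity:>8}] {freq} participants: {issue}")
```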

---

## 7. Reporting Findings

### Report Structure

**1. Executive Summary (1 page):**
- Key findings (3-5 bullet points)
- Recommendations (prioritized list)
- Business impact (metrics, quotes)

**2. Background:**
- Research questions and objectives
- Methods used (moderated vs. unmoderated, N=)
- Participants (who, how many)
- Timeline

**3. Findings:**
- Organized by theme or research question
- Support with data (quotes, metrics, video clips)
- Distinguish between critical and nice-to-have
- Use visuals (screenshots, clips, heatmaps)

**4. Recommendations:**
- Specific, actionable design changes
- Prioritized by impact and effort
- Aligned with business goals
- Include "quick wins" vs. long-term

**5. Appendices:**
- Detailed methodology
- Full transcript excerpts
- Screening criteria
- Test materials (tasks, consent form)

### Presentation Tips

**Start with insights, not methodology.**
- ❌ "We conducted a moderated usability study with 10 participants..."
- ✅ "8 out of 10 users couldn't complete checkout. Here's why..."

**Use video clips and quotes.**
- Show, don't just tell. A 30-second clip of a user struggling is more powerful than any statistic.

**Include stakeholders in analysis.**
- Invite team members to watch sessions
- Co-create recommendations with designers and developers
- Build buy-in through involvement

**Provide clear next steps.**
- What should we do first?
- What will we test next?
- What did we learn that changes our roadmap?

---

## 8. Common Usability Testing Mistakes

### 1. Testing Too Late

**Problem:** Testing after decisions are locked in.

**Solution:** Test early and often. Paper sketches > no testing.

### 2. Testing with the Wrong Users

**Problem:** Testing with colleagues, friends, or non-representative users.

**Solution:** Recruit participants who match your target user profile. Use screening criteria.

### 3. Leading Questions

**Problem:** "Don't you think the blue button is better?"

**Solution:** Use neutral language. "Which button did you prefer? Why?"

### 4. Testing the Script, Not the Design

**Problem:** Step-by-step instructions that tell users exactly what to do.

**Solution:** Provide scenarios, not instructions. Let users figure it out.

### 5. Ignoring Context

**Problem:** Testing in a lab that doesn't reflect real use (quiet, controlled).

**Solution:** Combine lab testing with field studies and remote testing.

### 6. Analysis Paralysis

**Problem:** Collecting data but not analyzing or acting on it.

**Solution:** Start analysis immediately after sessions. Report within 1 week.

### 7. Testing Without Action

**Problem:** Findings sit in reports but don't influence design.

**Solution:** Involve stakeholders in testing. Present actionable recommendations. Track implementation.

---

## 9. A/B Testing

### When to Use A/B Testing

**Principle:** A/B testing compares two designs to measure which performs better.

**Use A/B testing for:**
- Validating design changes (new vs. old)
- Testing specific elements (headline, CTA, layout)
- Optimizing conversion rates (sign-ups, purchases)
- Settling debates within the team
- Measuring incremental improvements

**Don't use A/B testing for:**
- Exploratory research (use qualitative methods)
- Understanding "why" (use moderated testing)
- Testing radically different concepts (use concept testing)
- Making major strategic decisions (use broader research)

### A/B Testing Process

**1. Define hypothesis:**
```
"If we change the CTA button from green to orange,
then click-through rate will increase by 10%."
```

**2. Determine sample size:**
- Use power analysis calculators
- Typical: 1,000+ participants per variant
- More participants = smaller detectable effect
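
A sketch of that power analysis with statsmodels, using the hypothesis above and an assumed 10% baseline click-through rate (both the baseline and the lift are illustrative assumptions):

```python
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

baseline = 0.10           # assumed current click-through rate
target = baseline * 1.10  # hypothesized +10% relative lift -> 11%

effect_size = proportion_effectsize(target, baseline)  # Cohen's h
n_per_variant = NormalIndPower().solve_power(
    effect_size=effect_size,
    alpha=0.05,  # 5% false-positive rate
    power=0.80,  # 80% chance of detecting a real lift of this size
    ratio=1.0,   # equal split between variants
)
print(f"~{n_per_variant:,.0f} participants per variant")  # roughly 7,300 here
```

Note how a small relative lift pushes the required sample well past the "1,000+ per variant" floor; the smaller the effect you care about, the more traffic you need.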

**3. Random assignment:**
- 50% see Design A, 50% see Design B
- Ensure randomization works (no bias)
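
One common way to get a stable, unbiased 50/50 split is deterministic hashing: each user always lands in the same bucket, with no assignment table to store. A sketch (the function and experiment names are illustrative):

```python
import hashlib

def assign_variant(user_id: str, experiment: str = "cta-color") -> str:
    """Deterministically bucket a user; the same input always yields the same variant."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF  # approximately uniform in [0, 1]
    return "A" if bucket < 0.5 else "B"

print(assign_variant("user-42"))  # stable across sessions and servers
```

Salting the hash with the experiment name keeps assignments independent across concurrent experiments.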

**4. Run the test:**
- Run for at least 1-2 weeks (capture weekly patterns)
- Don't stop early (peeking invalidates results)
- Monitor for bugs or unexpected issues

**5. Analyze results:**
- Calculate statistical significance (p < 0.05)
- Calculate confidence intervals
- Report effect size (not just significance)

**6. Make a decision:**
- If significant: Implement winner
- If not significant: Keep current or test something else
- Document learning for future tests

### A/B Testing Best Practices

- **Test one thing at a time:** Don't change headline + CTA + layout all at once
- **Use statistical significance:** Don't make decisions based on noise
- **Run long enough:** Capture weekly patterns (don't run Friday-Monday)
- **Segment your data:** Results may differ by user type, geography, device
- **Document everything:** Hypothesis, sample size, results, learning
- **Don't stop early:** Peeking at results before the test ends invalidates the statistics

---

## 10. Rapid Testing Methods

### Guerrilla Testing

**What it is:** Quick, informal testing with whoever is available (café, conference, hallway).

**When to use:**
- Early concept validation
- Low-risk design questions
- Extremely limited budget/time
- Exploratory research

**How to do it:**
1. Create a simple prototype (paper, Figma, working build)
2. Go where users are (café, campus, event)
3. Ask for 5-10 minutes of their time
4. Offer a small incentive (coffee, gift card)
5. Ask 3-5 key questions

**Pros:** Fast, cheap, flexible
**Cons:** Not representative, small sample, no screening

### Hallway Testing

**What it is:** Testing with colleagues who are not on the project.

**When to use:**
- Sanity check before user testing
- Catch obvious issues
- Get quick feedback

**How to do it:**
1. Grab someone from a different team
2. Give them a task
3. Watch where they struggle
4. Take notes (don't coach)

**Pros:** Very fast, free, catches major issues
**Cons:** Not representative, biased (too familiar with tech)

### 5-Second Tests

**What it is:** Users see a design for 5 seconds, then answer questions.

**When to use:**
- Test first impressions
- Test clarity of value proposition
- Test visual hierarchy

**How to do it:**
1. Show design for 5 seconds
2. Hide it
3. Ask: "What do you remember?" "What can you do here?"
4. Analyze what stood out

**Pros:** Very fast, tests first impressions
**Cons:** Limited depth, doesn't test interaction

---

## 11. Usability Testing Tools

### Moderated Testing Platforms
- **Zoom/Skype:** Screen sharing + recording (free/cheap)
- **Lookback:** Moderated remote testing with recording
- **UserTesting.com:** Both moderated and unmoderated options

### Unmoderated Testing Platforms
- **UserTesting.com:** Large participant pool, video recordings
- **Maze:** Prototype testing, easy setup
- **UserZoom:** Enterprise platform with many question types
- **TryMyUI:** Smaller platform, good value

### Card Sorting and Tree Testing
- **Optimal Workshop:** Industry standard for card sorting
- **UserZoom:** Includes card sorting and tree testing
- **Maze:** Basic card sorting features

### Analytics and Heatmaps
- **Hotjar:** Heatmaps, session recordings, feedback polls
- **Crazy Egg:** Heatmaps and click tracking
- **FullStory:** Session replay and conversion analytics

### Analysis and Reporting
- **Miro/Lucidchart:** Affinity diagramming
- **Excel/Google Sheets:** Quantitative analysis
- **Dovetail:** Research repository and analysis
- **Notion:** Research planning and documentation

---

## 12. Quick Checklist

### Planning
- [ ] Define clear research questions
- [ ] Choose appropriate method
- [ ] Write realistic tasks
- [ ] Create screening criteria
- [ ] Prepare test plan and materials
- [ ] Create consent forms

### Recruitment
- [ ] Determine sample size (5-8 for qualitative, 20+ for quantitative)
- [ ] Choose recruitment channel
- [ ] Set incentives
- [ ] Schedule participants (overschedule by 20-30%)

### Execution
- [ ] Pilot test materials
- [ ] Conduct test sessions
- [ ] Record sessions (with consent)
- [ ] Take detailed notes

### Analysis
- [ ] Transcribe recordings (if needed)
- [ ] Code and categorize data
- [ ] Calculate metrics (success rate, time on task, satisfaction)
- [ ] Identify patterns and insights
- [ ] Prioritize findings by severity and frequency

### Reporting
- [ ] Create executive summary
- [ ] Illustrate with video clips and quotes
- [ ] Provide actionable recommendations
- [ ] Present to stakeholders
- [ ] Archive research for future reference

---

## Further Reading

- **NN/g:** Usability Testing 101: https://www.nngroup.com/articles/usability-testing-101/
- **Jakob Nielsen:** Why You Only Need to Test with 5 Users: https://www.nngroup.com/articles/why-you-only-need-to-test-with-5-users/
- **Rolf Molich:** What Does a Usability Test Cost?: https://www.dialogdesign.dk/cost-usability-test/
- **Krug, Steve:** Rocket Surgery Made Easy (Book)
- **Rubin, Jeffrey & Chisnell, Dana:** Handbook of Usability Testing (Book)

---

**Remember:** Testing early is better than testing late. Testing with one user is better than testing with none.

**You are not your user. The only way to know if your design works is to test it.**