# Data Visualization

**Purpose:** Complete guide to designing clear, accurate, and effective data visualizations that help users understand and act on data.

**Principle:** Data visualization is not decoration. It's communication. Every visual element should serve understanding, not aesthetics.

---

## 1. Why Data Visualization Matters

### The Power of Visual Communication

**Principle:** People pick out visual patterns far faster than they can read the same information as text.

**Impact:**
- **Faster understanding:** Patterns emerge in seconds, not minutes
- **Better decisions:** Visual data reveals insights hidden in tables
- **Increased engagement:** People tend to remember visuals better than text alone
- **Broader accessibility:** Complex data becomes understandable to non-experts

**The reality:**
Poor data visualization misleads, confuses, and erodes trust. Good visualization clarifies, reveals, and informs.

---

## 2. Core Principles

### Edward Tufte's Fundamental Principles

**1. Data-Ink Ratio**

**Principle:** Maximize data-ink, minimize non-data-ink.

```
Data-Ink Ratio = (Data ink) / (Total ink used to print the graphic)
```

**Goal:** Remove everything that doesn't carry data information.

**Examples of non-data-ink to remove:**
- Excessive grid lines
- Decorative borders and backgrounds
- 3D effects on 2D data
- Unnecessary axis labels
- Redundant legends

**2. Chart Junk**

**Avoid these common elements:**
- **Moiré vibration:** Patterns that create optical illusions
- **Grid pollution:** Heavy grid lines that compete with data
- **Ducks:** Form over function; decoration that takes over the chart and obscures meaning
- **Unnecessary dimensions:** 3D pie charts, 3D bar charts

**3. Small Multiples**

**Principle:** Display multiple small charts instead of one complex chart.

**Benefits:**
- Easy comparison across categories or time
- Reveals patterns and trends
- Scales to large datasets
- Avoids chart clutter

**Example:** 12 small charts, one per series, instead of one chart with 12 overlapping lines.

### Colin Ware's Visual Principles

**3 stages of visual perception:**
1. **Parallel processing:** Instant detection of color, size, orientation, movement
2. **Pattern recognition:** Grouping, closure, continuity
3. **Sequential processing:** Focused attention, reading text

**Design implication:** Use pre-attentive attributes (color, size, position) for the most important data distinctions.

---

## 3. Choosing the Right Chart

### Chart Selection Guide

**Comparison**
- **Bar chart:** Compare values across categories (vertical or horizontal)
- **Column chart:** Compare values across categories (vertical only)
- **Grouped bar chart:** Compare subcategories within main categories
- **Bullet chart:** Compare actual vs. target, with context ranges

**Trend over time**
- **Line chart:** Continuous data over time (most common)
- **Area chart:** Show volume/magnitude over time
- **Spline chart:** Smooth trends (use sparingly; smoothing can suggest values that are not in the data)

**Distribution**
- **Histogram:** Frequency distribution of continuous data
- **Box plot:** Statistical distribution with quartiles and outliers
- **Violin plot:** Distribution shape + box plot statistics
- **Density plot:** Smooth distribution curve

**Relationship**
- **Scatter plot:** Correlation between two variables
- **Bubble chart:** Scatter plot with third variable as size
- **Heatmap:** Correlation matrix or 2D density

**Composition**
- **Stacked bar chart:** Parts of whole across categories
- **Stacked area chart:** Parts of whole over time
- **Treemap:** Hierarchical part-to-whole relationships
- **Sunburst:** Multi-level hierarchical composition

**Avoid entirely:**
- **Pie charts:** Hard to compare angles, poor for >3 categories
- **Donut charts:** Same problems as pie charts
- **3D charts:** Distort data, hard to read
- **Radar charts:** Hard to compare, perceptually inaccurate

---

## 4. Design Best Practices

### Color in Data Visualization

**Principle:** Use color to highlight, not decorate.

**Sequential color scales (continuous data):**
- Use for: Temperature, altitude, intensity
- Example: Light blue → Dark blue
- Rule: Single hue, varying lightness

**Diverging color scales (deviation from midpoint):**
- Use for: Positive/negative values, sentiment, profit/loss
- Example: Blue → White → Red (avoid red/green pairs, which are hard to tell apart with red-green color blindness)
- Rule: Two hues meeting at neutral midpoint

**Categorical color scales (distinct categories):**
- Use for: Product lines, regions, segments
- Example: Blue, Orange, Teal, Purple (distinct colors)
- Rule: Maximum 7-10 categories, colorblind-safe palette

**Color accessibility:**
- Test with color blindness simulators
- Don't rely on color alone (use patterns, labels, icons)
- Maintain a contrast ratio of at least 3:1 between data points and their background
- Use OKLCH for perceptually uniform colors

### Typography

**Principle:** Text should support data, not compete with it.

**Guidelines:**
- **Font:** Sans-serif (Inter, Roboto, system-ui)
- **Size:** 12-14px for axis labels (never below 12px), 14-16px for annotations
- **Weight:** Regular for labels, Semibold for emphasis
- **Color:** #525252 or darker on light backgrounds (WCAG AA compliance)

**Label placement:**
- Axis labels: Left of Y-axis, below X-axis
- Data labels: Directly on/near data points (not in legend)
- Annotations: Near relevant data with pointer line

### Layout and Spacing

**Principle:** Give data room to breathe.

**Guidelines:**
- **Margins:** Minimum 40-60px on all sides
- **Axis spacing:** 20-30px between axis and chart area
- **Grid lines:** Light (#e5e5e5 or lighter), dashed or dotted
- **Padding:** 8-12px between chart elements

**Aspect ratio:**
- **Time series:** Width ≥ 2x height (wide format preferred)
- **Comparison charts:** Square or slightly wide
- **Small multiples:** Consistent aspect ratio across all charts

---

## 5. Interactive Visualizations

### Interaction Design Principles

**Principle:** Interactivity should reveal, not obscure.

**Essential interactions:**
1. **Tooltip:** Show exact values on hover/click
2. **Filter/zoom:** Focus on specific data subsets
3. **Highlight:** Dim non-selected data, emphasize selection
4. **Details on demand:** Click/drill down for more granularity

**Interaction feedback:**
- **Hover state:** Subtle brightness increase (10-15%)
- **Click state:** Distinct outline or color change
- **Loading state:** Skeleton or spinner within chart area
- **Empty state:** Clear message + illustration

**Progressive disclosure:**
- Start with high-level overview
- Allow drill-down to details
- Maintain context (breadcrumbs, back button)
- Animate transitions (300-500ms)

---

## 6. Accessibility

### Making Data Visualizations Accessible

**Principle:** Data visualization should be understandable by everyone.

**Accessibility considerations:**
- **Color blindness:** Use distinct shapes + colors, test with simulators
- **Low vision:** Minimum contrast ratios, scalable text
- **Screen readers:** Provide data tables as alternative
- **Keyboard navigation:** All interactive elements reachable via Tab

**Semantic markup:**
```html
<figure>
  <!-- role="img" goes on the SVG, not the figure, so the table below stays exposed to screen readers -->
  <svg role="img" aria-labelledby="chart-title">...</svg>
  <figcaption id="chart-title">
    Monthly revenue showing 20% growth from January to June
  </figcaption>
  <table class="sr-only">
    <!-- Data table for screen readers -->
  </table>
</figure>
```

**Text alternatives:**
- Provide descriptive alt text for the chart
- Include data table as fallback
- Summarize key insights in text
- Use ARIA attributes for interactive elements

---

## 7. Common Data Visualization Mistakes

### 1. Truncated Y-Axis

**Problem:** Y-axis doesn't start at zero, exaggerating differences.

**When it's OK:** Time series with small variations (stock prices, temperature)
**When to avoid:** Bar charts (truncation makes bars misleading)

**Solution:** Start bar charts at zero. For line charts, a non-zero baseline is acceptable when the axis range is clearly labeled; mark any axis break with a clear visual indicator.

### 2. Too Many Data Points

**Problem:** Overcrowded chart, unreadable labels, information overload.

**Solution:**
- Aggregate data (hourly → daily → weekly)
- Use small multiples
- Implement zoom/filter
- Show top N + "other" category

### 3. Wrong Chart Type

**Problem:** Using pie charts for comparison, line charts for categorical data.

**Solution:** Use the chart selection guide above. Match chart type to data relationship and user goal.

### 4. Misleading Scales

**Problem:** Inconsistent time intervals, unequal bin sizes in histograms.

**Solution:** Always use consistent scales. If you must break the scale, use clear visual indicators.

### 5. No Context

**Problem:** Numbers without meaning, trends without comparison.

**Solution:**
- Add reference lines (average, target, previous period)
- Include contextual annotations
- Show change percentage alongside absolute values
- Provide comparison to benchmark

### 6. Decorative Over Data

**Problem:** 3D effects, gradients, animations that distract from data.

**Solution:** Remove all non-data-ink. Every element must serve understanding.

---

## 8. Data Storytelling

### From Charts to Insights

**Principle:** Good visualizations answer questions, not just display data.

**Story structure:**
1. **Context:** What data is this? Why does it matter?
2. **Pattern:** What's the key trend or insight?
3. **Explanation:** What caused this pattern?
4. **Implication:** What should we do about it?

**Annotation framework:**
- **Title:** State the insight ("Sales grew 20% in Q3", not "Sales Chart")
- **Subtitle:** Context and timeframe
- **Callouts:** Highlight key data points with annotations
- **Summary:** 1-2 sentence takeaway

**Example transformation:**
```
❌ Bad: "Revenue by Month 2024"

✅ Good: "Q4 drove 45% of annual revenue, fueled by holiday surge"
Subtitle: Monthly revenue January - December 2024
```

---

## 9. Responsive Data Visualization

### Designing for All Screen Sizes

**Principle:** Data visualizations must work on mobile, tablet, and desktop.

**Mobile-first approach:**
- **Small screens (375px):**
  - Simplified charts (1-2 data series)
  - Vertical orientation preferred
  - Larger touch targets (44×44px minimum)
  - Minimal axis labels (show key points only)
  - Horizontal scroll for wide charts (with indicator)

- **Medium screens (768px):**
  - 2-3 data series
  - Standard aspect ratios
  - Full axis labels

- **Large screens (1440px+):**
  - Multiple data series
  - Small multiples
  - Rich interactions and annotations

**Responsive techniques:**
- Use SVG for scalability
- Implement conditional rendering (simplify on mobile)
- Horizontal scroll with visual indicator
- Portrait orientation on mobile, landscape on desktop
- Touch-optimized interactions (larger hit areas)

---

## 10. Testing and Validation

### Usability Testing for Data Visualizations

**Principle:** Test with real users, real data, real questions.

**Test setup:**
- **Participants:** 5-8 users from target audience
- **Tasks:** Answer specific questions using the visualization
- **Metrics:** Accuracy, time to insight, confidence level

**Test questions:**
- "What is the main trend shown in this chart?"
- "Which month had the highest value?"
- "How does X compare to Y?"
- "What would you do based on this data?"

**Success criteria:**
- 80%+ accuracy on interpretation questions
- Time to insight < 10 seconds for simple questions
- User confidence rating 4/5 or higher
- No major misinterpretations

**Common issues to watch for:**
- Misread axis scales
- Confusion about color encoding
- Inability to extract specific values
- Wrong chart type for the question

---

## 11. Tools and Libraries

### Chart Libraries (2024+)

**Web:**
- **D3.js:** Maximum flexibility, steep learning curve
- **Chart.js:** Easy to get started, good for common charts
- **Recharts:** React-based, composable, declarative
- **Victory:** React-based, consistent API
- **Observable Plot:** D3-based, simpler API for common charts

**Design tools:**
- **Figma:** Manual chart creation, use for mockups
- **Tableau:** Business intelligence, drag-and-drop
- **Datawrapper:** Journalism and publication
- **Looker Studio (formerly Google Data Studio):** Dashboards and reporting

**Color tools:**
- **ColorBrewer:** Sequential, diverging, qualitative palettes
- **OKLCH picker:** Perceptually uniform color spaces
- **Chroma.js:** Color scales for data visualization

---

## 12. Quick Checklist

### Before Finalizing a Visualization

**Data accuracy:**
- [ ] Data source is cited and trustworthy
- [ ] Scales are accurate and consistent
- [ ] Calculations are correct (percentages, averages)
- [ ] Time periods are clearly labeled

**Visual design:**
- [ ] Chart type matches data relationship
- [ ] Color scale matches data type (sequential/diverging/categorical)
- [ ] Text is readable (12px minimum)
- [ ] Sufficient spacing and margins
- [ ] No chart junk or decorative elements

**Accessibility:**
- [ ] Color blindness tested
- [ ] Contrast ratios meet WCAG AA (3:1 minimum for graphical elements, 4.5:1 for normal-size text)
- [ ] Screen reader alternative provided
- [ ] Keyboard navigation works

**Context and clarity:**
- [ ] Descriptive title with insight
- [ ] Axis labels are clear and complete
- [ ] Legend is included only if needed (otherwise replaced by direct labels)
- [ ] Annotations explain key points
- [ ] Data source and timeframe noted

**Testing:**
- [ ] Tested with target users
- [ ] Validated on mobile (375px)
- [ ] Checked in light and dark mode
- [ ] Verified accuracy with domain expert

---

## 13. Example Transformations

### Before and After

**Before: Pie Chart**
```
❌ Bad:
- 7 slices, hard to compare
- 3D effect distorts angles
- Legend requires eye movement
- No context or insight
```

**After: Bar Chart**
```
✅ Good:
- Horizontal bars, easy to compare
- Ranked by value
- Direct labeling (no legend)
- Clear title with insight
- Reference line for average
```

**Before: Multi-Line Time Series**
```
❌ Bad:
- 12 overlapping lines
- Colors indistinguishable
- No callouts or context
- Hard to extract insights
```

**After: Small Multiples**
```
✅ Good:
- 12 small charts, one per metric
- Easy comparison
- Clear trend in each
- Consistent scale across all
```

---

**Remember:** The goal of data visualization is understanding, not decoration. Every element should serve clarity, accuracy, and insight.

**Good data visualization is invisible: users see the insights, not the design.**
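
As one hedged illustration of the pie-to-bar transformation in section 13, the data preparation behind a ranked bar chart with an average reference line might look like the TypeScript sketch below. The `Datum` and `RankedBars` types, the `prepareRankedBars` function, and the sample values are hypothetical names and made-up numbers used only for illustration; rendering is left to whichever chart library is in use.

```typescript
// Hypothetical shape for one category; not taken from any chart library.
interface Datum {
  label: string;
  value: number;
}

// Hypothetical result shape: bars in display order plus a reference value.
interface RankedBars {
  bars: Datum[];   // sorted from largest to smallest value
  average: number; // position of the "average" reference line
}

// Rank categories by value and compute the mean for a reference line.
function prepareRankedBars(data: Datum[]): RankedBars {
  // Copy before sorting so the caller's array is not mutated.
  const bars = [...data].sort((a, b) => b.value - a.value);
  const total = bars.reduce((sum, d) => sum + d.value, 0);
  // Guard against division by zero when there is no data.
  const average = bars.length > 0 ? total / bars.length : 0;
  return { bars, average };
}

// Made-up sample values, for illustration only.
const sample: Datum[] = [
  { label: "Search", value: 30 },
  { label: "Email", value: 50 },
  { label: "Social", value: 10 },
];

const ranked = prepareRankedBars(sample);
console.log(ranked.bars.map((d) => d.label).join(", "), ranked.average);
```

Doing the ranking before rendering keeps the ordering independent of the chart library, and returning the average separately lets the renderer draw it as a reference line, not as another bar.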