# Data Visualization

**Purpose:** Complete guide to designing clear, accurate, and effective data visualizations that help users understand and act on data.

**Principle:** Data visualization is not decoration. It's communication. Every visual element should serve understanding, not aesthetics.

---

## 1. Why Data Visualization Matters

### The Power of Visual Communication

**Principle:** People spot patterns in a well-designed chart far faster than in the same numbers read as text. (The often-quoted "60,000x faster" figure has no traceable source, so avoid citing it.)

**Impact:**
- **Faster understanding:** Patterns emerge in seconds, not minutes
- **Better decisions:** Visual data reveals insights hidden in tables
- **Increased engagement:** People tend to remember visuals better than text alone (the picture superiority effect)
- **Broader accessibility:** Complex data becomes understandable to non-experts

**The reality:**
Poor data visualization misleads, confuses, and erodes trust. Good visualization clarifies, reveals, and informs.

---

## 2. Core Principles

### Edward Tufte's Fundamental Principles

**1. Data-Ink Ratio**

**Principle:** Maximize data-ink, minimize non-data-ink.

```
Data-Ink Ratio = (Data ink) / (Total ink used to print the graphic)
```

**Goal:** Remove everything that doesn't carry data information.

**Examples of non-data-ink to remove:**
- Excessive grid lines
- Decorative borders and backgrounds
- 3D effects on 2D data
- Unnecessary axis labels
- Redundant legends
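
The ratio above can be tracked as a rough before/after measure while cleaning up a chart. The sketch below is only an illustration: the "ink" amounts are hypothetical pixel counts, and `data_ink_ratio` is a helper name invented here, not part of any charting library.

```python
def data_ink_ratio(data_ink: float, total_ink: float) -> float:
    """Share of a graphic's ink that encodes data (0.0 to 1.0)."""
    if total_ink <= 0:
        raise ValueError("total_ink must be positive")
    if not 0 <= data_ink <= total_ink:
        raise ValueError("data_ink must be between 0 and total_ink")
    return data_ink / total_ink


# Hypothetical pixel counts for the same chart before and after cleanup.
before = data_ink_ratio(data_ink=1_200, total_ink=4_000)  # heavy grid, border, 3D
after = data_ink_ratio(data_ink=1_200, total_ink=1_600)   # light grid, no border
print(f"before: {before:.2f}, after: {after:.2f}")  # before: 0.30, after: 0.75
```

The data ink stays constant in this example; only the non-data ink shrinks, which is what raises the ratio.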

**2. Chartjunk**

**Avoid these common elements:**
- **Moiré vibration:** Patterns that create optical illusions
- **Grid pollution:** Heavy grid lines that compete with data
- **The duck:** Form over function; decorative charts that obscure meaning
- **Unnecessary dimensions:** 3D pie charts, 3D bar charts

**3. Small Multiples**

**Principle:** Display multiple small charts instead of one complex chart.

**Benefits:**
- Easy comparison across categories or time
- Reveals patterns and trends
- Scales to large datasets
- Avoids chart clutter

**Example:** 12 small charts, one per series, instead of one chart with 12 overlapping lines.
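
Laying out small multiples involves two decisions: the grid shape and a shared value range, so panels can be compared at a glance. A minimal sketch, assuming a cap of four columns; the function names and the cap are illustrative choices, not a standard.

```python
import math


def grid_shape(n_charts: int, max_cols: int = 4) -> tuple[int, int]:
    """Return (rows, cols) for a grid that holds n_charts panels."""
    if n_charts < 1 or max_cols < 1:
        raise ValueError("n_charts and max_cols must be at least 1")
    cols = min(n_charts, max_cols)
    rows = math.ceil(n_charts / cols)
    return rows, cols


def shared_y_range(series: dict[str, list[float]]) -> tuple[float, float]:
    """One y-range for every panel, so heights are comparable across charts."""
    values = [v for points in series.values() for v in points]
    return min(values), max(values)


print(grid_shape(12))                                   # (3, 4)
print(shared_y_range({"north": [1, 5], "south": [3, 9]}))  # (1, 9)
```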

### Colin Ware's Visual Principles

**3 stages of visual perception:**
1. **Parallel processing:** Instant detection of color, size, orientation, movement
2. **Pattern recognition:** Grouping, closure, continuity
3. **Sequential processing:** Focused attention, reading text

**Design implication:** Use pre-attentive attributes (color, size, position) for the most important data distinctions.

---

## 3. Choosing the Right Chart

### Chart Selection Guide

**Comparison**
- **Bar chart:** Compare values across categories (vertical or horizontal)
- **Column chart:** Compare values across categories (vertical only)
- **Grouped bar chart:** Compare subcategories within main categories
- **Bullet chart:** Compare actual vs. target, with context ranges

**Trend over time**
- **Line chart:** Continuous data over time (most common)
- **Area chart:** Show volume/magnitude over time
- **Spline chart:** Smooth trends (use sparingly; smoothing can misrepresent data)

**Distribution**
- **Histogram:** Frequency distribution of continuous data
- **Box plot:** Statistical distribution with quartiles and outliers
- **Violin plot:** Distribution shape + box plot statistics
- **Density plot:** Smooth distribution curve

**Relationship**
- **Scatter plot:** Correlation between two variables
- **Bubble chart:** Scatter plot with third variable as size
- **Heatmap:** Correlation matrix or 2D density

**Composition**
- **Stacked bar chart:** Parts of whole across categories
- **Stacked area chart:** Parts of whole over time
- **Treemap:** Hierarchical part-to-whole relationships
- **Sunburst:** Multi-level hierarchical composition

**Avoid entirely:**
- **Pie charts:** Hard to compare angles, poor for >3 categories
- **Donut charts:** Same problems as pie charts
- **3D charts:** Distorts data, hard to read
- **Radar charts:** Hard to compare, perceptually inaccurate
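
The guide above can be kept as a small lookup so that a design tool or review script suggests candidates from the user's goal. This is a sketch only: the goal keys and the `suggest_charts` helper are names chosen for this example.

```python
# Goal -> candidate chart types, copied from the selection guide above.
CHART_GUIDE: dict[str, list[str]] = {
    "comparison": ["bar chart", "column chart", "grouped bar chart", "bullet chart"],
    "trend": ["line chart", "area chart", "spline chart"],
    "distribution": ["histogram", "box plot", "violin plot", "density plot"],
    "relationship": ["scatter plot", "bubble chart", "heatmap"],
    "composition": ["stacked bar chart", "stacked area chart", "treemap", "sunburst"],
}


def suggest_charts(goal: str) -> list[str]:
    """Return candidate chart types for a goal such as 'trend'."""
    key = goal.strip().lower()
    if key not in CHART_GUIDE:
        raise KeyError(f"unknown goal {goal!r}; expected one of {sorted(CHART_GUIDE)}")
    return CHART_GUIDE[key]


print(suggest_charts("Trend"))  # ['line chart', 'area chart', 'spline chart']
```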

---

## 4. Design Best Practices

### Color in Data Visualization

**Principle:** Use color to highlight, not decorate.

**Sequential color scales (continuous data):**
- Use for: Temperature, altitude, intensity
- Example: Light blue → Dark blue
- Rule: Single hue, varying lightness

**Diverging color scales (deviation from midpoint):**
- Use for: Positive/negative values, sentiment, profit/loss
- Example: Blue → White → Red (avoid red/green pairs, which are hard to tell apart with the most common forms of color blindness)
- Rule: Two hues meeting at neutral midpoint

**Categorical color scales (distinct categories):**
- Use for: Product lines, regions, segments
- Example: Blue, Orange, Teal, Purple (distinct colors)
- Rule: Maximum 7-10 categories, colorblind-safe palette

**Color accessibility:**
- Test with color blindness simulators
- Don't rely on color alone (use patterns, labels, icons)
- Maintain contrast ratio of 3:1 for data points
- Use OKLCH for perceptually uniform colors
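
Contrast ratios like the 3:1 rule above can be checked in code instead of by eye. The sketch below follows the WCAG 2.x relative-luminance and contrast-ratio formulas; the function names are this example's own, and it accepts only six-digit hex colors.

```python
def _linear(channel: int) -> float:
    """Convert an 8-bit sRGB channel to linear light (WCAG 2.x definition)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a '#rrggbb' color, from 0 (black) to 1 (white)."""
    h = hex_color.lstrip("#")
    if len(h) != 6:
        raise ValueError("expected a six-digit hex color such as '#525252'")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(color_a: str, color_b: str) -> float:
    """WCAG contrast ratio between two colors, from 1.0 to 21.0."""
    lum_a, lum_b = relative_luminance(color_a), relative_luminance(color_b)
    lighter, darker = max(lum_a, lum_b), min(lum_a, lum_b)
    return (lighter + 0.05) / (darker + 0.05)


# A data color passes the 3:1 rule against the chart background if:
print(contrast_ratio("#525252", "#ffffff") >= 3.0)  # True
```

The same function covers text: body-size labels need 4.5:1 under WCAG AA, while graphical elements such as bars and lines need 3:1.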

### Typography

**Principle:** Text should support data, not compete with it.

**Guidelines:**
- **Font:** Sans-serif (Inter, Roboto, system-ui)
- **Size:** At least 12px for axis labels (12-14px is typical), 14-16px for annotations
- **Weight:** Regular for labels, Semibold for emphasis
- **Color:** #525252 or darker on light backgrounds (WCAG AA compliance)

**Label placement:**
- Axis labels: Left of Y-axis, below X-axis
- Data labels: Directly on/near data points (not in legend)
- Annotations: Near relevant data with pointer line

### Layout and Spacing

**Principle:** Give data room to breathe.

**Guidelines:**
- **Margins:** Minimum 40-60px on all sides
- **Axis spacing:** 20-30px between axis and chart area
- **Grid lines:** Light (#e5e5e5 or lighter), dashed or dotted
- **Padding:** 8-12px between chart elements

**Aspect ratio:**
- **Time series:** Width ≥ 2x height (wide format preferred)
- **Comparison charts:** Square or slightly wide
- **Small multiples:** Consistent aspect ratio across all charts

---

## 5. Interactive Visualizations

### Interaction Design Principles

**Principle:** Interactivity should reveal, not obscure.

**Essential interactions:**
1. **Tooltip:** Show exact values on hover/click
2. **Filter/zoom:** Focus on specific data subsets
3. **Highlight:** Dim non-selected data, emphasize selection
4. **Details on demand:** Click/drill down for more granularity

**Interaction feedback:**
- **Hover state:** Subtle brightness increase (10-15%)
- **Click state:** Distinct outline or color change
- **Loading state:** Skeleton or spinner within chart area
- **Empty state:** Clear message + illustration

**Progressive disclosure:**
- Start with high-level overview
- Allow drill-down to details
- Maintain context (breadcrumbs, back button)
- Animate transitions (300-500ms)

---

## 6. Accessibility

### Making Data Visualizations Accessible

**Principle:** Data visualization should be understandable by everyone.

**Accessibility considerations:**
- **Color blindness:** Use distinct shapes + colors, test with simulators
- **Low vision:** Minimum contrast ratios, scalable text
- **Screen readers:** Provide data tables as alternative
- **Keyboard navigation:** All interactive elements reachable via Tab

**Semantic markup:**
```html
<figure>
  <!-- role="img" goes on the SVG only; on <figure> it would hide the table from screen readers -->
  <svg role="img" aria-labelledby="chart-title">...</svg>
  <figcaption id="chart-title">
    Monthly revenue showing 20% growth from January to June
  </figcaption>
  <table class="sr-only">
    <!-- Data table for screen readers -->
  </table>
</figure>
```

**Text alternatives:**
- Provide descriptive alt text for the chart
- Include data table as fallback
- Summarize key insights in text
- Use ARIA attributes for interactive elements
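
The data-table fallback can be generated from the same rows that feed the chart, so the two never drift apart. A minimal sketch using only the standard library; `data_table_html` is a hypothetical helper, and `sr-only` is the visually-hidden class from the markup above, which the page's stylesheet is assumed to define.

```python
from html import escape


def data_table_html(caption: str, headers: list[str], rows: list[list[object]]) -> str:
    """Build a screen-reader data table; all cell text is HTML-escaped."""
    head = "".join(f'<th scope="col">{escape(h)}</th>' for h in headers)
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(str(cell))}</td>" for cell in row) + "</tr>"
        for row in rows
    )
    return (
        '<table class="sr-only">'
        f"<caption>{escape(caption)}</caption>"
        f"<thead><tr>{head}</tr></thead>"
        f"<tbody>{body}</tbody>"
        "</table>"
    )


# Hypothetical values, for illustration only.
table = data_table_html(
    "Monthly revenue",
    ["Month", "Revenue"],
    [["January", 100], ["June", 120]],
)
print(table)
```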

---

## 7. Common Data Visualization Mistakes

### 1. Truncated Y-Axis

**Problem:** Y-axis doesn't start at zero, exaggerating differences.

**When it's OK:** Time series with small variations (stock prices, temperature)

**When to avoid:** Bar charts (truncation makes bars misleading)

**Solution:** Start bar charts at zero. For line charts, consider breaking the axis with a clear visual indicator.
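
The baseline rule can be expressed as a small function that picks the lower bound of the y-axis from the chart type. A sketch under stated assumptions: the 5% padding for line charts is an arbitrary illustrative default, and `y_axis_min` is not a function from any charting library.

```python
def y_axis_min(values: list[float], chart_type: str, padding: float = 0.05) -> float:
    """Pick the lower y-axis bound for a 'bar' or 'line' chart."""
    lowest = min(values)
    if chart_type == "bar":
        # Bar length encodes the value, so the axis must include zero.
        return min(0.0, lowest)
    if chart_type == "line":
        # Position encodes the value, so a padded, truncated axis is acceptable.
        span = (max(values) - lowest) or 1.0
        return lowest - padding * span
    raise ValueError(f"unsupported chart type: {chart_type!r}")


prices = [98.0, 100.0, 102.0]
print(y_axis_min(prices, "bar"))   # 0.0
print(y_axis_min(prices, "line"))  # just below 98, so small moves stay visible
```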

### 2. Too Many Data Points

**Problem:** Overcrowded chart, unreadable labels, information overload.

**Solution:**
- Aggregate data (hourly → daily → weekly)
- Use small multiples
- Implement zoom/filter
- Show top N + "other" category
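
The top N + "other" approach is simple to apply before the data reaches the chart. A minimal sketch; the function name and the sample category totals are hypothetical.

```python
def top_n_with_other(
    totals: dict[str, float], n: int, other_label: str = "Other"
) -> dict[str, float]:
    """Keep the n largest categories and fold the rest into one bucket."""
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    result = dict(ranked[:n])
    rest = ranked[n:]
    if rest:
        result[other_label] = sum(value for _, value in rest)
    return result


sales = {"A": 50, "B": 30, "C": 10, "D": 5, "E": 5}
print(top_n_with_other(sales, 2))  # {'A': 50, 'B': 30, 'Other': 20}
```

The total is preserved, so percentages computed from the reduced data still add up to 100%.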

### 3. Wrong Chart Type

**Problem:** Using pie charts for comparison, line charts for categorical data.

**Solution:** Use the chart selection guide above. Match chart type to data relationship and user goal.

### 4. Misleading Scales

**Problem:** Inconsistent time intervals, unequal bin sizes in histograms.

**Solution:** Always use consistent scales. If you must break scale, use clear visual indicators.
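
For histograms, consistent scales mean equal-width bins. The sketch below computes the edges and counts by hand to make the rule explicit; in practice a charting or numerics library would do this. The function name is this example's own.

```python
def equal_width_bins(values: list[float], bin_count: int) -> tuple[list[float], list[int]]:
    """Return (edges, counts) for bin_count bins of identical width."""
    lo, hi = min(values), max(values)
    if bin_count < 1 or hi == lo:
        raise ValueError("need at least one bin and a non-zero value range")
    width = (hi - lo) / bin_count
    edges = [lo + i * width for i in range(bin_count + 1)]
    counts = [0] * bin_count
    for v in values:
        # The maximum value belongs to the last bin, not a bin past the end.
        index = min(int((v - lo) / width), bin_count - 1)
        counts[index] += 1
    return edges, counts


edges, counts = equal_width_bins([0, 1, 2, 3, 4, 5, 6, 7, 8, 10], bin_count=5)
print(edges)   # [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]
print(counts)  # [2, 2, 2, 2, 2]
```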

### 5. No Context

**Problem:** Numbers without meaning, trends without comparison.

**Solution:**
- Add reference lines (average, target, previous period)
- Include contextual annotations
- Show change percentage alongside absolute values
- Provide comparison to benchmark
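
Two of these context elements are one-line calculations: the percentage change against a previous period, and the values at which to draw reference lines. A sketch with hypothetical helper names and sample numbers.

```python
from statistics import mean
from typing import Optional


def percent_change(current: float, previous: float) -> float:
    """Change from previous to current, in percent."""
    if previous == 0:
        raise ValueError("percent change is undefined when previous is 0")
    return (current - previous) / abs(previous) * 100


def reference_lines(values: list[float], target: Optional[float] = None) -> dict[str, float]:
    """Y-values at which to draw reference lines on the chart."""
    lines = {"average": mean(values)}
    if target is not None:
        lines["target"] = target
    return lines


print(f"{percent_change(120, 100):+.0f}%")     # +20%
print(reference_lines([10, 20, 30], target=25))  # {'average': 20, 'target': 25}
```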

### 6. Decorative Over Data

**Problem:** 3D effects, gradients, animations that distract from data.

**Solution:** Remove all non-data-ink. Every element must serve understanding.

---

## 8. Data Storytelling

### From Charts to Insights

**Principle:** Good visualizations answer questions, not just display data.

**Story structure:**
1. **Context:** What data is this? Why does it matter?
2. **Pattern:** What's the key trend or insight?
3. **Explanation:** What caused this pattern?
4. **Implication:** What should we do about it?

**Annotation framework:**
- **Title:** Action-oriented insight ("Sales grew 20% in Q3", not "Sales Chart")
- **Subtitle:** Context and timeframe
- **Callouts:** Highlight key data points with annotations
- **Summary:** 1-2 sentence takeaway

**Example transformation:**
```
❌ Bad: "Revenue by Month 2024"

✅ Good: "Q4 drove 45% of annual revenue, fueled by holiday surge"
Subtitle: Monthly revenue January - December 2024
```
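
The numeric part of an insight title like the one above can be derived from the data rather than typed by hand, which keeps the headline accurate when the numbers change. A sketch with hypothetical revenue figures and helper names; the causal part ("fueled by holiday surge") still has to come from someone who knows the business.

```python
def quarter_shares(monthly: list[float]) -> list[float]:
    """Share of the annual total contributed by each quarter."""
    if len(monthly) != 12:
        raise ValueError("expected 12 monthly values")
    total = sum(monthly)
    if total == 0:
        raise ValueError("annual total is zero")
    return [sum(monthly[q * 3:(q + 1) * 3]) / total for q in range(4)]


def insight_title(monthly: list[float]) -> str:
    """Headline naming the strongest quarter and its share."""
    shares = quarter_shares(monthly)
    best = max(range(4), key=lambda q: shares[q])
    return f"Q{best + 1} drove {shares[best]:.0%} of annual revenue"


# Hypothetical figures, January through December.
monthly_revenue = [50, 50, 50, 60, 60, 60, 60, 60, 100, 150, 150, 150]
print(insight_title(monthly_revenue))  # Q4 drove 45% of annual revenue
```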

---

## 9. Responsive Data Visualization

### Designing for All Screen Sizes

**Principle:** Data visualizations must work on mobile, tablet, and desktop.

**Mobile-first approach:**
- **Small screens (375px):**
  - Simplified charts (1-2 data series)
  - Vertical orientation preferred
  - Larger touch targets (44×44px minimum)
  - Minimal axis labels (show key points only)
  - Horizontal scroll for wide charts (with indicator)

- **Medium screens (768px):**
  - 2-3 data series
  - Standard aspect ratios
  - Full axis labels

- **Large screens (1440px+):**
  - Multiple data series
  - Small multiples
  - Rich interactions and annotations

**Responsive techniques:**
- Use SVG for scalability
- Implement conditional rendering (simplify on mobile)
- Horizontal scroll with visual indicator
- Portrait orientation on mobile, landscape on desktop
- Touch-optimized interactions (larger hit areas)
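
Conditional rendering usually starts with a function that maps viewport width to chart settings. The sketch below encodes the three tiers above. Treating 768px and 1440px as the cut-over points is an assumption (the guide names typical widths, not exact breakpoints), and `ChartConfig` is a made-up structure for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChartConfig:
    max_series: Optional[int]  # None means no cap
    full_axis_labels: bool
    small_multiples: bool
    min_touch_target_px: int = 44


def config_for_width(viewport_px: int) -> ChartConfig:
    """Pick chart settings for a viewport width in CSS pixels."""
    if viewport_px < 768:    # small screens: simplify
        return ChartConfig(max_series=2, full_axis_labels=False, small_multiples=False)
    if viewport_px < 1440:   # medium screens
        return ChartConfig(max_series=3, full_axis_labels=True, small_multiples=False)
    return ChartConfig(max_series=None, full_axis_labels=True, small_multiples=True)


print(config_for_width(375))
```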

---

## 10. Testing and Validation

### Usability Testing for Data Visualizations

**Principle:** Test with real users, real data, real questions.

**Test setup:**
- **Participants:** 5-8 users from target audience
- **Tasks:** Answer specific questions using the visualization
- **Metrics:** Accuracy, time to insight, confidence level

**Test questions:**
- "What is the main trend shown in this chart?"
- "Which month had the highest value?"
- "How does X compare to Y?"
- "What would you do based on this data?"

**Success criteria:**
- 80%+ accuracy on interpretation questions
- Time to insight < 10 seconds for simple questions
- User confidence rating 4/5 or higher
- No major misinterpretations

**Common issues to watch for:**
- Misread axis scales
- Confusion about color encoding
- Inability to extract specific values
- Wrong chart type for the question
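
The quantitative success criteria can be scored straight from session notes. A sketch under stated assumptions: it uses the mean for time and confidence, which the criteria above do not specify, and both `TaskResult` and the sample session are hypothetical. "No major misinterpretations" remains a judgment call for the facilitator.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class TaskResult:
    correct: bool
    seconds_to_answer: float
    confidence: int  # self-reported, 1 (low) to 5 (high)


def check_success_criteria(results: list[TaskResult]) -> dict[str, bool]:
    """Compare one test round against the three numeric thresholds."""
    if not results:
        raise ValueError("no results to evaluate")
    accuracy = sum(r.correct for r in results) / len(results)
    return {
        "accuracy_80_percent": accuracy >= 0.80,
        "time_under_10s": mean(r.seconds_to_answer for r in results) < 10,
        "confidence_4_of_5": mean(r.confidence for r in results) >= 4,
    }


session = [
    TaskResult(True, 4.0, 4),
    TaskResult(True, 6.0, 5),
    TaskResult(True, 8.0, 4),
    TaskResult(True, 9.0, 4),
    TaskResult(False, 12.0, 5),
]
print(check_success_criteria(session))
```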

---

## 11. Tools and Libraries

### Chart Libraries (2024+)

**Web:**
- **D3.js:** Maximum flexibility, steep learning curve
- **Chart.js:** Easy to get started, good for common charts
- **Recharts:** React-based, composable, declarative
- **Victory:** React-based, consistent API
- **Observable Plot:** D3-based, simpler API for common charts

**Design tools:**
- **Figma:** Manual chart creation, use for mockups
- **Tableau:** Business intelligence, drag-and-drop
- **Datawrapper:** Journalism and publication
- **Looker Studio (formerly Google Data Studio):** Dashboards and reporting

**Color tools:**
- **ColorBrewer:** Sequential, diverging, qualitative palettes
- **OKLCH picker:** Perceptually uniform color spaces
- **Chroma.js:** Color scales for data visualization

---

## 12. Quick Checklist

### Before Finalizing a Visualization

**Data accuracy:**
- [ ] Data source is cited and trustworthy
- [ ] Scales are accurate and consistent
- [ ] Calculations are correct (percentages, averages)
- [ ] Time periods are clearly labeled

**Visual design:**
- [ ] Chart type matches data relationship
- [ ] Color scale matches data type (sequential/diverging/categorical)
- [ ] Text is readable (12px minimum)
- [ ] Sufficient spacing and margins
- [ ] No chart junk or decorative elements

**Accessibility:**
- [ ] Color blindness tested
- [ ] Contrast ratios meet WCAG AA (4.5:1 for text, 3:1 for graphical elements)
- [ ] Screen reader alternative provided
- [ ] Keyboard navigation works

**Context and clarity:**
- [ ] Descriptive title with insight
- [ ] Axis labels are clear and complete
- [ ] Legend is necessary (or eliminated)
- [ ] Annotations explain key points
- [ ] Data source and timeframe noted

**Testing:**
- [ ] Tested with target users
- [ ] Validated on mobile (375px)
- [ ] Checked in light and dark mode
- [ ] Verified accuracy with domain expert

---

## 13. Example Transformations

### Before and After

**Before: Pie Chart**
```
❌ Bad:
- 7 slices, hard to compare
- 3D effect distorts angles
- Legend requires eye movement
- No context or insight
```

**After: Bar Chart**
```
✅ Good:
- Horizontal bars, easy to compare
- Ranked by value
- Direct labeling (no legend)
- Clear title with insight
- Reference line for average
```

**Before: Multi-Line Time Series**
```
❌ Bad:
- 12 overlapping lines
- Colors indistinguishable
- No callouts or context
- Hard to extract insights
```

**After: Small Multiples**
```
✅ Good:
- 12 small charts, one per metric
- Easy comparison
- Clear trend in each
- Consistent scale across all
```

---

**Remember:** The goal of data visualization is understanding, not decoration. Every element should serve clarity, accuracy, and insight.

**Good data visualization is invisible: users see the insights, not the design.**