Charting Intelligence & Data-to-Viz Pipeline Engineering

Enable any agent to (1) identify the chart that fits the raw data, (2) generate the queries and transforms needed to materialize that chart, and (3) evaluate and choose the charting library or stack based on performance, scale, and interactivity requirements.

This is not "just call a library" — it is full-stack visualization strategy.

1. Core Decision Framework — Choosing the Chart That Fits the Data AND the Story

Before any code runs, answer these questions in order:

What is the goal of the viewer?

Goal	Chart Type
Compare values	Bar/Column (grouped or stacked)
Show trend over time	Line or Area
Show distribution / spread	Histogram, Box Plot, Violin
Show relationship / correlation	Scatter, Bubble, Heatmap
Show composition / parts-of-whole	Stacked Bar or Area (never pie if >5 slices)
Show hierarchy / flow	Treemap, Sunburst, Sankey
Show geographic pattern	Choropleth or Symbol Map

How many variables and what types?

Variables	Chart
1 numeric, unordered	Histogram / Density
1 numeric + time	Line
1 categorical + 1 numeric	Bar
2 numeric	Scatter
1 categorical + time series	Grouped or Stacked Line/Area
Many-to-many relationships	Heatmap or Parallel Coordinates
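
A minimal sketch of how the variables table above could be encoded as a lookup. The function name `recommend_chart`, the three kind labels, and the choice to fall back to the last table row for any shape the table does not list are assumptions made for illustration.

```python
from typing import Sequence

# Keys are (numeric count, categorical count, has time axis).
# Values mirror the variables table; the key encoding itself is an assumption.
_CHART_BY_SHAPE = {
    (1, 0, False): "Histogram / Density",
    (1, 0, True): "Line",
    (1, 1, False): "Bar",
    (2, 0, False): "Scatter",
    (1, 1, True): "Grouped or Stacked Line/Area",
}


def recommend_chart(kinds: Sequence[str]) -> str:
    """Map column kinds ('numeric', 'categorical', 'time') to a chart type."""
    numeric = sum(k == "numeric" for k in kinds)
    categorical = sum(k == "categorical" for k in kinds)
    has_time = "time" in kinds
    # Assumption: shapes not listed in the table fall back to its last row.
    return _CHART_BY_SHAPE.get(
        (numeric, categorical, has_time), "Heatmap or Parallel Coordinates"
    )


print(recommend_chart(["categorical", "numeric"]))
```

The viewer-goal and audience checks would still need to run before this lookup, since the same column shape can call for different charts depending on the story.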

Audience & Context Check

Audience	Approach
Executive dashboard	Big numbers + simple bars/lines, zero clutter
Analyst/explorer	Interactive tooltips, zoom, hover details, multiple linked views
Mobile	Horizontal bars, large text, minimal colors
Accessibility	High contrast, patterns instead of color-only, alt-text descriptions

Rule of Thumb Table

Data Situation	Best Chart (first choice)	Avoid
>5 categories	Bar (horizontal)	Pie
Time series >20 points	Line	Column
Correlation between 2 measures	Scatter	Line (unless ordered)
Parts of whole >5 slices	Stacked Bar or Treemap	Pie/Donut
Outliers or distribution shape	Box Plot or Violin	Bar
Flow between stages	Sankey	Anything else

2. The Data Pipeline Engine

Most databases do NOT have the exact aggregation ready. Auto-generate the full pipeline:

Step A — Inventory

Scan schema or sample 100 rows — detect column types, null rates, cardinality
Flag missing aggregations (e.g., "no daily_sales_by_region view exists")
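
The inventory step might look like the following sketch, which profiles a sample of rows held as dictionaries. The helper name `profile_columns` and the idea of inferring a column type from the most common Python type in the sample are assumptions; a real implementation would more likely read the database schema or use pandas dtypes.

```python
from collections import Counter
from typing import Any


def profile_columns(rows: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Summarize each column of a row sample: type, null rate, cardinality."""
    profile = {}
    columns = {key for row in rows for key in row}
    for col in sorted(columns):
        values = [row.get(col) for row in rows]
        present = [v for v in values if v is not None]
        # Use the most common Python type among non-null values as the type.
        type_counts = Counter(type(v).__name__ for v in present)
        inferred = type_counts.most_common(1)[0][0] if present else "unknown"
        profile[col] = {
            "type": inferred,
            "null_rate": 1 - len(present) / len(values),
            "cardinality": len(set(present)),
        }
    return profile


# Hypothetical sample standing in for "sample 100 rows".
sample = [
    {"region": "EU", "sales": 10.0},
    {"region": "US", "sales": None},
    {"region": "EU", "sales": 5.0},
    {"region": "EU", "sales": 7.5},
]
for name, stats in profile_columns(sample).items():
    print(name, stats)
```

Low cardinality on a string column suggests a categorical axis, while a high null rate is a signal to consider imputation in Step B.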

Step B — Required Transformations

Auto-generate SQL or pandas code for:

Joins needed?
GROUP BY + SUM/AVG/COUNT?
Window functions for running totals or YoY?
Binning (e.g., age into decades)?
Pivot/unpivot?
Outlier flagging or imputation?
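
Three of the transformations listed above, sketched in plain Python so the logic is visible without a database. The helper names are hypothetical, and in practice each would usually be generated as SQL (GROUP BY, a window function, a CASE expression) or as the pandas equivalent.

```python
from collections import defaultdict
from itertools import accumulate


def sum_by(rows, key, value):
    """GROUP BY key with SUM(value), returned in sorted key order."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row[value]
    return dict(sorted(totals.items()))


def running_total(values):
    """Cumulative sum, the counterpart of SUM(...) OVER (ORDER BY ...)."""
    return list(accumulate(values))


def decade_bin(age):
    """Bin an age into a decade label, for example 34 becomes '30-39'."""
    low = (age // 10) * 10
    return f"{low}-{low + 9}"


# Hypothetical rows for illustration.
rows = [
    {"day": "2024-01-01", "amount": 10},
    {"day": "2024-01-02", "amount": 5},
    {"day": "2024-01-01", "amount": 2.5},
]
daily = sum_by(rows, "day", "amount")
print(daily)
print(running_total(list(daily.values())))
print(decade_bin(34))
```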

Step C — Materialization Strategy

Scale	Strategy
One-off (<10k rows)	Run query on-the-fly
Medium	Create materialized view or cached table
Large/Real-time	Pre-aggregate in Spark/DuckDB, incremental refresh
Extreme	Stream + windowed aggregates (Flink/Kafka)
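
As one illustration of the "cached table" option, the sketch below rebuilds an aggregate table with SQLite, which has no materialized views. The `sales` schema and its column names are assumptions, and the refresh here is a full rebuild, not the incremental refresh that the larger tiers call for.

```python
import sqlite3


def build_cached_table(conn: sqlite3.Connection) -> None:
    """Drop and rebuild the cached aggregate from the raw sales table."""
    conn.execute("DROP TABLE IF EXISTS daily_sales_by_region")
    conn.execute(
        """
        CREATE TABLE daily_sales_by_region AS
        SELECT sale_date, region, SUM(amount) AS total
        FROM sales
        GROUP BY sale_date, region
        """
    )
    conn.commit()


# Hypothetical raw data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_date TEXT, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("2024-01-01", "EU", 10.0),
        ("2024-01-01", "EU", 5.0),
        ("2024-01-01", "US", 7.0),
        ("2024-01-02", "EU", 3.0),
    ],
)
build_cached_table(conn)

# Charts now read the small cached table instead of scanning raw sales.
cached = conn.execute(
    "SELECT sale_date, region, total FROM daily_sales_by_region "
    "ORDER BY sale_date, region"
).fetchall()
print(cached)
```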

Step D — Validation

Run a tiny sample query first — confirm the shape matches the chosen chart type
If not, loop back and adjust aggregation
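
A sketch of what the shape check could look like for a stacked or multi-series chart, which needs exactly one numeric value per (x, series) pair. The function name `validate_shape` and the specific checks are assumptions, not an exhaustive validator.

```python
def validate_shape(rows, x, series, y):
    """Return a list of problems that would break a multi-series chart."""
    if not rows:
        return ["query returned no rows"]
    missing = {x, series, y} - set(rows[0])
    if missing:
        return [f"missing columns: {sorted(missing)}"]
    problems = []
    seen = set()
    for row in rows:
        key = (row[x], row[series])
        # More than one row per (x, series) means the GROUP BY is too fine.
        if key in seen:
            problems.append(f"duplicate point for {key}")
        seen.add(key)
        if not isinstance(row[y], (int, float)):
            problems.append(f"non-numeric {y} at {key}")
    return problems


# Hypothetical sample result from a tiny validation query.
sample_result = [
    {"month": "2024-01", "category": "Books", "revenue": 25.0},
    {"month": "2024-01", "category": "Games", "revenue": 30.0},
    {"month": "2024-01", "category": "Books", "revenue": 4.0},
]
print(validate_shape(sample_result, "month", "category", "revenue"))
```

A non-empty result is the signal to loop back and adjust the aggregation before rendering anything.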

Example

User says "show monthly revenue by product category":

"I need: LEFT JOIN orders -> products -> categories; GROUP BY month, category; SUM(revenue). No view exists -> I will create temp table or run inline. Chart type: Stacked Area. Library recommendation below."

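The monthly revenue example can be sketched end to end with an in-memory SQLite database. The three-table schema, the column names, and the sample rows are all assumptions made for illustration; only the join path, grouping, and SUM(revenue) come from the example itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE products (id INTEGER PRIMARY KEY, category_id INTEGER);
    CREATE TABLE orders (order_date TEXT, product_id INTEGER, revenue REAL);
    INSERT INTO categories VALUES (1, 'Books'), (2, 'Games');
    INSERT INTO products VALUES (10, 1), (11, 2);
    INSERT INTO orders VALUES
        ('2024-01-05', 10, 20.0),
        ('2024-01-20', 10, 5.0),
        ('2024-01-09', 11, 30.0),
        ('2024-02-02', 11, 12.5);
    """
)

# Inline query: no view exists, so aggregate on the fly.
MONTHLY_REVENUE_SQL = """
    SELECT strftime('%Y-%m', o.order_date) AS month,
           c.name AS category,
           SUM(o.revenue) AS revenue
    FROM orders AS o
    LEFT JOIN products AS p ON p.id = o.product_id
    LEFT JOIN categories AS c ON c.id = p.category_id
    GROUP BY month, category
    ORDER BY month, category
"""

rows = conn.execute(MONTHLY_REVENUE_SQL).fetchall()
for month, category, revenue in rows:
    print(month, category, revenue)
```

Each output row is one (month, category) point, which is the long format a stacked area chart expects.
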
3. Library Selection Matrix

Always output the performance trade-off and recommended stack.

Scale / Requirement	Recommended Library	Why	Fallback
<10k points, simple web dashboard	Chart.js or Recharts	<10 ms render, ~60 KB bundle	N/A
10k-500k points, interactive	Apache ECharts or Plotly.js	Canvas + WebGL, 60 fps on 100k points	D3 (slower)
500k-10M+ points, real-time	LightningChart or Highcharts Stock + WebGL	GPU accelerated, <50 ms at 5M points	Anything SVG-based fails
Python backend + web	Plotly Dash or Bokeh	Python callbacks run on the server, charts render in the browser	Matplotlib (static only)
Python notebook exploration	Seaborn + Plotly	Instant, beautiful defaults	--
Extremely large / streaming	DuckDB + Observable Plot or Perspective	In-process columnar aggregation before rendering; query time depends on data size and hardware	--
No JavaScript (PDF reports)	Matplotlib + WeasyPrint or ReportLab	Python-only toolchain with no browser runtime, vector output	--
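
The point-count tiers in the matrix can be sketched as a small selector. The function name, the two boolean flags, and the order in which they take precedence are assumptions; the thresholds and library names are taken from the matrix rows.

```python
def choose_library(
    points: int, python_stack: bool = False, static_output: bool = False
) -> str:
    """Pick a library tier from the matrix based on scale and constraints."""
    # Assumption: output constraints are checked before point counts.
    if static_output:
        return "Matplotlib + WeasyPrint or ReportLab"
    if python_stack:
        return "Plotly Dash or Bokeh"
    if points < 10_000:
        return "Chart.js or Recharts"
    if points <= 500_000:
        return "Apache ECharts or Plotly.js"
    return "LightningChart or Highcharts Stock + WebGL"


for estimate in (2_000, 150_000, 5_000_000):
    print(estimate, "->", choose_library(estimate))
```

The row estimate from the schema scan is the natural input here, and the chosen tier should be reported together with its trade-off, as the section requires.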

Optimization Rules (apply automatically)

Downsample for overview, show full detail on zoom (ECharts supports this through its series sampling option and dataZoom component)
Use Canvas instead of SVG above ~5k elements
Pre-aggregate at DB level whenever possible (biggest single win)
Lazy load charts below the fold
Bundle size: import only the chart types you use so the bundler can tree-shake the rest
GPU vs CPU: if >100k points and user needs pan/zoom, force WebGL path
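
To illustrate the downsampling rule, here is a sketch of min-max bucket downsampling, which keeps the lowest and highest point of each bucket so spikes stay visible in the overview. This is a generic technique chosen for illustration and is not the algorithm any particular library uses; the function name is hypothetical.

```python
def minmax_downsample(values, buckets):
    """Return (index, value) pairs keeping each bucket's min and max."""
    if buckets <= 0:
        raise ValueError("buckets must be positive")
    n = len(values)
    # Small series are returned unchanged; there is nothing to gain.
    if n <= 2 * buckets:
        return list(enumerate(values))
    kept = []
    for b in range(buckets):
        start = b * n // buckets
        end = (b + 1) * n // buckets
        chunk = range(start, end)
        lo = min(chunk, key=values.__getitem__)
        hi = max(chunk, key=values.__getitem__)
        # Emit in index order so the line is still drawn left to right.
        for i in sorted({lo, hi}):
            kept.append((i, values[i]))
    return kept


# Hypothetical flat series with one spike and one dip.
series = [0] * 100
series[37] = 50
series[80] = -9
overview = minmax_downsample(series, 10)
print(len(overview), "points kept out of", len(series))
```

On zoom, the full-resolution slice for the visible range would replace the overview points.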

4. Full Workflow

Parse intent — identify required chart type from user request
Schema scan — detect column types, cardinality, row estimates
Decision framework — output chart recommendation + rationale
Generate transforms — exact SQL/pandas/transform code needed
Choose library — select by performance tier based on row estimate
Emit deliverables:
- Chart spec (JSON for the library or React component)
- SQL/transform script
- Performance warning or confirmation
- Accessibility note + alt-text template
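
A sketch of the final "emit deliverables" step for a single-series line chart. The spec follows the general Chart.js configuration shape (type, data.labels, data.datasets); the function name, the alt-text wording, and the use of the 10k-point tier as the warning threshold are assumptions.

```python
import json


def build_deliverables(title, labels, values, row_estimate):
    """Bundle a chart spec, an alt-text string, and a performance note."""
    if not values or len(labels) != len(values):
        raise ValueError("labels and values must be non-empty and equal length")
    spec = {
        "type": "line",
        "data": {
            "labels": labels,
            "datasets": [{"label": title, "data": values}],
        },
    }
    alt_text = (
        f"Line chart of {title} with {len(values)} points, "
        f"from {values[0]} ({labels[0]}) to {values[-1]} ({labels[-1]}), "
        f"peak {max(values)}."
    )
    if row_estimate < 10_000:
        performance = "ok: row estimate is within the smallest library tier"
    else:
        performance = "warning: pre-aggregate or move to a Canvas/WebGL library"
    return {
        "chart_spec": json.dumps(spec),
        "alt_text": alt_text,
        "performance": performance,
    }


# Hypothetical aggregated result.
bundle = build_deliverables("Monthly revenue", ["Jan", "Feb", "Mar"], [10, 30, 20], 3)
print(bundle["alt_text"])
```

The SQL or transform script from the earlier steps would be attached alongside this bundle to complete the deliverables list.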

5. Advanced Capabilities

"Show me what I should be charting but aren't" — auto-correlation scan + suggested visuals
"Optimize this dashboard for 10x speed" — rewrite query + switch library
"Make this mobile-first" — auto-switch to horizontal bars + simplify
Color-blind & accessibility mode — toggle patterns, high contrast
Export — SVG/PNG/PDF with embedded data table
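
The correlation scan behind "show me what I should be charting but aren't" could start from something like this sketch, which computes pairwise Pearson correlations across numeric columns and suggests a scatter plot for strongly related pairs. The function names and the 0.7 threshold are assumptions, and a real scan would also need to handle nulls and non-linear relationships.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either column is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0
    return cov / (spread_x * spread_y)


def suggest_scatter_pairs(columns, threshold=0.7):
    """Return (col_a, col_b, r) for pairs with |r| at or above the threshold."""
    suggestions = []
    for (a, xs), (b, ys) in combinations(columns.items(), 2):
        r = pearson(xs, ys)
        if abs(r) >= threshold:
            suggestions.append((a, b, round(r, 3)))
    # Strongest relationships first.
    return sorted(suggestions, key=lambda s: -abs(s[2]))


# Hypothetical numeric columns.
columns = {
    "ad_spend": [1, 2, 3, 4, 5],
    "revenue": [2, 4, 6, 8, 10],
    "returns": [3, 1, 4, 1, 5],
}
print(suggest_scatter_pairs(columns))
```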