Storyteller Engine
Purpose
Storyteller Engine transforms raw business data into compelling, human-readable narratives for executive reports, marketing materials, and data journalism. Real-world applications include:
- Quarterly Business Reviews: Convert sales metrics, churn rates, and revenue data into executive summary narratives with trend analysis
- Customer Insights: Transform NPS scores, survey responses, and demographic data into customer persona stories
- Operational Reports: Generate daily/weekly operational summaries from incident logs, system metrics, and performance data
- Financial Narratives: Create earnings call prep documents from financial statements and KPI dashboards
- Marketing Performance: Turn campaign analytics (CTR, conversion rates, ROI) into stakeholder-ready performance stories
Scope
Core Commands
storyteller generate [OPTIONS]
Generates narrative from input data file. Options:
--input PATH: Path to CSV/JSON/Excel file (required)--template NAME: Template name from template library or path to custom Jinja2 template--output PATH: Output file path (default: stdout)--format {markdown,html,pdf,txt}: Output format (default: markdown)--tone {professional,casual,narrative,technical}: Writing style (default: professional)--context TEXT: Additional context to include in prompt--max-length INT: Maximum word count (default: 1000)--temperature FLOAT: AI creativity 0.0-2.0 (default: 0.7)--role {executive,marketing,technical,general}: Audience persona
storyteller validate [OPTIONS]
Validates input data against schema requirements. Options:
--input PATH: Data file to validate--schema PATH: JSON Schema definition (default: auto-detect)--strict: Fail on missing optional fields
storyteller templates list
Lists available built-in templates with metadata.
storyteller templates show NAME
Displays template content and required fields.
storyteller config set KEY VALUE
Sets configuration in ~/.storyteller/config.yaml
storyteller --version
Shows version and active model configuration.
Detailed Work Process
1. Data Preparation
Input data must be structured (CSV, JSON, or Excel) with column headers matching template field requirements. For a sales narrative template requiring quarter, revenue, growth_rate, region, your CSV must contain these exact column names.
Example data file (sales_data.csv):
quarter,revenue,growth_rate,region
Q1-2023,1250000,0.12,North America
Q2-2023,1380000,0.104,North America
2. Template Selection
Built-in templates are stored in ~/.storyteller/templates/ or /usr/local/share/storyteller/templates/. Custom templates are Jinja2 files with special {{ variable }} placeholders corresponding to data columns.
Template structure:
# {{ title }}
## {{ subtitle }}
**Period**: {{ quarter }}
Revenue performance for {{ region }} reached ${{ revenue }} in {{ quarter }}, representing a {{ growth_rate|percent }} increase over the previous period.
3. Execution
The engine:
- Loads and validates input data using Pydantic schemas
- Renders the Jinja2 template with data rows (aggregates if multiple rows)
- Constructs an OpenAI prompt with template output + context + tone instructions
- Calls the OpenAI API with configured model and temperature
- Post-processes response (truncation, formatting)
- Returns final narrative
4. Output Formats
- Markdown: Clean, formatted text (default)
- HTML: Includes basic styling and can be converted to PDF via
pandoc - PDF: Requires
pandocandwkhtmltopdfinstalled - Text: Plain text, stripped of formatting
Golden Rules
-
Schema Alignment: Column headers in input data must match field names in template. Case-sensitive. Use
storyteller validatebefore generation. -
Token Budgeting: Large datasets cause token overflow. Filter data beforehand or use templates with aggregation filters like
{{ revenue|sum }}. -
Context Injection: The
--contextflag adds crucial background the AI needs. Without it, narratives are generic. Example:--context "Target audience: C-level executives. Focus on ROI and strategic implications." -
Temperature Discipline:
0.0-0.3: Consistent, repetitive (good for standardized reports)0.4-0.7: Balanced creativity (recommended)0.8-1.2: Varied, engaging (use with context)>1.2: Unpredictable, may hallucinate (avoid for production)
-
Template Testing: Always test templates with
storyteller templates show TEMPLATEandstoryteller validatebefore full generation. -
Rate Limiting: The engine respects OpenAI rate limits automatically but add
--batch-size Nfor large files to process in chunks. -
API Key Management: Never pass
OPENAI_API_KEYin command line. Use environment variable orstoryteller config set openai_key.
Examples
Example 1: Executive Sales Summary
STORYTELLER_MODEL=gpt-4-turbo storyteller generate \
--input q4_sales.csv \
--template executive_summary \
--output exec_summary.md \
--tone professional \
--role executive \
--context "Year-end report for board meeting. Emphasize YoY growth and regional highlights. Include forward-looking statement." \
--max-length 800
Input (q4_sales.csv):
quarter,revenue,growth_rate,region,product_category
Q4-2023,4500000,0.18,North America,Enterprise
Q4-2023,3200000,0.12,Europe,SMB
Output (exec_summary.md):
# Q4 2023 Executive Summary
North America led revenue performance with $4.5M in Q4, achieving 18% growth driven by Enterprise segment expansion. Europe showed steady 12% growth totaling $3.2M, with SMB category outperforming expectations. Overall quarterly revenue surpassed targets by 7%, positioning the company for strong Q1 2024. Recommended focus: capitalize on North American Enterprise momentum and replicate success in European markets.
Example 2: Customer Survey Narrative
storyteller generate \
--input survey_responses.json \
--template customer_persona \
--format markdown \
--tone narrative \
--context "Create 3 distinct customer personas based on satisfaction scores and feedback. Use vivid language." \
--max-length 1500
Input (survey_responses.json):
[
{"segment": "Enterprise", "nps": 9, "feedback": "Reliable, scales well", "usage_months": 24},
{"segment": "Startup", "nps": 7, "feedback": "Affordable but limited features", "usage_months": 3}
]
Output (excerpt):
## Persona: The Scaling Enterprise
"The platform has become integral to our infrastructure—it simply works at scale." This Enterprise customer, with two years of usage, awarded a perfect NPS of 9. Their feedback highlights reliability as the key driver...
Example 3: Operations Log to Daily Report
storyteller generate \
--input incidents_2024-03-07.csv \
--template ops_daily \
--output daily_ops.html \
--format html \
--tone technical \
--role engineering \
--context "Audience: engineering leadership. Include severity breakdown and MTTR." \
--max-length 1200
Input (incidents_2024-03-07.csv):
timestamp,severity,service,resolution_time_minutes,description
2024-03-07 02:15:00,critical,auth-service,45,TLS cert expired
2024-03-07 08:30:00,high,api-gateway,12,Rate limiter misconfiguration
Output (daily_ops.html):
<h1>Daily Operations Report: March 7, 2024</h1>
<p><strong>Total Incidents:</strong> 2 (1 critical, 1 high)</p>
<p>The day began with a critical outage in the authentication service at 02:15 UTC caused by an expired TLS certificate. The issue was resolved within 45 minutes. A secondary high-severity incident in the API gateway at 08:30 UTC stemmed from a rate limiter misconfiguration and was contained within 12 minutes. <strong>Mean Time to Resolution (MTTR):</strong> 28.5 minutes.</p>
Rollback Commands
If generated output is unsatisfactory or corrupted:
storyteller rollback last
Reverts the most recent generate operation by restoring the previous output file from backup (.storyteller-backup-<timestamp>.md).
storyteller rollback --output PATH
Specifically rolls back the output at PATH if it was generated by storyteller (checks embedded metadata footer).
storyteller rollback --all --date YYYY-MM-DD
Mass rollback: removes all outputs generated on the specified date and restores from backups if available.
Note: Rollback works only if --output was used (not stdout). Backups are automatically created during generation unless --no-backup is passed.
Dependencies & Requirements
- Python: 3.9+ with pip
- OpenAI API: Valid API key with GPT-4 or GPT-3.5-turbo access
- Data Files: CSV must be UTF-8 encoded; JSON must be array of objects; Excel requires
openpyxl - Templates: Jinja2 syntax; special filters available:
|percent,|currency,|titlecase,|aggregate:sum/avg/count - Disk Space: ~100MB for installation; templates typically <50KB each
- Network: Outbound HTTPS to OpenAI API (status.openai.com check on startup)
Installation:
pip install storyteller-engine[full]
# for CSV/Excel support:
pip install storyteller-engine[data]
Configuration:
storyteller config set openai_key sk-...
storyteller config set default_model gpt-4-turbo
storyteller config set template_dir ~/.mytemplates
Verification Steps
After generation, verify with:
# Check output file is not empty and has expected keywords
grep -q "revenue" exec_summary.md && echo "PASS" || echo "FAIL"
# Validate against template field coverage
storyteller validate --input q4_sales.csv --template executive_summary
# Token usage estimation (dry-run)
storyteller generate --input data.csv --template temp --dry-run
# Shows prompt tokens, max completion tokens, estimated cost
# Schema check only
storyteller validate --input data.csv --strict
Expected exit codes:
0: Success (narrative generated and passed quality checks)1: Validation failed (data doesn't match template)2: API error (OpenAI unreachable or quota exceeded)3: Template error (syntax error or missing required variable)4: Output already exists (use--forceto overwrite)
Troubleshooting
Error: "Template variable 'revenue' not found in data"
- Cause: Column header mismatch between CSV and template.
- Fix: Ensure CSV column names exactly match template
{{ revenue }}. Usestoryteller templates show NAMEto see required fields.
Error: "Context length exceeded"
- Cause: Input data too large for model's context window.
- Fix: Filter data to essential rows; use aggregation in template (
{{ revenue|sum }}instead of iterating). Or use--max-lengthto reduce output size.
Output Generic / Hallucinated
- Cause:
--contexttoo vague or missing; temperature too high. - Fix: Provide specific context with
--context "Focus on Q4 2023, exclude Q3 comparisons". Lower--temperatureto 0.4-0.6.
Rate Limit Errors
- Cause: Too many rapid requests.
- Fix: Add
--batch-size 5to process 5 rows at a time. Or implement exponential backoff withstoryteller config set retry_policy exponential.
No Output File Created
- Cause: Output path directory doesn't exist.
- Fix: Create directory first:
mkdir -p $(dirname output.md). Or use absolute path.
Rollback Not Working
- Cause: Backup file missing (was
--no-backupused or file manually deleted?). - Fix: Rollbacks require backup files. Regular cleanup removes backups older than 30 days. Restore from version control if available.
Validation Fails on Optional Field
- Cause: Template uses
{{ optional_field }}but data missing column. - Fix: Either add column to data or modify template to use
{% if optional_field %}...{% endif %}. Use--strictonly for required fields.