# tsfm-forecast

TSFM Forecast Skill: generates Python + DuckDB forecasting pipeline code using foundation models. Produces runnable code for local execution; it does NOT run models.
## When to Use This Skill
Activate when: Generating zero-shot time-series forecasting pipelines, selecting between TimesFM / Chronos / MOIRAI / Lag-Llama, preparing time-series data with DuckDB, building backtesting harnesses, comparing model accuracy, or producing client forecast deliverables.
Don't use for:
- ML model training or fine-tuning → use `python-data-engineering`
- Real-time or streaming forecasts → use `event-streaming`
- Scheduling or orchestrating forecast jobs → use `data-pipelines`
- Loading raw files into DuckDB without forecasting intent → use `duckdb`
## Scope Constraints
- Generates code only — does not execute models, run inference, or access data files.
- Local execution model: all generated code targets the user's machine; no cloud deployment scaffolding unless explicitly requested.
- Security tier default: Tier 1 (schema/metadata only). User must explicitly elevate to Tier 2 (sample data) or Tier 3 (full data access).
- Reference files loaded one at a time — never pre-load multiple references simultaneously.
- No cross-references to other specialist skills; use handoff protocol for adjacent work.
## Model Routing
| reasoning_demand | preferred | acceptable | minimum |
|---|---|---|---|
| medium | Sonnet | Sonnet, Opus | Sonnet |
## Core Principles
- Code-only — Generate self-contained, runnable Python + DuckDB code. Never execute models or read actual data.
- Foundation-model-first — Default to zero-shot TSFM inference; only recommend fine-tuning when dataset is large and domain is highly specialized.
- DuckDB-native — Use DuckDB SQL for all data preparation, gap detection, resampling, and export steps.
- Evaluation-driven — Always include a seasonal naive baseline and MASE metric; a model that doesn't beat naive doesn't justify deployment.
- Handoff-ready — Generate outputs (`forecasts.parquet`) in standardized shape so downstream skills (data-pipelines, client-delivery) can consume them directly.
## Model Selection Quick Reference
| Signal | Recommended Model | Why |
|---|---|---|
| Long horizon (>100 steps), univariate | TimesFM 2.5 | 16K context window, direct multi-horizon |
| Need prediction intervals, CPU-only | Chronos-Bolt-Small | Probabilistic, < 1 GB VRAM, fast |
| Multivariate + external regressors | MOIRAI 2.0 | Only TSFM with native covariate support |
| Intermittent/sparse demand (many zeros) | Lag-Llama | Normalizing flow handles zero-inflation |
Load model-registry.md for: full matrix, hardware requirements, installation commands, and known limitations.
## Procedure
Progress checklist (tick off as you complete each step):
- [ ] 1. Classify request mode
- [ ] 2. Select model
- [ ] 3. Generate data prep code
- [ ] 4. Generate inference code
- [ ] 5. Generate evaluation code (if eval/comparison mode)
- [ ] 6. Generate deliverable (if report mode)
Compaction recovery: If context is compressed mid-procedure, re-read this SKILL.md to restore context. Check which checklist items are complete, then resume from the next unchecked step.
### Step 1 — Classify request mode
Determine which mode applies before generating any code:
- Pipeline mode: User wants end-to-end forecast code (data prep → inference → output). Default mode.
- Eval/comparison mode: User wants to compare models or backtest accuracy. Add evaluation step.
- Report mode: User wants a client-ready deliverable. Add deliverable step after evaluation.
- Profile mode: User wants to assess data before selecting a model. Run ts_profiler.py recommendation first.
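One way to codify this triage is a small keyword heuristic. The keyword lists below are illustrative assumptions for this sketch, not part of the skill's specification:

```python
# Illustrative keyword heuristic for request-mode triage.
# The keyword tuples are assumptions, not a fixed spec.
MODE_KEYWORDS = {
    "eval": ("compare", "backtest", "accuracy", "which model"),
    "report": ("client-ready", "deliverable", "executive summary"),
    "profile": ("profile", "assess the data", "what model should"),
}

def classify_mode(request: str) -> str:
    """Return one of: pipeline (default), eval, report, profile."""
    text = request.lower()
    for mode, keywords in MODE_KEYWORDS.items():
        if any(kw in text for kw in keywords):
            return mode
    return "pipeline"  # end-to-end forecast code is the default mode
```

In practice the mode decision is a judgment call on the full request; a heuristic like this only captures the obvious phrasings.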
### Step 2 — Select model
Use the Quick Reference table above for default selection. Ask if ambiguous:
- How many series? (1 → any model; 50+ → Chronos or MOIRAI for batch efficiency)
- Need prediction intervals? (yes → Chronos or Lag-Llama)
- Any external regressors (promotions, holidays)? (yes → MOIRAI 2.0)
- Hardware constraints? (CPU-only → Chronos-Bolt-Small)
Load model-registry.md if user needs detailed comparison or hardware guidance.
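The routing questions above can be sketched as a decision function. The precedence order here (covariates first, then sparsity, then hardware) is an assumption of this sketch, mirroring the Quick Reference table:

```python
def select_model(
    need_covariates: bool,
    intermittent: bool,
    cpu_only: bool,
    need_intervals: bool,
) -> str:
    """Map the Step 2 answers to a default model from the Quick Reference table."""
    if need_covariates:
        return "MOIRAI 2.0"          # only listed TSFM with native covariate support
    if intermittent:
        return "Lag-Llama"           # handles zero-inflated / sparse demand
    if cpu_only or need_intervals:
        return "Chronos-Bolt-Small"  # probabilistic, fast, CPU-friendly
    return "TimesFM 2.5"             # long-horizon univariate default
```

Note that `need_intervals` could also route to Lag-Llama; this sketch prefers Chronos for batch efficiency.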
### Step 3 — Generate data preparation code
Load data-prep-patterns.md and generate DuckDB SQL for:
- Timestamp normalization → `ds` column (UTC, datetime64)
- Gap detection → report missing step count before filling
- Resampling to target frequency (SUM for demand, AVG for rate)
- Null/outlier handling (forward-fill + z-score flagging)
- Parquet export → `output/ts_ready.parquet` with `unique_id | ds | y` schema
Always confirm the `unique_id`, `ds`, and `y` column names match the source data, or generate a rename step.
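As a minimal sketch, the resampling SQL might be rendered from a Python template like the one below. The table name `ts_raw` and the helper itself are illustrative, not a fixed template from data-prep-patterns.md:

```python
def resample_sql(table: str, freq: str, agg: str = "SUM") -> str:
    """Render DuckDB SQL resampling a unique_id|ds|y table to a target frequency.

    Use agg="SUM" for demand series, agg="AVG" for rate series.
    """
    assert agg in ("SUM", "AVG"), "unexpected aggregate function"
    return (
        f"SELECT unique_id, date_trunc('{freq}', ds) AS ds, {agg}(y) AS y\n"
        f"FROM {table}\n"
        f"GROUP BY 1, 2\n"
        f"ORDER BY 1, 2"
    )
```

DuckDB's `date_trunc` accepts parts such as `'day'`, `'week'`, and `'month'`, so one template covers the common target frequencies.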
### Step 4 — Generate inference code
Load inference-templates.md and generate Python code for the selected model:
- Imports and model loading (with device detection: CUDA → MPS → CPU)
- Context window setup (load from `output/ts_ready.parquet` via pandas)
- `forecast()` call with horizon and quantile levels
- Output normalization to standard shape: `unique_id | ds | y_hat | y_hat_lower | y_hat_upper`
- Export → `output/forecasts.parquet`
Unload inference-templates.md after generating code before loading evaluation-metrics.md.
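The CUDA → MPS → CPU fallback can be isolated in a small helper. In the real script the flags would come from `torch.cuda.is_available()` and `torch.backends.mps.is_available()`; this sketch takes them as plain booleans so it stays framework-free:

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Return the preferred inference device string, in priority order."""
    if cuda_available:
        return "cuda"  # NVIDIA GPU first
    if mps_available:
        return "mps"   # Apple Silicon GPU second
    return "cpu"       # universal fallback
```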
### Step 5 — Generate evaluation code (eval/comparison mode only)
Load evaluation-metrics.md and generate:
- Temporal train/test split (test window = horizon × 3)
- Seasonal naive baseline for the target frequency
- MASE, SMAPE, RMSE calculation
- Coverage rate if model outputs prediction intervals
- Pass/Fail determination: MASE < 1.0 = beats naive
Unload evaluation-metrics.md after generating code.
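For reference, a dependency-free sketch of the MASE computation: the forecast's MAE is scaled by the in-sample MAE of the seasonal naive (seasonal period `m` is 1 for a plain naive baseline):

```python
def mase(y_true, y_pred, y_train, m: int = 1) -> float:
    """Mean Absolute Scaled Error: < 1.0 means the model beats the seasonal naive."""
    mae_forecast = sum(abs(a - f) for a, f in zip(y_true, y_pred)) / len(y_true)
    # In-sample MAE of the seasonal naive forecast (lag-m repeat).
    naive_errors = [abs(y_train[i] - y_train[i - m]) for i in range(m, len(y_train))]
    mae_naive = sum(naive_errors) / len(naive_errors)
    return mae_forecast / mae_naive
```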
### Step 6 — Generate deliverable (report mode only)
Load deliverable-templates.md and generate:
- Markdown report structure with metric placeholders
- Executive summary with forecast findings
- Accuracy comparison table (model vs. seasonal naive)
- Assumptions and limitations section (include verbatim)
## Security Posture
SECURITY: This skill generates code for local execution only. No network calls, credential access, or data file reads are performed by Claude. Generated scripts read local files via DuckDB — validate file paths before execution.
See Security & Compliance Patterns for the full framework. See Consulting Security Tier Model for tier definitions.
| Capability | Tier 1 (Default) | Tier 2 (Sampled) | Tier 3 (Full Access) |
|---|---|---|---|
| Data profiling | Column types / shape only | Stats on sample | Full profiling |
| Data prep code | Generate SQL (not execute) | Execute on samples | Execute on full data |
| Inference code | Generate only | Generate + run on samples | Generate + run on full data |
| Deliverables | Template with placeholders | Populated from samples | Fully populated |
## Input Sanitization
When user provides file paths for ts_profiler.py or data prep scripts:
- Validate path exists and is within the working directory before use in generated code.
- Reject paths containing `..`, `~`, environment variables, or shell metacharacters.
- Never interpolate user-provided strings directly into shell commands.
## Reference Files
Reference files loaded on demand — one at a time:
- model-registry.md — Full model comparison matrix, hardware requirements, installation commands, known limitations. Load at Step 2 if Quick Reference is insufficient.
- data-prep-patterns.md — DuckDB SQL patterns for timestamp normalization, gap detection, resampling, null handling, Parquet export. Load at Step 3.
- inference-templates.md — Python inference code for TimesFM, Chronos, MOIRAI, Lag-Llama, output normalization. Load at Step 4.
- evaluation-metrics.md — Backtesting protocol, MASE/SMAPE/RMSE/CRPS formulas, naive baseline, evaluation code patterns. Load at Step 5 (eval/comparison mode only).
- deliverable-templates.md — Executive summary, accuracy table, assumptions and limitations templates. Load at Step 6 (report mode only).
## Handoffs
- Scheduling generated forecasts → data-pipelines (Dagster assets or Airflow DAGs wrapping inference scripts)
- Client engagement setup → client-delivery (engagement scaffolding, security tier selection, client handoff)
- DuckDB data preparation → duckdb (local data exploration, file ingestion, profiling before forecasting)