---
name: tsfm-forecast
description: Generates Python + DuckDB forecasting pipeline code using foundation models. Produces runnable code for local execution; does NOT run models.
---

# TSFM Forecast Skill
## When to Use This Skill

Activate when: Generating zero-shot time-series forecasting pipelines, selecting between TimesFM / Chronos / MOIRAI / Lag-Llama, preparing time-series data with DuckDB, building backtesting harnesses, comparing model accuracy, or producing client forecast deliverables.

Don't use for:

- ML model training or fine-tuning → use `python-data-engineering`
- Real-time or streaming forecasts → use `event-streaming`
- Scheduling or orchestrating forecast jobs → use `data-pipelines`
- Loading raw files into DuckDB without forecasting intent → use `duckdb`
## Scope Constraints
- Generates code only — does not execute models, run inference, or access data files.
- Local execution model: all generated code targets the user's machine; no cloud deployment scaffolding unless explicitly requested.
- Security tier default: Tier 1 (schema/metadata only). User must explicitly elevate to Tier 2 (sample data) or Tier 3 (full data access).
- Reference files loaded one at a time — never pre-load multiple references simultaneously.
- No cross-references to other specialist skills; use handoff protocol for adjacent work.
## Model Routing
| reasoning_demand | preferred | acceptable | minimum |
|---|---|---|---|
| medium | Sonnet | Sonnet, Opus | Sonnet |
## Core Principles
- Code-only — Generate self-contained, runnable Python + DuckDB code. Never execute models or read actual data.
- Foundation-model-first — Default to zero-shot TSFM inference; only recommend fine-tuning when dataset is large and domain is highly specialized.
- DuckDB-native — Use DuckDB SQL for all data preparation, gap detection, resampling, and export steps.
- Evaluation-driven — Always include a seasonal naive baseline and the MASE metric; a model that doesn't beat the naive baseline doesn't justify deployment.
- Handoff-ready — Generate outputs (`forecasts.parquet`) in a standardized shape so downstream skills (data-pipelines, client-delivery) can consume them directly.
## Model Selection Quick Reference
| Signal | Recommended Model | Why |
|---|---|---|
| Long horizon (>100 steps), univariate | TimesFM 2.5 | 16K context window, direct multi-horizon |
| Need prediction intervals, CPU-only | Chronos-Bolt-Small | Probabilistic, < 1 GB VRAM, fast |
| Multivariate + external regressors | MOIRAI 2.0 | Only TSFM with native covariate support |
| Intermittent/sparse demand (many zeros) | Lag-Llama | Normalizing flow handles zero-inflation |
Load `model-registry.md` for the full matrix, hardware requirements, installation commands, and known limitations.
## Procedure
Progress checklist (tick off as you complete each step):
- [ ] 1. Classify request mode
- [ ] 2. Select model
- [ ] 3. Generate data prep code
- [ ] 4. Generate inference code
- [ ] 5. Generate evaluation code (if eval/comparison mode)
- [ ] 6. Generate deliverable (if report mode)
Compaction recovery: If context is compressed mid-procedure, re-read this SKILL.md to restore context. Check which checklist items are complete, then resume from the next unchecked step.
### Step 1 — Classify request mode
Determine which mode applies before generating any code:
- Pipeline mode: User wants end-to-end forecast code (data prep → inference → output). Default mode.
- Eval/comparison mode: User wants to compare models or backtest accuracy. Add evaluation step.
- Report mode: User wants a client-ready deliverable. Add deliverable step after evaluation.
- Profile mode: User wants to assess data before selecting a model. Run `ts_profiler.py` for a model recommendation first.
### Step 2 — Select model
Use the Quick Reference table above for default selection. Ask if ambiguous:
- How many series? (1 → any model; 50+ → Chronos or MOIRAI for batch efficiency)
- Need prediction intervals? (yes → Chronos or Lag-Llama)
- Any external regressors (promotions, holidays)? (yes → MOIRAI 2.0)
- Hardware constraints? (CPU-only → Chronos-Bolt-Small)
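
These defaults can be captured in a toy routing helper; this is illustrative only, and `model-registry.md` remains authoritative for edge cases:

```python
def select_model(n_series: int, needs_intervals: bool,
                 has_covariates: bool, cpu_only: bool, sparse: bool) -> str:
    """Mirrors the Quick Reference defaults; hardest constraints first."""
    if has_covariates:
        return "MOIRAI 2.0"          # only TSFM with native covariate support
    if sparse:
        return "Lag-Llama"           # zero-inflated / intermittent demand
    if cpu_only or needs_intervals or n_series >= 50:
        return "Chronos-Bolt-Small"  # probabilistic, CPU-friendly, batch-efficient
    return "TimesFM 2.5"             # long-horizon univariate default
```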
Load `model-registry.md` if the user needs a detailed comparison or hardware guidance.
### Step 3 — Generate data preparation code
Load `data-prep-patterns.md` and generate DuckDB SQL for:

- Timestamp normalization → `ds` column (UTC, datetime64)
- Gap detection → report missing step count before filling
- Resampling to target frequency (SUM for demand, AVG for rate)
- Null/outlier handling (forward-fill + z-score flagging)
- Parquet export → `output/ts_ready.parquet` with `unique_id | ds | y` schema
Always confirm that the `unique_id`, `ds`, and `y` column names match the source data, or generate a rename step.
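
For example, the generated data prep script might look like the following sketch. The source path `data/raw_sales.parquet` and columns `store_id`, `sold_at`, `units` are hypothetical placeholders, and daily SUM resampling is assumed:

```python
# Data prep sketch. The source path (data/raw_sales.parquet) and columns
# (store_id, sold_at, units) are hypothetical; substitute the real schema.
import duckdb

con = duckdb.connect()

# Rename to the standard unique_id | ds | y schema and resample to daily
# frequency with SUM (demand data). Assumes timestamps are already UTC;
# otherwise convert first with DuckDB's ICU timezone functions.
con.execute("""
    CREATE OR REPLACE TABLE ts_ready AS
    SELECT
        store_id AS unique_id,
        date_trunc('day', CAST(sold_at AS TIMESTAMP)) AS ds,
        SUM(units) AS y
    FROM read_parquet('data/raw_sales.parquet')
    GROUP BY 1, 2
""")

# Gap detection: report missing daily steps per series before any filling.
gaps = con.execute("""
    WITH spans AS (
        SELECT unique_id, MIN(ds) AS start_ds, MAX(ds) AS end_ds,
               COUNT(*) AS n_obs
        FROM ts_ready GROUP BY unique_id
    )
    SELECT unique_id,
           date_diff('day', start_ds, end_ds) + 1 - n_obs AS missing_steps
    FROM spans
    WHERE date_diff('day', start_ds, end_ds) + 1 - n_obs > 0
""").df()
print(gaps)

# Export for the inference step (the output/ directory must already exist).
con.execute("COPY ts_ready TO 'output/ts_ready.parquet' (FORMAT PARQUET)")
```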
### Step 4 — Generate inference code
Load `inference-templates.md` and generate Python code for the selected model:

- Imports and model loading (with device detection: CUDA → MPS → CPU)
- Context window setup (load from `output/ts_ready.parquet` via pandas)
- `forecast()` call with horizon and quantile levels
- Output normalization to standard shape: `unique_id | ds | y_hat | y_hat_lower | y_hat_upper`
- Export → `output/forecasts.parquet`
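
A condensed sketch of this stage, using the classic Chronos T5 pipeline as the example model (Chronos-Bolt exposes a similar pipeline interface; check the installed chronos-forecasting version). The 28-step horizon, daily frequency, and 10/50/90 quantile levels are illustrative assumptions:

```python
import numpy as np
import pandas as pd
import torch
from chronos import ChronosPipeline  # pip install chronos-forecasting

# Device detection: CUDA -> MPS -> CPU.
device = (
    "cuda" if torch.cuda.is_available()
    else "mps" if torch.backends.mps.is_available()
    else "cpu"
)
pipeline = ChronosPipeline.from_pretrained(
    "amazon/chronos-t5-small", device_map=device
)

df = pd.read_parquet("output/ts_ready.parquet")
horizon = 28  # illustrative; set from the engagement requirements

rows = []
for uid, grp in df.sort_values("ds").groupby("unique_id"):
    context = torch.tensor(grp["y"].to_numpy(), dtype=torch.float32)
    # predict() returns sample paths with shape [1, num_samples, horizon].
    samples = pipeline.predict(context, prediction_length=horizon)[0].cpu().numpy()
    lo, mid, hi = np.quantile(samples, [0.1, 0.5, 0.9], axis=0)
    future_ds = pd.date_range(grp["ds"].max(), periods=horizon + 1, freq="D")[1:]
    rows.append(pd.DataFrame({
        "unique_id": uid, "ds": future_ds,
        "y_hat": mid, "y_hat_lower": lo, "y_hat_upper": hi,
    }))

# Normalize to the standard output shape for downstream skills.
pd.concat(rows).to_parquet("output/forecasts.parquet", index=False)
```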
Unload `inference-templates.md` after generating code, before loading `evaluation-metrics.md`.
### Step 5 — Generate evaluation code (eval/comparison mode only)
Load `evaluation-metrics.md` and generate:
- Temporal train/test split (test window = horizon × 3)
- Seasonal naive baseline for the target frequency
- MASE, SMAPE, RMSE calculation
- Coverage rate if model outputs prediction intervals
- Pass/fail determination: MASE < 1.0 means the forecast beats the seasonal naive baseline
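
A sketch of the core metric logic, assuming daily data with weekly seasonality (`season = 7` is illustrative). The seasonal naive stands in for the model forecast here to keep the sketch self-contained; the generated harness substitutes the model's backtest forecasts:

```python
import numpy as np
import pandas as pd

def seasonal_naive(y_train: np.ndarray, horizon: int, season: int) -> np.ndarray:
    """Repeat the last observed season across the forecast horizon."""
    reps = int(np.ceil(horizon / season))
    return np.tile(y_train[-season:], reps)[:horizon]

def mase(y_train: np.ndarray, y_test: np.ndarray, y_hat: np.ndarray,
         season: int) -> float:
    """Test-set MAE scaled by the in-sample seasonal naive MAE."""
    scale = np.mean(np.abs(y_train[season:] - y_train[:-season]))
    return float(np.mean(np.abs(y_test - y_hat)) / scale)

horizon, season = 28, 7  # illustrative: daily data, weekly seasonality
df = pd.read_parquet("output/ts_ready.parquet")
for uid, grp in df.sort_values("ds").groupby("unique_id"):
    y = grp["y"].to_numpy()
    split = len(y) - horizon * 3  # temporal split: test window = horizon x 3
    if split <= season:
        continue  # series too short for this backtest window
    y_train, y_test = y[:split], y[split:]
    # The harness replaces this with the model's backtest forecasts.
    y_hat = seasonal_naive(y_train, len(y_test), season)
    score = mase(y_train, y_test, y_hat, season)
    print(uid, round(score, 3), "PASS" if score < 1.0 else "FAIL")
```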
Unload `evaluation-metrics.md` after generating code.
### Step 6 — Generate deliverable (report mode only)
Load `deliverable-templates.md` and generate:
- Markdown report structure with metric placeholders
- Executive summary with forecast findings
- Accuracy comparison table (model vs. seasonal naive)
- Assumptions and limitations section (include the template text verbatim)
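
At Tier 1, the generated report writer might look like this sketch, with every angle-bracketed value left as a placeholder:

```python
# Report skeleton writer. Tier 1 leaves all metric placeholders unpopulated;
# higher tiers fill them from the evaluation step's output.
REPORT = """\
# Forecast Report: <engagement>

## Executive Summary

<forecast findings>

## Accuracy vs. Seasonal Naive

| Model | MASE | SMAPE | RMSE |
|---|---|---|---|
| <model> | <MASE> | <SMAPE> | <RMSE> |
| Seasonal naive | <MASE> | <SMAPE> | <RMSE> |

## Assumptions and Limitations

<verbatim text from deliverable-templates.md>
"""

with open("output/forecast_report.md", "w") as f:
    f.write(REPORT)
```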
## Security Posture
SECURITY: This skill generates code for local execution only. No network calls, credential access, or data file reads are performed by Claude. Generated scripts read local files via DuckDB — validate file paths before execution.
See Security & Compliance Patterns for the full framework. See Consulting Security Tier Model for tier definitions.
| Capability | Tier 1 (Default) | Tier 2 (Sampled) | Tier 3 (Full Access) |
|---|---|---|---|
| Data profiling | Column types / shape only | Stats on sample | Full profiling |
| Data prep code | Generate SQL (not execute) | Execute on samples | Execute on full data |
| Inference code | Generate only | Generate + run on samples | Generate + run on full data |
| Deliverables | Template with placeholders | Populated from samples | Fully populated |
## Input Sanitization
When the user provides file paths for `ts_profiler.py` or data prep scripts:

- Validate that the path exists and is within the working directory before using it in generated code.
- Reject paths containing `..`, `~`, environment variables, or shell metacharacters.
- Never interpolate user-provided strings directly into shell commands.
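
A minimal validation sketch; the function name and the exact forbidden-character set are illustrative choices, not a prescribed API:

```python
import re
from pathlib import Path

WORKDIR = Path.cwd().resolve()
# Reject parent traversal, home expansion, env vars, and shell metacharacters.
_FORBIDDEN = re.compile(r"\.\.|~|[$%;&|`<>]")

def validate_user_path(raw: str) -> Path:
    if _FORBIDDEN.search(raw):
        raise ValueError(f"rejected: forbidden characters in {raw!r}")
    path = (WORKDIR / raw).resolve()
    if not path.is_relative_to(WORKDIR):  # Python 3.9+
        raise ValueError(f"rejected: {raw!r} escapes the working directory")
    if not path.exists():
        raise FileNotFoundError(raw)
    return path
```

Generated scripts should then consume `str(validate_user_path(raw))` instead of the raw user string.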
## Reference Files
Reference files loaded on demand — one at a time:
- `model-registry.md` — Full model comparison matrix, hardware requirements, installation commands, known limitations. Load at Step 2 if the Quick Reference is insufficient.
- `data-prep-patterns.md` — DuckDB SQL patterns for timestamp normalization, gap detection, resampling, null handling, Parquet export. Load at Step 3.
- `inference-templates.md` — Python inference code for TimesFM, Chronos, MOIRAI, Lag-Llama, plus output normalization. Load at Step 4.
- `evaluation-metrics.md` — Backtesting protocol, MASE/SMAPE/RMSE/CRPS formulas, naive baseline, evaluation code patterns. Load at Step 5 (eval/comparison mode only).
- `deliverable-templates.md` — Executive summary, accuracy table, assumptions and limitations templates. Load at Step 6 (report mode only).
## Handoffs
- Scheduling generated forecasts → data-pipelines (Dagster assets or Airflow DAGs wrapping inference scripts)
- Client engagement setup → client-delivery (engagement scaffolding, security tier selection, client handoff)
- DuckDB data preparation → duckdb (local data exploration, file ingestion, profiling before forecasting)