store

Installation
SKILL.md

Store

Help the user choose storage formats, locations, and publishing methods for their football data.

First: check profile

Read .nutmeg.user.md. If it doesn't exist, tell the user to run /nutmeg:init first.

Storage format decision tree

Small projects (< 100MB, single user)

Format Best for Tools
JSON Raw event data, API responses Any language
CSV Tabular stats, easy to share Spreadsheets, pandas, R
Parquet Columnar analytics, fast queries polars, pandas, DuckDB, Arrow
SQLite Relational queries, multiple tables Any language, DB browser tools

Recommendation: Start with JSON for raw data, Parquet for processed data.

Medium projects (100MB - 10GB)

Format Best for Notes
Parquet files Analytics workloads 5-10x smaller than JSON, fast columnar reads
DuckDB SQL analytics on local files Queries Parquet/CSV directly, no server needed
SQLite Relational data with joins Single file, portable, ACID compliant

Recommendation: Parquet for storage, DuckDB for querying.

Large projects (> 10GB, multiple users)

Solution Best for Cost
PostgreSQL Production apps, complex queries Free (self-hosted) or ~$7/mo (Railway, Supabase)
BigQuery Massive analytical queries Free tier: 1TB/mo queries
Cloudflare R2 Object storage (raw files) Free tier: 10GB storage
S3 / GCS Object storage at scale ~$0.023/GB/mo

Directory structure

Recommend this structure for football data projects:

project/
  data/
    raw/                  # Untouched API/scrape responses
      statsbomb/
        events/
        matches.json
      fbref/
        2024/
    processed/            # Cleaned, transformed data
      events.parquet
      shots.parquet
      passes.parquet
    derived/              # Computed metrics
      xg_model.parquet
      passing_networks/
  notebooks/              # Analysis notebooks
  scripts/                # Data pipeline scripts
  outputs/                # Charts, reports, exports
  .env                    # API keys (gitignored)
  .nutmeg.user.md         # Nutmeg profile

Publishing and sharing

Interactive dashboards

Platform Language Cost Notes
Streamlit Python Free (community cloud) Most popular for football analytics. Deploy from GitHub
Observable JavaScript Free tier Great for D3.js visualisations. Notebooks + Framework
Shiny R Free (shinyapps.io, 25 hrs/mo) R ecosystem integration
Gradio Python Free (HuggingFace Spaces) Quick ML model demos

Static sites

Platform Notes
GitHub Pages Free. Good for static charts (D3, matplotlib exports)
Cloudflare Pages Free. Faster, more features than GH Pages
Vercel Free tier. Good for Next.js/Astro sites

Sharing data

Method Best for
GitHub repo Small datasets (< 100MB), code + data together
GitHub Releases Larger files (up to 2GB per release)
Kaggle Datasets Community sharing, discoverable, free
HuggingFace Datasets ML-focused, versioned, free

Social media / content

Output Tool Notes
Static charts matplotlib, ggplot2, D3.js Export as PNG/SVG
Animated charts matplotlib.animation, D3 transitions Export as GIF/MP4
Twitter/X threads Chart images + alt text Accessibility matters
Blog posts Markdown + embedded charts GitHub Pages, Medium, Substack

Cost awareness

Based on the user's .nutmeg.user.md goals, flag costs:

  • Exploration/learning: Everything can be free. StatsBomb open data + Jupyter/Colab + GitHub Pages.
  • Content creation: Streamlit Community Cloud is free. Cloudflare Pages is free.
  • Professional: Budget for API access ($100-1000+/mo for Opta/StatsBomb commercial).
  • Product: Database hosting ($7-50/mo), consider data licensing costs separately.
Weekly Installs
1
GitHub Stars
17
First Seen
Mar 25, 2026