Acquire

Help the user get football data from any source into their local environment.

First: check profile

Read .nutmeg.user.md. If it doesn't exist, tell the user to run /nutmeg:init first. Use their profile to determine preferred language and available providers.

Decision tree

When the user asks for data, determine the best source:

1. What data do they need?

Need	Best free source	Best paid source
Match events (pass-by-pass)	StatsBomb open data	Opta, StatsBomb API, Wyscout
Season stats (aggregates)	FBref	SportMonks
xG / shot data	Understat, StatsBomb open	Opta (matchexpectedgoals), StatsBomb API
Tracking data (player positions)	None free	Second Spectrum, SkillCorner, Tracab
Historical results	football-data.co.uk	SportMonks
Elo ratings	ClubElo (free API)	-
Player valuations	Transfermarkt (scraping)	-
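The table above can be encoded as a simple lookup. The following is a minimal sketch; `SOURCES` and `suggest_sources` are hypothetical names used only for illustration, and the dictionary keys are shorthand for the table's "Need" column.

```python
# Hypothetical lookup mirroring the table above: need -> (free sources, paid sources).
SOURCES = {
    "match events": (["StatsBomb open data"], ["Opta", "StatsBomb API", "Wyscout"]),
    "season stats": (["FBref"], ["SportMonks"]),
    "xg / shot data": (["Understat", "StatsBomb open data"], ["Opta", "StatsBomb API"]),
    "tracking data": ([], ["Second Spectrum", "SkillCorner", "Tracab"]),
    "historical results": (["football-data.co.uk"], ["SportMonks"]),
    "elo ratings": (["ClubElo"], []),
    "player valuations": (["Transfermarkt"], []),
}


def suggest_sources(need, paid_providers=()):
    """Return candidate sources for a need: free ones first, then paid ones the user has."""
    free, paid = SOURCES[need.strip().lower()]
    owned = set(paid_providers)
    return free + [provider for provider in paid if provider in owned]
```

Passing the providers listed in the user's profile as `paid_providers` keeps suggestions limited to sources they can actually access.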

2. Write acquisition code

Adapt to the user's language preference from .nutmeg.user.md.

Python patterns:

# StatsBomb open data
from statsbombpy import sb
events = sb.events(match_id=3788741)

# FBref via soccerdata
import soccerdata as sd
fbref = sd.FBref('ENG-Premier League', '2024')
stats = fbref.read_team_season_stats()

# Understat via soccerdata
understat = sd.Understat('ENG-Premier League', '2024')
shots = understat.read_shot_events()

R patterns:

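# StatsBomb open data (Matches must be built before events can be fetched)
library(StatsBombR)
Matches <- FreeMatches(FreeCompetitions())  # all free matches; filter to one competition first
events <- allclean(free_allevents(MatchesDF = Matches))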

# FBref
library(worldfootballR)
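# Named arguments: tier comes before stat_type, so positional calls are easy to get wrong
stats <- fb_season_team_stats(country = "ENG", gender = "M", season_end_year = 2024, tier = "1st", stat_type = "standard")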

JavaScript/TypeScript:

// StatsBomb open data (direct from GitHub)
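const matchId = 3788741; // same match as the Python example
const resp = await fetch(`https://raw.githubusercontent.com/statsbomb/open-data/master/data/events/${matchId}.json`);
if (!resp.ok) throw new Error(`StatsBomb fetch failed: HTTP ${resp.status}`);
const events = await resp.json();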

3. Data validation

After acquiring data, always:

Check row/event counts are sensible (a Premier League match should have roughly 1,500-2,000 events, though the exact figure depends on the provider's event definitions)
Verify key fields are present (coordinates, player IDs, timestamps)
Check for missing data (some providers have gaps for certain competitions)
Warn about coordinate system differences if combining sources
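The first three checks above might be sketched as follows. This assumes events arrive as a list of dicts, one per event; `validate_events` is a hypothetical helper, and the default field names are placeholders to replace with the provider's actual keys.

```python
def validate_events(events, required_fields=("location", "player", "timestamp"),
                    expected_range=(1500, 2000)):
    """Return a list of human-readable warnings for one match's events.

    `events` is assumed to be a list of dicts, one per event. The default
    field names are placeholders, not any provider's guaranteed schema.
    """
    warnings = []
    low, high = expected_range
    if not low <= len(events) <= high:
        warnings.append(f"event count {len(events)} outside expected range {low}-{high}")
    for field in required_fields:
        # Count events where the field is absent or explicitly null.
        missing = sum(1 for event in events if event.get(field) is None)
        if missing == len(events):
            warnings.append(f"field '{field}' is absent from every event")
        elif missing:
            warnings.append(f"field '{field}' missing in {missing} of {len(events)} events")
    return warnings
```

Not every event type carries every field (a half-start event has no coordinates, for example), so the warnings are prompts to look closer rather than proof of bad data.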

Self-discovery

If the user asks for data from an unfamiliar source:

Search the football-docs index: search_docs(query="[source name]")
If not found, search the web for "[source] football data API" or "[source] football dataset"
Evaluate: is it free? What format? What coverage? Any rate limits?
Guide the user through access

Caching

Always recommend caching fetched data locally:

API responses: save as JSON files with metadata (fetch date, parameters)
Scraped data: save with timestamps so stale data is identifiable
Suggest a directory structure: data/{source}/{competition}/{season}/
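A minimal caching sketch using only the standard library is shown below. The helper names `cache_response` and `load_cached`, and the metadata keys `fetched_at` and `params`, are assumptions chosen for illustration; only the directory layout comes from the recommendation above.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def cache_response(payload, source, competition, season, name, params=None, root="data"):
    """Write payload plus fetch metadata to {root}/{source}/{competition}/{season}/{name}.json."""
    directory = Path(root) / source / competition / season
    directory.mkdir(parents=True, exist_ok=True)
    record = {
        "fetched_at": datetime.now(timezone.utc).isoformat(),  # lets stale data be identified
        "params": params or {},
        "data": payload,
    }
    path = directory / f"{name}.json"
    path.write_text(json.dumps(record, ensure_ascii=False, indent=2), encoding="utf-8")
    return path


def load_cached(path):
    """Return (data, fetched_at) from a file written by cache_response."""
    record = json.loads(Path(path).read_text(encoding="utf-8"))
    return record["data"], datetime.fromisoformat(record["fetched_at"])
```

Comparing the returned `fetched_at` with the current time gives a simple staleness check before deciding whether to refetch.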

Rate limiting

Remind users about rate limits:

FBref: at most 10 requests/minute (faster clients risk being blocked)
Understat: no official limit but be respectful
SportMonks: varies by plan (check with /nutmeg:credentials)
StatsBomb open data: no limit (static files on GitHub)
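One way to stay within a per-minute limit is to space requests evenly. The sketch below is a simple fixed-interval throttle, not a feature of any of the libraries above; `Throttle` and `fbref_throttle` are hypothetical names, and the `clock` and `sleep` parameters exist only to make the class easy to test.

```python
import time


class Throttle:
    """Block so that calls are spaced at least 60 / per_minute seconds apart."""

    def __init__(self, per_minute, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 60.0 / per_minute
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, if any

    def wait(self):
        """Sleep if the previous call was too recent, then record this call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


# 10 requests/minute works out to one request every 6 seconds.
fbref_throttle = Throttle(per_minute=10)
```

Calling `fbref_throttle.wait()` immediately before each request keeps a scraping loop under the limit without any further bookkeeping.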
