Heal

Diagnose and fix broken football data pipelines. When a scraper or API call fails, figure out why and either fix it locally or report upstream.

First: check profile

Read .nutmeg.user.md. If it doesn't exist, tell the user to run /nutmeg:init first.

Diagnosis process

1. Identify the failure

Ask the user for the error message or behaviour. Common categories:

Symptom	Likely cause
HTTP 403/429	Rate limited or blocked. Wait and retry with backoff
HTTP 404	URL/endpoint changed. Check if site restructured
Parse error (HTML)	Website redesigned. Scraper selectors need updating
Parse error (JSON)	API response schema changed. Check for versioning
Empty response	Data not available for this competition/season
Import error	Library version changed. Check changelog
Authentication error	Key expired, rotated, or wrong format

2. Investigate

Check if the issue is local (user's code) or upstream (provider/library change)
For web scrapers: fetch the page and compare HTML structure to what the scraper expects
For APIs: make a minimal test request to verify the endpoint still works
For libraries: check the library's GitHub issues and recent commits

3. Fix strategies

If it's a local issue:

Fix the code directly
Update selectors, URLs, or parsing logic
Add error handling and retry logic

If it's an upstream issue (library bug):

Check if there's already an open issue on the library's repo
If not, help the user write a clear bug report:
- Library name and version
- Minimal reproduction steps
- Expected vs actual behaviour
- Error traceback
If the fix is straightforward, help write a PR:
- Fork the repo
- Make the fix on a branch
- Write a clear PR description

If it's a provider change (API/website):

Document what changed
Update the local code to handle the new format
If using a scraping library, submit an issue to that library

Self-healing patterns

When writing data acquisition code via /nutmeg:acquire, build in resilience:

# Retry with exponential backoff
import time

def fetch_with_retry(url, max_retries=3):
    for attempt in range(max_retries):
        try:
            resp = requests.get(url, timeout=30)
            resp.raise_for_status()
            return resp.json()
        except requests.RequestException as e:
            if attempt == max_retries - 1:
                raise
            wait = 2 ** attempt
            print(f"Attempt {attempt + 1} failed, retrying in {wait}s: {e}")
            time.sleep(wait)

Common fixes by source

Source	Common issue	Fix
FBref	429 rate limit	Add 6s delay between requests
WhoScored	Cloudflare blocks	Use headed browser (Playwright)
Understat	JSON parse error	Response is JSONP, strip callback wrapper
SportMonks	401	Token expired or plan limit hit
StatsBomb open data	404	Match/competition not in open dataset