Email Whiz

Gmail inbox management via MCP. Parallel-first, large-inbox optimized.

Canonical Vocabulary

Term	Definition
5-bucket	DO / DELEGATE / DEFER / REFERENCE / NOISE — five triage buckets (historically "4D+N")
inbox zero	State where inbox contains only items requiring action today
fast-lane	Sender/subject-only classification without reading email content
security gate	Subject-keyword scan of NOISE before archiving (catches 2FA, password resets)
session cache	Local cache of Phase 0 results (1h TTL) at `~/.{gemini
tier	Inbox size class: Small (<50) / Medium (50-500) / Large (500-5k) / Massive (5k+)
quick	Modifier prefix that skips Phase 0 for lightweight modes
combo	Chain two modes sharing discovered state: `<mode> + <mode>`
mega-wave	Single message with 10-15 independent tool calls bundled together
baseline	Rolling 30-day average inbox count; bankruptcy triggers are relative to this
auto-rule	Gmail filter created from behavioral pattern analysis
VIP	Sender who receives consistent replies and high-priority treatment
engagement	Reply rate to a sender: replies sent / emails received
confidence	HIGH >80% / MEDIUM 50-80% / LOW <50% — governs auto-rule actions
memory	Persistent user preferences and patterns at `~/.{gemini
correction	User disagreement with auto-classification, stored as a memory override
staleness	Memory entry not confirmed/used beyond threshold (VIP: 90d, override: 60d)

Dispatch

`$ARGUMENTS`	Action
`triage`	Categorize inbox with fast-lane + 5-bucket, batch process by category
`inbox-zero`	Daily/weekly routine, progress tracking, baseline bankruptcy detection
`filters`	Pattern-detected filter suggestions with confidence scoring
`auto-rules`	Behavioral analysis → auto-create Gmail filters from templates
`analytics`	Volume trends, response times, sender frequency, timing patterns
`newsletters`	Subscription audit + unsubscribe plan
`labels`	Taxonomy analysis, merge/rename/delete recommendations
`search <goal>`	Natural language → Gmail search query
`senders`	VIP + noise identification
`digest`	Recent important email summary
`cleanup`	Archive candidates, duplicate detection, batch archiving
`audit`	Full inbox health report with score and action plan
`quick search` or `quick digest`	Skip Phase 0, run mode directly. Only search and digest are quick-compatible
`<mode> + <mode>`	Chain modes, sharing discovered state (triage + filters)
(empty)	Auto-scan: run Phase 0 + account analysis, present prioritized action plan

Classification gate: If $ARGUMENTS could match multiple modes, ask which the user wants before proceeding.

Hybrid Mode Protocol

Read operations — execute immediately: gmail_search_emails, gmail_read_email, gmail_list_email_labels, gmail_list_filters, gmail_get_filter

Write operations — confirm before executing: Single: gmail_modify_email, gmail_delete_email, gmail_create_label Batch: gmail_batch_modify_emails, gmail_batch_delete_emails Labels: gmail_update_label, gmail_delete_label, gmail_get_or_create_label Filters: gmail_create_filter, gmail_create_filter_from_template, gmail_delete_filter

Email deletion is FORBIDDEN unless the user explicitly requests it. Never call gmail_delete_email or gmail_batch_delete_emails on your own initiative. Always archive instead. Only use delete tools when the user directly says "delete" — and even then, use the Destructive Warning template (TYPE "DELETE" confirmation) first.

Confirmation format: See templates.md. Always show scope, count, and reversibility before write ops.

Parallelization Rules

Independent tool calls go in ONE message. Sequential independent calls is a bug.
Phase 0 fuses with mode's first queries — never 2 sequential rounds when fusion is possible.
Batch ops on disjoint ID sets: parallel. NOISE + REFERENCE + DEFER = 1 message, 3 calls.
Analytics/Audit: mega-wave — 10-15 search queries in 1 message.
gmail_read_email: batch 5-8 per message (full messages are large).
Filter creation: parallel after conflict check passes.
Combo mode boundary: sequential (data dependency). But skip Phase 0 for mode 2.
Max 15-20 calls per message. Beyond that, a single 429 impacts the entire bundle.
On 429: preserve completed results, drop to sequential with 2s gaps. 3x429 in 60s → pause 60s.

Details: efficiency-guide.md § Parallel Call Map.

Phase -1: Memory Load (skip with `quick` prefix)

Load long-term user memory before discovery. Runs once per session.

!uv run python scripts/memory.py load --mode <current_mode>

Fuse with Phase 0 cache check in the SAME message (both are local file reads, zero API calls):

!uv run python scripts/memory.py load --mode triage
!uv run python scripts/inbox_snapshot.py cache load

If memory exists, integrate into session:

VIP senders → fast-lane DO classification (Pass 1)
Noise senders → fast-lane NOISE classification (Pass 1)
Corrections → highest priority: override any conflicting fast-lane rule
Triage overrides → apply before generic decision tree
Filter history → skip suggesting previously rejected filters
Label prefs → use preferred naming when creating labels
Inbox patterns → adjust expectations in analytics/audit

Run !uv run python scripts/memory.py prune once per session if memory exists. Report pruned entries only if count > 0.

Phase 0: Discovery (skip with `quick` prefix)

Prerequisite: Run Phase -1 (Memory Load) first unless in quick mode.

0a: Session Cache Check

!uv run python scripts/inbox_snapshot.py cache load If valid (< 1h old) → use cached labels, filters, tier. Jump to mode.

0b: Parallel Discovery + Mode Fusion (cache miss)

Issue Phase 0 calls AND mode's first queries in a SINGLE message:

Core discovery (always):

gmail_list_email_labels → label name/ID map + INBOX messagesTotal/messagesUnread
gmail_list_filters → existing filter criteria
gmail_search_emails → "in:inbox is:unread" (maxResults: 10) → sample

Mode-specific additions (same message):

Triage: + gmail_search_emails "is:unread" (maxResults: per tier)
Inbox Zero: + "is:unread newer_than:1d" + "label:_deferred"
Analytics: + 4x weekly volume queries (mega-wave)
Audit: + newsletter count + sent count + overdue count (mega-wave)
Cleanup: + 4x stale-detection queries

0c: Tier Assessment

Use INBOX label messagesTotal (NOT resultSizeEstimate — it is unreliable).

Inbox Size	Tier	maxResults	Strategy
<50	Small	50	Process all
50-500	Medium	100	Sample recent + unread
500-5000	Large	200	Importance-first, date-range splitting
5000+	Massive	200	Date-range parallel queries, baseline bankruptcy

Why maxResults 100-200: MCP server has N+1 fetch pattern (each result triggers a separate API call). maxResults=500 ≈ 501 API calls ≈ 2505 quota units. 200 ≈ 1005 units.

Massive tier: Split by date-range (30d chunks), issue ALL chunks in parallel. Always query is:important is:unread alongside general scan. No pageToken on MCP server — date-range splitting is the ONLY pagination.

0d: Write Session Cache

!uv run python scripts/inbox_snapshot.py cache save --labels '...' --filters '...' --inbox-count N --unread-count M --tier TIER Invalidate cache after any write operation (filter/label create, batch modify).

Use discovered label names throughout — never invent names that conflict with existing taxonomy.

Auto-Scan (empty invocation)

When invoked with no arguments, perform an intelligent account scan instead of showing a menu.

Step 1: Mega-Wave Discovery (1 message)

Fuse Phase 0 core discovery with 6 analysis queries in a SINGLE message (~10 calls):

Phase 0 core: gmail_list_email_labels + gmail_list_filters + gmail_search_emails "in:inbox is:unread" (maxResults: 10)
Inbox reach: gmail_search_emails "in:inbox has:nouserlabels newer_than:7d" (maxResults: 10)
Newsletters: gmail_search_emails "in:inbox (list:* OR subject:newsletter)" (maxResults: 50)
Stale deferred: gmail_search_emails "label:_deferred older_than:7d" (maxResults: 10)
Backlog: gmail_search_emails "in:inbox is:unread older_than:7d" (maxResults: 10)
Cleanup candidates: gmail_search_emails "in:inbox (subject:receipt OR subject:confirmation) older_than:30d" (maxResults: 10)
!uv run python scripts/inbox_snapshot.py trend --days 7
!uv run python scripts/inbox_snapshot.py baseline --days 30

Step 2: Analyze + Score

From results, compute: tier (from INBOX messagesTotal), bankruptcy level (baseline comparison), triage need (unread backlog), filter opportunity (noise sender clusters), newsletter burden, cleanup potential, deferred pile health.

Step 3: Present Action Plan

Use the Auto-Scan Report template (templates.md § Auto-Scan Report). Present 3-5 prioritized recommendations with estimated impact. Each maps to a mode.

Offer: reply with number, mode name, all (chain top 3 sequentially sharing Phase 0 state), or menu (show full mode list).

Orchestration Patterns

This skill runs in context: fork. Apply these patterns internally:

Decomposition Gate (before every mode)

List all tool calls needed for the mode
Classify: independent (different queries) vs dependent (needs prior results)
Bundle ALL independent calls into the fewest possible messages
Never execute independent calls sequentially

Mode-Specific Wave Patterns

Mode	Wave 1	Wave 2	Wave 3	Total
Auto-Scan	Phase 0 + 6 analysis (~10)	scoring (0 tools)	—	~10
Triage	Phase 0 + fast-lane (~5)	3x batch_modify	5-8x read_email	~16
Analytics	Phase 0 + mega-wave (~15)	per-entity batches (~12)	—	~27
Audit	Phase 0 + mega-wave (~14)	per-entity batches (~20)	—	~34
Cleanup	Phase 0 + 4 stale queries (~7)	batch_modify (~3)	—	~10
Filters	Phase 0 + sender/subject (~6)	create_filter (~5)	—	~11
Inbox Zero	Phase 0 + unread/deferred (~5)	batch ops (~3)	—	~8
Combo	Phase 0 + mode 1	mode 1 ops	refresh + mode 2	varies

Mode: Triage

Three-pass classification. Details: triage-framework.md § Fast-Lane, efficiency-guide.md.

Pass 1: Fast-Lane (zero content reads)

Classify by sender + subject + snippet ONLY from search results:

Memory corrections → override any conflicting rule (highest priority)
Memory VIP senders → DO (confidence from memory)
Memory noise senders → NOISE (confidence from memory)
noreply@*, notifications@*, calendar-notification@* → NOISE
Known VIP senders (from session cache) → DO
Subject [FYI]/[INFO]/newsletter/receipt/confirmation → REFERENCE
Subject [ACTION]/[URGENT]/deadline → DO
Unmatched → UNDECIDED

Pass 1.5: Security Gate (mandatory before archiving NOISE)

Scan all NOISE subjects/snippets for security keywords: password reset, verify your, sign-in, unusual activity, 2FA, account locked, verification code, one-time password, security alert, new device, suspicious login, recovery phone, two-step verification, confirm your identity, new sign-in method. Any match → pull from NOISE to REVIEW. Also: any noreply@ email newer_than:7d → REVIEW (login notifications have no expiry; codes expire but should still surface).

Pass 2: Batch Operations (all in ONE message)

Precondition: Call gmail_get_or_create_label for _reference and _deferred if not already resolved in Phase 0.

Issue 3 gmail_batch_modify_emails calls simultaneously (disjoint ID sets = parallel safe):

NOISE ids → remove INBOX
REFERENCE ids → add _reference, remove INBOX
DEFER ids → add _deferred, remove INBOX

Pass 3: Content Inspection (UNDECIDED + REVIEW only)

Batch gmail_read_email 5-8 per message. Apply full 5-bucket. Batch-process results with same parallel pattern as Pass 2. Report with fast-lane triage summary (show auto-classified %). Surface recurring NOISE sender patterns as filter candidates.

Mode: Inbox Zero

Structured habit system. Details: inbox-zero-system.md.

Daily (5 min): Fuse Phase 0 + is:unread newer_than:1d + label:_deferred in 1 message. Quick triage pass (5-bucket). Batch archive NOISE + REFERENCE. Report: inbox count before → after. Update progress file.

Weekly (15 min): Issue 3 queries in 1 message: deferred sweep + filter effectiveness + newsletter sweep. Clear, re-defer, or archive. Update weekly_reviews.

Bankruptcy detection: Relative to 30-day baseline (not absolute). YELLOW: 20-50% above baseline. ORANGE: 50-100% above baseline (aggressive triage + filter creation). RED: >100% above baseline (full bankruptcy protocol). Acceleration trigger: week-over-week growth >20% for 2+ weeks. New users default to 500 until 7+ snapshots establish a baseline. See inbox-zero-system.md for recovery protocols.

Mode: Filters

Pattern-detected filter suggestions. Details: filter-patterns.md.

Issue sender clustering + subject pattern + gmail_list_filters ALL in 1 message. Score candidates: HIGH (volume ≥20, engagement <5%) → offer create. MEDIUM (≥10, <15%) → suggest. LOW → show only. Check conflicts before creating. After approval: issue multiple gmail_create_filter calls in 1 message. Massive tier: split 90d into 3x30d chunks, query in parallel.

Mode: Auto-Rules

Deep behavioral analysis → automated filter creation. Details: filter-patterns.md § Auto-Rule Detection.

Issue sender analysis + subject mining + filter fetch in 1 message. Massive tier: split 90d into 3x30d date-range chunks, issue ALL in parallel. Score by confidence. Present auto-rules report. HIGH: gmail_create_filter_from_template with confirmation. MEDIUM: individual review. Creation: multiple filters in 1 message. Schedule learning loop after 2 weeks.

Mode: Analytics

Email communication metrics via mega-wave. Details: analytics-guide.md.

Issue ALL queries in 1 message: 4x weekly volume + sent count + inbox reach + reply rate + overdue + sender frequency + label distribution (~12 calls, ~65% message reduction). Classify volume trend (growing/stable/declining). Compute inbox reach rate (good: <30%). Identify top 20 senders by volume + engagement matrix. Present full analytics report.

Mode: Newsletters

Subscription audit. Issue list:* + subject:newsletter in 1 message. Estimate frequency + read rate per subscription. Classify: UNSUBSCRIBE / KEEP / FILTER. Surface unsubscribe links. On approval: create filter or provide unsubscribe URL. Details: workflows.md § Unsubscribe Strategies.

Mode: Labels

Label taxonomy analysis. gmail_list_email_labels returns messagesTotal/messagesUnread per label. For 30d activity classification, issue per-label search queries in batches of 10-15. Find: duplicates/synonyms, stale labels (no messages 90d+), flat structure needing hierarchy. Suggest: MERGE / RENAME / DELETE / RESTRUCTURE. For merges: gmail_batch_modify_emails to migrate, then gmail_delete_label. Details: workflows.md § Label Hierarchy.

Mode: Search

Quick-compatible (no Phase 0). Natural language → Gmail search query. Parse intent (who/what/when/status), map to operators, add negations, show query + explanation + expected matches. Offer to run. Details: filter-patterns.md § Gmail Search Operators.

Mode: Senders

VIP + noise identification. Issue sample search + per-sender reply-rate queries in 1 message (if top-sender count > 12, split reply-rate queries across 2 messages to stay under call cap). VIP: reply rate >50% or consistently starred. Noise: volume >10, engagement 0%. Present sender report. On approval: create VIP filter (mark important) or noise filter. Details: triage-framework.md § Sender Shortcuts, workflows.md § VIP Management.

Mode: Digest

Quick-compatible (no Phase 0). Search is:important newer_than:3d (or is:starred OR is:unread is:important). Extract sender, subject, 1-line summary per email. Categorize: RESPOND / FYI / READING. Present digest report. Offer: "open N" | "archive fyi". Details: templates.md § Digest.

Mode: Cleanup

Archive candidates. Issue ALL 4 stale-detection queries in 1 message: receipts/confirmations older_than:30d + old newsletters older_than:30d + expired deferrals older_than:14d + fully-read no-label older_than:60d. Batch-archive ALL results in parallel gmail_batch_modify_emails calls. Archive only — never delete. Details: workflows.md § Batch Operations.

Mode: Audit

Full inbox health via mega-wave. Issue 11 base queries in 1 message (per-label and per-sender queries run as a second wave): inbox count, unread %, filter count, inbox reach, reply rate, overdue, newsletter volume, deferred pile, sent volume, label health, sender top-20 (~70% message reduction). Score 0-100 based on inbox reach rate, filter coverage, label health. Identify top 3 quick wins. Generate action plan. Details: analytics-guide.md, templates.md § Full Audit.

Scope Boundaries

IS for: Gmail triage, inbox zero, filters, auto-rules, analytics, newsletters, labels, search, senders, digest, cleanup, audit. NOT for: Calendar, Google Drive, non-Gmail, real-time monitoring, CRM.

Reference File Index

File	Content	Load When
`references/efficiency-guide.md`	Cache, parallel calls, tiers, MCP constraints, fast-lane, pagination	All modes
`references/triage-framework.md`	5-bucket decision trees, fast-lane rules, security gate, batch processing	triage, inbox-zero
`references/filter-patterns.md`	Gmail operators, filter templates, auto-rule algorithms, learning loop	filters, auto-rules, search
`references/workflows.md`	Bankruptcy, VIP, batch ops, label hierarchy, unsubscribe, combo mode	cleanup, labels, senders, newsletters
`references/templates.md`	All confirmation, report, and execution result templates	Every mode
`references/inbox-zero-system.md`	Daily/weekly routines, progress schema, baseline bankruptcy	inbox-zero
`references/analytics-guide.md`	Volume, response time, sender frequency, timing methodology	analytics, audit
`references/tool-reference.md`	All 18 Gmail MCP tool signatures, parameters, MCP constraints	When selecting tools

Scripts:

Script	Purpose	Run When
`scripts/inbox_snapshot.py save --inbox-count N`	Save inbox count snapshot	After each inbox-zero check
`scripts/inbox_snapshot.py trend --days 7`	Compute inbox trend	During analytics mode
`scripts/inbox_snapshot.py baseline --days 30`	Compute rolling baseline for bankruptcy	During inbox-zero
`scripts/inbox_snapshot.py cache load`	Load session cache (1h TTL)	Phase 0 step 0a
`scripts/inbox_snapshot.py cache save --labels --filters --inbox-count --unread-count --tier`	Write session cache	Phase 0 step 0d
`scripts/inbox_snapshot.py cache clear`	Invalidate session cache	After write operations
`scripts/memory.py load --mode MODE`	Load user memory filtered by mode	Phase -1
`scripts/memory.py save-sender`	Save VIP or noise sender	After triage/senders
`scripts/memory.py save-correction`	Record user classification override	When user corrects
`scripts/memory.py prune`	Remove stale memory entries	Once per session
`scripts/memory.py stats`	Memory summary counts	Diagnostics

Critical Rules

Run Phase 0 discovery before any workflow — never assume label/filter state (unless quick mode)
Confirm all write operations — show scope, count, and reversibility before calling
Batch similar operations — never loop gmail_modify_email when gmail_batch_modify_emails applies
Preserve existing taxonomy — suggest improvements, never silently rename
Check filter conflicts before creating — call gmail_list_filters first
Include confidence scores on all filter/rule suggestions
Provide rollback guidance for every bulk operation
Handle MCP failures gracefully — see efficiency-guide.md § Error Recovery
Surface unsubscribe links — make newsletter cleanup actionable
Prioritize quick wins — high impact, low effort first in all reports
Update inbox-zero progress after every routine
NEVER delete emails unless the user explicitly requests deletion. Archive by default in all modes. When the user does request deletion, use the Destructive Warning template (TYPE "DELETE" confirmation) before calling gmail_delete_email or gmail_batch_delete_emails
Issue ALL independent tool calls in a single message — sequential independent calls is a bug
Use fast-lane classification before reading content — sender + subject resolve 60%+ of triage
ALWAYS run the security gate on NOISE before archiving — scan subjects for 2FA/password/security keywords
Check session cache before Phase 0. Invalidate cache after any write operation (gmail_create_label, gmail_get_or_create_label, gmail_create_filter, gmail_create_filter_from_template, gmail_batch_modify_emails, gmail_batch_delete_emails).
maxResults 100-200 (not 500) — MCP N+1 pattern means 500 results ≈ 501 API calls
Bankruptcy triggers are relative to 30-day baseline, not absolute thresholds
Load user memory before Phase 0 (Phase -1). Save memories AFTER delivering primary output — never block a mode's core workflow for a memory write

Memory Save Triggers

Save memories at natural decision points. NEVER slow down a mode to save — always save AFTER the mode's primary output.

Trigger	Command
Triage: user overrides fast-lane classification	`memory.py save-correction --mode triage --what "..." --correction "..." --pattern "..." --action BUCKET`
Triage: NOISE batch archived successfully	`memory.py save-sender --email X --type noise --reason "..." --source triage` (batch multiple in 1 message)
Senders: VIP identified (reply rate >50%)	`memory.py save-sender --email X --type vip --name "..." --reason "..." --source senders`
Filters: filter created successfully	`memory.py save-filter --filter-id ID --description "..." --effectiveness high`
Filters: user rejects a suggestion	`memory.py save-filter --filter-id rejected --description "..." --effectiveness failed`
Labels: user states naming preference	`memory.py save-labels --convention "..." --structure "..."`
Analytics: volume patterns computed	`memory.py save-patterns --daily-volume N --busy '[...]'`
Any mode: user corrects a classification	`memory.py save-correction ...`

Batch multiple save calls in a SINGLE message when a mode produces several memories (e.g., triage identifies 5 noise senders)

email-whiz