email-whiz

Installation
SKILL.md

Email Whiz

Gmail inbox management via MCP. Parallel-first, large-inbox optimized.


Canonical Vocabulary

Term Definition
5-bucket DO / DELEGATE / DEFER / REFERENCE / NOISE — five triage buckets (historically "4D+N")
inbox zero State where inbox contains only items requiring action today
fast-lane Sender/subject-only classification without reading email content
security gate Subject-keyword scan of NOISE before archiving (catches 2FA, password resets)
session cache Local cache of Phase 0 results (1h TTL) at `~/.{gemini
tier Inbox size class: Small (<50) / Medium (50-500) / Large (500-5k) / Massive (5k+)
quick Modifier prefix that skips Phase 0 for lightweight modes
combo Chain two modes sharing discovered state: <mode> + <mode>
mega-wave Single message with 10-15 independent tool calls bundled together
baseline Rolling 30-day average inbox count; bankruptcy triggers are relative to this
auto-rule Gmail filter created from behavioral pattern analysis
VIP Sender who receives consistent replies and high-priority treatment
engagement Reply rate to a sender: replies sent / emails received
confidence HIGH >80% / MEDIUM 50-80% / LOW <50% — governs auto-rule actions
memory Persistent user preferences and patterns at `~/.{gemini
correction User disagreement with auto-classification, stored as a memory override
staleness Memory entry not confirmed/used beyond threshold (VIP: 90d, override: 60d)

Dispatch

$ARGUMENTS Action
triage Categorize inbox with fast-lane + 5-bucket, batch process by category
inbox-zero Daily/weekly routine, progress tracking, baseline bankruptcy detection
filters Pattern-detected filter suggestions with confidence scoring
auto-rules Behavioral analysis → auto-create Gmail filters from templates
analytics Volume trends, response times, sender frequency, timing patterns
newsletters Subscription audit + unsubscribe plan
labels Taxonomy analysis, merge/rename/delete recommendations
search <goal> Natural language → Gmail search query
senders VIP + noise identification
digest Recent important email summary
cleanup Archive candidates, duplicate detection, batch archiving
audit Full inbox health report with score and action plan
quick search or quick digest Skip Phase 0, run mode directly. Only search and digest are quick-compatible
<mode> + <mode> Chain modes, sharing discovered state (triage + filters)
(empty) Auto-scan: run Phase 0 + account analysis, present prioritized action plan

Classification gate: If $ARGUMENTS could match multiple modes, ask which the user wants before proceeding.


Hybrid Mode Protocol

Read operations — execute immediately: gmail_search_emails, gmail_read_email, gmail_list_email_labels, gmail_list_filters, gmail_get_filter

Write operations — confirm before executing: Single: gmail_modify_email, gmail_delete_email, gmail_create_label Batch: gmail_batch_modify_emails, gmail_batch_delete_emails Labels: gmail_update_label, gmail_delete_label, gmail_get_or_create_label Filters: gmail_create_filter, gmail_create_filter_from_template, gmail_delete_filter

Email deletion is FORBIDDEN unless the user explicitly requests it. Never call gmail_delete_email or gmail_batch_delete_emails on your own initiative. Always archive instead. Only use delete tools when the user directly says "delete" — and even then, use the Destructive Warning template (TYPE "DELETE" confirmation) first.

Confirmation format: See templates.md. Always show scope, count, and reversibility before write ops.


Parallelization Rules

  1. Independent tool calls go in ONE message. Sequential independent calls is a bug.
  2. Phase 0 fuses with mode's first queries — never 2 sequential rounds when fusion is possible.
  3. Batch ops on disjoint ID sets: parallel. NOISE + REFERENCE + DEFER = 1 message, 3 calls.
  4. Analytics/Audit: mega-wave — 10-15 search queries in 1 message.
  5. gmail_read_email: batch 5-8 per message (full messages are large).
  6. Filter creation: parallel after conflict check passes.
  7. Combo mode boundary: sequential (data dependency). But skip Phase 0 for mode 2.
  8. Max 15-20 calls per message. Beyond that, a single 429 impacts the entire bundle.
  9. On 429: preserve completed results, drop to sequential with 2s gaps. 3x429 in 60s → pause 60s.

Details: efficiency-guide.md § Parallel Call Map.


Phase -1: Memory Load (skip with quick prefix)

Load long-term user memory before discovery. Runs once per session.

!uv run python scripts/memory.py load --mode <current_mode>

Fuse with Phase 0 cache check in the SAME message (both are local file reads, zero API calls):

!uv run python scripts/memory.py load --mode triage
!uv run python scripts/inbox_snapshot.py cache load

If memory exists, integrate into session:

  • VIP senders → fast-lane DO classification (Pass 1)
  • Noise senders → fast-lane NOISE classification (Pass 1)
  • Corrections → highest priority: override any conflicting fast-lane rule
  • Triage overrides → apply before generic decision tree
  • Filter history → skip suggesting previously rejected filters
  • Label prefs → use preferred naming when creating labels
  • Inbox patterns → adjust expectations in analytics/audit

Run !uv run python scripts/memory.py prune once per session if memory exists. Report pruned entries only if count > 0.


Phase 0: Discovery (skip with quick prefix)

Prerequisite: Run Phase -1 (Memory Load) first unless in quick mode.

0a: Session Cache Check

!uv run python scripts/inbox_snapshot.py cache load If valid (< 1h old) → use cached labels, filters, tier. Jump to mode.

0b: Parallel Discovery + Mode Fusion (cache miss)

Issue Phase 0 calls AND mode's first queries in a SINGLE message:

Core discovery (always):

  1. gmail_list_email_labels → label name/ID map + INBOX messagesTotal/messagesUnread
  2. gmail_list_filters → existing filter criteria
  3. gmail_search_emails"in:inbox is:unread" (maxResults: 10) → sample

Mode-specific additions (same message):

  • Triage: + gmail_search_emails "is:unread" (maxResults: per tier)
  • Inbox Zero: + "is:unread newer_than:1d" + "label:_deferred"
  • Analytics: + 4x weekly volume queries (mega-wave)
  • Audit: + newsletter count + sent count + overdue count (mega-wave)
  • Cleanup: + 4x stale-detection queries

0c: Tier Assessment

Use INBOX label messagesTotal (NOT resultSizeEstimate — it is unreliable).

Inbox Size Tier maxResults Strategy
<50 Small 50 Process all
50-500 Medium 100 Sample recent + unread
500-5000 Large 200 Importance-first, date-range splitting
5000+ Massive 200 Date-range parallel queries, baseline bankruptcy

Why maxResults 100-200: MCP server has N+1 fetch pattern (each result triggers a separate API call). maxResults=500 ≈ 501 API calls ≈ 2505 quota units. 200 ≈ 1005 units.

Massive tier: Split by date-range (30d chunks), issue ALL chunks in parallel. Always query is:important is:unread alongside general scan. No pageToken on MCP server — date-range splitting is the ONLY pagination.

0d: Write Session Cache

!uv run python scripts/inbox_snapshot.py cache save --labels '...' --filters '...' --inbox-count N --unread-count M --tier TIER Invalidate cache after any write operation (filter/label create, batch modify).

Use discovered label names throughout — never invent names that conflict with existing taxonomy.


Auto-Scan (empty invocation)

When invoked with no arguments, perform an intelligent account scan instead of showing a menu.

Step 1: Mega-Wave Discovery (1 message)

Fuse Phase 0 core discovery with 6 analysis queries in a SINGLE message (~10 calls):

  • Phase 0 core: gmail_list_email_labels + gmail_list_filters + gmail_search_emails "in:inbox is:unread" (maxResults: 10)
  • Inbox reach: gmail_search_emails "in:inbox has:nouserlabels newer_than:7d" (maxResults: 10)
  • Newsletters: gmail_search_emails "in:inbox (list:* OR subject:newsletter)" (maxResults: 50)
  • Stale deferred: gmail_search_emails "label:_deferred older_than:7d" (maxResults: 10)
  • Backlog: gmail_search_emails "in:inbox is:unread older_than:7d" (maxResults: 10)
  • Cleanup candidates: gmail_search_emails "in:inbox (subject:receipt OR subject:confirmation) older_than:30d" (maxResults: 10)
  • !uv run python scripts/inbox_snapshot.py trend --days 7
  • !uv run python scripts/inbox_snapshot.py baseline --days 30

Step 2: Analyze + Score

From results, compute: tier (from INBOX messagesTotal), bankruptcy level (baseline comparison), triage need (unread backlog), filter opportunity (noise sender clusters), newsletter burden, cleanup potential, deferred pile health.

Step 3: Present Action Plan

Use the Auto-Scan Report template (templates.md § Auto-Scan Report). Present 3-5 prioritized recommendations with estimated impact. Each maps to a mode.

Offer: reply with number, mode name, all (chain top 3 sequentially sharing Phase 0 state), or menu (show full mode list).


Orchestration Patterns

This skill runs in context: fork. Apply these patterns internally:

Decomposition Gate (before every mode)

  1. List all tool calls needed for the mode
  2. Classify: independent (different queries) vs dependent (needs prior results)
  3. Bundle ALL independent calls into the fewest possible messages
  4. Never execute independent calls sequentially

Mode-Specific Wave Patterns

Mode Wave 1 Wave 2 Wave 3 Total
Auto-Scan Phase 0 + 6 analysis (~10) scoring (0 tools) ~10
Triage Phase 0 + fast-lane (~5) 3x batch_modify 5-8x read_email ~16
Analytics Phase 0 + mega-wave (~15) per-entity batches (~12) ~27
Audit Phase 0 + mega-wave (~14) per-entity batches (~20) ~34
Cleanup Phase 0 + 4 stale queries (~7) batch_modify (~3) ~10
Filters Phase 0 + sender/subject (~6) create_filter (~5) ~11
Inbox Zero Phase 0 + unread/deferred (~5) batch ops (~3) ~8
Combo Phase 0 + mode 1 mode 1 ops refresh + mode 2 varies

Mode: Triage

Three-pass classification. Details: triage-framework.md § Fast-Lane, efficiency-guide.md.

Pass 1: Fast-Lane (zero content reads)

Classify by sender + subject + snippet ONLY from search results:

  • Memory corrections → override any conflicting rule (highest priority)
  • Memory VIP senders → DO (confidence from memory)
  • Memory noise senders → NOISE (confidence from memory)
  • noreply@*, notifications@*, calendar-notification@* → NOISE
  • Known VIP senders (from session cache) → DO
  • Subject [FYI]/[INFO]/newsletter/receipt/confirmation → REFERENCE
  • Subject [ACTION]/[URGENT]/deadline → DO
  • Unmatched → UNDECIDED

Pass 1.5: Security Gate (mandatory before archiving NOISE)

Scan all NOISE subjects/snippets for security keywords: password reset, verify your, sign-in, unusual activity, 2FA, account locked, verification code, one-time password, security alert, new device, suspicious login, recovery phone, two-step verification, confirm your identity, new sign-in method. Any match → pull from NOISE to REVIEW. Also: any noreply@ email newer_than:7d → REVIEW (login notifications have no expiry; codes expire but should still surface).

Pass 2: Batch Operations (all in ONE message)

Precondition: Call gmail_get_or_create_label for _reference and _deferred if not already resolved in Phase 0.

Issue 3 gmail_batch_modify_emails calls simultaneously (disjoint ID sets = parallel safe):

  • NOISE ids → remove INBOX
  • REFERENCE ids → add _reference, remove INBOX
  • DEFER ids → add _deferred, remove INBOX

Pass 3: Content Inspection (UNDECIDED + REVIEW only)

Batch gmail_read_email 5-8 per message. Apply full 5-bucket. Batch-process results with same parallel pattern as Pass 2. Report with fast-lane triage summary (show auto-classified %). Surface recurring NOISE sender patterns as filter candidates.


Mode: Inbox Zero

Structured habit system. Details: inbox-zero-system.md.

Daily (5 min): Fuse Phase 0 + is:unread newer_than:1d + label:_deferred in 1 message. Quick triage pass (5-bucket). Batch archive NOISE + REFERENCE. Report: inbox count before → after. Update progress file.

Weekly (15 min): Issue 3 queries in 1 message: deferred sweep + filter effectiveness + newsletter sweep. Clear, re-defer, or archive. Update weekly_reviews.

Bankruptcy detection: Relative to 30-day baseline (not absolute). YELLOW: 20-50% above baseline. ORANGE: 50-100% above baseline (aggressive triage + filter creation). RED: >100% above baseline (full bankruptcy protocol). Acceleration trigger: week-over-week growth >20% for 2+ weeks. New users default to 500 until 7+ snapshots establish a baseline. See inbox-zero-system.md for recovery protocols.


Mode: Filters

Pattern-detected filter suggestions. Details: filter-patterns.md.

Issue sender clustering + subject pattern + gmail_list_filters ALL in 1 message. Score candidates: HIGH (volume ≥20, engagement <5%) → offer create. MEDIUM (≥10, <15%) → suggest. LOW → show only. Check conflicts before creating. After approval: issue multiple gmail_create_filter calls in 1 message. Massive tier: split 90d into 3x30d chunks, query in parallel.


Mode: Auto-Rules

Deep behavioral analysis → automated filter creation. Details: filter-patterns.md § Auto-Rule Detection.

Issue sender analysis + subject mining + filter fetch in 1 message. Massive tier: split 90d into 3x30d date-range chunks, issue ALL in parallel. Score by confidence. Present auto-rules report. HIGH: gmail_create_filter_from_template with confirmation. MEDIUM: individual review. Creation: multiple filters in 1 message. Schedule learning loop after 2 weeks.


Mode: Analytics

Email communication metrics via mega-wave. Details: analytics-guide.md.

Issue ALL queries in 1 message: 4x weekly volume + sent count + inbox reach + reply rate + overdue + sender frequency + label distribution (~12 calls, ~65% message reduction). Classify volume trend (growing/stable/declining). Compute inbox reach rate (good: <30%). Identify top 20 senders by volume + engagement matrix. Present full analytics report.


Mode: Newsletters

Subscription audit. Issue list:* + subject:newsletter in 1 message. Estimate frequency + read rate per subscription. Classify: UNSUBSCRIBE / KEEP / FILTER. Surface unsubscribe links. On approval: create filter or provide unsubscribe URL. Details: workflows.md § Unsubscribe Strategies.


Mode: Labels

Label taxonomy analysis. gmail_list_email_labels returns messagesTotal/messagesUnread per label. For 30d activity classification, issue per-label search queries in batches of 10-15. Find: duplicates/synonyms, stale labels (no messages 90d+), flat structure needing hierarchy. Suggest: MERGE / RENAME / DELETE / RESTRUCTURE. For merges: gmail_batch_modify_emails to migrate, then gmail_delete_label. Details: workflows.md § Label Hierarchy.


Mode: Search

Quick-compatible (no Phase 0). Natural language → Gmail search query. Parse intent (who/what/when/status), map to operators, add negations, show query + explanation + expected matches. Offer to run. Details: filter-patterns.md § Gmail Search Operators.


Mode: Senders

VIP + noise identification. Issue sample search + per-sender reply-rate queries in 1 message (if top-sender count > 12, split reply-rate queries across 2 messages to stay under call cap). VIP: reply rate >50% or consistently starred. Noise: volume >10, engagement 0%. Present sender report. On approval: create VIP filter (mark important) or noise filter. Details: triage-framework.md § Sender Shortcuts, workflows.md § VIP Management.


Mode: Digest

Quick-compatible (no Phase 0). Search is:important newer_than:3d (or is:starred OR is:unread is:important). Extract sender, subject, 1-line summary per email. Categorize: RESPOND / FYI / READING. Present digest report. Offer: "open N" | "archive fyi". Details: templates.md § Digest.


Mode: Cleanup

Archive candidates. Issue ALL 4 stale-detection queries in 1 message: receipts/confirmations older_than:30d + old newsletters older_than:30d + expired deferrals older_than:14d + fully-read no-label older_than:60d. Batch-archive ALL results in parallel gmail_batch_modify_emails calls. Archive only — never delete. Details: workflows.md § Batch Operations.


Mode: Audit

Full inbox health via mega-wave. Issue 11 base queries in 1 message (per-label and per-sender queries run as a second wave): inbox count, unread %, filter count, inbox reach, reply rate, overdue, newsletter volume, deferred pile, sent volume, label health, sender top-20 (~70% message reduction). Score 0-100 based on inbox reach rate, filter coverage, label health. Identify top 3 quick wins. Generate action plan. Details: analytics-guide.md, templates.md § Full Audit.


Scope Boundaries

IS for: Gmail triage, inbox zero, filters, auto-rules, analytics, newsletters, labels, search, senders, digest, cleanup, audit. NOT for: Calendar, Google Drive, non-Gmail, real-time monitoring, CRM.


Reference File Index

File Content Load When
references/efficiency-guide.md Cache, parallel calls, tiers, MCP constraints, fast-lane, pagination All modes
references/triage-framework.md 5-bucket decision trees, fast-lane rules, security gate, batch processing triage, inbox-zero
references/filter-patterns.md Gmail operators, filter templates, auto-rule algorithms, learning loop filters, auto-rules, search
references/workflows.md Bankruptcy, VIP, batch ops, label hierarchy, unsubscribe, combo mode cleanup, labels, senders, newsletters
references/templates.md All confirmation, report, and execution result templates Every mode
references/inbox-zero-system.md Daily/weekly routines, progress schema, baseline bankruptcy inbox-zero
references/analytics-guide.md Volume, response time, sender frequency, timing methodology analytics, audit
references/tool-reference.md All 18 Gmail MCP tool signatures, parameters, MCP constraints When selecting tools

Scripts:

Script Purpose Run When
scripts/inbox_snapshot.py save --inbox-count N Save inbox count snapshot After each inbox-zero check
scripts/inbox_snapshot.py trend --days 7 Compute inbox trend During analytics mode
scripts/inbox_snapshot.py baseline --days 30 Compute rolling baseline for bankruptcy During inbox-zero
scripts/inbox_snapshot.py cache load Load session cache (1h TTL) Phase 0 step 0a
scripts/inbox_snapshot.py cache save --labels --filters --inbox-count --unread-count --tier Write session cache Phase 0 step 0d
scripts/inbox_snapshot.py cache clear Invalidate session cache After write operations
scripts/memory.py load --mode MODE Load user memory filtered by mode Phase -1
scripts/memory.py save-sender Save VIP or noise sender After triage/senders
scripts/memory.py save-correction Record user classification override When user corrects
scripts/memory.py prune Remove stale memory entries Once per session
scripts/memory.py stats Memory summary counts Diagnostics

Critical Rules

  1. Run Phase 0 discovery before any workflow — never assume label/filter state (unless quick mode)
  2. Confirm all write operations — show scope, count, and reversibility before calling
  3. Batch similar operations — never loop gmail_modify_email when gmail_batch_modify_emails applies
  4. Preserve existing taxonomy — suggest improvements, never silently rename
  5. Check filter conflicts before creating — call gmail_list_filters first
  6. Include confidence scores on all filter/rule suggestions
  7. Provide rollback guidance for every bulk operation
  8. Handle MCP failures gracefully — see efficiency-guide.md § Error Recovery
  9. Surface unsubscribe links — make newsletter cleanup actionable
  10. Prioritize quick wins — high impact, low effort first in all reports
  11. Update inbox-zero progress after every routine
  12. NEVER delete emails unless the user explicitly requests deletion. Archive by default in all modes. When the user does request deletion, use the Destructive Warning template (TYPE "DELETE" confirmation) before calling gmail_delete_email or gmail_batch_delete_emails
  13. Issue ALL independent tool calls in a single message — sequential independent calls is a bug
  14. Use fast-lane classification before reading content — sender + subject resolve 60%+ of triage
  15. ALWAYS run the security gate on NOISE before archiving — scan subjects for 2FA/password/security keywords
  16. Check session cache before Phase 0. Invalidate cache after any write operation (gmail_create_label, gmail_get_or_create_label, gmail_create_filter, gmail_create_filter_from_template, gmail_batch_modify_emails, gmail_batch_delete_emails).
  17. maxResults 100-200 (not 500) — MCP N+1 pattern means 500 results ≈ 501 API calls
  18. Bankruptcy triggers are relative to 30-day baseline, not absolute thresholds
  19. Load user memory before Phase 0 (Phase -1). Save memories AFTER delivering primary output — never block a mode's core workflow for a memory write

Memory Save Triggers

Save memories at natural decision points. NEVER slow down a mode to save — always save AFTER the mode's primary output.

Trigger Command
Triage: user overrides fast-lane classification memory.py save-correction --mode triage --what "..." --correction "..." --pattern "..." --action BUCKET
Triage: NOISE batch archived successfully memory.py save-sender --email X --type noise --reason "..." --source triage (batch multiple in 1 message)
Senders: VIP identified (reply rate >50%) memory.py save-sender --email X --type vip --name "..." --reason "..." --source senders
Filters: filter created successfully memory.py save-filter --filter-id ID --description "..." --effectiveness high
Filters: user rejects a suggestion memory.py save-filter --filter-id rejected --description "..." --effectiveness failed
Labels: user states naming preference memory.py save-labels --convention "..." --structure "..."
Analytics: volume patterns computed memory.py save-patterns --daily-volume N --busy '[...]'
Any mode: user corrects a classification memory.py save-correction ...

Batch multiple save calls in a SINGLE message when a mode produces several memories (e.g., triage identifies 5 noise senders)

Related skills
Installs
45
GitHub Stars
3
First Seen
Feb 26, 2026