gdrive-sheets-compute

Installation
SKILL.md

Google Drive Sheets Compute

Read PDFs and other files from Google Drive, extract structured data, and fill Google Sheets cells (values or formulas) based on that data.

CLI Location

~/.claude/skills/gdrive-sheets-compute/scripts/gdrive_sheets.py

Running

uv run ~/.claude/skills/gdrive-sheets-compute/scripts/gdrive_sheets.py <command> [options]

Authentication

Uses OAuth2 with a desktop app flow. Credentials are cached locally.

# First-time setup: opens browser for Google OAuth consent
gdrive-sheets auth login

# Check current auth status
gdrive-sheets auth status

# Revoke stored credentials
gdrive-sheets auth logout

Credentials stored in ~/.config/gdrive-sheets-compute/.

Required OAuth Scopes

  • https://www.googleapis.com/auth/drive.readonly
  • https://www.googleapis.com/auth/spreadsheets

Setting Up Google Cloud Credentials

  1. Go to Google Cloud Console
  2. Create or select a project
  3. Enable Google Drive API and Google Sheets API
  4. Create OAuth 2.0 Client ID (Desktop application)
  5. Download the JSON and save as ~/.config/gdrive-sheets-compute/client_secret.json

Commands

Drive Operations

# Search files by name (case-insensitive)
gdrive-sheets drive search budget
gdrive-sheets drive search "tax 2024" --type spreadsheet
gdrive-sheets drive search invoices --type pdf

# List files in a folder
gdrive-sheets drive ls --folder-id <FOLDER_ID>
gdrive-sheets drive ls --folder-id <FOLDER_ID> --mime-type application/pdf

# Download a file (binary) to local path
gdrive-sheets drive download --file-id <FILE_ID> --output /tmp/file.pdf

# Extract text from a PDF on Drive (downloads + extracts)
gdrive-sheets drive extract-pdf --file-id <FILE_ID>

# Extract text from all PDFs in a folder
gdrive-sheets drive extract-pdfs --folder-id <FOLDER_ID>

Sheets Operations

# Read a range from a spreadsheet
gdrive-sheets sheets read --spreadsheet-id <ID> --range 'Sheet1!A1:D10'

# Write values to a range
gdrive-sheets sheets write --spreadsheet-id <ID> --range 'Sheet1!A1' \
  --values '[["Name","Amount"],["Alice",100]]'

# Write a formula to a cell
gdrive-sheets sheets write-formula --spreadsheet-id <ID> --range 'Sheet1!B2' \
  --formula '=SUM(C2:C10)'

# Batch update multiple ranges
gdrive-sheets sheets batch-write --spreadsheet-id <ID> \
  --data '[{"range":"Sheet1!A1","values":[["hello"]]},{"range":"Sheet1!B1","values":[[42]]}]'

# Get spreadsheet metadata (sheet names, grid properties)
gdrive-sheets sheets info --spreadsheet-id <ID>

Pipeline: PDF → Sheets

# Extract data from a PDF and display as structured table
gdrive-sheets pipeline pdf-to-table --file-id <PDF_FILE_ID>

# Extract data from all PDFs in a folder, show summary
gdrive-sheets pipeline folder-summary --folder-id <FOLDER_ID>

Workflow Example

A typical workflow for filling a sheet from Drive PDFs:

  1. List PDFs: gdrive-sheets drive ls --folder-id <ID> --mime-type application/pdf
  2. Extract text: gdrive-sheets drive extract-pdfs --folder-id <ID>
  3. Read current sheet: gdrive-sheets sheets read --spreadsheet-id <ID> --range 'Sheet1!A:Z'
  4. Compute and write: Use Claude to parse extracted text, compute values, then gdrive-sheets sheets write or sheets batch-write to fill cells

Key Details

  • Auth: OAuth2 desktop flow with refresh token persistence
  • Config dir: ~/.config/gdrive-sheets-compute/
  • PDF extraction: Uses pymupdf (fitz) for text extraction — handles scanned PDFs poorly; for OCR, pipe through pytesseract (optional dependency)
  • Sheets formulas: Use write-formula for single formulas, or include formula strings in write/batch-write values (prefix with =)
  • Rate limits: Google API quotas apply; batch operations preferred over many single calls

Programmatic Access via gspread (Python)

When writing Python scripts that create/populate Google Sheets (e.g. with uv run), use the gspread library with the existing service account:

Auth

import gspread

# Service account key at this fixed path
gc = gspread.service_account(
    filename="/Users/clementwalter/.config/gdrive-sheets-compute/service_account.json"
)
  • Service account email: booking-preferences@rainbot.iam.gserviceaccount.com
  • Sheets created by the SA are private — always call spreadsheet.share("", perm_type="anyone", role="writer") to make them accessible.

gws CLI status

The gws CLI (~/.nvm/versions/node/v24.11.0/bin/gws) is installed but NOT authenticated. It requires gws auth setup with a GCP OAuth client. For programmatic use, prefer gspread with the service account above.

Creating sheets with charts

Key patterns discovered:

# Create spreadsheet
spreadsheet = gc.create("Title")

# Rename default sheet, add extra sheets with enough room for chart overlays
ws = spreadsheet.sheet1
ws.update_title("Data")
ws_agg = spreadsheet.add_worksheet(title="Aggregated", rows=num_rows + 50, cols=15)

# Write data
ws.update(values=[headers, *rows], range_name="A1")

# Add chart via batchUpdate (gspread doesn't have a chart helper)
gc.http_client.request(
    "post",
    f"https://sheets.googleapis.com/v4/spreadsheets/{spreadsheet.id}:batchUpdate",
    json={"requests": [chart_request]},
)

# Format headers (bold, freeze)
gc.http_client.request(
    "post",
    f"https://sheets.googleapis.com/v4/spreadsheets/{spreadsheet.id}:batchUpdate",
    json={"requests": [repeat_cell_request, freeze_row_request]},
)

CRITICAL: Formulas, Not Hardcoded Values

NEVER write pre-computed aggregated values to Google Sheets. The entire point of a Google Sheet (vs CSV/plot) is that the user wants to do further computation and trace any number back to raw data.

Architecture for data pipelines:

  1. Raw data sheet: write values only (the source of truth)
  2. Helper formula columns in raw data: e.g. =LEFT(A2,10) for date extraction
  3. Aggregation sheets: ALL formulas, zero hardcoded values
    • Column A: =SORT(UNIQUE('Raw Data'!G2:G)) to extract unique periods
    • Other columns: =SUMIFS(...) / =COUNTIFS(...) referencing raw data
    • Cumulative: running sum =G2+F3
    • Guard blanks: =IFERROR(IF(A2="","", <formula>), "")
  4. Charts: reference the formula-driven ranges
# Write formulas with value_input_option="USER_ENTERED" so Sheets evaluates them
ws.update(
    values=[headers, *rows_with_formulas],
    range_name="A1",
    value_input_option="USER_ENTERED",
)

CRITICAL: Cell Formatting (Currency, Dates, etc.)

ALWAYS set proper number formats via batchUpdate. Cells containing monetary values must be formatted as currency. Dates as dates. Counts as integers. Raw numbers with no formatting are unreadable and lose semantic meaning.

Use repeatCell with numberFormat in a batchUpdate request:

# Format columns as USD currency (e.g. columns B and D for deposits/withdrawals)
requests = [{
    "repeatCell": {
        "range": {
            "sheetId": sheet_id,
            "startRowIndex": 1,        # skip header
            "startColumnIndex": col,
            "endColumnIndex": col + 1,
        },
        "cell": {
            "userEnteredFormat": {
                "numberFormat": {"type": "CURRENCY", "pattern": "$#,##0.00"}
            }
        },
        "fields": "userEnteredFormat.numberFormat",
    }
} for col in currency_columns]

# Format columns as integer (e.g. count columns)
requests += [{
    "repeatCell": {
        "range": {
            "sheetId": sheet_id,
            "startRowIndex": 1,
            "startColumnIndex": col,
            "endColumnIndex": col + 1,
        },
        "cell": {
            "userEnteredFormat": {
                "numberFormat": {"type": "NUMBER", "pattern": "#,##0"}
            }
        },
        "fields": "userEnteredFormat.numberFormat",
    }
} for col in count_columns]

Common format patterns:

  • Currency (USD): {"type": "CURRENCY", "pattern": "$#,##0.00"}
  • Currency (EUR): {"type": "CURRENCY", "pattern": "€#,##0.00"}
  • Integer: {"type": "NUMBER", "pattern": "#,##0"}
  • Percentage: {"type": "PERCENT", "pattern": "0.00%"}
  • Date: {"type": "DATE", "pattern": "yyyy-mm-dd"}
  • DateTime: {"type": "DATE_TIME", "pattern": "yyyy-mm-dd hh:mm:ss"}

Pitfalls

  • Grid size: Chart anchorCell must be within the grid. Always allocate extra rows/cols when creating worksheets (e.g. rows=600, cols=10).
  • Sharing: Service account sheets are private by default. Always share after creation.
  • value_input_option: Must be "USER_ENTERED" for formulas to be evaluated. Default "RAW" writes formula text as literal strings.
  • Array formula spill: SORT(UNIQUE(...)) spills into rows below. Leave those cells empty — don't write values or formulas into them.
  • web3.py v7: Use from_block/to_block (snake_case), not fromBlock/toBlock.
  • Public RPCs: https://ethereum-rpc.publicnode.com is more reliable than https://eth.llamarpc.com for batch getLogs. Both have ~25k block range limits.
  • Timestamps: Resolving block timestamps requires one RPC call per block — add retry logic and progress logging for >100 blocks.
Related skills
Installs
1
GitHub Stars
1
First Seen
Mar 28, 2026
Security Audits