gdrive-sheets-compute
Google Drive Sheets Compute
Read PDFs and other files from Google Drive, extract structured data, and fill Google Sheets cells (values or formulas) based on that data.
CLI Location
~/.claude/skills/gdrive-sheets-compute/scripts/gdrive_sheets.py
Running
uv run ~/.claude/skills/gdrive-sheets-compute/scripts/gdrive_sheets.py <command> [options]
Authentication
Uses OAuth2 with a desktop app flow. Credentials are cached locally.
# First-time setup: opens browser for Google OAuth consent
gdrive-sheets auth login
# Check current auth status
gdrive-sheets auth status
# Revoke stored credentials
gdrive-sheets auth logout
Credentials stored in ~/.config/gdrive-sheets-compute/.
Required OAuth Scopes
https://www.googleapis.com/auth/drive.readonlyhttps://www.googleapis.com/auth/spreadsheets
Setting Up Google Cloud Credentials
- Go to Google Cloud Console
- Create or select a project
- Enable Google Drive API and Google Sheets API
- Create OAuth 2.0 Client ID (Desktop application)
- Download the JSON and save as
~/.config/gdrive-sheets-compute/client_secret.json
Commands
Drive Operations
# Search files by name (case-insensitive)
gdrive-sheets drive search budget
gdrive-sheets drive search "tax 2024" --type spreadsheet
gdrive-sheets drive search invoices --type pdf
# List files in a folder
gdrive-sheets drive ls --folder-id <FOLDER_ID>
gdrive-sheets drive ls --folder-id <FOLDER_ID> --mime-type application/pdf
# Download a file (binary) to local path
gdrive-sheets drive download --file-id <FILE_ID> --output /tmp/file.pdf
# Extract text from a PDF on Drive (downloads + extracts)
gdrive-sheets drive extract-pdf --file-id <FILE_ID>
# Extract text from all PDFs in a folder
gdrive-sheets drive extract-pdfs --folder-id <FOLDER_ID>
Sheets Operations
# Read a range from a spreadsheet
gdrive-sheets sheets read --spreadsheet-id <ID> --range 'Sheet1!A1:D10'
# Write values to a range
gdrive-sheets sheets write --spreadsheet-id <ID> --range 'Sheet1!A1' \
--values '[["Name","Amount"],["Alice",100]]'
# Write a formula to a cell
gdrive-sheets sheets write-formula --spreadsheet-id <ID> --range 'Sheet1!B2' \
--formula '=SUM(C2:C10)'
# Batch update multiple ranges
gdrive-sheets sheets batch-write --spreadsheet-id <ID> \
--data '[{"range":"Sheet1!A1","values":[["hello"]]},{"range":"Sheet1!B1","values":[[42]]}]'
# Get spreadsheet metadata (sheet names, grid properties)
gdrive-sheets sheets info --spreadsheet-id <ID>
Pipeline: PDF → Sheets
# Extract data from a PDF and display as structured table
gdrive-sheets pipeline pdf-to-table --file-id <PDF_FILE_ID>
# Extract data from all PDFs in a folder, show summary
gdrive-sheets pipeline folder-summary --folder-id <FOLDER_ID>
Workflow Example
A typical workflow for filling a sheet from Drive PDFs:
- List PDFs:
gdrive-sheets drive ls --folder-id <ID> --mime-type application/pdf - Extract text:
gdrive-sheets drive extract-pdfs --folder-id <ID> - Read current sheet:
gdrive-sheets sheets read --spreadsheet-id <ID> --range 'Sheet1!A:Z' - Compute and write: Use Claude to parse extracted text, compute values, then
gdrive-sheets sheets writeorsheets batch-writeto fill cells
Key Details
- Auth: OAuth2 desktop flow with refresh token persistence
- Config dir:
~/.config/gdrive-sheets-compute/ - PDF extraction: Uses
pymupdf(fitz) for text extraction — handles scanned PDFs poorly; for OCR, pipe throughpytesseract(optional dependency) - Sheets formulas: Use
write-formulafor single formulas, or include formula strings inwrite/batch-writevalues (prefix with=) - Rate limits: Google API quotas apply; batch operations preferred over many single calls
Programmatic Access via gspread (Python)
When writing Python scripts that create/populate Google Sheets (e.g. with uv run),
use the gspread library with the existing service account:
Auth
import gspread
# Service account key at this fixed path
gc = gspread.service_account(
filename="/Users/clementwalter/.config/gdrive-sheets-compute/service_account.json"
)
- Service account email:
booking-preferences@rainbot.iam.gserviceaccount.com - Sheets created by the SA are private — always call
spreadsheet.share("", perm_type="anyone", role="writer")to make them accessible.
gws CLI status
The gws CLI (~/.nvm/versions/node/v24.11.0/bin/gws) is installed but NOT authenticated.
It requires gws auth setup with a GCP OAuth client. For programmatic use, prefer gspread
with the service account above.
Creating sheets with charts
Key patterns discovered:
# Create spreadsheet
spreadsheet = gc.create("Title")
# Rename default sheet, add extra sheets with enough room for chart overlays
ws = spreadsheet.sheet1
ws.update_title("Data")
ws_agg = spreadsheet.add_worksheet(title="Aggregated", rows=num_rows + 50, cols=15)
# Write data
ws.update(values=[headers, *rows], range_name="A1")
# Add chart via batchUpdate (gspread doesn't have a chart helper)
gc.http_client.request(
"post",
f"https://sheets.googleapis.com/v4/spreadsheets/{spreadsheet.id}:batchUpdate",
json={"requests": [chart_request]},
)
# Format headers (bold, freeze)
gc.http_client.request(
"post",
f"https://sheets.googleapis.com/v4/spreadsheets/{spreadsheet.id}:batchUpdate",
json={"requests": [repeat_cell_request, freeze_row_request]},
)
CRITICAL: Formulas, Not Hardcoded Values
NEVER write pre-computed aggregated values to Google Sheets. The entire point of a Google Sheet (vs CSV/plot) is that the user wants to do further computation and trace any number back to raw data.
Architecture for data pipelines:
- Raw data sheet: write values only (the source of truth)
- Helper formula columns in raw data: e.g.
=LEFT(A2,10)for date extraction - Aggregation sheets: ALL formulas, zero hardcoded values
- Column A:
=SORT(UNIQUE('Raw Data'!G2:G))to extract unique periods - Other columns:
=SUMIFS(...)/=COUNTIFS(...)referencing raw data - Cumulative: running sum
=G2+F3 - Guard blanks:
=IFERROR(IF(A2="","", <formula>), "")
- Column A:
- Charts: reference the formula-driven ranges
# Write formulas with value_input_option="USER_ENTERED" so Sheets evaluates them
ws.update(
values=[headers, *rows_with_formulas],
range_name="A1",
value_input_option="USER_ENTERED",
)
CRITICAL: Cell Formatting (Currency, Dates, etc.)
ALWAYS set proper number formats via batchUpdate. Cells containing monetary values must be formatted as currency. Dates as dates. Counts as integers. Raw numbers with no formatting are unreadable and lose semantic meaning.
Use repeatCell with numberFormat in a batchUpdate request:
# Format columns as USD currency (e.g. columns B and D for deposits/withdrawals)
requests = [{
"repeatCell": {
"range": {
"sheetId": sheet_id,
"startRowIndex": 1, # skip header
"startColumnIndex": col,
"endColumnIndex": col + 1,
},
"cell": {
"userEnteredFormat": {
"numberFormat": {"type": "CURRENCY", "pattern": "$#,##0.00"}
}
},
"fields": "userEnteredFormat.numberFormat",
}
} for col in currency_columns]
# Format columns as integer (e.g. count columns)
requests += [{
"repeatCell": {
"range": {
"sheetId": sheet_id,
"startRowIndex": 1,
"startColumnIndex": col,
"endColumnIndex": col + 1,
},
"cell": {
"userEnteredFormat": {
"numberFormat": {"type": "NUMBER", "pattern": "#,##0"}
}
},
"fields": "userEnteredFormat.numberFormat",
}
} for col in count_columns]
Common format patterns:
- Currency (USD):
{"type": "CURRENCY", "pattern": "$#,##0.00"} - Currency (EUR):
{"type": "CURRENCY", "pattern": "€#,##0.00"} - Integer:
{"type": "NUMBER", "pattern": "#,##0"} - Percentage:
{"type": "PERCENT", "pattern": "0.00%"} - Date:
{"type": "DATE", "pattern": "yyyy-mm-dd"} - DateTime:
{"type": "DATE_TIME", "pattern": "yyyy-mm-dd hh:mm:ss"}
Pitfalls
- Grid size: Chart
anchorCellmust be within the grid. Always allocate extra rows/cols when creating worksheets (e.g.rows=600, cols=10). - Sharing: Service account sheets are private by default. Always share after creation.
- value_input_option: Must be
"USER_ENTERED"for formulas to be evaluated. Default"RAW"writes formula text as literal strings. - Array formula spill:
SORT(UNIQUE(...))spills into rows below. Leave those cells empty — don't write values or formulas into them. - web3.py v7: Use
from_block/to_block(snake_case), notfromBlock/toBlock. - Public RPCs:
https://ethereum-rpc.publicnode.comis more reliable thanhttps://eth.llamarpc.comfor batch getLogs. Both have ~25k block range limits. - Timestamps: Resolving block timestamps requires one RPC call per block — add retry logic and progress logging for >100 blocks.
More from clementwalter/claudine
scaleway deployment
This skill should be used when the user asks about "scaleway", "deploy to
17atlas
macOS-only AppleScript control for the ChatGPT Atlas desktop app. Use only
2pdf
Use when tasks involve reading, creating, or reviewing PDF files where
2share
Convert browser-saved HTML files or URLs into self-contained, lightweight,
2test
Dummy skill for testing hooks. Use when user says "test skill", "test hooks",
2slack-user-cli
Read and write Slack channels, DMs, threads, and search from the terminal
2