excel-automation
Excel Automation
Create professional Excel files, parse complex financial models, and control Excel on macOS.
Quick Start
# Create a formatted Excel report
uv run --with openpyxl scripts/create_formatted_excel.py output.xlsx
# Parse a complex xlsm that openpyxl can't handle
uv run scripts/parse_complex_excel.py model.xlsm # List sheets
uv run scripts/parse_complex_excel.py model.xlsm "DCF" # Extract a sheet
uv run scripts/parse_complex_excel.py model.xlsm --fix # Fix corrupted names
# Control Excel via AppleScript (with timeout to prevent hangs)
timeout 5 osascript -e 'tell application "Microsoft Excel" to activate'
Overview
Three capabilities:
| Capability | Tool | When to Use |
|---|---|---|
| Create formatted Excel | openpyxl |
Reports, mockups, dashboards |
| Parse complex xlsm/xlsx | zipfile + xml.etree |
Financial models, VBA workbooks, >1MB files |
| Control Excel window | AppleScript (osascript) |
Zoom, scroll, select cells programmatically |
Tool Selection Decision Tree
Is the file simple (data export, no VBA, <1MB)?
├─ YES → openpyxl or pandas
└─ NO
├─ Is it .xlsm or from investment bank / >1MB?
│ └─ YES → zipfile + xml.etree.ElementTree (stdlib)
└─ Is it truly .xls (BIFF format)?
└─ YES → xlrd
Signals of "complex" Excel: file >1MB, .xlsm extension, from investment bank/broker, contains VBA macros.
IMPORTANT: Always run file <path> first — extensions lie. A .xls file may actually be a ZIP-based xlsx.
Creating Excel Files (openpyxl)
Professional Color Convention (Investment Banking Standard)
| Color | RGB Code | Meaning |
|---|---|---|
| Blue | 0000FF |
User input / assumption |
| Black | 000000 |
Calculated value |
| Green | 008000 |
Cross-sheet reference |
| White on dark blue | FFFFFF on 4472C4 |
Section headers |
| Dark blue text | 1F4E79 |
Title |
Core Formatting Patterns
from openpyxl.styles import Font, PatternFill, Border, Side, Alignment
# Fonts
BLUE_FONT = Font(color="0000FF", size=10, name="Calibri")
BLACK_FONT_BOLD = Font(color="000000", size=10, name="Calibri", bold=True)
GREEN_FONT = Font(color="008000", size=10, name="Calibri")
HEADER_FONT = Font(color="FFFFFF", size=12, name="Calibri", bold=True)
# Fills
DARK_BLUE_FILL = PatternFill("solid", fgColor="4472C4")
LIGHT_BLUE_FILL = PatternFill("solid", fgColor="D9E1F2")
INPUT_GREEN_FILL = PatternFill("solid", fgColor="E2EFDA")
LIGHT_GRAY_FILL = PatternFill("solid", fgColor="F2F2F2")
# Borders
THIN_BORDER = Border(bottom=Side(style="thin", color="B2B2B2"))
BOTTOM_DOUBLE = Border(bottom=Side(style="double", color="000000"))
Number Format Codes
| Format | Code | Example |
|---|---|---|
| Currency | '$#,##0' |
$1,234 |
| Currency with decimals | '$#,##0.00' |
$1,234.56 |
| Percentage | '0.0%' |
12.3% |
| Percentage (2 decimal) | '0.00%' |
12.34% |
| Number with commas | '#,##0' |
1,234 |
| Multiplier | '0.0x' |
1.5x |
Conditional Formatting (Sensitivity Tables)
Red-to-green gradient for sensitivity analysis:
from openpyxl.formatting.rule import ColorScaleRule
rule = ColorScaleRule(
start_type="min", start_color="F8696B", # Red (low)
mid_type="percentile", mid_value=50, mid_color="FFEB84", # Yellow (mid)
end_type="max", end_color="63BE7B" # Green (high)
)
ws.conditional_formatting.add(f"B2:F6", rule)
Execution
uv run --with openpyxl scripts/create_formatted_excel.py
Full template script: See scripts/create_formatted_excel.py
Parsing Complex Excel (zipfile + xml)
When openpyxl fails on complex xlsm files (corrupted DefinedNames, complex VBA), use stdlib directly.
XLSX Internal ZIP Structure
file.xlsx (ZIP archive)
├── [Content_Types].xml
├── xl/
│ ├── workbook.xml ← Sheet names + order
│ ├── sharedStrings.xml ← All text values (lookup table)
│ ├── worksheets/
│ │ ├── sheet1.xml ← Cell data for sheet 1
│ │ ├── sheet2.xml ← Cell data for sheet 2
│ │ └── ...
│ └── _rels/
│ └── workbook.xml.rels ← Maps rId → sheetN.xml
└── _rels/.rels
Sheet Name Resolution (Two-Step)
Sheet names in workbook.xml link to physical files via _rels/workbook.xml.rels:
import zipfile
import xml.etree.ElementTree as ET
MAIN_NS = 'http://schemas.openxmlformats.org/spreadsheetml/2006/main'
REL_NS = 'http://schemas.openxmlformats.org/officeDocument/2006/relationships'
RELS_NS = 'http://schemas.openxmlformats.org/package/2006/relationships'
def get_sheet_path(zf, sheet_name):
"""Resolve sheet name to physical XML file path inside ZIP."""
# Step 1: workbook.xml → find rId for the sheet name
wb_xml = ET.fromstring(zf.read('xl/workbook.xml'))
sheets = wb_xml.findall(f'.//{{{MAIN_NS}}}sheet')
rid = None
for s in sheets:
if s.get('name') == sheet_name:
rid = s.get(f'{{{REL_NS}}}id')
break
if not rid:
raise ValueError(f"Sheet '{sheet_name}' not found")
# Step 2: workbook.xml.rels → map rId to file path
rels_xml = ET.fromstring(zf.read('xl/_rels/workbook.xml.rels'))
for rel in rels_xml.findall(f'{{{RELS_NS}}}Relationship'):
if rel.get('Id') == rid:
return 'xl/' + rel.get('Target')
raise ValueError(f"No file mapping for {rid}")
Cell Data Extraction
def extract_cells(zf, sheet_path):
"""Extract all cell values from a sheet XML."""
# Build shared strings lookup
shared = []
try:
ss_xml = ET.fromstring(zf.read('xl/sharedStrings.xml'))
for si in ss_xml.findall(f'{{{MAIN_NS}}}si'):
texts = si.itertext()
shared.append(''.join(texts))
except KeyError:
pass # No shared strings
# Parse sheet cells
sheet_xml = ET.fromstring(zf.read(sheet_path))
rows = sheet_xml.findall(f'.//{{{MAIN_NS}}}row')
data = {}
for row in rows:
for cell in row.findall(f'{{{MAIN_NS}}}c'):
ref = cell.get('r') # e.g., "A1"
cell_type = cell.get('t') # "s" = shared string, None = number
val_el = cell.find(f'{{{MAIN_NS}}}v')
if val_el is not None and val_el.text:
if cell_type == 's':
data[ref] = shared[int(val_el.text)]
else:
try:
data[ref] = float(val_el.text)
except ValueError:
data[ref] = val_el.text
return data
Fixing Corrupted DefinedNames
Investment bank xlsm files often have corrupted <definedName> entries containing "Formula removed":
def fix_defined_names(zf_in_path, zf_out_path):
"""Remove corrupted DefinedNames and repackage."""
import shutil, tempfile
with tempfile.TemporaryDirectory() as tmp:
tmp = Path(tmp)
with zipfile.ZipFile(zf_in_path, 'r') as zf:
zf.extractall(tmp)
wb_xml_path = tmp / 'xl' / 'workbook.xml'
tree = ET.parse(wb_xml_path)
root = tree.getroot()
ns = {'main': MAIN_NS}
defined_names = root.find('.//main:definedNames', ns)
if defined_names is not None:
for name in list(defined_names):
if name.text and "Formula removed" in name.text:
defined_names.remove(name)
tree.write(wb_xml_path, encoding='utf-8', xml_declaration=True)
with zipfile.ZipFile(zf_out_path, 'w', zipfile.ZIP_DEFLATED) as zf:
for fp in tmp.rglob('*'):
if fp.is_file():
zf.write(fp, fp.relative_to(tmp))
Full template script: See scripts/parse_complex_excel.py
Controlling Excel on macOS (AppleScript)
All commands verified on macOS with Microsoft Excel.
Verified Commands
# Activate Excel (bring to front)
osascript -e 'tell application "Microsoft Excel" to activate'
# Open a file
osascript -e 'tell application "Microsoft Excel" to open POSIX file "/path/to/file.xlsx"'
# Set zoom level (percentage)
osascript -e 'tell application "Microsoft Excel"
set zoom of active window to 120
end tell'
# Scroll to specific row
osascript -e 'tell application "Microsoft Excel"
set scroll row of active window to 45
end tell'
# Scroll to specific column
osascript -e 'tell application "Microsoft Excel"
set scroll column of active window to 3
end tell'
# Select a cell range
osascript -e 'tell application "Microsoft Excel"
select range "A1" of active sheet
end tell'
# Select a specific sheet by name
osascript -e 'tell application "Microsoft Excel"
activate object sheet "DCF" of active workbook
end tell'
Timing and Timeout
Always add sleep 1 between AppleScript commands and subsequent operations (e.g., screenshot) to allow UI rendering.
IMPORTANT: osascript will hang indefinitely if Excel is not running or not responding. Always wrap with timeout:
# Safe pattern: 5-second timeout
timeout 5 osascript -e 'tell application "Microsoft Excel" to activate'
# Check exit code: 124 = timed out
if [ $? -eq 124 ]; then
echo "Excel not responding — is it running?"
fi
Common Mistakes
| Mistake | Correction |
|---|---|
| openpyxl fails on complex xlsm → try monkey-patching | Switch to zipfile + xml.etree immediately |
Count Chinese characters with wc -c |
Use wc -m (chars, not bytes; Chinese = 3 bytes/char) |
| Trust file extension | Run file <path> first to confirm actual format |
openpyxl load_workbook hangs on large xlsm |
Use zipfile for targeted extraction instead of loading entire workbook |
Important Notes
- Execute Python scripts with
uv run --with openpyxl(never use system Python) - LibreOffice (
soffice --headless) can convert formats and recalculate formulas - Detailed formatting reference: See
references/formatting-reference.md