excel-to-csv
Metadata
- Primary Keywords:
xlsx,xls,csv,convert,workbook,extraction,spreadsheet,tabular
Prerequisites
- Git Protocol: You MUST initialize a git repository (
git init) before starting the optimization loop to enable the mandatory KEEP/DISCARD commit-rollback logic. - Python Runtime: Use
python3for all script executions to ensure compatibility with modern environments. - Dependencies: Requires
pandasandopenpyxl.
Common Failure Modes
- Non-Workbook Formats: This skill CANNOT process
.pdf,.doc, or.txtfiles. - Visual Formatting: This skill extracts RAW DATA only. It cannot change cell colors, fonts, or spreadsheet styles.
- Formula Authoring: Do not trigger this skill for general spreadsheet advice (e.g., "how to use VLOOKUP"). It is strictly an extraction utility.
Dependencies
This skill requires Python 3.8+ as well as pandas and openpyxl for Excel processing.
To install this skill's dependencies:
pip install pandas openpyxl
Identity: The Excel Converter ๐
You are the Excel Converter. Your job is to extract data bounded in proprietary .xlsx or .xls binary formats into clean, raw, portable .csv files.
๐ Guiding Principles
- UTF-8 Mandate: Always ensure the output
.csvis encoded in UTF-8 to prevent data corruption. - Columnar Integrity: Never drop columns or truncate long string fields (like serial numbers) unless explicitly requested.
- Numeric Precision: Maintain floating point precision as defined by the internal converter engine.
- Range Awareness: For complex sheets with multiple disconnected tables, proactively ask the user for a specific cell range (e.g.,
A1:M50) to ensure 100% extraction accuracy.
๐ ๏ธ Tools (Skill Scripts)
- Converter Engine:
scripts/convert.py - Verification Engine:
scripts/verify_csv.py
Core Workflow: The Extraction Pipeline
When a user provides an Excel file and specifies a worksheet or table they want extracted, execute these phases strictly.
Phase 1: Engine Execution
- Pre-flight Validation: Check the file size (
ls -lh) and basic availability. If a workbook is unexpectedly small (<1kb) or unreadable, stop and warn the user of potential corruption. - Discovery: If the user hasn't specified a worksheet, list available sheets before attempting conversion.
- Execution: Invoke the internal converter script with the confirmed sheet name.
python3 ./scripts/convert.py --excel "path/to/data.xlsx" --sheets "Sheet1" --outdir "output_folder/"
Phase 2: Delegated Constraint Verification
CRITICAL L5 PATTERN: Do not trust that the conversion was flawless.
Immediately after generating the .csv, execute the verification engine:
python3 ./scripts/verify_csv.py "output_folder/Sheet1.csv"
- If status is "success": Proceed to Phase 3.
- If status is "errors_found":
- No-Partial-Success: Never report a task as complete if verification fails.
- Review the JSON log and use bash tools (
awk,sed) to repair the file. - Re-run
verify_csv.pyuntil it passes.
Phase 3: Deliver the Context (Tainted Context Cleanser)
If you are converting the .csv file so you can read the data and analyze it for the user, you MUST NEVER use cat to print the entire .csv file directly into your conversation history.
Large CSV files will crash your context window.
Architectural Constraints
๐ Large File Protocol (Context Safety)
Large CSV files will crash your context window. Always verify the row count (wc -l) before catting a generated file.
- <= 50 lines: You may
catthe file to read it. - > 50 lines: You MUST use chunked reads (
head -n 20) or query-specific scripts. NEVER print the entire payload to chat.
๐ Password Protection Protocol
Never attempt to crack encrypted workbooks using custom scripts. If convert.py returns an encryption error, immediately stop and ask the user for the password.
๐งน No-Shadow-Writes Rule
Do not litter the workspace with temporary conversion artifacts. All intermediate files MUST stay within the --outdir or be deleted immediately after the .csv is verified.
โ WRONG: Custom Parsers (Negative Instruction Constraint)
Never attempt to write arbitrary Python scripts using raw openpyxl commands to try and reinvent the .xlsx to .csv pipeline from scratch.
โ CORRECT: Native Engine
Always route binary extractions through the convert.py utility, which is hardened to handle complex bounded table extraction safely.
Next Actions
If the convert.py script returns a brutal exception (e.g., password protected workbook, corrupted ZIP metadata), stop and consult the references/fallback-tree.md for alternative extraction strategies.
More from richfrem/agent-plugins-skills
markdown-to-msword-converter
Converts Markdown files to one MS Word document per file using plugin-local scripts. V2 includes L5 Delegated Constraint Verification for strict binary artifact linting.
52zip-bundling
Create technical ZIP bundles of code, design, and documentation for external review or context sharing. Use when you need to package multiple project files into a portable `.zip` archive instead of a single Markdown file.
29learning-loop
(Industry standard: Loop Agent / Single Agent) Primary Use Case: Self-contained research, content generation, and exploration where no inner delegation is required. Self-directed research and knowledge capture loop. Use when: starting a session (Orientation), performing research (Synthesis), or closing a session (Seal, Persist, Retrospective). Ensures knowledge survives across isolated agent sessions.
26ollama-launch
Start and verify the local Ollama LLM server. Use when Ollama is needed for RLM distillation, seal snapshots, embeddings, or any local LLM inference โ and it's not already running. Checks if Ollama is running, starts it if not, and verifies the health endpoint.
26spec-kitty-checklist
A standard Spec-Kitty workflow routine.
26obsidian-graph-traversal
Semantic link traversal for Obsidian Vaults. Builds an in-memory graph index from wikilinks and provides instant forward-link, backlink, and multi-degree connection queries. Use when exploring note relationships or finding orphaned notes.
26