Excel (XLSX) Processing Skill

When working with Excel spreadsheets, follow these guidelines:

1. Reading & Extracting from XLSX

Extract to CSV:

# Using ssconvert (from gnumeric)
ssconvert input.xlsx output.csv

# Specific sheet
ssconvert -S input.xlsx output.csv

# Using LibreOffice
libreoffice --headless --convert-to csv input.xlsx

Read with Python (recommended):

import openpyxl

# Load workbook
wb = openpyxl.load_workbook('data.xlsx')

# Get sheet
sheet = wb.active
# or by name: sheet = wb['Sheet1']

# Read all data
for row in sheet.iter_rows(values_only=True):
    print(row)

# Read specific cell
value = sheet['A1'].value

# Read range
for row in sheet['A1:C10']:
    for cell in row:
        print(cell.value)

Using pandas:

import pandas as pd

# Read entire sheet
df = pd.read_excel('data.xlsx')

# Read specific sheet
df = pd.read_excel('data.xlsx', sheet_name='Sales')

# Read all sheets
dfs = pd.read_excel('data.xlsx', sheet_name=None)

# Read with specific columns
df = pd.read_excel('data.xlsx', usecols=['Name', 'Age', 'City'])

# Skip rows
df = pd.read_excel('data.xlsx', skiprows=2)

2. Creating XLSX Files

From CSV:

# Using ssconvert
ssconvert input.csv output.xlsx

# Using LibreOffice
libreoffice --headless --convert-to xlsx input.csv

Using Python (openpyxl):

import openpyxl
from openpyxl.styles import Font, PatternFill, Alignment

# Create new workbook
wb = openpyxl.Workbook()
sheet = wb.active
sheet.title = 'Data'

# Write headers
headers = ['Name', 'Age', 'City']
sheet.append(headers)

# Style headers
for cell in sheet[1]:
    cell.font = Font(bold=True)
    cell.fill = PatternFill(start_color='366092', fill_type='solid')
    cell.font = Font(color='FFFFFF', bold=True)

# Add data
data = [
    ['John Doe', 30, 'New York'],
    ['Jane Smith', 25, 'London'],
    ['Bob Johnson', 35, 'Paris']
]

for row in data:
    sheet.append(row)

# Auto-adjust column width
for column in sheet.columns:
    max_length = 0
    column_letter = column[0].column_letter
    for cell in column:
        if cell.value:
            max_length = max(max_length, len(str(cell.value)))
    sheet.column_dimensions[column_letter].width = max_length + 2

# Save
wb.save('output.xlsx')

Using pandas:

import pandas as pd

# From DataFrame
df = pd.DataFrame({
    'Name': ['John', 'Jane', 'Bob'],
    'Age': [30, 25, 35],
    'City': ['New York', 'London', 'Paris']
})

# Save to Excel
df.to_excel('output.xlsx', index=False, sheet_name='Data')

# Multiple sheets
with pd.ExcelWriter('output.xlsx') as writer:
    df1.to_excel(writer, sheet_name='Sheet1', index=False)
    df2.to_excel(writer, sheet_name='Sheet2', index=False)

3. Converting Excel Files

XLSX to CSV:

# All sheets to separate CSV files
ssconvert -S input.xlsx output.csv

# Specific sheet
libreoffice --headless --convert-to csv:"Text - txt - csv (StarCalc)":44,34,UTF8 input.xlsx

XLSX to JSON:

import pandas as pd
import json

# Read Excel
df = pd.read_excel('data.xlsx')

# Convert to JSON
json_data = df.to_json(orient='records', indent=2)

# Save to file
with open('output.json', 'w') as f:
    f.write(json_data)

XLSX to PDF:

# Using LibreOffice
libreoffice --headless --convert-to pdf input.xlsx

XLS to XLSX:

# Using LibreOffice
libreoffice --headless --convert-to xlsx input.xls

4. Data Analysis

Basic statistics:

import pandas as pd

df = pd.read_excel('data.xlsx')

# Summary statistics
print(df.describe())

# Column info
print(df.info())

# Value counts
print(df['Category'].value_counts())

# Group by and aggregate
summary = df.groupby('Category').agg({
    'Sales': ['sum', 'mean', 'count'],
    'Profit': 'sum'
})
print(summary)

Find duplicates:

import pandas as pd

df = pd.read_excel('data.xlsx')

# Find duplicate rows
duplicates = df[df.duplicated()]
print(f"Found {len(duplicates)} duplicates")

# Find duplicates based on specific columns
duplicates = df[df.duplicated(subset=['Email'], keep=False)]

Data validation:

import pandas as pd

df = pd.read_excel('data.xlsx')

# Check for missing values
missing = df.isnull().sum()
print("Missing values per column:")
print(missing[missing > 0])

# Check data types
print(df.dtypes)

# Find outliers (using IQR method)
Q1 = df['Sales'].quantile(0.25)
Q3 = df['Sales'].quantile(0.75)
IQR = Q3 - Q1
outliers = df[(df['Sales'] < Q1 - 1.5*IQR) | (df['Sales'] > Q3 + 1.5*IQR)]

5. Formulas & Calculations

Add formulas:

import openpyxl

wb = openpyxl.load_workbook('data.xlsx')
sheet = wb.active

# Add SUM formula
sheet['D10'] = '=SUM(D2:D9)'

# Add AVERAGE
sheet['D11'] = '=AVERAGE(D2:D9)'

# Add IF statement
sheet['E2'] = '=IF(D2>100,"High","Low")'

# Add VLOOKUP
sheet['F2'] = '=VLOOKUP(A2,Sheet2!A:B,2,FALSE)'

wb.save('data.xlsx')

6. Formatting & Styling

Cell formatting:

import openpyxl
from openpyxl.styles import Font, PatternFill, Border, Side, Alignment
from openpyxl.styles.numbers import FORMAT_NUMBER_00, FORMAT_PERCENTAGE_00

wb = openpyxl.load_workbook('data.xlsx')
sheet = wb.active

# Font styling
cell = sheet['A1']
cell.font = Font(name='Arial', size=14, bold=True, color='FF0000')

# Background color
cell.fill = PatternFill(start_color='FFFF00', fill_type='solid')

# Borders
border = Border(
    left=Side(style='thin'),
    right=Side(style='thin'),
    top=Side(style='thin'),
    bottom=Side(style='thin')
)
cell.border = border

# Alignment
cell.alignment = Alignment(horizontal='center', vertical='center')

# Number format
sheet['B2'].number_format = FORMAT_NUMBER_00  # 1234.00
sheet['C2'].number_format = FORMAT_PERCENTAGE_00  # 50.00%
sheet['D2'].number_format = 'mm/dd/yyyy'  # Date

wb.save('data.xlsx')

Conditional formatting:

from openpyxl.formatting.rule import CellIsRule, ColorScaleRule, DataBarRule
from openpyxl.styles import PatternFill

# Highlight cells > 100
red_fill = PatternFill(start_color='FFC7CE', fill_type='solid')
rule = CellIsRule(operator='greaterThan', formula=['100'], fill=red_fill)
sheet.conditional_formatting.add('D2:D100', rule)

# Color scale (red to green)
color_scale = ColorScaleRule(
    start_type='min', start_color='F8696B',
    mid_type='percentile', mid_value=50, mid_color='FFEB84',
    end_type='max', end_color='63BE7B'
)
sheet.conditional_formatting.add('D2:D100', color_scale)

# Data bars
data_bar = DataBarRule(start_type='min', end_type='max', color='638EC6')
sheet.conditional_formatting.add('D2:D100', data_bar)

7. Charts & Visualizations

Create charts:

from openpyxl.chart import BarChart, LineChart, PieChart, Reference

wb = openpyxl.load_workbook('data.xlsx')
sheet = wb.active

# Bar chart
chart = BarChart()
chart.title = 'Sales by Category'
chart.x_axis.title = 'Category'
chart.y_axis.title = 'Sales'

data = Reference(sheet, min_col=2, min_row=1, max_row=10)
categories = Reference(sheet, min_col=1, min_row=2, max_row=10)

chart.add_data(data, titles_from_data=True)
chart.set_categories(categories)

sheet.add_chart(chart, 'E5')

# Line chart
line_chart = LineChart()
line_chart.title = 'Trend Over Time'
line_chart.add_data(data, titles_from_data=True)
line_chart.set_categories(categories)

# Pie chart
pie = PieChart()
pie.title = 'Market Share'
pie.add_data(data, titles_from_data=True)
pie.set_categories(categories)

wb.save('data.xlsx')

8. Working with Multiple Sheets

Manage sheets:

import openpyxl

wb = openpyxl.load_workbook('data.xlsx')

# Create new sheet
new_sheet = wb.create_sheet('Summary')

# List all sheets
print(wb.sheetnames)

# Access specific sheet
sheet = wb['Sheet1']

# Copy sheet
source = wb['Sheet1']
target = wb.copy_worksheet(source)
target.title = 'Sheet1_Copy'

# Delete sheet
del wb['OldSheet']

# Reorder sheets
wb.move_sheet('Summary', offset=-1)

wb.save('data.xlsx')

9. Common Workflows

Merge multiple Excel files

import pandas as pd
import glob

# Read all Excel files
files = glob.glob('data/*.xlsx')

dfs = []
for file in files:
    df = pd.read_excel(file)
    dfs.append(df)

# Combine
combined = pd.concat(dfs, ignore_index=True)

# Save
combined.to_excel('merged.xlsx', index=False)

Create pivot table

import pandas as pd

df = pd.read_excel('sales.xlsx')

# Create pivot table
pivot = pd.pivot_table(
    df,
    values='Sales',
    index='Category',
    columns='Region',
    aggfunc='sum',
    margins=True
)

pivot.to_excel('pivot.xlsx')

Generate report from template

import openpyxl
from datetime import datetime

# Load template
wb = openpyxl.load_workbook('template.xlsx')
sheet = wb.active

# Fill in data
sheet['B2'] = datetime.now().strftime('%Y-%m-%d')
sheet['B3'] = 'Q1 2026'
sheet['B4'] = 'Sales Department'

# Add calculated values
sheet['D10'] = '=SUM(D2:D9)'

# Save with new name
wb.save(f'report_{datetime.now().strftime("%Y%m%d")}.xlsx')

Tools Required

Install necessary tools:

Linux (Ubuntu/Debian):

sudo apt-get install python3-pip libreoffice gnumeric
pip3 install openpyxl pandas xlrd xlsxwriter

macOS:

brew install libreoffice
pip3 install openpyxl pandas xlrd xlsxwriter

Windows:

pip install openpyxl pandas xlrd xlsxwriter
# Install LibreOffice manually

Security Notes

✅ Validate file paths and extensions
✅ Check file sizes before loading
✅ Sanitize user input in formulas (prevent formula injection)
✅ Be cautious with macro-enabled files (.xlsm)
✅ Validate data types and ranges
✅ Don't execute external links automatically
✅ Use read-only mode when just analyzing

When to Use This Skill

Use /xlsx when the user:

Wants to read data from Excel files
Needs to create spreadsheets from data
Wants to convert between Excel/CSV/JSON formats
Asks to analyze or validate spreadsheet data
Needs to generate reports from templates
Wants to add formulas or formatting
Asks to merge or split Excel files
Needs to create charts or pivot tables

Always preview changes before saving over existing files.