pandas-pro
Pandas Pro
Expert pandas developer specializing in efficient data manipulation, analysis, and transformation workflows with production-grade performance patterns.
Core Workflow
- Assess data structure — Examine dtypes, memory usage, missing values, data quality:
print(df.dtypes) print(df.memory_usage(deep=True).sum() / 1e6, "MB") print(df.isna().sum()) print(df.describe(include="all")) - Design transformation — Plan vectorized operations, avoid loops, identify indexing strategy
- Implement efficiently — Use vectorized methods, method chaining, proper indexing
- Validate results — Check dtypes, shapes, null counts, and row counts:
assert result.shape[0] == expected_rows, f"Row count mismatch: {result.shape[0]}" assert result.isna().sum().sum() == 0, "Unexpected nulls after transform" assert set(result.columns) == expected_cols - Optimize — Profile memory, apply categorical types, use chunking if needed
Reference Guide
Load detailed guidance based on context:
| Topic | Reference | Load When |
|---|---|---|
| DataFrame Operations | references/dataframe-operations.md |
Indexing, selection, filtering, sorting |
| Data Cleaning | references/data-cleaning.md |
Missing values, duplicates, type conversion |
| Aggregation & GroupBy | references/aggregation-groupby.md |
GroupBy, pivot, crosstab, aggregation |
| Merging & Joining | references/merging-joining.md |
Merge, join, concat, combine strategies |
| Performance Optimization | references/performance-optimization.md |
Memory usage, vectorization, chunking |
Code Patterns
Vectorized Operations (before/after)
# ❌ AVOID: row-by-row iteration
for i, row in df.iterrows():
df.at[i, 'tax'] = row['price'] * 0.2
# ✅ USE: vectorized assignment
df['tax'] = df['price'] * 0.2
Safe Subsetting with .copy()
# ❌ AVOID: chained indexing triggers SettingWithCopyWarning
df['A']['B'] = 1
# ✅ USE: .loc[] with explicit copy when mutating a subset
subset = df.loc[df['status'] == 'active', :].copy()
subset['score'] = subset['score'].fillna(0)
GroupBy Aggregation
summary = (
df.groupby(['region', 'category'], observed=True)
.agg(
total_sales=('revenue', 'sum'),
avg_price=('price', 'mean'),
order_count=('order_id', 'nunique'),
)
.reset_index()
)
Merge with Validation
merged = pd.merge(
left_df, right_df,
on=['customer_id', 'date'],
how='left',
validate='m:1', # asserts right key is unique
indicator=True,
)
unmatched = merged[merged['_merge'] != 'both']
print(f"Unmatched rows: {len(unmatched)}")
merged.drop(columns=['_merge'], inplace=True)
Missing Value Handling
# Forward-fill then interpolate numeric gaps
df['price'] = df['price'].ffill().interpolate(method='linear')
# Fill categoricals with mode, numerics with median
for col in df.select_dtypes(include='object'):
df[col] = df[col].fillna(df[col].mode()[0])
for col in df.select_dtypes(include='number'):
df[col] = df[col].fillna(df[col].median())
Time Series Resampling
daily = (
df.set_index('timestamp')
.resample('D')
.agg({'revenue': 'sum', 'sessions': 'count'})
.fillna(0)
)
Pivot Table
pivot = df.pivot_table(
values='revenue',
index='region',
columns='product_line',
aggfunc='sum',
fill_value=0,
margins=True,
)
Memory Optimization
# Downcast numerics and convert low-cardinality strings to categorical
df['category'] = df['category'].astype('category')
df['count'] = pd.to_numeric(df['count'], downcast='integer')
df['score'] = pd.to_numeric(df['score'], downcast='float')
print(df.memory_usage(deep=True).sum() / 1e6, "MB after optimization")
Constraints
MUST DO
- Use vectorized operations instead of loops
- Set appropriate dtypes (categorical for low-cardinality strings)
- Check memory usage with
.memory_usage(deep=True) - Handle missing values explicitly (don't silently drop)
- Use method chaining for readability
- Preserve index integrity through operations
- Validate data quality before and after transformations
- Use
.copy()when modifying subsets to avoid SettingWithCopyWarning
MUST NOT DO
- Iterate over DataFrame rows with
.iterrows()unless absolutely necessary - Use chained indexing (
df['A']['B']) — use.loc[]or.iloc[] - Ignore SettingWithCopyWarning messages
- Load entire large datasets without chunking
- Use deprecated methods (
.ix,.append()— usepd.concat()) - Convert to Python lists for operations possible in pandas
- Assume data is clean without validation
Output Templates
When implementing pandas solutions, provide:
- Code with vectorized operations and proper indexing
- Comments explaining complex transformations
- Memory/performance considerations if dataset is large
- Data validation checks (dtypes, nulls, shapes)
More from midudev/autoskills
bun
Use when building, testing, and deploying JavaScript/TypeScript applications. Reach for Bun when you need to run scripts, manage dependencies, bundle code, or test applications with a single unified tool.
14pydantic
Python data validation using type hints and runtime type checking with Pydantic v2's Rust-powered core for high-performance validation in FastAPI, Django, and configuration management.
11react-hook-form
React Hook Form performance optimization for client-side form validation using useForm, useWatch, useController, and useFieldArray. This skill should be used when building client-side controlled forms with React Hook Form library. This skill does NOT cover React 19 Server Actions, useActionState, or server-side form handling (use react-19 skill for those).
10azure-deploy
Execute Azure deployments for ALREADY-PREPARED applications that have existing .azure/deployment-plan.md and infrastructure files. DO NOT use this skill when the user asks to CREATE a new application — use azure-prepare instead. This skill runs azd up, azd deploy, terraform apply, and az deployment commands with built-in error recovery. Requires .azure/deployment-plan.md from azure-prepare and validated status from azure-validate. WHEN: \"run azd up\", \"run azd deploy\", \"execute deployment\", \"push to production\", \"push to cloud\", \"go live\", \"ship it\", \"bicep deploy\", \"terraform apply\", \"publish to Azure\", \"launch on Azure\". DO NOT USE WHEN: \"create and deploy\", \"build and deploy\", \"create a new app\", \"set up infrastructure\", \"create and deploy to Azure using Terraform\" — use azure-prepare for these.
8sqlalchemy-orm
SQLAlchemy Python SQL toolkit and ORM with powerful query builder, relationship mapping, and database migrations via Alembic
8clerk
Clerk authentication router. Use when user asks about adding authentication,
8