Pandas

Data analysis and manipulation library for Python.

When to Use

  • Data cleaning and preprocessing
  • Exploratory data analysis
  • CSV/Excel file processing
  • Data transformation pipelines

Quick Start

import pandas as pd

# Read data
df = pd.read_csv('data.csv')

# Basic operations
df.head()
df.info()
df.describe()

# Filtering
active_users = df[df['status'] == 'active']

Core Concepts

DataFrame Operations

# Create DataFrame
df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 35],
    'city': ['NYC', 'LA', 'Chicago']
})

# Selection
df['name']                    # Single column
df[['name', 'age']]          # Multiple columns
df.loc[0]                     # Row by label
df.iloc[0:2]                  # Rows by position

# Filtering
df[df['age'] > 25]
df.query('age > 25 and city == "NYC"')
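
The selection and filtering calls above can be combined. The following is a minimal sketch that reuses the sample DataFrame from this section; the names `mask`, `subset`, `by_name`, and `bob_age` are illustrative only.

```python
import pandas as pd

# Same sample data as the DataFrame created above.
df = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, 30, 35],
    "city": ["NYC", "LA", "Chicago"],
})

# Combine conditions with & and |, wrapping each condition in parentheses.
mask = (df["age"] > 25) & (df["city"].isin(["LA", "Chicago"]))

# .loc accepts a boolean mask for rows and a list of labels for columns.
subset = df.loc[mask, ["name", "age"]]

# With a meaningful index, .loc looks rows up by label instead of position.
by_name = df.set_index("name")
bob_age = by_name.loc["Bob", "age"]
```

Selecting rows and columns in one `.loc` call avoids chaining two indexing steps, which matters most when the selection is later used for assignment.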

Data Manipulation

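# Add/modify columns
# apply() calls a Python function per value: fine for small data, slow on large columns
df['age_group'] = df['age'].apply(lambda x: 'young' if x < 30 else 'adult')
# Assumes the DataFrame has first_name and last_name columns
df['full_name'] = df['first_name'] + ' ' + df['last_name']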

# Grouping
df.groupby('city')['sales'].sum()
df.groupby(['city', 'year']).agg({
    'sales': 'sum',
    'orders': 'count',
    'price': 'mean'
})

# Pivot tables
pd.pivot_table(df, values='sales', index='city', columns='year', aggfunc='sum')
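
The grouping and pivot snippets above assume columns such as `sales`, `year`, and `price`. Here is a runnable sketch on made-up data (the `sales` frame and its values are hypothetical), using named aggregation so each output column gets an explicit name.

```python
import pandas as pd

# Hypothetical sales data; column names mirror the snippets above.
sales = pd.DataFrame({
    "city": ["NYC", "NYC", "LA", "LA"],
    "year": [2023, 2024, 2023, 2023],
    "sales": [100, 150, 80, 20],
    "price": [10.0, 12.0, 8.0, 4.0],
})

# Named aggregation: output_name=(input_column, aggregation).
summary = (
    sales.groupby(["city", "year"], as_index=False)
    .agg(
        total_sales=("sales", "sum"),
        order_count=("sales", "count"),
        avg_price=("price", "mean"),
    )
)

# Pivot the same data: one row per city, one column per year.
# City/year combinations with no rows come out as NaN.
pivot = pd.pivot_table(
    sales, values="sales", index="city", columns="year", aggfunc="sum"
)
```

Passing `as_index=False` keeps the group keys as ordinary columns, which is often easier to merge or export than a MultiIndex.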

Common Patterns

Data Cleaning

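# Handle missing values (each call returns a new DataFrame; assign the result to keep it)
df.dropna()                   # Drop rows with any missing value
df.fillna(0)                  # Or replace missing values with 0
# Assign back instead of inplace=True on a selected column, which may not update df
df['column'] = df['column'].fillna(df['column'].median())

# Remove duplicates (keeps the first row per email; returns a new DataFrame)
df = df.drop_duplicates(subset=['email'])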

# Type conversion
df['date'] = pd.to_datetime(df['date'])
df['price'] = pd.to_numeric(df['price'], errors='coerce')
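
The cleaning steps above can be combined into one small pipeline. The sketch below runs on a hypothetical `raw` frame with invented values, and assumes that deduplicating on `email` and filling with the median are the desired policies.

```python
import pandas as pd

# Hypothetical messy input: a duplicate row and an unparseable price.
raw = pd.DataFrame({
    "email": ["a@x.com", "a@x.com", "b@x.com", "c@x.com"],
    "price": ["10.5", "10.5", "n/a", "7"],
    "date": ["2024-01-05", "2024-01-05", "2024-02-10", "2024-03-15"],
})

clean = (
    raw.drop_duplicates(subset=["email"])  # keep the first row per email
    .assign(
        # errors="coerce" turns values that cannot be parsed into NaN
        price=lambda d: pd.to_numeric(d["price"], errors="coerce"),
        date=lambda d: pd.to_datetime(d["date"]),
    )
    .reset_index(drop=True)
)

# Fill the remaining gap with the column median, by assignment
# rather than inplace=True.
clean["price"] = clean["price"].fillna(clean["price"].median())
```

Because every step returns a new object, `raw` is left unchanged and can be inspected again if the cleaned result looks wrong.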

Merging DataFrames

# Merge (SQL-like join)
merged = pd.merge(orders, customers, on='customer_id', how='left')

# Concat
combined = pd.concat([df1, df2], ignore_index=True)
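
A left join can silently leave rows without a match, or multiply rows when keys repeat. The following sketch uses the `validate` and `indicator` parameters of `pd.merge` to surface both cases; the `orders` and `customers` data are made up for illustration.

```python
import pandas as pd

# Hypothetical tables; customer 99 has no entry in customers.
orders = pd.DataFrame({"order_id": [1, 2, 3], "customer_id": [10, 20, 99]})
customers = pd.DataFrame({"customer_id": [10, 20], "name": ["Alice", "Bob"]})

merged = pd.merge(
    orders,
    customers,
    on="customer_id",
    how="left",
    validate="many_to_one",  # raises if customer_id repeats in customers
    indicator=True,          # adds a _merge column describing each row's origin
)

# Orders whose customer_id was not found on the right-hand side.
unmatched = merged[merged["_merge"] == "left_only"]

# concat stacks frames with the same columns; ignore_index renumbers the rows.
first_half = orders.iloc[:2]
second_half = orders.iloc[2:]
combined = pd.concat([first_half, second_half], ignore_index=True)
```

Checking `unmatched` right after the merge is a cheap way to catch key mismatches before they surface as missing values downstream.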

Best Practices

Do:

  • Use vectorized operations
  • Chain methods for readability
  • Use query() for complex filters
  • Set appropriate dtypes

Don't:

  • Iterate over rows with for loops or iterrows()
  • Modify a DataFrame while iterating over it
  • Use inplace=True in method chains
  • Ignore memory usage on large datasets
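
The practices listed above can be illustrated with a short sketch. It assumes NumPy is available alongside pandas; the `people` frame and the `age_next_year` column are hypothetical.

```python
import numpy as np
import pandas as pd

people = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, 30, 35],
    "city": ["NYC", "LA", "NYC"],
})

# Vectorized: one array-level operation instead of a Python call per row.
people["age_group"] = np.where(people["age"] < 30, "young", "adult")

# Method chain: each step returns a new DataFrame, so no inplace=True is needed.
result = (
    people.query("city == 'NYC'")
    .assign(age_next_year=lambda d: d["age"] + 1)
    .sort_values("age", ascending=False)
    .reset_index(drop=True)
)

# Appropriate dtypes: a category column stores each distinct string once,
# which can reduce memory when a column has few distinct values.
people["city"] = people["city"].astype("category")
memory_bytes = people.memory_usage(deep=True).sum()
```

`memory_usage(deep=True)` reports the real size of string columns, which makes it a reasonable first check before deciding whether dtype changes are worth it.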

Troubleshooting

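| Issue | Cause | Solution |
|---|---|---|
| Memory error | Dataset too large to fit in memory | Read in chunks (chunksize=) or use Dask |
| SettingWithCopyWarning | Chained assignment | Assign in one step with .loc[] |
| Slow operation | Row-wise apply or Python loops | Use vectorized pandas or NumPy operations |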
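
Two of the fixes above, sketched under stated assumptions: the CSV content is invented and read from an in-memory string so the example is self-contained, and `chunksize=2` is far smaller than anything practical.

```python
import io

import pandas as pd

# Stand-in for a large file on disk.
csv_text = "status,amount\nactive,10\ninactive,5\nactive,7\nactive,3\n"

# Memory: process the file in pieces and keep only a running aggregate.
total = 0
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2):
    total += chunk.loc[chunk["status"] == "active", "amount"].sum()

# SettingWithCopyWarning: select rows and column in a single .loc call
# instead of chained indexing like df[df["status"] == "inactive"]["amount"] = 0.
df = pd.read_csv(io.StringIO(csv_text))
df.loc[df["status"] == "inactive", "amount"] = 0
```

With a real file, the string buffer would be replaced by a path, and the chunk size chosen so that one chunk fits comfortably in memory.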
