pandas-best-practices
SKILL.md
Pandas Best Practices
Expert guidelines for Pandas development, focusing on data manipulation, analysis, and efficient DataFrame operations.
Code Style and Structure
- Write concise, technical responses with accurate Python examples
- Prioritize reproducibility in data analysis workflows
- Use functional programming; avoid unnecessary classes
- Prefer vectorized operations over explicit loops
- Use descriptive variable names reflecting data content
- Follow PEP 8 style guidelines
DataFrame Creation and I/O
- Use
pd.read_csv(),pd.read_excel(),pd.read_json()with appropriate parameters - Specify
dtypeparameter to ensure correct data types on load - Use
parse_datesfor automatic datetime parsing - Set
index_colwhen the data has a natural index column - Use
chunksizefor reading large files incrementally
Data Selection
- Use
.loc[]for label-based indexing - Use
.iloc[]for integer position-based indexing - Avoid chained indexing (e.g.,
df['col'][0]) - use.locor.ilocinstead - Use boolean indexing for conditional selection:
df[df['col'] > value] - Use
.query()method for complex filtering conditions
Method Chaining
- Prefer method chaining for data transformations when possible
- Use
.pipe()for applying custom functions in a chain - Chain operations like
.assign(),.query(),.groupby(),.agg() - Keep chains readable by breaking across multiple lines
Data Cleaning and Validation
Missing Data
- Check for missing data with
.isna()and.info() - Handle missing data appropriately:
.fillna(),.dropna(), or imputation - Use
pd.NAfor nullable integer and boolean types - Document decisions about missing data handling
Data Quality Checks
- Implement data quality checks at the beginning of analysis
- Validate data types with
.dtypesand convert as needed - Check for duplicates with
.duplicated()and handle appropriately - Use
.describe()for quick statistical overview
Type Conversion
- Use
.astype()for explicit type conversion - Use
pd.to_datetime()for date parsing - Use
pd.to_numeric()witherrors='coerce'for safe numeric conversion - Utilize categorical data types for low-cardinality string columns
Grouping and Aggregation
GroupBy Operations
- Use
.groupby()for efficient aggregation operations - Specify aggregation functions with
.agg()for multiple operations - Use named aggregation for clearer output column names
- Consider
.transform()for broadcasting results back to original shape
Pivot Tables and Reshaping
- Use
.pivot_table()for multi-dimensional aggregation - Use
.melt()to convert wide to long format - Use
.pivot()to convert long to wide format - Use
.stack()and.unstack()for hierarchical index manipulation
Performance Optimization
Memory Efficiency
- Use categorical data types for low-cardinality strings
- Downcast numeric types when appropriate
- Use
pd.eval()and.eval()for large expression evaluation
Computation Speed
- Use vectorized operations instead of
.apply()with row-wise functions - Prefer built-in aggregation functions over custom ones
- Use
.valuesor.to_numpy()for NumPy operations when faster
Avoiding Common Pitfalls
- Avoid iterating with
.iterrows()- use vectorized operations - Don't modify DataFrames while iterating
- Be aware of SettingWithCopyWarning - use
.copy()when needed - Avoid growing DataFrames row by row - collect in list and create once
Time Series Operations
- Use
DatetimeIndexfor time series data - Leverage
.resample()for time-based aggregation - Use
.shift()and.diff()for lag operations - Use
.rolling()and.expanding()for window calculations
Merging and Joining
- Use
.merge()for SQL-style joins - Specify
howparameter: 'inner', 'outer', 'left', 'right' - Use
validateparameter to check join cardinality - Use
.concat()for stacking DataFrames
Key Conventions
- Import as
import pandas as pd - Use
snake_casefor column names when possible - Document data sources and transformations
- Keep notebooks reproducible with clear cell execution order
Weekly Installs
108
Repository
mindrally/skillsGitHub Stars
32
First Seen
Jan 25, 2026
Security Audits
Installed on
opencode89
gemini-cli84
codex79
claude-code76
github-copilot73
cursor71