Data Analysis Skill
Transform raw data into actionable insights. This skill helps you explore datasets, identify patterns, create visualizations, and generate statistical reports.
Purpose
This skill enables you to:
- Load and explore datasets of various formats (CSV, JSON, Parquet)
- Perform exploratory data analysis (EDA)
- Create statistical summaries and distributions
- Generate data visualizations and charts
- Identify correlations and trends
- Detect anomalies and outliers
- Build predictive models
- Export analysis reports
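As a minimal sketch of the loading step, the snippet below dispatches on file extension to the matching pandas reader. The helper name `load_dataset` and the `READERS` mapping are hypothetical, introduced only for illustration; the skill's actual loading code is not shown in this document.

```python
from pathlib import Path

import pandas as pd

# Map file extensions to pandas readers.
READERS = {
    ".csv": pd.read_csv,
    ".json": pd.read_json,
    ".parquet": pd.read_parquet,  # requires pyarrow or fastparquet
}


def load_dataset(path):
    """Load a CSV, JSON, or Parquet file into a DataFrame (hypothetical helper)."""
    suffix = Path(path).suffix.lower()
    if suffix not in READERS:
        raise ValueError(f"Unsupported file format: {suffix!r}")
    return READERS[suffix](path)
```

Failing loudly on an unknown extension is a design choice here: it avoids silently parsing a file with the wrong reader.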
When to Use
Use this skill when you need to:
- Understand a new dataset
- Find trends and patterns in data
- Create reports with visualizations
- Identify data quality issues
- Compare groups or time periods
- Forecast future values
- Build summary dashboards
- Share insights with stakeholders
Key Features
- EDA Tools - Automated exploratory analysis
- Visualizations - Charts, graphs, and heatmaps
- Statistical Analysis - Descriptive stats, hypothesis testing, correlation
- Data Cleaning - Handle missing values, outliers, duplicates
- Time Series - Seasonal decomposition and forecasting
- Machine Learning - Clustering, classification, regression
- Reports - Professional analysis documents with code
- Export Options - Save to HTML, PDF, or interactive dashboards
Instructions
When using this skill:
- Load Data - Provide a dataset path or inline CSV/JSON content
- Explore - Generate summary statistics and visualizations
- Analyze - Identify patterns, trends, and relationships
- Validate - Check data quality and handle issues
- Visualize - Create meaningful charts and graphs
- Model - Build predictive models if needed
- Report - Document findings and recommendations
Guidelines
- Start Simple: Begin with univariate analysis before multivariate
- Visualize First: Plot the data before computing summary statistics
- Question Assumptions: Test whether an apparent pattern is statistically significant before reporting it
- Document Methods: Explain your analytical approach
- Consider Context: Interpret results within business context
- Validate Results: Confirm findings with domain experts
- Communicate Clearly: Use simple language and visual metaphors
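The "Start Simple" and "Question Assumptions" guidelines can be illustrated with a short sketch: summarize each column on its own first, then look at pairwise relationships and attach a p-value to each correlation instead of reading the coefficient alone. The data is synthetic and the column names `x`, `y`, `z` are assumptions made for this example.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic data for illustration only: y depends on x, z is pure noise.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
df = pd.DataFrame({
    "x": x,
    "y": 2 * x + rng.normal(size=200),
    "z": rng.normal(size=200),
})

# Step 1: univariate summaries, one column at a time.
print(df.describe().loc[["mean", "std", "min", "max"]])

# Step 2: pairwise relationships, each with a p-value.
results = {}
for col in ["y", "z"]:
    r, p = stats.pearsonr(df["x"], df[col])
    results[col] = (r, p)
    print(f"x vs {col}: r={r:.2f}, p={p:.3g}")
```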
Examples
Example 1: Customer Purchase Analysis
Dataset: Customer transactions with 10,000 records
Analysis Steps:
- Load purchase data (date, customer_id, amount, category)
- Calculate summary statistics (total spend, average order value)
- Visualize purchase distribution by category
- Analyze seasonal trends
- Identify top customers
- Detect purchase anomalies
Output:
# Customer Analysis Report
## Summary Statistics
- Total Revenue: $2.5M
- Average Order Value: $125
- Number of Customers: 3,450
- Date Range: 2023-01-01 to 2024-01-15
## Key Findings
1. Electronics category drives 42% of revenue
2. Top 20% of customers generate 80% of revenue (Pareto principle)
3. Strong seasonal pattern with peak in Q4
4. Average customer lifetime value: $1,200
## Recommendations
- Focus retention efforts on high-value customers
- Increase inventory for Q4 seasonal demand
- Cross-sell opportunities in Electronics + Home categories
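The analysis steps in Example 1 could be sketched with pandas roughly as follows. The six-row table is made up for illustration and is not the 10,000-record dataset described above; only the column names (date, customer_id, amount, category) come from the example.

```python
import pandas as pd

# Tiny made-up sample using the columns named in Example 1.
purchases = pd.DataFrame({
    "date": pd.to_datetime([
        "2023-01-15", "2023-03-02", "2023-11-20",
        "2023-12-05", "2023-12-18", "2023-12-24",
    ]),
    "customer_id": [1, 2, 1, 3, 1, 2],
    "amount": [120.0, 80.0, 200.0, 50.0, 300.0, 150.0],
    "category": ["Electronics", "Home", "Electronics",
                 "Home", "Electronics", "Home"],
})

# Summary statistics.
total_revenue = purchases["amount"].sum()
average_order_value = purchases["amount"].mean()

# Share of revenue per category, largest first.
category_share = (
    purchases.groupby("category")["amount"].sum() / total_revenue
).sort_values(ascending=False)

# Revenue per calendar quarter, to expose seasonal peaks.
quarterly_revenue = purchases.groupby(purchases["date"].dt.quarter)["amount"].sum()

# Top customers by total spend.
top_customers = (
    purchases.groupby("customer_id")["amount"].sum()
    .sort_values(ascending=False)
    .head(2)
)
```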
Example 2: Website Traffic Analysis
Dataset: Daily pageviews, bounce rate, session duration
Key Metrics Analyzed:
- Traffic trends over time
- Device type distribution
- Top pages and conversion rates
- User behavior funnels
- Mobile vs. desktop comparison
Visualizations Generated:
- Line chart: Daily pageviews over 12 months
- Bar chart: Traffic by device type
- Funnel chart: User conversion flow
- Heatmap: Day/hour traffic patterns
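The day/hour heatmap listed above needs its data reshaped into a weekday-by-hour grid first. A minimal sketch of that reshaping step, using synthetic hourly pageviews (the numbers and column names are assumptions for illustration):

```python
import numpy as np
import pandas as pd

# Synthetic hourly pageviews covering two full weeks (made-up numbers).
rng = np.random.default_rng(42)
timestamps = pd.date_range("2024-01-01", periods=14 * 24,
                           freq=pd.Timedelta(hours=1))
traffic = pd.DataFrame({
    "timestamp": timestamps,
    "pageviews": rng.integers(50, 500, size=len(timestamps)),
})

traffic["day"] = traffic["timestamp"].dt.day_name()
traffic["hour"] = traffic["timestamp"].dt.hour

# Rows = weekday, columns = hour of day, cells = mean pageviews.
day_order = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]
heatmap_data = (
    traffic.pivot_table(index="day", columns="hour",
                        values="pageviews", aggfunc="mean")
    .reindex(day_order)
)
```

The resulting 7 x 24 table can then be passed to a plotting call such as `seaborn.heatmap(heatmap_data)` to render the chart.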
Analysis Patterns
| Scenario | Analysis Type | Key Metrics |
|---|---|---|
| Sales Data | Trend & Seasonal | Growth rate, Seasonality index |
| Customer Data | Segmentation | RFM score, Cohort analysis |
| Website Data | Behavior | Bounce rate, Conversion funnel |
| Time Series | Forecasting | Trend, Seasonality, Residuals |
| A/B Testing | Hypothesis Test | P-value, Effect size |
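For the A/B Testing row, the p-value and effect size might be computed as in the sketch below. It uses Welch's t-test from scipy and Cohen's d with a pooled standard deviation; both are common choices, not necessarily what this skill runs internally. The two samples are synthetic.

```python
import numpy as np
from scipy import stats

# Synthetic metric values for two variants (made-up data).
rng = np.random.default_rng(1)
control = rng.normal(loc=10.0, scale=2.0, size=500)
variant = rng.normal(loc=11.0, scale=2.0, size=500)

# Welch's t-test: does not assume equal variances in the two groups.
t_stat, p_value = stats.ttest_ind(variant, control, equal_var=False)

# Cohen's d as the effect size, using a pooled standard deviation.
pooled_sd = np.sqrt((control.var(ddof=1) + variant.var(ddof=1)) / 2)
cohens_d = (variant.mean() - control.mean()) / pooled_sd

print(f"p-value: {p_value:.3g}, Cohen's d: {cohens_d:.2f}")
```

Reporting the effect size alongside the p-value matters because, with large samples, even a negligible difference can reach statistical significance.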
Tools and Libraries
This skill uses:
- pandas - Data manipulation and analysis
- numpy - Numerical computations
- matplotlib/seaborn - Visualizations
- scipy - Statistical tests
- scikit-learn - Machine learning
- plotly - Interactive visualizations
Data Quality Checks
The skill automatically:
- Identifies missing values
- Detects duplicate records
- Flags outliers
- Validates data types
- Checks for referential integrity
- Reports data completeness
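Several of these checks can be approximated in a few lines of pandas. The sketch below is an assumption about how such checks might look, not the skill's actual implementation: `quality_report` is a hypothetical helper, outliers are flagged with the 1.5 x IQR rule, and referential integrity is left out because it needs a second table to compare against.

```python
import pandas as pd


def quality_report(df):
    """Summarize common data-quality issues (hypothetical helper)."""
    numeric = df.select_dtypes(include="number")
    # IQR rule: flag values more than 1.5 * IQR outside the quartiles.
    q1, q3 = numeric.quantile(0.25), numeric.quantile(0.75)
    iqr = q3 - q1
    outliers = (numeric.lt(q1 - 1.5 * iqr) | numeric.gt(q3 + 1.5 * iqr)).sum()
    return {
        "missing_per_column": df.isna().sum().to_dict(),
        "duplicate_rows": int(df.duplicated().sum()),
        "outliers_per_column": outliers.to_dict(),
        "dtypes": df.dtypes.astype(str).to_dict(),
        # Fraction of all cells that are populated.
        "completeness": float(df.notna().to_numpy().mean()),
    }


# Made-up sample: one missing city, one duplicated row, one extreme amount.
sample = pd.DataFrame({
    "id": [1, 2, 3, 4, 4, 6],
    "amount": [10.0, 12.0, 11.0, 13.0, 13.0, 500.0],
    "city": ["A", "B", None, "C", "C", "D"],
})
report = quality_report(sample)
```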
Common Analyses
Descriptive Analysis
- Data summaries
- Distribution analysis
- Correlation matrices
- Group comparisons
Predictive Analysis
- Trend forecasting
- Anomaly detection
- Classification models
- Regression models
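As the simplest possible instance of trend forecasting, a straight line can be fitted to the history and extrapolated. The sketch below uses `numpy.polyfit` on made-up monthly values; it ignores seasonality and uncertainty, so it is a baseline and not a substitute for a proper time-series model.

```python
import numpy as np

# Made-up monthly values with an exact linear trend, for illustration.
months = np.arange(12)
values = 100.0 + 5.0 * months

# Fit a straight line (degree-1 polynomial) to the history.
slope, intercept = np.polyfit(months, values, deg=1)

# Extrapolate the fitted line three months ahead.
future_months = np.arange(12, 15)
forecast = intercept + slope * future_months
```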
Diagnostic Analysis
- Root cause analysis
- Cohort analysis
- Segmentation
- Attribution modeling
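Segmentation often starts from per-customer recency, frequency, and monetary (RFM) values. A minimal sketch with pandas, assuming a transaction table with `customer_id`, `date`, and `amount` columns (the data and column names are made up for illustration):

```python
import pandas as pd

# Made-up transactions for three customers.
transactions = pd.DataFrame({
    "customer_id": [1, 1, 2, 3, 3, 3],
    "date": pd.to_datetime([
        "2024-01-05", "2024-03-01", "2023-11-10",
        "2024-02-01", "2024-02-20", "2024-03-10",
    ]),
    "amount": [50.0, 70.0, 200.0, 20.0, 30.0, 25.0],
})

# Reference date for recency: the day after the last transaction.
snapshot = transactions["date"].max() + pd.Timedelta(days=1)

rfm = transactions.groupby("customer_id").agg(
    recency_days=("date", lambda d: (snapshot - d.max()).days),
    frequency=("date", "count"),
    monetary=("amount", "sum"),
)
```

From here, each column can be binned into scores (for example with `pd.qcut`) and combined into segments; the binning scheme is a business decision and is left out of the sketch.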
Related Resources
- Data Analysis Best Practices
- Python Data Science Cheatsheet
- Visualization Gallery
- Sample Datasets
- Analysis Scripts
Support
For data analysis help:
- Review the examples above
- Check sample datasets in assets/examples/datasets/
- Use helper scripts in scripts/
- Consult the detailed guide in references/
Repository
wesley1600/clau…ramework
First Seen
Jan 30, 2026