# Data Analysis
## When to use this skill
- Data exploration: Understand a new dataset
- Report generation: Derive data-driven insights
- Quality validation: Check data consistency
- Decision support: Make data-driven recommendations
## Instructions

### Step 1: Load and explore data

Python (pandas):

```python
import pandas as pd

# Load CSV
df = pd.read_csv('data.csv')

# Basic info
print(df.info())
print(df.describe())
print(df.head(10))

# Check missing values
print(df.isnull().sum())

# Data types
print(df.dtypes)
```
SQL:

```sql
-- Inspect table schema
DESCRIBE table_name;

-- Sample data
SELECT * FROM table_name LIMIT 10;

-- Basic stats
SELECT
    COUNT(*) AS total_rows,
    COUNT(DISTINCT column_name) AS unique_values,
    MIN(numeric_column) AS min_val,
    MAX(numeric_column) AS max_val,
    AVG(numeric_column) AS avg_val
FROM table_name;
```
### Step 2: Data cleaning

```python
# Handle missing values (use assignment rather than chained inplace,
# which is deprecated in recent pandas versions)
df['column'] = df['column'].fillna(df['column'].mean())
df = df.dropna(subset=['required_column'])

# Remove duplicates
df = df.drop_duplicates()

# Type conversions
df['date'] = pd.to_datetime(df['date'])
df['category'] = df['category'].astype('category')

# Remove outliers (IQR method)
Q1 = df['value'].quantile(0.25)
Q3 = df['value'].quantile(0.75)
IQR = Q3 - Q1
df = df[(df['value'] >= Q1 - 1.5 * IQR) & (df['value'] <= Q3 + 1.5 * IQR)]
```
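The cleaning steps above can be wrapped in a helper that returns a cleaned copy, so the raw frame is never mutated. This is a minimal sketch on toy data; the column names `value` and `required_column` are the placeholders from the snippet, not a real schema.

```python
import pandas as pd

def clean(df, value_col, required_col):
    """Return a cleaned copy: fill missing values, drop incomplete rows,
    dedupe, and trim IQR outliers."""
    out = df.copy()  # never mutate the caller's raw data
    out[value_col] = out[value_col].fillna(out[value_col].mean())
    out = out.dropna(subset=[required_col])
    out = out.drop_duplicates()
    q1, q3 = out[value_col].quantile([0.25, 0.75])
    iqr = q3 - q1
    mask = out[value_col].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    return out[mask]

raw = pd.DataFrame({
    'value': [1.0, 2.0, None, 2.0, 100.0],
    'required_column': ['a', 'b', 'c', 'b', None],
})
cleaned = clean(raw, 'value', 'required_column')
```

Because every step reassigns rather than operating in place, `raw` keeps all five original rows after the call.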
### Step 3: Statistical analysis

```python
# Descriptive statistics
print(df['numeric_column'].describe())

# Grouped analysis
grouped = df.groupby('category').agg({
    'value': ['mean', 'sum', 'count'],
    'other': 'nunique',
})
print(grouped)

# Correlation
correlation = df[['col1', 'col2', 'col3']].corr()
print(correlation)

# Pivot table
pivot = pd.pivot_table(df,
                       values='sales',
                       index='region',
                       columns='month',
                       aggfunc='sum')
```
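To make the pivot concrete, here is the same call run end-to-end on a few toy rows (`region`, `month`, and `sales` are the snippet's own placeholder names):

```python
import pandas as pd

toy = pd.DataFrame({
    'region': ['East', 'East', 'West', 'West'],
    'month':  ['Jan',  'Feb',  'Jan',  'Jan'],
    'sales':  [100,    150,    200,    50],
})
pivot = pd.pivot_table(toy, values='sales', index='region',
                       columns='month', aggfunc='sum')
# West/Jan sums to 250 (200 + 50); West/Feb is NaN because
# West has no February rows
```

Missing region/month combinations come out as `NaN`, which is worth checking before feeding the pivot into a chart.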
### Step 4: Visualization

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Histogram
plt.figure(figsize=(10, 6))
df['value'].hist(bins=30)
plt.title('Distribution of Values')
plt.savefig('histogram.png')

# Boxplot
plt.figure(figsize=(10, 6))
sns.boxplot(x='category', y='value', data=df)
plt.title('Value by Category')
plt.savefig('boxplot.png')

# Heatmap (correlation)
plt.figure(figsize=(10, 8))
sns.heatmap(correlation, annot=True, cmap='coolwarm')
plt.title('Correlation Matrix')
plt.savefig('heatmap.png')

# Time series
plt.figure(figsize=(12, 6))
df.groupby('date')['value'].sum().plot()
plt.title('Time Series of Values')
plt.savefig('timeseries.png')
```
### Step 5: Derive insights

```python
# Top/bottom analysis
top_10 = df.nlargest(10, 'value')
bottom_10 = df.nsmallest(10, 'value')

# Trend analysis
df['month'] = df['date'].dt.to_period('M')
monthly_trend = df.groupby('month')['value'].sum()
growth = monthly_trend.pct_change() * 100

# Segment analysis
segments = df.groupby('segment').agg({
    'revenue': 'sum',
    'customers': 'nunique',
    'orders': 'count',
})
segments['avg_order_value'] = segments['revenue'] / segments['orders']
```
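A self-contained run of the trend calculation above on toy data shows what `pct_change` actually reports (the dates and values here are invented):

```python
import pandas as pd

toy = pd.DataFrame({
    'date': pd.to_datetime(['2024-01-05', '2024-01-20',
                            '2024-02-10', '2024-03-01']),
    'value': [100, 50, 180, 90],
})
toy['month'] = toy['date'].dt.to_period('M')
monthly_trend = toy.groupby('month')['value'].sum()  # Jan 150, Feb 180, Mar 90
growth = monthly_trend.pct_change() * 100            # NaN, +20.0, -50.0
best_month = growth.idxmax()                         # 2024-02
```

The first month is always `NaN` (there is no prior period to compare against), and `idxmax` skips it, so `best_month` names the period with the highest month-over-month growth.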
## Output format

### Analysis report structure

```markdown
# Data Analysis Report

## 1. Dataset overview
- Dataset: [name]
- Records: X,XXX
- Columns: XX
- Date range: YYYY-MM-DD ~ YYYY-MM-DD

## 2. Key findings
- Insight 1
- Insight 2
- Insight 3

## 3. Statistical summary
| Metric  | Value |
|---------|-------|
| Mean    | X.XX  |
| Median  | X.XX  |
| Std dev | X.XX  |

## 4. Recommendations
1. [Recommendation 1]
2. [Recommendation 2]
```
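One way to fill the statistical-summary section of that skeleton programmatically is a short f-string template; this is a sketch, not a required API, and the sample values are invented:

```python
import pandas as pd

values = pd.Series([10.0, 12.0, 11.0, 50.0, 13.0])

report = (
    "# Data Analysis Report\n\n"
    "## 3. Statistical summary\n"
    "| Metric | Value |\n"
    "|--------|-------|\n"
    f"| Mean | {values.mean():.2f} |\n"
    f"| Median | {values.median():.2f} |\n"
    f"| Std dev | {values.std():.2f} |\n"
)
print(report)
```

Generating the table from the same Series used in the analysis keeps the report and the code in sync, which also helps with the reproducibility practice below.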
## Best practices

- Understand the data first: learn the structure and meaning of each column before analyzing
- Incremental analysis: move from simple to complex analyses
- Use visualization: use a variety of charts to spot patterns
- Validate assumptions: always verify assumptions about the data
- Reproducibility: document analysis code and results
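The first and fourth practices can be sketched in a few lines: analyze a copy of the raw frame, and validate results with explicit assertions instead of eyeballing printed output (the checks and data here are illustrative):

```python
import pandas as pd

raw = pd.DataFrame({'value': [1, 2, 3, 4, 5]})
df = raw.copy()                  # analyze a copy, never the raw frame
df['value'] = df['value'] * 2

# Validate results with explicit checks
assert len(df) == len(raw), "row count changed unexpectedly"
assert raw['value'].max() == 5, "raw frame was mutated"
assert (df['value'] % 2 == 0).all()  # derived column matches expectation
```

Assertions like these fail loudly the moment an upstream data change breaks an assumption, which is far cheaper than discovering it in a finished report.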
## Constraints

### Required rules (MUST)

- Preserve the raw data (work on a copy)
- Document the analysis process
- Validate results

### Prohibited (MUST NOT)

- Do not expose sensitive personal data
- Do not draw unsupported conclusions
## Quick Start

Unity3D performance data analysis scenario:

```python
import pandas as pd
import matplotlib.pyplot as plt

# 1. Load Unity Profiler CSV data
df = pd.read_csv('unity_profiler_output.csv')

# 2. Analyze frame times
print(df[['Frame', 'CPU', 'GPU', 'Memory']].describe())

# 3. Detect bottleneck frames (16.67 ms = 60 fps budget)
slow_frames = df[df['CPU'] > 16.67]
print(f"Dropped frames: {len(slow_frames)} / {len(df)}")

# 4. Top 10 heaviest functions
top_functions = df.groupby('FunctionName')['CPUTime'].sum()
top_functions = top_functions.nlargest(10)
print(top_functions)

# 5. Visualize the memory trend
plt.figure(figsize=(12, 6))
df.groupby('Frame')['Memory'].mean().plot()
plt.title('Unity Memory Usage Over Time')
plt.xlabel('Frame')
plt.ylabel('Memory (MB)')
plt.savefig('unity_memory_trend.png')
```

Feed the analysis results into the performance-optimization skill.