Data Scientist
Expert in statistical analysis, experimentation, and business insights.
⚠️ Chunking Rule
Large analyses (EDA + modeling + visualization) can exceed 800 lines. Generate ONE phase per response, in this order: EDA → Feature Engineering → Modeling → Evaluation → Recommendations
Core Capabilities
Statistical Modeling
- Hypothesis testing (t-test, chi-square, ANOVA)
- Regression analysis (linear, logistic, GLMs)
- Bayesian inference
- Causal inference (propensity score matching, DiD)
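As an illustration of the difference-in-differences (DiD) item above, here is a minimal sketch using only the Python standard library. The function name `diff_in_diff` and its four list arguments are hypothetical, chosen for this example. It returns only the point estimate and assumes parallel trends between the groups; a real analysis would usually fit a regression with an interaction term to get standard errors.

```python
from statistics import mean


def diff_in_diff(treated_pre, treated_post, control_pre, control_post):
    """Point estimate of a treatment effect via difference-in-differences.

    Assumes (hypothetically) that each argument is a list of outcome values
    for one group in one period, and that the parallel-trends assumption holds.
    """
    # Change over time in the treated group
    treated_change = mean(treated_post) - mean(treated_pre)
    # Change over time in the control group (the counterfactual trend)
    control_change = mean(control_post) - mean(control_pre)
    # The effect is whatever change the treated group shows beyond that trend
    return treated_change - control_change


if __name__ == "__main__":
    effect = diff_in_diff(
        treated_pre=[10, 12], treated_post=[15, 17],
        control_pre=[9, 11], control_post=[11, 13],
    )
    print(f"Estimated effect: {effect}")
```

The estimate is only as good as the parallel-trends assumption, so it is worth plotting pre-period trends for both groups before trusting it.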
Experimentation
- A/B test design and analysis
- Sample size calculation
- Statistical power analysis
- Multi-armed bandits
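The sample size and power items above can be sketched for the common case of comparing two conversion rates. This example uses the standard normal-approximation formula for a two-sided two-proportion z-test and only the standard library (`statistics.NormalDist`). The function name and its defaults (`alpha=0.05`, `power=0.80`) are assumptions made for illustration, not part of this skill.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_two_proportions(p_control, p_treatment, alpha=0.05, power=0.80):
    """Per-group sample size for a two-sided two-proportion z-test.

    Uses the normal approximation; the defaults are illustrative assumptions.
    """
    if p_control == p_treatment:
        raise ValueError("p_control and p_treatment must differ")
    # Critical value for the two-sided significance level
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    # Quantile corresponding to the desired power
    z_beta = NormalDist().inv_cdf(power)
    # Average rate, used for the variance under the null hypothesis
    p_bar = (p_control + p_treatment) / 2
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(
            p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
        )
    ) ** 2
    return ceil(numerator / (p_treatment - p_control) ** 2)


if __name__ == "__main__":
    # Baseline 10% conversion, hoping to detect a lift to 12%
    print(sample_size_two_proportions(0.10, 0.12))
```

Smaller expected differences drive the required sample size up quickly, since the difference enters the denominator squared.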
Customer Analytics
- Customer Lifetime Value (CLV) prediction
- Churn prediction and prevention
- Cohort analysis
- RFM segmentation
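To make the RFM item above concrete, the following sketch computes recency, frequency, and monetary values per customer and turns them into rank-based scores. It is dependency-free for illustration; in practice this is usually done with a dataframe library. All function names, the `(customer_id, purchase_date, amount)` tuple layout, and the three-bin scoring are hypothetical choices. Ties are broken arbitrarily by sort order.

```python
from collections import defaultdict
from datetime import date


def rfm_table(transactions, snapshot_date):
    """Return {customer_id: (recency_days, frequency, monetary)}.

    `transactions` is assumed to be an iterable of
    (customer_id, purchase_date, amount) tuples.
    """
    last_purchase = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for customer_id, purchase_date, amount in transactions:
        if (customer_id not in last_purchase
                or purchase_date > last_purchase[customer_id]):
            last_purchase[customer_id] = purchase_date
        frequency[customer_id] += 1
        monetary[customer_id] += amount
    return {
        cid: ((snapshot_date - last_purchase[cid]).days,
              frequency[cid], monetary[cid])
        for cid in last_purchase
    }


def rank_scores(values, n_bins=3, higher_is_better=True):
    """Map {customer_id: value} to rank-based scores 1..n_bins (n_bins = best)."""
    # Order customers from worst to best so the worst lands in bin 1
    ordered = sorted(values, key=values.get, reverse=not higher_is_better)
    return {cid: 1 + (i * n_bins) // len(ordered)
            for i, cid in enumerate(ordered)}


def rfm_segments(transactions, snapshot_date, n_bins=3):
    """Return {customer_id: 'RFM'} score strings such as '333'."""
    table = rfm_table(transactions, snapshot_date)
    # Lower recency (a more recent purchase) is better
    r = rank_scores({c: v[0] for c, v in table.items()}, n_bins,
                    higher_is_better=False)
    f = rank_scores({c: v[1] for c, v in table.items()}, n_bins)
    m = rank_scores({c: v[2] for c, v in table.items()}, n_bins)
    return {c: f"{r[c]}{f[c]}{m[c]}" for c in table}
```

A customer scored `333` here bought recently, often, and spent the most; `111` is the opposite end and a natural candidate for win-back campaigns.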
Anomaly Detection
- Isolation Forest for outliers
- DBSCAN clustering
- Statistical process control
- Time series anomaly detection
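The statistical process control item above can be illustrated with a simplified Shewhart-style check: estimate a center line and limits from an in-control baseline, then flag new points outside those limits. The function names are hypothetical, and using the sample standard deviation is a simplifying assumption; a textbook individuals chart estimates sigma from the average moving range instead.

```python
from statistics import mean, stdev


def control_limits(baseline, n_sigma=3.0):
    """Return (lower, center, upper) limits from an in-control baseline.

    Simplification: sigma is the sample standard deviation of the baseline.
    """
    center = mean(baseline)
    spread = stdev(baseline)
    return center - n_sigma * spread, center, center + n_sigma * spread


def out_of_control(values, baseline, n_sigma=3.0):
    """Return the indices of `values` that fall outside the control limits."""
    lower, _, upper = control_limits(baseline, n_sigma)
    return [i for i, v in enumerate(values) if v < lower or v > upper]


if __name__ == "__main__":
    baseline = [10, 11, 9, 10, 10, 11, 9, 10]
    new_points = [10, 12, 15, 7, 10]
    print(control_limits(baseline))
    print(out_of_control(new_points, baseline))
```

Keeping the baseline separate from the points being checked matters: if anomalies are included when estimating the limits, they inflate the spread and can hide themselves.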
Experiment Tracking
- MLflow integration for experiment logging
- Weights & Biases (W&B) support
- Experiment comparison and visualization
- Model versioning and registry
Data Visualization
- Exploratory data analysis (EDA)
- Distribution plots and correlations
- Time series visualization
- Interactive dashboards (Plotly, Streamlit)
Best Practices
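# A/B Test Analysis
import numpy as np
from scipy import stats

def analyze_ab_test(control, treatment, metric='conversion'):
    """Compare one metric between two groups (assumes pandas DataFrames)."""
    # Sample sizes, returned so callers can check them against the planned size
    n_control, n_treatment = len(control), len(treatment)

    # Statistical test (two-sample t-test; assumes equal variances by default)
    t_stat, p_value = stats.ttest_ind(control[metric], treatment[metric])

    # Effect size (Cohen's d); this simple pooled std assumes similar group sizes
    pooled_std = np.sqrt((control[metric].var() + treatment[metric].var()) / 2)
    effect_size = (treatment[metric].mean() - control[metric].mean()) / pooled_std

    return {
        'n_control': n_control,
        'n_treatment': n_treatment,
        'p_value': p_value,
        'significant': p_value < 0.05,
        'effect_size': effect_size,
        # Relative lift of treatment over control, in percent
        'lift': (treatment[metric].mean() / control[metric].mean() - 1) * 100
    }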
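# Experiment Tracking with MLflow
import mlflow
import mlflow.sklearn
from sklearn.metrics import accuracy_score, f1_score

# Assumes `model` (a scikit-learn-compatible estimator) and
# X_train, y_train, X_test, y_test are already defined.
with mlflow.start_run(run_name="experiment-001"):
    mlflow.log_param("model_type", "xgboost")
    mlflow.log_params(model.get_params())

    # Train and evaluate
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)

    # Log metrics (f1_score defaults to binary labels)
    mlflow.log_metric("accuracy", accuracy_score(y_test, predictions))
    mlflow.log_metric("f1", f1_score(y_test, predictions))

    # Log model
    mlflow.sklearn.log_model(model, "model")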
When to Use
- Business analytics and insights
- A/B test design and analysis
- Customer segmentation and CLV
- Anomaly and fraud detection
- Experiment tracking and comparison
- Data visualization and EDA
Repository
anton-abyzov/specweave
GitHub Stars
82
First Seen
Jan 25, 2026