Data Analysis Workflow
Run an end-to-end data analysis in R: load, explore, analyze, and produce publication-ready output.
Input: $ARGUMENTS — a dataset path (e.g., data/county_panel.csv) or a description of the analysis goal (e.g., "regress wages on education with state fixed effects using CPS data").
Constraints
- Follow the R code conventions in .claude/rules/r-code-conventions.md
- Save all scripts to scripts/R/ with descriptive names
- Save all outputs (figures, tables, RDS) to output/
- Use saveRDS() for every computed object; Quarto slides may need them
- Use the project theme for all figures (check for a custom theme in .claude/rules/)
- Run r-reviewer on the generated script before presenting results
Workflow Phases
Phase 1: Setup and Data Loading
- Read .claude/rules/r-code-conventions.md for project standards
- Create an R script with a proper header (title, author, purpose, inputs, outputs)
- Load required packages at the top (library(), never require())
- Set the seed once at the top: set.seed(42)
- Load and inspect the dataset
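The Phase 1 steps can be sketched as below. This is a minimal illustration, not the project's required setup: the file name and columns are hypothetical placeholders, and a tiny CSV is written to a temporary folder only so the example runs on its own.

```r
# Phase 1 sketch: setup and data loading (base R only).
# Hypothetical: the file name and columns below are placeholders.

# Set the seed once, at the top of the script.
set.seed(42)

# Stand-in for a real input such as data/county_panel.csv:
# write a tiny CSV to a temporary folder so the sketch is runnable.
data_path <- file.path(tempdir(), "county_panel.csv")
write.csv(
  data.frame(
    county = rep(c("A", "B"), each = 3),
    year   = rep(2001:2003, times = 2),
    wage   = c(10.5, 11.0, NA, 9.8, 10.1, 10.4)
  ),
  data_path,
  row.names = FALSE
)

# Load and inspect the dataset.
panel <- read.csv(data_path, stringsAsFactors = FALSE)
str(panel)          # variable types
print(head(panel))  # first rows
print(dim(panel))   # number of rows and columns
```

In a real script, data_path would point at the dataset passed in $ARGUMENTS, relative to the repository root.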
Phase 2: Exploratory Data Analysis
Generate diagnostic outputs:
- Summary statistics: summary(), missingness rates, variable types
- Distributions: Histograms for key continuous variables
- Relationships: Scatter plots, correlation matrices
- Time patterns: If panel data, plot trends over time
- Group comparisons: If treatment/control, compare pre-treatment means
Save all diagnostic figures to output/diagnostics/.
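A rough sketch of these diagnostics, using base R and the built-in mtcars data as a stand-in for the project dataset. The two missing values are injected purely for illustration, and the file name hist_mpg.pdf is an assumption.

```r
# Phase 2 sketch: diagnostics on the built-in mtcars data (a stand-in dataset).
df <- mtcars
df$mpg[c(2, 5)] <- NA  # inject two missing values purely for illustration

# Summary statistics and variable types
print(summary(df))
var_types <- vapply(df, function(col) class(col)[1], character(1))

# Missingness rate per variable (share of NA values)
miss_rate <- colMeans(is.na(df))
print(round(miss_rate, 3))

# Correlation matrix, using pairwise complete observations
cor_mat <- cor(df[, c("mpg", "hp", "wt")], use = "pairwise.complete.obs")
print(round(cor_mat, 2))

# Save a histogram of a key continuous variable to output/diagnostics/
diag_dir <- file.path("output", "diagnostics")
dir.create(diag_dir, recursive = TRUE, showWarnings = FALSE)
hist_path <- file.path(diag_dir, "hist_mpg.pdf")
pdf(hist_path, width = 6, height = 4)
hist(df$mpg, main = "Distribution of mpg", xlab = "Miles per gallon")
invisible(dev.off())
```

For panel data, a trend plot per group would follow the same pattern: build the plot, then write it under output/diagnostics/.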
Phase 3: Main Analysis
Based on the research question:
- Regression analysis: Use fixest for panel data, lm/glm for cross-section
- Standard errors: Cluster at the appropriate level (document why)
- Multiple specifications: Start simple, progressively add controls
- Effect sizes: Report standardized effects alongside raw coefficients
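One way these points might fit together with fixest is sketched below. The panel is simulated, so every variable name and coefficient is hypothetical, and scaling by the overall standard deviations is just one simple convention for a standardized effect.

```r
library(fixest)

set.seed(42)

# Simulated panel (hypothetical data): 30 states observed for 8 years
n_states <- 30
n_years  <- 8
panel <- data.frame(
  state = rep(seq_len(n_states), each = n_years),
  year  = rep(seq_len(n_years), times = n_states)
)
state_effect   <- rnorm(n_states)[panel$state]
panel$educ     <- 12 + state_effect + rnorm(nrow(panel))
panel$exper    <- runif(nrow(panel), 0, 20)
panel$log_wage <- 1 + 0.08 * panel$educ + 0.02 * panel$exper +
  state_effect + rnorm(nrow(panel), sd = 0.5)

# Start simple, then add controls and fixed effects.
# Standard errors are clustered by state because errors are likely
# correlated within a state over time (the "document why" step).
m1 <- feols(log_wage ~ educ, data = panel, cluster = ~state)
m2 <- feols(log_wage ~ educ + exper, data = panel, cluster = ~state)
m3 <- feols(log_wage ~ educ + exper | state + year,
            data = panel, cluster = ~state)

print(etable(m1, m2, m3))

# Standardized effect of educ: the change in log_wage, in SD units,
# associated with a one-SD change in educ
beta_raw <- coef(m3)[["educ"]]
beta_std <- beta_raw * sd(panel$educ) / sd(panel$log_wage)
```

Comparing the educ coefficient across m1 to m3 shows how much of the raw association is absorbed by controls and fixed effects.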
Phase 4: Publication-Ready Output
Tables:
- Use modelsummary for regression tables (preferred) or stargazer
- Include all standard elements: coefficients, SEs, significance stars, N, R-squared
- Export as .tex for LaTeX inclusion and .html for quick viewing
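A small sketch of the table export, assuming modelsummary is installed. The models use the built-in mtcars data as a stand-in, and the output/tables/ folder and main_results file name are hypothetical.

```r
library(modelsummary)

# Two nested specifications on the built-in mtcars data (a stand-in dataset)
models <- list(
  "Baseline"      = lm(mpg ~ wt, data = mtcars),
  "With controls" = lm(mpg ~ wt + hp, data = mtcars)
)

# Hypothetical output folder under output/
table_dir <- file.path("output", "tables")
dir.create(table_dir, recursive = TRUE, showWarnings = FALSE)

# One file per format; modelsummary infers the format from the extension.
# Standard errors appear in parentheses by default.
for (ext in c("tex", "html")) {
  modelsummary(
    models,
    output  = file.path(table_dir, paste0("main_results.", ext)),
    stars   = TRUE,                    # significance stars
    gof_map = c("nobs", "r.squared")   # keep N and R-squared
  )
}
```

Naming the list elements sets the column headers, which keeps the table readable when more specifications are added.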
Figures:
- Use ggplot2 with the project theme
- Set bg = "transparent" for Beamer compatibility
- Include proper axis labels (sentence case, units)
- Export with explicit dimensions: ggsave(width = X, height = Y)
- Save as both .pdf and .png
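The figure rules above might look like this in practice. theme_project is a hypothetical stand-in for the real project theme, and the folder, file name, and dimensions are assumptions. The theme also sets transparent backgrounds, since the bg argument alone does not remove a filled plot background.

```r
library(ggplot2)

# Stand-in for the project theme; swap in the real one from .claude/rules/
theme_project <- theme_minimal(base_size = 12) +
  theme(
    plot.background  = element_rect(fill = "transparent", colour = NA),
    panel.background = element_rect(fill = "transparent", colour = NA)
  )

# Sentence-case axis labels with units
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(x = "Weight (1,000 lbs)", y = "Fuel economy (miles per gallon)") +
  theme_project

# Hypothetical output folder under output/
fig_dir <- file.path("output", "figures")
dir.create(fig_dir, recursive = TRUE, showWarnings = FALSE)

# Explicit dimensions, one file per format
fig_width  <- 6  # inches
fig_height <- 4  # inches
for (ext in c("pdf", "png")) {
  ggsave(
    filename = file.path(fig_dir, paste0("mpg_by_weight.", ext)),
    plot     = p,
    width    = fig_width,
    height   = fig_height,
    bg       = "transparent"
  )
}
```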
Phase 5: Save and Review
- saveRDS() for all key objects (regression results, summary tables, processed data)
- Create output/ subdirectories as needed with dir.create(..., recursive = TRUE)
- Run the r-reviewer agent on the generated script:
Delegate to the r-reviewer agent:
"Review the script at scripts/R/[script_name].R"
- Address any Critical or High issues from the review.
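The save step can be sketched as a simple round trip. The object and file names here are hypothetical, and the regression is a stand-in fitted on the built-in mtcars data.

```r
# Phase 5 sketch: persist key objects and confirm they reload intact.
out_dir <- file.path("output", "analysis")
dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)

fit <- lm(mpg ~ wt + hp, data = mtcars)  # stand-in regression result
summary_tbl <- data.frame(
  variable = c("mpg", "wt", "hp"),
  mean     = c(mean(mtcars$mpg), mean(mtcars$wt), mean(mtcars$hp))
)

# Hypothetical file names; one .rds file per object
key_objects <- list(fit_main = fit, summary_tbl = summary_tbl)
for (nm in names(key_objects)) {
  saveRDS(key_objects[[nm]], file.path(out_dir, paste0(nm, ".rds")))
}

# Downstream code (for example a Quarto slide chunk) can reload an object:
fit_reloaded <- readRDS(file.path(out_dir, "fit_main.rds"))
```

Keeping one file per object lets slides load only what they need instead of rerunning the analysis.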
Script Structure
Follow this template:
# ============================================================
# [Descriptive Title]
# Author: [from project context]
# Purpose: [What this script does]
# Inputs: [Data files]
# Outputs: [Figures, tables, RDS files]
# ============================================================
# 0. Setup ----
library(tidyverse)
library(fixest)
library(modelsummary)
set.seed(42)
dir.create("output/analysis", recursive = TRUE, showWarnings = FALSE)
# 1. Data Loading ----
# [Load and clean data]
# 2. Exploratory Analysis ----
# [Summary stats, diagnostic plots]
# 3. Main Analysis ----
# [Regressions, estimation]
# 4. Tables and Figures ----
# [Publication-ready output]
# 5. Export ----
# [saveRDS for all objects, ggsave for all figures]
Important
- Reproduce, don't guess. If the user specifies a regression, run exactly that.
- Show your work. Print summary statistics before jumping to regression.
- Check for issues. Look for multicollinearity, outliers, perfect prediction.
- Use relative paths. All paths relative to repository root.
- No hardcoded values. Use variables for sample restrictions, date ranges, etc.
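A few of these habits are illustrated in the base R sketch below, again on the built-in mtcars data. The restriction values, the hand-computed variance inflation factor, and the 4/n Cook's distance cutoff are illustrative assumptions, not project rules, and perfect prediction is not covered here.

```r
# Sample restrictions as named variables rather than inline literals
min_hp   <- 60          # hypothetical lower bound on horsepower
keep_cyl <- c(4, 6, 8)  # hypothetical set of cylinder counts to keep
analysis_df <- subset(mtcars, hp >= min_hp & cyl %in% keep_cyl)

# Show your work: print summary statistics before any regression
print(summary(analysis_df[, c("mpg", "wt", "hp", "disp")]))

# Multicollinearity: variance inflation factor for each regressor,
# computed as 1 / (1 - R^2) from regressing it on the other regressors
regressors <- c("wt", "hp", "disp")
vif <- vapply(regressors, function(v) {
  others <- setdiff(regressors, v)
  aux_fit <- lm(reformulate(others, response = v), data = analysis_df)
  1 / (1 - summary(aux_fit)$r.squared)
}, numeric(1))
print(round(vif, 2))

# Outliers: flag observations with a large Cook's distance
# (4 / n is a common rule of thumb, not a hard threshold)
fit <- lm(mpg ~ wt + hp + disp, data = analysis_df)
cooks <- cooks.distance(fit)
flagged <- names(cooks)[cooks > 4 / nrow(analysis_df)]
print(flagged)
```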
Repository: pedrohcgs/claud…workflow