data-analysis
Data Analysis Workflow
Run an end-to-end data analysis in R: load, explore, analyze, and produce publication-ready output.
Input: $ARGUMENTS — a dataset path (e.g., data/county_panel.csv) or a description of the analysis goal (e.g., "regress wages on education with state fixed effects using CPS data").
Constraints
- Follow R code conventions in `.claude/rules/r-code-conventions.md`
- Save all scripts to `scripts/R/` with descriptive names
- Save all outputs (figures, tables, RDS) to `output/`
- Use `saveRDS()` for every computed object; Quarto slides may need them
- Use project theme for all figures (check for custom theme in `.claude/rules/`)
- Run r-reviewer on the generated script before presenting results
Workflow Phases
Phase 0: Pre-Flight Report
Before writing any analysis code, produce a Pre-Flight Report showing that you have read the inputs. This guards against the common failure mode where the agent hallucinates variable names or skips project conventions.
Output block (in your response to the user, before Phase 1):
## Pre-Flight Report
**Dataset:** [path]
- Variables found: [list from head()/names()]
- Rows: [count]
- Key types: [e.g., "outcome=numeric, treatment=binary, state=factor"]
- Missing-data summary: [% missing per key var]
**Project conventions read:**
- `.claude/rules/r-code-conventions.md` — [one-line summary of most relevant rule]
- `.claude/rules/content-invariants.md` — [INV-9, INV-10, INV-11, INV-12 applicable]
**Task interpretation:** [one sentence restating what the user asked for]
**Plan:** [3-5 bullet outline of the R script structure]
If any input cannot be read (missing file, unreadable format), stop and ask the user before proceeding.
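The facts the report asks for can be gathered with a few base R calls. The following is a minimal sketch, not part of the original workflow: the helper name `preflight_summary()` and the toy data frame are hypothetical, and in practice the input would be read from the dataset path given in $ARGUMENTS.

```r
# Hypothetical helper: collect the Pre-Flight Report facts for a data frame.
preflight_summary <- function(df, key_vars = names(df)) {
  stopifnot(is.data.frame(df), all(key_vars %in% names(df)))
  list(
    variables   = names(df),
    rows        = nrow(df),
    key_types   = vapply(df[key_vars], function(x) class(x)[1], character(1)),
    pct_missing = vapply(df[key_vars], function(x) 100 * mean(is.na(x)), numeric(1))
  )
}

# Toy data standing in for the real dataset (illustrative values only).
toy <- data.frame(
  wage  = c(12.5, NA, 20.1, 15.0),
  educ  = c(12L, 16L, 12L, 14L),
  state = factor(c("CA", "TX", "CA", "NY"))
)

report <- preflight_summary(toy, key_vars = c("wage", "educ", "state"))
str(report)
```

Each element of `report` maps onto one bullet of the report block, so the values can be pasted in rather than recalled from memory.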
Phase 1: Setup and Data Loading
- Create R script with proper header (title, author, purpose, inputs, outputs)
- Load required packages at top (`library()`, never `require()`)
- Set seed once at top in YYYYMMDD format (per `r-code-conventions.md`), e.g. `set.seed(20260415)` (INV-9)
- Load and inspect the dataset
Phase 2: Exploratory Data Analysis
Generate diagnostic outputs:
- Summary statistics: `summary()`, missingness rates, variable types
- Distributions: Histograms for key continuous variables
- Relationships: Scatter plots, correlation matrices
- Time patterns: If panel data, plot trends over time
- Group comparisons: If treatment/control, compare pre-treatment means
Save all diagnostic figures to output/diagnostics/.
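As a rough illustration of the non-graphical checks in this phase, the sketch below computes missingness rates, a correlation matrix, and pre-treatment means by group in base R. The data are simulated and every column name (`treated`, `post`, `y`, `x`) is made up for the example; the diagnostic figures themselves are not shown here.

```r
# Synthetic panel-style data; every column name here is an assumption.
set.seed(20260415)
n <- 200
eda <- data.frame(
  treated = rep(c(0L, 1L), each = n / 2),
  post    = rep(c(0L, 1L), times = n / 2),
  x       = rnorm(n)
)
eda$y <- 1 + 0.5 * eda$x + rnorm(n)
eda$y[c(3, 17)] <- NA  # inject two missing values to exercise the check

# Summary statistics and missingness rates (share of NA per column)
print(summary(eda))
miss_rate <- colMeans(is.na(eda))

# Relationships: correlation matrix on pairwise-complete observations
cor_mat <- cor(eda[c("y", "x")], use = "pairwise.complete.obs")

# Group comparison: pre-treatment means of y by treatment status
pre <- subset(eda, post == 0L)
pre_means <- aggregate(y ~ treated, data = pre, FUN = mean)
print(pre_means)
```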
Phase 3: Main Analysis
Based on the research question:
- Regression analysis: Use `fixest` for panel data, `lm`/`glm` for cross-section
- Standard errors: Cluster at the appropriate level (document why)
- Multiple specifications: Start simple, progressively add controls
- Effect sizes: Report standardized effects alongside raw coefficients
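A minimal sketch of this progression, assuming the `fixest` package is installed. The data are simulated and the variable names (`wage`, `educ`, `exper`, `state`) are placeholders, so the specification illustrates the pattern rather than any particular study.

```r
library(fixest)

# Simulated worker-level data with a state identifier (illustrative only).
set.seed(20260415)
n  <- 500
df <- data.frame(
  state = factor(sample(sprintf("S%02d", 1:20), n, replace = TRUE)),
  educ  = rnorm(n, mean = 13, sd = 2),
  exper = runif(n, 0, 30)
)
df$wage <- 5 + 1.2 * df$educ + 0.1 * df$exper + rnorm(n, sd = 3)

# Start simple, then add controls and fixed effects.
# Clustering by state: state-level shocks plausibly make errors
# correlated within a state, so this is the documented reason here.
m1 <- feols(wage ~ educ,                 data = df, cluster = ~state)
m2 <- feols(wage ~ educ + exper,         data = df, cluster = ~state)
m3 <- feols(wage ~ educ + exper | state, data = df, cluster = ~state)

etable(m1, m2, m3)

# Standardized effect: SD change in wage per one-SD change in educ.
std_educ <- coef(m3)[["educ"]] * sd(df$educ) / sd(df$wage)
```

Reporting `std_educ` next to the raw coefficient lets readers compare magnitudes across variables measured in different units.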
Phase 4: Publication-Ready Output
Tables:
- Use `modelsummary` for regression tables (preferred) or `stargazer`
- Include all standard elements: coefficients, SEs, significance stars, N, R-squared
- Export as `.tex` for LaTeX inclusion and `.html` for quick viewing
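One way the table export might look, assuming the `modelsummary` package is installed. The models use the built-in `mtcars` data purely for illustration, and the `output/tables` subdirectory and file names are hypothetical.

```r
library(modelsummary)

# Two toy specifications on a built-in dataset, for illustration only.
models <- list(
  "Baseline" = lm(mpg ~ wt, data = mtcars),
  "Controls" = lm(mpg ~ wt + hp, data = mtcars)
)

table_dir <- file.path("output", "tables")  # hypothetical subdirectory
dir.create(table_dir, recursive = TRUE, showWarnings = FALSE)

# The file extension of `output` selects the format.
# Default cells show coefficients with standard errors in parentheses.
modelsummary(models, stars = TRUE, gof_map = c("nobs", "r.squared"),
             output = file.path(table_dir, "mpg_models.tex"))
modelsummary(models, stars = TRUE, gof_map = c("nobs", "r.squared"),
             output = file.path(table_dir, "mpg_models.html"))
```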
Figures:
- Use `ggplot2` with project theme
- Set `bg = "transparent"` for Beamer compatibility
- Include proper axis labels (sentence case, units)
- Export with explicit dimensions: `ggsave(width = X, height = Y)`
- Save as both `.pdf` and `.png`
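The figure rules can be sketched as follows, assuming a `ggplot2` version whose `ggsave()` accepts a `bg` argument. `theme_minimal()` stands in for the project theme, which is not shown in this document, and the `output/figures` subdirectory and file names are hypothetical.

```r
library(ggplot2)

# theme_minimal() is a placeholder for the project theme.
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(x = "Weight (1,000 lbs)", y = "Fuel economy (miles per gallon)") +
  theme_minimal() +
  theme(
    plot.background  = element_rect(fill = "transparent", colour = NA),
    panel.background = element_rect(fill = "transparent", colour = NA)
  )

fig_dir <- file.path("output", "figures")  # hypothetical subdirectory
dir.create(fig_dir, recursive = TRUE, showWarnings = FALSE)

# Same plot, explicit dimensions (inches by default), both formats.
for (ext in c("pdf", "png")) {
  ggsave(file.path(fig_dir, paste0("mpg_vs_weight.", ext)),
         plot = p, width = 6, height = 4, bg = "transparent")
}
```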
Phase 5: Save and Review
- `saveRDS()` for all key objects (regression results, summary tables, processed data)
- Create `output/` subdirectories as needed with `dir.create(..., recursive = TRUE)`
- Run the r-reviewer agent on the generated script:
Delegate to the r-reviewer agent:
"Review the script at scripts/R/[script_name].R"
- Address any Critical or High issues from the review.
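A small base R sketch of the save step. The objects and file names are stand-ins chosen for the example; a real script would save whatever the analysis actually produced.

```r
# Output subdirectory for this analysis (name is illustrative).
out_dir <- file.path("output", "analysis")
dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)

# Stand-ins for the key objects produced by the script.
fit         <- lm(mpg ~ wt, data = mtcars)
summary_tbl <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)

# One .rds file per object, named after the object.
objects <- list(fit = fit, summary_tbl = summary_tbl)
for (nm in names(objects)) {
  saveRDS(objects[[nm]], file.path(out_dir, paste0(nm, ".rds")))
}

# Downstream code (e.g., a Quarto slide chunk) can reload without refitting.
fit_reloaded <- readRDS(file.path(out_dir, "fit.rds"))
```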
Script Structure
Follow this template:
# ============================================================
# [Descriptive Title]
# Author: [from project context]
# Purpose: [What this script does]
# Inputs: [Data files]
# Outputs: [Figures, tables, RDS files]
# ============================================================
# 0. Setup ----
library(tidyverse)
library(fixest)
library(modelsummary)
set.seed(20260415) # YYYYMMDD per r-code-conventions.md (INV-9)
dir.create("output/analysis", recursive = TRUE, showWarnings = FALSE)
# 1. Data Loading ----
# [Load and clean data]
# 2. Exploratory Analysis ----
# [Summary stats, diagnostic plots]
# 3. Main Analysis ----
# [Regressions, estimation]
# 4. Tables and Figures ----
# [Publication-ready output]
# 5. Export ----
# [saveRDS for all objects, ggsave for all figures]
Important
- Reproduce, don't guess. If the user specifies a regression, run exactly that.
- Show your work. Print summary statistics before jumping to regression.
- Check for issues. Look for multicollinearity, outliers, perfect prediction.
- Use relative paths. All paths relative to repository root.
- No hardcoded values. Use variables for sample restrictions, date ranges, etc.
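To illustrate the last point, sample restrictions can be declared once and referenced by name. This is a sketch with made-up parameter names, values, and data, not a prescribed layout.

```r
# Analysis parameters declared once, near the top of the script.
# All names and values below are hypothetical.
params <- list(
  min_year = 2010L,
  max_year = 2019L,
  min_age  = 25L
)

# Toy data standing in for the real sample.
panel <- data.frame(
  year = c(2008L, 2012L, 2015L, 2021L, 2018L),
  age  = c(30L, 22L, 41L, 35L, 27L)
)

# The restriction refers to params, never to literal numbers.
analysis_sample <- subset(
  panel,
  year >= params$min_year & year <= params$max_year & age >= params$min_age
)
print(nrow(analysis_sample))
```

Changing a restriction then means editing one entry in `params` rather than hunting for literals throughout the script.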
Long-running fits: use the Monitor tool (Apr 2026)
For regressions, simulations, or bootstrap loops that take more than a couple of minutes, launch via Bash with run_in_background: true and then use Anthropic's Monitor tool to stream R stdout into the conversation in real time. Pattern:
- Background-launch: `Rscript scripts/R/03_analyze.R` with `run_in_background: true`. Capture the `bash_id`.
- Use Monitor on the `bash_id` until a milestone fires (e.g., `Coefficients table written`, or process exit).
- Continue or course-correct based on what the stream reveals.
This avoids the polling-loop anti-pattern (sleep 30; check; sleep 30; check) and avoids burning cache on idle waits. Especially useful when paired with the Cost-Conscious Parallelism section of the guide.
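For the stream to be useful, the R script has to emit recognizable milestone lines. The sketch below covers only the R side and assumes nothing about how the Monitor tool is invoked; the helper `log_milestone()`, the bootstrap setup, and the temporary output path are all hypothetical.

```r
# Hypothetical helper: write a timestamped milestone line to stdout and
# flush, so the line is not held back in a buffer.
log_milestone <- function(msg) {
  cat(format(Sys.time(), "%H:%M:%S"), msg, "\n")
  flush(stdout())
  invisible(msg)
}

set.seed(20260415)
n_boot <- 200  # illustrative number of bootstrap replications
slopes <- numeric(n_boot)

for (b in seq_len(n_boot)) {
  idx <- sample(nrow(mtcars), replace = TRUE)
  slopes[b] <- coef(lm(mpg ~ wt, data = mtcars[idx, ]))[["wt"]]
  if (b %% 50 == 0) log_milestone(sprintf("Bootstrap %d/%d done", b, n_boot))
}

# Write the result, then announce the milestone the watcher is waiting for.
boot_se   <- sd(slopes)
coef_path <- file.path(tempdir(), "boot_coefs.csv")  # temp location for the demo
write.csv(data.frame(term = "wt", boot_se = boot_se), coef_path, row.names = FALSE)
log_milestone("Coefficients table written")
```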