statistical-analysis
Statistical Analysis
Comprehensive statistical testing, power analysis, and experimental design for reproducible research.
When to Use
- Conducting statistical hypothesis tests (t-tests, ANOVA, chi-square)
- Performing regression or correlation analyses
- Running Bayesian statistical analyses
- Checking statistical assumptions and diagnostics
- Calculating effect sizes and conducting power analyses
- Reporting statistical results in APA format
- Planning experiments with proper power calculations
- Helping with the ANALYSIS phase of a research project
Workflow Decision Tree
START
│
├─ Need to SELECT a statistical test?
│ └─ See "Test Selection Guide"
│
├─ Ready to check ASSUMPTIONS?
│ └─ See "Assumption Checking"
│
├─ Ready to run ANALYSIS?
│ └─ See "Running Statistical Tests"
│
└─ Need to REPORT results?
   └─ See "Reporting Results (APA Format)"
Test Selection Guide
Quick Reference: Choosing the Right Test
Comparing Two Groups:
| Data Type | Distribution | Design | Test |
|---|---|---|---|
| Continuous | Normal | Independent | Independent t-test |
| Continuous | Non-normal | Independent | Mann-Whitney U |
| Continuous | Normal | Paired | Paired t-test |
| Continuous | Non-normal | Paired | Wilcoxon signed-rank |
| Binary | - | - | Chi-square / Fisher's exact |
Comparing 3+ Groups:
| Data Type | Distribution | Design | Test |
|---|---|---|---|
| Continuous | Normal | Independent | One-way ANOVA |
| Continuous | Non-normal | Independent | Kruskal-Wallis |
| Continuous | Normal | Paired | Repeated measures ANOVA |
| Continuous | Non-normal | Paired | Friedman test |
Relationships:
| Variables | Distribution / Goal | Test |
|---|---|---|
| Two continuous vars | Normal | Pearson correlation |
| Two continuous vars | Non-normal | Spearman correlation |
| Continuous outcome + predictor(s) | Prediction | Linear regression |
| Binary outcome + predictor(s) | Classification | Logistic regression |
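The group-comparison tables above can be read as a simple lookup on two questions: is the data approximately normal, and is the design paired? The sketch below encodes only the continuous-data rows. The function names are hypothetical helpers for illustration, not part of any library.

```python
def choose_two_group_test(normal: bool, paired: bool) -> str:
    """Return the two-group test for continuous data, per the table above."""
    if paired:
        return "Paired t-test" if normal else "Wilcoxon signed-rank"
    return "Independent t-test" if normal else "Mann-Whitney U"


def choose_multi_group_test(normal: bool, paired: bool) -> str:
    """Return the 3+ group test for continuous data, per the table above."""
    if paired:
        return "Repeated measures ANOVA" if normal else "Friedman test"
    return "One-way ANOVA" if normal else "Kruskal-Wallis"


# Example: non-normal data measured twice on the same participants
print(choose_two_group_test(normal=False, paired=True))
```

Binary outcomes and relationship analyses are left out of this sketch because they depend on more than these two questions.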
Assumption Checking
ALWAYS check assumptions before interpreting test results.
Key Assumptions to Check
import scipy.stats as stats
import numpy as np
# `data`, `group1`, and `group2` are placeholders for your own NumPy arrays
# 1. Normality Test (Shapiro-Wilk)
stat, p = stats.shapiro(data)
print(f"Shapiro-Wilk: W={stat:.3f}, p={p:.3f}")
if p < 0.05:
    print("⚠️ Normality assumption violated - consider non-parametric test")
# 2. Homogeneity of Variance (Levene's test)
stat, p = stats.levene(group1, group2)
print(f"Levene's: F={stat:.3f}, p={p:.3f}")
if p < 0.05:
    print("⚠️ Variance assumption violated - use Welch's t-test")
# 3. Outlier Detection (IQR method)
Q1, Q3 = np.percentile(data, [25, 75])
IQR = Q3 - Q1
outliers = data[(data < Q1 - 1.5*IQR) | (data > Q3 + 1.5*IQR)]
print(f"Outliers detected: {len(outliers)}")
What to Do When Assumptions Are Violated
| Assumption Violated | Context | Recommended Action |
|---|---|---|
| Normality (mild, n > 30) | Parametric tests | Proceed; parametric tests are reasonably robust |
| Normality (severe) | Parametric tests | Transform (log/sqrt) or use a non-parametric test |
| Homogeneity of variance | t-test | Use Welch's t-test |
| Homogeneity of variance | ANOVA | Use Welch's ANOVA |
| Linearity | Regression | Add polynomial terms or use a GAM |
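As an illustration of how these fallbacks could be chained for two independent groups, here is a minimal sketch using SciPy. The function name `compare_two_groups` and the fixed 0.05 threshold are assumptions made for this example. Choosing a test from the same data it is applied to is a simplification, so treat the result as a starting point rather than a rule.

```python
import numpy as np
from scipy import stats


def compare_two_groups(a, b, alpha=0.05):
    """Pick and run a two-group test based on assumption checks.

    Hypothetical helper: returns (test_name, statistic, p_value).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)

    # Normality: both groups must pass Shapiro-Wilk to stay parametric
    normal = (stats.shapiro(a).pvalue >= alpha
              and stats.shapiro(b).pvalue >= alpha)
    if not normal:
        res = stats.mannwhitneyu(a, b, alternative="two-sided")
        return "Mann-Whitney U", res.statistic, res.pvalue

    # Homogeneity of variance: fall back to Welch's t-test if Levene rejects
    equal_var = stats.levene(a, b).pvalue >= alpha
    res = stats.ttest_ind(a, b, equal_var=equal_var)
    name = "Student's t-test" if equal_var else "Welch's t-test"
    return name, res.statistic, res.pvalue


# Strongly right-skewed data should route to the non-parametric test
skewed = np.exp(np.linspace(0, 5, 100))
name, stat, p = compare_two_groups(skewed, skewed * 2)
print(f"{name}: statistic={stat:.2f}, p={p:.4f}")
```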
Running Statistical Tests
Python Libraries
import scipy.stats as stats # Core statistical tests
import statsmodels.api as sm # Regression, diagnostics
import pingouin as pg # User-friendly testing
import numpy as np
import pandas as pd
Common Analyses
T-Test with Complete Reporting
import pingouin as pg
# Independent t-test with effect size
result = pg.ttest(group_a, group_b, correction='auto')
print(f"t({result['dof'].values[0]:.0f}) = {result['T'].values[0]:.2f}, "
      f"p = {result['p-val'].values[0]:.3f}, "
      f"d = {result['cohen-d'].values[0]:.2f}")
One-Way ANOVA with Post-Hoc
import pingouin as pg
# ANOVA
aov = pg.anova(dv='score', between='group', data=df, detailed=True)
print(f"F = {aov['F'].values[0]:.2f}, p = {aov['p-unc'].values[0]:.3f}, "
      f"η²_p = {aov['np2'].values[0]:.3f}")
# Post-hoc if significant
if aov['p-unc'].values[0] < 0.05:
    posthoc = pg.pairwise_tukey(dv='score', between='group', data=df)
    print(posthoc[['A', 'B', 'diff', 'p-tukey']])
Linear Regression with Diagnostics
import statsmodels.api as sm
# Fit model
X = sm.add_constant(predictors)
model = sm.OLS(outcome, X).fit()
print(model.summary())
# Key outputs
print(f"R² = {model.rsquared:.3f}, Adjusted R² = {model.rsquared_adj:.3f}")
print(f"F({model.df_model:.0f}, {model.df_resid:.0f}) = {model.fvalue:.2f}, p = {model.f_pvalue:.4f}")
Correlation with Confidence Intervals
import pingouin as pg
# Pearson correlation with CI
result = pg.corr(x, y, method='pearson')
print(f"r = {result['r'].values[0]:.3f}, "
      f"p = {result['p-val'].values[0]:.3f}, "
      f"95% CI [{result['CI95%'].values[0][0]:.3f}, {result['CI95%'].values[0][1]:.3f}]")
Effect Sizes
Always report effect sizes alongside p-values.
Quick Reference: Effect Size Benchmarks
| Test | Effect Size | Small | Medium | Large |
|---|---|---|---|---|
| T-test | Cohen's d | 0.20 | 0.50 | 0.80 |
| ANOVA | η²_p (partial eta²) | 0.01 | 0.06 | 0.14 |
| Correlation | r | 0.10 | 0.30 | 0.50 |
| Regression | R² | 0.02 | 0.13 | 0.26 |
| Chi-square | Cramér's V (df* = min(r−1, c−1) = 2) | 0.07 | 0.21 | 0.35 |
Important: These are guidelines only. Practical significance depends on context.
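For reference, Cohen's d for two independent groups is the mean difference divided by the pooled standard deviation. The sketch below computes it with NumPy and maps it onto the benchmarks in the table. The helper names and the "negligible" label for values below 0.20 are assumptions added for illustration.

```python
import numpy as np


def cohens_d(a, b):
    """Cohen's d for two independent samples, using the pooled SD."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / (n1 + n2 - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)


def label_d(d):
    """Map |d| onto the benchmark labels from the table above."""
    size = abs(d)
    if size < 0.20:
        return "negligible"  # assumed label; the table starts at "small"
    if size < 0.50:
        return "small"
    if size < 0.80:
        return "medium"
    return "large"


d = cohens_d([2, 4, 6], [1, 3, 5])
print(f"d = {d:.2f} ({label_d(d)})")
```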
Power Analysis
A Priori Power Analysis (Before Study)
import math
from statsmodels.stats.power import tt_ind_solve_power, FTestAnovaPower
# T-test: Required n for d=0.5, power=0.80, alpha=0.05
n = tt_ind_solve_power(effect_size=0.5, alpha=0.05, power=0.80, ratio=1.0)
print(f"Required n per group: {math.ceil(n)}")
# ANOVA: Required N for f=0.25, 3 groups
# Note: solve_power takes k_groups and returns the TOTAL sample size
power_anova = FTestAnovaPower()
n_total = power_anova.solve_power(effect_size=0.25, k_groups=3, alpha=0.05, power=0.80)
print(f"Required total N: {math.ceil(n_total)} (about {math.ceil(n_total / 3)} per group)")
Sensitivity Analysis (After Study)
# What effect could we detect with n=50 per group?
detectable_d = tt_ind_solve_power(effect_size=None, nobs1=50, alpha=0.05,
                                  power=0.80, ratio=1.0)
print(f"Minimum detectable effect: d = {detectable_d:.2f}")
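When planning, it can help to see how the required sample size changes across the small, medium, and large benchmarks for Cohen's d. The sketch below reuses `tt_ind_solve_power` from statsmodels. The helper name `required_n_per_group` is hypothetical, and the result is rounded up because a fractional participant cannot be recruited.

```python
import math
from statsmodels.stats.power import tt_ind_solve_power


def required_n_per_group(d, alpha=0.05, power=0.80):
    """Per-group n for a two-sided independent t-test with equal group sizes."""
    n = tt_ind_solve_power(effect_size=d, alpha=alpha, power=power, ratio=1.0)
    return math.ceil(n)


for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: n per group = {required_n_per_group(d)}")
```

Smaller expected effects demand much larger samples, which is why the effect size assumption deserves a justification in the study plan.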
Reporting Results (APA Format)
Templates for Common Tests
Independent T-Test:
Group A (n = 48, M = 75.2, SD = 8.5) scored significantly higher than
Group B (n = 52, M = 68.3, SD = 9.2), t(98) = 3.82, p < .001, d = 0.77,
95% CI [0.36, 1.18].
One-Way ANOVA:
A one-way ANOVA revealed a significant main effect of treatment on test
scores, F(2, 147) = 8.45, p < .001, η²_p = .10. Post hoc comparisons
using Tukey's HSD indicated that Condition A (M = 78.2, SD = 7.3)
differed significantly from Condition B (M = 71.5, SD = 8.1, p = .002).
Pearson Correlation:
There was a significant positive correlation between study hours and
exam scores, r(98) = .45, p < .001, 95% CI [.28, .59].
Multiple Regression:
Multiple regression was conducted with exam scores as the outcome.
The model was significant, F(3, 146) = 45.2, p < .001, R² = .48.
Study hours (β = .35, p < .001) and prior GPA (β = .28, p < .001)
were significant predictors.
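The templates above follow two APA conventions that are easy to get wrong by hand: p-values drop the leading zero, and very small values are reported as p < .001. A minimal sketch of formatting helpers follows. The function names are hypothetical and cover only the t-test template.

```python
def format_p(p: float) -> str:
    """Format a p-value APA style: no leading zero, floor at p < .001."""
    if p < 0.001:
        return "p < .001"
    return "p = " + f"{p:.3f}".lstrip("0")


def format_t(t: float, df: float, p: float, d: float) -> str:
    """Format an independent t-test result as in the template above."""
    return f"t({df:.0f}) = {t:.2f}, {format_p(p)}, d = {d:.2f}"


print(format_t(3.82, 98, 0.0002, 0.77))
print(format_p(0.002))
```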
Integration with RA Workflow
During PLANNING Phase
- Help determine appropriate sample sizes with power analysis
- Suggest statistical approaches for research design
During ANALYSIS Phase
- Run assumption checks on collected data
- Perform planned statistical analyses
- Generate effect sizes and confidence intervals
During WRITING Phase
- Format results for methods and results sections
- Generate APA-formatted statistical reports
- Connect to the /write_methods and /write_results skills
Essential Reporting Elements
Always include:
- Descriptive statistics: M, SD, n for all groups
- Test statistics: Name, statistic value, df, exact p-value
- Effect sizes: With confidence intervals when possible
- Assumption checks: What was tested, results, any corrections
- All planned analyses: Including non-significant findings