Research Literacy
Purpose
AI agents tend to execute analysis steps immediately without planning or justification. In research, every analysis decision needs a rationale grounded in theory, design, and data characteristics. This skill encodes the basic scientific thinking that should precede any domain-specific action.
A competent programmer without research training will typically: (a) pick a familiar method rather than the appropriate one, (b) skip assumption checks, (c) interpret results without considering alternative explanations, and (d) make undisclosed analytic choices that inflate false positive rates. This skill exists to prevent all four failure modes.
When to Use
- Before or alongside any domain-specific skill from this project (e.g., before running an ERP analysis, first formulate the research question and justify the method).
- Standalone when planning a study, reviewing an analysis pipeline, or interpreting results.
- Whenever an analysis involves researcher degrees of freedom — choices that could have been made differently and would affect the outcome.
⚠️ Verification Notice
This skill was generated by AI from academic literature. All parameters, thresholds, and citations require independent verification before use in research. If you find errors, please open an issue.
Research Question Formulation
From Vague Idea to Testable Hypothesis
A research question must be specific, falsifiable, and operationalized before any data analysis begins.
- Start with the phenomenon: What behavior, neural signal, or cognitive process are you interested in?
- Identify the gap: What is unknown or contested in the existing literature?
- Formulate as a directional or non-directional prediction: Specify the expected relationship between variables.
- Operationalize: Define how each construct is measured and what constitutes evidence for or against the hypothesis.
The PICOS Framework for Cognitive Science
Adapted from evidence-based medicine, PICOS structures research questions systematically:
| Element | General Definition | Cognitive Science Example |
|---|---|---|
| Population | Who is studied | Healthy adults aged 18-35; patients with aphasia |
| Intervention / Exposure | What manipulation or variable | Semantic priming; TMS to DLPFC |
| Comparison | What is the control condition | Unrelated prime; sham stimulation |
| Outcome | What is measured | N400 amplitude; reaction time; BOLD signal |
| Study design | How is the study structured | Within-subjects; longitudinal; cross-sectional |
Exploratory vs. Confirmatory Research
This distinction is critical for valid inference (Wagenmakers et al., 2012):
- Confirmatory research tests a pre-specified hypothesis. Statistical tests (p-values, confidence intervals) are only valid in this context. Requires preregistration of hypotheses and analysis plan.
- Exploratory research generates hypotheses from data. Results are descriptive and hypothesis-generating, not hypothesis-testing. Statistical tests in exploratory work should be interpreted as descriptive, not inferential.
- Mixing the two without disclosure is a primary driver of the replication crisis (Nosek et al., 2018). If you discover a pattern in the data and then test it in the same dataset, the resulting p-value is not valid.
Rule: Always declare whether an analysis is confirmatory or exploratory before executing it. If the analysis plan changed after seeing the data, label it exploratory.
Method Selection Justification
Match Question Type to Analysis Family
| Research Question Type | Analysis Family | Examples |
|---|---|---|
| Group differences | Comparison | t-test, ANOVA, Mann-Whitney, permutation test |
| Relationships between variables | Association | Correlation, regression, structural equation modeling |
| Predicting outcomes | Prediction | Regression, classification, machine learning |
| Describing patterns | Description | Descriptive statistics, factor analysis, clustering |
| Temporal dynamics | Time-series | Time-frequency, autoregressive models, HMM |
| Neural representations | Multivariate | RSA, MVPA, encoding models |
Decision Criteria for Method Selection
When choosing a method, consider and document the following:
- Data type: Continuous, ordinal, categorical, count? This constrains the model family.
- Design structure: Between-subjects, within-subjects, mixed? Nested or crossed random effects? This determines the error structure.
- Sample size: Is N sufficient for the chosen method? Underpowered studies waste resources and inflate effect size estimates (Button et al., 2013). See references/common-assumptions.md for method-specific guidance.
- Assumption profile: Do the data meet the method's assumptions? See references/common-assumptions.md.
- Multiple comparisons: How many tests will be performed? What correction is appropriate? (Benjamini & Hochberg, 1995, for FDR; Bonferroni for strict family-wise control; cluster-based permutation for neuroimaging, Maris & Oostenveld, 2007). See the sketch after this list.
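As a minimal sketch of the last point, the two most common corrections can be compared directly with statsmodels. The uncorrected p-values below are made up for illustration; in practice they would come from the actual analyses.

```python
# A minimal sketch comparing two corrections with statsmodels; the
# uncorrected p-values here are made up for illustration.
import numpy as np
from statsmodels.stats.multitest import multipletests

p_values = np.array([0.001, 0.012, 0.034, 0.041, 0.22, 0.48])

# Bonferroni: strict family-wise error control, lowest sensitivity
bonf_reject, bonf_p, _, _ = multipletests(p_values, alpha=0.05, method="bonferroni")

# Benjamini-Hochberg: controls the false discovery rate, more sensitive
fdr_reject, fdr_p, _, _ = multipletests(p_values, alpha=0.05, method="fdr_bh")

print("Bonferroni rejections:", int(bonf_reject.sum()))
print("FDR (BH) rejections:  ", int(fdr_reject.sum()))
```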
The "Method Hammer" Anti-Pattern
"If all you have is a hammer, everything looks like a nail."
This anti-pattern occurs when a researcher applies the method they are most comfortable with, regardless of whether it is appropriate. Examples:
- Using a t-test when the design has multiple crossed factors (requires ANOVA or mixed model)
- Applying parametric tests to ordinal Likert data without justification
- Using mass-univariate analysis when the research question is about distributed patterns (requires MVPA)
- Defaulting to frequentist tests when the question is about evidence for the null (requires Bayesian analysis or equivalence testing)
Rule: Always articulate why THIS method and not alternatives. Document the alternatives considered and why they were rejected.
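One common instance of this anti-pattern is repeated-measures data analyzed with a trial-level t-test, which ignores the dependence between trials from the same participant. Below is a minimal sketch of the mixed-model alternative; the variable names (subject, condition, rt) and all simulated numbers are illustrative, not prescribed.

```python
# A minimal sketch, assuming a long-format table with one row per trial.
# Variable names and all simulated numbers are illustrative.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_subj, n_trials = 20, 30
df = pd.DataFrame({
    "subject": np.repeat(np.arange(n_subj), n_trials),
    "condition": np.tile(["congruent", "incongruent"], n_subj * n_trials // 2),
})
# Simulate a by-subject baseline shift plus a 30 ms congruity effect
subject_offset = rng.normal(0, 50, n_subj)[df["subject"]]
df["rt"] = (500
            + 30 * (df["condition"] == "incongruent")
            + subject_offset
            + rng.normal(0, 40, len(df)))

# The random intercept per subject models the dependence between trials
# from the same participant, which a trial-level t-test would ignore.
model = smf.mixedlm("rt ~ condition", df, groups=df["subject"]).fit()
print(model.summary())
```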
Expected Outcomes Declaration
Before running any analysis, declare what each possible outcome means:
The Three-Outcome Framework
- If H1 is supported: What specific pattern of results would you expect? (e.g., "a significant interaction between condition and group, with a larger N400 for incongruent trials in the control group but not the patient group")
- If H0 is supported: What would the data look like? (e.g., "no significant effects, Bayes factor favoring H0 > 3")
- If results are ambiguous: What would be inconclusive? (e.g., "a trend-level effect, p = .05-.10, with a small effect size below the smallest effect of interest")
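One way to make this declaration concrete is to record it in machine-readable form before touching the data. The sketch below is hypothetical; the field names and thresholds are illustrative, not a required schema.

```python
# A hypothetical sketch of recording expected outcomes before analysis;
# field names and thresholds are illustrative, not a required schema.
import json

expected_outcomes = {
    "hypothesis": "Incongruent trials elicit a larger N400 than congruent trials",
    "analysis_type": "confirmatory",
    "supports_h1": "condition effect in the predicted direction, p < .05, d >= 0.3",
    "supports_h0": "Bayes factor favoring H0 greater than 3",
    "ambiguous": "p between .05 and .10, or effect below the smallest effect of interest",
    "smallest_effect_of_interest": 0.2,  # Cohen's d, chosen a priori (illustrative)
}

# Saving the declaration before running the analysis creates a dated record.
with open("expected_outcomes.json", "w") as f:
    json.dump(expected_outcomes, f, indent=2)
```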
Why This Matters
Declaring expected outcomes in advance prevents:
- HARKing (Hypothesizing After Results are Known): presenting post-hoc hypotheses as if they were a priori predictions (Kerr, 1998). A survey of researchers found that 43% self-reported HARKing at least once (Fiedler & Schwarz, 2016).
- Post-hoc rationalization: finding a plausible story for any result after the fact.
- Outcome switching: changing the primary outcome measure after seeing which one yields significant results.
Assumptions and Limitations Awareness
Every Method Has Assumptions
No statistical method is assumption-free. Before applying any method, identify its key assumptions and check them. The full reference table is in references/common-assumptions.md.
Common Assumption Categories
- Independence: Observations are not systematically related to each other. Violated by: repeated measures, clustered data, spatial/temporal autocorrelation in neural data.
- Normality: The sampling distribution of the test statistic is normal; this is often confused with normality of the raw data. It matters most for small samples, since in large samples the central limit theorem makes many tests robust to non-normal data.
- Homogeneity of variance: Variance is equal across groups or conditions. Violated when group sizes are unequal and variances differ. Use Welch's correction or robust methods.
- Stationarity: Statistical properties do not change over time. Relevant for EEG, fMRI time series. Violated by habituation, fatigue, scanner drift.
- Measurement validity: The measure actually captures the construct of interest. No statistical test can fix a bad measure. Construct validity must be argued on theoretical grounds.
- Correct model specification: The statistical model matches the data-generating process. Omitted variables, wrong functional form, and incorrect random effects structure all threaten validity (Barr et al., 2013).
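A minimal sketch of checks for two of these categories (normality and homogeneity of variance), with a robust fallback, using scipy; the group data are simulated and all parameters are illustrative.

```python
# A minimal sketch of two assumption checks plus a robust fallback, assuming
# two independent groups of continuous scores; all values are simulated.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
group_a = rng.normal(100, 15, 40)
group_b = rng.normal(108, 25, 25)   # unequal n and unequal variance

# Normality of each group (most informative for small samples)
print("Shapiro-Wilk A: p =", round(stats.shapiro(group_a).pvalue, 3))
print("Shapiro-Wilk B: p =", round(stats.shapiro(group_b).pvalue, 3))

# Homogeneity of variance across groups
print("Levene: p =", round(stats.levene(group_a, group_b).pvalue, 3))

# Welch's t-test does not assume equal variances
t, p = stats.ttest_ind(group_a, group_b, equal_var=False)
print(f"Welch t-test: t = {t:.2f}, p = {p:.3f}")
```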
Limitations Are Not Optional
Every study has limitations. Common categories:
- Internal validity threats: confounds, demand characteristics, order effects
- External validity threats: limited sample demographics, artificial lab conditions
- Statistical conclusion validity: low power, violated assumptions, multiple comparisons
- Construct validity threats: impure measures, task impurity in neuropsychology
Rule: List limitations upfront, not as an afterthought. This is not a weakness; it is scientific rigor.
Human-in-the-Loop Principles
Why AI Agents Must Pause
Research involves judgment calls where reasonable experts disagree. These "researcher degrees of freedom" can inflate false positive rates from a nominal 5% to over 60% when left unchecked (Simmons et al., 2011). AI agents must not make these decisions silently.
Mandatory Pause Points
ALWAYS present the analysis plan and WAIT for user confirmation before proceeding at these decision points:
- Participant or trial exclusion: "I propose excluding 3 participants based on [criterion]. Here is the exclusion rationale and the impact on sample size."
- Outlier treatment: "These data points are [N] SDs from the mean. Options: (a) winsorize, (b) trim, (c) transform, (d) use robust methods, (e) retain. Each has different implications." See the comparison sketched after this list.
- Multiple comparisons correction: "With [N] comparisons, I recommend [method]. Alternatives are [list]. The choice affects sensitivity and specificity as follows..."
- Model specification: "I am fitting [model]. Key choices include [random effects structure, covariates, link function]. Here is why, and here are alternatives."
- Data transformation: "The data violate [assumption]. I propose [transformation/alternative method]. This changes the interpretation as follows..."
- Unexpected results: "The results do not match the predicted pattern. Before interpreting, consider: (a) the analysis may be wrong, (b) the hypothesis may be wrong, (c) there may be a confound."
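For the outlier-treatment pause point, the options can be computed side by side so the user sees the consequences before choosing. A minimal sketch with simulated reaction times (all numbers illustrative):

```python
# A minimal sketch comparing outlier-handling options on the same data,
# so the choice can be shown to the user rather than made silently.
# The reaction times below are simulated for illustration.
import numpy as np
from scipy import stats
from scipy.stats import mstats

rng = np.random.default_rng(2)
rts = np.append(rng.normal(500, 60, 95), [1500, 1600, 1800, 2100, 2400])

print(f"Retain all:        mean = {np.mean(rts):.1f}")
print(f"Winsorize 5%:      mean = {np.mean(mstats.winsorize(rts, limits=[0.05, 0.05])):.1f}")
print(f"Trim 5%:           mean = {stats.trim_mean(rts, 0.05):.1f}")
print(f"Log-transform:     back-transformed mean = {np.exp(np.mean(np.log(rts))):.1f}")
print(f"Robust (median):   {np.median(rts):.1f}")
```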
Transparency Protocol
- Never silently drop data points, trials, or participants
- Never silently switch between one-tailed and two-tailed tests
- Never silently add or remove covariates
- Never silently change the dependent variable or time window
- Always report the full set of analyses, not just significant ones
Common Research Anti-Patterns
These are well-documented threats to research integrity. An AI agent must actively avoid them and flag when a user's request risks falling into one.
1. p-Hacking
Running multiple analyses, selectively reporting significant results, or tweaking analysis parameters until p < .05. Simulations show this can inflate false positive rates from 5% to over 60% (Simmons et al., 2011, Psychological Science, 22(11), 1359-1366).
How to avoid: Preregister analyses. Report all analyses conducted. Use correction for multiple comparisons.
2. HARKing (Hypothesizing After Results are Known)
Presenting post-hoc hypotheses as if they were a priori predictions (Kerr, 1998, Personality and Social Psychology Review, 2(3), 196-217).
How to avoid: Write down hypotheses before analysis. Clearly label any post-hoc exploration.
3. Confirmation Bias in Analysis
Selectively reporting evidence that supports preferred conclusions while downplaying contradictory evidence.
How to avoid: Report effect sizes and confidence intervals for all outcomes, not just significant ones. Use adversarial collaboration or preregistered analysis plans.
4. Garden of Forking Paths
Even without deliberate p-hacking, undisclosed analytic flexibility creates a "garden of forking paths" where many analysis pipelines could have been chosen, inflating the effective number of comparisons (Gelman & Loken, 2014, American Scientist, 102(6), 460-465).
How to avoid: Document every analytic decision and its alternatives. Consider multiverse analysis (Steegen et al., 2016).
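A toy multiverse sketch in the spirit of Steegen et al. (2016): the same comparison is run under every combination of a few analytic choices, and the spread of p-values shows how much the conclusion depends on them. The data and the choice set below are illustrative.

```python
# A toy multiverse sketch (in the spirit of Steegen et al., 2016), assuming
# trial-level RTs in two conditions; the data and choice set are illustrative.
import itertools
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
cond_a = np.append(rng.normal(500, 80, 60), [1200, 1400])
cond_b = np.append(rng.normal(520, 80, 60), [1300])

outlier_cutoffs = [2.0, 2.5, 3.0, None]          # SD-based exclusion thresholds
transforms = [("raw", lambda x: x), ("log", np.log)]

for cutoff, (tname, tfunc) in itertools.product(outlier_cutoffs, transforms):
    a, b = cond_a, cond_b
    if cutoff is not None:
        a = a[np.abs(stats.zscore(a)) < cutoff]
        b = b[np.abs(stats.zscore(b)) < cutoff]
    p = stats.ttest_ind(tfunc(a), tfunc(b), equal_var=False).pvalue
    print(f"cutoff={cutoff}, transform={tname}: p = {p:.3f}")

# The spread of p-values across pipelines shows how strongly the conclusion
# depends on analytic choices that could reasonably have gone either way.
```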
5. Cargo Cult Statistics
Applying statistical procedures as rituals without understanding the underlying assumptions or logic. The "null ritual" — mechanically testing H0 at alpha = .05 without specifying H1, considering effect sizes, or evaluating power — is the canonical example (Gigerenzer, 2004, Journal of Socio-Economics, 33, 587-606).
How to avoid: For every test, articulate: What is H0? What is H1? What is the expected effect size? What is the power? Is the test appropriate for this data structure?
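The power question in particular can be answered before data collection. Below is a minimal sketch of an a priori power analysis with statsmodels; the assumed effect size of d = 0.5 is illustrative, not a recommendation.

```python
# A minimal sketch of an a priori power analysis with statsmodels; the assumed
# effect size of d = 0.5 is illustrative, not a recommendation.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Sample size needed per group for 80% power at alpha = .05 (two-sided)
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.80,
                                   alternative="two-sided")
print(f"Required n per group: {n_per_group:.1f}")

# Conversely: power actually achieved with 20 participants per group
achieved = analysis.solve_power(effect_size=0.5, nobs1=20, alpha=0.05)
print(f"Power with n = 20 per group: {achieved:.2f}")
```

statsmodels provides analogous classes for one-sample or paired designs (TTestPower) and one-way ANOVA (FTestAnovaPower).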
6. Outcome Switching
Changing the primary outcome variable after seeing the data because the original outcome was not significant.
How to avoid: Preregister primary and secondary outcomes. Report results for the preregistered primary outcome regardless of significance.
The Planning Protocol
This is the core procedure. Execute these steps before any analysis.
Step 1: State the Research Question
Write the question in one sentence. It must be specific, testable, and falsifiable. Use the PICOS framework above.
Step 2: Classify as Confirmatory or Exploratory
If confirmatory, a preregistered hypothesis must exist. If exploratory, label all results as hypothesis-generating.
Step 3: Justify the Chosen Method
Name the method, explain why it is appropriate for this question and data, and list alternatives that were considered and why they were rejected.
Step 4: Declare Expected Outcomes
For each hypothesis, state what supporting, refuting, and ambiguous results would look like, with expected effect sizes where possible.
Step 5: List Assumptions and Limitations
Enumerate the method's statistical assumptions and how they will be checked. List known limitations of the design and analysis.
Step 6: Present the Plan to the User
Show the complete plan in a structured format (see references/planning-template.md). Include decision points where user input is required.
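As a hypothetical illustration only (the canonical format is defined in references/planning-template.md, not here), the plan might be captured in a structure like the following; every field value below is invented for demonstration.

```python
# A hypothetical illustration of one way to structure the plan; the canonical
# format is defined in references/planning-template.md, and every value below
# is invented for demonstration.
analysis_plan = {
    "research_question": "Does semantic congruity modulate N400 amplitude differently in patients and controls?",
    "classification": "confirmatory",                       # Step 2
    "method": "linear mixed model with by-subject random intercepts and slopes",
    "alternatives_rejected": {
        "repeated-measures ANOVA": "cannot model trial-level covariates",
    },
    "expected_outcomes": {                                   # Step 4
        "supports_h1": "group x congruity interaction, p < .05",
        "supports_h0": "Bayes factor favoring H0 greater than 3",
        "ambiguous": "interaction smaller than the smallest effect of interest",
    },
    "assumptions_to_check": ["residual normality", "homoscedasticity", "model convergence"],
    "limitations": ["modest patient sample", "single-site recruitment"],
    "user_decision_points": ["trial exclusion criteria", "multiple-comparison correction"],
}
```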
Step 7: WAIT for User Confirmation
Do not proceed until the user approves the plan or requests modifications.
Step 8: Execute and Compare
After analysis, explicitly compare results to the expected outcomes declared in Step 4. Discuss discrepancies honestly.
Step 9: Report Limitations
Reiterate limitations, including any that became apparent during analysis (e.g., assumption violations, unexpected data patterns).
Key References
- Barr, D. J., Levy, R., Scheepers, C., & Tily, H. J. (2013). Random effects structure for confirmatory hypothesis testing: Keep it maximal. Journal of Memory and Language, 68(3), 255-278.
- Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B, 57(1), 289-300.
- Button, K. S., Ioannidis, J. P., Mokrysz, C., Nosek, B. A., Flint, J., Robinson, E. S., & Munafò, M. R. (2013). Power failure: Why small sample size undermines the reliability of neuroscience. Nature Reviews Neuroscience, 14(5), 365-376.
- Cohen, J. (1992). A power primer. Psychological Bulletin, 112(1), 155-159.
- Fiedler, K., & Schwarz, N. (2016). Questionable research practices revisited. Social Psychological and Personality Science, 7(1), 45-52.
- Gelman, A., & Loken, E. (2014). The statistical crisis in science. American Scientist, 102(6), 460-465.
- Gigerenzer, G. (2004). Mindless statistics. Journal of Socio-Economics, 33, 587-606.
- Kerr, N. L. (1998). HARKing: Hypothesizing after the results are known. Personality and Social Psychology Review, 2(3), 196-217.
- Maris, E., & Oostenveld, R. (2007). Nonparametric statistical testing of EEG- and MEG-data. Journal of Neuroscience Methods, 164(1), 177-190.
- Munafò, M. R., Nosek, B. A., Bishop, D. V. M., et al. (2017). A manifesto for reproducible science. Nature Human Behaviour, 1, 0021.
- Nosek, B. A., Ebersole, C. R., DeHaven, A. C., & Mellor, D. T. (2018). The preregistration revolution. Proceedings of the National Academy of Sciences, 115(11), 2600-2606.
- Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251), aac4716.
- Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22(11), 1359-1366.
- Steegen, S., Tuerlinckx, F., Gelman, A., & Vanpaemel, W. (2016). Increasing transparency through a multiverse analysis. Perspectives on Psychological Science, 11(5), 702-712.
- Wagenmakers, E.-J., Wetzels, R., Borsboom, D., van der Maas, H. L. J., & Kievit, R. A. (2012). An agenda for purely confirmatory research. Perspectives on Psychological Science, 7(6), 632-638.