Parameter Recovery Checker
Purpose
This skill encodes expert methodological knowledge for conducting parameter recovery studies -- a critical validation step before interpreting fitted model parameters. Parameter recovery determines whether a model's parameters are identifiable given the experimental design and sample size. A general-purpose programmer unfamiliar with computational modeling would not know that fitting a model is insufficient validation, or how to diagnose parameter tradeoffs and non-identifiability.
When to Use This Skill
- Before trusting fitted parameter values from any computational cognitive model
- When developing a new model and assessing whether parameters can be distinguished from data
- When planning an experiment and determining the minimum trial count for reliable parameter estimation
- When a reviewer asks for evidence of model identifiability
- When comparing models and needing to ensure each model can be distinguished (model recovery)
- When fitted parameters produce suspiciously extreme values or hit bounds
Research Planning Protocol
Before executing the domain-specific steps below, you MUST:
- State the research question -- What specific question is this analysis/paradigm addressing?
- Justify the method choice -- Why is this approach appropriate? What alternatives were considered?
- Declare expected outcomes -- What results would support vs. refute the hypothesis?
- Note assumptions and limitations -- What does this method assume? Where could it mislead?
- Present the plan to the user and WAIT for confirmation before proceeding.
For detailed methodology guidance, see the research-literacy skill.
⚠️ Verification Notice
This skill was generated by AI from academic literature. All parameters, thresholds, and citations require independent verification before use in research. If you find errors, please open an issue.
Why Parameter Recovery Matters
Fitting a model to data and obtaining parameter estimates does NOT guarantee those estimates are meaningful (Wilson & Collins, 2019; Navarro, 2019). Common failure modes:
- Non-identifiability: Multiple parameter combinations produce identical model predictions (e.g., drift rate and boundary in DDM trade off; Ratcliff & Tuerlinckx, 2002)
- Insufficient data: Too few trials for the fitting procedure to recover true values
- Local minima: Optimization converges to wrong parameter values
- Model misspecification: The fitting procedure recovers parameters that do not reflect the assumed cognitive process
Parameter recovery is the standard diagnostic for these problems (Heathcote et al., 2015; Wilson & Collins, 2019).
Step-by-Step Recovery Procedure
Step 1: Define the Parameter Space
Choose ground-truth parameter values that span the plausible range for each parameter.
How many parameter sets to simulate?
|
+-- Minimum: 100 parameter sets (Wilson & Collins, 2019)
|
+-- Recommended: 500-1000 parameter sets for smooth recovery landscapes
|
+-- For publication: 1000+ parameter sets (Heathcote et al., 2015)
Sampling strategy:
| Strategy | When to Use | Source |
|---|---|---|
| Uniform grid | Few parameters (1-2), want complete coverage | Standard practice |
| Latin hypercube | 3+ parameters, want space-filling without excessive samples | McKay et al., 1979 |
| Random uniform | Simple, adequate for many parameters | Wilson & Collins, 2019 |
| Prior-based sampling | Have informative priors on parameter ranges | Palestro et al., 2018 |
Range selection: Use ranges from published parameter estimates in the domain. For example:
- DDM drift rate v: 0.5 -- 4.0 (Ratcliff & McKoon, 2008)
- DDM boundary a: 0.5 -- 2.5 (Ratcliff & McKoon, 2008)
- DDM non-decision time Ter: 0.1 -- 0.5 s (Ratcliff & McKoon, 2008)
- ACT-R activation noise s: 0.1 -- 0.8 (Anderson, 2007)
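The sampling strategies above can be sketched in a few lines. A minimal Latin hypercube sampler using only NumPy; the ranges are the illustrative DDM values listed above, not validated defaults:

```python
import numpy as np

# Illustrative ranges from the list above -- verify against the cited sources
RANGES = {"v": (0.5, 4.0), "a": (0.5, 2.5), "ter": (0.1, 0.5)}

def latin_hypercube(n_sets, ranges, rng):
    """Draw one sample per equal-probability stratum for each parameter,
    shuffling strata independently across parameters (space-filling)."""
    samples = {}
    for name, (lo, hi) in ranges.items():
        # permutation picks each stratum exactly once; random() jitters within it
        u = (rng.permutation(n_sets) + rng.random(n_sets)) / n_sets  # in [0, 1)
        samples[name] = lo + u * (hi - lo)
    return samples

truth = latin_hypercube(500, RANGES, np.random.default_rng(0))
```

Each parameter's 500 values cover all 500 equal-width strata of its range, which is what makes the design space-filling without a full grid.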
Step 2: Simulate Data
For each ground-truth parameter set:
- Match the experimental design exactly -- Same number of trials, conditions, and structure as the real experiment
- Use the same model -- The generative model must be identical to the model you will fit
- Include realistic noise -- Use the model's noise mechanism (do not add external noise)
- Store the ground-truth parameters for later comparison
Critical: The number of simulated trials per participant must match the actual experiment. Recovery with 10,000 trials tells you nothing about recovery with 100 trials (Wilson & Collins, 2019).
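As a sketch, data from a basic diffusion model can be generated with a simple Euler scheme. Note that the within-trial noise is the model's own diffusion term, as the list above requires, and `n_trials` is set to match the real experiment:

```python
import numpy as np

def simulate_ddm(v, a, ter, n_trials, rng, dt=0.001, s=1.0):
    """Euler simulation of an unbiased diffusion process (start point a/2).
    Returns response times and choices (1 = upper boundary)."""
    rts = np.empty(n_trials)
    choices = np.empty(n_trials, dtype=int)
    for i in range(n_trials):
        x, t = a / 2.0, 0.0
        while 0.0 < x < a:
            # drift plus the model's own diffusion noise -- no external noise added
            x += v * dt + s * np.sqrt(dt) * rng.standard_normal()
            t += dt
        rts[i] = ter + t
        choices[i] = int(x >= a)
    return rts, choices

rng = np.random.default_rng(1)
rts, choices = simulate_ddm(v=2.0, a=1.5, ter=0.3, n_trials=100, rng=rng)
```

Store the generating (v, a, ter) triple alongside `rts` and `choices` for the comparison in Step 4.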
Step 3: Fit the Model to Simulated Data
Apply the exact same fitting procedure you use for real data:
- Same optimization algorithm (e.g., MLE, Bayesian, chi-square minimization)
- Same parameter bounds and constraints
- Same starting values or initialization strategy
- Same convergence criteria
Multiple starting points: Run the optimizer from at least 5-10 random starting points per simulated dataset to avoid local minima (Heathcote et al., 2015).
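A minimal multi-start wrapper, sketched with SciPy's `minimize`; the bimodal test objective below is a stand-in for a real model's negative log-likelihood:

```python
import numpy as np
from scipy.optimize import minimize

def fit_multistart(neg_loglik, bounds, n_starts=10, seed=0):
    """Run a bounded optimizer from several random starting points, keep the best fit."""
    rng = np.random.default_rng(seed)
    lo = np.array([b[0] for b in bounds])
    hi = np.array([b[1] for b in bounds])
    best = None
    for _ in range(n_starts):
        x0 = rng.uniform(lo, hi)  # fresh random start inside the bounds
        res = minimize(neg_loglik, x0, method="L-BFGS-B", bounds=bounds)
        if best is None or res.fun < best.fun:
            best = res
    return best

# Stand-in objective with a local minimum near +1 and the global minimum near -1
bimodal = lambda x: (x[0] ** 2 - 1) ** 2 + 0.3 * x[0]
best = fit_multistart(bimodal, bounds=[(-2.0, 2.0)])
```

A single start launched in the right-hand basin would report the local minimum near +1; the multi-start loop recovers the global one, which is the failure mode the recommendation above guards against.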
Step 4: Evaluate Recovery Quality
Compare recovered parameters to true (ground-truth) parameters using multiple metrics.
Primary Metrics
| Metric | Formula | Good | Acceptable | Concerning | Source |
|---|---|---|---|---|---|
| Pearson correlation (r) | cor(true, recovered) | r > 0.9 | r > 0.8 | r < 0.7 | Heathcote et al., 2015; rough benchmarks |
| Bias | mean(recovered - true) | Near 0 | < 10% of range | > 20% of range | Wilson & Collins, 2019 |
| RMSE | sqrt(mean((recovered - true)^2)) | Small relative to range | -- | Large relative to range | Standard |
| Coverage | % of 95% CIs containing true value | ~95% | 85-100% | < 80% | Bayesian recovery |
Visualization (essential)
- Scatter plot: Recovered vs. true for each parameter (identity line = perfect recovery)
- Bland-Altman plot: Difference vs. mean (detect range-dependent bias)
- Parameter correlation matrix: Off-diagonal correlations reveal tradeoffs
See references/recovery-diagnostics.md for visualization templates.
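The primary metrics in the table above reduce to a few lines (a minimal sketch):

```python
import numpy as np

def recovery_metrics(true_vals, recovered):
    """Correlation, bias, and RMSE between ground-truth and recovered values."""
    t = np.asarray(true_vals, dtype=float)
    r = np.asarray(recovered, dtype=float)
    return {
        "r": np.corrcoef(t, r)[0, 1],            # linear agreement
        "bias": np.mean(r - t),                  # systematic over/underestimation
        "rmse": np.sqrt(np.mean((r - t) ** 2)),  # overall error magnitude
    }

true_v = np.linspace(0.5, 4.0, 100)
recovered_v = true_v + 0.2  # perfectly correlated but biased upward
metrics = recovery_metrics(true_v, recovered_v)
```

Here `r` is 1.0 despite a constant +0.2 bias, which is exactly why bias and RMSE must be reported alongside correlation.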
Step 5: Check Parameter Tradeoffs
Correlation between recovered parameters:
Are any pairs of recovered parameters correlated at |r| > 0.5?
|
+-- YES --> These parameters trade off. Consider:
| - Fixing one to a theoretically motivated value
| - Reparameterizing the model
| - Collecting more data to improve identifiability
| - Reporting the tradeoff and interpreting cautiously
|
+-- NO --> Parameters are identifiable given this design
Common parameter tradeoffs in cognitive models:
| Model | Correlated Parameters | Nature of Tradeoff | Source |
|---|---|---|---|
| DDM | Drift rate (v) and boundary (a) | Speed-accuracy tradeoff | Ratcliff & Tuerlinckx, 2002 |
| DDM | Non-decision time (Ter) and boundary (a) | Boundary absorbs timing variance | Ratcliff & Tuerlinckx, 2002 |
| ACT-R | Noise (s) and threshold (tau) | Both affect retrieval probability | Anderson, 2007 |
| RL models | Learning rate (alpha) and inverse temperature (beta) | Both scale how strongly learned values drive choice | Daw, 2011 |
| Signal detection | d-prime and criterion (c) | Criterion shift mimics sensitivity change | Macmillan & Creelman, 2005 |
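The decision-tree check above can be automated. A sketch that flags recovered-parameter pairs exceeding the |r| > 0.5 threshold; the parameter names and the built-in v-a correlation are illustrative:

```python
import numpy as np

def flag_tradeoffs(recovered, names, threshold=0.5):
    """recovered: (n_datasets, n_params) array of fitted values.
    Returns the correlation matrix and the parameter pairs that trade off."""
    corr = np.corrcoef(np.asarray(recovered), rowvar=False)
    flagged = [
        (names[i], names[j], corr[i, j])
        for i in range(len(names))
        for j in range(i + 1, len(names))
        if abs(corr[i, j]) > threshold
    ]
    return corr, flagged

rng = np.random.default_rng(2)
v = rng.uniform(0.5, 4.0, 300)
a = 0.4 * v + 0.1 * rng.standard_normal(300)  # artificial v-a tradeoff
ter = rng.uniform(0.1, 0.5, 300)              # independent of both
corr, flagged = flag_tradeoffs(np.column_stack([v, a, ter]), ["v", "a", "ter"])
```

Only the (v, a) pair is flagged here; in a real study the input would be the matrix of recovered parameters from Step 3.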
Model Recovery (Confusion Matrix)
Model recovery extends parameter recovery to test whether the correct model can be identified from data (Wagenmakers et al., 2004).
Procedure
- For each candidate model M_k (k = 1, ..., K):
  a. Simulate data from M_k with representative parameters
  b. Fit ALL candidate models to the simulated data
  c. Select the best-fitting model using your comparison metric (AIC, BIC, Bayes factor)
- Construct a K x K confusion matrix: rows = generating model, columns = selected model
- Diagonal entries should dominate (correct model selected)
Quality Criteria
| Metric | Good | Concerning | Source |
|---|---|---|---|
| Diagonal proportion | > 90% correct | < 70% correct | Wagenmakers et al., 2004 |
| Off-diagonal patterns | Symmetric confusion | Asymmetric (one model always "wins") | Wilson & Collins, 2019 |
Warning: If model A is selected when data are generated from model B more than 20% of the time, those models are not distinguishable with your experimental design (Wilson & Collins, 2019).
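A sketch of the confusion-matrix procedure. The two toy "models" below (Gaussians with different means, scored by sum of squared errors) stand in for real candidate models and a real comparison metric such as BIC:

```python
import numpy as np

def model_confusion(simulators, scorers, n_datasets=50, seed=0):
    """rows = generating model, cols = selected model (lowest score wins)."""
    rng = np.random.default_rng(seed)
    K = len(simulators)
    cm = np.zeros((K, K))
    for gen in range(K):
        for _ in range(n_datasets):
            data = simulators[gen](rng)
            scores = [score(data) for score in scorers]
            cm[gen, int(np.argmin(scores))] += 1
    return cm / n_datasets  # row-normalized selection proportions

simulators = [lambda rng: rng.normal(0.0, 1.0, 30),
              lambda rng: rng.normal(3.0, 1.0, 30)]
scorers = [lambda d: np.sum((d - 0.0) ** 2),
           lambda d: np.sum((d - 3.0) ** 2)]
cm = model_confusion(simulators, scorers)
```

Diagonal entries near 1.0 indicate the models are distinguishable under this design; compare each off-diagonal cell against the 20% threshold in the warning above.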
Sample Size Effects
How Trial Count Affects Recovery
Recovery quality improves with more trials per participant. Test recovery at multiple trial counts:
| Trial Count | Expected Recovery | Recommendation |
|---|---|---|
| < 50 trials | Often poor (r < 0.7) | Increase trials or simplify model |
| 50-100 trials | Marginal for simple models | May suffice for 2-3 parameter models |
| 100-200 trials | Adequate for most models | Standard for DDM (Ratcliff & McKoon, 2008) |
| 200-500 trials | Good for complex models | Recommended for models with > 4 parameters |
| 500+ trials | Excellent for most models | Required for hierarchical models |
Source: Wilson & Collins (2019); Ratcliff & Tuerlinckx (2002) for DDM-specific guidance.
Recovery as a Function of N
Plot recovery metrics (r, RMSE) as a function of trial count to determine the minimum viable N for your specific model and paradigm.
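A toy illustration of the recovery-vs-trial-count curve, using a Bernoulli response-rate parameter whose MLE is simply the observed proportion; a real model would need the full simulate-and-fit pipeline at each trial count:

```python
import numpy as np

def recovery_r_vs_n(trial_counts, n_sets=300, seed=0):
    """Recovery correlation for a response-rate parameter at several trial counts."""
    rng = np.random.default_rng(seed)
    curve = {}
    for n in trial_counts:
        true_p = rng.uniform(0.2, 0.8, n_sets)     # ground-truth rates
        recovered_p = rng.binomial(n, true_p) / n  # MLE from n trials each
        curve[n] = np.corrcoef(true_p, recovered_p)[0, 1]
    return curve

curve = recovery_r_vs_n([10, 50, 500])
```

Even in this one-parameter case recovery is poor at 10 trials and near-perfect at 500; the minimum viable N is read off wherever the curve crosses your target r.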
Landscape Analysis
Parameter Sensitivity Surfaces
For 1-2 key parameters, compute and visualize the objective function surface:
- Fix all parameters except the target parameter(s)
- Evaluate the objective function (e.g., negative log-likelihood) at a grid of values
- Plot the surface (1D: line; 2D: contour or heatmap)
What to look for:
| Surface Feature | Interpretation | Action |
|---|---|---|
| Single sharp minimum | Well-identified parameter | Proceed with confidence |
| Broad flat minimum | Parameter poorly constrained | Widen prior or collect more data |
| Multiple minima | Non-convex; local minima risk | Use multiple starting points; consider reparameterization |
| Ridge (elongated valley) | Parameter tradeoff | Two parameters are correlated; consider fixing one |
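A sketch of the two-parameter surface evaluation, using a Gaussian negative log-likelihood over (mean, SD) as a stand-in for a real model's objective:

```python
import numpy as np

def nll_surface(neg_loglik, grid_x, grid_y):
    """Evaluate a two-parameter objective on a grid (other parameters held fixed)."""
    return np.array([[neg_loglik(x, y) for y in grid_y] for x in grid_x])

data = np.arange(10, dtype=float)  # toy dataset: mean 4.5, SD ~2.87

def gaussian_nll(mu, sigma):
    return len(data) * np.log(sigma) + np.sum((data - mu) ** 2) / (2 * sigma ** 2)

grid_mu = np.linspace(3.0, 6.0, 31)
grid_sd = np.linspace(2.0, 4.0, 21)
surface = nll_surface(gaussian_nll, grid_mu, grid_sd)
i, j = np.unravel_index(np.argmin(surface), surface.shape)
# a single sharp minimum near the MLE indicates a well-identified parameter pair
```

Plot `surface` as a contour or heatmap and inspect it against the table above: here the minimum sits at the grid point nearest the analytic MLE, with no ridge or secondary basin.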
Reporting Standards
Minimum Reporting Checklist
When publishing a parameter recovery study:
- Number of simulated parameter sets (minimum 100; Wilson & Collins, 2019)
- Sampling strategy for ground-truth parameters (uniform, LHS, prior-based)
- Range of ground-truth values for each parameter (with justification)
- Number of simulated trials per dataset (must match real experiment)
- Fitting procedure used (same as for real data)
- Number of starting points for optimization
- Recovery metrics for each parameter: correlation (r), bias, RMSE
- Scatter plots: recovered vs. true for each parameter
- Parameter correlation matrix (recovered parameters)
- Model recovery confusion matrix (if performing model comparison)
- Recovery as a function of trial count (if applicable)
Where to Report
- Main text: Summary of recovery quality (r values, key plots)
- Supplementary: Full correlation matrices, all scatter plots, landscape analyses
- Parameter recovery is increasingly expected in top journals (Wilson & Collins, 2019; Navarro, 2019)
Common Pitfalls
- Testing recovery with too many trials: Simulating 10,000 trials when the experiment has 100. Recovery will look excellent but is irrelevant to your actual data (Wilson & Collins, 2019).
- Using different fitting procedures: The recovery study must use the identical optimization pipeline as the real-data analysis. Different starting values, bounds, or algorithms invalidate the test.
- Ignoring parameter correlations: High marginal recovery (good r for each parameter) can coexist with strong parameter tradeoffs that distort interpretation. Always check the cross-parameter correlation matrix.
- Reporting only correlation: Correlation measures rank-order recovery but ignores systematic bias. A parameter can have r = 0.95 but be consistently overestimated by 30%. Report bias and RMSE alongside r.
- Sampling only near defaults: If ground-truth values cluster around typical defaults, recovery may look good only in that region. Sample across the full plausible range.
- Neglecting model recovery: Good parameter recovery does not guarantee good model recovery. Two models can have recoverable parameters individually but be indistinguishable when competing (Wagenmakers et al., 2004).
- Confusing identifiability with validity: A model can have perfectly recoverable parameters and still be a poor model of cognition. Recovery is necessary but not sufficient (Navarro, 2019).
References
- Anderson, J. R. (2007). How Can the Human Mind Occur in the Physical Universe? Oxford University Press.
- Daw, N. D. (2011). Trial-by-trial data analysis using computational models. In M. R. Delgado, E. A. Phelps, & T. W. Robbins (Eds.), Decision Making, Affect, and Learning. Oxford University Press.
- Heathcote, A., Brown, S. D., & Wagenmakers, E.-J. (2015). An introduction to good practices in cognitive modeling. In B. U. Forstmann & E.-J. Wagenmakers (Eds.), An Introduction to Model-Based Cognitive Neuroscience. Springer.
- Macmillan, N. A., & Creelman, C. D. (2005). Detection Theory: A User's Guide (2nd ed.). Lawrence Erlbaum Associates.
- McKay, M. D., Beckman, R. J., & Conover, W. J. (1979). A comparison of three methods for selecting values of input variables. Technometrics, 21(2), 239-245.
- Navarro, D. J. (2019). Between the devil and the deep blue sea: Tensions between scientific judgement and statistical model selection. Computational Brain & Behavior, 2(1), 28-34.
- Palestro, J. J., Sederberg, P. B., Osth, A. F., Van Zandt, T., & Turner, B. M. (2018). Likelihood-free methods for cognitive science. Springer.
- Ratcliff, R., & McKoon, G. (2008). The diffusion decision model: Theory and data for two-choice decision tasks. Neural Computation, 20(4), 873-922.
- Ratcliff, R., & Tuerlinckx, F. (2002). Estimating parameters of the diffusion model. Psychonomic Bulletin & Review, 9(3), 438-481.
- Wagenmakers, E.-J., Ratcliff, R., Gomez, P., & Iverson, G. J. (2004). Assessing model mimicry using the parametric bootstrap. Journal of Mathematical Psychology, 48(1), 28-50.
- Wilson, R. C., & Collins, A. G. (2019). Ten simple rules for the computational modeling of behavioral data. eLife, 8, e49547.
See references/ for diagnostic visualization templates and worked examples.