Reading Time Analysis
Purpose
This skill encodes expert methodological knowledge for analyzing eye-tracking data from reading experiments. A competent programmer without psycholinguistics training would likely compute a single "reading time" per word, missing the critical insight that different eye-tracking measures tap different stages of language processing. Choosing the wrong measure for your research question -- or failing to account for spillover effects, skipping patterns, and the distinction between first-pass and second-pass reading -- leads to misattribution of cognitive processes.
When to Use
Use this skill when:
- Analyzing eye-movement data from reading experiments (sentence or passage reading)
- Selecting which eye-tracking measures to report for a given linguistic manipulation
- Defining regions of interest and handling spillover effects
- Setting up statistical models for eye-tracking reading data
- Cleaning and filtering fixation data for reading analyses
Do not use this skill when:
- Analyzing self-paced reading data (see self-paced-reading-designer for that paradigm)
- Analyzing eye movements in visual search or scene viewing (different fixation patterns)
- Working with eye-tracking data from non-reading tasks (e.g., visual world paradigm)
Research Planning Protocol
Before executing the domain-specific steps below, you MUST:
- State the research question -- What specific question is this analysis/paradigm addressing?
- Justify the method choice -- Why is this approach appropriate? What alternatives were considered?
- Declare expected outcomes -- What results would support vs. refute the hypothesis?
- Note assumptions and limitations -- What does this method assume? Where could it mislead?
- Present the plan to the user and WAIT for confirmation before proceeding.
For detailed methodology guidance, see the research-literacy skill.
⚠️ Verification Notice
This skill was generated by AI from academic literature. All parameters, thresholds, and citations require independent verification before use in research. If you find errors, please open an issue.
Eye-Tracking Reading Measures Hierarchy
Measure Definitions and Cognitive Interpretations
The following measures are ordered from earliest to latest processing stages. This hierarchy reflects the temporal unfolding of language comprehension during reading (Rayner, 1998, 2009; Clifton et al., 2007).
First-Pass Measures (Before Leaving the Region)
| Measure | Definition | Cognitive Process | When to Use |
|---|---|---|---|
| First Fixation Duration (FFD) | Duration of the first fixation on a word during first pass | Early lexical access; initial contact with the word (Rayner, 1998) | When testing early word recognition effects (frequency, predictability) |
| Single Fixation Duration (SFD) | Duration of the only fixation on a word, when exactly one first-pass fixation occurs | Cleaner measure of early lexical processing than FFD (Rayner, 2009) | When most words receive one fixation; avoids refixation confounds |
| Gaze Duration (GD) | Sum of all first-pass fixation durations on a word (before eyes leave the word in either direction) | Lexical processing / word identification (Rayner, 1998, 2009) | Default first-pass measure for most word-level analyses |
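The three first-pass definitions above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: fixations are assumed to be already assigned to word indices and supplied as (word_index, duration_ms) tuples in temporal order, and the function names are hypothetical. The sketch treats a word as skipped on first pass if any word to its right was fixated before it; check this convention against your own lab's definition.

```python
from typing import Dict, List, Optional, Tuple

Fixation = Tuple[int, int]  # (word_index, duration_ms), in temporal order


def first_pass_fixations(fixations: List[Fixation], target: int) -> List[int]:
    """Return the durations of the first-pass fixations on `target`.

    An empty list means the word was skipped on first pass (assumption:
    a fixation to the right of the target before any fixation on it).
    """
    durations: List[int] = []
    for word, dur in fixations:
        if word == target:
            durations.append(dur)
        elif durations:
            break  # eyes left the target in either direction: first pass ends
        elif word > target:
            return []  # target was passed before being fixated
    return durations


def first_pass_measures(
    fixations: List[Fixation], target: int
) -> Dict[str, Optional[int]]:
    """Compute FFD, SFD and GD; skipped words yield None, never zero."""
    fp = first_pass_fixations(fixations, target)
    if not fp:
        return {"FFD": None, "SFD": None, "GD": None}
    return {
        "FFD": fp[0],
        "SFD": fp[0] if len(fp) == 1 else None,  # defined only for single fixations
        "GD": sum(fp),
    }
```

For the sequence `[(0, 210), (1, 230), (1, 180), (2, 250), (1, 200), (3, 220)]`, word 1 receives two first-pass fixations, so FFD is 230, GD is 410, and SFD is undefined; the later refixation of 200 ms belongs to second pass and is not counted.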
Late Measures (After Leaving the Region)
| Measure | Definition | Cognitive Process | When to Use |
|---|---|---|---|
| Go-Past Time (GPT) / Regression Path Duration | Sum of all fixation durations from first entering the word until the eyes first move past its right boundary (includes fixations on earlier words after regressions out, plus any refixations of the word itself) | Integration difficulty; signals reanalysis of prior material (Clifton et al., 2007) | When testing syntactic garden-path effects, semantic anomalies, discourse integration |
| Total Reading Time (TRT) | Sum of all fixation durations on a word (first pass + regressions back) | Overall processing difficulty (Rayner, 1998) | When interested in total processing cost regardless of time course |
| Regression Probability (Reg-out) | Binary: did the reader make a regression from this region? | Reanalysis / comprehension difficulty (Clifton et al., 2007) | When interested in whether (not how long) reanalysis occurred |
| Regression-in Probability | Binary: did the reader regress back to this region from downstream? | Downstream difficulty triggers revisitation (Rayner & Pollatsek, 1989) | When testing whether a region is revisited after later processing fails |
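A hedged Python sketch of the late measures in the table above. It assumes the same hypothetical input as elsewhere in this skill's examples: fixations as (word_index, duration_ms) tuples in temporal order. Two conventions in it are assumptions rather than anything the cited sources prescribe: a word that is never fixated gets a missing TRT (None), and go-past time and regression-out are left undefined when the word was skipped on first pass.

```python
from typing import Dict, List, Optional, Tuple, Union

Fixation = Tuple[int, int]  # (word_index, duration_ms), in temporal order
Measure = Optional[Union[int, bool]]


def late_measures(fixations: List[Fixation], target: int) -> Dict[str, Measure]:
    # Total reading time: every fixation on the target, in any pass.
    on_target = [dur for word, dur in fixations if word == target]
    trt = sum(on_target) if on_target else None  # assumption: never fixated -> missing

    # Regression-in: a fixation on the target directly follows a fixation
    # on a word further to the right.
    reg_in = any(
        prev_word > target and word == target
        for (prev_word, _), (word, _) in zip(fixations, fixations[1:])
    )

    # Locate the first-pass entry. A fixation to the right of the target
    # before any fixation on it means the target was skipped on first pass.
    start = None
    for i, (word, _) in enumerate(fixations):
        if word == target:
            start = i
            break
        if word > target:
            break
    if start is None:
        return {"GPT": None, "reg_out": None, "TRT": trt, "reg_in": reg_in}

    gpt = 0
    reg_out = False
    for word, dur in fixations[start:]:
        if word > target:
            break  # eyes moved past the target: the go-past interval ends
        if word < target:
            reg_out = True  # the eyes left the target leftward first
        gpt += dur
    return {"GPT": gpt, "reg_out": reg_out, "TRT": trt, "reg_in": reg_in}
```

With `[(0, 210), (1, 230), (1, 180), (0, 150), (1, 200), (2, 250), (1, 190), (3, 220)]` and target word 1, go-past time is 230 + 180 + 150 + 200 = 760 ms (the regression to word 0 is included), while total reading time is 230 + 180 + 200 + 190 = 800 ms.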
Decision Tree: Which Measure for Which Question?
What stage of processing is your manipulation expected to affect?
|
+-- EARLY LEXICAL (word frequency, orthographic regularity, predictability)
|   |
|   +-- Use GAZE DURATION as primary measure (Rayner, 1998, 2009)
|   +-- Report FIRST FIXATION DURATION as supplementary
|   +-- Report SINGLE FIXATION DURATION if high proportion of
|       single-fixation cases (Rayner, 2009)
|
+-- LATE LEXICAL / POST-LEXICAL (semantic plausibility, thematic fit)
|   |
|   +-- Use GAZE DURATION for early effects
|   +-- Use GO-PAST TIME for integration effects (Clifton et al., 2007)
|   +-- Use TOTAL READING TIME for overall effects
|
+-- SYNTACTIC (garden-path, structural ambiguity, reanalysis)
|   |
|   +-- Use GO-PAST TIME as primary measure (Clifton et al., 2007)
|   +-- Use REGRESSION PROBABILITY as complementary binary measure
|   +-- Effects often appear in the SPILLOVER REGION (1-2 words
|       post-critical; Rayner & Pollatsek, 1989)
|
+-- DISCOURSE / PRAGMATIC (reference resolution, inference, coherence)
|   |
|   +-- Use GO-PAST TIME and TOTAL READING TIME
|   +-- Effects are typically late and may span multiple words
|   +-- Consider REGRESSION-IN probability for earlier regions
|
+-- EXPLORATORY / UNKNOWN TIMING
    |
    +-- Report ALL major measures: FFD, GD, GPT, TRT, Reg-out
    +-- Let the pattern across measures inform process interpretation
First-Pass vs. Second-Pass Distinction
| Category | Definition | Includes |
|---|---|---|
| First pass | All fixations from first entering a region until first leaving it (in either direction) | FFD, SFD, GD |
| Second pass | All fixations on a region after first leaving it | Re-reading time (TRT minus first-pass time) |
Why this matters: First-pass measures reflect initial processing; second-pass measures reflect recovery from processing difficulty encountered downstream. Conflating them obscures when processing difficulty arose.
Region of Interest (ROI) Definition
Word-Level ROIs
- The most common unit of analysis is the single word (Rayner, 1998)
- For multi-word critical regions, report analyses at both word level and region level
Multi-Word ROIs
- Sometimes necessary for syntactic manipulations where the critical structure spans multiple words
- Define ROIs a priori based on linguistic structure, not post-hoc based on where effects appear
- Report the number of characters and words in each ROI
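One way to make ROI definitions explicit and reportable is to derive them from the sentence text before any data are inspected. The Python sketch below is illustrative only: the names are hypothetical, positions are in character columns, and it assumes the widespread (but not universal) convention that the space before a word belongs to that word's region.

```python
from typing import List, NamedTuple, Optional, Tuple


class WordROI(NamedTuple):
    index: int
    text: str
    start: int  # first character column, inclusive (includes the leading space)
    end: int    # character column just past the word, exclusive


def word_rois(sentence: str) -> List[WordROI]:
    """Build one ROI per whitespace-delimited word."""
    rois: List[WordROI] = []
    cursor = 0
    for i, word in enumerate(sentence.split()):
        word_start = sentence.index(word, cursor)
        start = 0 if i == 0 else cursor  # assumption: preceding space joins the word
        end = word_start + len(word)
        rois.append(WordROI(i, word, start, end))
        cursor = end
    return rois


def roi_for_char(rois: List[WordROI], char_pos: float) -> Optional[int]:
    """Map a fixation's character position to a word index (None if off-text)."""
    for roi in rois:
        if roi.start <= char_pos < roi.end:
            return roi.index
    return None


def region_size(rois: List[WordROI], first: int, last: int) -> Tuple[int, int]:
    """Words and characters in a multi-word ROI spanning first..last inclusive."""
    return last - first + 1, rois[last].end - rois[first].start
```

For `"The old man the boats."`, the region covering "man the" (word indices 2 to 3) contains 2 words and 8 characters under this convention, which are the values the bullet above asks you to report.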
Spillover Effects
Spillover is the delayed manifestation of a processing effect on fixations one or more words downstream of the critical word (Rayner & Pollatsek, 1989).
- Typical spillover range: 1-2 words after the critical word (Rayner, 1998)
- Always analyze the spillover region (word n+1, sometimes n+2) in addition to the critical word
- Spillover is most often observed in first-pass measures (GD, FFD) on the downstream word
- Pre-target region (word n-1) should also be checked to verify no confounding baseline differences
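The set of regions to analyze can be fixed in code before looking at the data. A minimal Python sketch, assuming words are indexed from 0 within a sentence; the function and key names are hypothetical.

```python
from typing import Dict


def analysis_regions(critical: int, n_words: int, spillover: int = 2) -> Dict[str, int]:
    """Return word indices for the pre-target, critical and spillover regions.

    Regions that would fall outside the sentence are dropped rather than
    clipped, so a missing key signals that the region does not exist.
    """
    regions = {"pre_target": critical - 1, "critical": critical}
    for k in range(1, spillover + 1):
        regions[f"spillover_{k}"] = critical + k
    return {name: idx for name, idx in regions.items() if 0 <= idx < n_words}
```

For a critical word at index 4 in a 10-word sentence this yields indices 3, 4, 5 and 6 for word n-1, n, n+1 and n+2 respectively.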
Parafoveal Preview Effects
- Words are partially processed before they are directly fixated -- the parafoveal preview benefit (Rayner, 1975; Rayner, 2009)
- Parafoveal preview extends to approximately 7-8 characters to the right of fixation in English (McConkie & Rayner, 1975)
- This means effects of word n's properties can appear on the last fixation of word n-1 (parafoveal-on-foveal effects; Drieghe et al., 2008)
Data Cleaning
Fixation Duration Cutoffs
| Criterion | Value | Rationale | Citation |
|---|---|---|---|
| Short fixation merge | < 80 ms within 1 character of another fixation: merge with nearest fixation | Too brief for meaningful processing; likely corrective saccade | Rayner & Pollatsek, 1989 |
| Short fixation exclude | < 80 ms (not adjacent to another fixation): exclude | Not informative for reading | Rayner & Pollatsek, 1989 |
| Long fixation exclude | > 800 ms: exclude | Likely track loss, inattention, or blink artifact | Rayner & Pollatsek, 1989 |
| Alternative long cutoff | > 1000 ms or > 1200 ms | Used in some labs; report which cutoff and justify | |
Note: Some researchers use 50 ms as the lower bound and 1000-1200 ms as the upper bound. The critical requirement is to report your exact cutoffs and the percentage of data excluded.
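The cutoffs in the table can be applied as in the following Python sketch. It is a simplified illustration with several assumptions that are not taken from the cited sources: fixations are (x position in characters, duration_ms) tuples in temporal order, "another fixation" is read as the temporally preceding or following fixation, a short fixation is merged only into a neighbour that is not itself short, and the long cutoff is applied after merging. All names are hypothetical.

```python
from typing import Dict, List, Tuple

Fixation = Tuple[float, int]  # (x position in characters, duration_ms)


def clean_fixations(
    fixations: List[Fixation],
    short_ms: int = 80,
    long_ms: int = 800,
    merge_dist: float = 1.0,
) -> Tuple[List[Fixation], Dict[str, int]]:
    """Merge or drop short fixations, then drop long ones; count each action."""
    durations = [dur for _, dur in fixations]
    keep = [True] * len(fixations)
    counts = {"merged": 0, "excluded_short": 0, "excluded_long": 0}

    for i, (x, dur) in enumerate(fixations):
        if dur >= short_ms:
            continue
        keep[i] = False
        # Temporal neighbours that are not short and lie within merge_dist.
        near = [
            j
            for j in (i - 1, i + 1)
            if 0 <= j < len(fixations)
            and fixations[j][1] >= short_ms
            and abs(fixations[j][0] - x) <= merge_dist
        ]
        if near:
            nearest = min(near, key=lambda j: abs(fixations[j][0] - x))
            durations[nearest] += dur  # fold the short fixation into its neighbour
            counts["merged"] += 1
        else:
            counts["excluded_short"] += 1

    cleaned: List[Fixation] = []
    for i, (x, _) in enumerate(fixations):
        if not keep[i]:
            continue
        if durations[i] > long_ms:
            counts["excluded_long"] += 1
            continue
        cleaned.append((x, durations[i]))
    return cleaned, counts
```

The returned counts support the reporting requirement: the percentage of fixations excluded is `100 * (excluded_short + excluded_long) / len(fixations)`, and the thresholds are parameters so that a 50 ms or 1000 ms cutoff can be substituted and reported.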
Trial-Level Exclusions
| Criterion | Action | Rationale |
|---|---|---|
| Track loss | Exclude trial | Unreliable position data |
| Blinks on critical region | Exclude trial | Missing fixation data on the ROI |
| First-pass skip of critical word | Exclude from first-pass measures (FFD, SFD, GD); include in TRT | Word was not fixated during first pass |
| Comprehension accuracy | Exclude participants below 80% on comprehension questions | Ensures reading for comprehension (Rayner et al., 2006) |
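The participant-level accuracy criterion in the last row can be applied as in this short Python sketch. The input format, a sequence of (participant_id, correct) pairs, and the function names are assumptions made for illustration; the 80% threshold is the one given in the table.

```python
from typing import Dict, Iterable, List, Tuple


def accuracy_by_participant(responses: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Proportion of correct comprehension answers per participant."""
    totals: Dict[str, Tuple[int, int]] = {}
    for pid, correct in responses:
        n, k = totals.get(pid, (0, 0))
        totals[pid] = (n + 1, k + int(correct))
    return {pid: k / n for pid, (n, k) in totals.items()}


def participants_to_exclude(
    responses: Iterable[Tuple[str, bool]], threshold: float = 0.80
) -> List[str]:
    """Participants whose accuracy falls strictly below the threshold."""
    accuracy = accuracy_by_participant(responses)
    return sorted(pid for pid, acc in accuracy.items() if acc < threshold)
```

A participant at exactly 80% is retained here because the table says "below 80%"; report both the threshold and the number of participants removed.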
Skipping Rate Considerations
- Short, high-frequency, and predictable words are skipped 10-30% of the time (Rayner, 1998, 2009)
- Content words are skipped ~15% of the time; function words ~35% (Rayner, 2009)
- If skipping rates differ across conditions, this is informative -- report it
- For first-pass measures, words that are skipped contribute no data, not zero reading time
- Do not substitute zero for skipped words -- this conflates fast processing with no fixation
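The difference between treating skips as missing and substituting zero can be made concrete with a small Python sketch. The trial format, (condition, gaze_duration_or_None), and the durations in the usage note are hypothetical values chosen for illustration, not data from any study.

```python
from statistics import mean
from typing import Dict, List, Optional, Tuple


def summarise_gaze(trials: List[Tuple[str, Optional[int]]]) -> Dict[str, dict]:
    """Per-condition skipping rate and mean gaze duration over fixated trials."""
    by_condition: Dict[str, List[Optional[int]]] = {}
    for condition, gd in trials:
        by_condition.setdefault(condition, []).append(gd)

    summary = {}
    for condition, values in by_condition.items():
        fixated = [v for v in values if v is not None]
        summary[condition] = {
            "skip_rate": 1 - len(fixated) / len(values),
            # Skipped trials are missing data: they are left out of the mean.
            "mean_gd": mean(fixated) if fixated else None,
            # Shown only to illustrate the bias from zero substitution.
            "mean_gd_zero_substituted": mean([v or 0 for v in values]),
        }
    return summary
```

With two fixated trials of 240 ms and 260 ms and two skips in one condition, the mean gaze duration is 250 ms and the skipping rate is 0.5; substituting zero would report 125 ms, a value that describes no actual fixation.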
Statistical Modeling
Linear Mixed-Effects Models (LMMs)
Eye-tracking reading data should be analyzed with LMMs with crossed random effects for subjects and items (Baayen et al., 2008):
# R formula (lme4 syntax):
gaze_duration ~ condition + (1 + condition | subject) + (1 + condition | item)
Why crossed random effects: Reading experiments typically use a Latin square design in which every subject sees every item in exactly one condition, and items rotate across conditions between subjects. Both subjects and items are random samples, and both contribute variance (Clark, 1973; Baayen et al., 2008).
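The rotation of items across conditions can be sketched in Python as below. The modular scheme shown is one common way to build Latin square lists and is offered as an assumption, not as the procedure used in any cited study; the function names are hypothetical and conditions are coded as integers.

```python
from typing import Dict, List


def latin_square_condition(item: int, list_id: int, n_conditions: int) -> int:
    """Condition in which `item` appears on presentation list `list_id`."""
    return (item + list_id) % n_conditions


def build_lists(n_items: int, n_conditions: int) -> Dict[int, List[int]]:
    """One presentation list per condition; each entry is an item's condition."""
    return {
        list_id: [
            latin_square_condition(item, list_id, n_conditions)
            for item in range(n_items)
        ]
        for list_id in range(n_conditions)
    }
```

Each subject is assigned one list, so every subject contributes one observation per item and every item is observed in every condition across subjects. This is the crossing that the by-subject and by-item random effects in the formula above are meant to capture.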
Random Effects Structure
| Approach | Specification | When to Use | Citation |
|---|---|---|---|
| Maximal | Random intercepts + all random slopes justified by design | Default starting point | Barr et al., 2013 |
| Parsimonious | Remove random correlations first, then random slopes that explain ~0 variance | When maximal model fails to converge | Bates et al., 2015; Matuschek et al., 2017 |
Convergence protocol (Barr et al., 2013; Bates et al., 2015):
- Fit maximal model (all by-subject and by-item random slopes for within-unit factors)
- If convergence fails: remove correlations between random effects (use || in lme4)
- If still fails: remove the random slope with the smallest variance component
- Report the final model structure and note any simplifications
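To make the first two steps of the protocol concrete, the Python sketch below writes out the corresponding lme4 formula strings. It only builds strings and does not fit anything; the helper name and the default grouping-variable names are hypothetical. The third step cannot be written in advance because it depends on the estimated variance components of the fitted model.

```python
from typing import List


def lme4_formula_ladder(
    dv: str, fixed: str, subject: str = "subject", item: str = "item"
) -> List[str]:
    """Formula strings for the maximal and the zero-correlation model."""
    maximal = (
        f"{dv} ~ {fixed} + (1 + {fixed} | {subject}) + (1 + {fixed} | {item})"
    )
    zero_correlation = (
        f"{dv} ~ {fixed} + (1 + {fixed} || {subject}) + (1 + {fixed} || {item})"
    )
    return [maximal, zero_correlation]


for formula in lme4_formula_ladder("gaze_duration", "condition"):
    print(formula)
```

One caveat worth checking against the lme4 documentation before relying on the second formula: the `||` shorthand separates terms only for numeric predictors, so a factor-coded `condition` generally has to be converted to numeric contrast columns first.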
Distributional Considerations
Reading times are right-skewed and bounded below by zero. Options:
| Approach | When to Use | Citation |
|---|---|---|
| Log-transform | Simple; commonly used; adequate for many datasets | Standard in psycholinguistics |
| Inverse transform (-1000/RT) | Can outperform log for skewed RT data | Baayen & Milin, 2010 |
| Generalized LMM (Gamma) | Models the skewness directly; avoids back-transformation issues | Lo & Andrews, 2015 |
| Raw RT with residual checks | When effects are large and residuals are approximately normal | Baayen et al., 2008 |
Recommendation: Start with raw reading times in the LMM. Check residual plots. If residuals are non-normal, apply log-transformation or fit a GLMM with Gamma family and identity link (Lo & Andrews, 2015).
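The two transformations in the table are easy to state in code. The Python sketch below also includes a simple moment-based skewness statistic as one rough way to see what a transformation does to the distribution; it is not a substitute for inspecting model residuals, and the function names and sample durations are made up for illustration.

```python
import math
from typing import List


def log_rt(rt_ms: float) -> float:
    """Natural-log transform of a reading time in milliseconds."""
    return math.log(rt_ms)


def inverse_rt(rt_ms: float) -> float:
    """Negative reciprocal, -1000/RT: equals -1/RT with RT in seconds.

    The minus sign keeps the ordering of the raw scale (slower = larger).
    """
    return -1000.0 / rt_ms


def skewness(values: List[float]) -> float:
    """Moment-based sample skewness (third central moment / sd cubed)."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical right-skewed durations in ms, for illustration only.
sample = [200, 220, 240, 260, 300, 400, 900]
print(skewness(sample), skewness([log_rt(v) for v in sample]))
```

For this made-up sample the log transform reduces the skewness but does not remove it, which is the kind of situation in which the inverse transform or a Gamma GLMM from the table may be worth comparing.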
Multiple Comparisons
When analyzing multiple reading measures on the same data:
- Do not apply Bonferroni correction across measures -- each measure tests a different theoretical question (Clifton et al., 2007)
- Do correct within each measure if testing multiple contrasts
- Report effect sizes and confidence intervals alongside p-values
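Correcting within each measure but not across measures can be sketched in Python as follows. The text above does not name a correction method, so Bonferroni is used here purely as an assumption; a step-down procedure such as Holm could be swapped in. The function name is hypothetical and the p-values in the usage note are invented for illustration.

```python
from typing import Dict


def bonferroni_within_measure(
    p_values: Dict[str, Dict[str, float]]
) -> Dict[str, Dict[str, float]]:
    """Adjust p-values by the number of contrasts tested within each measure.

    Input and output have the shape {measure: {contrast: p}}. The number of
    measures never enters the adjustment.
    """
    return {
        measure: {
            contrast: min(1.0, p * len(contrasts))
            for contrast, p in contrasts.items()
        }
        for measure, contrasts in p_values.items()
    }
```

For example, with three contrasts on gaze duration and one on go-past time, the gaze-duration p-values are multiplied by 3 (capped at 1) and the go-past p-value is left unchanged.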
Typical Fixation Duration Benchmarks
These values serve as sanity checks for data quality (Rayner, 1998, 2009):
| Measure | Typical Range (Silent Reading) | Citation |
|---|---|---|
| Average fixation duration | 200-250 ms | Rayner, 1998, 2009 |
| Average saccade length | 7-9 characters (~2 degrees) | Rayner, 1998, 2009 |
| Regression rate | 10-15% of all saccades | Rayner, 1998 |
| Word skipping rate | Content words ~15%; function words ~35% | Rayner, 2009 |
| Fixation duration range | 50-500 ms (bulk of distribution) | Rayner, 1998 |
If your data deviate substantially from these benchmarks, check calibration quality, task instructions, and participant compliance.
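A benchmark check of this kind can be automated. The Python sketch below is a rough illustration with labelled assumptions: fixations are (x position in characters, duration_ms) tuples from single-line text in temporal order, any leftward saccade counts as a regression, and saccade length is averaged over forward saccades only. The ranges are copied from the table above; the names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Fixation = Tuple[float, int]  # (x position in characters, duration_ms)

# Ranges taken from the benchmark table (silent reading).
BENCHMARKS: Dict[str, Tuple[float, float]] = {
    "mean_fixation_ms": (200, 250),
    "mean_saccade_chars": (7, 9),
    "regression_rate": (0.10, 0.15),
}


def summarise_fixations(fixations: List[Fixation]) -> Dict[str, Optional[float]]:
    """Summary statistics comparable to the benchmark table."""
    durations = [dur for _, dur in fixations]
    moves = [b[0] - a[0] for a, b in zip(fixations, fixations[1:])]
    forward = [m for m in moves if m > 0]
    regressions = [m for m in moves if m < 0]
    return {
        "mean_fixation_ms": sum(durations) / len(durations) if durations else None,
        "mean_saccade_chars": sum(forward) / len(forward) if forward else None,
        "regression_rate": len(regressions) / len(moves) if moves else None,
    }


def check_benchmarks(
    observed: Dict[str, Optional[float]],
    benchmarks: Dict[str, Tuple[float, float]] = BENCHMARKS,
) -> Dict[str, Tuple[float, float, float]]:
    """Return {statistic: (observed, low, high)} for values outside the range."""
    flags = {}
    for name, (low, high) in benchmarks.items():
        value = observed.get(name)
        if value is not None and not (low <= value <= high):
            flags[name] = (value, low, high)
    return flags
```

The benchmarks describe averages over many trials and readers, so the check is meant for pooled data; a single trial will often fall outside the ranges without indicating a problem.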
Common Pitfalls
- Using only total reading time: TRT conflates early and late processing. If you only report TRT, you cannot determine when the effect arose. Always report at least one first-pass measure (GD) and one late measure (GPT or TRT) (Clifton et al., 2007).
- Ignoring spillover effects: Many effects appear 1-2 words downstream of the critical word, especially for syntactic manipulations. Always analyze the spillover region (Rayner, 1998; Rayner & Pollatsek, 1989).
- Substituting zero for skipped words: Skipped words should be treated as missing data for first-pass measures, not as zero reading time. Substituting zero artificially deflates means and inflates variance.
- Using ANOVA instead of LMMs: F1/F2 ANOVA is outdated for psycholinguistic data. LMMs with crossed random effects properly handle the variance structure (Baayen et al., 2008; Barr et al., 2013).
- Over-interpreting first fixation duration: FFD is contaminated by refixation planning. When a substantial proportion of words receive multiple first-pass fixations, GD is more informative (Rayner, 2009).
- Defining ROIs post-hoc: Selecting regions of interest after seeing the data inflates Type I error. Define ROIs a priori based on linguistic theory.
- Ignoring comprehension accuracy: If participants are not reading for comprehension (accuracy < 80%), eye-movement patterns are not interpretable as reflecting normal reading processes (Rayner et al., 2006).
- Not reporting data loss: Always report the percentage of trials excluded at each cleaning step and the percentage of words skipped in the critical region.
Minimum Reporting Checklist
Based on Clifton et al. (2007) and current standards in psycholinguistics:
- Eye-tracker model and sampling rate (1000 Hz recommended; 500 Hz acceptable; Rayner, 2009)
- Viewing distance and display specifications (font size, characters per degree)
- Calibration procedure and accuracy threshold (typically < 0.5 degrees average error)
- Fixation duration cutoffs (lower and upper bounds) with citations
- Data cleaning steps and percentage of data excluded at each step
- Skipping rates for the critical region by condition
- ROI definitions with linguistic justification
- All relevant reading measures (at minimum: GD, GPT, TRT for the critical region; GD for spillover)
- Statistical model specification (random effects structure, any transformations)
- Software for data analysis (with version)
- Comprehension question accuracy (mean, exclusion threshold)
- Number of participants and items after exclusions
References
- Baayen, R. H., Davidson, D. J., & Bates, D. M. (2008). Mixed-effects modeling with crossed random effects for subjects and items. Journal of Memory and Language, 59, 390-412.
- Baayen, R. H., & Milin, P. (2010). Analyzing reaction times. International Journal of Psychological Research, 3, 12-28.
- Barr, D. J., Levy, R., Scheepers, C., & Tily, H. J. (2013). Random effects structure for confirmatory hypothesis testing: Keep it maximal. Journal of Memory and Language, 68, 255-278.
- Bates, D., Kliegl, R., Vasishth, S., & Baayen, H. (2015). Parsimonious mixed models. arXiv:1506.04967.
- Clark, H. H. (1973). The language-as-fixed-effect fallacy: A critique of language statistics in psychological research. Journal of Verbal Learning and Verbal Behavior, 12, 335-359.
- Clifton, C., Staub, A., & Rayner, K. (2007). Eye movements in reading words and sentences. In R. P. G. van Gompel, M. H. Fischer, W. S. Murray, & R. L. Hill (Eds.), Eye movements: A window on mind and brain. Amsterdam: Elsevier.
- Drieghe, D., Rayner, K., & Pollatsek, A. (2008). Mislocated fixations can account for parafoveal-on-foveal effects in eye movements during reading. Quarterly Journal of Experimental Psychology, 61, 1239-1249.
- Lo, S., & Andrews, S. (2015). To transform or not to transform: Using generalized linear mixed models to analyse reaction time data. Frontiers in Psychology, 6, 1171.
- Matuschek, H., Kliegl, R., Vasishth, S., Baayen, H., & Bates, D. (2017). Balancing Type I error and power in linear mixed models. Journal of Memory and Language, 94, 305-315.
- McConkie, G. W., & Rayner, K. (1975). The span of the effective stimulus during a fixation in reading. Perception & Psychophysics, 17, 578-586.
- Rayner, K. (1975). The perceptual span and peripheral cues in reading. Cognitive Psychology, 7, 65-81.
- Rayner, K. (1998). Eye movements in reading and information processing: 20 years of research. Psychological Bulletin, 124, 372-422.
- Rayner, K. (2009). The 35th Sir Frederick Bartlett Lecture: Eye movements and attention in reading, scene perception, and visual search. Quarterly Journal of Experimental Psychology, 62, 1457-1506.
- Rayner, K., Chace, K. H., Slattery, T. J., & Ashby, J. (2006). Eye movements as reflections of comprehension processes in reading. Scientific Studies of Reading, 10, 241-255.
- Rayner, K., & Pollatsek, A. (1989). The psychology of reading. Englewood Cliffs, NJ: Prentice Hall.
See references/measure-computation-guide.md for step-by-step computation procedures and worked examples.