Reading Time Analysis

Purpose

This skill encodes expert methodological knowledge for analyzing eye-tracking data from reading experiments. A competent programmer without psycholinguistics training would likely compute a single "reading time" per word, missing the critical insight that different eye-tracking measures tap different stages of language processing. Choosing the wrong measure for your research question -- or failing to account for spillover effects, skipping patterns, and the distinction between first-pass and second-pass reading -- leads to misattribution of cognitive processes.

When to Use

Use this skill when:

Analyzing eye-movement data from reading experiments (sentence or passage reading)
Selecting which eye-tracking measures to report for a given linguistic manipulation
Defining regions of interest and handling spillover effects
Setting up statistical models for eye-tracking reading data
Cleaning and filtering fixation data for reading analyses

Do not use this skill when:

Analyzing self-paced reading data (see self-paced-reading-designer for that paradigm)
Analyzing eye movements in visual search or scene viewing (different fixation patterns)
Working with eye-tracking data from non-reading tasks (e.g., visual world paradigm)

Research Planning Protocol

Before executing the domain-specific steps below, you MUST:

State the research question -- What specific question is this analysis/paradigm addressing?
Justify the method choice -- Why is this approach appropriate? What alternatives were considered?
Declare expected outcomes -- What results would support vs. refute the hypothesis?
Note assumptions and limitations -- What does this method assume? Where could it mislead?
Present the plan to the user and WAIT for confirmation before proceeding.

For detailed methodology guidance, see the research-literacy skill.

⚠️ Verification Notice

This skill was generated by AI from academic literature. All parameters, thresholds, and citations require independent verification before use in research. If you find errors, please open an issue.

Eye-Tracking Reading Measures Hierarchy

Measure Definitions and Cognitive Interpretations

The following measures are ordered from earliest to latest processing stages. This hierarchy reflects the temporal unfolding of language comprehension during reading (Rayner, 1998, 2009; Clifton et al., 2007).

First-Pass Measures (Before Leaving the Region)

Measure	Definition	Cognitive Process	When to Use
First Fixation Duration (FFD)	Duration of the first fixation on a word during first pass	Early lexical access; initial contact with the word (Rayner, 1998)	When testing early word recognition effects (frequency, predictability)
Single Fixation Duration (SFD)	Duration of the only fixation on a word, when exactly one first-pass fixation occurs	Cleaner measure of early lexical processing than FFD (Rayner, 2009)	When most words receive one fixation; avoids refixation confounds
Gaze Duration (GD)	Sum of all first-pass fixation durations on a word (before eyes leave the word in either direction)	Lexical processing / word identification (Rayner, 1998, 2009)	Default first-pass measure for most word-level analyses

Late Measures (After Leaving the Region)

Measure	Definition	Cognitive Process	When to Use
Go-Past Time (GPT) / Regression Path Duration	Time from first fixation on the word until first fixation to the right of the word (includes any regressions out and back)	Integration difficulty; signals reanalysis of prior material (Clifton et al., 2007)	When testing syntactic garden-path effects, semantic anomalies, discourse integration
Total Reading Time (TRT)	Sum of all fixation durations on a word (first pass + regressions back)	Overall processing difficulty (Rayner, 1998)	When interested in total processing cost regardless of time course
Regression Probability (Reg-out)	Binary: did the reader's first pass on this region end with a regression to earlier text?	Reanalysis / comprehension difficulty (Clifton et al., 2007)	When interested in whether (not how long) reanalysis occurred
Regression-in Probability	Binary: did the reader regress back to this region from downstream?	Downstream difficulty triggers revisitation (Rayner & Pollatsek, 1989)	When testing whether a region is revisited after later processing fails
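
The definitions in the two tables above can be sketched as one function over a trial's fixation sequence. This is a minimal illustration, not a reference implementation: the input format (a list of `(word_index, duration_ms)` pairs in temporal order), the names `Measures` and `reading_measures`, and the convention that a word counts as skipped in first pass if any word to its right was fixated first are all assumptions. Go-past time is approximated here as a sum of fixation durations, ignoring saccade durations.

```python
from typing import NamedTuple, Optional


class Measures(NamedTuple):
    ffd: Optional[int]      # first fixation duration
    sfd: Optional[int]      # single fixation duration (None if refixated)
    gd: Optional[int]       # gaze duration
    gpt: Optional[int]      # go-past time (regression path duration)
    trt: int                # total reading time
    reg_out: Optional[bool] # first pass ended with a leftward exit
    reg_in: bool            # region entered from a word to its right


def reading_measures(fixations, w):
    """fixations: list of (word_index, duration_ms) in temporal order."""
    words = [i for i, _ in fixations]
    durs = [d for _, d in fixations]
    trt = sum(d for i, d in fixations if i == w)
    reg_in = any(a > w and b == w for a, b in zip(words, words[1:]))

    # First pass exists only if w is fixated before any word to its right.
    start = next((k for k, i in enumerate(words) if i == w), None)
    if start is None or any(i > w for i in words[:start]):
        return Measures(None, None, None, None, trt, None, reg_in)

    # First pass: the unbroken run of fixations on w starting at `start`.
    end = start
    while end < len(words) and words[end] == w:
        end += 1
    first_pass = durs[start:end]
    sfd = first_pass[0] if len(first_pass) == 1 else None
    reg_out = end < len(words) and words[end] < w

    # Go-past: everything from first entry until a word right of w is fixated.
    stop = next((k for k in range(start, len(words)) if words[k] > w), len(words))
    gpt = sum(durs[start:stop])
    return Measures(first_pass[0], sfd, sum(first_pass), gpt, trt, reg_out, reg_in)


# Hypothetical trial: the reader refixates word 3, regresses to word 2,
# returns to word 3, then moves on to word 4.
trial = [(1, 210), (2, 180), (3, 250), (3, 150), (2, 200), (3, 220), (4, 190)]
print(reading_measures(trial, 3))
```

For word 3 in this made-up trial, gaze duration covers only the first two fixations, go-past time additionally includes the regression to word 2 and the return, and total reading time counts every fixation on word 3. Skipped words come back as `None` for the first-pass measures rather than zero.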

Decision Tree: Which Measure for Which Question?

What stage of processing is your manipulation expected to affect?
|
+-- EARLY LEXICAL (word frequency, orthographic regularity, predictability)
|   |
|   +-- Use GAZE DURATION as primary measure (Rayner, 1998, 2009)
|   +-- Report FIRST FIXATION DURATION as supplementary
|   +-- Report SINGLE FIXATION DURATION if high proportion of
|       single-fixation cases (Rayner, 2009)
|
+-- LATE LEXICAL / POST-LEXICAL (semantic plausibility, thematic fit)
|   |
|   +-- Use GAZE DURATION for early effects
|   +-- Use GO-PAST TIME for integration effects (Clifton et al., 2007)
|   +-- Use TOTAL READING TIME for overall effects
|
+-- SYNTACTIC (garden-path, structural ambiguity, reanalysis)
|   |
|   +-- Use GO-PAST TIME as primary measure (Clifton et al., 2007)
|   +-- Use REGRESSION PROBABILITY as complementary binary measure
|   +-- Effects often appear in the SPILLOVER REGION (1-2 words
|       post-critical; Rayner & Pollatsek, 1989)
|
+-- DISCOURSE / PRAGMATIC (reference resolution, inference, coherence)
|   |
|   +-- Use GO-PAST TIME and TOTAL READING TIME
|   +-- Effects are typically late and may span multiple words
|   +-- Consider REGRESSION-IN probability for earlier regions
|
+-- EXPLORATORY / UNKNOWN TIMING
    |
    +-- Report ALL major measures: FFD, GD, GPT, TRT, Reg-out
    +-- Let the pattern across measures inform process interpretation

First-Pass vs. Second-Pass Distinction

Category	Definition	Includes
First pass	All fixations from first entering a region until first leaving it (in either direction)	FFD, SFD, GD
Second pass	All fixations on a region after first leaving it	Re-reading time (TRT minus first-pass time)

Why this matters: First-pass measures reflect initial processing; second-pass measures reflect recovery from processing difficulty encountered downstream. Conflating them obscures when processing difficulty arose.

Region of Interest (ROI) Definition

Word-Level ROIs

The most common unit of analysis is the single word (Rayner, 1998)
For multi-word critical regions, report analyses at both word level and region level

Multi-Word ROIs

Sometimes necessary for syntactic manipulations where the critical structure spans multiple words
Define ROIs a priori based on linguistic structure, not post-hoc based on where effects appear
Report the number of characters and words in each ROI

Spillover Effects

Spillover is the delayed manifestation of a processing effect on fixations one or more words downstream of the critical word (Rayner & Pollatsek, 1989).

Typical spillover range: 1-2 words after the critical word (Rayner, 1998)
Always analyze the spillover region (word n+1, sometimes n+2) in addition to the critical word
Spillover is most common for first-pass measures (GD, FFD)
Pre-target region (word n-1) should also be checked to verify no confounding baseline differences
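
The region layout described in the list above (pre-target word n-1, critical word n, spillover words n+1 and n+2) can be fixed before looking at the data. The helper below is a small sketch; the function name `analysis_regions`, the zero-based word indexing, and the region labels are assumptions for illustration.

```python
def analysis_regions(n_words, critical, spillover=2):
    """Map region labels to word indices around a critical word.

    n_words:   number of words in the sentence (indices 0 .. n_words - 1)
    critical:  index of the critical word n
    spillover: how many downstream words to analyze (1-2 is typical)
    """
    regions = {"pre_target": critical - 1, "critical": critical}
    for k in range(1, spillover + 1):
        regions[f"spillover_{k}"] = critical + k
    # Drop regions that fall outside the sentence (e.g. a sentence-final target).
    return {name: idx for name, idx in regions.items() if 0 <= idx < n_words}


print(analysis_regions(n_words=8, critical=3))
```

Defining the mapping once per item and reusing it for every measure keeps the spillover and pre-target analyses tied to the a priori design rather than to where effects happen to appear.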

Parafoveal Preview Effects

Words are partially processed before they are directly fixated -- the parafoveal preview benefit (Rayner, 1975; Rayner, 2009)
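Letter-level preview information is obtained up to approximately 7-8 characters to the right of fixation in English, within a perceptual span of roughly 14-15 characters to the right (McConkie & Rayner, 1975; Rayner, 1998)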
This means effects of word n's properties can appear on the last fixation of word n-1 (parafoveal-on-foveal effects; Drieghe et al., 2008)

Data Cleaning

Fixation Duration Cutoffs

Criterion	Value	Rationale
Short fixation merge	< 80 ms within 1 character of another fixation: merge with nearest fixation	Too brief for meaningful processing; likely corrective saccade (Rayner & Pollatsek, 1989)
Short fixation exclude	< 80 ms (not adjacent to another fixation): exclude	Not informative for reading (Rayner & Pollatsek, 1989)
Long fixation exclude	> 800 ms: exclude	Likely track loss, inattention, or blink artifact (Rayner & Pollatsek, 1989)
Alternative long cutoff	> 1000 ms or > 1200 ms	Used in some labs; report which cutoff and justify

Note: Some researchers use 50 ms as the lower bound and 1000-1200 ms as the upper bound. The critical requirement is to report your exact cutoffs and the percentage of data excluded.
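
The cutoffs in the table above can be applied in one pass that also counts what was removed, so the exclusion percentages can be reported. This is a sketch under stated assumptions: fixations are `(char_position, duration_ms)` pairs in temporal order, "another fixation" is read as a temporally adjacent fixation that is not itself short, and the long cutoff is applied after merging. The function name and the default thresholds (80 ms, 800 ms, 1 character) are parameters you should set to your own lab's criteria.

```python
def clean_fixations(fixations, short_ms=80, long_ms=800, merge_chars=1.0):
    """Merge or drop short fixations, drop long ones, and count each action."""
    pos = [p for p, _ in fixations]
    dur = [d for _, d in fixations]
    n = len(fixations)
    dropped = set()
    counts = {"merged": 0, "short_excluded": 0, "long_excluded": 0}

    for k in range(n):
        if fixations[k][1] >= short_ms:
            continue
        # Candidate hosts: adjacent in time, not short, within merge_chars.
        near = [j for j in (k - 1, k + 1)
                if 0 <= j < n
                and fixations[j][1] >= short_ms
                and abs(pos[j] - pos[k]) <= merge_chars]
        if near:
            host = min(near, key=lambda j: abs(pos[j] - pos[k]))
            dur[host] += fixations[k][1]
            counts["merged"] += 1
        else:
            counts["short_excluded"] += 1
        dropped.add(k)

    cleaned = []
    for k in range(n):
        if k in dropped:
            continue
        if dur[k] > long_ms:
            counts["long_excluded"] += 1
            continue
        cleaned.append((pos[k], dur[k]))
    return cleaned, counts


# Hypothetical raw fixations: (character position, duration in ms)
raw = [(3.0, 220), (3.5, 60), (9.0, 40), (15.0, 950), (21.0, 240)]
cleaned, counts = clean_fixations(raw)
print(cleaned, counts)
```

In this made-up example the 60 ms fixation is absorbed by its neighbor half a character away, the isolated 40 ms fixation is excluded, and the 950 ms fixation is excluded as too long. Dividing each count by the number of raw fixations gives the percentages to report.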

Trial-Level Exclusions

Criterion	Action	Rationale
Track loss	Exclude trial	Unreliable position data
Blinks on critical region	Exclude trial	Missing fixation data on the ROI
First-pass skip of critical word	Exclude from first-pass measures (FFD, SFD, GD); include in TRT	Word was not fixated during first pass
Comprehension accuracy	Exclude participants below 80% on comprehension questions	Ensures reading for comprehension (Rayner et al., 2006)

Skipping Rate Considerations

Short, high-frequency, and predictable words are skipped 10-30% of the time (Rayner, 1998, 2009)
Content words are skipped ~15% of the time; function words ~35% (Rayner, 2009)
If skipping rates differ across conditions, this is informative -- report it
For first-pass measures, words that are skipped contribute no data, not zero reading time
Do not substitute zero for skipped words -- this conflates fast processing with no fixation
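
A small numeric sketch of the last two points: skipped words are kept as missing values (`None` here) and reported through a separate skipping rate, and the zero-filled mean is computed only to show how far it drifts. The data and the function name `summarize_first_pass` are hypothetical.

```python
from statistics import mean


def summarize_first_pass(gaze_durations):
    """gaze_durations: one value per trial; None marks a first-pass skip."""
    fixated = [g for g in gaze_durations if g is not None]
    skipped = len(gaze_durations) - len(fixated)
    return {
        "skip_rate": skipped / len(gaze_durations),
        "mean_gd": mean(fixated) if fixated else None,
        # Shown only for contrast; not a valid estimate of reading time.
        "mean_gd_zero_filled": mean([0 if g is None else g for g in gaze_durations]),
    }


# Hypothetical gaze durations (ms) for one condition; two trials were skips.
summary = summarize_first_pass([240, None, 260, None, 250])
print(summary)
```

With two of five trials skipped, the zero-filled mean falls well below the mean of the fixated trials even though no fixated trial was fast. The skipping rate carries that information separately and can be analyzed as its own binary outcome.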

Statistical Modeling

Linear Mixed-Effects Models (LMMs)

Eye-tracking reading data should be analyzed with LMMs with crossed random effects for subjects and items (Baayen, Davidson, & Bates, 2008):

# R formula (lme4 syntax):
gaze_duration ~ condition + (1 + condition | subject) + (1 + condition | item)

Why crossed random effects: Reading experiments use a Latin square design where every subject sees every item, but items rotate across conditions between subjects. Both subjects and items are random samples, and both contribute variance (Clark, 1973; Baayen et al., 2008).

Random Effects Structure

Approach	Specification	When to Use	Citation
Maximal	Random intercepts + all random slopes justified by design	Default starting point	Barr et al., 2013
Parsimonious	Remove random correlations first, then random slopes that explain ~0 variance	When maximal model fails to converge	Bates et al., 2015; Matuschek et al., 2017

Convergence protocol (Barr et al., 2013; Bates et al., 2015):

Fit maximal model (all by-subject and by-item random slopes for within-unit factors)
If convergence fails: remove correlations between random effects (use || in lme4)
If still fails: remove the random slope with the smallest variance component
Report the final model structure and note any simplifications
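
The simplification sequence above can be written out as the lme4 formula strings you would try in order. The Python helper below only builds the strings; it does not fit any model. The function name, the argument names, and the slope variance values are hypothetical, and in practice the variances would come from the fitted zero-correlation model. One caveat to verify for your own setup: the `||` shorthand in lme4 is generally reported to remove correlations as intended only for numeric predictors, so a factor-coded `condition` may need numeric contrast coding first.

```python
def random_effects_ladder(dv, fixed, groups, slope_variances):
    """Return (label, lme4 formula) pairs from maximal to simplified.

    slope_variances: by-group random-slope variance estimates, e.g.
    {"subject": 410.2, "item": 0.0} (illustrative numbers only).
    """
    def formula(terms):
        return f"{dv} ~ {fixed} + " + " + ".join(terms)

    # Step 1: maximal structure with correlated intercepts and slopes.
    maximal = [f"(1 + {fixed} | {g})" for g in groups]
    # Step 2: same terms, correlations removed.
    zero_corr = [f"(1 + {fixed} || {g})" for g in groups]
    # Step 3: drop the slope with the smallest variance component.
    weakest = min(groups, key=lambda g: slope_variances[g])
    reduced = [f"(1 | {g})" if g == weakest else f"(1 + {fixed} || {g})"
               for g in groups]

    return [
        ("maximal", formula(maximal)),
        ("zero-correlation", formula(zero_corr)),
        (f"drop slope for {weakest}", formula(reduced)),
    ]


ladder = random_effects_ladder(
    "gaze_duration", "condition", ("subject", "item"),
    slope_variances={"subject": 410.2, "item": 0.0},
)
for label, f in ladder:
    print(f"{label}: {f}")
```

Keeping the whole ladder in one place makes it easy to report which rung the final model sits on and which simplifications were needed to reach it.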

Distributional Considerations

Reading times are right-skewed and bounded below by zero. Options:

Approach	When to Use	Citation
Log-transform	Simple; commonly used; adequate for many datasets	Standard in psycholinguistics
Inverse transform (-1000/RT)	Can outperform log for skewed RT data	Baayen & Milin, 2010
Generalized LMM (Gamma)	Models the skewness directly; avoids back-transformation issues	Lo & Andrews, 2015
Raw RT with residual checks	When effects are large and residuals are approximately normal	Baayen et al., 2008

Recommendation: Start with raw reading times in the LMM. Check residual plots. If residuals are non-normal, apply log-transformation or fit a GLMM with Gamma family and identity link (Lo & Andrews, 2015).
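
To get a feel for what the transformations in the table do, the sketch below compares the sample skewness of a made-up set of reading times before and after the log and inverse (-1000/RT) transforms. The data are hypothetical, and skewness of the raw values is only a rough proxy: the recommendation above concerns model residuals, which should be checked on the fitted model itself.

```python
import math


def skewness(xs):
    """Population skewness: third central moment over variance ** 1.5."""
    n = len(xs)
    m = sum(xs) / n
    var = sum((x - m) ** 2 for x in xs) / n
    third = sum((x - m) ** 3 for x in xs) / n
    return third / var ** 1.5


# Hypothetical gaze durations (ms) with a long right tail.
rts = [180, 200, 210, 220, 230, 240, 250, 270, 300, 350, 450, 700]

skew_raw = skewness(rts)
skew_log = skewness([math.log(rt) for rt in rts])
skew_inv = skewness([-1000 / rt for rt in rts])

print(f"raw: {skew_raw:.2f}  log: {skew_log:.2f}  inverse: {skew_inv:.2f}")
```

For this particular sample the inverse transform reduces right skew more than the log does. That ordering is a property of these illustrative numbers, not a general guarantee, which is why the choice should be checked against each dataset.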

Multiple Comparisons

When analyzing multiple reading measures on the same data:

Do not apply Bonferroni correction across measures -- each measure tests a different theoretical question (Clifton et al., 2007)
Do correct within each measure if testing multiple contrasts
Report effect sizes and confidence intervals alongside p-values
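
One way to apply the within-measure rule is to adjust p-values separately for each measure, using only the number of contrasts tested on that measure. The sketch below uses a Bonferroni adjustment as one possible choice; the function name, the nested-dictionary layout, and the p-values are hypothetical.

```python
def correct_within_measure(pvals):
    """Bonferroni-adjust p-values within each measure, never across measures.

    pvals: {measure: {contrast: p_value}}
    """
    return {
        measure: {c: min(1.0, p * len(contrasts)) for c, p in contrasts.items()}
        for measure, contrasts in pvals.items()
    }


# Hypothetical results: two contrasts on gaze duration, one on go-past time.
raw_p = {
    "GD": {"A-B": 0.02, "A-C": 0.30},
    "GPT": {"A-B": 0.04},
}
print(correct_within_measure(raw_p))
```

The gaze duration p-values are multiplied by two because two contrasts were tested on that measure, while the single go-past contrast is left unchanged.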

Typical Fixation Duration Benchmarks

These values serve as sanity checks for data quality (Rayner, 1998, 2009):

Measure	Typical Range (Silent Reading)	Citation
Average fixation duration	200-250 ms	Rayner, 1998, 2009
Average saccade length	7-9 characters (~2 degrees)	Rayner, 1998, 2009
Regression rate	10-15% of all saccades	Rayner, 1998
Word skipping rate	Content words ~15%; function words ~35%	Rayner, 2009
Fixation duration range	50-500 ms (bulk of distribution)	Rayner, 1998

If your data substantially deviates from these benchmarks, check calibration quality, task instructions, and participant compliance.
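
The benchmark table can double as an automated sanity check on per-participant or per-dataset summaries. The sketch below flags any summary value outside the typical range; the dictionary keys, the function name `check_benchmarks`, and the example summary are assumptions, and the ranges are copied from the table above rather than independently verified.

```python
# Typical ranges for silent reading, taken from the benchmark table.
BENCHMARKS = {
    "mean_fixation_ms": (200, 250),
    "mean_saccade_chars": (7, 9),
    "regression_rate": (0.10, 0.15),
}


def check_benchmarks(summary, benchmarks=BENCHMARKS):
    """Return {name: (observed, (low, high))} for values outside the range."""
    flags = {}
    for name, (low, high) in benchmarks.items():
        if name in summary and not low <= summary[name] <= high:
            flags[name] = (summary[name], (low, high))
    return flags


# Hypothetical dataset summary.
summary = {"mean_fixation_ms": 231, "mean_saccade_chars": 7.8, "regression_rate": 0.22}
print(check_benchmarks(summary))
```

A flag is a prompt to inspect calibration, instructions, and compliance, not an exclusion rule: difficult texts or particular reader populations can legitimately fall outside these ranges.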

Common Pitfalls

Using only total reading time: TRT conflates early and late processing. If you only report TRT, you cannot determine when the effect arose. Always report at least one first-pass measure (GD) and one late measure (GPT or TRT) (Clifton et al., 2007).
Ignoring spillover effects: Many effects appear 1-2 words downstream of the critical word, especially for syntactic manipulations. Always analyze the spillover region (Rayner, 1998; Rayner & Pollatsek, 1989).
Substituting zero for skipped words: Skipped words should be treated as missing data for first-pass measures, not as zero reading time. Substituting zero artificially deflates means and inflates variance.
Using ANOVA instead of LMMs: F1/F2 ANOVA is outdated for psycholinguistic data. LMMs with crossed random effects properly handle the variance structure (Baayen et al., 2008; Barr et al., 2013).
Over-interpreting first fixation duration: FFD is contaminated by refixation planning. When a substantial proportion of words receive multiple first-pass fixations, GD is more informative (Rayner, 2009).
Defining ROIs post-hoc: Selecting regions of interest after seeing the data inflates Type I error. Define ROIs a priori based on linguistic theory.
Ignoring comprehension accuracy: If participants are not reading for comprehension (accuracy < 80%), eye-movement patterns are not interpretable as reflecting normal reading processes (Rayner et al., 2006).
Not reporting data loss: Always report the percentage of trials excluded at each cleaning step and the percentage of words skipped in the critical region.

Minimum Reporting Checklist

Based on Clifton et al. (2007) and current standards in psycholinguistics:

References

Baayen, R. H., Davidson, D. J., & Bates, D. M. (2008). Mixed-effects modeling with crossed random effects for subjects and items. Journal of Memory and Language, 59, 390-412.
Baayen, R. H., & Milin, P. (2010). Analyzing reaction times. International Journal of Psychological Research, 3, 12-28.
Barr, D. J., Levy, R., Scheepers, C., & Tily, H. J. (2013). Random effects structure for confirmatory hypothesis testing: Keep it maximal. Journal of Memory and Language, 68, 255-278.
Bates, D., Kliegl, R., Vasishth, S., & Baayen, H. (2015). Parsimonious mixed models. arXiv:1506.04967.
Clark, H. H. (1973). The language-as-fixed-effect fallacy: A critique of language statistics in psychological research. Journal of Verbal Learning and Verbal Behavior, 12, 335-359.
Clifton, C., Staub, A., & Rayner, K. (2007). Eye movements in reading words and sentences. In R. P. G. van Gompel, M. H. Fischer, W. S. Murray, & R. L. Hill (Eds.), Eye movements: A window on mind and brain. Amsterdam: Elsevier.
Drieghe, D., Rayner, K., & Pollatsek, A. (2008). Mislocated fixations can account for parafoveal-on-foveal effects in eye movements during reading. Quarterly Journal of Experimental Psychology, 61, 1239-1249.
Lo, S., & Andrews, S. (2015). To transform or not to transform: Using generalized linear mixed models to analyse reaction time data. Frontiers in Psychology, 6, 1171.
Matuschek, H., Kliegl, R., Vasishth, S., Baayen, H., & Bates, D. (2017). Balancing Type I error and power in linear mixed models. Journal of Memory and Language, 94, 305-315.
McConkie, G. W., & Rayner, K. (1975). The span of the effective stimulus during a fixation in reading. Perception & Psychophysics, 17, 578-586.
Rayner, K. (1975). The perceptual span and peripheral cues in reading. Cognitive Psychology, 7, 65-81.
Rayner, K. (1998). Eye movements in reading and information processing: 20 years of research. Psychological Bulletin, 124, 372-422.
Rayner, K. (2009). The 35th Sir Frederick Bartlett Lecture: Eye movements and attention in reading, scene perception, and visual search. Quarterly Journal of Experimental Psychology, 62, 1457-1506.
Rayner, K., Chace, K. H., Slattery, T. J., & Ashby, J. (2006). Eye movements as reflections of comprehension processes in reading. Scientific Studies of Reading, 10, 241-255.
Rayner, K., & Pollatsek, A. (1989). The psychology of reading. Englewood Cliffs, NJ: Prentice Hall.

See references/measure-computation-guide.md for step-by-step computation procedures and worked examples.
