EEG Preprocessing Pipeline Guide

Purpose

EEG preprocessing transforms raw electrophysiological recordings into clean data suitable for analysis. Unlike generic signal processing, every preprocessing decision in EEG involves domain-specific trade-offs: filtering at the wrong cutoff distorts ERP component morphology, choosing the wrong reference scheme biases topographic maps, and automated artifact rejection with incorrect parameters either leaves artifacts in the data or removes real neural signal.

A competent programmer without EEG training would not know that a 1 Hz high-pass filter is needed before ICA but distorts slow ERP components, that average reference requires a minimum of 64 channels, or that the order of preprocessing steps matters critically. This skill encodes the domain judgment required to build a correct EEG preprocessing pipeline.

When to Use This Skill

  • Setting up an EEG preprocessing pipeline for ERP, time-frequency, or connectivity analysis
  • Choosing filter parameters for specific analysis goals
  • Deciding between ICA and ASR for artifact removal
  • Selecting an appropriate re-referencing scheme
  • Performing quality control on preprocessed EEG data
  • Reviewing or troubleshooting an existing EEG preprocessing pipeline

Research Planning Protocol

Before executing the domain-specific steps below, you MUST:

  1. State the research question — What specific question is this analysis/paradigm addressing?
  2. Justify the method choice — Why is this approach appropriate? What alternatives were considered?
  3. Declare expected outcomes — What results would support vs. refute the hypothesis?
  4. Note assumptions and limitations — What does this method assume? Where could it mislead?
  5. Present the plan to the user and WAIT for confirmation before proceeding.

For detailed methodology guidance, see the research-literacy skill.

⚠️ Verification Notice

This skill was generated by AI from academic literature. All parameters, thresholds, and citations require independent verification before use in research. If you find errors, please open an issue.

Standard Preprocessing Pipeline Order

The recommended order of preprocessing steps, based on established best practices (Luck, 2014; Onton & Makeig, 2006; Bigdely-Shamlo et al., 2015):

1. Import and inspect raw data
2. Remove (mark) bad channels
3. High-pass filter
4. Line noise removal
5. Re-reference
6. ICA decomposition and artifact removal
7. Interpolate bad channels
8. Epoch and baseline correct
9. Epoch rejection by amplitude threshold

Critical ordering constraints:

  • ICA must come after high-pass filtering (Step 3) because low-frequency drift degrades ICA decomposition (Winkler et al., 2015)
  • Bad channel removal (Step 2) must precede ICA because bad channels degrade component estimation
  • Bad channel interpolation (Step 7) must come after ICA because interpolated channels are linear combinations of their neighbors and add no independent information to the decomposition
  • Re-referencing (Step 5) should precede ICA so that all components are in a common reference frame
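
The ordering constraints above can be checked mechanically. The following is a minimal sketch, assuming each pipeline step is represented by a short label; the label names and the `ordering_violations` helper are hypothetical and not part of any EEG toolbox.

```python
# Step labels are illustrative names for the nine steps listed above.
CANONICAL_ORDER = [
    "import", "mark_bad_channels", "high_pass", "line_noise",
    "re_reference", "ica", "interpolate", "epoch", "reject_epochs",
]

# (earlier, later) pairs taken from the critical ordering constraints.
CONSTRAINTS = [
    ("high_pass", "ica"),
    ("mark_bad_channels", "ica"),
    ("ica", "interpolate"),
    ("re_reference", "ica"),
]


def ordering_violations(steps):
    """Return the (earlier, later) constraints that a step sequence breaks."""
    position = {name: index for index, name in enumerate(steps)}
    violations = []
    for earlier, later in CONSTRAINTS:
        # Only check constraints whose two steps are both present.
        if earlier in position and later in position:
            if position[earlier] > position[later]:
                violations.append((earlier, later))
    return violations
```

Calling `ordering_violations` on a pipeline that interpolates before ICA returns the pair `("ica", "interpolate")`, which points directly at the constraint that was broken.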

Step 1: Import and Inspect Raw Data

  • Convert to a standard format (EDF, BDF, or tool-native) if needed
  • Visually scroll through the entire recording to identify gross artifacts (disconnected electrodes, large muscle bursts, saturated channels)
  • Note segments with excessive noise for later rejection
  • Check sampling rate: 250-512 Hz typical for ERP; 1000+ Hz for high-frequency oscillatory analysis (Cohen, 2014)

Step 2: Remove Bad Channels

Bad channels contribute noise to re-referencing, ICA, and spatial interpolation. Identify them before other steps.

Identification Criteria

| Criterion | Threshold | Source |
|---|---|---|
| Flat signal (zero variance) | Variance < 0.5 uV^2 for > 5 s | Bigdely-Shamlo et al., 2015 |
| Excessive noise | Channel variance > 3 SD above the mean of all channels | Bigdely-Shamlo et al., 2015 |
| Low correlation with neighbors | Mean correlation with neighboring channels < 0.4 | Bigdely-Shamlo et al., 2015 |
| Excessive line noise | 50/60 Hz power > 4 SD above the mean | PREP pipeline (Bigdely-Shamlo et al., 2015) |

Practical Limits

  • Remove no more than 10% of channels (e.g., 6 of 64). If more channels are bad, consider re-collecting data (Keil et al., 2014)
  • Mark bad channels for later interpolation (Step 7); do not interpolate yet
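
As a rough illustration of the variance criteria and the 10% limit, the sketch below flags channels from a dictionary of per-channel variances. It is a simplification under stated assumptions: the variances are precomputed over the whole recording (the "> 5 s" duration condition is not modeled), the mean and SD are computed naively across all channels (robust statistics may be preferable in practice), and the function name `flag_bad_channels` is hypothetical.

```python
import statistics


def flag_bad_channels(variances_uv2, flat_thresh=0.5, noisy_sd=3.0):
    """Flag flat and noisy channels from per-channel variances (in uV^2).

    Assumes at least two channels. Thresholds follow the table above.
    """
    names = list(variances_uv2)
    values = [variances_uv2[name] for name in names]
    mean = statistics.mean(values)
    sd = statistics.stdev(values)

    # Flat: variance below the flat-signal threshold.
    flat = [name for name in names if variances_uv2[name] < flat_thresh]
    # Noisy: variance more than `noisy_sd` SDs above the channel mean.
    noisy = [
        name for name in names
        if sd > 0 and (variances_uv2[name] - mean) / sd > noisy_sd
    ]
    bad = sorted(set(flat) | set(noisy))
    return {
        "flat": flat,
        "noisy": noisy,
        "bad": bad,
        # Practical limit: no more than 10% of channels marked bad.
        "within_limit": len(bad) * 10 <= len(names),
    }
```

The returned `bad` list is meant to be carried forward as a set of marked channels for Step 7, not interpolated at this point.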

Step 3: High-Pass Filtering

High-pass filtering removes slow drifts from skin potentials, electrode drift, and movement artifacts.

| Analysis Goal | Cutoff Frequency | Filter Type | Source |
|---|---|---|---|
| ERP analysis | 0.1 Hz | FIR zero-phase | Luck, 2014; Tanner et al., 2015 |
| ICA decomposition | 1 Hz | FIR zero-phase | Winkler et al., 2015 |
| Time-frequency analysis | 0.1 Hz | FIR zero-phase | Cohen, 2014 |
| Slow cortical potentials | 0.01 Hz | FIR zero-phase | Luck, 2014 |

Critical domain knowledge: For ERP studies, use 0.1 Hz for the final analysis data but 1 Hz for the ICA decomposition step. The recommended workflow is:

  1. Filter a copy of the data at 1 Hz for ICA
  2. Run ICA on the 1 Hz-filtered copy
  3. Apply the ICA weights (unmixing matrix) to the original 0.1 Hz-filtered data

This workflow preserves slow ERP components while giving ICA a clean decomposition (Winkler et al., 2015).
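
The key operation in this workflow is linear: the unmixing matrix learned on the 1 Hz copy is applied to the 0.1 Hz data, the artifact sources are zeroed, and the result is projected back to channel space. The sketch below shows that algebra in plain Python with toy 2x2 matrices; the matrices are hand-picked stand-ins for a fitted ICA, and `remove_components` is a hypothetical helper, not a toolbox function.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def remove_components(unmixing, mixing, data, exclude):
    """Zero the excluded sources and project back to channel space.

    unmixing: matrix learned on the 1 Hz-filtered copy (sources = W @ data)
    mixing:   its inverse (channels = A @ sources)
    data:     channels x samples from the 0.1 Hz-filtered recording
    exclude:  indices of artifact components to remove
    """
    sources = matmul(unmixing, data)
    for index in exclude:
        sources[index] = [0.0] * len(sources[index])
    return matmul(mixing, sources)


# Toy example: source 0 is "brain", source 1 is "blink".
mixing = [[1, 1], [1, -1]]
unmixing = [[0.5, 0.5], [0.5, -0.5]]  # inverse of `mixing`
data = [[11, 2, -7], [-9, 2, 13]]     # mixing @ [[1, 2, 3], [10, 0, -10]]
cleaned = remove_components(unmixing, mixing, data, exclude=[1])
```

Because the unmixing matrix is only a spatial filter, it can be applied to data filtered at a different cutoff than the copy it was trained on, provided the channel set and reference are identical.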

Why not 1 Hz for ERPs? A 1 Hz high-pass filter distorts ERP waveforms by introducing artificial pre-stimulus baseline shifts and reducing the amplitude of sustained components like the sustained negativity or the P3b (Tanner et al., 2015; Acunzo et al., 2012).

Filter Specifications

| Parameter | Recommendation | Source |
|---|---|---|
| Filter type | FIR (Finite Impulse Response), zero-phase | Widmann et al., 2015 |
| Design | Windowed sinc (Hamming or Blackman window) | Widmann et al., 2015 |
| Transition bandwidth | 2x the cutoff frequency (e.g., 0.2 Hz for a 0.1 Hz cutoff), or the EEGLAB/MNE default | Widmann et al., 2015 |
| Filter order | Determined by transition bandwidth; for a Hamming window, approximately 3.3x sampling rate / transition bandwidth | Widmann et al., 2015 |
| Phase distortion | Zero (use filtfilt or FIR zero-phase); never use causal filtering for offline analysis | Widmann et al., 2015 |

Domain warning: IIR (Butterworth) filters applied in a single causal pass introduce frequency-dependent phase distortion that shifts ERP peak latencies. Prefer FIR zero-phase filters for ERP analysis unless there is a specific reason for causal filtering (Widmann et al., 2015).
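
To make the relationship between cutoff, transition bandwidth, and filter order concrete, here is a small sketch. It assumes a Hamming-windowed sinc design and uses 3.3 as the window's normalized transition-width factor; that factor, the rounding to an even order, and the function name are assumptions made for illustration.

```python
import math


def hamming_fir_order(sfreq_hz, cutoff_hz):
    """Estimate the FIR order for a Hamming-windowed sinc high-pass filter.

    Uses a transition bandwidth of 2x the cutoff and the approximation
    order ~= 3.3 * sampling rate / transition bandwidth.
    """
    transition_bw_hz = 2.0 * cutoff_hz
    order = math.ceil(3.3 * sfreq_hz / transition_bw_hz)
    # Round up to an even order so the filter has an odd number of taps.
    if order % 2:
        order += 1
    return order, transition_bw_hz
```

For a 0.1 Hz cutoff at 250 Hz this gives an order of roughly 4,100 samples, about 16.5 s of data, which is why such filters should be applied to continuous recordings and not to short epochs.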

Step 4: Line Noise Removal

Remove power line noise at 50 Hz (Europe, Asia) or 60 Hz (Americas) and harmonics.

| Method | Description | When to Use | Source |
|---|---|---|---|
| Notch filter | Band-stop filter at 50/60 Hz | Simple, but removes neural signal at that frequency; not recommended for oscillatory analysis | -- |
| CleanLine | Adaptive frequency-domain regression | Preferred for most analyses; preserves neural signal near 50/60 Hz | Mullen et al., 2012 |
| ZapLine | Removes line noise via DSS decomposition | Alternative to CleanLine; effective for MEG and EEG | de Cheveigne, 2020 |
| Spectral interpolation | Interpolates the notched frequency band | Preserves spectral continuity | Leske & Dalal, 2019 |

Recommendation: Use CleanLine or ZapLine over notch filters. Narrow notch filters can introduce time-domain ringing and remove real neural oscillatory power in the gamma band near 50/60 Hz (Muthukumaraswamy, 2013).
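
Whichever method is used, it needs the list of frequencies to target. A minimal sketch, assuming harmonics are only relevant below the Nyquist frequency; `line_noise_frequencies` is a hypothetical helper name.

```python
def line_noise_frequencies(line_freq_hz, sfreq_hz):
    """List the line frequency and its harmonics below the Nyquist frequency."""
    nyquist_hz = sfreq_hz / 2.0
    frequencies = []
    harmonic = 1
    while harmonic * line_freq_hz < nyquist_hz:
        frequencies.append(harmonic * line_freq_hz)
        harmonic += 1
    return frequencies
```

For example, a 50 Hz mains recording sampled at 500 Hz has harmonics at 50, 100, 150, and 200 Hz to consider.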

Step 5: Re-Referencing

EEG signals are always measured as potential differences relative to a reference. The choice of reference affects all downstream analyses.

| Reference Scheme | When to Use | Requirements | Source |
|---|---|---|---|
| Average reference | Default for dense arrays | Minimum 64 channels with good head coverage | Dien, 1998; Luck, 2014 |
| Linked mastoids | Low-density arrays (< 64 ch) | Both mastoid electrodes clean | Luck, 2014 |
| Cz reference | During ICA only (if Cz was recording reference) | -- | Convention |
| REST (Reference Electrode Standardization Technique) | Theoretical zero-reference approximation | Requires forward model | Yao, 2001 |
| Infinity reference | Approximation of neutral reference | Forward model, dense arrays | Yao, 2001 |

Decision logic:

How many clean channels do you have?
 |
 +-- >= 64 with good head coverage
 | --> Average reference (Dien, 1998)
 |
 +-- 32-63 channels
 | --> Linked mastoids or average reference
 | (average reference becomes unreliable with sparse coverage)
 |
 +-- < 32 channels
 --> Linked mastoids (Luck, 2014)

Domain warning: Average reference assumes dense, uniform electrode coverage of the head. With sparse arrays (< 64 channels) or missing channels, the average reference is biased and can distort topographies (Dien, 1998).
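
The decision logic above translates into a few lines of code. This is a sketch only: the return labels are hypothetical, and treating a 64+ channel array with poor coverage like the 32-63 channel branch is an assumption, since the decision tree does not cover that case.

```python
def suggest_reference(n_clean_channels, good_head_coverage):
    """Suggest a re-referencing scheme from channel count and coverage."""
    if n_clean_channels >= 64 and good_head_coverage:
        return "average"
    if n_clean_channels >= 32:
        # Average reference becomes unreliable with sparse coverage,
        # so the choice here needs a judgment call.
        return "linked_mastoids_or_average"
    return "linked_mastoids"
```

The count passed in should be the number of clean channels after Step 2, not the nominal size of the cap.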

Step 6: ICA Decomposition and Artifact Removal

Independent Component Analysis (ICA) separates the EEG signal into statistically independent spatial components, allowing identification and removal of artifact sources (Onton & Makeig, 2006).

ICA Algorithm Selection

| Algorithm | Pros | Cons | Source |
|---|---|---|---|
| Infomax (runica) | Standard, well-validated; most commonly used | Assumes super-Gaussian sources | Bell & Sejnowski, 1995 |
| Extended Infomax | Handles both sub- and super-Gaussian sources | Slightly slower | Lee et al., 1999 |
| AMICA | Most accurate decomposition; can fit a mixture of ICA models | Very slow; requires more data | Palmer et al., 2012 |
| FastICA | Fast computation | Less stable; sensitive to initialization | Hyvarinen, 1999 |
| PICARD | Fast, robust convergence | Newer, less validated | Ablin et al., 2018 |

Recommendation: Use Extended Infomax (default in EEGLAB) or PICARD (available in MNE-Python, where FastICA is the default method) for most analyses. AMICA is preferred for high-quality research when computation time is not a constraint.

Data Requirements for ICA

  • Minimum data points: At least 20 * n_channels^2 data points for stable decomposition (Onton & Makeig, 2006). For 64 channels: 20 * 64^2 = 81,920 samples (~5.3 minutes at 256 Hz)
  • High-pass filter at 1 Hz before ICA (Winkler et al., 2015)
  • Remove bad channels before ICA (bad channels produce bad components)
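
The data-length rule of thumb is easy to check before running ICA. A minimal sketch of the arithmetic, with hypothetical function names and the factor 20 taken from the guideline above:

```python
def min_ica_samples(n_channels, k=20):
    """Minimum number of samples for ICA: k * n_channels^2."""
    return k * n_channels ** 2


def min_ica_minutes(n_channels, sfreq_hz, k=20):
    """The same minimum expressed as recording time in minutes."""
    return min_ica_samples(n_channels, k) / sfreq_hz / 60.0
```

With 64 channels at 256 Hz this reproduces the figure quoted above: 81,920 samples, or about 5.3 minutes. Note that `n_channels` should be the number of channels actually passed to ICA, after bad channels are excluded.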

Artifact Component Identification

Automated Classification: ICLabel (Pion-Tonachini et al., 2019)

ICLabel classifies ICA components into 7 categories with probability estimates:

| Category | Action | Typical Count |
|---|---|---|
| Brain | Keep | Most components |
| Eye | Remove | 1-3 components (blinks and lateral eye movements) |
| Muscle | Remove if probability > 0.8 | 0-3 components |
| Heart | Remove if probability > 0.8 | 0-1 components |
| Line noise | Remove if probability > 0.8 | 0-1 components |
| Channel noise | Remove if probability > 0.8 | 0-2 components |
| Other | Inspect manually before deciding | Varies |

Recommended threshold: Remove components classified as non-brain with probability > 0.80 (conservative) or > 0.50 (liberal) (Pion-Tonachini et al., 2019).
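
Applying the threshold amounts to a filter over the classifier output. The sketch below assumes the labels and probabilities have already been extracted as two parallel lists; the label strings are illustrative and do not match any particular toolbox's naming, and leaving "other" components for manual review is an assumption.

```python
# Labels treated as artifacts; "other" is deliberately left for manual review.
ARTIFACT_LABELS = {"eye", "muscle", "heart", "line_noise", "channel_noise"}


def components_to_remove(labels, probabilities, threshold=0.8):
    """Return indices of components whose artifact label exceeds the threshold."""
    return [
        index
        for index, (label, probability) in enumerate(zip(labels, probabilities))
        if label in ARTIFACT_LABELS and probability > threshold
    ]
```

Lowering `threshold` to 0.5 gives the liberal setting; comparing the two resulting lists is a quick way to see which components are borderline and deserve visual inspection.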

Manual Identification Criteria

| Artifact Type | Topography | Time Course | Power Spectrum |
|---|---|---|---|
| Blink | Frontal maximum, bilateral | Sharp transients (~300 ms) | High power at low frequencies (< 5 Hz) |
| Saccade | Frontal, lateralized (left-right asymmetry) | Step-like deflections | Low-frequency dominated |
| Cardiac | Broad, diffuse or left-lateralized | Periodic (~1 Hz) | Peak at ~1 Hz |
| Muscle | Peripheral (temporal, neck electrodes) | High-frequency broadband noise | Elevated power > 20 Hz |

Domain insight: Typically remove 1-3 components for eye artifacts and 0-2 for other artifact types. Removing more than 5-6 components total risks removing neural signal. If many components appear artifactual, the data quality may be too poor for reliable analysis (Onton & Makeig, 2006).

Alternative: ASR (Artifact Subspace Reconstruction)

ASR is a real-time-capable method that identifies and reconstructs artifact-contaminated data segments (Mullen et al., 2015).

| Parameter | Default | Conservative | Liberal | Source |
|---|---|---|---|---|
| Burst criterion (SD) | 20 | 10-15 | 25-30 | Mullen et al., 2015; Chang et al., 2020 |
| Window length | 0.5 s | 0.5 s | 1.0 s | Mullen et al., 2015 |
| Max rejected channels (proportion) | 0.3 | 0.2 | 0.4 | Mullen et al., 2015 |

When to use ASR vs. ICA:

Is data heavily contaminated with non-stationary artifacts?
 |
 +-- YES --> ASR first (for gross artifact removal), then ICA for residual eye artifacts
 |
 +-- NO --> ICA alone is usually sufficient

Domain insight: ASR and ICA can be combined. Apply ASR first to remove large transient artifacts (burst criterion = 20 SD), then run ICA on the ASR-cleaned data for residual artifact removal (Chang et al., 2020).

Step 7: Interpolate Bad Channels

After ICA, interpolate the bad channels identified in Step 2.

  • Method: Spherical spline interpolation (Perrin et al., 1989)
  • Maximum interpolation: No more than 10% of channels (Keil et al., 2014)
  • Order: Interpolate after ICA so that ICA does not learn interpolated (non-independent) data
  • Verify: Check that interpolated channel time courses are consistent with neighbors

Step 8: Epoch and Baseline Correct

  • Epoch time window: Typically -200 to 800 ms for ERP; adjust based on component of interest (Luck, 2014)
  • Baseline window: -200 to 0 ms pre-stimulus (standard for ERP; Luck, 2014)
  • Baseline correction: Subtract the mean of the baseline window from each time point in the epoch

| Analysis Type | Epoch Window | Baseline Window | Source |
|---|---|---|---|
| Standard ERP | -200 to 800 ms | -200 to 0 ms | Luck, 2014 |
| Late ERP (P600, LPP) | -200 to 1000 ms | -200 to 0 ms | Luck, 2014 |
| MMN | -100 to 400 ms | -100 to 0 ms | Naatanen et al., 2007 |
| Time-frequency | -1000 to 2000 ms | -500 to -200 ms (or single-trial normalization) | Cohen, 2014 |

Domain warning: For time-frequency analysis, use a longer baseline period (-500 to -200 ms) and avoid the immediate pre-stimulus period to prevent contamination by anticipatory activity. Alternatively, use single-trial baseline normalization (Cohen, 2014).
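
Subtractive baseline correction for ERPs is a one-line operation per channel. The sketch below works on a single channel of a single epoch, with sample times in milliseconds; treating the baseline window as excluding time zero is an assumption (toolboxes differ on this), and `baseline_correct` is a hypothetical name.

```python
def baseline_correct(epoch_uv, times_ms, baseline_ms=(-200, 0)):
    """Subtract the mean of the baseline window from every sample.

    epoch_uv: samples for one channel of one epoch (in uV)
    times_ms: time of each sample relative to stimulus onset (in ms)
    """
    start, end = baseline_ms
    # Pre-stimulus samples only: start <= t < end (assumption).
    window = [value for value, t in zip(epoch_uv, times_ms) if start <= t < end]
    if not window:
        raise ValueError("No samples fall inside the baseline window")
    baseline_mean = sum(window) / len(window)
    return [value - baseline_mean for value in epoch_uv]
```

For time-frequency data the same idea applies to power values, but with the earlier window (-500 to -200 ms) and usually a divisive or decibel normalization in place of simple subtraction.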

Step 9: Epoch Rejection by Amplitude Threshold

After ICA has removed stereotyped artifacts, apply amplitude-based rejection to catch remaining transient artifacts.

| Criterion | Threshold | Source |
|---|---|---|
| Peak-to-peak amplitude | Reject if > 100-150 uV | Luck, 2014 |
| Absolute amplitude | Reject if any sample exceeds +/- 75-100 uV | Luck, 2014 |
| Flat epoch | Reject if max - min < 0.5 uV (dead channel/epoch) | Bigdely-Shamlo et al., 2015 |
| Step function (for eye blinks missed by ICA) | Reject if > 80 uV step in 200 ms moving window | Luck, 2014 |
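
The peak-to-peak and flat-epoch criteria can be sketched as a single check per epoch. This example assumes an epoch is stored as a list of channels, each a list of samples in uV, and uses the lower end of the peak-to-peak range (100 uV) as the default; the function name and return labels are hypothetical.

```python
def epoch_rejection_reason(epoch_uv, p2p_max_uv=100.0, flat_min_uv=0.5):
    """Return why an epoch should be rejected, or None if it passes.

    epoch_uv: list of channels, each a list of samples in uV.
    """
    for channel in epoch_uv:
        peak_to_peak = max(channel) - min(channel)
        if peak_to_peak > p2p_max_uv:
            return "peak_to_peak"
        if peak_to_peak < flat_min_uv:
            return "flat"
    return None
```

Recording the reason, not just a keep/reject flag, makes it easier to report rejection criteria and to spot a single bad channel that is driving most rejections.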

Quality Benchmarks

| Metric | Acceptable | Concerning | Source |
|---|---|---|---|
| Proportion of epochs rejected | < 25% | > 30% indicates poor data quality | Keil et al., 2014 |
| Minimum retained trials per condition | 30+ | < 20 is unreliable for ERPs | Boudewyn et al., 2018 |
| Minimum retained trials (absolute floor) | 15 | < 10 is unusable | Luck, 2014 |
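
These benchmarks can be applied per participant and condition after rejection. A minimal sketch, assuming the trial counts are known; the category labels ("borderline", "marginal") for values that fall between the table's thresholds are assumptions, as is the function name.

```python
def trial_quality(n_total, n_retained):
    """Classify rejection rate and retained trial count for one condition."""
    proportion_rejected = (n_total - n_retained) / n_total
    if proportion_rejected < 0.25:
        rejection = "acceptable"
    elif proportion_rejected > 0.30:
        rejection = "concerning"
    else:
        rejection = "borderline"

    if n_retained < 10:
        trials = "unusable"
    elif n_retained < 20:
        trials = "unreliable"
    elif n_retained < 30:
        trials = "marginal"
    else:
        trials = "acceptable"

    return {
        "proportion_rejected": proportion_rejected,
        "rejection": rejection,
        "trials": trials,
    }
```

Running this for every condition, and not only on the pooled trial count, catches cases where rejection is unevenly distributed across conditions.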

Low-Pass Filtering (Optional, Post-Epoching)

| Analysis Type | Low-Pass Cutoff | Source |
|---|---|---|
| ERP (visualization and analysis) | 30 Hz | Luck, 2014 |
| ERP (preserving high-frequency info) | 40 Hz | Luck, 2014 |
| Oscillatory (alpha, beta) | No low-pass or 100 Hz | Cohen, 2014 |
| Oscillatory (gamma) | No low-pass or 200 Hz | Cohen, 2014 |

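Domain warning: Low-pass filters have short impulse responses, so they can be applied after epoching as long as the epochs are long enough that filter edge effects fall outside the window of interest. High-pass filters, by contrast, must be applied to the continuous data. For ERP grand averages, a 20-30 Hz low-pass is common for visualization, but heavy low-pass filtering smears component onsets and offsets and reduces peak amplitudes, so apply it with caution before measuring peak amplitudes or latencies (Luck, 2014).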

Common Pitfalls

  1. Using 1 Hz high-pass for ERP analysis: A 1 Hz cutoff distorts slow ERP components. Use 0.1 Hz for final data; apply 1 Hz only for ICA training (Tanner et al., 2015; Acunzo et al., 2012)
  2. Average reference with too few channels: Average reference with < 64 channels and incomplete head coverage biases topographies (Dien, 1998)
  3. Running ICA on unfiltered data: Low-frequency drift degrades ICA decomposition quality. Always high-pass at 1 Hz before ICA (Winkler et al., 2015)
  4. Removing too many ICA components: Removing > 5-6 components risks removing neural signal. If many components are artifactual, the data quality is too poor (Onton & Makeig, 2006)
  5. Interpolating before ICA: Interpolated channels are linear combinations of their neighbors, so they add no independent information and make the data rank-deficient, which degrades the decomposition. Interpolate after ICA (Luck, 2014)
  6. Using IIR (Butterworth) filters: IIR filters introduce phase distortion that shifts ERP peak latencies. Use FIR zero-phase filters (Widmann et al., 2015)
  7. Not checking the number of retained trials: If artifact rejection removes > 25% of trials, reconsider data quality or preprocessing parameters (Keil et al., 2014)
  8. Applying notch filters for oscillatory analysis: Notch filters remove real neural gamma activity near 50/60 Hz. Use CleanLine or ZapLine instead (Muthukumaraswamy, 2013)

Minimum Reporting Checklist

Based on Keil et al. (2014) and Luck (2014):

  • Sampling rate (original and any downsampling applied)
  • High-pass filter cutoff, type (FIR/IIR), order, transition bandwidth
  • Low-pass filter cutoff, type, order (if applied)
  • Line noise removal method (notch, CleanLine, ZapLine)
  • Re-referencing scheme (average, linked mastoids, etc.) and when applied
  • Bad channel identification criteria and number removed
  • Bad channel interpolation method (spherical spline)
  • ICA algorithm used and number of components computed
  • Artifact component identification method (manual, ICLabel, ADJUST) and criteria
  • Number and type of components removed (mean and range across subjects)
  • ASR parameters if used (burst criterion, window length)
  • Epoch time window and baseline correction window
  • Epoch rejection criteria (thresholds) and proportion rejected (mean and range)
  • Minimum number of retained trials per condition
  • Software package and version (EEGLAB, MNE-Python, FieldTrip)

References

  • Ablin, P., Cardoso, J. F., & Gramfort, A. (2018). Faster independent component analysis by preconditioning with Hessian approximations. IEEE Transactions on Signal Processing, 66(15), 4040-4049.
  • Acunzo, D. J., MacKenzie, G., & van Rossum, M. C. W. (2012). Systematic biases in early ERP and ERF components as a result of high-pass filtering. Journal of Neuroscience Methods, 209(1), 212-218.
  • Bell, A. J., & Sejnowski, T. J. (1995). An information-maximization approach to blind separation and blind deconvolution. Neural Computation, 7(6), 1129-1159.
  • Bigdely-Shamlo, N., Mullen, T., Kothe, C., Su, K. M., & Robbins, K. A. (2015). The PREP pipeline: Standardized preprocessing for large-scale EEG analysis. Frontiers in Neuroinformatics, 9, 16.
  • Boudewyn, M. A., Luck, S. J., Farrens, J. L., & Kappenman, E. S. (2018). How many trials does it take to get a significant ERP effect? Psychophysiology, 55(6), e13049.
  • Chang, C. Y., Hsu, S. H., Pion-Tonachini, L., & Jung, T. P. (2020). Evaluation of artifact subspace reconstruction for automatic artifact components removal in multi-channel EEG recordings. IEEE Transactions on Biomedical Engineering, 67(4), 1114-1121.
  • Cohen, M. X. (2014). Analyzing Neural Time Series Data: Theory and Practice. MIT Press.
  • de Cheveigne, A. (2020). ZapLine: A simple and effective method to remove power line artifacts. NeuroImage, 207, 116356.
  • Dien, J. (1998). Issues in the application of the average reference. Behavior Research Methods, Instruments, & Computers, 30(3), 449-457.
  • Hyvarinen, A. (1999). Fast and robust fixed-point algorithms for independent component analysis. IEEE Transactions on Neural Networks, 10(3), 626-634.
  • Keil, A., Debener, S., Gratton, G., et al. (2014). Committee report: Publication guidelines and recommendations for studies using electroencephalography and magnetoencephalography. Psychophysiology, 51(1), 1-21.
  • Lee, T. W., Girolami, M., & Sejnowski, T. J. (1999). Independent component analysis using an extended infomax algorithm for mixed subgaussian and supergaussian sources. Neural Computation, 11(2), 417-441.
  • Leske, S., & Dalal, S. S. (2019). Reducing power line noise in EEG and MEG data via spectrum interpolation. NeuroImage, 189, 763-776.
  • Luck, S. J. (2014). An Introduction to the Event-Related Potential Technique (2nd ed.). MIT Press.
  • Mullen, T. R., Kothe, C. A. E., Chi, Y. M., et al. (2015). Real-time neuroimaging and cognitive monitoring using wearable dry EEG. IEEE Transactions on Biomedical Engineering, 62(11), 2553-2567.
  • Muthukumaraswamy, S. D. (2013). High-frequency brain activity and muscle artifacts in MEG/EEG: A review and recommendations. Frontiers in Human Neuroscience, 7, 138.
  • Naatanen, R., Paavilainen, P., Rinne, T., & Alho, K. (2007). The mismatch negativity (MMN). Clinical Neurophysiology, 118(12), 2544-2590.
  • Onton, J., & Makeig, S. (2006). Information-based modeling of event-related brain dynamics. Progress in Brain Research, 159, 99-120.
  • Palmer, J. A., Kreutz-Delgado, K., & Makeig, S. (2012). AMICA: An adaptive mixture of independent component analyzers with shared components. Technical Report, Swartz Center for Computational Neuroscience.
  • Perrin, F., Pernier, J., Bertrand, O., & Echallier, J. F. (1989). Spherical splines for scalp potential and current density mapping. Electroencephalography and Clinical Neurophysiology, 72(2), 184-187.
  • Pion-Tonachini, L., Kreutz-Delgado, K., & Makeig, S. (2019). ICLabel: An automated electroencephalographic independent component classifier, dataset, and website. NeuroImage, 198, 181-197.
  • Tanner, D., Morgan-Short, K., & Luck, S. J. (2015). How inappropriate high-pass filters can produce artifactual effects and incorrect conclusions in ERP studies of language and cognition. Psychophysiology, 52(8), 997-1009.
  • Widmann, A., Schroger, E., & Maess, B. (2015). Digital filter design for electrophysiological data -- A practical approach. Journal of Neuroscience Methods, 250, 34-46.
  • Winkler, I., Debener, S., Muller, K. R., & Tangermann, M. (2015). On the influence of high-pass filtering on ICA-based artifact reduction in EEG-ERP. Proceedings of EMBC, 4101-4105.
  • Yao, D. (2001). A method to standardize a reference of scalp EEG recordings to a point at infinity. Physiological Measurement, 22(4), 693-711.

See references/ for step-by-step pipeline code templates and parameter lookup tables.
