ai-training-lawfulness

Lawful Basis for AI Training Data

Overview

Processing personal data to train an AI model is a distinct processing operation that requires its own lawful basis under GDPR Art. 6(1). The EDPB Guidelines 04/2025 and the coordinated ChatGPT Taskforce findings establish that AI training raises distinct lawful basis challenges: the scale of data collection, the difficulty of obtaining meaningful consent for open-ended training purposes, the tension between legitimate interests and data subject expectations, and the complexity of establishing lawfulness for web-scraped and third-party datasets. This skill provides a lawful basis assessment framework for AI training data processing and addresses each Art. 6(1) basis as applied to ML training contexts.
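As a rough illustration of the point that training needs its own documented basis, the sketch below models a processing register in which each operation carries one of the six Art. 6(1) bases. The class names, field names, and the `operations_missing_basis` helper are hypothetical and are not part of this skill or of any official tooling.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class LawfulBasis(Enum):
    """The six lawful bases listed in GDPR Art. 6(1)(a)-(f)."""
    CONSENT = "6(1)(a)"
    CONTRACT = "6(1)(b)"
    LEGAL_OBLIGATION = "6(1)(c)"
    VITAL_INTERESTS = "6(1)(d)"
    PUBLIC_TASK = "6(1)(e)"
    LEGITIMATE_INTERESTS = "6(1)(f)"


@dataclass
class ProcessingOperation:
    """Hypothetical register entry for one processing operation."""
    name: str
    lawful_basis: Optional[LawfulBasis] = None
    justification: str = ""


def operations_missing_basis(operations: List[ProcessingOperation]) -> List[str]:
    """Return the names of operations that lack a basis or a written justification."""
    return [
        op.name
        for op in operations
        if op.lawful_basis is None or not op.justification.strip()
    ]


# Example register: the basis for service delivery does not carry over
# to model training, which is recorded as a separate operation.
register = [
    ProcessingOperation(
        "service delivery",
        LawfulBasis.CONTRACT,
        "Needed to provide the service the user signed up for",
    ),
    ProcessingOperation("model training"),
]
print(operations_missing_basis(register))  # ['model training']
```

Recording training as a separate entry makes a missing basis visible instead of letting it hide behind the basis chosen for the original collection.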

Fundamental Principles

AI Training as Personal Data Processing

The EDPB has confirmed that AI model training constitutes processing of personal data under Art. 4(2) GDPR when:

  1. Training datasets contain personal data (information relating to directly or indirectly identifiable natural persons)
  2. The model is trained on data that includes personal data, even if the intent is only to learn general patterns
  3. The resulting model retains the capability to generate or reproduce personal data from its training set
  4. Personal data is used at any pipeline stage: collection, cleaning, annotation, augmentation, validation, or testing
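The four conditions above can be sketched as a simple screening check. This is a minimal illustration, assuming that any single condition is enough to bring training within scope; the `TrainingFacts` record and both function names are hypothetical, and the result is a prompt for legal review, not a legal determination.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Pipeline stages named in condition 4.
PIPELINE_STAGES = (
    "collection", "cleaning", "annotation",
    "augmentation", "validation", "testing",
)


@dataclass
class TrainingFacts:
    """Hypothetical intake record; one field per condition."""
    dataset_contains_personal_data: bool = False      # condition 1
    trained_on_personal_data: bool = False            # condition 2
    model_can_reproduce_personal_data: bool = False   # condition 3
    stages_using_personal_data: Tuple[str, ...] = ()  # condition 4


def triggered_conditions(facts: TrainingFacts) -> List[int]:
    """Return the numbers (1-4) of the conditions that the facts satisfy."""
    unknown = set(facts.stages_using_personal_data) - set(PIPELINE_STAGES)
    if unknown:
        raise ValueError(f"Unknown pipeline stage(s): {sorted(unknown)}")
    checks = [
        facts.dataset_contains_personal_data,
        facts.trained_on_personal_data,
        facts.model_can_reproduce_personal_data,
        bool(facts.stages_using_personal_data),
    ]
    return [number for number, hit in enumerate(checks, start=1) if hit]


def is_personal_data_processing(facts: TrainingFacts) -> bool:
    """Assumption: any one triggered condition puts training in scope."""
    return bool(triggered_conditions(facts))


# The model cannot reproduce records, yet training is still flagged.
facts = TrainingFacts(
    trained_on_personal_data=True,
    stages_using_personal_data=("annotation",),
)
print(triggered_conditions(facts))         # [2, 4]
print(is_personal_data_processing(facts))  # True
```

In the example, `model_can_reproduce_personal_data` stays `False` and the check still flags the training run, because the other conditions do not depend on what the finished model can output.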

The controller cannot avoid GDPR obligations by claiming the model has "learned" rather than "stored" personal data. The processing occurs at the point of training, regardless of whether the model can later reproduce specific records.
