ai-training-data-class

Sensitive Data Classification for AI/ML Training Datasets

Overview

AI and machine learning models trained on personal data raise distinct classification challenges. Training data may contain direct personal data, inferred special categories, proxy variables for protected characteristics, and data whose consent scope does not extend to model training. The EU AI Act (Regulation (EU) 2024/1689) imposes additional requirements for high-risk AI systems, including data governance obligations under Art. 10 that intersect with GDPR classification requirements. This skill provides a framework for classifying training data, detecting bias-relevant features, documenting data provenance, and verifying consent coverage.
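The four classification concerns named above can be sketched as a per-column tagging step. This is a minimal illustration, not the skill's actual implementation: the `Column` fields, the `Risk` labels, and the `"model_training"` purpose string are all hypothetical names chosen for the example, and the flags are assumed to come from a prior manual or automated review.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional, Set


class Risk(Enum):
    # Labels mirror the four concerns listed in the overview (names are illustrative).
    DIRECT_PERSONAL = "direct personal data"
    SPECIAL_CATEGORY = "special category, stated or inferable"
    PROXY = "proxy for a protected characteristic"
    CONSENT_GAP = "consent scope does not cover model training"


@dataclass
class Column:
    # Hypothetical metadata recorded for one feature in a training dataset.
    name: str
    identifies_person: bool = False
    reveals_special_category: bool = False
    proxy_for: Optional[str] = None          # e.g. "ethnicity" for a postcode
    consented_purposes: Set[str] = field(default_factory=set)


def classify(col: Column, purpose: str = "model_training") -> Set[Risk]:
    """Return every risk label that applies to the column."""
    risks: Set[Risk] = set()
    if col.identifies_person:
        risks.add(Risk.DIRECT_PERSONAL)
    if col.reveals_special_category:
        risks.add(Risk.SPECIAL_CATEGORY)
    if col.proxy_for is not None:
        risks.add(Risk.PROXY)
    # Assumes consent is the basis being checked; other lawful bases need their own test.
    if purpose not in col.consented_purposes:
        risks.add(Risk.CONSENT_GAP)
    return risks


postcode = Column(
    name="postcode",
    proxy_for="ethnicity",
    consented_purposes={"customer_service"},
)
labels = classify(postcode)
```

A column can carry several labels at once, which is why the function returns a set rather than a single class: a postcode collected for customer service is both a proxy variable and outside the original consent scope.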

GDPR and AI Act Intersection

GDPR Requirements for Training Data

| GDPR Article | Application to AI Training |
|---|---|
| Art. 5(1)(b) - Purpose limitation | Training a model is a distinct processing purpose; if data was collected for customer service, using it for ML training requires a compatibility assessment or a new lawful basis |
| Art. 5(1)(c) - Data minimisation | Training datasets must not include more personal data than is necessary for the model objective |
| Art. 6 - Lawful basis | Model training requires its own lawful basis; legitimate interests (Art. 6(1)(f)) is commonly relied on, but requires a documented legitimate interests assessment (LIA) |
| Art. 9 - Special categories | If training data contains special category data or enables its inference, an Art. 9(2) condition is required |
| Art. 22 - Automated decision-making | If the trained model makes solely automated decisions with legal or similarly significant effects, additional safeguards apply |
| Art. 25 - Data protection by design | Classifying training data is a by-design measure that enables appropriate technical protections |
| Art. 35 - DPIA | High-risk AI processing (profiling, automated decision-making) requires a DPIA |
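Several rows of the table reduce to yes/no checks that can be recorded per dataset. The sketch below shows one way to turn them into a list of open issues. It is an assumption-laden illustration: `DatasetRecord`, its field names, and the `"legitimate_interests"` and `"model_training"` strings are hypothetical, and the data minimisation and by-design rows are left out because they need human judgement rather than a flag.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DatasetRecord:
    # Hypothetical compliance metadata for one training dataset.
    collection_purpose: str
    compatibility_assessed: bool = False
    training_lawful_basis: Optional[str] = None
    lia_documented: bool = False
    special_category_present_or_inferable: bool = False
    art9_condition: Optional[str] = None
    decisions_with_legal_or_significant_effect: bool = False
    art22_safeguards_in_place: bool = False
    high_risk_processing: bool = False
    dpia_completed: bool = False


def open_issues(r: DatasetRecord) -> List[str]:
    """List the table rows whose requirement is not yet evidenced."""
    issues: List[str] = []
    # Purpose limitation: repurposed data needs a compatibility assessment or a new basis.
    if (r.collection_purpose != "model_training"
            and not r.compatibility_assessed
            and r.training_lawful_basis is None):
        issues.append("Art. 5(1)(b): no compatibility assessment or new lawful basis")
    # Lawful basis: training needs its own; legitimate interests needs an LIA.
    if r.training_lawful_basis is None:
        issues.append("Art. 6: no lawful basis recorded for training")
    elif r.training_lawful_basis == "legitimate_interests" and not r.lia_documented:
        issues.append("Art. 6(1)(f): LIA not documented")
    if r.special_category_present_or_inferable and r.art9_condition is None:
        issues.append("Art. 9: no Art. 9(2) condition recorded")
    if r.decisions_with_legal_or_significant_effect and not r.art22_safeguards_in_place:
        issues.append("Art. 22: safeguards not in place")
    if r.high_risk_processing and not r.dpia_completed:
        issues.append("Art. 35: DPIA not completed")
    return issues


record = DatasetRecord(
    collection_purpose="customer_service",
    training_lawful_basis="legitimate_interests",
    special_category_present_or_inferable=True,
    high_risk_processing=True,
    dpia_completed=True,
)
issues = open_issues(record)
```

An empty result only means the recorded flags are complete; it does not show that the underlying assessments are sound.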