ML Data Preprocessing
Overview
Use this skill to define preprocessing that improves model quality without introducing leakage or unreproducible transforms.
Scope Boundaries
- Use this skill when the task matches the trigger condition described in description.
- Do not use this skill when the primary task falls outside this skill's domain.
Shared References
- Leakage prevention rules: references/leakage-prevention-rules.md
Templates And Assets
- Preprocessing spec template: assets/preprocessing-spec-template.md
Inputs To Gather
- Source datasets, schema quality, and time boundaries.
- Missing/outlier characteristics and domain constraints.
- Train/validation/test split policy.
- Reproducibility and compliance requirements.
Deliverables
- Preprocessing specification with transformation rationale.
- Leakage and data-quality validation plan.
- Reproducibility notes and versioning requirements.
Workflow
- Draft transform plan with assets/preprocessing-spec-template.md.
- Validate temporal and label safety via references/leakage-prevention-rules.md.
- Define split-safe transformations and quality checks.
- Verify transform repeatability across runs.
- Publish preprocessing contract and residual risks.
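One way to read "split-safe" in the workflow above is that every learned statistic comes from the training split alone and is then reused unchanged on other splits. The sketch below illustrates that idea with the standard library only. The function names `fit_standardizer` and `apply_standardizer` and the toy data are hypothetical and are not taken from the skill's template or reference files.

```python
from statistics import mean, pstdev


def fit_standardizer(train_values):
    """Learn imputation and scaling statistics from the training split only."""
    observed = [v for v in train_values if v is not None]
    mu = mean(observed)
    # Fall back to 1.0 so a constant column does not cause division by zero.
    sigma = pstdev(observed) or 1.0
    return {"mean": mu, "std": sigma}


def apply_standardizer(values, params):
    """Impute missing values with the training mean, then scale.

    No statistic is recomputed here, so validation and test rows never
    influence the parameters.
    """
    mu, sigma = params["mean"], params["std"]
    return [((mu if v is None else v) - mu) / sigma for v in values]


# Toy splits (illustrative data only).
train = [1.0, 2.0, None, 3.0]
test = [10.0, None]

params = fit_standardizer(train)
train_scaled = apply_standardizer(train, params)
test_scaled = apply_standardizer(test, params)

# Repeatability check: refitting on the same training data and reapplying
# should reproduce the same output exactly.
rerun = apply_standardizer(test, fit_standardizer(train))
assert rerun == test_scaled
```

Keeping the fitted parameters in a plain dictionary makes them easy to store alongside the preprocessing contract, so the same values can be reused at inference time.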
Quality Standard
- Transformations are deterministic and versioned.
- Leakage risk is explicitly checked and mitigated.
- Data loss/quality trade-offs are documented.
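As a rough illustration of "deterministic and versioned", a preprocessing spec can be serialized canonically and hashed, so that any change to a step or parameter yields a different identifier. This is a minimal sketch under assumed names: the spec layout, the `spec_fingerprint` helper, and the field names are hypothetical and do not come from assets/preprocessing-spec-template.md.

```python
import hashlib
import json


def spec_fingerprint(spec: dict) -> str:
    """Return a SHA-256 hex digest of a canonical JSON form of the spec.

    Sorting keys and fixing separators makes the digest independent of
    dictionary insertion order.
    """
    canonical = json.dumps(spec, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical spec: field names are illustrative only.
spec_v1 = {
    "version": "1.0.0",
    "random_seed": 42,
    "steps": [
        {"op": "impute_mean", "column": "age"},
        {"op": "standardize", "column": "age"},
    ],
}

# Same content, different key order: the fingerprint should not change.
spec_v1_reordered = {
    "steps": [
        {"column": "age", "op": "impute_mean"},
        {"column": "age", "op": "standardize"},
    ],
    "random_seed": 42,
    "version": "1.0.0",
}

# A changed parameter should produce a different fingerprint.
spec_v2 = {**spec_v1, "random_seed": 7}

fp_v1 = spec_fingerprint(spec_v1)
fp_v1_reordered = spec_fingerprint(spec_v1_reordered)
fp_v2 = spec_fingerprint(spec_v2)
```

Recording the fingerprint next to each trained model is one possible way to tie a model back to the exact transform definition that produced its inputs. Note that the order of entries in `steps` is still significant, which matches the fact that transform order usually matters.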
Failure Conditions
- Stop when preprocessing introduces label or temporal leakage.
- Stop when transforms are not reproducible.
- Escalate when data quality blocks decision-grade training.
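The stop conditions above could be enforced as hard checks that raise before any training starts. The sketch below assumes a strictly time-ordered split and a known list of label-derived columns. `LeakageError`, `check_temporal_split`, `check_label_leakage`, and the column names are hypothetical, and the actual rules live in references/leakage-prevention-rules.md, which this example does not reproduce.

```python
from datetime import datetime


class LeakageError(Exception):
    """Raised when a preprocessing plan would leak label or future information."""


def check_temporal_split(train_times, test_times):
    """Stop if any training row is at or after the earliest test row."""
    if not train_times or not test_times:
        raise ValueError("both splits must contain at least one timestamp")
    if max(train_times) >= min(test_times):
        raise LeakageError("training data overlaps the test period")


def check_label_leakage(feature_columns, label_column, label_derived=()):
    """Stop if the label, or a column derived from it, is used as a feature."""
    forbidden = {label_column, *label_derived}
    leaked = sorted(set(feature_columns) & forbidden)
    if leaked:
        raise LeakageError(f"label-derived columns used as features: {leaked}")


# Illustrative data only.
train_times = [datetime(2024, 1, 1), datetime(2024, 1, 31)]
test_times = [datetime(2024, 2, 1), datetime(2024, 2, 15)]

# A clean split and feature list pass silently.
check_temporal_split(train_times, test_times)
check_label_leakage(["age", "tenure_days"], label_column="churned")

# An overlapping split is rejected.
try:
    check_temporal_split(train_times + [datetime(2024, 2, 10)], test_times)
    temporal_stop_triggered = False
except LeakageError:
    temporal_stop_triggered = True
```

Raising an exception rather than logging a warning reflects the "stop" wording in the list: the pipeline halts until the leak is resolved.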
Repository
kentoshimizu/sw…t-skills