clean-data


Data Profiling and Cleaning Skill

You are assisting a medical researcher with profiling and cleaning clinical datasets. This is a three-stage interactive workflow. You generate code and reports; you do NOT auto-clean data. Every cleaning decision requires explicit confirmation from the researcher.
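The "generate, don't auto-clean" rule can be made concrete by treating every suggested fix as an inert proposal until the researcher approves it. The sketch below is a hypothetical illustration, not code from this skill: `ProposedAction` and `apply_confirmed` are invented names, and rows are plain dicts to keep it self-contained. Unconfirmed steps are logged as skipped, and confirmed steps run on a copy so the raw data stays untouched.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class ProposedAction:
    """A cleaning step suggested by the assistant (hypothetical); inert until confirmed."""
    description: str
    column: str
    transform: Callable[[Any], Any]
    confirmed: bool = False  # only the researcher flips this to True


def apply_confirmed(rows, actions):
    """Apply confirmed actions to a copy of rows; return (cleaned_rows, audit_log)."""
    cleaned = [dict(row) for row in rows]  # work on a copy; raw rows are never modified
    log = []
    for action in actions:
        if not action.confirmed:
            log.append(f"SKIPPED (not confirmed): {action.description}")
            continue
        for row in cleaned:
            if action.column in row:
                row[action.column] = action.transform(row[action.column])
        log.append(f"APPLIED: {action.description}")
    return cleaned, log


# Usage: the assistant proposes, the researcher decides.
raw = [{"sex": "M"}, {"sex": "male"}, {"sex": "F"}]
actions = [
    ProposedAction("Recode 'male' as 'M' in sex", "sex", lambda v: "M" if v == "male" else v),
]
cleaned, log = apply_confirmed(raw, actions)  # unconfirmed: data unchanged, step logged as skipped
actions[0].confirmed = True                   # explicit researcher approval
cleaned, log = apply_confirmed(raw, actions)  # now the recode is applied to a copy
```

Keeping an audit log of both applied and skipped steps means the final cleaning script doubles as a record of which decisions the researcher actually made.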

Philosophy

This skill is a PROFILING AND FLAGGING ASSISTANT, not an automated data cleaner. Clinical data cleaning requires domain expertise that an LLM cannot replace. Every cleaning decision must be confirmed by the researcher.
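In practice, "profiling and flagging" means producing a report about the data rather than changing it. The following is a minimal sketch under stated assumptions: `profile`, `is_missing`, and the `PLAUSIBLE_RANGES` limits are hypothetical and purely illustrative, since real plausibility limits are a clinical judgment the researcher must supply. It counts missing values per column and flags out-of-range or non-numeric entries without modifying anything.

```python
import math

# Hypothetical plausibility limits for illustration only;
# real limits must come from the researcher and the study protocol.
PLAUSIBLE_RANGES = {
    "age_years": (0, 120),
    "sbp_mmhg": (50, 300),  # systolic blood pressure
}


def is_missing(value):
    """Treat None, NaN, and empty strings as missing."""
    if value is None or value == "":
        return True
    return isinstance(value, float) and math.isnan(value)


def profile(rows, ranges):
    """Count missing values per column and flag suspicious values.

    Returns a report only; the rows are never modified.
    """
    columns = sorted({key for row in rows for key in row})
    missing = {col: 0 for col in columns}
    flags = []  # (row_index, column, value, reason)
    for i, row in enumerate(rows):
        for col in columns:
            value = row.get(col)
            if is_missing(value):
                missing[col] += 1
            elif col in ranges:
                low, high = ranges[col]
                if isinstance(value, bool) or not isinstance(value, (int, float)):
                    flags.append((i, col, value, "non-numeric"))
                elif not low <= value <= high:
                    flags.append((i, col, value, f"outside {low}-{high}"))
    return {"missing": missing, "flags": flags}


rows = [
    {"age_years": 54, "sbp_mmhg": 130},
    {"age_years": 540, "sbp_mmhg": None},  # maybe a typo for 54, but only the researcher can say
    {"age_years": "unknown", "sbp_mmhg": 118},
]
report = profile(rows, PLAUSIBLE_RANGES)
```

Note that `"unknown"` is flagged rather than silently converted to missing: whether it is a missing-data code or an entry error is exactly the kind of decision left to the researcher.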

DATA PRIVACY WARNING

If your dataset contains Protected Health Information (PHI) or Personally Identifiable Information (PII), run /deidentify to remove it before proceeding. The deidentify skill provides a standalone Python script (no LLM involved) that scans for Korean SSNs, phone numbers, names, dates, and addresses, then anonymizes them once you confirm.
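A rule-based, LLM-free scan of this kind can be approximated with regular expressions. The sketch below is only an assumption about the general approach and is not the /deidentify script, whose patterns are not shown here. It covers just two identifier types, Korean resident registration numbers (the "Korean SSN") in hyphenated form and Korean mobile numbers. `PHI_PATTERNS` and `scan_phi` are hypothetical names. It reports candidate hits for review instead of redacting them.

```python
import re

# Illustrative patterns only; real de-identification needs broader,
# validated rules (names, dates, and addresses are not covered here).
PHI_PATTERNS = {
    # Korean resident registration number: YYMMDD-GNNNNNN, where G (1-4)
    # encodes sex and birth century for Korean nationals born 1900-2099.
    "korean_rrn": re.compile(r"\b\d{6}-[1-4]\d{6}\b"),
    # Korean mobile numbers such as 010-1234-5678 (hyphens optional).
    "korean_mobile": re.compile(r"\b01[016789]-?\d{3,4}-?\d{4}\b"),
}


def scan_phi(text):
    """Return (kind, matched_text, start_offset) for every candidate hit, in text order."""
    hits = []
    for kind, pattern in PHI_PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((kind, match.group(), match.start()))
    return sorted(hits, key=lambda hit: hit[2])


note = "Pt 850101-1234567, contact 010-1234-5678, seen 2024-03-05."
hits = scan_phi(note)  # candidates for the researcher to review, not automatic redaction
```

Returning offsets instead of rewriting the text keeps anonymization a separate, confirmed step, matching the confirm-before-changing approach described above.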

Installs: 21
GitHub Stars: 162
First Seen: Apr 22, 2026
clean-data — aperivue/medsci-skills