dataset-curator
Dataset Curator
This skill ensures that the data you feed to your AI is clean, accurate, and safe.
Capabilities
1. Data Cleaning & Structuring
- Removes duplicates, boilerplate, and noisy text from knowledge bases.
- Converts unstructured documents into clean Markdown or JSON/Vector-friendly formats.
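As a rough illustration of the cleaning step, the sketch below removes duplicate documents and wraps the survivors in JSON-friendly records. The function names (`normalize`, `dedupe`, `to_records`) and the record shape are assumptions made for this example, not the skill's actual implementation.

```python
import hashlib
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return re.sub(r"\s+", " ", text).strip().lower()


def dedupe(documents: list[str]) -> list[str]:
    """Keep the first occurrence of each document, comparing normalized content."""
    seen: set[str] = set()
    unique: list[str] = []
    for doc in documents:
        digest = hashlib.sha256(normalize(doc).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique


def to_records(documents: list[str]) -> list[dict]:
    """Wrap each document in a JSON-serializable record (hypothetical shape)."""
    return [{"id": i, "text": doc.strip()} for i, doc in enumerate(documents)]


docs = ["Hello  world", "hello world ", "Other"]
records = to_records(dedupe(docs))
```

Hashing the normalized text catches only exact duplicates after whitespace and case folding; near-duplicate detection would need a fuzzier comparison.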
2. Privacy Audit
- Scans datasets for PII (Personally Identifiable Information) before they are sent to LLMs or vector databases.
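A privacy audit of this kind can be approximated with pattern matching. The following is a minimal sketch that assumes only two PII categories (email addresses and US-style phone numbers); the patterns, the `scan_for_pii` name, and the finding format are illustrative assumptions, and a real audit would likely need broader coverage.

```python
import re

# Hypothetical pattern set; real audits usually cover many more categories.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}


def scan_for_pii(records: list[str]) -> list[dict]:
    """Return one finding per match, noting the record index and PII type."""
    findings = []
    for index, text in enumerate(records):
        for kind, pattern in PII_PATTERNS.items():
            for match in pattern.finditer(text):
                findings.append(
                    {"record": index, "type": kind, "value": match.group()}
                )
    return findings


feedback = [
    "Great product!",
    "Contact me at jane@example.com or 555-123-4567.",
]
findings = scan_for_pii(feedback)
```

Records with findings can then be redacted or excluded before the dataset is passed to an LLM or a vector database.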
Usage
- "Clean up the knowledge/ directory and structure it for better RAG performance."
- "Audit this customer feedback dataset for sensitive info before we use it for AI training."
Knowledge Protocol
- This skill adheres to the knowledge/orchestration/knowledge-protocol.md. It automatically integrates Public, Confidential (Company/Client), and Personal knowledge tiers, prioritizing the most specific secrets while ensuring no leaks to public outputs.
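The contents of the protocol file are not shown here, so the following is only a sketch of how tier prioritization might behave. It assumes that Personal is more specific than Confidential, which is more specific than Public, and that public outputs may draw on Public entries only. The `Tier` enum and `resolve` function are hypothetical names.

```python
from enum import IntEnum
from typing import Optional


class Tier(IntEnum):
    # Assumed ordering: a higher value means a more specific tier.
    PUBLIC = 0
    CONFIDENTIAL = 1
    PERSONAL = 2


def resolve(entries: dict[Tier, str], public_output: bool) -> Optional[str]:
    """Pick the most specific entry that the output is allowed to use."""
    allowed = [
        tier for tier in entries
        if not public_output or tier is Tier.PUBLIC
    ]
    if not allowed:
        return None  # Nothing can be used without leaking a non-public tier.
    return entries[max(allowed)]


entries = {Tier.PUBLIC: "public note", Tier.PERSONAL: "personal note"}
internal = resolve(entries, public_output=False)
external = resolve(entries, public_output=True)
```

Under these assumptions, internal use prefers the Personal entry while a public output falls back to the Public one.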