ai-cleaning-data

Use DSPy to normalize and fix messy data fields at scale. The core pattern (messy field value + field type/context → cleaned value + confidence) lets you handle inconsistent addresses, company names, dates, phone numbers, and free-text fields without writing a rule for every edge case.

The most effective approach: sample anomalies first, infer normalization rules, then apply deterministically where possible and use the LM only for ambiguous cases.
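The rule-first, LM-last routing described above can be sketched roughly as follows. This is a minimal illustration, not the skill's own implementation: the US phone target format, the function names, the `fallback` callable (a stand-in for an LM call), and the fixed confidence of 1.0 for rule hits are all assumptions made for the example.

```python
import re
from typing import Callable, Optional, Tuple

# Hypothetical fallback type: takes (messy_value, target_format) and returns
# (cleaned_value, confidence). In a real pipeline this would wrap an LM call.
Fallback = Callable[[str, str], Tuple[str, float]]

# Assumed target format for this example only.
US_PHONE_FORMAT = "+1XXXXXXXXXX"


def clean_phone_by_rule(value: str) -> Optional[str]:
    """Deterministic rule: succeed only on unambiguous 10- or 11-digit US numbers."""
    if re.search(r"[A-Za-z]", value):
        return None  # letters (extensions, vanity numbers) are treated as ambiguous
    digits = re.sub(r"\D", "", value)
    if len(digits) == 10:
        return "+1" + digits
    if len(digits) == 11 and digits.startswith("1"):
        return "+" + digits
    return None  # anything else is left for the fallback


def clean_phone(value: str, fallback: Fallback) -> Tuple[str, float, str]:
    """Return (cleaned_value, confidence, source), trying the rule before the fallback."""
    ruled = clean_phone_by_rule(value)
    if ruled is not None:
        return ruled, 1.0, "rule"
    cleaned, confidence = fallback(value, US_PHONE_FORMAT)
    return cleaned, confidence, "lm"


def stub_fallback(value: str, target_format: str) -> Tuple[str, float]:
    """Stand-in for an LM call: returns the value unchanged with zero confidence."""
    return value, 0.0
```

Recording the source ("rule" or "lm") next to each cleaned value makes it easy to review only the rows the LM touched.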

Step 1 - Understand the Cleaning Task

Before writing code, clarify:

What fields need cleaning? (addresses, phone numbers, dates, company names, free text?)
What inconsistencies exist? (typos, format variations, abbreviations, mixed languages?)
What is the target format? Always define this explicitly; otherwise the LM improvises.
How many rows? This determines whether to call the LM on every row or to infer rules once and apply them deterministically.
Is there a gold standard? Even 50 manually cleaned examples make optimization possible.
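One way to answer the "what inconsistencies exist" question before writing any cleaner is to profile the layouts present in a column. The sketch below is only an illustration of that idea: the `shape` and `profile` helpers and the sample dates are invented for the example.

```python
import re
from collections import Counter


def shape(value: str) -> str:
    """Mask digits as 9 and letters as A so values with the same layout collapse together."""
    masked = re.sub(r"\d", "9", value.strip())
    return re.sub(r"[A-Za-z]", "A", masked)


def profile(values):
    """Count how often each layout occurs, most common first."""
    return Counter(shape(v) for v in values).most_common()


# Illustrative sample data, not taken from any real dataset.
dates = ["2024-01-05", "2023-12-31", "01/05/2024", "Jan 5, 2024", " 2024-02-10 "]
report = profile(dates)
for layout, count in report:
    print(f"{count:>3}  {layout}")
```

The dominant layout is a natural candidate for the target format, and the rare layouts are the anomalies worth sampling by hand.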

Step 2 - Build a Single-Field Cleaner

Start with one field type. The signature takes the messy value plus explicit format instructions.

Repository: lebsral/dspy-programming-not-prompting-lms-skills
