multi-source-data-merger
Multi Source Data Merger
Overview
This skill guides the process of merging data from multiple sources with different formats into a unified dataset. It covers reading heterogeneous file formats, applying field name mappings, resolving conflicts using priority ordering, and generating comprehensive output files including conflict reports.
Workflow
Step 1: Analyze Requirements and Source Files
Before writing any code, thoroughly understand the task:
- Identify all source files and their formats (JSON, CSV, Parquet, XML, etc.)
- Determine the merge key (e.g., user_id, record_id) that links records across sources
- Review field mapping requirements - source fields may have different names that map to common output fields
- Understand conflict resolution rules - typically based on source priority ordering
- Identify expected output formats and structure
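One way to make this analysis concrete is to write the requirements down as data before writing any merge logic. The sketch below is illustrative only: `MergeSpec`, `normalize`, `merge_records`, and the source names `a` and `b` are hypothetical names, not part of this skill. It assumes records are already loaded as lists of dicts and that the highest-priority source is listed first.

```python
from dataclasses import dataclass, field


@dataclass
class MergeSpec:
    """Hypothetical container for the requirements gathered in Step 1."""
    merge_key: str                      # output field that links records
    source_priority: list               # source names, highest priority first
    field_mappings: dict = field(default_factory=dict)  # source -> {source field: output field}


def normalize(record, source, spec):
    """Rename a record's fields to the common output names."""
    mapping = spec.field_mappings.get(source, {})
    return {mapping.get(name, name): value for name, value in record.items()}


def merge_records(records_by_source, spec):
    """Merge records by key; the first (highest-priority) value for a field wins."""
    merged = {}
    origin = {}      # (key, field) -> source that supplied the kept value
    conflicts = []
    for source in spec.source_priority:
        for raw in records_by_source.get(source, []):
            rec = normalize(raw, source, spec)
            key = rec[spec.merge_key]
            target = merged.setdefault(key, {})
            for name, value in rec.items():
                if name not in target:
                    target[name] = value
                    origin[(key, name)] = source
                elif target[name] != value:
                    # A lower-priority source disagrees: keep the existing value, log it.
                    conflicts.append({
                        "key": key,
                        "field": name,
                        "kept": target[name],
                        "kept_from": origin[(key, name)],
                        "rejected": value,
                        "rejected_from": source,
                    })
    return merged, conflicts


spec = MergeSpec(
    merge_key="user_id",
    source_priority=["a", "b"],
    field_mappings={"b": {"id": "user_id", "mail": "email"}},
)
records = {
    "a": [{"user_id": 1, "email": "x@example.com"}],
    "b": [{"id": 1, "mail": "y@example.com", "name": "Ann"}],
}
merged, conflicts = merge_records(records, spec)
```

Here source `a` outranks `b`, so the email from `a` is kept, the name that only `b` supplies is filled in, and the disagreement over the email is recorded for the conflict report.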
Important: Do not attempt to read binary formats (Parquet, Excel, etc.) as text files - use appropriate libraries.
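As a rough sketch of what "appropriate libraries" can look like in practice, the reader below dispatches on file extension. The function name `read_source` is hypothetical, and the choice of pandas for Parquet and Excel is an assumption rather than a requirement of this skill; pandas needs pyarrow or fastparquet installed to read Parquet, and an Excel engine such as openpyxl to read .xlsx files.

```python
import csv
import json
from pathlib import Path


def read_source(path):
    """Load one source file into a list of dict records, by extension."""
    path = Path(path)
    suffix = path.suffix.lower()

    if suffix == ".json":
        with path.open(encoding="utf-8") as fh:
            data = json.load(fh)
        # Accept either a list of records or a single record object.
        return data if isinstance(data, list) else [data]

    if suffix == ".csv":
        # csv.DictReader yields every value as a string; convert types later.
        with path.open(newline="", encoding="utf-8") as fh:
            return list(csv.DictReader(fh))

    if suffix == ".parquet":
        # Binary format: go through a Parquet-aware library, never open() as text.
        import pandas as pd  # assumption: pandas plus pyarrow or fastparquet
        return pd.read_parquet(path).to_dict(orient="records")

    if suffix in {".xlsx", ".xls"}:
        import pandas as pd  # assumption: pandas plus an Excel engine
        return pd.read_excel(path).to_dict(orient="records")

    raise ValueError(f"Unsupported source format: {suffix or path.name}")
```

Importing pandas only inside the binary-format branches keeps the JSON and CSV paths usable when pandas is not installed. Unknown extensions raise an error instead of falling back to a text read.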