data-dictionary
Data Dictionary: Automated Maintenance & Enhancement
Automate the update and maintenance of data dictionaries in Treasure Data, with 80-90% of column descriptions generated automatically for you to review, edit, and enhance. From schema extraction through TD write-back, this skill handles the heavy lifting while keeping you in control.
Usage
/data-dictionary [segment-name]
Example:
/data-dictionary customer_360
If no segment name is provided, the skill lists the available Parent Segments.
Workflow Overview
This skill orchestrates 6 sequential stages:
- Context Gathering - Collect business context (industry, company, use cases)
- Schema Extraction - Extract schema from TD Parent Segment with sample data
- Description Generation - Generate AI-powered column descriptions
- Review Export - Export to CSV for human review
- Validation - Validate edited CSV and import back to JSON
- Write-back - Publish descriptions to Treasure Data
Each stage creates artifacts used by subsequent stages, and the workflow supports resuming from partial runs.
Stage 1: Context Gathering
Purpose: Collect business context to inform description generation.
Process: Ask 4 context questions ONE at a time (conversational, not all at once):
- "What industry is this data from?" (e.g., e-commerce, SaaS, healthcare)
- "Describe your company/business model briefly"
- "What is this segment used for?" (default: customer segmentation)
- "What are the primary use cases?" (e.g., marketing campaigns, analytics, personalization)
Output: Context stored in memory for Stage 3.
Progress:
✅ Stage 1 complete: Context gathered
➡️ Next: Stage 2 (Schema Extraction)
Stage 2: Schema Extraction
Purpose: Extract schema from TD Parent Segment with sample data for PII detection.
Command:
node src/index.js extract $SEGMENT --sample
Where $SEGMENT is the segment name from $ARGUMENTS[0].
Output:
✅ Stage 2 complete: Schema extracted
📁 Artifacts created:
- ./schemas/{segment}.json
➡️ Next: Stage 3 (Description Generation)
Error Handling: If extraction fails:
- Display error message
- Offer options: [R] Retry | [E] Exit
- If user chooses Retry, re-run extraction command
- If user chooses Exit, stop workflow
Resume Detection:
If ./schemas/{segment}.json already exists:
- Ask: "Schema file found. Resume from Stage 3 or start fresh?"
- If resume: skip to Stage 3
- If fresh: delete schema file and re-run extraction
Stage 3: Description Generation
Purpose: Generate AI-powered column descriptions using Claude.
API Key Detection: Before running generation, check for API key:
if [ -z "$ANTHROPIC_API_KEY" ]; then
echo "API key not found. Using Claude Code mode..."
USE_CLAUDE_CODE=true
else
echo "API key detected. Using API mode..."
USE_CLAUDE_CODE=false
fi
Command (API Mode):
node src/index.js generate $SEGMENT
Command (Claude Code Mode):
node src/index.js generate $SEGMENT --claude-code
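Tying the detection to the command choice, as a bash sketch:
# Choose the generate command based on the mode detected above.
if [ "$USE_CLAUDE_CODE" = true ]; then
  node src/index.js generate "$SEGMENT" --claude-code
else
  node src/index.js generate "$SEGMENT"
fi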
In Claude Code mode, the CLI will print instructions for manual generation. Follow those instructions:
- Read schema file shown in output
- Use context gathered in Stage 1
- Generate descriptions following format specified
- Save to ./descriptions/{segment}-descriptions.json
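The CLI's printed instructions define the exact format; purely as a hypothetical sketch of the kind of structure Stage 3 produces (the field names below are assumptions, not the tool's actual schema):
# Hypothetical illustration only; follow the format the CLI actually prints.
cat > ./descriptions/customer_360-descriptions.json <<'EOF'
{
  "segment": "customer_360",
  "tables": [
    {
      "type": "master",
      "database": "db_customer_360",
      "table_name": "customer_master",
      "columns": [
        { "name": "email", "type": "string", "description": "Customer email address", "is_pii": true }
      ]
    }
  ]
}
EOF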
Output:
✅ Stage 3 complete: Descriptions generated
📁 Artifacts created:
- ./descriptions/{segment}-descriptions.json
➡️ Next: Stage 4 (Review Export)
Error Handling: If generation fails:
- Display error message
- Offer options: [R] Retry | [C] Claude Code mode | [E] Exit
- If user chooses Retry, re-run with same mode
- If user chooses Claude Code, re-run with --claude-code flag
- If user chooses Exit, stop workflow
Resume Detection:
If ./descriptions/{segment}-descriptions.json already exists:
- Ask: "Descriptions file found. Resume from Stage 4 or regenerate?"
- If resume: skip to Stage 4
- If regenerate: delete descriptions file and re-run generation
Stage 4: Review Export
Purpose: Export descriptions to CSV for human review and editing.
Command:
node src/index.js review $SEGMENT
This command will:
- Export descriptions to ./reviews/{segment}-descriptions.csv
- Open CSV file in default spreadsheet application
- Wait for user to review and edit
CSV Format:
6 columns: table, column, type, source, description, is_pii
- table = table type (attribute, behavior, master)
- column = column name
- type = data type (may be empty; this is optional and does not affect write-back)
- source = source database
- description = AI-generated description (editable)
- is_pii = PII flag (TRUE/FALSE)
Edit the description and is_pii columns as needed. Do NOT modify other columns (they are validated as immutable).
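For illustration, the review CSV might contain rows like these (values invented):
table,column,type,source,description,is_pii
master,email,string,db_customer_360,Primary email address for the customer,TRUE
attribute,lifetime_value,double,db_customer_360,Total purchase amount across all orders,FALSE
behavior,page_views,long,db_customer_360,Count of pages viewed per session,FALSE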
Output:
✅ Stage 4 complete: CSV exported and opened for review
📁 Artifacts created:
- ./reviews/{segment}-descriptions.csv
Please review and edit the CSV file, then save and close it.
Type "done" when ready to continue.
➡️ Next: Stage 5 (Validation)
Error Handling: If export fails:
- Display error message
- Offer options: [R] Retry | [E] Exit
- If user chooses Retry, re-run export command
- If user chooses Exit, stop workflow
If file opening fails (headless system, no default app):
- Log warning but continue
- Display CSV path for manual opening
- Proceed to "Type done when ready" prompt
Resume Detection:
If ./reviews/{segment}-descriptions.csv already exists:
- Ask: "CSV file found. Resume from Stage 5 or re-export?"
- If resume: skip to Stage 5
- If re-export: delete CSV and re-run export
Stage 5: Validation
Purpose: Validate edited CSV and import back to JSON format.
Command:
node src/index.js validate $SEGMENT
This command will:
- Parse CSV file
- Validate structure, required fields, types, immutable fields
- Check for schema drift (columns in CSV not in original schema)
- Display validation summary with segment/table/column counts
- Prompt for confirmation before proceeding to write-back
Output (Success):
✅ Stage 5 complete: Validation passed
📁 Validation summary:
- Segments: 1
- Tables: 3
- Columns: 45
- Status: Ready for write-back
Proceed with write-back? (y/n)
➡️ Next: Stage 6 (Write-back)
Output (Errors Found):
❌ Stage 5 failed: Validation errors found
📁 Error log created:
- ./reviews/{segment}-descriptions-errors.csv
Please fix errors in CSV and retry validation.
Options: [R] Retry validation | [E] Exit
Error Handling: If validation fails:
- Display error count and error log path
- Errors saved to {segment}-descriptions-errors.csv with row/column/issue details
- Offer options: [R] Retry (after user fixes CSV) | [E] Exit
- If user chooses Retry, re-run validation command
- If user chooses Exit, stop workflow
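The error log's exact layout is defined by the CLI; a hypothetical illustration of the row/column/issue details it records:
row,column,issue
12,description,Required field is empty
27,table,Immutable field was modified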
Confirmation Gate: After successful validation, require explicit confirmation:
- Display segment/table/column counts
- Ask: "Proceed with write-back? (y/n)"
- If no: stop workflow at Stage 5 (user can manually run writeback later)
- If yes: continue to Stage 6
Stage 6: Write-back
Purpose: Publish descriptions to Treasure Data source tables using the /v3/table/update-schema API. Once written, the CDP propagates these descriptions to Parent Segment output tables when the Audience workflow runs. The Knowledge Base tool then returns them as comments, enabling Audience Agents to understand column semantics.
API Endpoint: POST /v3/table/update-schema/{database}/{table}
Schema Format: 4-element arrays: [name, type, alias, description]
- Element 1: column name
- Element 2: column type (e.g., string, long, int)
- Element 3: alias (set to null to preserve existing)
- Element 4: free-text description (no character limit)
Important: Do NOT use /v3/table/update — that endpoint's 3rd element (alias) only accepts SQL identifiers (lowercase, digits, underscore). The /v3/table/update-schema endpoint supports a 4th element for free-text descriptions.
Auth: Uses TD_API_KEY env var, or auto-detects TDX_API_KEY__* env vars. EU endpoint is auto-detected from key naming (e.g., TDX_API_KEY__TDX_STUDIO_EU01_256 → api.eu01.treasuredata.com).
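The writeback command performs this call for you; purely as a sketch of the request shape (the exact parameter encoding the CLI uses is an assumption here):
# Sketch only: set a free-text description on one column of one table.
# Assumes TD's standard "TD1 <key>" Authorization header and a JSON-encoded
# schema form parameter; the writeback command handles all of this.
curl -X POST "https://api.treasuredata.com/v3/table/update-schema/db_customer_360/customer_master" \
  -H "Authorization: TD1 ${TD_API_KEY}" \
  --data-urlencode 'schema=[["email","string",null,"Customer email address"]]'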
Table Name Resolution: The review CSV table column contains table types (attribute, behavior, master), not actual TD table names. The writeback command loads the descriptions JSON to resolve actual (database, table_name) pairs.
Dry-run Preview (Required): Before the actual write-back, run a dry-run to preview changes:
node src/index.js writeback $SEGMENT --dry-run
Output:
Dry-run mode: No changes will be made to TD
Tables to update:
db_customer_360.customer_master (15 columns)
db_customer_360.customer_attributes (12 columns)
db_customer_360.customer_behaviors (18 columns)
Total: 3 tables, 45 columns
Review complete. Proceed with actual write-back? (y/n)
Confirmation Gate: Require explicit confirmation after dry-run preview:
- Ask: "Proceed with actual write-back? (y/n)"
- If no: stop workflow (user can manually run writeback later)
- If yes: continue to actual write-back
Actual Write-back:
node src/index.js writeback $SEGMENT
Output (Success):
✅ Stage 6 complete: Write-back successful
📁 Artifacts created:
- ./snapshots/{segment}-{timestamp}.json (type: before)
- ./snapshots/{segment}-{timestamp}.json (type: after)
Write-back results:
✅ db_customer_360.customer_master (15 columns updated)
✅ db_customer_360.customer_attributes (12 columns updated)
✅ db_customer_360.customer_behaviors (18 columns updated)
Total: 3/3 tables succeeded
Workflow complete! 🎉
Descriptions will propagate to Parent Segment output tables when the Audience workflow runs next.
Verify with: SHOW COLUMNS FROM {database}.{table} (check Comment column)
Output (Partial Failure):
⚠️ Stage 6 partial success: Some tables failed
📁 Artifacts created:
- ./snapshots/{segment}-{timestamp}.json (type: before)
- ./reviews/{segment}-writeback-errors.json
Write-back results:
✅ db_customer_360.customer_master (15 columns updated)
❌ db_customer_360.customer_attributes (API error: 500)
✅ db_customer_360.customer_behaviors (18 columns updated)
Total: 2/3 tables succeeded
Error log: ./reviews/{segment}-writeback-errors.json
Options: [R] Retry failed tables | [V] View errors | [X] Rollback | [E] Exit
Error Handling: If write-back fails (partial or complete):
- Display success/failure counts
- Errors saved to {segment}-writeback-errors.json
- Offer options:
- [R] Retry: Re-run writeback for failed tables only
- [V] View: Display error details from error log
- [X] Rollback: Restore TD to the before-snapshot state
- [E] Exit: Stop workflow
Rollback Option: If user chooses rollback after write-back failure:
node src/index.js rollback $SEGMENT
This will:
- Load most recent "before" snapshot
- Restore TD table schemas to pre-write-back state
- Confirm rollback completion
Resume Detection:
If ./snapshots/{segment}*-before.json exists:
- Ask: "Before snapshot found (write-back already executed). Options:"
- [C] Continue (skip write-back, workflow already complete)
- [R] Re-run write-back (creates new snapshots)
- [X] Rollback (restore to before state)
- If continue: workflow complete
- If re-run: delete snapshots and run write-back
- If rollback: run rollback command, then exit
Resume Detection (Overall)
When the workflow starts, check for existing artifacts to determine where to resume:
# Check Stage 2 completion
if [ -f "./schemas/$SEGMENT.json" ]; then
STAGE_2_COMPLETE=true
fi
# Check Stage 3 completion
if [ -f "./descriptions/${SEGMENT}-descriptions.json" ]; then
STAGE_3_COMPLETE=true
fi
# Check Stage 4 completion
if [ -f "./reviews/${SEGMENT}-descriptions.csv" ]; then
STAGE_4_COMPLETE=true
fi
# Check Stage 6 completion
if ls ./snapshots/${SEGMENT}*-before.json 1> /dev/null 2>&1; then
STAGE_6_COMPLETE=true
fi
Resume Logic:
- If Stage 6 complete: Workflow already finished, ask to rollback or exit
- If Stage 4 complete but not 6: Resume from Stage 5 (validation)
- If Stage 3 complete but not 4: Resume from Stage 4 (review export)
- If Stage 2 complete but not 3: Resume from Stage 3 (description generation)
- If nothing complete: Start from Stage 1 (context gathering)
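The decision cascade above, as a bash sketch using the flags set in the detection snippet:
# Resume from the latest incomplete stage.
if [ "$STAGE_6_COMPLETE" = true ]; then
  echo "Workflow already finished. Rollback or exit?"
elif [ "$STAGE_4_COMPLETE" = true ]; then
  echo "Resuming from Stage 5 (Validation)"
elif [ "$STAGE_3_COMPLETE" = true ]; then
  echo "Resuming from Stage 4 (Review Export)"
elif [ "$STAGE_2_COMPLETE" = true ]; then
  echo "Resuming from Stage 3 (Description Generation)"
else
  echo "Starting from Stage 1 (Context Gathering)"
fi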
Resume Prompt:
Found existing artifacts:
✅ Stage 2: ./schemas/{segment}.json
✅ Stage 3: ./descriptions/{segment}-descriptions.json
✅ Stage 4: ./reviews/{segment}-descriptions.csv
Resume from Stage 5 (Validation) or start fresh?
Options: [R] Resume | [F] Fresh start
Progress Tracking
After each stage completes, display:
- Stage number and name
- Completion status (✅ success, ❌ failure, ⚠️ partial)
- Artifacts created with file paths
- Next stage
Example:
✅ Stage 3 complete: Descriptions generated
📁 Artifacts created:
- ./descriptions/customer_360-descriptions.json (45 columns)
➡️ Next: Stage 4 (Review Export)
Error Recovery Summary
Each stage offers context-appropriate recovery options:
| Stage | Error Type | Options |
|---|---|---|
| 2. Extraction | Command failure | [R] Retry, [E] Exit |
| 3. Generation | API/generation failure | [R] Retry, [C] Claude Code mode, [E] Exit |
| 4. Review Export | Export failure | [R] Retry, [E] Exit |
| 5. Validation | CSV errors | [R] Retry (after fixing CSV), [E] Exit |
| 6. Write-back | Partial/complete failure | [R] Retry failed, [V] View errors, [X] Rollback, [E] Exit |
Automated Mode
For CI/CD or non-interactive workflows, use skip flags:
Stage 4 (Review Export):
node src/index.js review $SEGMENT --skip-review
Exports CSV but doesn't open file or prompt for "done".
Stage 5 (Validation): Confirmation prompt can be skipped by piping "y":
echo "y" | node src/index.js validate $SEGMENT
Stage 6 (Write-back):
node src/index.js writeback $SEGMENT --skip-review
Skips all confirmation prompts and proceeds directly.
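Putting the skip flags together, a fully non-interactive run might look like this (a sketch; it assumes API-mode generation with ANTHROPIC_API_KEY set, and it accepts the generated descriptions without the human review that Stage 4 normally provides):
# End-to-end non-interactive run (no review step).
SEGMENT=customer_360
node src/index.js extract "$SEGMENT" --sample
node src/index.js generate "$SEGMENT"
node src/index.js review "$SEGMENT" --skip-review
echo "y" | node src/index.js validate "$SEGMENT"
node src/index.js writeback "$SEGMENT" --skip-review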
Workflow State Artifacts
The workflow uses file system as state store:
| Artifact | Location | Purpose |
|---|---|---|
| Schema JSON | ./schemas/{segment}.json | Extracted schema with samples |
| Descriptions JSON | ./descriptions/{segment}-descriptions.json | Generated descriptions |
| CSV Export | ./reviews/{segment}-descriptions.csv | Human-editable review format |
| CSV Errors | ./reviews/{segment}-descriptions-errors.csv | Validation error details |
| Before Snapshot | ./snapshots/{segment}-{timestamp}.json | Pre-write-back state |
| After Snapshot | ./snapshots/{segment}-{timestamp}.json | Post-write-back state |
| Writeback Errors | ./reviews/{segment}-writeback-errors.json | Write-back failure details |
All artifacts use segment name as identifier for resume detection.
Example: Full Workflow Execution
User: /data-dictionary customer_360