seed-data-generator

Seed Data Generator Protocol

This skill helps developers populate empty local or staging databases with massive amounts of realistic data for load testing and UI development.

Core assumption: Simple random strings (asdfgh) are useless for UI testing. Seed data must look real and must respect Foreign Key constraints so that inserts succeed.

1. Schema Analysis & Topological Sort

Before generating data, read the schema and understand the relationships:

Example: orders depends on users and products, and order_items depends on orders and products.
Topological Sort (Insert Order): users -> products -> orders -> order_items.

(Never try to insert an order item before the order exists).
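
A minimal sketch of the insert-order step, assuming the foreign key relationships have already been read into a dictionary. The `dependencies` mapping and the `insert_order` helper are hypothetical names used for illustration; the sort itself uses Python's standard `graphlib` module (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical schema: each table maps to the set of tables its foreign keys reference.
dependencies = {
    "users": set(),
    "products": set(),
    "orders": {"users", "products"},
    "order_items": {"orders", "products"},
}

def insert_order(deps):
    """Return table names ordered so each table comes after the tables it references."""
    # static_order() raises graphlib.CycleError if the foreign keys form a cycle.
    return list(TopologicalSorter(deps).static_order())

order = insert_order(dependencies)
print(order)
```

The relative order of independent tables (here users and products) is not significant; only the parent-before-child guarantee matters for inserts.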

2. Smart Field Generation (Faking)

Map column names and data types to specific Faker generators:

email -> Faker.Internet.Email()
first_name -> Faker.Person.FirstName()
last_name -> Faker.Person.LastName()
full_name -> Faker.Person.FullName()
status (VARCHAR) -> Random pick from ('active', 'pending', 'cancelled').
description, bio -> Faker.Lorem.Paragraph()
created_at -> Random Timestamp between NOW() - 1 year and NOW().
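
The mapping above can be sketched as a dictionary from column name to generator function. To stay runnable without third-party packages, this sketch uses standard-library stand-ins rather than a real Faker library; the name lists, `make_row`, and `random_created_at` are all hypothetical, and in a real seeder each lambda would call the matching Faker generator.

```python
import random
from datetime import datetime, timedelta

# Stand-in sample data (assumption); a Faker library would supply these values.
FIRST_NAMES = ["John", "Sarah", "Amina", "Luis"]
LAST_NAMES = ["Smith", "Okafor", "Tanaka", "Garcia"]
STATUSES = ["active", "pending", "cancelled"]

def random_created_at(rng, now):
    """Return a random timestamp between now - 1 year and now."""
    return now - timedelta(seconds=rng.randint(0, 365 * 24 * 3600))

def make_row(columns, rng, now):
    """Generate one row by dispatching each column name to its generator."""
    # Pick the name once so first_name, last_name, full_name and email agree.
    first = rng.choice(FIRST_NAMES)
    last = rng.choice(LAST_NAMES)
    generators = {
        "first_name": lambda: first,
        "last_name": lambda: last,
        "full_name": lambda: f"{first} {last}",
        "email": lambda: f"{first}.{last}@example.com".lower(),
        "status": lambda: rng.choice(STATUSES),
        "description": lambda: "Lorem ipsum dolor sit amet.",
        "bio": lambda: "Lorem ipsum dolor sit amet.",
        "created_at": lambda: random_created_at(rng, now),
    }
    # An unmapped column raises KeyError, which surfaces gaps in the mapping early.
    return {col: generators[col]() for col in columns}

rng = random.Random(42)  # fixed seed so repeated runs produce the same rows
row = make_row(["first_name", "email", "status", "created_at"], rng, datetime.now())
print(row)
```

Seeding the random generator is a design choice worth keeping: reproducible seed data makes UI bugs easier to report and replay.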

3. Output Generation

Provide an executable seeder script (TypeScript/Prisma, Python, or raw SQL depending on the user's stack). Raw SQL is the default.

Required Outputs (Must write BOTH to docs/database-report/):

1. **Human-Readable Markdown (`docs/database-report/seed-data-report.md`)**

### 🔗 Dependency Graph Resolution
Insert Order:
1. `companies`
2. `users` (Depends on `companies`)
3. `posts` (Depends on `users`)

### 🛠️ Seed Script (Raw SQL)
```sql
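-- PostgreSQL only: disable triggers temporarily for fast bulk inserts.
-- Note: this also skips foreign key checks, so the insert order below still matters.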
SET session_replication_role = 'replica';

-- 1. Insert Companies
INSERT INTO companies (id, name, created_at) VALUES 
('c1', 'Acme Corp', '2023-01-15 10:00:00'),
('c2', 'Globex', '2023-02-20 11:30:00');

-- 2. Insert Users
INSERT INTO users (id, company_id, email, first_name) VALUES 
('u1', 'c1', 'john.acme@example.com', 'John'),
('u2', 'c2', 'sarah.globex@example.com', 'Sarah');

-- Re-enable triggers
SET session_replication_role = 'origin';
```

2. **Machine-Readable JSON (`docs/database-report/seed-data-output.json`)**
```json
{
  "skill": "seed-data-generator",
  "insertion_order": ["companies", "users", "posts"],
  "faker_mappings": {
    "users.email": "Faker.Internet.Email()",
    "companies.name": "Faker.Company.CompanyName()"
  },
  "rows_generated": {
    "companies": 2,
    "users": 2
  }
}
```

Guardrails

Performance: For requests of more than 10,000 rows, do not output literal SQL INSERT statements. Instead, output a Python/Node script that generates rows with faker and loads them with bulk COPY commands.
Unique Constraints: Random generators can produce duplicate values on UNIQUE columns, which makes the insert fail. Append the row id or a sequence number to emails/usernames where necessary.
Environment: Warn the user to NEVER run seed scripts in production.
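
The three guardrails can be sketched together as follows. This is an illustration under stated assumptions, not a complete seeder: `unique_email`, `users_csv`, `assert_not_production`, and the `APP_ENV` variable name are hypothetical, and the sketch stops at building an in-memory CSV rather than opening a database connection.

```python
import csv
import io
import os

# PostgreSQL COPY statement that would consume the CSV built below.
COPY_SQL = "COPY users (id, company_id, email, first_name) FROM STDIN WITH (FORMAT csv)"

def unique_email(first_name, seq):
    """Append the row's sequence number so the email never repeats."""
    return f"{first_name.lower()}.{seq}@example.com"

def users_csv(first_names, company_ids):
    """Build an in-memory CSV of user rows: id, company_id, email, first_name."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    for seq, first in enumerate(first_names, start=1):
        # Cycle through existing company ids so every foreign key is valid.
        company_id = company_ids[(seq - 1) % len(company_ids)]
        writer.writerow([f"u{seq}", company_id, unique_email(first, seq), first])
    buf.seek(0)  # rewind so a consumer can read from the start
    return buf

def assert_not_production():
    """Refuse to continue when the environment looks like production."""
    # APP_ENV is an assumed variable name; use whatever your stack defines.
    if os.environ.get("APP_ENV", "").lower() == "production":
        raise RuntimeError("Refusing to run seed script in production")

buf = users_csv(["John", "John", "Sarah"], ["c1", "c2"])
print(buf.getvalue())
```

With a driver such as psycopg2, the buffer could then be streamed to the server with `cursor.copy_expert(COPY_SQL, buf)`, after calling `assert_not_production()` at the top of the script.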


Repository

fatih-developer…h-skills

First Seen

Mar 3, 2026
