Database Seeding

This skill enables an AI agent to generate and insert realistic test data into databases for development, testing, and staging environments. The agent creates idempotent seed scripts using deterministic generators or faker libraries, handles relational data with proper foreign key ordering, supports environment-specific seed profiles (minimal dev data vs. large-scale load testing), and ensures seeds can be run repeatedly without duplicating data.

Workflow

  1. Analyze the target schema: Inspect the database schema to identify all tables, their columns, data types, constraints (NOT NULL, UNIQUE, CHECK, foreign keys), and relationships. Determine the correct insertion order to satisfy foreign key dependencies — parent tables must be seeded before child tables.

  2. Design the seed data strategy: Choose the approach that fits the use case. Use deterministic generation with a fixed random seed for reproducible test suites. Use faker-based generation for realistic-looking development data. Use anonymized production snapshots for staging environments that need realistic data distributions. Define the volume of data for each table.

  3. Generate seed scripts: Write seed scripts in the project's language (Python, JavaScript, SQL, etc.) that create data matching all schema constraints. Use the Faker library or equivalent for realistic names, emails, addresses, and dates. Handle unique constraints by generating unique values or using sequence-based patterns. Wrap inserts in transactions for atomicity.

  4. Ensure idempotency: Design scripts to be safely re-runnable. Use INSERT ... ON CONFLICT DO NOTHING (PostgreSQL and SQLite syntax; MySQL offers INSERT IGNORE and ON DUPLICATE KEY UPDATE), other UPSERT patterns, or a truncate-then-insert strategy. Where none of these apply, check for existing data before inserting to avoid duplicates or constraint violations on repeated runs.

  5. Support environment-specific profiles: Create different seed profiles — a small dataset (10-50 records per table) for local development, a medium dataset (1,000-10,000 records) for integration testing, and a large dataset (100K+ records) for performance testing. Control the profile via environment variables or command-line arguments.

  6. Execute and verify: Run the seed script against the target database, verify row counts match expectations, and confirm relational integrity by checking that all foreign keys reference existing rows. Log the seeding results with counts per table.
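The workflow above can be sketched end to end in a few dozen lines. The example below is a minimal illustration, not a definitive implementation. It assumes Python 3.9+ and uses the standard-library sqlite3 module as a stand-in for the target database. The two-table schema, the SEED_PROFILE environment variable, the profile sizes, and every function name are hypothetical and chosen only for this example. Emails are built from a sequence pattern rather than Faker so the block has no third-party dependencies.

```python
import os
import random
import sqlite3
from graphlib import TopologicalSorter

# Hypothetical schema used only for this illustration.
SCHEMA = """
CREATE TABLE IF NOT EXISTS users (
    id INTEGER PRIMARY KEY,
    email TEXT NOT NULL UNIQUE
);
CREATE TABLE IF NOT EXISTS orders (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id),
    total_cents INTEGER NOT NULL CHECK (total_cents >= 0)
);
"""

# Step 1: each table maps to the set of parent tables it references.
DEPENDENCIES = {"users": set(), "orders": {"users"}}

# Step 5: rows per table for each profile (illustrative sizes).
PROFILES = {"dev": 20, "test": 1_000, "load": 100_000}

INSERTS = {
    "users": "INSERT INTO users (id, email) VALUES (?, ?) "
             "ON CONFLICT DO NOTHING",
    "orders": "INSERT INTO orders (id, user_id, total_cents) "
              "VALUES (?, ?, ?) ON CONFLICT DO NOTHING",
}


def insertion_order(dependencies):
    """Return table names with every parent listed before its children."""
    return list(TopologicalSorter(dependencies).static_order())


def seed(conn, profile="dev", rng_seed=42):
    """Insert deterministic rows; a repeated run adds no duplicates."""
    n = PROFILES[profile]
    rng = random.Random(rng_seed)  # fixed seed: same rows on every run
    generators = {
        # Sequence-based emails satisfy the UNIQUE constraint.
        "users": lambda i: (i, f"user{i}@example.test"),
        # user_id is drawn from the range of ids seeded into users.
        "orders": lambda i: (i, rng.randint(1, n), rng.randint(0, 50_000)),
    }
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    with conn:  # one transaction: commit on success, roll back on error
        for table in insertion_order(DEPENDENCIES):
            rows = (generators[table](i) for i in range(1, n + 1))
            conn.executemany(INSERTS[table], rows)


def verify(conn):
    """Return per-table row counts and any foreign key violations."""
    counts = {
        table: conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
        for table in insertion_order(DEPENDENCIES)
    }
    violations = conn.execute("PRAGMA foreign_key_check").fetchall()
    return counts, violations


if __name__ == "__main__":
    profile = os.environ.get("SEED_PROFILE", "dev")
    conn = sqlite3.connect(":memory:")
    seed(conn, profile)
    seed(conn, profile)  # second run hits ON CONFLICT and changes nothing
    counts, violations = verify(conn)
    for table, count in counts.items():
        print(f"{table}: {count} rows")
    print(f"foreign key violations: {len(violations)}")
```

ON CONFLICT DO NOTHING is used here instead of INSERT OR IGNORE because it skips only uniqueness conflicts, so NOT NULL or CHECK violations in generated data still fail loudly. For a real project, the dependency map would be derived from the inspected schema rather than written by hand.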

First Seen
Mar 19, 2026