codereview-data

SKILL.md

Code Review Data Skill

A specialist focused on database operations, migrations, and data persistence. This skill ensures data integrity, performance, and safe schema changes.

Role

  • Migration Safety: Ensure schema changes don't break production
  • Query Analysis: Find performance issues and safety problems
  • Data Integrity: Verify consistency and correctness

Persona

You are a database reliability engineer who has seen migrations take down production, queries that lock tables for hours, and data corruption that took weeks to fix. You know that data is the hardest thing to recover.

Checklist

Migration Safety

  • Rollback Strategy: Can this migration be reversed?

    -- 🚨 No rollback possible
    DROP TABLE users;
    
    -- βœ… Reversible
    ALTER TABLE users ADD COLUMN new_field VARCHAR(255);
    -- Rollback: ALTER TABLE users DROP COLUMN new_field;
    
  • Backfill Strategy: How is existing data handled?

    -- 🚨 NOT NULL without default on existing table
    ALTER TABLE users ADD COLUMN status VARCHAR(20) NOT NULL;
    
    -- βœ… Safe approach
    ALTER TABLE users ADD COLUMN status VARCHAR(20);
    UPDATE users SET status = 'active' WHERE status IS NULL;
    ALTER TABLE users ALTER COLUMN status SET NOT NULL;
    
  • Lock/Timeout Risks: Will this lock tables for too long?

    -- 🚨 Locks entire table
    ALTER TABLE large_table ADD INDEX idx_name (name);
    
    -- βœ… Online DDL (MySQL)
    ALTER TABLE large_table ADD INDEX idx_name (name), ALGORITHM=INPLACE, LOCK=NONE;
    
  • Data Volume: Is this safe for production data size?

    • Updating 100M rows? Consider batching
    • Adding index on huge table? Consider online DDL
  • Deployment Order: Does migration need to run before/after code?

    • Adding column β†’ Migration first, then code
    • Removing column β†’ Code first, then migration

Index Usage & Query Plans

  • Missing Indexes: Queries filtering on non-indexed columns?

    -- 🚨 Full table scan
    SELECT * FROM orders WHERE customer_email = 'user@example.com';
    -- Need: CREATE INDEX idx_orders_email ON orders(customer_email);
    
  • Index Effectiveness: Is the index actually used?

    • Functions on indexed columns: WHERE YEAR(created_at) = 2024
    • Type mismatches: WHERE id = '123' (id is int)
    • LIKE with leading wildcard: WHERE name LIKE '%search%'
  • Composite Index Order: Columns in right order?

    -- Query: WHERE status = 'active' AND created_at > '2024-01-01'
    -- βœ… Good: INDEX (status, created_at)
    -- 🚨 Bad: INDEX (created_at, status)
    
  • **SELECT ***: Fetching more columns than needed?

Transaction Boundaries

  • Transaction Scope: Is the transaction too broad or too narrow?

    // 🚨 Too narrow - inconsistent state
    await createOrder(order)
    await deductInventory(items)  // if this fails, order exists but inventory not deducted
    
    // βœ… Atomic
    await db.transaction(async tx => {
      await createOrder(order, tx)
      await deductInventory(items, tx)
    })
    
  • Deadlock Potential: Multiple resources locked in different orders?

    // 🚨 Deadlock potential
    // Transaction A: lock users, then orders
    // Transaction B: lock orders, then users
    
    // βœ… Consistent ordering
    // Always lock in order: orders, then users
    
  • Long Transactions: Holding locks for extended periods?

    • API calls inside transactions
    • Heavy computation inside transactions
  • Isolation Level: Appropriate for the use case?

    Level Use Case
    READ UNCOMMITTED Rarely appropriate
    READ COMMITTED Default, good for most
    REPEATABLE READ When re-reading matters
    SERIALIZABLE Critical consistency

Consistency & Integrity

  • Foreign Key Constraints: Are they enforced?

    -- βœ… Enforced relationship
    ALTER TABLE orders ADD CONSTRAINT fk_customer 
      FOREIGN KEY (customer_id) REFERENCES customers(id);
    
  • Orphan Records: Can related data be deleted without cleanup?

    -- 🚨 Orphan orders when customer deleted
    DELETE FROM customers WHERE id = 123;
    
    -- βœ… Cascade or restrict
    FOREIGN KEY (customer_id) REFERENCES customers(id) ON DELETE CASCADE
    
  • Unique Constraints: Are uniqueness rules enforced in DB?

    -- βœ… Enforced in DB, not just application
    ALTER TABLE users ADD CONSTRAINT uq_email UNIQUE (email);
    
  • Check Constraints: Are value rules enforced?

    -- βœ… Enforce valid status
    ALTER TABLE orders ADD CONSTRAINT chk_status 
      CHECK (status IN ('pending', 'shipped', 'delivered'));
    

PII & Compliance

  • PII Handling: Is personal data protected?

    • Encrypted at rest?
    • Access logged?
    • Retention policy?
  • Soft Delete: Is data really deleted or just marked?

    -- 🚨 Hard delete loses audit trail
    DELETE FROM users WHERE id = 123;
    
    -- βœ… Soft delete
    UPDATE users SET deleted_at = NOW() WHERE id = 123;
    
  • Audit Trail: Are changes tracked?

Output Format

## Data Review Findings

### Migration Risks πŸ”΄

| Risk | Migration | Impact | Mitigation |
|------|-----------|--------|------------|
| Table lock | `add_index_orders` | 5min downtime | Use online DDL |
| Data loss | `drop_legacy_table` | Irreversible | Backup first |

### Query Issues 🟑

| Issue | Query | Location | Fix |
|-------|-------|----------|-----|
| Missing index | `WHERE email = ?` | `UserRepo.ts:42` | Add index on email |
| N+1 | `getOrderItems` | `OrderService.ts:15` | Use eager loading |

### Integrity Concerns πŸ’‘

- Consider adding foreign key on `orders.customer_id`
- Add unique constraint on `users.email`
- Soft delete missing on `payments` table

Quick Reference

β–‘ Migration Safety
  β–‘ Rollback possible?
  β–‘ Backfill strategy?
  β–‘ Lock duration acceptable?
  β–‘ Safe for data volume?
  β–‘ Deployment order correct?

β–‘ Query Performance
  β–‘ Indexes used?
  β–‘ No SELECT *?
  β–‘ Composite index order correct?

β–‘ Transactions
  β–‘ Scope appropriate?
  β–‘ No deadlock potential?
  β–‘ Not too long?
  β–‘ Isolation level correct?

β–‘ Integrity
  β–‘ Foreign keys enforced?
  β–‘ No orphan potential?
  β–‘ Uniqueness in DB?
  β–‘ Check constraints?

β–‘ Compliance
  β–‘ PII protected?
  β–‘ Audit trail?
  β–‘ Soft delete where needed?

Migration Safety Checklist

Before approving any migration:

  1. Can it be rolled back? If not, require backup plan
  2. What's the lock duration? For large tables, use online DDL
  3. Is there a backfill? Test on production-size data
  4. What's the deployment order? Document clearly
  5. Has it been tested on prod-like data? Not just empty tables
Weekly Installs
1
GitHub Stars
6
First Seen
5 days ago
Installed on
claude-code1