programmatic-seo
Programmatic SEO
Production-grade framework for building SEO page sets at scale. Covers the full lifecycle from keyword pattern discovery through template design, data pipeline construction, quality assurance, and post-launch optimization. Designed for deployments ranging from 50 to 100,000+ pages.
Table of Contents
- When to Use vs When Not To
- Initial Assessment
- The 14 Playbooks
- Playbook Selection Matrix
- Keyword Pattern Mining
- Data Pipeline Architecture
- Template Design System
- Quality Control Framework
- Internal Linking Architecture
- Indexation Strategy
- Launch Sequence
- Post-Launch Optimization
- Anti-Patterns and Penalty Avoidance
- Decision Matrix: Build vs Skip
- Output Artifacts
- Related Skills
When to Use vs When Not To
Use this skill when:
- You have a repeating keyword pattern with 50+ variations
- You have (or can acquire) structured data to populate pages
- The search intent is consistent across variations
- Your domain has sufficient authority to compete
Do NOT use when:
- Each page requires unique editorial content (use content-creator instead)
- Total addressable pages < 30 (manual content is more effective)
- You lack a data source and would be generating thin placeholder content
- Your domain authority is below DR 20 and competitors are DR 60+
Initial Assessment
Before designing any pSEO strategy, answer these questions. Skip nothing.
1. Opportunity Validation
| Question | Why It Matters | Red Flag |
|---|---|---|
| What is the repeating keyword pattern? | Defines the template structure | Pattern is vague or inconsistent |
| What is the aggregate monthly search volume? | Determines ROI ceiling | < 5,000 aggregate monthly searches |
| How many unique pages can you generate? | Scope the project | < 50 pages (too few) or > 50K without data infrastructure |
| What does the SERP look like for sample queries? | Competitive feasibility | Page 1 dominated by DR 80+ editorial content |
| Is intent informational, navigational, or transactional? | Template design | Mixed intent across the same pattern |
2. Data Source Evaluation
Rate your data source on this scale:
| Tier | Source Type | Defensibility | Example |
|---|---|---|---|
| S | Proprietary first-party | Unbeatable | Your product usage data, internal benchmarks |
| A | Product-derived | Strong | Aggregated user analytics, customer outcomes |
| B | User-generated | Moderate | Community reviews, submitted content |
| C | Licensed exclusive | Moderate | Paid data feed no competitor has |
| D | Public aggregated | Weak | Government data, public APIs |
| F | Scraped commodity | None | Wikipedia rewrites, copied listings |
Rule: Do not build pSEO on Tier F data. Google penalizes commodity rewrites. If your only data source is public and easily replicable, invest in acquiring Tier A-C data first.
3. Competitive Moat Assessment
For 5 sample queries in your pattern, analyze page 1 results:
- What is the average Domain Rating of ranking pages?
- Are existing results programmatic or editorial?
- What unique data do ranking pages provide?
- What is the content depth (word count, data richness, UX quality)?
Go/No-Go threshold: If the average DR gap between you and page 1 is > 30 AND existing results have proprietary data, the opportunity requires either a differentiated approach or domain authority building first.
The 14 Playbooks
| # | Playbook | Pattern | Example | Data Requirement |
|---|---|---|---|---|
| 1 | Templates | "[Type] template" | "resume template", "invoice template" | Template files + metadata |
| 2 | Curation | "best [category]" | "best CRM for startups" | Product/service reviews + ratings |
| 3 | Conversions | "[X] to [Y]" | "100 USD to EUR" | Conversion logic/API |
| 4 | Comparisons | "[X] vs [Y]" | "Notion vs Confluence" | Feature data for both products |
| 5 | Examples | "[type] examples" | "landing page examples" | Curated example collection |
| 6 | Locations | "[service] in [city]" | "coworking in Austin" | Location-specific data |
| 7 | Personas | "[product] for [audience]" | "CRM for real estate" | Audience-specific use cases |
| 8 | Integrations | "[A] + [B] integration" | "Slack Asana integration" | Integration documentation |
| 9 | Glossary | "what is [term]" | "what is churn rate" | Domain expertise |
| 10 | Translations | Content in N languages | Localized guides | Translation + localization data |
| 11 | Directory | "[category] tools" | "AI writing tools" | Tool listings + evaluations |
| 12 | Profiles | "[entity name]" | "Stripe company profile" | Entity-level data |
| 13 | Statistics | "[topic] statistics" | "SaaS churn statistics 2026" | Verified statistical data |
| 14 | Calculators | "[topic] calculator" | "LTV calculator" | Calculation logic + inputs |
Playbook Selection Matrix
| If you have... | Primary Playbook | Secondary Layer |
|---|---|---|
| A product with many integrations | Integrations | Comparisons |
| A design/creative tool | Templates + Examples | Personas |
| A multi-segment audience | Personas | Comparisons |
| Local/regional presence | Locations | Directory |
| A tool/utility product | Calculators + Conversions | Glossary |
| Deep domain expertise | Glossary + Statistics | Curation |
| A competitor landscape to exploit | Comparisons + Curation | Directory |
| User-generated content | Examples + Directory | Profiles |
Layering rule: Combine up to 2 playbooks per page set. Example: "Best coworking spaces in [city]" = Curation + Locations.
Keyword Pattern Mining
Step 1: Pattern Identification
Extract the repeating structure from seed keywords:
Seed: "react developer salary san francisco"
Pattern: [role] salary [city]
Variables: role (200+ options), city (500+ options)
Max pages: 200 x 500 = 100,000
Step 2: Volume Distribution Analysis
Not all variable combinations have search volume. Map the distribution:
| Tier | Volume Range | Typical % of Total Pages | Strategy |
|---|---|---|---|
| Head | 1,000+ monthly | 2-5% | Priority indexation, highest content quality |
| Torso | 100-999 monthly | 15-25% | Standard template, full deployment |
| Long-tail | 10-99 monthly | 40-50% | Template with conditional content blocks |
| Zero-volume | < 10 monthly | 20-40% | Noindex OR skip unless data is uniquely valuable |
Step 3: Intent Classification
For each pattern, verify intent consistency:
| Intent Type | Template Implications | CTA Strategy |
|---|---|---|
| Informational | Data-heavy, educational content | Newsletter, related content |
| Commercial investigation | Comparison tables, pros/cons | Free trial, demo |
| Transactional | Pricing, availability, features | Buy now, sign up |
| Navigational | Brand-specific, direct answer | Product page link |
Data Pipeline Architecture
Pipeline Design
[Data Source] → [Extraction] → [Transformation] → [Enrichment] → [Validation] → [Template Population] → [Quality Check] → [Publish]
Data Quality Gates
Every record must pass these gates before page generation:
| Gate | Check | Failure Action |
|---|---|---|
| Completeness | All required fields populated | Skip page, log for manual review |
| Accuracy | Data matches source, no staleness > 90 days | Flag for refresh |
| Uniqueness | No duplicate records | Merge or deduplicate |
| Minimum richness | Page will have > 300 words of unique content | Skip or enrich |
| Legal compliance | Data usage rights verified | Block publication |
Update Cadence
| Data Type | Recommended Update Frequency | Staleness Penalty |
|---|---|---|
| Pricing data | Weekly | High (users notice immediately) |
| Company/product data | Monthly | Medium |
| Statistical data | Quarterly | Low if year-tagged |
| Glossary/educational | Semi-annually | Very low |
| Location data | Monthly | Medium (closures, address changes) |
Template Design System
Page Architecture
Every programmatic page must have these zones:
┌─────────────────────────────────────┐
│ Zone 1: Unique Header │ H1 with target keyword, unique intro paragraph
├─────────────────────────────────────┤
│ Zone 2: Primary Data Section │ The core data/content for this specific page
├─────────────────────────────────────┤
│ Zone 3: Contextual Analysis │ Insights, comparisons, trends specific to this entity
├─────────────────────────────────────┤
│ Zone 4: Related Data │ Adjacent data points that add depth
├─────────────────────────────────────┤
│ Zone 5: Internal Navigation │ Related pages, breadcrumbs, category links
├─────────────────────────────────────┤
│ Zone 6: CTA │ Conversion element matched to intent
└─────────────────────────────────────┘
Uniqueness Requirements
Each page MUST have at least 3 of these 5 uniqueness sources:
- Unique data points -- Numbers, facts, or attributes specific to this entity
- Conditional content blocks -- Sections that appear/disappear based on data attributes
- Calculated insights -- Derived metrics (percentages, comparisons, rankings)
- Contextual recommendations -- "If X, then Y" advice blocks based on the data
- User-generated content -- Reviews, comments, or community contributions
URL Structure
Always use subfolders. Never subdomains for pSEO.
| Pattern | URL Template | Example |
|---|---|---|
| Location | /[service]/[city]/ |
/coworking/austin/ |
| Comparison | /compare/[a]-vs-[b]/ |
/compare/notion-vs-confluence/ |
| Integration | /integrations/[partner]/ |
/integrations/slack/ |
| Glossary | /glossary/[term]/ |
/glossary/churn-rate/ |
| Persona | /[product]-for-[audience]/ |
/crm-for-real-estate/ |
Quality Control Framework
Pre-Publication QA Checklist
Content Quality:
- Each page has > 300 words of unique content (not counting shared template elements)
- H1 is unique and contains the target keyword
- Meta title is unique (< 60 chars) and meta description is unique (< 155 chars)
- No broken data references (empty fields rendered as "N/A" or blank)
- At least 2 conditional content blocks triggered per page
- No duplicate pages targeting the same keyword
Technical SEO:
- Canonical tag points to self
- Hreflang tags if multilingual
- Schema markup renders without errors
- Page loads in < 3 seconds
- Mobile responsive
Internal Linking:
- Breadcrumb trail is complete
- 3-5 related pages linked contextually
- Hub page links to this page
- No orphan pages in the set
Thin Content Detection
Run this check against every generated page:
| Signal | Threshold | Action |
|---|---|---|
| Unique word count | < 200 unique words | Block publication |
| Content similarity to another page in set | > 80% Jaccard similarity | Merge or differentiate |
| Data fields populated | < 60% of template fields | Skip or enrich |
| User time-on-page (post-launch) | < 15 seconds average | Review and improve |
| Bounce rate (post-launch) | > 85% | Review intent match |
Internal Linking Architecture
Hub-and-Spoke Model
┌─────────┐
│ HUB │ /coworking/
│ PAGE │ (ranks for "coworking spaces")
└────┬────┘
┌──────────────┼──────────────┐
┌────┴────┐ ┌────┴────┐ ┌────┴────┐
│ SPOKE 1 │ │ SPOKE 2 │ │ SPOKE 3 │
│ /austin/│ │ /denver/│ │ /seattle/│
└────┬────┘ └────┬────┘ └────┬────┘
│ │ │
Cross-links between related spokes
Linking rules:
- Hub links DOWN to every spoke (or top 50 spokes if > 200 pages)
- Every spoke links UP to the hub
- Spokes link ACROSS to 3-5 related spokes (geographic proximity, thematic similarity)
- Deep pages link UP to their spoke AND the hub
- Cross-silo links only when contextually genuine
Pagination for Large Sets
If a hub page has > 50 spokes, implement paginated sub-hubs:
/coworking/ → Top cities + browse by state
/coworking/california/ → All California cities
/coworking/california/page/2/ → Paginated if > 25 cities
Indexation Strategy
Crawl Budget Management
| Page Set Size | Strategy |
|---|---|
| < 500 pages | Single XML sitemap, submit all |
| 500-5,000 | Segmented sitemaps by category |
| 5,000-50,000 | Segmented sitemaps + priority scoring + IndexNow |
| 50,000+ | Programmatic sitemap generation + crawl budget monitoring + strategic noindex |
Indexation Priority
| Priority | Pages | Action |
|---|---|---|
| P0 | Hub pages | Submit immediately, internal link from homepage |
| P1 | Head-volume spokes (top 10%) | Submit in first sitemap batch |
| P2 | Torso-volume spokes | Submit in second batch, 1-2 weeks later |
| P3 | Long-tail spokes | Submit gradually over 4-6 weeks |
| P4 | Zero-volume pages | Noindex unless data is uniquely valuable |
IndexNow Integration
For large-scale updates, use IndexNow to notify search engines immediately:
POST https://api.indexnow.org/indexnow
{
"host": "yoursite.com",
"key": "your-api-key",
"urlList": ["https://yoursite.com/page1", "https://yoursite.com/page2"]
}
Launch Sequence
Phase 1: Pilot (Week 1-2)
- Deploy 20-50 pages from head-volume tier
- Submit sitemap with pilot pages only
- Monitor indexation rate daily
- Check for crawl errors in Search Console
Phase 2: Scale (Week 3-6)
- Deploy remaining torso-volume pages in batches of 100-500
- Add cross-links between deployed pages
- Monitor thin content warnings
- Track impressions in Search Console
Phase 3: Long-Tail (Week 7-12)
- Deploy long-tail pages
- Noindex zero-volume pages (keep them crawlable but not indexed)
- Begin link acquisition outreach for hub pages
Phase 4: Optimization (Ongoing)
- A/B test template variations on head-volume pages
- Refresh stale data quarterly
- Add conditional content blocks based on engagement data
- Monitor for keyword cannibalization across the set
Post-Launch Optimization
Metrics Dashboard
| Metric | Frequency | Target |
|---|---|---|
| Indexation rate | Weekly | > 90% of submitted pages indexed within 60 days |
| Organic impressions | Weekly | Trending up month-over-month |
| Average position (by tier) | Bi-weekly | Head pages: top 10; Torso: top 30 |
| Click-through rate | Monthly | > 3% for head pages |
| Bounce rate | Monthly | < 70% |
| Conversion rate | Monthly | > 1% for transactional intent |
| Pages per session | Monthly | > 1.5 |
Optimization Playbook
| Signal | Diagnosis | Action |
|---|---|---|
| Indexed but not ranking | Content quality or authority gap | Enrich content, build links to hub |
| Ranking but low CTR | Title/description not compelling | A/B test meta titles |
| Ranking but high bounce | Intent mismatch or thin content | Audit against search intent, add data |
| Deindexed after initial indexing | Thin content penalty | Improve uniqueness, reduce similarity |
| Crawled but not indexed | Quality threshold not met | Add more unique content per page |
Anti-Patterns and Penalty Avoidance
| Anti-Pattern | Why It Fails | Prevention |
|---|---|---|
| City-name swapping | Same content + different city = doorway page penalty | Each location page needs unique local data |
| Keyword stuffing in templates | Unnatural density triggers spam filters | Keep keyword density 1-2%, write naturally |
| Generating pages for zero-demand queries | Wastes crawl budget, signals low quality | Validate demand before generating |
| No internal links to pSEO pages | Orphan pages get deprioritized | Connect every page to the hub-spoke structure |
| Stale data never refreshed | Users lose trust, Google notices | Set update cadence per data type |
| All pages identical structure | Lack of variation signals automation to Google | Use 3-5 template variants |
Decision Matrix: Build vs Skip
Score each dimension 1-5, then apply the threshold.
| Dimension | Weight | 1 (Skip) | 5 (Build) |
|---|---|---|---|
| Search demand | 30% | < 1K aggregate monthly | > 50K aggregate monthly |
| Data quality | 25% | Public/scraped, easily replicated | Proprietary, defensible |
| Competitive gap | 20% | DR gap > 40, strong incumbents | DR gap < 15, weak/no incumbents |
| Template feasibility | 15% | Each page needs unique editorial | Clean template fits all variations |
| Business alignment | 10% | No conversion path from these pages | Direct path to core product |
Scoring guide:
- 4.0+ weighted average: Build immediately
- 3.0-3.9: Build if resources allow, validate with pilot first
- 2.0-2.9: Invest in data quality or authority first
- < 2.0: Do not build
Output Artifacts
| Artifact | Format | Description |
|---|---|---|
| Opportunity Analysis | Markdown table | Keyword patterns x volume x data source x difficulty x business alignment |
| Playbook Recommendation | Decision matrix | If/then mapping with rationale and real-world examples |
| Page Template Specification | Annotated wireframe (markdown) | URL pattern, zone structure, uniqueness sources, conditional logic |
| Data Pipeline Spec | Flow diagram (text) | Source > extraction > transformation > validation > publication |
| Quality Scorecard | Checklist + thresholds | Pre-publication QA gates with pass/fail criteria |
| Indexation Plan | Phased timeline | Priority tiers, sitemap structure, crawl budget allocation |
| Post-Launch Dashboard | Metric table | KPIs, targets, review cadence, optimization triggers |
Related Skills
- seo-audit -- Run after pSEO pages are live to diagnose indexation issues, thin content warnings, or ranking problems across the page set.
- schema-markup -- Add structured data to pSEO templates (Product, FAQ, LocalBusiness) for rich snippet eligibility at scale.
- site-architecture -- Plan hub-and-spoke structure and crawl budget management for large pSEO deployments (500+ pages).
- competitor-alternatives -- Use the Comparisons playbook when building "[X] vs [Y]" pages; competitor-alternatives has dedicated comparison page frameworks.
- content-creator -- Use when individual pages in the set need editorial-quality unique content beyond template generation.