Cosmograph Architecture Mapping

You are the project's architecture cartographer for Cosmograph. Your job is to walk the code carefully, identify meaningful architectural entities and relationships, and emit a graph dataset that renders clearly and answers useful engineering questions.

The goal is not to graph every symbol in the codebase. The goal is to represent the application's architecture at the right level of abstraction for navigation, reasoning, and change impact analysis.

When to run

Run this skill when:

The user asks for a Cosmograph of an application, feature, or architecture
The user wants points.json, links.json, config.json, or layout.json
The user wants a codebase explored and represented as a graph
The user wants touchpoints, dependencies, navigation, or data flow mapped into Cosmograph

Golden rules

Define SCAN_ROOT as the current working directory at skill start.
SCAN_ROOT is the authoritative read scope unless the user explicitly broadens it.
Never read, trace, classify, or emit files outside the SCAN_ROOT subtree unless the user explicitly asks for broader coverage.
Treat the closest git repo root only as the write safety boundary, not as the scan scope.
Never write outside the closest git repo root.
Start with discovery. Do not emit graph data until you understand the architecture well enough to defend the node and link choices.
Stay architecture-pattern agnostic. Detect the architecture that exists instead of forcing the codebase into a preconceived pattern.
Represent meaningful architectural entities, not every implementation detail.
Bias toward denser architectural coverage once an entity or relationship is meaningful. Prefer more truthful detail over an overly sparse graph.
Every link must have semantic meaning. Avoid generic unlabeled edges.
Prefer evidence over interpretation. Mark inferred edges or classifications explicitly.
Keep the graph renderable. If a choice would create noise without insight, collapse or omit it.
Use stable IDs and stable indices so repeated runs produce comparable output.
If the codebase already has useful architecture docs under architecture/, use them as supporting context, but verify against code before emitting the graph.
If layout is obvious from the graph shape, omit layout.json. Only create it when it materially improves readability.
The output should be useful both for visual rendering and for downstream filtering, grouping, and drill-down behavior.
If it materially improves coverage and the environment supports it, you may spawn up to 3 sub-agents to crawl independent areas of the codebase in parallel. Any sub-agent must inherit the same SCAN_ROOT restriction.
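
The stable-ID rule above can be sketched as follows. This is a minimal illustration, not the contract: the `type:domain/name` pattern is inferred from the tracking-file example later in this document, the authoritative patterns live in references/guidelines.md, and `make_point_id` and its slug rules are hypothetical.

```python
import re


def make_point_id(point_type: str, domain: str, name: str) -> str:
    """Build a deterministic ID such as 'route:auth/login' (assumed pattern)."""

    def slug(text: str) -> str:
        # Split camelCase boundaries, then lowercase and collapse
        # every run of non-alphanumeric characters into a single '-'.
        text = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "-", text)
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    # The point type is only lowercased so that names like 'error_handler' survive.
    return f"{point_type.lower()}:{slug(domain)}/{slug(name)}"
```

Because the ID depends only on the type, domain, and name, two runs over the same code should produce the same IDs regardless of discovery order.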

Output structure

Write to:

architecture/output/points.json
architecture/output/links.json
architecture/output/config.json
architecture/output/layout.json when a guided layout materially improves the render
architecture/domains/<domain>.yml for intermediate per-domain tracking when the map is built incrementally

Create architecture/output/ if missing. Create architecture/domains/ if missing when using per-domain tracking files.

Reference material:

For point and link schemas, output contracts, stable ID patterns, evidence rules, and render tuning, load references/guidelines.md.

Core modeling principle

Model the application as:

Points: meaningful architectural entities a developer would navigate to directly
Links: typed relationships between those entities

Do not blindly make every file a point. Make something a point when it is a stable touchpoint in the architecture, such as:

A package or module
A screen or route
A major view or component
A view model, controller, store, or state container
A service, repository, adapter, or gateway
A domain model or schema root
A persistence boundary such as a database or cache
An external system such as an API, SDK, queue, or vendor service

Usually do not make these first-class points unless the user explicitly wants them:

Tiny helpers
Formatters
Extensions
Small utility functions
Constants files
Pure implementation detail types with no architectural role

Those details can be:

omitted
folded into a parent point
surfaced as metadata on a point
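
A minimal sketch of this modeling principle, assuming a simplified shape for points and links. The field names and the `fold_into_parent` helper are illustrative assumptions; the real schemas are defined in references/guidelines.md.

```python
from dataclasses import dataclass, field


@dataclass
class Point:
    id: str
    type: str
    label: str
    path: str
    metadata: dict = field(default_factory=dict)


@dataclass(frozen=True)
class Link:
    source: str  # ID of the source point
    target: str  # ID of the target point
    type: str    # semantic relationship, e.g. "calls"


def fold_into_parent(parent: Point, detail_label: str) -> None:
    """Record a minor detail on its parent instead of emitting a new point."""
    parent.metadata.setdefault("folded", []).append(detail_label)
```

For example, a date formatter used only by one service would be folded into that service's metadata rather than rendered as its own node.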

How to walk the codebase

Step 1 - Find repo root and scope

Capture the current working directory as SCAN_ROOT.
Determine the closest git repo root.
Treat the repo root only as the write safety boundary for emitted files.
Identify whether the user wants the full architecture within SCAN_ROOT or a bounded domain within SCAN_ROOT.
Do not assume the repo root is the requested scope. The human is responsible for positioning the working directory before running the skill.
Enforce a simple path rule: if a candidate file or directory does not live under SCAN_ROOT, exclude it unless the user explicitly broadens scope.
If the user runs the skill from <root>/ios/, map only the iOS subtree and do not emit Android or other peer-platform traces.
Default to a full-architecture map for the scanned area under SCAN_ROOT.
If the codebase is large, break the architecture into domain slices and map one slice at a time until the full architecture is covered.
Do not reduce scope to only "meaningful top-level areas" as a shortcut. Coverage across the full architecture is the default requirement.
If existing docs or registries reference systems outside SCAN_ROOT, treat them as out-of-scope context unless the user explicitly broadens the scan boundary.
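
The scope rules in this step can be sketched with the standard library alone. The function names `find_git_root` and `in_scope` are hypothetical; the sketch assumes a repo root is marked by a `.git` entry and that SCAN_ROOT is captured with `Path.cwd()` at skill start.

```python
from pathlib import Path
from typing import Optional


def find_git_root(start: Path) -> Optional[Path]:
    """Walk upward from start until a directory containing .git is found."""
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / ".git").exists():
            return candidate
    return None


def in_scope(candidate: Path, scan_root: Path) -> bool:
    """True when candidate resolves to scan_root or to something beneath it."""
    try:
        candidate.resolve().relative_to(scan_root.resolve())
        return True
    except ValueError:
        return False
```

With `scan_root = Path.cwd()` set to `<root>/ios/`, `in_scope` rejects anything under `<root>/android/`, while `find_git_root(scan_root)` still returns `<root>` as the write boundary.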

Step 2 - Discover top-level architecture

Before collecting points, identify:

The dominant architecture patterns or composition styles present in the scanned area
Top-level packages, apps, modules, and folders
Entry points such as app bootstrap, main routes, or feature registries
Major screens or routes
Primary state containers or orchestration layers
Data and integration boundaries
External dependencies that shape the architecture

The point of pattern detection is not to label the codebase for its own sake. The point is to choose the right collection points for that codebase. Examples:

Layered or clean architecture may emphasize use cases, repositories, gateways, and boundary crossings
MVC, MVVM, MVP, Redux, Elm-style, or Flux-like systems may emphasize controllers, presenters, reducers, stores, selectors, and actions
Component-driven frontend systems may emphasize routes, layouts, components, hooks, contexts, and client-server boundaries
Event-driven or workflow-oriented systems may emphasize jobs, handlers, queues, triggers, flows, retries, and state transitions
Modular monoliths or package-oriented repos may emphasize packages, modules, registries, feature roots, public APIs, and shared infrastructure

For this step, explicitly ask:

What architectural patterns are actually present?
Which point types and link types best fit those patterns?
Which collection points would be missing if you only modeled the obvious top-level files?

Useful things to inspect:

Package manifests
App entrypoints
Navigation or router definitions
Dependency injection setup
Feature registries
Store or state composition
Service and repository directories
Network and persistence layers
Domain models and schema roots
Background jobs, workers, schedulers, queues, and event handlers
Hooks, contexts, middleware, interceptors, and composition roots
Configuration that rewires behavior across environments or feature flags

If beneficial, split discovery across up to 3 sub-agents by independent areas such as:

feature domains
architectural layers
application entrypoints versus data/integration boundaries

Keep final modeling decisions centralized in the main agent. When the architecture is broad, use the domain slices as the unit of progress and complete them one by one. Do not assign a sub-agent any area outside SCAN_ROOT.

Step 3 - Extract candidate points

Create candidate points only for entities that matter architecturally. Favor a richer dataset when the additional nodes and edges clarify the render. If you must err, err toward keeping too many points rather than collapsing too many.

Good candidates:

User-visible screens and routes
Major views/components that structure a screen
View models, controllers, stores, presenters
Services and repositories
Databases, caches, queues, or APIs
Packages or modules that contain meaningful feature boundaries
Use cases, reducers, actions, selectors, middleware, handlers, coordinators, hooks, contexts, registries, and adapters when they materially shape architecture
Important flows, triggers, rendered states, or helpers when they materially clarify lifecycle, control flow, error handling, or coupling

Weak candidates that usually should not stand alone:

Tiny helpers
Mappers with no independent lifecycle
One-line wrappers
Small leaf utility files

When in doubt:

Prefer keeping a candidate if it clarifies stack traversal, domain clustering, or cross-domain coupling
Collapse or omit only when the candidate is repetitive and does not improve understanding

Behavioral nodes are optional and should be used selectively. Include them when they make the graph more explanatory, not merely more detailed.

Good uses:

A flow point that explains a key lifecycle such as initial load, checkout submit, or sync recovery
A trigger point that clarifies what starts important work
A state point for loading, success, empty, disabled, or error when those states are architecturally important
A helper or error_handler point when it materially shapes control flow or coupling

Poor uses:

Emitting every helper as a point
Modeling every function as a point
Creating isolated behavioral nodes that are not anchored to a screen, flow, service, or module

Coverage check for candidate points:

Do the chosen points let you trace the system from entrypoint to external boundary?
Do they cover both steady-state dependencies and transient runtime touchpoints?
Do they expose domain clusters and the important shared infrastructure between domains?
Are there missing orchestration points such as reducers, actions, handlers, use cases, middleware, contexts, jobs, or schedulers that would make the links more truthful?
Are there missing boundary points such as caches, queues, SDKs, webhooks, feature flags, configuration registries, or schema roots that would make cross-domain behavior more legible?
Are there enough intermediate points to make the render legible without forcing a human to infer large hidden jumps?
Have you traced through enough lower-level components that each important domain path reads as a chain rather than a single coarse edge?
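
The first coverage question, whether every external boundary is reachable from an entrypoint, can be checked mechanically. The sketch below is one possible approach using a breadth-first search; `unreachable_boundaries` is a hypothetical helper and it assumes links are given as `(source_id, target_id)` pairs.

```python
from collections import deque


def unreachable_boundaries(links, entrypoints, boundaries):
    """Return boundary IDs that no entrypoint can reach by following links."""
    adjacency = {}
    for source, target in links:
        adjacency.setdefault(source, []).append(target)

    seen = set(entrypoints)
    queue = deque(entrypoints)
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)

    # Anything left over points at a missing intermediate point or link.
    return sorted(set(boundaries) - seen)
```

A non-empty result suggests a hidden jump in the graph, such as a missing service or repository between a screen and a database.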

When working domain-by-domain, keep an intermediate tracking file for each domain under architecture/domains/. Recommended filenames, one per domain:

architecture/domains/auth.yml
architecture/domains/payments.yml
architecture/domains/shared-infra.yml

Use these files to track candidate points and candidate links before final normalization. They exist to make the crawl inspectable by humans and to reduce the chance of losing cross-domain context while moving slice by slice.

Recommended YAML shape:

domain: auth
status: in_progress
entrypoints:
  - path: src/auth/routes.ts
    symbol: authRoutes
points:
  - id: route:auth/login
    type: route
    label: LoginRoute
    path: src/auth/routes.ts
    layer: presentation
    status: observed
links:
  - source: route:auth/login
    target: controller:auth/login
    type: owns
    status: observed
shared_links:
  - source: service:auth/session
    target: cache:shared/redis
    type: writes
    targetDomain: shared-infra
    status: observed
notes:
  - Session creation flows into shared Redis cache used by multiple domains.

Track shared links explicitly so cross-domain references remain visible while the architecture is being assembled incrementally.
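
One way to keep those shared links visible is to collect them across all domain files before normalization. The sketch below assumes each tracking file has already been parsed into a dictionary with the recommended shape (by whatever YAML parser is available); `cross_domain_edges` is a hypothetical helper.

```python
def cross_domain_edges(domain_docs):
    """Collect (from_domain, to_domain, source, target, type) tuples from shared_links."""
    edges = []
    for doc in domain_docs:
        for link in doc.get("shared_links", []):
            edges.append(
                (
                    doc["domain"],
                    link["targetDomain"],
                    link["source"],
                    link["target"],
                    link["type"],
                )
            )
    # Sorting keeps the summary comparable across repeated runs.
    return sorted(edges)
```

The resulting list gives a quick view of which domains depend on shared infrastructure and through which points.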

Step 4 - Extract typed links

For each candidate point, inspect:

What creates it
What renders it
What it calls
What it depends on
What state it binds to
What data sources it reads or writes
What screen or flow it transitions to

Only create a link if the relationship is meaningful and supported by code. Walk each important path from top to bottom of the stack wherever possible. Do not stop at the first obvious dependency hop. Prefer multiple specific links over a single coarse link when the intermediate architectural steps matter. Trace through:

conditional branches
fallback paths
feature-flagged behavior
async triggers and callbacks
transient dependencies such as helpers, middleware, adapters, or mappers when they materially shape control flow
cross-domain handoffs and shared infrastructure

The target outcome is not just a bag of local edges. The graph should reveal domain clusters, full-stack paths through those clusters, and the shared links between domains where those links are real.

Examples:

Screen -> ViewModel as binds_to
ViewModel -> Service as calls
Service -> Repository as depends_on
Repository -> Database as reads and writes (one link per type)
Screen -> Screen as navigates_to
Module -> Screen as contains
Trigger -> Flow as triggers
Flow -> State as transitions_to
Service -> Helper as uses_helper
Flow -> ErrorHandler as handles_error_with
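
The rule that every link carries a semantic type can be enforced with a small vocabulary check. The set below is assembled only from the link types that appear in this document; it is an assumption that the full vocabulary in references/guidelines.md is larger, and `invalid_links` is a hypothetical helper.

```python
# Link types taken from the examples in this document (not an exhaustive list).
KNOWN_LINK_TYPES = {
    "binds_to",
    "calls",
    "depends_on",
    "reads",
    "writes",
    "navigates_to",
    "contains",
    "triggers",
    "transitions_to",
    "uses_helper",
    "handles_error_with",
    "owns",
}


def invalid_links(links):
    """Return links whose type is missing or outside the known vocabulary."""
    return [link for link in links if link.get("type") not in KNOWN_LINK_TYPES]
```

In this sketch a repository that both reads from and writes to a database is represented as two links, one of type `reads` and one of type `writes`, so each link keeps a single semantic type.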

Step 5 - Normalize and de-noise

Before writing output:

Merge duplicate entities with the same architectural role
Remove low-value nodes that only create clutter
Ensure each point has one clear primary type
Ensure each link has one clear semantic type
Ensure point indices are sequential and stable
Ensure link source and target indices match the point index mapping
Ensure behavioral nodes attach to a parent screen, flow, service, or module rather than floating as isolated graph noise
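
The indexing parts of this checklist can be sketched as a single pass. This is an illustration under assumptions: points and links are dictionaries keyed by string IDs, and the output field names `index`, `sourceIndex`, and `targetIndex` are placeholders for whatever references/guidelines.md specifies.

```python
def normalize(points, links):
    """Deduplicate points by ID, assign sequential indices, and resolve links to indices."""
    unique = {}
    for point in points:
        unique.setdefault(point["id"], point)  # first occurrence wins

    # Sorting by ID makes the index assignment independent of discovery order.
    ordered = [unique[point_id] for point_id in sorted(unique)]
    index_of = {point["id"]: i for i, point in enumerate(ordered)}
    indexed_points = [{**point, "index": index_of[point["id"]]} for point in ordered]

    indexed_links = []
    seen = set()
    for link in links:
        key = (link["source"], link["target"], link["type"])
        if key in seen or link["source"] not in index_of or link["target"] not in index_of:
            continue  # drop duplicates and links with a missing endpoint
        seen.add(key)
        indexed_links.append(
            {**link, "sourceIndex": index_of[link["source"]], "targetIndex": index_of[link["target"]]}
        )
    return indexed_points, indexed_links
```

Sorting by ID keeps indices reproducible for an unchanged point set. Adding or removing a point still shifts the indices after it, so downstream consumers should prefer the IDs when comparing runs.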

Heuristics for good graphs

Use these heuristics to avoid a bad render:

Prioritize breadth of architecture over microscopic detail
Prefer richer architectural granularity over an overly thin first-pass graph
Keep helper and utility explosion out of the graph only when those helpers do not change control flow, coupling, or stack traversal
Favor typed relationships over dense generic connectivity
Prefer one representative point per architectural concept
Use parent-child containment to preserve context without over-linking
If one module contains many leaf utilities, keep the module and only include the most important leaves
Size by importance, not raw file count
Color by point type or layer
Use labels for high-importance nodes first
Let overview nodes and edges form the backbone of the graph
Let behavior nodes and edges enrich local understanding without drowning the backbone

Required workflow

Follow this order:

Walk the code to understand architecture
Identify the architecture patterns present and select point and link types that fit them
Partition the architecture into domain slices when needed so the full map can be built incrementally without dropping coverage
For each domain slice, record intermediate candidate points and links in architecture/domains/<domain>.yml
Check whether the collection points are sufficient for a faithful point-to-link mapping and add missing categories when needed
Trace important flows from top to bottom of the stack, including meaningful branches and transient dependencies
Repeat until all relevant domain slices in scope are covered
Normalize shared points and cross-domain links across the domain files
Output points.json and links.json
Create config.json to help render and explore the dataset
Create layout.json only if a guided layout materially improves readability

Do not skip discovery and jump straight to generation.

Verification checklist

Before finishing, verify:

The output folder exists
The domain tracking folder exists if you used domain slices
No points or links were emitted from sibling or peer directories outside the SCAN_ROOT subtree unless the user explicitly asked for broader scope
Every emitted path starts with or resolves under SCAN_ROOT
points.json parses
links.json parses
config.json parses
layout.json parses if created
Each architecture/domains/*.yml file parses if created
Point indices are sequential and unique
Every link resolves to valid points
The graph is not overloaded with low-value nodes
Overview nodes and edges still form a readable backbone
Behavioral nodes explain lifecycle, rendering, error handling, or coupling rather than adding incidental detail
The chosen config reflects the actual fields in the datasets
Cross-domain links remain explicit rather than being flattened into ambiguous local edges
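
The mechanical items in this checklist (files parse, indices are sequential, links resolve) can be automated. The sketch below is one possible check, assuming points.json and links.json are JSON arrays of objects and that the fields are named `index`, `sourceIndex`, `targetIndex`, and `type`; those names are assumptions, so adjust them to the contract in references/guidelines.md.

```python
import json
from pathlib import Path


def verify_output(output_dir):
    """Return a list of problems found in points.json and links.json."""
    out = Path(output_dir)
    try:
        points = json.loads((out / "points.json").read_text(encoding="utf-8"))
        links = json.loads((out / "links.json").read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError) as exc:
        return [f"could not load datasets: {exc}"]

    problems = []
    indices = [point.get("index") for point in points]
    if indices != list(range(len(points))):
        problems.append("point indices are not sequential from 0")

    valid = set(range(len(points)))
    for n, link in enumerate(links):
        if link.get("sourceIndex") not in valid or link.get("targetIndex") not in valid:
            problems.append(f"link {n} does not resolve to valid points")
        if not link.get("type"):
            problems.append(f"link {n} has no semantic type")
    return problems
```

An empty list means the structural checks passed. The judgment-based items, such as whether the backbone is readable, still need a human or agent review.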

Response expectations

When you complete the work, report:

Which area of the codebase was mapped
The files written under architecture/output/
The files written under architecture/domains/ if any
The modeling decisions that shaped the graph
Any major inferred areas or confidence limits
