cosmograph

Cosmograph Architecture Mapping

You are the project's architecture cartographer for Cosmograph. Your job is to walk the code carefully, identify meaningful architectural entities and relationships, and emit a graph dataset that renders clearly and answers useful engineering questions.

The goal is not to graph every symbol in the codebase. The goal is to represent the application's architecture at the right level of abstraction for navigation, reasoning, and change impact analysis.

When to run

Run this skill when:

  • The user asks for a Cosmograph of an application, feature, or architecture
  • The user wants points.json, links.json, config.json, or layout.json
  • The user wants a codebase explored and represented as a graph
  • The user wants touchpoints, dependencies, navigation, or data flow mapped into Cosmograph

Golden rules

  1. Define SCAN_ROOT as the current working directory at skill start.
  2. SCAN_ROOT is the authoritative read scope unless the user explicitly broadens it.
  3. Never read, trace, classify, or emit files outside the SCAN_ROOT subtree unless the user explicitly asks for broader coverage.
  4. Treat the closest git repo root only as the write safety boundary, not as the scan scope.
  5. Never write outside the closest git repo root.
  6. Start with discovery. Do not emit graph data until you understand the architecture well enough to defend the node and link choices.
  7. Stay architecture-pattern agnostic. Detect the architecture that exists instead of forcing the codebase into a preconceived pattern.
  8. Represent meaningful architectural entities, not every implementation detail.
  9. Bias toward denser architectural coverage once an entity or relationship is meaningful. Prefer more truthful detail over an overly sparse graph.
  10. Every link must have semantic meaning. Avoid generic unlabeled edges.
  11. Prefer evidence over interpretation. Mark inferred edges or classifications explicitly.
  12. Keep the graph renderable. If a choice would create noise without insight, collapse or omit it.
  13. Use stable IDs and stable indices so repeated runs produce comparable output.
  14. If the codebase already has useful architecture docs under architecture/, use them as supporting context, but verify against code before emitting the graph.
  15. If layout is obvious from the graph shape, omit layout.json. Only create it when it materially improves readability.
  16. The output should be useful both for visual rendering and for downstream filtering, grouping, and drill-down behavior.
  17. If it materially improves coverage and the environment supports it, you may spawn up to 3 sub-agents to crawl independent areas of the codebase in parallel. Any sub-agent must inherit the same SCAN_ROOT restriction.

Output structure

Write to:

  • architecture/output/points.json
  • architecture/output/links.json
  • architecture/output/config.json
  • architecture/output/layout.json when a guided layout materially improves the render
  • architecture/domains/<domain>.yml for intermediate per-domain tracking when the map is built incrementally

Create architecture/output/ if missing. Create architecture/domains/ if missing when using per-domain tracking files.
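A minimal sketch of how the write side could be guarded, assuming the repo root has already been determined. The function name `write_output` is hypothetical and not part of the skill; it creates missing folders and refuses any target that resolves outside the repo root.

```python
import json
from pathlib import Path


def write_output(repo_root: Path, relative_path: str, data) -> Path:
    """Write `data` as JSON under `repo_root`, creating parent folders as needed.

    Hypothetical helper: raises ValueError if the target escapes the repo root.
    """
    root = repo_root.resolve()
    target = (root / relative_path).resolve()
    # relative_to raises ValueError when target is not inside root.
    target.relative_to(root)
    target.parent.mkdir(parents=True, exist_ok=True)
    # sort_keys keeps repeated runs byte-comparable.
    text = json.dumps(data, indent=2, sort_keys=True) + "\n"
    target.write_text(text, encoding="utf-8")
    return target
```

Calling `write_output(repo_root, "architecture/output/points.json", points)` would create `architecture/output/` on first use, while a path such as `"../elsewhere.json"` is rejected before anything is written.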

Reference material:

  • For point and link schemas, output contracts, stable ID patterns, evidence rules, and render tuning, load references/guidelines.md.

Core modeling principle

Model the application as:

  • Points: meaningful architectural entities a developer would navigate to directly
  • Links: typed relationships between those entities

Do not blindly make every file a point. Make something a point when it is a stable touchpoint in the architecture, such as:

  • A package or module
  • A screen or route
  • A major view or component
  • A view model, controller, store, or state container
  • A service, repository, adapter, or gateway
  • A domain model or schema root
  • A persistence boundary such as a database or cache
  • An external system such as an API, SDK, queue, or vendor service

Usually do not make these first-class points unless the user explicitly wants them:

  • Tiny helpers
  • Formatters
  • Extensions
  • Small utility functions
  • Constants files
  • Pure implementation detail types with no architectural role

Those details can be:

  • omitted
  • folded into a parent point
  • surfaced as metadata on a point
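As an illustration only, the points-and-links model above could be held in memory like this. The field names are assumptions borrowed from the tracking-file example later in this document; the authoritative schemas live in references/guidelines.md. The `metadata` field shows one way a small helper might be folded into its parent point instead of becoming a point itself.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class Point:
    """A meaningful architectural entity (field names are assumed)."""
    id: str
    type: str
    label: str
    path: str
    metadata: dict = field(default_factory=dict)


@dataclass
class Link:
    """A typed relationship between two points, referenced by point ID."""
    source: str
    target: str
    type: str
    status: str = "observed"


# Hypothetical example: a formatter helper is recorded as metadata
# on the service that uses it rather than emitted as its own point.
session_service = Point(
    id="service:auth/session",
    type="service",
    label="SessionService",
    path="src/auth/session.ts",
    metadata={"helpers": ["tokenFormatter"]},
)
login_calls_session = Link(
    source="controller:auth/login",
    target=session_service.id,
    type="calls",
)
```

`asdict(session_service)` yields a plain dictionary that can be serialized when the final output is written.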

How to walk the codebase

Step 1 - Find repo root and scope

  • Capture the current working directory as SCAN_ROOT.
  • Determine the closest git repo root.
  • Treat the repo root only as the write safety boundary for emitted files.
  • Identify whether the user wants the full architecture within SCAN_ROOT or a bounded domain within SCAN_ROOT.
  • Do not assume the repo root is the requested scope. The human is responsible for positioning the working directory before running the skill.
  • Enforce a simple path rule: if a candidate file or directory does not live under SCAN_ROOT, exclude it unless the user explicitly broadens scope.
  • If the user runs the skill from <root>/ios/, map only the iOS subtree and do not emit Android or other peer-platform traces.
  • Default to a full-architecture map for the scanned area under SCAN_ROOT.
  • If the codebase is large, break the architecture into domain slices and map one slice at a time until the full architecture is covered.
  • Do not reduce scope to only "meaningful top-level areas" as a shortcut. Coverage across the full architecture is the default requirement.
  • If existing docs or registries reference systems outside SCAN_ROOT, treat them as out-of-scope context unless the user explicitly broadens the scan boundary.
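The path rule and the repo-root lookup in this step can be sketched as follows. This is a minimal illustration, assuming a conventional `.git` entry marks the repo root; the function names are hypothetical.

```python
from pathlib import Path
from typing import List, Optional


def find_git_root(start: Path) -> Optional[Path]:
    """Walk upward from `start` until a directory containing `.git` is found."""
    current = start.resolve()
    for candidate in (current, *current.parents):
        if (candidate / ".git").exists():
            return candidate
    return None


def is_under(path: Path, root: Path) -> bool:
    """Return True when `path` resolves to `root` or to something below it."""
    try:
        path.resolve().relative_to(root.resolve())
        return True
    except ValueError:
        return False


def filter_in_scope(candidates: List[Path], scan_root: Path) -> List[Path]:
    """Keep only candidates inside SCAN_ROOT; everything else is out of scope."""
    return [p for p in candidates if is_under(p, scan_root)]
```

With `scan_root` set to `<root>/ios`, a candidate under `<root>/android` is dropped by `filter_in_scope`, while `find_git_root(scan_root)` still returns `<root>` for use as the write boundary. Resolving paths before comparing them means `..` segments and symlinks cannot be used to step outside the scan scope.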

Step 2 - Discover top-level architecture

Before collecting points, identify:

  • The dominant architecture patterns or composition styles present in the scanned area
  • Top-level packages, apps, modules, and folders
  • Entry points such as app bootstrap, main routes, or feature registries
  • Major screens or routes
  • Primary state containers or orchestration layers
  • Data and integration boundaries
  • External dependencies that shape the architecture

The point of pattern detection is not to label the codebase for its own sake. The point is to choose the right collection points for that codebase. Examples:

  • Layered or clean architecture may emphasize use cases, repositories, gateways, and boundary crossings
  • MVC, MVVM, MVP, Redux, Elm-style, or Flux-like systems may emphasize controllers, presenters, reducers, stores, selectors, and actions
  • Component-driven frontend systems may emphasize routes, layouts, components, hooks, contexts, and client-server boundaries
  • Event-driven or workflow-oriented systems may emphasize jobs, handlers, queues, triggers, flows, retries, and state transitions
  • Modular monoliths or package-oriented repos may emphasize packages, modules, registries, feature roots, public APIs, and shared infrastructure

For this step, explicitly ask:

  • What architectural patterns are actually present?
  • Which point types and link types best fit those patterns?
  • Which collection points would be missing if you only modeled the obvious top-level files?

Useful things to inspect:

  • Package manifests
  • App entrypoints
  • Navigation or router definitions
  • Dependency injection setup
  • Feature registries
  • Store or state composition
  • Service and repository directories
  • Network and persistence layers
  • Domain models and schema roots
  • Background jobs, workers, schedulers, queues, and event handlers
  • Hooks, contexts, middleware, interceptors, and composition roots
  • Configuration that rewires behavior across environments or feature flags

If beneficial, split discovery across up to 3 sub-agents by independent areas such as:

  • feature domains
  • architectural layers
  • application entrypoints versus data/integration boundaries

Keep final modeling decisions centralized in the main agent. When the architecture is broad, use the domain slices as the unit of progress and complete them one by one. Do not assign a sub-agent any area outside SCAN_ROOT.

Step 3 - Extract candidate points

Create candidate points only for entities that matter architecturally. Favor a richer dataset when the additional nodes and edges clarify the render. When you err, err toward keeping too many candidates rather than collapsing too many.

Good candidates:

  • User-visible screens and routes
  • Major views/components that structure a screen
  • View models, controllers, stores, presenters
  • Services and repositories
  • Databases, caches, queues, or APIs
  • Packages or modules that contain meaningful feature boundaries
  • Use cases, reducers, actions, selectors, middleware, handlers, coordinators, hooks, contexts, registries, and adapters when they materially shape architecture
  • Important flows, triggers, rendered states, or helpers when they materially clarify lifecycle, control flow, error handling, or coupling

Weak candidates that usually should not stand alone:

  • Tiny helpers
  • Mappers with no independent lifecycle
  • One-line wrappers
  • Small leaf utility files

When in doubt:

  • Prefer keeping a candidate if it clarifies stack traversal, domain clustering, or cross-domain coupling
  • Collapse or omit only when the candidate is repetitive and does not improve understanding

Behavioral nodes are optional and should be used selectively. Include them when they make the graph more explanatory, not merely more detailed.

Good uses:

  • A flow point that explains a key lifecycle such as initial load, checkout submit, or sync recovery
  • A trigger point that clarifies what starts important work
  • A state point for loading, success, empty, disabled, or error when those states are architecturally important
  • A helper or error_handler point when it materially shapes control flow or coupling

Poor uses:

  • Emitting every helper as a point
  • Modeling every function as a point
  • Creating isolated behavioral nodes that are not anchored to a screen, flow, service, or module

Coverage check for candidate points:

  • Do the chosen points let you trace the system from entrypoint to external boundary?
  • Do they cover both steady-state dependencies and transient runtime touchpoints?
  • Do they expose domain clusters and the important shared infrastructure between domains?
  • Are there missing orchestration points such as reducers, actions, handlers, use cases, middleware, contexts, jobs, or schedulers that would make the links more truthful?
  • Are there missing boundary points such as caches, queues, SDKs, webhooks, feature flags, configuration registries, or schema roots that would make cross-domain behavior more legible?
  • Are there enough intermediate points to make the render legible without forcing a human to infer large hidden jumps?
  • Have you traced through enough lower-level components that each important domain path reads as a chain rather than a single coarse edge?
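The first coverage question, whether an entrypoint can be traced to an external boundary, lends itself to a mechanical check. The sketch below assumes links are dictionaries with `source` and `target` point IDs, as in the tracking-file shape; the function name is hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, List, Set


def reachable_boundaries(
    entry_id: str, links: List[dict], boundary_ids: Iterable[str]
) -> Set[str]:
    """Return the boundary point IDs reachable from `entry_id` by following links."""
    adjacency: Dict[str, List[str]] = {}
    for link in links:
        adjacency.setdefault(link["source"], []).append(link["target"])

    seen = {entry_id}
    queue = deque([entry_id])
    while queue:
        current = queue.popleft()
        for neighbor in adjacency.get(current, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen & set(boundary_ids)
```

An empty result for an entrypoint suggests that intermediate points or links are still missing along that path. It does not prove the code lacks such a path, only that the candidate graph does not yet show one.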

When working domain-by-domain, keep an intermediate tracking file for each domain under architecture/domains/. Recommended filenames:

  • architecture/domains/auth.yml
  • architecture/domains/payments.yml
  • architecture/domains/shared-infra.yml

Use these files to track candidate points and candidate links before final normalization. They exist to make the crawl inspectable by humans and to reduce the chance of losing cross-domain context while moving slice by slice.

Recommended YAML shape:

domain: auth
status: in_progress
entrypoints:
  - path: src/auth/routes.ts
    symbol: authRoutes
points:
  - id: route:auth/login
    type: route
    label: LoginRoute
    path: src/auth/routes.ts
    layer: presentation
    status: observed
links:
  - source: route:auth/login
    target: controller:auth/login
    type: owns
    status: observed
shared_links:
  - source: service:auth/session
    target: cache:shared/redis
    type: writes
    targetDomain: shared-infra
    status: observed
notes:
  - Session creation flows into shared Redis cache used by multiple domains.

Track shared links explicitly so cross-domain references remain visible while the architecture is being assembled incrementally.
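One way to keep a tracking file honest while a slice is in progress is to list the link endpoints that do not yet have a point entry in that file. The sketch below works on an already parsed tracking file (a plain dictionary in the shape shown above) so that it does not depend on any particular YAML library; the function name is hypothetical.

```python
from typing import List


def unresolved_endpoints(domain: dict) -> List[str]:
    """List link endpoints with no matching point ID in this domain file."""
    known = {point["id"] for point in domain.get("points", [])}
    missing: List[str] = []
    for section in ("links", "shared_links"):
        for link in domain.get(section, []):
            for end in ("source", "target"):
                endpoint = link[end]
                if endpoint not in known and endpoint not in missing:
                    missing.append(endpoint)
    return missing


# The auth example above, reduced to the fields this check reads.
auth_domain = {
    "domain": "auth",
    "points": [{"id": "route:auth/login", "type": "route"}],
    "links": [
        {"source": "route:auth/login", "target": "controller:auth/login", "type": "owns"},
    ],
    "shared_links": [
        {"source": "service:auth/session", "target": "cache:shared/redis", "type": "writes"},
    ],
}
pending = unresolved_endpoints(auth_domain)
```

Here `pending` contains `controller:auth/login`, `service:auth/session`, and `cache:shared/redis`. An entry in this list is not necessarily an error: an endpoint with a `targetDomain` is expected to be defined in another domain file, so the list is best read as a to-do for later slices and for final normalization.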

Step 4 - Extract typed links

For each candidate point, inspect:

  • What creates it
  • What renders it
  • What it calls
  • What it depends on
  • What state it binds to
  • What data sources it reads or writes
  • What screen or flow it transitions to

Only create a link if the relationship is meaningful and supported by code. Walk each important path from top to bottom of the stack wherever possible. Do not stop at the first obvious dependency hop. Prefer multiple specific links over a single coarse link when the intermediate architectural steps matter. Trace through:

  • conditional branches
  • fallback paths
  • feature-flagged behavior
  • async triggers and callbacks
  • transient dependencies such as helpers, middleware, adapters, or mappers when they materially shape control flow
  • cross-domain handoffs and shared infrastructure

The target outcome is not just a bag of local edges. The graph should reveal domain clusters, full-stack paths through those clusters, and the shared links between domains where those links are real.

Examples:

  • Screen -> ViewModel as binds_to
  • ViewModel -> Service as calls
  • Service -> Repository as depends_on
  • Repository -> Database as reads or writes
  • Screen -> Screen as navigates_to
  • Module -> Screen as contains
  • Trigger -> Flow as triggers
  • Flow -> State as transitions_to
  • Service -> Helper as uses_helper
  • Flow -> ErrorHandler as handles_error_with
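The examples above can be kept as a small lookup table so that unusual type pairings are flagged for review. This is a sketch only: the snake_case point type names are assumptions, and the table is meant as a review aid, not as a complete or official vocabulary.

```python
# Expected link types for a (source point type, target point type) pair.
# Point type spellings such as "view_model" are assumed for illustration.
EXPECTED_LINK_TYPES = {
    ("screen", "view_model"): {"binds_to"},
    ("view_model", "service"): {"calls"},
    ("service", "repository"): {"depends_on"},
    ("repository", "database"): {"reads", "writes"},
    ("screen", "screen"): {"navigates_to"},
    ("module", "screen"): {"contains"},
    ("trigger", "flow"): {"triggers"},
    ("flow", "state"): {"transitions_to"},
    ("service", "helper"): {"uses_helper"},
    ("flow", "error_handler"): {"handles_error_with"},
}


def is_expected_link(source_type: str, target_type: str, link_type: str) -> bool:
    """Return True when the link type matches one of the listed examples."""
    allowed = EXPECTED_LINK_TYPES.get((source_type, target_type), set())
    return link_type in allowed
```

A `False` result is a prompt to look at the code again, not proof that the link is wrong; codebases with other architectures will need other pairings.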

Step 5 - Normalize and de-noise

Before writing output:

  • Merge duplicate entities with the same architectural role
  • Remove low-value nodes that only create clutter
  • Ensure each point has one clear primary type
  • Ensure each link has one clear semantic type
  • Ensure point indices are sequential and stable
  • Ensure link source and target indices match the point index mapping
  • Ensure behavioral nodes attach to a parent screen, flow, service, or module rather than floating as isolated graph noise
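The index-related checks in this step could be implemented along these lines. The sketch assumes points carry a unique `id`, links reference points by `source` and `target` ID, and the output uses 0-based `index`, `sourceIndex`, and `targetIndex` fields. Those three output field names are assumptions; the real contract is in references/guidelines.md.

```python
from typing import Dict, List, Tuple


def normalize(points: List[dict], links: List[dict]) -> Tuple[List[dict], List[dict]]:
    """Merge duplicate points, assign sequential indices, and resolve link endpoints."""
    # Merge duplicates by ID; the first occurrence wins.
    unique: Dict[str, dict] = {}
    for point in points:
        unique.setdefault(point["id"], point)

    # Sorting by ID makes the index assignment independent of crawl order.
    ordered = [
        dict(unique[point_id], index=i)
        for i, point_id in enumerate(sorted(unique))
    ]
    index_of = {point["id"]: point["index"] for point in ordered}

    seen = set()
    resolved = []
    for link in links:
        key = (link["source"], link["target"], link["type"])
        if key in seen:
            continue  # drop exact duplicate links
        seen.add(key)
        # A KeyError here means a link points at an ID with no point.
        resolved.append(
            dict(
                link,
                sourceIndex=index_of[link["source"]],
                targetIndex=index_of[link["target"]],
            )
        )
    return ordered, resolved
```

Sorting by ID gives the same indices for the same set of points on every run. Adding or removing a point still shifts the indices that follow it, so comparisons between runs are more reliable when keyed on the stable ID than on the index.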

Heuristics for good graphs

Use these heuristics to avoid a bad render:

  • Prioritize breadth of architecture over microscopic detail
  • Prefer richer architectural granularity over an overly thin first-pass graph
  • Keep helper and utility explosion out of the graph only when those helpers do not change control flow, coupling, or stack traversal
  • Favor typed relationships over dense generic connectivity
  • Prefer one representative point per architectural concept
  • Use parent-child containment to preserve context without over-linking
  • If one module contains many leaf utilities, keep the module and only include the most important leaves
  • Size by importance, not raw file count
  • Color by point type or layer
  • Use labels for high-importance nodes first
  • Let overview nodes and edges form the backbone of the graph
  • Let behavior nodes and edges enrich local understanding without drowning the backbone

Required workflow

Follow this order:

  1. Walk the code to understand architecture
  2. Identify the architecture patterns present and select point and link types that fit them
  3. Partition the architecture into domain slices when needed so the full map can be built incrementally without dropping coverage
  4. For each domain slice, record intermediate candidate points and links in architecture/domains/<domain>.yml
  5. Check whether the collection points are sufficient for a faithful point-to-link mapping and add missing categories when needed
  6. Trace important flows from top to bottom of the stack, including meaningful branches and transient dependencies
  7. Repeat until all relevant domain slices in scope are covered
  8. Normalize shared points and cross-domain links across the domain files
  9. Output points.json and links.json
  10. Create config.json to help render and explore the dataset
  11. Create layout.json only if a guided layout materially improves readability

Do not skip discovery and jump straight to generation.

Verification checklist

Before finishing, verify:

  • The output folder exists
  • The domain tracking folder exists if you used domain slices
  • No points or links were emitted from sibling or peer directories outside the SCAN_ROOT subtree unless the user explicitly asked for broader scope
  • Every emitted path starts with or resolves under SCAN_ROOT
  • points.json parses
  • links.json parses
  • config.json parses
  • layout.json parses if created
  • Each architecture/domains/*.yml file parses if created
  • Point indices are sequential and unique
  • Every link resolves to valid points
  • The graph is not overloaded with low-value nodes
  • Overview nodes and edges still form a readable backbone
  • Behavioral nodes explain lifecycle, rendering, error handling, or coupling rather than adding incidental detail
  • The chosen config reflects the actual fields in the datasets
  • Cross-domain links remain explicit rather than being flattened into ambiguous local edges
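Several of these checks are mechanical and can be scripted. The sketch below covers parsing, index sequencing, link resolution, and unlinked points, under the assumption that points carry a 0-based `index` and links carry `sourceIndex`, `targetIndex`, and `type`. Those field names are illustrative; adjust them to the contract in references/guidelines.md. The judgment-based items on the list still need a human or agent review.

```python
import json
from typing import List


def verify_dataset(points_text: str, links_text: str) -> List[str]:
    """Return a list of problems found in the serialized points and links."""
    try:
        points = json.loads(points_text)
        links = json.loads(links_text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]

    problems: List[str] = []

    # Indices must be 0..n-1 in file order, which also guarantees uniqueness.
    if [p.get("index") for p in points] != list(range(len(points))):
        problems.append("point indices are not sequential from 0")

    valid = set(range(len(points)))
    linked = set()
    for n, link in enumerate(links):
        source, target = link.get("sourceIndex"), link.get("targetIndex")
        if source not in valid or target not in valid:
            problems.append(f"link {n} does not resolve to valid points")
            continue
        if not link.get("type"):
            problems.append(f"link {n} has no semantic type")
        linked.update((source, target))

    # Points with no links at all are candidates for removal or re-anchoring.
    for index in sorted(valid - linked):
        problems.append(f"point index {index} has no links")
    return problems
```

An empty list means the scripted checks passed, not that the graph is good.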

Response expectations

When you complete the work, report:

  • Which area of the codebase was mapped
  • The files written under architecture/output/
  • The files written under architecture/domains/ if any
  • The modeling decisions that shaped the graph
  • Any major inferred areas or confidence limits

Installs: 7
Repository: lmcjt37/skills
First Seen: Mar 30, 2026