entity-normalization
Entity normalisation
The vendor's wire format is not my domain model.
External APIs return whatever they want — Spotify returns external_urls.spotify, GitHub returns html_url, Linear returns url. If raw vendor responses leak into the rest of the codebase, every consumer ends up special-casing every vendor. The fix is to normalise at the boundary: one shape, one discriminator, vendor-specific noise tucked into a metadata bag.
This is the ait ETL pattern. It's the data-channel counterpart to result-not-throw (the error channel). Together they make the type system useful: errors are typed, data is typed, the boundary is the only place either lives in the wild.
When this skill is active
You are about to:
- Pull data from an external API and store / index / process it
- Add a new connector or integration
- Define a new entity type
- Write a mapper from vendor response → internal type
- Touch the EntityType union or its VALID_ENTITY_TYPES set
The contract
export type EntityType =
  | 'spotify_track' | 'spotify_artist' | 'spotify_album'
  | 'github_repository' | 'github_pull_request' | 'github_issue'
  | 'linear_issue' | 'notion_page' | 'slack_message'
  // ...
  ;

export interface NormalizedEntity {
  __type: EntityType;
  id: string;                        // namespaced: `${__type}_${externalId}`
  externalId: string;                // vendor's ID, untouched
  title: string;
  description?: string;
  url?: string;
  metadata: Record<string, unknown>; // vendor-specific extras
  createdAt: Date;
  updatedAt: Date;
}
Two invariants:
- __type is the discriminator the rest of the system switches on. Format: <vendor>_<resource>. Lowercase, snake_case. No exceptions: SpotifyTrack is wrong, spotify-track is wrong.
- id is namespaced: ${__type}_${externalId}. Two vendors can have the same externalId; the namespaced id is globally unique.
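The two invariants can be expressed as small helpers. This is a minimal sketch, not the canonical implementation: the union is trimmed to three literals, and isEntityType, makeEntityId and ENTITY_TYPE_FORMAT are hypothetical names introduced only for illustration.

```typescript
// Trimmed stand-in for the real union (assumption: the real one lists every literal).
export type EntityType = 'spotify_track' | 'github_repository' | 'linear_issue';

// Runtime mirror of the union. The contents here are assumed for the sketch.
export const VALID_ENTITY_TYPES: ReadonlySet<string> = new Set<EntityType>([
  'spotify_track',
  'github_repository',
  'linear_issue',
]);

// <vendor>_<resource>: lowercase segments joined by underscores, at least two of them.
const ENTITY_TYPE_FORMAT = /^[a-z][a-z0-9]*(_[a-z0-9]+)+$/;

// Hypothetical type guard: true only for well-formed, registered type strings.
export function isEntityType(value: string): value is EntityType {
  return ENTITY_TYPE_FORMAT.test(value) && VALID_ENTITY_TYPES.has(value);
}

// Hypothetical helper: builds the namespaced id from the discriminator and the vendor's ID.
export function makeEntityId(type: EntityType, externalId: string): string {
  return `${type}_${externalId}`;
}
```

With these, makeEntityId('spotify_track', '123') and makeEntityId('github_repository', '123') yield different ids even though the vendors share an externalId.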
The mapper pattern
Every connector exports a function per resource: map<Vendor><Resource>(raw) → NormalizedEntity.
// packages/connectors/src/domain/mappers/spotify.mapper.ts
export function mapSpotifyTrack(raw: SpotifyApi.TrackObject): NormalizedEntity {
  return {
    __type: 'spotify_track',
    id: `spotify_track_${raw.id}`,
    externalId: raw.id,
    title: raw.name,
    description: `${raw.artists.map(a => a.name).join(', ')} — ${raw.album.name}`,
    url: raw.external_urls.spotify,
    // Vendor-specific extras go in the metadata bag, never at the top level.
    metadata: {
      duration_ms: raw.duration_ms,
      popularity: raw.popularity,
      album: raw.album.name,
      artists: raw.artists.map(a => a.name),
    },
    // Mapping time stands in for both timestamps. Note that new Date() makes the
    // output time-dependent; prefer vendor timestamps where the payload has them.
    createdAt: new Date(),
    updatedAt: new Date(),
  };
}
Mappers are pure. No IO, no logging, no side effects — just vendor → domain. That makes them trivial to test and trivial to compose.
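To show the same pattern for a second vendor, here is a sketch of a GitHub repository mapper. GithubRepositoryRaw is a hypothetical raw type covering only a subset of the fields GitHub's REST API returns for a repository, and the contract is trimmed to one literal so the block stands alone. It assumes the numeric vendor ID may be stringified to fit externalId.

```typescript
// Trimmed copy of the contract so the sketch is self-contained.
type EntityType = 'github_repository';

interface NormalizedEntity {
  __type: EntityType;
  id: string;
  externalId: string;
  title: string;
  description?: string;
  url?: string;
  metadata: Record<string, unknown>;
  createdAt: Date;
  updatedAt: Date;
}

// Hypothetical raw type: a subset of a GitHub REST repository payload.
interface GithubRepositoryRaw {
  id: number;
  full_name: string;
  description: string | null;
  html_url: string;
  stargazers_count: number;
  language: string | null;
  created_at: string; // ISO 8601 timestamp
  updated_at: string; // ISO 8601 timestamp
}

export function mapGithubRepository(raw: GithubRepositoryRaw): NormalizedEntity {
  // GitHub IDs are numeric; stringify once so id and externalId agree.
  const externalId = String(raw.id);
  return {
    __type: 'github_repository',
    id: `github_repository_${externalId}`,
    externalId,
    title: raw.full_name,
    description: raw.description ?? undefined,
    url: raw.html_url,
    metadata: {
      stargazers_count: raw.stargazers_count,
      language: raw.language,
    },
    // Timestamps come from the payload, so the same input always maps to the same output.
    createdAt: new Date(raw.created_at),
    updatedAt: new Date(raw.updated_at),
  };
}
```

Because the function reads nothing but its argument, a test needs only a fixture object and an equality check.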
Adding a new entity type
- Add the literal to the EntityType union. Stable name: <vendor>_<resource>.
- Add it to the VALID_ENTITY_TYPES set. That's the runtime validator.
- Write the mapper in packages/connectors/src/domain/mappers/<vendor>.mapper.ts.
- Add the vendor-specific raw type under packages/core/src/types/integrations/<vendor>.ts.
- Register the mapper in the connector's service layer.
Five steps, every time, in that order. Skipping any of them creates silent failure modes downstream.
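The first two steps are the easiest to get out of sync. One way to reduce that risk, offered as an alternative sketch rather than a description of how the codebase does it, is to derive both the union and the set from a single const tuple. ENTITY_TYPES is a hypothetical name, and the list is trimmed.

```typescript
// Single source of truth (hypothetical name; list trimmed for the sketch).
export const ENTITY_TYPES = [
  'spotify_track',
  'github_repository',
  'linear_issue',
] as const;

// Compile-time union derived from the tuple.
export type EntityType = (typeof ENTITY_TYPES)[number];

// Runtime validator derived from the same tuple, so the two cannot drift apart.
export const VALID_ENTITY_TYPES: ReadonlySet<string> = new Set(ENTITY_TYPES);
```

Adding a literal to the tuple then updates both the type and the runtime check in one edit; the remaining three steps are unchanged.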
Querying across vendors
Because every entity has the same shape, downstream code can be vendor-agnostic:
function summarise(entities: NormalizedEntity[]): string {
  return entities
    .map(e => `${e.__type}: ${e.title}`)
    .join('\n');
}
The discriminator is there when you want to specialise:
switch (entity.__type) {
  case 'spotify_track': return formatTrack(entity);
  case 'github_pull_request': return formatPR(entity);
  // ...
}
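The compiler checks exhaustiveness when you add a new EntityType, provided the switch ends in a default branch that passes entity.__type to a never-typed parameter, or the enclosing function declares a return type that excludes undefined.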
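A minimal sketch of an exhaustive switch, assuming a trimmed union and a cut-down entity shape. assertNever and formatEntity are hypothetical helpers, not names from the codebase.

```typescript
type EntityType = 'spotify_track' | 'github_pull_request' | 'linear_issue';

// Cut-down stand-in for NormalizedEntity; only the fields the sketch needs.
interface Entity {
  __type: EntityType;
  title: string;
}

// Accepts only `never`, so a call with any remaining literal fails to compile.
function assertNever(value: never): never {
  throw new Error(`Unhandled entity type: ${String(value)}`);
}

function formatEntity(entity: Entity): string {
  const type = entity.__type;
  switch (type) {
    case 'spotify_track':
      return `Track: ${entity.title}`;
    case 'github_pull_request':
      return `PR: ${entity.title}`;
    case 'linear_issue':
      return `Issue: ${entity.title}`;
    default:
      // If a literal is added to EntityType without a case above,
      // `type` is no longer `never` here and this line is a compile error.
      return assertNever(type);
  }
}
```

The runtime throw in assertNever is a backstop for data that bypassed the type system, for example an unvalidated row read from storage.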
Anti-patterns
- __type: 'SpotifyTrack' (PascalCase) breaks the convention. __type strings are lowercase snake_case with _ separators. The compiler doesn't enforce this; the convention does.
- Putting vendor-specific fields at the top level instead of in metadata. A top-level entity.album is wrong: album only means something for some Spotify resources, so it belongs in metadata with the other vendor-specific extras.
- Mappers that throw. If the raw payload is invalid, return Result<NormalizedEntity, ValidationError> (see result-not-throw); don't throw inside a pure mapper.
- Mappers that do IO (fetch related data, hit the DB, log). Pure transformation only. If you need related data, fetch it before calling the mapper.
- Forgetting to update VALID_ENTITY_TYPES when adding a new literal. The TypeScript union is compile-time; the set is runtime. Both have to move together.
- Different id formats per vendor. spotify_${id} vs gh-${id} vs raw id: pick one (${__type}_${externalId}) and never deviate.
- Storing the raw vendor payload alongside the entity "just in case". It's metadata or it's gone. The vendor payload is not part of the domain.
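The "mappers that throw" item can be sketched as a mapper that validates an untrusted payload and returns a Result. The shapes of Result and ValidationError below are assumptions (the real ones are defined by result-not-throw), and the raw Linear fields are a hypothetical subset.

```typescript
// Assumed shapes; the real definitions live with result-not-throw.
type Result<T, E> = { ok: true; value: T } | { ok: false; error: E };
interface ValidationError { field: string; message: string }

interface NormalizedEntity {
  __type: 'linear_issue';
  id: string;
  externalId: string;
  title: string;
  url?: string;
  metadata: Record<string, unknown>;
  createdAt: Date;
  updatedAt: Date;
}

// Returns the value only if it is a non-empty string.
const str = (v: unknown): string | undefined =>
  typeof v === 'string' && v !== '' ? v : undefined;

const fail = (field: string, message: string): Result<never, ValidationError> =>
  ({ ok: false, error: { field, message } });

export function mapLinearIssue(raw: unknown): Result<NormalizedEntity, ValidationError> {
  if (typeof raw !== 'object' || raw === null) return fail('(root)', 'expected an object');
  const r = raw as Record<string, unknown>;

  const externalId = str(r['id']);
  if (!externalId) return fail('id', 'expected a non-empty string');
  const title = str(r['title']);
  if (!title) return fail('title', 'expected a non-empty string');

  const createdAt = new Date(str(r['createdAt']) ?? '');
  if (Number.isNaN(createdAt.getTime())) return fail('createdAt', 'expected an ISO date');
  const updatedAt = new Date(str(r['updatedAt']) ?? '');
  if (Number.isNaN(updatedAt.getTime())) return fail('updatedAt', 'expected an ISO date');

  return {
    ok: true,
    value: {
      __type: 'linear_issue',
      id: `linear_issue_${externalId}`,
      externalId,
      title,
      url: str(r['url']),
      metadata: {},
      createdAt,
      updatedAt,
    },
  };
}
```

The caller decides what to do with a failed payload (skip, log, retry), which keeps logging and other IO out of the mapper.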
Cross-references
- Canonical implementation: personal/ait/references/core-entity-types.md
- Connectors using this pattern: personal/ait/references/features-connectors.md
- Domain glossary: CONTEXT.md