id-graph-canonical-id-size

Installation
SKILL.md

ID Graph - Canonical ID Size Analysis

Analyze canonical ID group sizes to identify over-stitching patterns where too many individual IDs are mapped to a single canonical ID.

Requirements

  • Parent segment ID with RT 2.0 enabled

Database

The database name contains the parent segment ID and has this format cdp_audience_273509. This is where you can plug in the parent segment the user gives in the request.

ids_updated table

The ID graph table is typically called ids_updated and is in the Parent Segment real time database. This table contains the canonical ID mappings created by RT 2.0's ID stitching process.

Schema

The ids_updated table has the following key columns:

  • canonical_id, string, the canonical identifier for a group of stitched IDs
  • id_set, array(string), array of individual identifiers that belong to this canonical group

Core Query

The main query to analyze ID counts per canonical ID:

WITH flattened AS (
    SELECT
        canonical_id,
        id_value
    FROM cdp_audience_<parent_segment_id>.ids_updated
    CROSS JOIN UNNEST(id_set) AS t(id_value)
    WHERE td_interval(time, '-7d')  -- Avoid full table scan
)
SELECT
    canonical_id,
    count(DISTINCT id_value) as unique_id_count
FROM flattened
GROUP BY canonical_id
ORDER BY unique_id_count DESC
LIMIT 100;

Troubleshooting

  • Ensure the parent segment has RT 2.0 enabled
  • Large result sets may need LIMIT clauses for performance
Related skills

More from treasure-data/td-skills

Installs
1
GitHub Stars
18
First Seen
Mar 31, 2026
Security Audits