parent-segment-analysis - CDP Parent Segment Data Analysis
Workflow
- Get output database schema
- Read ALL columns from the schema file to understand available data
- Identify relevant columns for the analysis (use sub-agent if schema is large)
- Query customers or behavior tables
Get Output Database Schema
Parent segments generate an output database (cdp_audience_{id}) with:
- `customers` table - master record plus all attributes joined
- `behavior_{name}` tables - one per behavior
```bash
# Save schema to file (recommended for large schema data)
tdx ps desc "Customer 360" -o customer_360_schema.json
# Output summary:
# Schema saved to customer_360_schema.json
# Database: cdp_audience_12345
# Tables: 1 customers + 5 behaviors
# Columns: 42 total
```
Schema JSON Structure
The schema file stores one column per line, so it supports grep and progressive disclosure:
```json
{
  "database": "cdp_audience_12345",
  "parent_segment": "Customer 360",
  "parent_id": "12345",
  "customers": {
    "table": "customers",
    "columns": [
      { "name": "cdp_customer_id", "type": "varchar" },
      { "name": "email", "type": "varchar" }
    ]
  },
  "behaviors": [
    {
      "table": "behavior_purchases",
      "columns": [
        { "name": "cdp_customer_id", "type": "varchar" },
        { "name": "amount", "type": "double" }
      ]
    }
  ]
}
```
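Given this layout, the schema can be flattened programmatically instead of read by eye. A minimal Python sketch, assuming the JSON structure shown above (the inline sample here is illustrative, not real tdx output):

```python
import json

# Illustrative sample matching the schema layout above (not real tdx output).
schema_json = """
{
  "database": "cdp_audience_12345",
  "customers": {
    "table": "customers",
    "columns": [
      {"name": "cdp_customer_id", "type": "varchar"},
      {"name": "email", "type": "varchar"}
    ]
  },
  "behaviors": [
    {
      "table": "behavior_purchases",
      "columns": [
        {"name": "cdp_customer_id", "type": "varchar"},
        {"name": "amount", "type": "double"}
      ]
    }
  ]
}
"""

schema = json.loads(schema_json)

# Flatten to (table, column, type) triples across customers + behaviors.
tables = [schema["customers"], *schema["behaviors"]]
columns = [
    (t["table"], c["name"], c["type"])
    for t in tables
    for c in t["columns"]
]
for table, name, typ in columns:
    print(f"{table}.{name} ({typ})")
```

In practice you would `json.load()` the saved schema file (e.g. `customer_360_schema.json`) instead of an inline string.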
Discovering Relevant Columns
IMPORTANT: Always read the full schema file first to understand what data is available. After reviewing the schema, use grep/search to find specific columns.
For large schemas, use a sub-agent to discover the columns relevant to your analysis goal.
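As a lighter-weight alternative to a sub-agent, simple keyword matching over the schema file can surface candidate columns. A hedged Python sketch, assuming the schema layout produced by `tdx ps desc -o` (the schema fragment and keywords below are illustrative):

```python
# Illustrative schema fragment in the layout produced by `tdx ps desc -o`.
schema = {
    "customers": {
        "table": "customers",
        "columns": [
            {"name": "cdp_customer_id", "type": "varchar"},
            {"name": "email", "type": "varchar"},
            {"name": "ltv_amount", "type": "double"},
        ],
    },
    "behaviors": [
        {
            "table": "behavior_purchases",
            "columns": [
                {"name": "cdp_customer_id", "type": "varchar"},
                {"name": "amount", "type": "double"},
            ],
        }
    ],
}

def find_columns(schema, keywords):
    """Return (table, column) pairs whose column name contains any keyword."""
    hits = []
    for t in [schema["customers"], *schema["behaviors"]]:
        for c in t["columns"]:
            if any(k in c["name"].lower() for k in keywords):
                hits.append((t["table"], c["name"]))
    return hits

hits = find_columns(schema, ["amount", "email"])
print(hits)
```

Keyword matching misses semantically related columns (e.g. `revenue` when searching for `amount`), which is why a sub-agent remains the better choice for very large schemas.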
Analysis Guidelines
- Check null ratios for identified customer columns and exclude null values from aggregations to ensure accurate analysis.
- Build queries incrementally rather than attempting everything at once.
- Provide an extremely brief summary of key findings in one sentence.
- Always be very explicit and detailed about the data sources you are using, including table and column names.
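The null-ratio guideline above can be illustrated with a small Python sketch (the sample values are made up; in practice you would compute the same quantities in SQL with `count_if(col IS NULL)` and a `WHERE col IS NOT NULL` filter):

```python
# Made-up sample of a customer attribute containing nulls.
values = [120.0, None, 80.0, None, 100.0]

# Null ratio: decide whether the column is reliable enough to aggregate.
null_ratio = sum(v is None for v in values) / len(values)

# Aggregate over non-null values only, mirroring SQL's avg(col)
# combined with an explicit WHERE col IS NOT NULL.
non_null = [v for v in values if v is not None]
avg = sum(non_null) / len(non_null)

print(null_ratio, avg)  # 0.4 100.0
```

Naively treating nulls as zeros would report an average of 60.0 instead of 100.0, which is why exclusion (and reporting the null ratio alongside the result) matters.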
Array Columns
- For SQL queries needing array analysis/aggregation, use CROSS JOIN UNNEST:
  ```sql
  SELECT tag, count(0) AS cnt
  FROM customers
  CROSS JOIN UNNEST(tags) AS t(tag)
  GROUP BY 1 ORDER BY 2 DESC LIMIT 10
  ```
- Don't reuse the column name as the alias (e.g., use `tag`, not `tags`, for the alias).
- Never use `ARRAY_CONTAINS` - use `contains(array_column, value)` or filter via CROSS JOIN UNNEST.
Time Function Usage
Always check the format of the time column before applying time functions.
Time functions:
- `td_interval(time, '-1d')` - filter records within a time interval
- `td_time_parse('2024-01-01', 'UTC')` - parse a time string into a Unix timestamp (seconds)
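`td_time_parse` is a TD-specific function, but its result can be sanity-checked with plain Python before writing queries. This sketch mirrors (does not call) the TD function, assuming the 'UTC' timezone argument:

```python
from datetime import datetime, timezone

# Equivalent of td_time_parse('2024-01-01', 'UTC'):
# a Unix timestamp in seconds for midnight UTC on that date.
ts = int(datetime(2024, 1, 1, tzinfo=timezone.utc).timestamp())
print(ts)  # 1704067200
```

Checking an expected timestamp like this helps confirm whether a table's `time` column is in seconds (TD's convention) rather than milliseconds before applying `td_interval` or range filters.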
Related Skills
- parent-segment - Manage parent segment configuration
- segment - Create child segments from parent segment data
- trino - Trino SQL syntax and TD-specific functions