dt-obs-frontends
Frontend Observability Skill
Monitor web and mobile frontends using Real User Monitoring (RUM) with DQL queries. This skill targets the new RUM experience only; do not use classic RUM data.
Overview
This skill helps you:
- Monitor Core Web Vitals and frontend performance
- Track user sessions, engagement, and behavior
- Analyze errors and correlate with backend traces
- Optimize mobile app startup and stability
- Diagnose performance issues with detailed timing analysis
Data Sources:
- Metrics: timeseries with dt.frontend.* (trends, alerting)
- Events: fetch user.events (individual page views, requests, clicks, errors)
- Sessions: fetch user.sessions (session-level aggregates: duration, bounce, counts)
Quick Reference
Common Metrics
- dt.frontend.user_action.count - User action volume
- dt.frontend.user_action.duration - User action duration
- dt.frontend.request.count - Request volume
- dt.frontend.request.duration - Request latency (ms)
- dt.frontend.error.count - Error counts
- dt.frontend.session.active.estimated_count - Active sessions
- dt.frontend.user.active.estimated_count - Unique users
- dt.frontend.web.page.cumulative_layout_shift - CLS metric
- dt.frontend.web.navigation.dom_interactive - DOM interactive time
- dt.frontend.web.page.first_input_delay - FID metric (legacy; prefer INP)
- dt.frontend.web.page.largest_contentful_paint - LCP metric
- dt.frontend.web.page.interaction_to_next_paint - INP metric
- dt.frontend.web.navigation.load_event_end - Load event end
- dt.frontend.web.navigation.time_to_first_byte - Time to first byte
Common Filters
- frontend.name - Filter by frontend name (e.g. my-frontend)
- dt.rum.user_type - Exclude synthetic monitoring
- geo.country.iso_code - Geographic filtering
- device.type - Mobile, desktop, tablet
- browser.name - Browser filtering
Common Timeseries Dimensions
Use these for dt.frontend.* timeseries splits and breakdowns:
- frontend.name - Frontend name
- geo.country.iso_code
- device.type
- browser.name
- os.name
- user_type - real_user, synthetic, robot
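As an illustration of a dimension split, the sketch below charts average LCP per frontend and device type. It assumes both dimensions are available on this metric in your environment; the 2h window, the 5m interval, and the series name lcp are illustrative choices.

```dql
// Sketch: average LCP split by two of the dimensions listed above.
// Assumption: frontend.name and device.type are both populated for this metric.
timeseries lcp = avg(dt.frontend.web.page.largest_contentful_paint),
  by: {frontend.name, device.type},
  from: now() - 2h,
  interval: 5m
```

Each frontend/device combination comes back as its own series, so a mobile-only regression shows up next to an unaffected desktop series.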
fetch user.events, from: now() - 2h
| filter characteristics.has_page_summary == true
| summarize page_views = count(), by: {frontend.name}
| sort page_views desc
Event Characteristics
- characteristics.has_page_summary - Page views (web)
- characteristics.has_view_summary - Views (mobile)
- characteristics.has_navigation - Navigation events
- characteristics.has_user_interaction - Clicks, forms, etc.
- characteristics.has_request - Network request events
- characteristics.has_error - Error events
- characteristics.has_crash - Mobile crashes
- characteristics.has_long_task - Long JavaScript tasks
- characteristics.has_csp_violation - CSP violations
Full event model: https://docs.dynatrace.com/docs/semantic-dictionary/model/rum/user-events
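Because one event can carry several characteristics, a quick way to see what a frontend actually sends is to count each flag side by side. The query below is a minimal sketch using only flags from the list above; the frontend name my-frontend and the output column names are placeholders.

```dql
// Sketch: event volume per characteristic for a single frontend.
// "my-frontend" is a placeholder; replace it with a real frontend.name value.
fetch user.events, from: now() - 2h
| filter frontend.name == "my-frontend"
| summarize
    total_events = count(),
    page_views = countIf(characteristics.has_page_summary == true),
    requests = countIf(characteristics.has_request == true),
    errors = countIf(characteristics.has_error == true),
    long_tasks = countIf(characteristics.has_long_task == true)
```

A column that stays at zero suggests either that the event type does not occur for this frontend or that the characteristics filter does not match the event type you expect.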
Session Data (user.sessions)
user.sessions contains session-level aggregates produced by the session aggregation service from user.events. Field names differ from user.events: session aggregate counters use underscores where events use dots, while error fields keep dot naming.
Session identity and context:
- dt.rum.session.id - Session ID (NOT dt.rum.session_id)
- dt.rum.instance.id - Instance ID
- frontend.name - Array of frontends involved in the session
- dt.rum.application.type - web or mobile
- dt.rum.user_type - real_user, synthetic, or robot
Session aggregates (underscore naming — NOT dot):
| Field | Description | ⚠️ NOT this |
|---|---|---|
| navigation_count | Number of navigations | navigation.count |
| user_interaction_count | Clicks, form submissions | user_interaction.count |
| user_action_count | User actions | user_action.count |
| request_count | XHR/fetch requests | request.count |
| event_count | Total events in session | event.count |
| page_summary_count | Page views (web) | page_summary.count |
| view_summary_count | Views (mobile/SPA) | view_summary.count |
Error fields (dot naming — same as events):
- error.count, error.exception_count, error.http_4xx_count, error.http_5xx_count
- error.anr_count, error.csp_violation_count, error.has_crash
Session lifecycle:
- start_time, end_time, duration (nanoseconds)
- end_reason - timeout, synthetic_execution_finished, etc.
- characteristics.is_bounce - Boolean bounce flag
- characteristics.has_replay - Session replay available
User identity:
- dt.rum.user_tag - User identifier (typically email, username, or customerId), set via the dtrum.identifyUser() API call in the instrumented frontend. Not always populated: it is only present when the frontend explicitly calls identifyUser().
- When dt.rum.user_tag is empty, dt.rum.instance.id is often the only user differentiator. The value is a random ID assigned by the RUM agent on the client side, so it is not personally identifiable but can be used to distinguish unique users when dt.rum.user_tag is not set. On web it is based on a persistent cookie, so the user can delete it.
- The user tag is a session-level field: query it from user.sessions, not user.events (where it may be empty even if the session has one).
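The fallback from user tag to instance ID can be expressed directly in a query. The sketch below counts distinct users on user.sessions, assuming an unset dt.rum.user_tag is stored as null rather than as an empty string; user_key and the output column names are illustrative.

```dql
// Sketch: distinct users, preferring the user tag and falling back to the instance ID.
// Assumption: a missing user tag is null (an empty string would need an extra check).
fetch user.sessions, from: now() - 24h
| filter dt.rum.user_type == "real_user"
| fieldsAdd user_key = coalesce(dt.rum.user_tag, dt.rum.instance.id)
| summarize
    sessions = count(),
    tagged_sessions = countIf(isNotNull(dt.rum.user_tag)),
    distinct_users = countDistinct(user_key)
```

A user who is tagged in one session and untagged in another is counted twice, so treat distinct_users as an upper bound.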
Client/device context:
- browser.name, browser.version, device.type, os.name
- geo.country.iso_code, client.ip, client.isp
Synthetic-only fields:
- dt.entity.synthetic_test, dt.entity.synthetic_location, dt.entity.synthetic_test_step
Time window behavior:
- fetch user.sessions, from: X, to: Y only returns sessions that started in [X, Y], NOT sessions that were merely active during that window.
- Sessions can last 8h+ (the aggregation service waits 30+ minutes of inactivity before closing a session).
- To find all sessions active during a time window, extend the lookback by at least 8 hours: e.g., to cover events from the last 24h, query fetch user.sessions, from: now() - 32h.
- This matters for correlation queries (e.g., matching user.events to user.sessions by session ID): a narrow user.sessions window will miss long-running sessions and produce false "orphans."
Session creation delay:
- The session aggregation service waits for ~30+ minutes of inactivity before closing a session and writing the user.sessions record.
- This means recent events (last ~1 hour) will not yet have a matching user.sessions entry. This is normal, not a data gap.
- When correlating user.events with user.sessions, exclude recent data (e.g., use to: now() - 1h) to avoid counting in-progress sessions as orphans.
Zombie sessions (events without a user.sessions record):
- Not every dt.rum.session.id in user.events will have a corresponding user.sessions record. The session aggregation service intentionally skips zombie sessions: sessions with no real user activity (zero navigations and zero user interactions).
- Zombie sessions contain only background, machine-driven activity (e.g., automatic XHR requests, heartbeats) with no page views or clicks. Serializing them would add no value to users.
- When correlating user.events with user.sessions, expect a large number of unmatched session IDs. This is by design, not a data gap. Filter to sessions with activity before diagnosing orphans:
fetch user.events, from: now() - 2h, to: now() - 1h
| filter isNotNull(dt.rum.session.id)
| summarize
navs = countIf(characteristics.has_navigation == true),
interactions = countIf(characteristics.has_user_interaction == true),
by: {dt.rum.session.id}
| filter navs > 0 or interactions > 0
Example — bounce rate and session quality:
fetch user.sessions, from: now() - 24h
| filter dt.rum.user_type == "real_user"
| summarize
total_sessions = count(),
bounces = countIf(characteristics.is_bounce == true),
zero_activity = countIf(toLong(navigation_count) == 0 and toLong(user_interaction_count) == 0),
avg_duration_s = avg(toLong(duration)) / 1000000000
| fieldsAdd bounce_rate_pct = round((bounces * 100.0) / total_sessions, decimals: 1)
Performance Thresholds
- LCP: Good <2.5s | Poor >4.0s
- INP: Good <200ms | Poor >500ms
- CLS: Good <0.1 | Poor >0.25
- Cold Start: Good <3s | Poor >5s
- Long Tasks: >50ms problematic, >250ms severe
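Applied to a metric, the thresholds translate into a simple rating expression. The sketch below uses CLS because it is unitless, so no unit conversion has to be assumed; the label "needs improvement" for values between the two thresholds and the field names avg_cls and cls_rating are illustrative. Averaging the per-interval averages with arrayAvg is only an approximation of the overall value.

```dql
// Sketch: rate each frontend's CLS against the thresholds above.
// Assumption: the band between "good" and "poor" is labeled "needs improvement".
timeseries cls = avg(dt.frontend.web.page.cumulative_layout_shift),
  by: {frontend.name},
  from: now() - 24h,
  interval: 1h
| fieldsAdd avg_cls = arrayAvg(cls)
| fieldsAdd cls_rating = if(avg_cls < 0.1, "good",
    else: if(avg_cls > 0.25, "poor", else: "needs improvement"))
| fields frontend.name, avg_cls, cls_rating
| sort avg_cls desc
```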
Core Workflows
1. Web Performance Monitoring
Track Core Web Vitals, page performance, and request latency for SEO and UX optimization.
Primary Files:
- references/WebVitals.md - Core Web Vitals (LCP, INP, CLS)
- references/performance-analysis.md - Request and page performance
Common Queries:
- All Core Web Vitals summary
- Web Vitals by page/device
- Request duration SLA monitoring
- Page load performance trends
2. User Session & Behavior Analysis
Understand user engagement, navigation patterns, and session characteristics. Analyze button clicks, form interactions, and user journeys.
Data source choice:
- Use fetch user.sessions for session-level analysis (bounce rate, session duration, session counts)
- Use fetch user.events for event-level detail (individual clicks, navigation timing, specific pages)
Primary Files:
- references/user-sessions.md - Session tracking and user analytics
- references/performance-analysis.md - Navigation and engagement patterns
Common Queries:
- Active sessions by frontend
- Sessions by custom property
- Bounce rate analysis (use user.sessions with characteristics.is_bounce)
- Session quality (zero-activity sessions via navigation_count, user_interaction_count)
- Click analysis on UI elements (use user.events with characteristics.has_user_interaction)
- External referrers (traffic sources)
3. Error Tracking & Debugging
Monitor error rates, analyze exceptions, and correlate frontend issues with backend.
Primary Files:
- references/error-tracking.md - Error analysis and debugging
- references/performance-analysis.md - Trace correlation
Common Queries:
- Error rate monitoring
- JavaScript exceptions by type
- Failed requests with backend traces
- Request timing breakdown
4. Mobile Frontend Monitoring
Track mobile app performance, startup times, and crash analytics for iOS and Android. Analyze app version performance and device-specific issues.
Primary Files:
- references/mobile-monitoring.md - App starts, crashes, and mobile-specific metrics
Common Queries:
- Cold start performance by app version (iOS, Android)
- Warm start and hot start metrics
- Crash rate by device model and OS version
- ANR events (Android)
- Native crash signals
- App version comparison
5. Advanced Performance Optimization
Deep performance diagnostics including JavaScript profiling, main thread blocking, UI jank analysis, and geographic performance.
Primary Files:
- references/performance-analysis.md - Advanced diagnostics and long tasks
Common Queries:
- Long JavaScript tasks blocking main thread
- UI jank and rendering delays
- Tasks >50ms impacting responsiveness
- Third-party long tasks (iframes)
- Single-page app performance issues
- Geographic performance distribution
- Performance degradation detection
Best Practices
1. Use metrics for trends, events for debugging
   - Metrics: Timeseries dashboards, alerting, capacity planning
   - Events: Root cause analysis, detailed diagnostics
2. Filter by frontend in multi-app environments
   - Always use frontend.name for clarity
3. Match interval to time range
   - 5m intervals for hours, 1h for days, 1d for weeks
4. Exclude synthetic traffic when analyzing real users
   - Filter dt.rum.user_type to focus on genuine behavior
5. Combine metrics with events for complete insights
   - Start with metric trends, drill into events for details
6. Extend user.sessions time window for correlation queries
   - user.sessions only returns sessions that started in the query window
   - Sessions can last 8h+, so extend lookback by at least 8h when joining with user.events
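As an example of matching interval to time range, a week-long view of request latency might use 1h buckets instead of 5m ones. This is a minimal sketch: my-frontend is a placeholder name and req_duration is an illustrative series name.

```dql
// Sketch: 7 days of request latency at a 1h interval for one frontend.
// "my-frontend" is a placeholder; replace it with a real frontend.name value.
timeseries req_duration = avg(dt.frontend.request.duration),
  by: {frontend.name},
  filter: {frontend.name == "my-frontend"},
  from: now() - 7d,
  interval: 1h
```

If the trend shows a spike, narrow the window to that period and switch to fetch user.events for the detailed diagnosis.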
Slow Page Load Playbook
Start by segmenting the problem by page, browser, geo location, and dt.rum.user_type.
Heuristics:
- High TTFB -> slow backend
- High LCP with normal TTFB -> render bottleneck
- High CLS -> layout shifts (late-loading content, ads, fonts)
- Long tasks dominate -> JavaScript execution bottlenecks (heavy frameworks, large bundles)
Backend latency (high TTFB)
fetch user.events
| filter frontend.name == "my-frontend" and characteristics.has_request == true
| filter page.url.path == "/checkout"
| summarize avg_ttfb = avg(request.time_to_first_byte), avg_duration = avg(duration)
If TTFB is high, analyze backend spans by correlating frontend events with backend traces using dt.rum.trace_id.
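To get concrete trace IDs to follow up on, one option is to list the slowest requests for the page along with their dt.rum.trace_id. The sketch below uses only fields that appear elsewhere in this document; how the ID is then matched against backend span data depends on your environment and is not shown.

```dql
// Sketch: slowest requests on one page, with the trace ID for backend follow-up.
// "my-frontend" and "/checkout" are placeholders.
fetch user.events, from: now() - 2h
| filter frontend.name == "my-frontend" and characteristics.has_request == true
| filter page.url.path == "/checkout"
| filter isNotNull(dt.rum.trace_id)
| sort request.time_to_first_byte desc
| fields page.url.path, request.time_to_first_byte, duration, dt.rum.trace_id
| limit 20
```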
Heavy JavaScript execution (long tasks)
Long tasks by page:
fetch user.events, from: now() - 2h
| filter characteristics.has_long_task == true
| summarize
long_task_count = count(),
total_blocking_time = sum(duration),
by: {frontend.name, page.url.path}
| sort total_blocking_time desc
| limit 20
Long tasks by script source:
fetch user.events, from: now() - 2h
| filter frontend.name == "my-frontend"
| filter characteristics.has_long_task == true
| summarize
long_task_count = count(),
total_blocking_time = sum(duration),
by: {long_task.attribution.container_src}
| sort total_blocking_time desc
| limit 20
Large JavaScript bundles
fetch user.events
| filter frontend.name == "my-frontend"
| filter characteristics.has_request
| filter endsWith(url.full, ".js")
| summarize dls = max(performance.decoded_body_size), by: url.full
| sort dls desc
| limit 20
Large resources
fetch user.events
| filter frontend.name == "my-frontend"
| filter characteristics.has_request
| summarize dls = max(performance.decoded_body_size), by: url.full
| sort dls desc
| limit 20
Cache effectiveness
fetch user.events, from: now() - 2h
| filter frontend.name == "my-frontend"
| filter characteristics.has_request == true
| fieldsAdd cache_status = if(
performance.incomplete_reason == "local_cache" or
(performance.transfer_size == 0 and
(performance.encoded_body_size > 0 or performance.decoded_body_size > 0)),
"cached",
else: if(performance.transfer_size > 0, "network", else: "uncached")
)
| summarize
request_count = count(),
avg_duration = avg(duration),
by: {url.domain, cache_status}
Compression waste
fetch user.events, from: now() - 2h
| filter characteristics.has_request == true
| filter isNotNull(performance.encoded_body_size) and isNotNull(performance.decoded_body_size)
| filter performance.encoded_body_size > 0
| fieldsAdd
// toDouble avoids integer division if both sizes are stored as long values
expansion_ratio = toDouble(performance.decoded_body_size) / performance.encoded_body_size,
wasted_bytes = performance.decoded_body_size - performance.encoded_body_size
| summarize
requests = count(),
avg_expansion_ratio = avg(expansion_ratio),
total_wasted_bytes = sum(wasted_bytes),
by: {request.url.host, request.url.path}
| sort total_wasted_bytes desc
| limit 50
Network issues
Compare by location and domain when TTFB is high but backend performance is good:
fetch user.events, from: now() - 2h
| filter characteristics.has_request == true
| summarize
request_count = count(),
avg_duration = avg(duration),
p75_duration = percentile(duration, 75),
p95_duration = percentile(duration, 95),
by: {geo.country.iso_code, request.url.domain}
| sort p95_duration desc
| limit 50
Analyze DNS time:
fetch user.events, from: now() - 2h
| filter characteristics.has_request == true
| filter isNotNull(performance.domain_lookup_start) and isNotNull(performance.domain_lookup_end)
| fieldsAdd dns_ms = performance.domain_lookup_end - performance.domain_lookup_start
| summarize
request_count = count(),
avg_dns_ms = avg(dns_ms),
p75_dns_ms = percentile(dns_ms, 75),
p95_dns_ms = percentile(dns_ms, 95),
by: {request.url.domain}
| sort p95_dns_ms desc
| limit 50
Analyze by protocol (http/1.1, h2, h3):
fetch user.events
| filter characteristics.has_request
| summarize cnt = count(), by: {url.domain, performance.next_hop_protocol}
| sort cnt desc
| limit 50
Third-party dependencies
Analyze request performance by domain:
fetch user.events, from: now() - 2h
| filter characteristics.has_request == true
| summarize
request_count = count(),
avg_duration = avg(duration),
p75_duration = percentile(duration, 75),
p95_duration = percentile(duration, 95),
by: {request.url.domain}
| sort p95_duration desc
| limit 50
Troubleshooting
Handling Zero Results
When queries return no data, follow this diagnostic workflow:
1. Validate Timeframe
   - Check if timeframe is appropriate for the data type
   - RUM data may have delay (1-2 minutes for recent events)
   - Verify timeframe syntax: from: now() - 1h, to: now() or similar
   - Try expanding timeframe: from: now() - 24h for initial exploration
2. Verify Frontend Configuration
   - Confirm frontend is instrumented and sending RUM data
   - Check frontend.name filter is correct
   - Test without frontend filter to see if any RUM data exists
   - Verify frontend name matches the environment
3. Check Data Availability
   - Run basic query: fetch user.events | limit 1
   - If no events exist, RUM may not be configured
   - Check if timeframe predates frontend deployment
   - Verify user has access to the environment
4. Review Query Syntax
   - Validate filters aren't too restrictive
   - Check for typos in field names or metric names
   - Test query incrementally: start simple, add filters gradually
   - Verify characteristics filters match event types
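A practical way to work through this workflow is to start from an unfiltered query and list which frontends are reporting at all. The sketch below assumes a 24h exploration window; the column name events is illustrative.

```dql
// Sketch: which frontends sent any RUM events in the last 24h?
fetch user.events, from: now() - 24h
| summarize events = count(), by: {frontend.name}
| sort events desc
```

If the expected frontend is missing from the result, the problem is the frontend.name value or the instrumentation rather than the later filters; if it is present, add the remaining filters back one at a time.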
When to Ask User for Clarification:
- No RUM data exists in environment → "Is RUM configured for this frontend?"
- Timeframe unclear → "What time period should I analyze?"
- Expected data missing → "Has this frontend sent data recently?"
Handling Anomalous Results
When query results seem unexpected or suspicious:
Unexpected High Values:
- Metric spikes: Verify interval aggregation (avg vs. max vs. sum)
- Session counts: Check for bot traffic or synthetic monitoring
- Error rates: Confirm error definition matches expectations
- Performance degradation: Look for deployment or infrastructure changes
Unexpected Low Values:
- Missing sessions: Verify dt.rum.user_type filter isn't excluding real users
- Low request counts: Check if frontend filter is too narrow
- Few errors: Confirm error characteristics filter is correct
- Missing mobile data: Verify platform-specific fields exist
Inconsistent Data:
- Metrics vs. Events mismatch: Different aggregation methods are expected
- Geographic anomalies: Check timezone assumptions
- Device distribution skew: May reflect actual user base
- Version mismatches: Verify app version filtering logic
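Several of these checks come down to knowing how traffic splits across user types. A minimal sketch, using the session-level dt.rum.user_type field described earlier:

```dql
// Sketch: session volume per user type, to spot bot or synthetic skew.
fetch user.sessions, from: now() - 24h
| summarize sessions = count(), by: {dt.rum.user_type}
| sort sessions desc
```

An unexpectedly large synthetic or robot share can explain inflated session counts, and a missing real_user row can explain low ones.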
Decision Tree: Ask vs. Investigate
Query returns unexpected results
│
├─ Is this a zero-result scenario?
│ ├─ YES → Follow "Handling Zero Results" workflow
│ └─ NO → Continue
│
├─ Can I validate the result independently?
│ ├─ YES → Run validation query
│ │ ├─ Validation confirms result → Report findings
│ │ └─ Validation contradicts → Investigate further
│ └─ NO → Continue
│
├─ Is the anomaly clearly explained by data?
│ ├─ YES → Report with explanation
│ └─ NO → Continue
│
├─ Do I need domain knowledge to interpret?
│ ├─ YES → Ask user for context
│ │ Example: "The error rate is 15%. Is this expected for your frontend?"
│ └─ NO → Continue
│
└─ Is the issue ambiguous or requires clarification?
├─ YES → Ask specific question with data context
│ Example: "I see two frontends named 'web-app'. Which frontend name should I use?"
└─ NO → Investigate and report findings with caveats
Common Investigation Steps
For Performance Issues:
- Compare to baseline: Query same metric for previous week
- Segment by dimension: Break down by device, browser, geography
- Check for outliers: Use percentiles (p50, p95, p99) vs. averages
- Correlate with deployments: Filter by app version or time windows
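For the baseline comparison, one possible pattern is to append a second series covering the same window one week earlier. The sketch below assumes the shift parameter of the timeseries command is available and moves the queried timeframe back by the given duration; verify this against your DQL version before relying on it. my-frontend and the series names current and baseline are placeholders.

```dql
// Sketch: last 24h of LCP next to the same 24h one week earlier.
// Assumption: shift: -7d moves the queried timeframe back by seven days.
timeseries current = avg(dt.frontend.web.page.largest_contentful_paint),
  filter: {frontend.name == "my-frontend"},
  from: now() - 24h,
  interval: 1h
| append [
    timeseries baseline = avg(dt.frontend.web.page.largest_contentful_paint),
      filter: {frontend.name == "my-frontend"},
      from: now() - 24h,
      interval: 1h,
      shift: -7d
  ]
```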
For Data Availability Issues:
- Start broad: Query all RUM data without filters
- Add filters incrementally: Isolate which filter eliminates data
- Check related metrics: If events missing, try timeseries
- Validate entity relationships: Confirm frontend-to-service links
For Unexpected Patterns:
- Expand timeframe: Look for historical context
- Cross-reference data sources: Compare events and metrics
- Check sampling: Verify no sampling is affecting results
- Consider external factors: Holidays, outages, traffic changes
Red Flags: When to Stop and Ask
Always ask the user when:
- ❌ No RUM data exists anywhere in the environment
- ❌ Multiple frontends match the user's description
- ❌ Results contradict user's stated expectations explicitly
- ❌ Data suggests monitoring is misconfigured
- ❌ Query requires business context (e.g., "acceptable error rate")
- ❌ Timeframe is ambiguous and affects interpretation significantly
Example clarifying questions:
- "I found two frontends matching 'checkout'. Which one: checkout-web or checkout-mobile?"
- "The query returns 0 results for the past hour. Should I expand the timeframe, or do you expect real-time data?"
- "The average LCP is 8 seconds, which exceeds the 4-second threshold. Is this frontend known to have performance issues?"
- "I see only synthetic traffic. Should I filter on dt.rum.user_type == "real_user" to focus on real users?"
When to Use This Skill
Use the frontend observability skill when:
- Monitoring web or mobile frontend performance
- Analyzing Core Web Vitals for SEO
- Tracking user sessions, engagement, or behavior
- Analyzing click events and button interactions
- Debugging frontend errors or slow requests
- Correlating frontend issues with backend traces
- Optimizing mobile app startup or crash rates (iOS, Android)
- Analyzing app version performance
- Diagnosing UI jank and main thread blocking
- Analyzing security compliance (CSP violations)
- Profiling JavaScript performance (long tasks)
Do NOT use for:
- Backend service monitoring (use services skill)
- Infrastructure metrics (use infrastructure skill)
- Log analysis (use logs skill)
- Business process monitoring (use business-events skill)
Progressive Disclosure
Always Available
- FrontendBasics.md - RUM fundamentals and quick reference
Loaded by Workflow
- Web Performance: WebVitals.md, performance-analysis.md
- User Behavior: user-sessions.md, performance-analysis.md
- Error Analysis: error-tracking.md, performance-analysis.md
- Mobile Apps: mobile-monitoring.md
Load on Explicit Request
- Advanced diagnostics (long tasks, user actions)
- Security compliance (CSP violations, visibility tracking)
- Specialized mobile features (platform-specific phases)
Reference Files
Core Reference Documents
- references/WebVitals.md - Core Web Vitals monitoring
- references/user-sessions.md - Session and user analytics
- references/error-tracking.md - Error analysis and debugging
- references/mobile-monitoring.md - Mobile app performance and crashes
- references/performance-analysis.md - Advanced performance diagnostics