Windows Remote Desktop Connection Doctor

Diagnose and fix Windows App (AVD/WVD/W365) connection quality issues on macOS, with a focus on transport protocol optimization.

Background

Azure Virtual Desktop transport priority: UDP Shortpath > TCP > WebSocket. UDP Shortpath provides the best experience (lowest latency, supports UDP Multicast). When it fails, the client falls back to WebSocket over TCP 443 through the gateway, adding significant latency overhead.

Diagnostic Workflow

Step 1: Collect Connection Info

Ask the user to provide the Connection Info from Windows App (click the signal icon in the toolbar). Key fields to extract:

| Field | What It Tells |
| --- | --- |
| Transport Protocol | Current transport: UDP, UDP Multicast, WebSocket, or TCP |
| Round-Trip Time (RTT) | End-to-end latency in ms |
| Available Bandwidth | Current bandwidth in Mbps |
| Gateway | The AVD gateway hostname and port |
| Service Region | Azure region code (e.g., SEAS = Southeast Asia) |

If Transport Protocol is UDP or UDP Multicast, the connection is optimal — no further diagnosis needed.

If Transport Protocol is WebSocket or TCP, proceed to Step 2.
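The triage rule above can be sketched as a small helper. The function name and the case-insensitive matching are assumptions made for illustration; the protocol labels are the four values listed for the Transport Protocol field.

```python
# Labels taken from the Transport Protocol field described above.
OPTIMAL_TRANSPORTS = {"udp", "udp multicast"}
FALLBACK_TRANSPORTS = {"websocket", "tcp"}


def needs_diagnosis(transport_protocol: str) -> bool:
    """Return True when the reported transport is a fallback (continue to Step 2)."""
    value = transport_protocol.strip().lower()
    if value in OPTIMAL_TRANSPORTS:
        return False
    if value in FALLBACK_TRANSPORTS:
        return True
    # Anything else is unexpected; surface it rather than guess.
    raise ValueError(f"Unrecognized transport protocol: {transport_protocol!r}")
```

Raising on an unknown label is a deliberate choice in this sketch: a value outside the four known ones most likely means the field was copied incorrectly.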

Step 2: Collect Network Evidence

Gather evidence in parallel — do NOT make assumptions. Run the following checks simultaneously:

2A: Network Interfaces and Routing

ifconfig | grep -E "^[a-z]|inet |utun"
netstat -rn | head -40
scutil --proxy

Look for:

  • utun interfaces: Identify VPN/proxy TUN tunnels (ShadowRocket, Clash, Tailscale)
  • Default route priority: Which interface handles default traffic
  • Split routing: 0/1 + 128.0/1 → utun pattern means a VPN captures all traffic
  • System proxy: HTTP/HTTPS proxy enabled on localhost ports
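The split-routing check can be automated against saved `netstat -rn` output. The sketch below is illustrative only: the function names are hypothetical, and it assumes each route line starts with the destination and names the interface in a whitespace-separated column beginning with `utun`.

```python
HALF_DEFAULT_ROUTES = {"0/1", "128.0/1"}


def find_split_default_routes(netstat_output: str) -> dict[str, set[str]]:
    """Map each utun interface to the half-default routes (0/1, 128.0/1) it carries."""
    found: dict[str, set[str]] = {}
    for line in netstat_output.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[0] not in HALF_DEFAULT_ROUTES:
            continue
        # The interface column position varies between macOS versions,
        # so look for any later column that names a utun interface.
        for field in fields[1:]:
            if field.startswith("utun"):
                found.setdefault(field, set()).add(fields[0])
    return found


def vpn_captures_all_traffic(netstat_output: str) -> bool:
    """True when a single utun interface owns both halves of the address space."""
    routes = find_split_default_routes(netstat_output)
    return any(halves == HALF_DEFAULT_ROUTES for halves in routes.values())
```

Feeding it the text captured from `netstat -rn` returns `True` only when both `0/1` and `128.0/1` point at the same utun interface, which is the pattern described above.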

2B: RDP Client Process and Connections

# Find the Windows App process (the new client runs as "Windows", not "msrdc"; msrdc is kept to catch older clients)
# Match case-sensitively so that macOS's WindowServer is not picked up
ps aux | grep -E 'msrdc|Windows' | grep -v grep
# Check its network connections
lsof -i -n -P 2>/dev/null | grep "Windows" | head -20
# Check for UDP connections
lsof -i UDP -n -P 2>/dev/null | head -30

Key evidence to look for:

  • Source IP 198.18.0.x: Traffic is being routed through ShadowRocket/proxy TUN tunnel
  • No UDP connections from Windows process: Shortpath not established
  • Only TCP 443: Fallback to gateway WebSocket transport
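The three signals above can be extracted from saved `lsof -i -n -P` output. This is a sketch under stated assumptions: `summarize_lsof` is a hypothetical helper, it expects the standard nine-column lsof layout (COMMAND ... NODE NAME), and it tests the source address against the whole 198.18.0.0/15 block, which is broader than the 198.18.0.x range named in the text.

```python
import ipaddress

# Assumption: treat the entire 198.18.0.0/15 block as proxy TUN address space.
TUN_SOURCE_RANGE = ipaddress.ip_network("198.18.0.0/15")


def summarize_lsof(lines, command_prefix="Windows"):
    """Summarize transport evidence for one process from `lsof -i -n -P` lines."""
    evidence = {"tun_source": False, "has_udp": False, "only_tcp_443": False}
    tcp_remote_ports = set()
    for line in lines:
        fields = line.split()
        if len(fields) < 9 or not fields[0].startswith(command_prefix):
            continue
        protocol, name = fields[7], fields[8]
        if protocol == "UDP":
            evidence["has_udp"] = True
        local, _, remote = name.partition("->")
        host = local.rsplit(":", 1)[0]
        try:
            if ipaddress.ip_address(host) in TUN_SOURCE_RANGE:
                evidence["tun_source"] = True
        except ValueError:
            pass  # wildcard ("*") or bracketed IPv6 local address
        if protocol == "TCP" and remote:
            tcp_remote_ports.add(remote.rsplit(":", 1)[1])
    evidence["only_tcp_443"] = tcp_remote_ports == {"443"} and not evidence["has_udp"]
    return evidence
```

A result with `tun_source` set points toward proxy interference, while `only_tcp_443` without `tun_source` suggests the fallback has another cause.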

2C: VPN/Proxy State

# Environment proxy variables
env | grep -i proxy
# System proxy via scutil
scutil --proxy
# ShadowRocket config API (if accessible on local network)
NO_PROXY="<local-ip>" curl -s --connect-timeout 5 "http://<local-ip>:8080/api/read"

2D: Tailscale State (if running)

tailscale status
tailscale netcheck

The netcheck output reveals the NAT behavior (MappingVariesByDestIP: true indicates a symmetric NAT, which makes direct UDP paths hard to establish), UDP support, and the public IP. This is valuable even when Tailscale is not the problem.
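To read those fields programmatically, a small parser is enough. The sketch assumes the report lists its fields as `* Key: value` lines; both function names and the returned labels are hypothetical.

```python
def parse_netcheck(report: str) -> dict[str, str]:
    """Collect the '* Key: value' lines of a `tailscale netcheck` report."""
    fields: dict[str, str] = {}
    for line in report.splitlines():
        line = line.strip()
        if line.startswith("* ") and ":" in line:
            key, _, value = line[2:].partition(":")
            fields[key.strip()] = value.strip()
    return fields


def udp_outlook(fields: dict[str, str]) -> str:
    """Rough reading of the UDP-related fields; the labels are illustrative."""
    if fields.get("UDP") != "true":
        return "udp-blocked"
    if fields.get("MappingVariesByDestIP") == "true":
        return "symmetric-nat"
    return "udp-usable"
```

A `symmetric-nat` result does not prove Shortpath will fail, but it is a useful hint when deciding between the root cause categories later on.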

Step 3: Analyze Windows App Logs

This is the most critical step. Windows App logs contain transport negotiation details that no network-level test can reveal.

Log location on macOS:

~/Library/Containers/com.microsoft.rdc.macos/Data/Library/Logs/Windows App/

Files are named: com.microsoft.rdc.macos_v<version>_<date>_<time>.log

See references/windows_app_log_analysis.md for detailed log parsing guidance.

Quick Log Search

LOG_DIR=~/Library/Containers/com.microsoft.rdc.macos/Data/Library/Logs/Windows\ App
# Find the most recent log
LATEST_LOG=$(ls -t "$LOG_DIR"/*.log 2>/dev/null | head -1)

# Search for transport-critical entries (filter out noise)
grep -i -E "STUN|TURN|VPN|Routed|Shortpath|FetchClient|clientoption|GATEWAY.*ERR|Certificate.*valid|InternetConnectivity|Passed URL" "$LATEST_LOG" | grep -v -E "BasicStateManagement|DynVC|dynvcstat|asynctransport"

Key Log Patterns

| Log Pattern | Meaning |
| --- | --- |
| Passed: InternetConnectivity | Health check completed successfully |
| TCP/IP Traffic Routed Through VPN: No/Yes | Client detected VPN routing for TCP |
| STUN/TURN Traffic Routed Through VPN: Yes | Client detected VPN routing for STUN/TURN |
| Passed URL: https://...wvd.microsoft.com/ Response Time: Nms | Gateway reachability confirmed |
| FetchClientOptions exception: Request timed out | Critical: Client cannot get transport options from gateway |
| Certificate validation failed | TLS interception or DNS poisoning detected |
| OnRDWebRTCRedirectorRpc rtcSession not handled | WebRTC session setup not handled by client |
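The pattern table can be turned into a simple scanner. The sketch below is an illustration, not the client's own tooling: the regular expressions are derived from the pattern strings above, and `scan_log` is a hypothetical name.

```python
import re

# (pattern, meaning) pairs taken from the table of key log patterns.
LOG_PATTERNS = [
    (r"Passed: InternetConnectivity", "health check completed"),
    (r"TCP/IP Traffic Routed Through VPN: (Yes|No)", "VPN routing detection for TCP"),
    (r"STUN/TURN Traffic Routed Through VPN: Yes", "STUN/TURN routed through VPN"),
    (r"Passed URL: https://\S*wvd\.microsoft\.com/", "gateway reachable"),
    (r"FetchClientOptions exception: Request timed out", "CRITICAL: no transport options from gateway"),
    (r"Certificate validation failed", "TLS interception or DNS poisoning"),
    (r"OnRDWebRTCRedirectorRpc rtcSession not handled", "WebRTC session setup not handled"),
]
COMPILED = [(re.compile(pattern), meaning) for pattern, meaning in LOG_PATTERNS]


def scan_log(lines):
    """Return (line_number, meaning) for every line matching a known pattern."""
    findings = []
    for number, line in enumerate(lines, start=1):
        for regex, meaning in COMPILED:
            if regex.search(line):
                findings.append((number, meaning))
    return findings
```

Calling `scan_log(open(path, errors="replace"))` on the latest log file yields a short, ordered list of the entries that matter for transport negotiation.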

Compare Working vs Broken Logs

When possible, compare a log from when the connection worked (UDP) with the current log:

# Compare startup health check blocks
for f in "$LOG_DIR"/*.log; do
  echo "=== $(basename "$f") ==="
  grep -E "InternetConnectivity|Routed Through VPN|Passed URL|FetchClient" "$f" | head -10
  echo ""
done

A working log will contain the full health check block (InternetConnectivity, VPN routing detection, gateway URL tests). A broken log may show these entries missing entirely, or show certificate/timeout errors instead.
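The working-versus-broken comparison reduces to checking which health-check markers are present in each log. A minimal sketch, assuming the marker strings used in the grep above are sufficient and using hypothetical function names:

```python
HEALTH_MARKERS = ("InternetConnectivity", "Routed Through VPN", "Passed URL")
ERROR_MARKERS = ("Certificate validation failed", "FetchClientOptions exception")


def health_summary(log_text: str) -> dict[str, list[str]]:
    """Report which health-check markers are missing and which errors appear."""
    return {
        "missing": [m for m in HEALTH_MARKERS if m not in log_text],
        "errors": [m for m in ERROR_MARKERS if m in log_text],
    }


def compare_logs(working_text: str, broken_text: str) -> dict[str, list[str]]:
    """List what the broken log lost or gained relative to the working log."""
    good, bad = health_summary(working_text), health_summary(broken_text)
    return {
        "lost_markers": [m for m in bad["missing"] if m not in good["missing"]],
        "new_errors": [e for e in bad["errors"] if e not in good["errors"]],
    }
```

An empty `lost_markers` list with no `new_errors` means the two logs look alike at the health-check level, so the difference has to be found elsewhere.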

Step 4: Determine Root Cause

Based on collected evidence, identify the root cause category:

Category A: VPN/Proxy Interference

Evidence: Windows App source IP is 198.18.0.x, STUN/TURN routed through VPN, no UDP connections.

Fix: Add DIRECT rules for AVD traffic in the proxy tool:

DOMAIN-SUFFIX,wvd.microsoft.com,DIRECT
DOMAIN-SUFFIX,microsoft.com,DIRECT
IP-CIDR,13.104.0.0/14,DIRECT

Verify: Temporarily disable VPN/proxy, reconnect VDI, check if transport changes to UDP.
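Before reconnecting, it can help to confirm which of the DIRECT rules a given gateway hostname or address would hit. The sketch below models the three rules with first-match semantics and label-boundary suffix matching; both behaviors are assumptions about how the proxy tool evaluates rules, and `matching_rule` is a hypothetical helper.

```python
import ipaddress

# The three DIRECT rules from the fix above, in order.
RULES = [
    ("DOMAIN-SUFFIX", "wvd.microsoft.com"),
    ("DOMAIN-SUFFIX", "microsoft.com"),
    ("IP-CIDR", "13.104.0.0/14"),
]


def matching_rule(target: str):
    """Return the first rule matching a hostname or IP literal, or None."""
    try:
        address = ipaddress.ip_address(target)
    except ValueError:
        address = None  # not an IP literal, treat as a hostname
    for kind, value in RULES:
        if kind == "IP-CIDR" and address is not None:
            if address in ipaddress.ip_network(value):
                return f"{kind},{value}"
        elif kind == "DOMAIN-SUFFIX" and address is None:
            host = target.lower().rstrip(".")
            if host == value or host.endswith("." + value):
                return f"{kind},{value}"
    return None
```

Under these assumptions the `microsoft.com` suffix already covers `wvd.microsoft.com`; the narrower rule mainly documents intent. A `None` result for the gateway shown in Connection Info means the traffic would still go through the tunnel.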

Category B: ISP/Network UDP Restriction

Evidence: Even with all VPNs off, still WebSocket. No UDP connections. FetchClientOptions timeout.

Verify:

# Test STUN connectivity to a known server
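python3 -c "
import os, socket, struct
# STUN Binding Request: type 0x0001, length 0, magic cookie, random 12-byte transaction ID
txid = os.urandom(12)
request = struct.pack('!HHI', 0x0001, 0, 0x2112A442) + txid
for host, port in [('stun.l.google.com', 19302), ('stun1.l.google.com', 19302)]:
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(3)
        try:
            s.sendto(request, (host, port))
            data, _ = s.recvfrom(1024)
        except OSError as err:  # timeout, DNS failure, or unreachable network
            print(f'STUN from {host}: FAILED ({err})')
            continue
    # A Binding Success Response has type 0x0101 and echoes the transaction ID
    if data[:2] == bytes([1, 1]) and data[8:20] == txid:
        print(f'STUN from {host}: OK')
        break
    print(f'STUN from {host}: UNEXPECTED REPLY')
"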

Fix options:

  • Try a mobile hotspot to determine whether the home network or ISP is the cause
  • Check router NAT type (Full Cone NAT preferred)
  • Enable UPnP on router
  • Try IPv6 if available
  • Contact ISP about UDP restrictions
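When checking NAT behavior, the STUN reply itself carries useful data: its XOR-MAPPED-ADDRESS attribute holds the public IP and port the server saw. The sketch below decodes that attribute from a raw Binding Success Response according to the STUN wire format; `parse_xor_mapped_address` is a hypothetical helper and handles IPv4 only.

```python
import ipaddress
import struct

MAGIC_COOKIE = 0x2112A442
BINDING_SUCCESS = 0x0101
XOR_MAPPED_ADDRESS = 0x0020


def parse_xor_mapped_address(response: bytes):
    """Return (ip, port) from a STUN Binding Success Response, or None."""
    if len(response) < 20:
        return None
    msg_type, msg_len, cookie = struct.unpack("!HHI", response[:8])
    if msg_type != BINDING_SUCCESS or cookie != MAGIC_COOKIE:
        return None
    offset, end = 20, min(len(response), 20 + msg_len)
    while offset + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", response[offset:offset + 4])
        value = response[offset + 4:offset + 4 + attr_len]
        if attr_type == XOR_MAPPED_ADDRESS and len(value) >= 8 and value[1] == 0x01:
            xport, xaddr = struct.unpack("!HI", value[2:8])
            # Port is XORed with the top 16 bits of the cookie, address with all 32.
            port = xport ^ (MAGIC_COOKIE >> 16)
            ip = ipaddress.IPv4Address(xaddr ^ MAGIC_COOKIE)
            return str(ip), port
        offset += 4 + attr_len + (-attr_len % 4)  # attributes are padded to 4 bytes
    return None
```

Sending Binding Requests to two different STUN servers from the same local socket and comparing the decoded ports is one way to spot a symmetric NAT: if the mapped port differs per destination, direct UDP paths are unlikely to form.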

Category C: Client Health Check Failure

Evidence: Log shows certificate validation errors at startup, health check block (InternetConnectivity, STUN/TURN detection) missing from log, FetchClientOptions timeout.

This means the client cannot complete its diagnostic/capability discovery, preventing Shortpath negotiation.

Possible causes:

  • ISP HTTPS interception/MITM (especially in China)
  • DNS poisoning returning incorrect IPs for Microsoft diagnostic endpoints
  • Firewall blocking Microsoft telemetry endpoints

Fix options:

  • Change DNS to 8.8.8.8 or 1.1.1.1 (bypass ISP DNS)
  • Route Microsoft traffic through a clean proxy
  • Check if ISP injects certificates

Category D: Server-Side Shortpath Not Enabled

Evidence: Log shows no STUN/TURN or Shortpath related entries at all (not even detection), but health checks pass and no errors.

This means the AVD host pool does not have RDP Shortpath enabled. Enabling it requires administrator action in the Azure portal.
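The four categories can be summarized as a decision function. This is only a sketch of the reasoning above: the `Evidence` field names are hypothetical, and the check order (A, then C, then D, then B) is an assumption chosen so that the more specific evidence wins.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    tun_source_ip: bool = False             # Windows App source IP is 198.18.0.x
    stun_turn_via_vpn: bool = False         # log: STUN/TURN routed through VPN
    udp_connections: bool = False           # lsof shows UDP sockets for the client
    certificate_errors: bool = False        # log: certificate validation failed
    health_check_logged: bool = True        # health check block present in log
    fetch_options_timeout: bool = False     # log: FetchClientOptions timed out
    shortpath_entries_logged: bool = True   # any STUN/TURN/Shortpath entries in log


def classify(e: Evidence) -> str:
    """Map collected evidence to a root cause category (A-D) from Step 4."""
    if e.udp_connections:
        return "ok"
    if e.tun_source_ip or e.stun_turn_via_vpn:
        return "A"  # VPN/proxy interference
    if e.certificate_errors or not e.health_check_logged:
        return "C"  # client health check failure
    if not e.shortpath_entries_logged and not e.fetch_options_timeout:
        return "D"  # server-side Shortpath not enabled
    if e.fetch_options_timeout:
        return "B"  # ISP/network UDP restriction
    return "undetermined"
```

An `undetermined` result means the evidence does not fit any category cleanly, in which case collecting a second log from a different network is a reasonable next step.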

Step 5: Verify Fix

After applying a fix, reconnect the VDI session and verify:

  1. Check Connection Info — Transport Protocol should show UDP or UDP Multicast
  2. RTT should drop significantly (e.g., from 165ms to 40-60ms)
  3. Verify with lsof:
lsof -i UDP -n -P 2>/dev/null | grep "Windows"
# Should show UDP connections if Shortpath is active (the case-sensitive match keeps WindowServer out of the results)

References
