skills/mukul975/anthropic-cybersecurity-skills/building-soc-metrics-and-kpi-tracking


Building SOC Metrics and KPI Tracking

When to Use

Use this skill when:

  • SOC leadership needs data-driven visibility into operational performance
  • Continuous improvement programs require baseline measurements and trend tracking
  • Executive reporting demands quantified security posture and ROI metrics
  • Staffing decisions need objective workload and capacity data
  • Compliance audits require documented SOC performance evidence

Do not use metrics as punitive measures against analysts — metrics should drive process improvement, not individual performance management.

Prerequisites

  • SIEM with 90+ days of incident and alert disposition data
  • Incident ticketing system (ServiceNow, Jira) with timestamp data for incident lifecycle
  • Analyst shift schedules and staffing data
  • ATT&CK Navigator for detection coverage tracking
  • Dashboard platform (Splunk, Grafana, or Power BI)

Workflow

Step 1: Define Core SOC Metrics Framework

Establish the key metrics aligned to NIST CSF functions:

Metric           Definition                                          Target      NIST CSF
MTTD             Time from threat occurrence to SOC detection        <15 min     Detect
MTTA             Time from alert to analyst acknowledgment           <5 min      Respond
MTTI             Time from acknowledgment to investigation start     <10 min     Respond
MTTC             Time from investigation start to containment        <1 hour     Respond
MTTR             Time from detection to full resolution              <4 hours    Recover
FP Rate          Percentage of alerts closed as false positives      <30%        Detect
TP Rate          Percentage of alerts closed as true positives       >40%        Detect
Coverage         ATT&CK techniques with active detection             >60%        Detect
Dwell Time       Attacker time in network before detection           <24 hours   Detect
Escalation Rate  Percentage of Tier 1 alerts escalated to Tier 2/3   15-25%      Respond
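As a rough illustration of how the lifecycle metrics in this table chain together, the Python sketch below derives MTTA, MTTI, MTTC and MTTR from per-incident timestamps. The `Incident` record and its field names are hypothetical (not a ServiceNow or Jira schema), and the sketch treats the alert time as the detection time when computing MTTR.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


# Hypothetical record layout; these field names are assumptions,
# not a ServiceNow or Jira schema.
@dataclass
class Incident:
    alert_time: datetime           # SOC detection / alert generation
    ack_time: datetime             # analyst acknowledgment
    investigation_start: datetime
    containment_time: datetime
    resolution_time: datetime


def minutes(delta: timedelta) -> float:
    return delta.total_seconds() / 60


def lifecycle_means(incidents):
    """Mean of each lifecycle interval from the metrics table, in minutes."""
    return {
        "MTTA": mean(minutes(i.ack_time - i.alert_time) for i in incidents),
        "MTTI": mean(minutes(i.investigation_start - i.ack_time) for i in incidents),
        "MTTC": mean(minutes(i.containment_time - i.investigation_start) for i in incidents),
        # MTTR spans detection to full resolution, so it overlaps the others.
        "MTTR": mean(minutes(i.resolution_time - i.alert_time) for i in incidents),
    }


t0 = datetime(2024, 3, 1, 9, 0)
incident = Incident(
    alert_time=t0,
    ack_time=t0 + timedelta(minutes=4),
    investigation_start=t0 + timedelta(minutes=10),
    containment_time=t0 + timedelta(minutes=55),
    resolution_time=t0 + timedelta(hours=3),
)
print(lifecycle_means([incident]))
```

Note that MTTR is not the sum of the other three intervals: it also includes the time between containment and full resolution.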

Step 2: Implement MTTD/MTTR Measurement

Mean Time to Detect (MTTD):

MTTD is measured here as the lag between the original event time (orig_time) and the notable's creation time (_time). Lags of zero or less, or of 24 hours (86,400 seconds) or more, are excluded as likely data quality issues.

index=notable earliest=-30d status_label="Resolved*"
| eval mttd_seconds = _time - orig_time
| where mttd_seconds > 0 AND mttd_seconds < 86400
| stats avg(mttd_seconds) AS avg_mttd,
        median(mttd_seconds) AS med_mttd,
        perc90(mttd_seconds) AS p90_mttd,
        perc95(mttd_seconds) AS p95_mttd
  by urgency
| eval avg_mttd_min = round(avg_mttd / 60, 1)
| eval med_mttd_min = round(med_mttd / 60, 1)
| eval p90_mttd_min = round(p90_mttd / 60, 1)
| eval p95_mttd_min = round(p95_mttd / 60, 1)
| table urgency, avg_mttd_min, med_mttd_min, p90_mttd_min, p95_mttd_min
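To see why the search reports the median and 90th percentile alongside the mean, here is a minimal Python sketch of the same calculation. It assumes you have exported (orig_time, notable _time) pairs in epoch seconds; the nearest-rank percentile is a stand-in and may not match Splunk's perc90() exactly.

```python
import math
from statistics import mean, median

MAX_LAG_SECONDS = 86400  # mirrors the 24-hour cutoff in the SPL search


def nearest_rank(values, pct):
    """Nearest-rank percentile. Splunk's perc90() may use a different
    (approximate) method, so small differences are expected."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def mttd_summary(pairs):
    """pairs: (orig_time, notable_time) in epoch seconds. Returns minutes."""
    # Drop non-positive lags (clock skew) and lags of 24 h or more.
    lags = [n - o for o, n in pairs if 0 < n - o < MAX_LAG_SECONDS]
    return {
        "avg_mttd_min": round(mean(lags) / 60, 1),
        "med_mttd_min": round(median(lags) / 60, 1),
        "p90_mttd_min": round(nearest_rank(lags, 90) / 60, 1),
    }


sample = [(0, 120), (0, 300), (0, 480), (0, 600), (0, 3600),
          (100, 95),   # negative lag: excluded
          (0, 90000)]  # longer than 24 h: excluded
print(mttd_summary(sample))
```

In this sample, one slow detection pulls the mean to 17.0 minutes while the median stays at 8.0, which is why averages alone can hide a long tail.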

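Mean Time to Respond (MTTR):

This assumes status_end holds the epoch time at which the notable was resolved; adjust the field to match your deployment. Durations of zero or less, or of 7 days (604,800 seconds) or more, are excluded as outliers.

index=notable earliest=-30d status_label="Resolved*"
| eval mttr_seconds = status_end - _time
| where mttr_seconds > 0 AND mttr_seconds < 604800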
| stats avg(mttr_seconds) AS avg_mttr,
        median(mttr_seconds) AS med_mttr,
        perc90(mttr_seconds) AS p90_mttr
  by urgency
| eval avg_mttr_hours = round(avg_mttr / 3600, 1)
| eval med_mttr_hours = round(med_mttr / 3600, 1)
| eval p90_mttr_hours = round(p90_mttr / 3600, 1)
| table urgency, avg_mttr_hours, med_mttr_hours, p90_mttr_hours
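The output format reports SLA compliance alongside MTTR. One way to derive it is sketched below in Python, assuming per-urgency resolution SLAs; the SLA_HOURS values are hypothetical placeholders, not a recommended policy.

```python
from collections import defaultdict

# Hypothetical resolution SLAs in hours per urgency; substitute your own policy.
SLA_HOURS = {"critical": 2.0, "high": 4.0, "medium": 8.0, "low": 24.0}


def sla_compliance(resolutions):
    """resolutions: iterable of (urgency, mttr_hours) for resolved notables.
    Returns the percentage resolved within SLA per urgency, plus overall."""
    met = defaultdict(int)
    total = defaultdict(int)
    for urgency, hours in resolutions:
        if urgency not in SLA_HOURS:
            continue  # e.g. informational notables have no SLA in this sketch
        total[urgency] += 1
        if hours <= SLA_HOURS[urgency]:
            met[urgency] += 1
    result = {u: round(100 * met[u] / total[u], 1) for u in total}
    overall = sum(total.values())
    result["overall"] = round(100 * sum(met.values()) / overall, 1) if overall else None
    return result


print(sla_compliance([("critical", 1.5), ("critical", 2.5), ("high", 3.0),
                      ("medium", 4.1), ("low", 10.0), ("informational", 50.0)]))
```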

MTTD/MTTR Trend Over Time:

index=notable earliest=-90d status_label="Resolved*"
| eval mttd_min = (_time - orig_time) / 60
| eval mttr_hours = (status_end - _time) / 3600
| bin _time span=1w
| stats avg(mttd_min) AS avg_mttd_min, avg(mttr_hours) AS avg_mttr_hours,
        count AS incidents by _time
| table _time, incidents, avg_mttd_min, avg_mttr_hours
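For teams building the trend outside Splunk (for example in Grafana or Power BI), the weekly bucketing can be approximated as below. This sketch assumes one (epoch time, MTTD minutes, MTTR hours) row per resolved notable and uses ISO weeks in UTC, which may not line up with the week boundaries Splunk's bin span=1w produces.

```python
from collections import defaultdict
from datetime import datetime, timezone
from statistics import mean


def weekly_trend(rows):
    """rows: (epoch_seconds, mttd_min, mttr_hours), one per resolved notable.
    Buckets by ISO week in UTC (weeks start on Monday)."""
    buckets = defaultdict(list)
    for epoch, mttd_min, mttr_hours in rows:
        year, week, _ = datetime.fromtimestamp(epoch, tz=timezone.utc).isocalendar()
        buckets[(year, week)].append((mttd_min, mttr_hours))
    return [
        {
            "week": f"{year}-W{week:02d}",
            "incidents": len(items),
            "avg_mttd_min": round(mean(m for m, _ in items), 1),
            "avg_mttr_hours": round(mean(h for _, h in items), 1),
        }
        for (year, week), items in sorted(buckets.items())
    ]


def utc_epoch(*args):
    return datetime(*args, tzinfo=timezone.utc).timestamp()


rows = [(utc_epoch(2024, 3, 4, 9), 10.0, 3.0),
        (utc_epoch(2024, 3, 6, 15), 6.0, 4.0),
        (utc_epoch(2024, 3, 12, 2), 5.0, 2.0)]
for week in weekly_trend(rows):
    print(week)
```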

Step 3: Measure Alert Quality and Analyst Productivity

Alert Disposition Analysis:

index=notable earliest=-30d
| stats count AS total,
        sum(eval(if(status_label="Resolved - True Positive", 1, 0))) AS tp,
        sum(eval(if(status_label="Resolved - False Positive", 1, 0))) AS fp,
        sum(eval(if(status_label="Resolved - Benign", 1, 0))) AS benign,
        sum(eval(if(status_label="New" OR status_label="In Progress", 1, 0))) AS pending
| eval tp_rate = round(tp / total * 100, 1)
| eval fp_rate = round(fp / total * 100, 1)
| eval signal_noise = if(fp > 0, round(tp / fp, 2), null())
| table total, tp, fp, benign, pending, tp_rate, fp_rate, signal_noise
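The disposition metrics reduce to simple ratios over the status counts. The Python sketch below assumes a flat list of status_label values and returns None for the TP:FP ratio when there are no false positives, rather than dividing by a near-zero value.

```python
from collections import Counter


def disposition_metrics(status_labels):
    counts = Counter(status_labels)
    total = len(status_labels)
    tp = counts["Resolved - True Positive"]
    fp = counts["Resolved - False Positive"]
    return {
        "total": total,
        "tp_rate": round(tp / total * 100, 1) if total else None,
        "fp_rate": round(fp / total * 100, 1) if total else None,
        # TP:FP ratio; undefined when no alerts were closed as false positives.
        "signal_noise": round(tp / fp, 2) if fp else None,
    }


labels = (["Resolved - True Positive"] * 41 + ["Resolved - False Positive"] * 27
          + ["Resolved - Benign"] * 25 + ["New"] * 7)
print(disposition_metrics(labels))
```

Pending alerts stay in the denominator here, as in the SPL search, so a large backlog lowers both rates.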

Analyst Productivity Metrics:

index=notable earliest=-30d status_label="Resolved*"
| stats count AS alerts_resolved,
        avg(eval((status_end - status_transition_time) / 60)) AS avg_triage_min,
        dc(rule_name) AS unique_rule_types
  by owner
| eval alerts_per_day = round(alerts_resolved / 30, 1)
| sort - alerts_resolved
| table owner, alerts_resolved, alerts_per_day, avg_triage_min, unique_rule_types

Shift-Based Workload Distribution:

index=notable earliest=-30d
| eval hour = tonumber(strftime(_time, "%H"))
| eval shift = case(
    hour >= 6 AND hour < 14, "Day (06-14)",
    hour >= 14 AND hour < 22, "Swing (14-22)",
    1=1, "Night (22-06)"
  )
| stats count AS alerts, dc(owner) AS analysts by shift
| eval alerts_per_analyst_day = round(alerts / analysts / 30, 1)
| table shift, alerts, analysts, alerts_per_analyst_day
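The shift mapping wraps the night shift past midnight, and it relies on the hour being a number rather than a string. A minimal Python equivalent, using the same shift boundaries and a 30-day window, might look like this:

```python
def shift_for_hour(hour):
    """Map an integer hour (0-23) to the shift buckets used in the SPL search."""
    if not 0 <= hour <= 23:
        raise ValueError(f"hour out of range: {hour}")
    if 6 <= hour < 14:
        return "Day (06-14)"
    if 14 <= hour < 22:
        return "Swing (14-22)"
    return "Night (22-06)"  # 22-23 and 00-05 wrap past midnight


def alerts_per_analyst_day(alerts, analysts, days=30):
    """Average daily alerts handled per analyst over the window."""
    return round(alerts / analysts / days, 1)


print([shift_for_hour(h) for h in (5, 6, 21, 22)])
print(alerts_per_analyst_day(2160, 3))
```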

Step 4: Track Detection Coverage

ATT&CK Coverage Score:

| inputlookup attack_techniques_total.csv
| stats dc(technique_id) AS total_techniques by tactic
| join type=left tactic [
    | inputlookup detection_rules_attack_mapping.csv
    | stats dc(technique_id) AS covered_techniques by tactic
  ]
| fillnull value=0 covered_techniques
| eval coverage_pct = round(covered_techniques / total_techniques * 100, 1)
| sort tactic
| table tactic, covered_techniques, total_techniques, coverage_pct
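The coverage arithmetic can be sketched in Python as follows. It assumes both lookups can be read as (tactic, technique_id) pairs; the sample rows are illustrative, and tactics with no mapped detections are kept at 0% so gaps remain visible.

```python
def coverage_by_tactic(detections, catalog):
    """detections and catalog are iterables of (tactic, technique_id).
    Returns coverage percentage per tactic in the catalog."""
    covered = {}
    for tactic, tid in detections:
        covered.setdefault(tactic, set()).add(tid)
    totals = {}
    for tactic, tid in catalog:
        totals.setdefault(tactic, set()).add(tid)
    return {
        # Intersect so detections mapped to unknown techniques are ignored.
        tactic: round(len(covered.get(tactic, set()) & techs) / len(techs) * 100, 1)
        for tactic, techs in sorted(totals.items())
    }


catalog = [("execution", "T1059"), ("execution", "T1204"), ("execution", "T1047"),
           ("lateral-movement", "T1021"), ("lateral-movement", "T1570")]
detections = [("execution", "T1059"), ("execution", "T1047")]
print(coverage_by_tactic(detections, catalog))
```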

Data Source Coverage:

| inputlookup expected_data_sources.csv
| join type=left data_source [
    | tstats count where index=* by sourcetype
    | rename sourcetype AS data_source
    | eval status = "Active"
  ]
| eval source_status = if(isnotnull(status), "Collecting", "MISSING")
| stats count by source_status
| table source_status, count
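A list of the missing feeds is usually more actionable than a count. This Python sketch assumes expected and observed sourcetypes are available as sets of names; the sourcetype strings in the sample are examples only.

```python
def data_source_status(expected, observed):
    """Split expected sources into those currently collecting and those missing."""
    return {
        "Collecting": sorted(expected & observed),
        "MISSING": sorted(expected - observed),
    }


expected = {"WinEventLog:Security", "pan:traffic", "aws:cloudtrail"}
observed = {"WinEventLog:Security", "aws:cloudtrail", "syslog"}
print(data_source_status(expected, observed))
```

Sources that are observed but not expected (syslog here) are ignored; reporting them separately could help keep the expected list current.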

Step 5: Build Executive Reporting Dashboard

Monthly SOC Executive Summary:

Incident summary by urgency:
index=notable earliest=-30d status_label="Resolved*"
| stats count by urgency
| eval order = case(urgency="critical", 1, urgency="high", 2, urgency="medium", 3,
                    urgency="low", 4, urgency="informational", 5)
| sort order

Month-over-month comparison:
index=notable earliest=-60d
| eval period = if(_time > relative_time(now(), "-30d"), "This Month", "Last Month")
| stats count by period, urgency
| chart sum(count) AS incidents by urgency, period

Top 5 true-positive detection rules:
index=notable earliest=-30d status_label="Resolved - True Positive"
| top limit=5 rule_name
| table rule_name, count, percent
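The Trend column in the output format is a period-over-period percent change. A minimal sketch, assuming monthly counts keyed by urgency (the counts below are made up for illustration), returns None where there is no prior-month baseline:

```python
from typing import Dict, Optional


def pct_change(previous: float, current: float) -> Optional[float]:
    """Percent change from previous to current; None when there is no baseline."""
    if previous == 0:
        return None
    return round((current - previous) / previous * 100, 1)


def month_over_month(last: Dict[str, int], this: Dict[str, int]) -> Dict[str, Optional[float]]:
    urgencies = sorted(set(last) | set(this))
    return {u: pct_change(last.get(u, 0), this.get(u, 0)) for u in urgencies}


print(month_over_month({"critical": 5, "high": 20, "medium": 78},
                       {"critical": 3, "high": 23, "medium": 78, "low": 43}))
print(pct_change(1847, 287))  # daily alert volume before and after RBA
```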

Security Posture Scorecard (static example values; in practice, populate each line from the metric searches above):

| makeresults
| eval metrics = mvappend(
    "MTTD: 8.3 min (Target: <15 min) | STATUS: GREEN",
    "MTTR: 3.2 hours (Target: <4 hours) | STATUS: GREEN",
    "FP Rate: 27% (Target: <30%) | STATUS: GREEN",
    "Detection Coverage: 64% (Target: >60%) | STATUS: GREEN",
    "Analyst Utilization: 78% (Target: 60-80%) | STATUS: GREEN",
    "Incident Backlog: 12 (Target: <20) | STATUS: GREEN"
  )
| mvexpand metrics
| table metrics
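The scorecard above hardcodes each status. A sketch of deriving the status from the target string instead is shown below in Python; the AMBER band (a miss of up to 10% of the threshold) is a hypothetical convention, not part of the original scorecard.

```python
import re


def rag_status(value, target, amber_margin=0.10):
    """Compare value against a target such as '<15 min', '>60%' or '60-80%'.
    amber_margin is a hypothetical tolerance: a miss of up to 10% of the
    threshold is AMBER, anything worse is RED. Trailing units are ignored."""
    upper = re.fullmatch(r"<\s*([\d.]+)\D*", target)
    lower = re.fullmatch(r">\s*([\d.]+)\D*", target)
    band = re.fullmatch(r"([\d.]+)\s*-\s*([\d.]+)\D*", target)
    if upper:
        limit = float(upper.group(1))
        if value < limit:
            return "GREEN"
        miss = (value - limit) / limit
    elif lower:
        limit = float(lower.group(1))
        if value > limit:
            return "GREEN"
        miss = (limit - value) / limit
    elif band:
        lo, hi = float(band.group(1)), float(band.group(2))
        if lo <= value <= hi:
            return "GREEN"
        miss = (lo - value) / lo if value < lo else (value - hi) / hi
    else:
        raise ValueError(f"unrecognised target: {target!r}")
    return "AMBER" if miss <= amber_margin else "RED"


print(rag_status(8.3, "<15 min"), rag_status(58, ">60%"), rag_status(92, "60-80%"))
```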

Step 6: Implement Continuous Improvement Tracking

Track improvement initiatives and their impact:

| inputlookup soc_improvement_initiatives.csv
| eval status_color = case(
    status="Completed", "green",
    status="In Progress", "yellow",
    status="Planned", "gray"
  )
| table initiative, start_date, target_date, status, status_color, metric_impact, baseline, current

Example initiatives (contents of soc_improvement_initiatives.csv):

initiative,start_date,target_date,status,metric_impact,baseline,current
Risk-Based Alerting,2024-01-15,2024-03-15,Completed,Alert Volume,1847/day,287/day
Sigma Rule Library,2024-02-01,2024-04-01,In Progress,ATT&CK Coverage,61%,64%
SOAR Phishing Playbook,2024-02-15,2024-03-30,In Progress,Phishing MTTR,45min,18min
Analyst Training Program,2024-01-01,2024-06-30,In Progress,TP Rate,31%,41%
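To turn the baseline and current columns into an impact figure, the values need parsing because they mix units (%, min). The sketch below assumes the leading number in each cell is the metric value and reports both the raw delta and the relative change; for percentage metrics the delta is in percentage points.

```python
import csv
import io
import re

# Three rows copied from the example CSV above.
SAMPLE = """initiative,start_date,target_date,status,metric_impact,baseline,current
Sigma Rule Library,2024-02-01,2024-04-01,In Progress,ATT&CK Coverage,61%,64%
SOAR Phishing Playbook,2024-02-15,2024-03-30,In Progress,Phishing MTTR,45min,18min
Analyst Training Program,2024-01-01,2024-06-30,In Progress,TP Rate,31%,41%
"""


def leading_number(cell):
    """Extract the leading numeric value from cells like '61%' or '45min'."""
    match = re.match(r"-?[\d.]+", cell.strip())
    if not match:
        raise ValueError(f"no numeric value in {cell!r}")
    return float(match.group())


def initiative_impact(csv_text):
    impact = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        base, cur = leading_number(row["baseline"]), leading_number(row["current"])
        impact[row["initiative"]] = {
            "delta": round(cur - base, 1),
            "pct_change": round((cur - base) / base * 100, 1),
        }
    return impact


for name, result in initiative_impact(SAMPLE).items():
    print(name, result)
```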

Key Concepts

Term                    Definition
MTTD                    Mean Time to Detect: average time from threat occurrence to SOC alert generation
MTTR                    Mean Time to Respond: average time from detection to incident resolution
MTTA                    Mean Time to Acknowledge: average time from alert generation to analyst acknowledgment
Signal-to-Noise Ratio   Ratio of true positive alerts to false positive alerts; higher is better
Dwell Time              Duration an attacker remains undetected in the environment; a key indicator of detection effectiveness
Analyst Utilization     Percentage of analyst time spent on productive investigation vs. overhead tasks

Tools & Systems

  • Splunk Dashboard Studio: Advanced visualization framework for building interactive SOC metric dashboards
  • Grafana: Open-source analytics and visualization platform supporting multiple data sources
  • Power BI: Microsoft business intelligence tool for executive-level reporting and trend analysis
  • ATT&CK Navigator: MITRE tool for visualizing detection coverage as layered heatmaps
  • ServiceNow Performance Analytics: ITSM analytics module for tracking incident lifecycle metrics

Common Scenarios

  • Quarterly Business Review: Present MTTD/MTTR trends, detection coverage growth, and alert quality improvements
  • Staffing Justification: Use workload metrics to justify additional analyst headcount or shift adjustments
  • Tool ROI Assessment: Compare alert quality and response times before and after new tool deployment
  • Compliance Evidence: Provide documented SOC performance metrics for ISO 27001 or SOC 2 audits
  • Peer Benchmarking: Benchmark SOC metrics against industry peers using published surveys (SANS, Ponemon)

Output Format

SOC PERFORMANCE REPORT — March 2024
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

KEY METRICS:
  Metric              Current    Target     Trend    Status
  MTTD                8.3 min    <15 min    -12%     GREEN
  MTTR                3.2 hrs    <4 hrs     -18%     GREEN
  FP Rate             27%        <30%       -5%      GREEN
  TP Rate             41%        >40%       +3%      GREEN
  ATT&CK Coverage     64%        >60%       +3%      GREEN
  Alerts/Analyst/Day  24         <50        -84%     GREEN

INCIDENT SUMMARY:
  Total Incidents:     147 (Critical: 3, High: 23, Medium: 78, Low: 43)
  Avg Resolution:      3.2 hours (Critical: 1.8h, High: 2.9h, Medium: 4.1h)
  SLA Compliance:      94% (Target: >90%)

IMPROVEMENT HIGHLIGHTS:
  [1] RBA deployment reduced daily alerts from 1,847 to 287 (-84%)
  [2] New Sigma rules added 12 ATT&CK techniques to coverage
  [3] SOAR phishing playbook reduced phishing MTTR by 60%

AREAS FOR IMPROVEMENT:
  [1] Lateral movement detection coverage at 58% (below 60% target)
  [2] Night shift MTTD 23% slower than day shift
  [3] 4 critical vulnerability scan tickets overdue on SLA