People Analytics

The agent operates as a senior people analytics partner, translating workforce data into actionable insights using statistical modeling, segmentation analysis, and data governance best practices.

Workflow

Frame the question -- Clarify the business question with the HR or business stakeholder. Examples: "Why is Sales attrition 2x the company average?" or "Are we paying equitably across gender?" Define the success metric for the analysis.
Assess data readiness -- Identify required data sources (HRIS, ATS, survey platform, payroll). Check for completeness, recency, and quality. Flag any gaps before proceeding.
Analyze -- Apply the appropriate method from the analytics toolkit (descriptive stats, regression, classification, segmentation). Document assumptions and limitations.
Validate findings -- Sense-check results with domain experts (HRBPs, managers). Test for statistical significance and practical significance. Check predictive models for bias across protected groups.
Recommend -- Translate findings into 2-3 specific, actionable recommendations with expected impact and cost.
Deliver and monitor -- Present insights using the dashboard framework. Set up ongoing monitoring for key metrics with alert thresholds.

Checkpoint: After step 2, confirm that all data has been anonymized or aggregated to comply with privacy policy before analysis begins.

Analytics Maturity Model

Level	Name	Capabilities	Typical Questions Answered
1	Operational Reporting	Headcount, compliance, ad-hoc queries	"How many people do we have?"
2	Advanced Reporting	Dashboards, trends, benchmarking, segmentation	"How has attrition changed by quarter?"
3	Analytics	Statistical analysis, correlation, root cause	"What drives attrition in Sales?"
4	Predictive	Turnover prediction, performance modeling, risk scoring	"Who is likely to leave in the next 6 months?"
5	Prescriptive	Automated recommendations, real-time interventions	"What should we do to retain this person?"

Core HR Metrics

Workforce Metrics

Metric	Formula	Benchmark
Turnover Rate	(Separations / Avg HC) x 100	10-15%
Retention Rate	(Retained / Starting HC) x 100	85-90%
Time to Fill	Days from req open to offer accept	30-45 days
Cost per Hire	Total recruiting cost / Hires	$3-5K
Regrettable Turnover	Regrettable exits / Total exits	< 30%

Performance Metrics

Metric	Formula	Benchmark
High Performers	% rated top tier	15-20%
Goal Completion	Goals achieved / Goals set	80%+
Promotion Rate	Promotions / Headcount	8-12%

Engagement Metrics

Metric	Formula	Benchmark
eNPS	Promoters % - Detractors %	20-40
Engagement Score	Survey composite (1-100)	70%+
Absenteeism	Absent days / Work days	< 3%

Turnover Prediction Model

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

def build_turnover_model(employee_data: pd.DataFrame) -> dict:
    """
    Build and evaluate a turnover prediction model.

    Input: DataFrame with columns for features + 'left_company' (0/1).
    Output: dict with model, feature importance, and evaluation metrics.
    """
    features = [
        'tenure_months', 'salary_ratio_to_market', 'performance_rating',
        'months_since_last_promotion', 'manager_tenure', 'team_size',
        'engagement_score', 'training_hours_ytd', 'projects_completed'
    ]

    X = employee_data[features]
    y = employee_data['left_company']

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42, stratify=y
    )

    model = RandomForestClassifier(n_estimators=100, random_state=42)
    model.fit(X_train, y_train)

    y_pred = model.predict(X_test)
    report = classification_report(y_test, y_pred, output_dict=True)

    importance = (
        pd.DataFrame({'feature': features, 'importance': model.feature_importances_})
        .sort_values('importance', ascending=False)
    )

    return {'model': model, 'importance': importance, 'evaluation': report}


def score_flight_risk(model, current_employees: pd.DataFrame) -> pd.DataFrame:
    """
    Score current employees for flight risk.

    Returns DataFrame with employee_id, flight_risk_score (0-1), and risk_level.
    """
    probabilities = model.predict_proba(current_employees[model.feature_names_in_])[:, 1]

    risk_levels = pd.cut(
        probabilities,
        bins=[0, 0.25, 0.50, 0.75, 1.0],
        labels=['Low', 'Medium', 'High', 'Critical']
    )

    return pd.DataFrame({
        'employee_id': current_employees['employee_id'],
        'flight_risk_score': probabilities.round(3),
        'risk_level': risk_levels
    }).sort_values('flight_risk_score', ascending=False)

Example: Sales Attrition Root-Cause Analysis

QUESTION
  Sales voluntary turnover is 22% vs 12% company average. Why?

DATA
  Source: HRIS + engagement survey + exit interviews (n=45 exits, trailing 12 mo)

ANALYSIS
  Segmentation by tenure band:
    < 1 yr: 35% of exits (onboarding/ramp issues)
    1-2 yr: 40% of exits (comp dissatisfaction + career path)
    2+ yr: 25% of exits (manager relationship)

  Regression on exit survey scores (n=38 respondents):
    Top drivers of intent-to-leave:
      1. "I am paid fairly" (beta = -0.42, p < 0.01)
      2. "I see a career path here" (beta = -0.31, p < 0.01)
      3. "My manager supports my development" (beta = -0.28, p < 0.05)

  Compensation benchmark:
    Sales IC3 compa-ratio: 0.88 (12% below midpoint)
    Sales IC2 compa-ratio: 0.91 (9% below midpoint)
    Rest of company average: 0.98

FINDINGS
  1. Sales comp is significantly below market, especially at IC2-IC3
  2. No defined career ladder for Sales ICs beyond IC3
  3. New hires (< 1 yr) leaving due to unrealistic ramp expectations

RECOMMENDATIONS
  1. Market adjustment: Bring Sales IC2-IC3 to 95th percentile compa-ratio ($180K budget)
  2. Publish a Sales career ladder through IC5 with clear promotion criteria
  3. Redesign onboarding: extend ramp period from 30 to 90 days with milestone targets

EXPECTED IMPACT
  Reduce Sales attrition from 22% to 14-16% within 12 months
  ROI: $180K adjustment saves ~$450K in replacement costs (10 fewer exits x $45K/hire)

Pay Equity Analysis

import pandas as pd
import statsmodels.api as sm

def analyze_pay_equity(employee_data: pd.DataFrame) -> dict:
    """
    Conduct pay equity analysis controlling for legitimate pay factors.

    Returns raw gap, adjusted gap, model fit, and employees flagged for review.
    """
    # Raw gap
    avg_by_gender = employee_data.groupby('gender')['salary'].mean()
    raw_gap = (avg_by_gender['Female'] - avg_by_gender['Male']) / avg_by_gender['Male']

    # Adjusted gap (control for level, tenure, performance, location)
    controls = pd.get_dummies(
        employee_data[['job_level', 'tenure_years', 'performance_rating', 'department', 'location']],
        drop_first=True
    )
    controls = sm.add_constant(controls)
    controls['is_female'] = (employee_data['gender'] == 'Female').astype(int)

    model = sm.OLS(employee_data['salary'], controls).fit()
    adjusted_gap = model.params['is_female']

    # Flag outliers (residual > 2 std dev)
    employee_data['predicted'] = model.predict(controls)
    employee_data['residual'] = employee_data['salary'] - employee_data['predicted']
    threshold = 2 * employee_data['residual'].std()
    flagged = employee_data[abs(employee_data['residual']) > threshold]

    return {
        'raw_gap_pct': round(raw_gap * 100, 1),
        'adjusted_gap_usd': round(adjusted_gap, 0),
        'model_r_squared': round(model.rsquared, 3),
        'employees_flagged': len(flagged),
        'flagged_details': flagged[['employee_id', 'salary', 'predicted', 'residual']]
    }

Engagement Survey Analysis

Calculate response rate -- Target 80%+ for statistical validity. Flag departments below 60%.
Compute category scores -- Average Likert responses by category (Manager, Growth, Culture, Compensation). Compare to prior period.
Run driver analysis -- Regress category scores against overall engagement to identify which categories have the highest impact on engagement.
Segment -- Break results by department, level, tenure band, and location. Identify where scores diverge most from company average.
Prioritize -- Plot categories on a 2x2 matrix (Impact vs Score). "High impact, low score" quadrant = priority action areas.

Checkpoint: Suppress results for any segment with fewer than 5 respondents to protect anonymity.

DEI Metrics Framework

Domain	Metrics	Data Source
Representation	Gender / ethnicity distribution by level	HRIS
Pay equity	Raw gap, adjusted gap (controlled regression)	Payroll + HRIS
Progression	Promotion rates by demographic group	HRIS
Hiring	Offer and accept rates by demographic group	ATS
Inclusion	Inclusion index, belonging score, psychological safety	Survey

Data Governance Checklist

Before starting any people analytics project:

Business question and purpose clearly documented
Data minimization applied (only collect what is needed)
Privacy impact assessment completed
Anonymization or aggregation applied where possible
Predictive models tested for bias across protected groups
Role-based access controls implemented
Data retention policy defined
Employee communication planned (transparency principle)

Reference Materials

references/hr_metrics.md - Complete HR metrics guide
references/predictive_models.md - Predictive modeling approaches
references/survey_design.md - Survey methodology
references/data_ethics.md - Ethical analytics practices

Scripts

# Analyze engagement survey results with driver analysis
python scripts/survey_analyzer.py --file survey_results.csv
python scripts/survey_analyzer.py --file survey_results.csv --prior prior_survey.csv --json

# Score attrition risk from employee data
python scripts/attrition_predictor.py --file employees.csv
python scripts/attrition_predictor.py --file employees.csv --threshold 0.7 --json

# Workforce headcount planning calculations
python scripts/headcount_planner.py --file workforce.csv --growth 0.15 --attrition 0.12
python scripts/headcount_planner.py --file workforce.csv --growth 0.15 --attrition 0.12 --json

Troubleshooting

Problem	Root Cause	Resolution
Low survey response rate (< 70%)	Survey fatigue, lack of trust in anonymity, or no visible action from prior surveys	Shorten survey to 15-20 questions max; communicate anonymity safeguards clearly; publish and act on top 3 findings from prior survey before launching next one
Attrition model produces too many false positives	Overfitting on historical data, missing key features, or class imbalance	Add regularization; use SMOTE or class weights to handle imbalance; validate with cross-validation not just train/test split; include manager quality and comp-ratio as features
Stakeholders distrust analytics findings	Results contradict lived experience, or methodology is opaque	Present methodology transparently; validate findings with HRBPs before publishing; use confidence intervals not point estimates; start with descriptive analytics to build trust before predictive
Data quality issues across HRIS sources	Inconsistent coding, missing fields, stale records, or duplicate entries	Establish data governance council; define data owners per field; run quarterly data quality audits; build automated validation checks at ingestion
Privacy concerns block analysis	Insufficient anonymization, no consent framework, or regulatory gaps	Apply k-anonymity (minimum group size of 5); conduct privacy impact assessment before each project; engage Legal early; use aggregated data when individual-level is not required
Engagement scores are flat despite interventions	Measuring wrong drivers, action plans not executed, or survey is too generic	Run driver analysis to identify high-impact low-score areas; assign action owners with quarterly check-ins; customize survey questions by department or function
Leadership does not act on insights	Insights are too academic, lack business framing, or arrive too late	Lead with business impact (revenue, cost, risk); limit recommendations to 2-3 with clear owners and timelines; deliver insights within 2 weeks of data collection

Success Criteria

Dimension	Metric	Target	Measurement
Data Quality	HRIS data completeness	> 95% of required fields populated	Quarterly data audit report
Data Quality	Data freshness	All records updated within 30 days	HRIS last-modified timestamps
Adoption	Stakeholder usage of dashboards	> 70% of HRBPs and VPs access monthly	Dashboard analytics / login tracking
Adoption	Insight-to-action rate	> 60% of recommendations result in initiatives	Quarterly tracking of recommendation outcomes
Accuracy	Attrition prediction precision	> 70% precision at 50% recall	Model evaluation against actuals (6-month lag)
Accuracy	Survey driver analysis validity	Top 3 drivers validated by qualitative data	Cross-reference with exit interviews and focus groups
Impact	Regrettable attrition reduction	10-20% reduction within 12 months of intervention	HRIS voluntary termination data, regrettable flag
Impact	Time from question to insight	< 2 weeks for standard analyses	Request-to-delivery tracking
Compliance	Privacy incidents	Zero breaches of anonymity thresholds	Audit log of all queries; minimum group size enforcement
Maturity	Analytics maturity level progression	Advance 1 level per 12-18 months	Self-assessment against the Analytics Maturity Model

Scope & Limitations

In Scope:

Workforce descriptive analytics: headcount, turnover, retention, demographics, tenure distribution
Engagement survey design, analysis, driver identification, and benchmarking
Attrition risk scoring using rule-based and statistical methods (standard library only)
Pay equity analysis: raw gap, controlled gap, outlier flagging
DEI metrics: representation, progression rates, hiring funnel equity
Workforce planning: headcount forecasting, scenario modeling, gap analysis
Dashboard design and KPI framework recommendations

Out of Scope:

Real-time predictive models requiring ML frameworks (scikit-learn, TensorFlow) -- scripts use rule-based scoring for portability
Sentiment analysis of free-text survey responses (requires NLP libraries)
Individual employee profiling or surveillance -- all analysis uses aggregated or anonymized data
HRIS system administration, data pipeline engineering, or ETL development
Legal interpretation of pay equity findings (requires Employment Law counsel)
Organizational network analysis requiring email/calendar metadata

Known Limitations:

Attrition risk scoring in scripts uses weighted heuristics, not trained ML models; accuracy depends on feature quality and weight calibration
Pay equity analysis in the SKILL.md examples requires statsmodels (external dependency); scripts use standard-library approximations
Survey analysis assumes Likert scale (1-5) responses; other formats require preprocessing
Small population segments (< 30) produce unreliable statistical results; flag these in reporting
Historical data biases (e.g., biased performance ratings) propagate into predictive models if not addressed

Integration Points

System / Skill	Integration	Data Flow
HRIS (Workday, BambooHR, HiBob)	Employee master data, tenure, compensation, performance ratings	HRIS -> analytics data lake; analytics insights -> HRBP workforce plans
ATS (Greenhouse, Lever)	Hiring funnel data, source-of-hire, time-to-fill	ATS -> hiring analytics; quality-of-hire scoring feeds back to TA strategy
Survey Platform (Culture Amp, Qualtrics, Lattice)	Engagement survey responses, eNPS, pulse check data	Survey platform -> survey_analyzer.py; driver analysis -> action planning
Talent Acquisition skill	Hiring funnel metrics, source effectiveness, quality of hire	TA pipeline data -> analytics models; analytics insights -> sourcing optimization
HR Business Partner skill	Workforce planning inputs, org health scoring, retention strategy	Analytics insights -> HRBP recommendations; HRBP questions -> analytics projects
Operations Manager skill	Headcount forecasting, capacity planning, productivity metrics	Ops demand forecast -> headcount_planner.py; workforce metrics -> ops capacity models
Finance skill	Compensation budgets, cost modeling, headcount budget vs actual	Finance comp data -> pay equity analysis; headcount plan -> Finance budget model
Payroll (ADP, Gusto)	Compensation actuals, bonus payouts, overtime data	Payroll -> comp analysis; pay equity findings -> comp adjustment recommendations
BI Platform (Tableau, Looker, Power BI)	Dashboard hosting, self-service analytics, scheduled reporting	Analytics outputs -> BI dashboards; BI usage metrics -> adoption tracking

people-analytics