interview-system-designer
Interview System Designer
The agent designs role-specific interview loops, generates competency-based question banks with scoring rubrics, and detects interviewer bias through statistical calibration analysis.
Quick Start
```bash
# Design a complete interview loop for a senior software engineer role
python loop_designer.py --role "Senior Software Engineer" --level senior --team platform --output loops/

# Generate a question bank for a product manager position
python question_bank_generator.py --role "Product Manager" --level senior --competencies leadership,strategy,analytics --output questions/

# Analyze interview calibration across candidates and interviewers
python hiring_calibrator.py --input interview_data.json --output calibration_report.json --analysis-type full
```
Core Workflows
Workflow 1: Design an Interview Loop
- Define role requirements (title, level, team, 3-5 critical competencies)
- Run `loop_designer.py` with role parameters to generate rounds, time allocations, and scorecards
- Review the generated loop for competency coverage -- every required competency maps to at least one round
- Customize interviewer skill requirements per round
- Validation checkpoint: 100% competency coverage; no round exceeds 90 minutes; total loop under 6 hours (see the validation sketch below)
python loop_designer.py --role "Staff Data Scientist" --level staff \
--competencies ml,statistics,leadership --format json --output loops/ds-staff.json
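The validation checkpoint above can be spot-checked programmatically. Below is a minimal sketch in Python; the field names (`rounds`, `name`, `duration_minutes`, `competencies`) are assumptions about the loop JSON shape, not a documented schema.

```python
import json

# Spot-check a generated loop against the Workflow 1 checkpoint.
# Field names are assumptions about the output JSON, not a documented schema.
REQUIRED = {"ml", "statistics", "leadership"}

with open("loops/ds-staff.json") as f:
    loop = json.load(f)

covered = set()
for rnd in loop["rounds"]:
    # No single round should exceed 90 minutes
    assert rnd["duration_minutes"] <= 90, f"round too long: {rnd['name']}"
    covered |= set(rnd["competencies"])

total = sum(r["duration_minutes"] for r in loop["rounds"])
assert total <= 6 * 60, f"loop too long: {total} minutes"
assert REQUIRED <= covered, f"uncovered competencies: {REQUIRED - covered}"
print("loop passes coverage and time checks")
```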
Workflow 2: Generate a Question Bank
- Identify target role and experience level
- Select competency areas and question types (technical, behavioral, situational)
- Run `question_bank_generator.py` to produce questions with scoring rubrics
- Review for duplicate or overlapping questions across competency areas
- Validation checkpoint: <15% duplicate rate; each competency has 3+ questions; calibration examples (poor/good/great) present for every question (see the duplicate-rate sketch below)
python question_bank_generator.py --role "Frontend Engineer" \
--competencies react,typescript,system-design --num-questions 30
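One way to approximate the duplicate-rate checkpoint is token-level Jaccard similarity between question texts. The sketch below is illustrative only: the input path and the `question` field name are assumptions, and the 0.8 overlap threshold is a starting point rather than a tool default.

```python
import itertools
import json

def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity between two question strings."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb)

with open("questions/frontend-engineer.json") as f:  # hypothetical output path
    questions = [q["question"] for q in json.load(f)]

# Flag any question involved in a near-duplicate pair (>80% token overlap).
flagged = set()
for (i, a), (j, b) in itertools.combinations(enumerate(questions), 2):
    if jaccard(a, b) > 0.8:
        flagged |= {i, j}

print(f"near-duplicate rate: {len(flagged) / len(questions):.0%} (target <15%)")
```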
Workflow 3: Calibrate Hiring Bar
- Collect interview results data (minimum 10 records for statistical significance)
- Run `hiring_calibrator.py` with comprehensive analysis
- Review interviewer deviation metrics -- flag anyone >0.5 standard deviations from the team mean (see the sketch below)
- Generate coaching recommendations for flagged interviewers
- Validation checkpoint: Bias detection precision >80%; score distribution follows target (20/40/30/10 split)
```bash
python hiring_calibrator.py --input q1_interviews.json \
  --analysis-type comprehensive --trend-analysis --period quarterly
```
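The deviation flag from the steps above can be reproduced outside the calibrator as a quick sanity check. A minimal sketch, assuming each record carries `interviewer` and `score` fields (the field names are assumptions):

```python
import json
from statistics import mean, stdev

with open("q1_interviews.json") as f:
    records = json.load(f)  # assumed shape: [{"interviewer": str, "score": 1-4}, ...]

scores = [r["score"] for r in records]
team_mean, team_sd = mean(scores), stdev(scores)  # needs the 10-record minimum above

by_interviewer = {}
for r in records:
    by_interviewer.setdefault(r["interviewer"], []).append(r["score"])

for name, s in sorted(by_interviewer.items()):
    deviation = (mean(s) - team_mean) / team_sd
    flag = "  <- flag for coaching" if abs(deviation) > 0.5 else ""
    print(f"{name}: {deviation:+.2f} sd from team mean{flag}")
```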
Interview Loop Templates
Software Engineering Loops
| Level | Duration | Rounds | Focus Areas |
|---|---|---|---|
| Junior/Mid (2-4 yr) | 3-4 hours | 3-4 | Coding fundamentals, debugging, system basics, growth mindset |
| Senior (5-8 yr) | 4-5 hours | 4-5 | System design, technical leadership, mentoring, code quality |
| Staff+ (8+ yr) | 5-6 hours | 5-6 | Architecture vision, org impact, technical strategy, cross-functional leadership |
Senior Software Engineer Example:
- Technical Phone Screen (45min) -- Advanced algorithms, optimization
- System Design (60min) -- Scalability, trade-offs, architectural decisions
- Coding Excellence (60min) -- Code quality, testing strategies, refactoring
- Technical Leadership (45min) -- Mentoring, technical decisions, cross-team collaboration
- Behavioral & Culture (30min) -- Leadership examples, conflict resolution
Sample Questions by Level
Junior: "Implement a function to find the second largest element in an array" Senior: "Design a real-time chat system supporting 1M concurrent users" Staff+: "How would you evaluate and introduce a new programming language to the organization?"
Behavioral (STAR Method):
- "Tell me about a time you had to influence a decision without formal authority"
- "Walk me through a time when you had to make a decision with incomplete information"
Scoring Rubric
4-Point Scale
| Score | Label | Description |
|---|---|---|
| 4 | Exceeds | Demonstrates mastery beyond required level |
| 3 | Meets | Solid performance meeting all requirements |
| 2 | Partial | Shows potential but has development areas |
| 1 | Does Not Meet | Significant gaps in required competencies |
Calibration Benchmarks
- Target distribution: 20% (4s), 40% (3s), 30% (2s), 10% (1s) -- see the distribution check below
- Interviewer consistency: <0.5 std dev from team average
- Pass rate: 15-25% for most roles
- New hire correlation: >0.6 between interview scores and 6-month performance
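To see how far a team has drifted from the target split, a simple observed-vs-target comparison is enough. The sketch below uses toy scores and is not part of the calibrator itself.

```python
from collections import Counter

TARGET = {4: 0.20, 3: 0.40, 2: 0.30, 1: 0.10}  # the 20/40/30/10 split above

scores = [3, 3, 4, 2, 3, 2, 1, 3, 4, 2, 3, 3]  # toy data
counts = Counter(scores)
n = len(scores)

for score in (4, 3, 2, 1):
    observed = counts.get(score, 0) / n
    drift = observed - TARGET[score]
    print(f"score {score}: observed {observed:.0%}, "
          f"target {TARGET[score]:.0%}, drift {drift:+.0%}")
```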
Anti-Patterns
- Unstandardized loops -- different question sets per candidate prevent fair comparison; always use structured guides
- Halo effect scoring -- one strong answer inflates all dimensions; score each competency independently before debrief
- Similarity bias -- favoring candidates with similar backgrounds; require diverse panels and rotate assignments
- Skipping calibration -- interviewers drift over time without regular calibration sessions (monthly minimum)
- Over-indexing on algorithms -- testing LeetCode for a staff role that requires architecture and leadership; match round focus to actual job requirements
- No debrief structure -- unstructured debriefs lead to anchoring on the loudest voice; require independent score submission before group discussion
Troubleshooting
| Problem | Cause | Solution |
|---|---|---|
| Loop designer produces generic rounds with no role-specific focus | The `--competencies` flag was omitted, so the tool falls back to default competency mapping for the role family | Re-run with explicit `--competencies` listing the 3-5 most critical skills for the position |
| Question bank output has too many behavioral questions and too few technical ones | The `--question-types` flag was not provided, causing the generator to use a balanced default split | Supply `--question-types technical,system-design` (or whichever mix is needed) to control the ratio |
| Hiring calibrator reports "insufficient data" for bias detection | The input JSON contains fewer than 10 interview records, which is below the statistical minimum | Collect more interview data before running bias analysis; use `--analysis-type scoring` for small datasets |
| Calibrator trend analysis returns empty results | The input data lacks date fields or all records fall within a single period | Ensure each interview record has a valid date field and that the dataset spans multiple periods matching `--period` |
| Loop designer ignores the `--team` flag | The team value does not match any of the predefined team mappings in the tool | Check supported team names in the tool's `TEAM_CONFIGS` dictionary, or omit `--team` and rely on competency overrides |
| Score distribution chart shows all interviewers clustered at the same score | Interviewers are not applying the full 1-4 rubric scale (central tendency bias) | Run `--analysis-type calibration` to identify leniency/severity patterns and use the coaching recommendations |
| Question bank generates duplicate questions across competency areas | Overlapping competency keywords (e.g., "leadership" appears in both behavioral and technical mappings) | Use more specific competency terms or reduce `--num-questions` to avoid exhausting the unique question pool |
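Several of the rows above come down to malformed input records. A minimal pre-flight check, using a hypothetical record shape (the field names are assumptions, not a documented schema):

```python
import json
from datetime import datetime

# Hypothetical record shape; field names are assumptions.
example = {
    "candidate_id": "c-1042",
    "interviewer": "a.chen",
    "competency": "system-design",
    "score": 3,
    "date": "2024-03-18",  # trend analysis needs a parseable date per record
}

def has_valid_date(rec: dict) -> bool:
    try:
        datetime.strptime(rec.get("date", ""), "%Y-%m-%d")
        return True
    except ValueError:
        return False

assert has_valid_date(example)

with open("interview_data.json") as f:
    records = json.load(f)
bad = [r for r in records if not has_valid_date(r)]
print(f"{len(bad)} of {len(records)} records lack a parseable date")
```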
Success Criteria
- Interview loop coverage: Every generated loop maps 100% of required competencies to at least one round with a dedicated scoring dimension.
- Question bank diversity: Generated banks contain no more than 15% duplicate or near-duplicate questions across competency areas.
- Calibration detection accuracy: Bias detection flags interviewer score deviation greater than 0.5 standard deviations from the team mean with at least 80% precision.
- Time-to-design reduction: Designing a complete interview loop (rounds, scorecards, question sets) takes under 10 minutes compared to the typical 2-4 hours of manual design.
- Rubric consistency: Generated scoring rubrics achieve inter-rater reliability (Cohen's kappa) of 0.7 or higher when tested with calibration panels (see the kappa sketch after this list).
- Candidate experience alignment: Loops designed with this tool target a candidate experience satisfaction score of 4.0/5.0 or above.
- Hiring quality signal: Organizations using the calibrator report a correlation of 0.6 or higher between interview scores and 6-month performance reviews.
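The rubric-consistency criterion uses Cohen's kappa, defined as (p_o - p_e) / (1 - p_e), where p_o is the raters' observed agreement and p_e is the agreement expected by chance. A self-contained two-rater computation on toy calibration scores:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters scoring the same answers on the 1-4 scale."""
    n = len(rater_a)
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n  # observed agreement
    ca, cb = Counter(rater_a), Counter(rater_b)
    p_e = sum(ca[k] * cb[k] for k in set(ca) | set(cb)) / (n * n)  # chance agreement
    return (p_o - p_e) / (1 - p_e)

# Toy scores: two interviewers rating the same eight calibration answers
a = [3, 4, 2, 3, 3, 1, 4, 2]
b = [3, 4, 2, 2, 3, 1, 4, 2]
print(f"kappa = {cohens_kappa(a, b):.2f} (target >= 0.7)")
```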
Scope & Limitations
This skill covers:
- Designing end-to-end interview loops for engineering, product, design, and data roles across all seniority levels (junior through principal)
- Generating competency-based question banks with structured scoring rubrics and calibration examples
- Detecting statistical bias and calibration drift across interviewers and time periods
- Producing scorecard templates, debrief guides, and interviewer assignment recommendations
This skill does NOT cover:
- Applicant tracking system (ATS) integration, job posting, or candidate sourcing pipeline management -- see `hr-operations/talent-acquisition`
- Compensation benchmarking, offer negotiation strategy, or total rewards analysis -- see `hr-operations/hr-business-partner`
- Workforce planning, headcount modeling, or organizational design -- see `hr-operations/people-analytics`
- Post-hire onboarding program design or new-hire ramp-up tracking -- see `engineering/codebase-onboarding`
Integration Points
| Skill | Integration | Data Flow |
|---|---|---|
| `hr-operations/talent-acquisition` | Feed designed interview loops and scorecards into the talent acquisition pipeline for end-to-end hiring execution | Loop JSON output → talent acquisition workflow input |
| `hr-operations/people-analytics` | Supply calibration reports and interviewer performance data for workforce-level hiring analytics | Calibrator JSON reports → people analytics dashboards |
| `engineering/codebase-onboarding` | Hand off hired candidate profiles and assessed competency gaps to onboarding plan generation | Scorecard results → onboarding skill-gap inputs |
| `hr-operations/hr-business-partner` | Provide interview quality metrics and pass-rate data to support hiring bar discussions with HR leadership | Calibration trend data → HRBP quarterly reviews |
| `product-team` | Align PM interview loop competencies with the product team's competency frameworks and role leveling guides | Competency matrix → PM loop designer `--competencies` input |
| `engineering/pr-review-expert` | Use coding round evaluation criteria to inform code review standards for new hires during their ramp period | Scoring rubric technical criteria → PR review checklist alignment |
Tool Reference
loop_designer.py
Purpose: Generates calibrated interview loops tailored to specific roles, levels, and teams. Produces complete loops with rounds, focus areas, time allocation, interviewer skill requirements, and scorecard templates.
Usage:
python loop_designer.py --role "Senior Software Engineer" --level senior --team platform --output loops/
Flags/Parameters:
| Flag | Type | Required | Default | Description |
|---|---|---|---|---|
| `--role` | str | No | — | Job role title (e.g., "Senior Software Engineer") |
| `--level` | str | No | — | Experience level: junior, mid, senior, staff, principal |
| `--team` | str | No | — | Team or department name (optional context for loop customization) |
| `--competencies` | str | No | — | Comma-separated list of specific competencies to focus on |
| `--input` | str | No | — | Input JSON file with role definition |
| `--output` | str | No | — | Output directory or file path |
| `--format` | str | No | `both` | Output format: json, text, or both |
Example:
python loop_designer.py --role "Staff Data Scientist" --level staff --competencies ml,statistics,leadership --format json --output loops/ds-staff.json
Output Formats:
- JSON: Structured loop definition with rounds array, competency mappings, time allocations, and scorecard templates suitable for programmatic consumption.
- Text: Human-readable interview guide with formatted round descriptions, interviewer requirements, and evaluation criteria.
- Both (default): Writes both JSON and text outputs to the specified directory.
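As an example of programmatic consumption, the sketch below prints a running schedule from a generated loop. The field names are assumptions about the JSON structure, not a documented schema.

```python
import json

with open("loops/ds-staff.json") as f:
    loop = json.load(f)

# Print each round with its start offset; field names are assumed.
elapsed = 0
for rnd in loop["rounds"]:
    print(f"{elapsed:>3} min  {rnd['name']} ({rnd['duration_minutes']} min) "
          f"-- competencies: {', '.join(rnd['competencies'])}")
    elapsed += rnd["duration_minutes"]
```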
question_bank_generator.py
Purpose: Generates comprehensive, competency-based interview questions with detailed scoring criteria, follow-up probes, and calibration examples organized by competency area.
Usage:
python question_bank_generator.py --role "Frontend Engineer" --competencies react,typescript,system-design --output questions/
Flags/Parameters:
| Flag | Type | Required | Default | Description |
|---|---|---|---|---|
| `--role` | str | No | — | Job role title (e.g., "Frontend Engineer") |
| `--level` | str | No | `senior` | Experience level: junior, mid, senior, staff, principal |
| `--competencies` | str | No | — | Comma-separated list of competencies to focus on |
| `--question-types` | str | No | — | Comma-separated list of question types: technical, behavioral, situational |
| `--num-questions` | int | No | `20` | Number of questions to generate |
| `--input` | str | No | — | Input JSON file with role requirements |
| `--output` | str | No | — | Output directory or file path |
| `--format` | str | No | `both` | Output format: json, text, or both |
Example:
python question_bank_generator.py --role "Product Manager" --level mid --question-types behavioral,situational --num-questions 30 --format text
Output Formats:
- JSON: Array of question objects each containing the question text, competency area, difficulty level, scoring rubric (1-4 scale), follow-up probes, and calibration examples (poor/good/great answers).
- Text: Formatted question bank grouped by competency with inline scoring guidance and example answers for interviewer reference.
- Both (default): Writes both JSON and text outputs to the specified directory.
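A consumption sketch mirroring the fields listed above; the exact key names (`question`, `competency`, `calibration_examples`) are assumptions rather than a documented schema.

```python
import json

with open("questions/frontend-engineer.json") as f:  # hypothetical output path
    bank = json.load(f)

# Pull the system-design questions with their calibration examples.
for q in bank:
    if q["competency"] == "system-design":
        print(q["question"])
        for level in ("poor", "good", "great"):
            print(f"  {level}: {q['calibration_examples'][level][:60]}...")
```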
hiring_calibrator.py
Purpose: Analyzes interview scores from multiple candidates and interviewers to detect bias, calibration issues, and inconsistent rubric application. Generates calibration reports with recommendations for interviewer coaching and process improvements.
Usage:
```bash
python hiring_calibrator.py --input interview_results.json --analysis-type comprehensive --output report.json
```
Flags/Parameters:
| Flag | Type | Required | Default | Description |
|---|---|---|---|---|
| `--input` | str | Yes | — | Input JSON file with interview results data |
| `--analysis-type` | str | No | `comprehensive` | Analysis type: comprehensive, bias, calibration, interviewer, scoring |
| `--competencies` | str | No | — | Comma-separated list of competencies to focus on |
| `--trend-analysis` | flag | No | `false` | Enable trend analysis over time |
| `--period` | str | No | `monthly` | Trend period: daily, weekly, monthly, quarterly |
| `--output` | str | No | — | Output file path |
| `--format` | str | No | `both` | Output format: json, text, or both |
Example:
```bash
python hiring_calibrator.py --input q1_interviews.json --analysis-type bias --competencies technical,leadership --trend-analysis --period quarterly --format json --output calibration/q1_bias.json
```
Output Formats:
- JSON: Structured calibration report containing score distributions, interviewer deviation metrics, bias indicators, trend data (if enabled), and prioritized coaching recommendations.
- Text: Human-readable report with summary statistics, flagged interviewers, bias findings, and actionable improvement recommendations formatted for management review.
- Both (default): Writes both JSON and text outputs to the specified path.
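A sketch of pulling flagged interviewers out of the JSON report. The key names (`interviewer_deviation`, `deviation_sd`, `recommendation`) are assumptions based on the fields described above, not a guaranteed schema.

```python
import json

with open("calibration/q1_bias.json") as f:
    report = json.load(f)

# Surface interviewers beyond the 0.5 sd deviation threshold.
for item in report["interviewer_deviation"]:
    if abs(item["deviation_sd"]) > 0.5:
        print(f"{item['interviewer']}: {item['deviation_sd']:+.2f} sd -- "
              f"{item.get('recommendation', 'schedule calibration session')}")
```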