WebFlux Test Reviewer

Evaluate WebFlux technical assessments from multiple candidates. Read the problem statement, analyze each candidate's code, and produce a structured evaluation report.

Project Structure

The assessment project must follow this layout:

project-root/
├── statement/
│   └── *.md                    # The technical assessment problem statement
├── candidate-name-1/
│   ├── (cloned repo contents)
│   └── EVALUATION.md           # ← You generate this
├── candidate-name-2/
│   ├── (cloned repo contents)
│   └── EVALUATION.md           # ← You generate this
└── ...

Every directory at the root that is NOT statement/ is a candidate. Extract the candidate's display name from the directory name (e.g., jonathan-camano → Jonathan Camano).

Execution Flow

1. Read the Statement

Read all markdown files in statement/. This is your baseline — everything the candidate delivers gets contrasted against what was actually requested. Understanding the statement deeply is critical because it lets you catch candidates who missed requirements, over-engineered, or solved a different problem.

2. Evaluate All Candidates

Process all candidates automatically. If sub-agents are available, use them to evaluate candidates in parallel with dangerously_trust_all_tools: true to avoid hanging on tool approvals — each candidate is independent so this is safe. Otherwise, evaluate sequentially.

For each candidate:

Explore the codebase selectively. Don't read every file. Focus on these key files:

README.md — for readme evaluation
build.gradle / pom.xml — to identify framework, dependencies, and scaffold usage
Dockerfile, docker-compose.yml — for infra evaluation
IaC files (*.tf, cloudformation.*, etc.) — if they exist
application.yml / application.properties — for configuration and secrets check
Controllers/Entry points — to evaluate API design and reactive usage
Use cases / Services — to evaluate business logic, operators, and exception handling
Global exception handler — to evaluate error management
Test directory structure — to verify tests exist and estimate coverage (don't read every test file, check count and which layers are tested)
.gitignore — to check what's tracked

Use directory listings and file structure to assess architecture without reading every class.

Check git history. Run git log --oneline and git branch -a inside the candidate's directory. This reveals commit discipline, branch strategy, and work progression.

Apply evaluation criteria. Evaluate each criterion from references/EVALUATION_CRITERIA.md. The criteria cover: Readme, Git, Webflux, Tests, Architecture, Clean Code, Exception Handling, IaC, Dockerfile, and Docker-compose. Each evaluation is free-text — describe what you found, not just "yes/no".

Contrast against the statement. For each criterion, consider whether the candidate fulfilled what was asked. If the statement required a CRUD API and the candidate built something else, that matters.

Detect AI misuse. Look for obvious AI-generated comments ("si quieres te puedo dar ideas..."), style inconsistencies across the codebase, and suspicious patterns like technically perfect code that shows no understanding of the business context. AI usage isn't inherently bad — what matters is whether the candidate understands what was generated. Note findings and turn them into interview questions.

Flag exposed secrets. If you find hardcoded credentials, API keys, or passwords, mention it in the relevant criterion. This isn't a formal criterion but it's worth noting.

Generate interview questions. Based on inconsistencies, questionable decisions, missing justifications, or interesting technical choices you found, generate questions to probe the candidate's understanding. No limit — generate as many as the code warrants.

Write the conclusion. A free-text overall assessment. Be direct and honest.

3. Generate EVALUATION.md

Write the report in each candidate's directory using this format:

# [Candidate Name]

## BACK:
• Readme: [free-text evaluation]
• GIT: [free-text evaluation]
• Webflux: [free-text evaluation]
• Test Unitarios: [free-text evaluation]
• Arquitectura: [free-text evaluation]
• Clean Code: [free-text evaluation]
• Manejo de Excepciones: [free-text evaluation]
• IaC: [free-text evaluation]
• DockerFile: [free-text evaluation]
• Docker-compose: [free-text evaluation]

## Preguntas:
• [question based on code findings]
• [question based on code findings]
• ...

## Conclusión:
[free-text overall assessment]

Write the report in the same language as the statement. Spanish statement → Spanish report. English statement → English report.

Writing style for evaluations: Be concise and direct. State the verdict and the key finding, don't repeat the same idea with different words. Avoid filler phrases.

Bad (redundant): Existe pero es muy básico. Copia el JAR pre-compilado, no hace build multi-stage. Funcional pero no óptimo. Good (concise): Solo copia el JAR, no hace build multi-stage.

Bad (verbose): Arquitectura por capas (controller, service, repository, model, dto, mapper). Usa interfaces para servicios y DTOs separados, lo cual es positivo. Sin embargo, es una estructura plana sin inversión de dependencias. No hay separación de dominio e infraestructura a nivel de módulo. No aplica Clean Architecture ni el scaffold de Bancolombia. El dominio (modelos JPA) está acoplado directamente a la infraestructura de persistencia. Arquitectura básica. Good (concise): Arquitectura por capas básica (controller-service-repository). Sin inversión de dependencias ni separación dominio/infraestructura. Modelos JPA acoplados a persistencia.

The rule: say it once, say it well, move on.

4. WebFlux Technical Reference

When evaluating reactive code quality, reference the webflux-reactive-patterns skill if available. Key things to verify:

flatMap() for async operations vs map() for sync transformations
No blocking calls (block(), Thread.sleep(), blocking I/O)
Lazy error handling with Mono.defer(() -> Mono.error(...))
Parallel operations with Mono.zip() where applicable
switchIfEmpty() for empty stream handling
No imperative constructs (if/else, throw) inside reactive chains

Evaluation Criteria Reference

For detailed criteria with examples of good and bad evaluations, read references/EVALUATION_CRITERIA.md. Consult it when you need guidance on what specifically to look for in each criterion.

Setup Script

To quickly create the candidate directory structure from GitHub URLs, run scripts/setup_candidates.sh:

./scripts/setup_candidates.sh https://github.com/user1/repo https://github.com/user2/repo

This creates a directory per GitHub user and clones their repo inside it. After running, add a statement/ directory with the assessment markdown.

Report Template

The exact template with placeholders is at assets/EVALUATION_TEMPLATE.md.

webflux-test-reviewer

WebFlux Test Reviewer

Project Structure

Execution Flow

1. Read the Statement

2. Evaluate All Candidates

3. Generate EVALUATION.md

4. WebFlux Technical Reference

Evaluation Criteria Reference

Setup Script

Report Template

More from jheisonmb/skills

webflux-reactive-patterns

mindful-precision

task-trigger

rust-idiomatic-patterns

code-design

texforge