webflux-test-reviewer
WebFlux Test Reviewer
Evaluate WebFlux technical assessments from multiple candidates. Read the problem statement, analyze each candidate's code, and produce a structured evaluation report.
Project Structure
The assessment project must follow this layout:
project-root/
├── statement/
│ └── *.md # The technical assessment problem statement
├── candidate-name-1/
│ ├── (cloned repo contents)
│ └── EVALUATION.md # ← You generate this
├── candidate-name-2/
│ ├── (cloned repo contents)
│ └── EVALUATION.md # ← You generate this
└── ...
Every directory at the root that is NOT statement/ is a candidate. Extract the candidate's display name from the directory name (e.g., jonathan-camano → Jonathan Camano).
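The candidate discovery and name extraction rules above can be sketched in shell. This is only an illustration under stated assumptions: the function names `list_candidates` and `display_name` are hypothetical and not part of this skill, and the sketch assumes directory names use hyphens as word separators, as in the `jonathan-camano` example.

```shell
# Hypothetical helpers, not part of the skill itself.

# Print every directory under the project root except statement/.
list_candidates() {
  root="${1:-.}"
  for dir in "$root"/*/; do
    [ -d "$dir" ] || continue
    name=$(basename "$dir")
    if [ "$name" != "statement" ]; then
      printf '%s\n' "$name"
    fi
  done
}

# Turn a hyphenated directory name into a display name:
# split on "-", capitalize the first letter of each word, join with spaces.
display_name() {
  printf '%s\n' "$1" | awk -F'-' '{
    out = ""
    for (i = 1; i <= NF; i++) {
      if ($i == "") continue
      word = toupper(substr($i, 1, 1)) substr($i, 2)
      out = (out == "" ? word : out " " word)
    }
    print out
  }'
}
```

Calling `display_name jonathan-camano` yields `Jonathan Camano`. Names that do not follow the hyphenated pattern would need manual correction.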
Execution Flow
1. Read the Statement
Read all markdown files in statement/. This is your baseline — everything the candidate delivers gets contrasted against what was actually requested. Understanding the statement deeply is critical because it lets you catch candidates who missed requirements, over-engineered, or solved a different problem.
2. Evaluate All Candidates
Process all candidates automatically. If sub-agents are available, use them to evaluate candidates in parallel, setting dangerously_trust_all_tools: true so they don't hang on tool approvals. Each candidate is independent, so this is safe. Otherwise, evaluate sequentially.
For each candidate:
Explore the codebase selectively. Don't read every file. Focus on these key files:
- README.md: for readme evaluation
- build.gradle / pom.xml: to identify framework, dependencies, and scaffold usage
- Dockerfile, docker-compose.yml: for infra evaluation
- IaC files (*.tf, cloudformation.*, etc.): if they exist
- application.yml / application.properties: for configuration and secrets check
- Controllers / entry points: to evaluate API design and reactive usage
- Use cases / Services: to evaluate business logic, operators, and exception handling
- Global exception handler: to evaluate error management
- Test directory structure: to verify tests exist and estimate coverage (don't read every test file; check the count and which layers are tested)
- .gitignore: to check what's tracked
Use directory listings and file structure to assess architecture without reading every class.
Check git history. Run git log --oneline and git branch -a inside the candidate's directory. This reveals commit discipline, branch strategy, and work progression.
Apply evaluation criteria. Evaluate each criterion from references/EVALUATION_CRITERIA.md. The criteria cover: Readme, Git, Webflux, Tests, Architecture, Clean Code, Exception Handling, IaC, Dockerfile, and Docker-compose. Each evaluation is free-text — describe what you found, not just "yes/no".
Contrast against the statement. For each criterion, consider whether the candidate fulfilled what was asked. If the statement required a CRUD API and the candidate built something else, that matters.
Detect AI misuse. Look for obvious AI-generated comments (e.g. "si quieres te puedo dar ideas...", Spanish for "if you want I can give you ideas..."), style inconsistencies across the codebase, and suspicious patterns such as technically perfect code that shows no understanding of the business context. AI usage isn't inherently bad; what matters is whether the candidate understands what was generated. Note findings and turn them into interview questions.
Flag exposed secrets. If you find hardcoded credentials, API keys, or passwords, mention it in the relevant criterion. This isn't a formal criterion but it's worth noting.
Generate interview questions. Based on inconsistencies, questionable decisions, missing justifications, or interesting technical choices you found, generate questions to probe the candidate's understanding. No limit — generate as many as the code warrants.
Write the conclusion. A free-text overall assessment. Be direct and honest.
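For the exposed-secrets check in the steps above, a quick first pass over configuration files can be sketched as follows. This is a rough heuristic, not a prescribed part of the skill: the function name `scan_secrets`, the keyword list, and the file extensions are all assumptions chosen for illustration. It reports lines where a secret-like key is assigned a literal value and skips values that start with `$` (placeholders such as `${DB_PASSWORD}`).

```shell
# Hypothetical helper: list config lines that look like hardcoded secrets.
# The keyword list and file patterns are illustrative assumptions.
scan_secrets() {
  dir="${1:-.}"
  # -r recurse, -n show line numbers, -i ignore case, -E extended regex.
  # "|| true" keeps the function from failing when nothing matches.
  grep -rniE \
    --include='*.yml' --include='*.yaml' --include='*.properties' \
    '(password|secret|api[_-]?key|token)[[:space:]]*[:=][[:space:]]*[^[:space:]$]' \
    "$dir" || true
}
```

A hit is only a lead: confirm by reading the file, since quoted placeholders and test fixtures will also match.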
3. Generate EVALUATION.md
Write the report in each candidate's directory using this format:
# [Candidate Name]
## BACK:
• Readme: [free-text evaluation]
• GIT: [free-text evaluation]
• Webflux: [free-text evaluation]
• Test Unitarios: [free-text evaluation]
• Arquitectura: [free-text evaluation]
• Clean Code: [free-text evaluation]
• Manejo de Excepciones: [free-text evaluation]
• IaC: [free-text evaluation]
• DockerFile: [free-text evaluation]
• Docker-compose: [free-text evaluation]
## Preguntas:
• [question based on code findings]
• [question based on code findings]
• ...
## Conclusión:
[free-text overall assessment]
Write the report in the same language as the statement. Spanish statement → Spanish report. English statement → English report.
Writing style for evaluations: Be concise and direct. State the verdict and the key finding; don't repeat the same idea in different words. Avoid filler phrases.
Bad (redundant): Existe pero es muy básico. Copia el JAR pre-compilado, no hace build multi-stage. Funcional pero no óptimo.
Good (concise): Solo copia el JAR, no hace build multi-stage.
Bad (verbose): Arquitectura por capas (controller, service, repository, model, dto, mapper). Usa interfaces para servicios y DTOs separados, lo cual es positivo. Sin embargo, es una estructura plana sin inversión de dependencias. No hay separación de dominio e infraestructura a nivel de módulo. No aplica Clean Architecture ni el scaffold de Bancolombia. El dominio (modelos JPA) está acoplado directamente a la infraestructura de persistencia. Arquitectura básica.
Good (concise): Arquitectura por capas básica (controller-service-repository). Sin inversión de dependencias ni separación dominio/infraestructura. Modelos JPA acoplados a persistencia.
The rule: say it once, say it well, move on.
4. WebFlux Technical Reference
When evaluating reactive code quality, reference the webflux-reactive-patterns skill if available. Key things to verify:
- flatMap() for async operations vs map() for sync transformations
- No blocking calls (block(), Thread.sleep(), blocking I/O)
- Lazy error handling with Mono.defer(() -> Mono.error(...))
- Parallel operations with Mono.zip() where applicable
- switchIfEmpty() for empty stream handling
- No imperative constructs (if/else, throw) inside reactive chains
Evaluation Criteria Reference
For detailed criteria with examples of good and bad evaluations, read references/EVALUATION_CRITERIA.md. Consult it when you need guidance on what specifically to look for in each criterion.
Setup Script
To quickly create the candidate directory structure from GitHub URLs, run scripts/setup_candidates.sh:
./scripts/setup_candidates.sh https://github.com/user1/repo https://github.com/user2/repo
This creates a directory per GitHub user and clones their repo inside it. After running, add a statement/ directory with the assessment markdown.
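The contents of scripts/setup_candidates.sh are not shown here, so the following is only a guess at the behavior described above, not the actual script. The function names `github_user` and `setup_candidates` are hypothetical, and the sketch assumes URLs of the form `https://github.com/<user>/<repo>` and that the repository is cloned directly into the user's directory, matching the project layout.

```shell
# Hypothetical sketch of the described behavior; the real script may differ.

# Extract the GitHub user from a URL like https://github.com/user1/repo.
github_user() {
  path="${1#*github.com/}"      # drop everything up to and including "github.com/"
  printf '%s\n' "${path%%/*}"   # keep only the first path segment
}

# Clone each repository into a directory named after its owner.
setup_candidates() {
  for url in "$@"; do
    user=$(github_user "$url")
    git clone "$url" "$user"
  done
}
```

Under these assumptions, `setup_candidates https://github.com/user1/repo` would clone the repository into `user1/`.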
Report Template
The exact template with placeholders is at assets/EVALUATION_TEMPLATE.md.