# Devil's Advocate Skill
Challenge research assumptions and identify weaknesses in your arguments.
## Purpose
Based on Part 3 of Scott Cunningham's series, "Creating Devil's Advocate Agents for Tough Problems," this skill targets what Cunningham calls the "LLM thing of over-confidence in diagnosing a problem."
For formal code audits with replication scripts and referee reports, use the Referee 2 agent instead (`.claude/agents/referee2-reviewer.md`). This skill is for quick adversarial feedback on arguments, not systematic audits.
## When to Use
- Before submitting a paper
- When stuck on a research problem
- When you want to stress-test an argument
- During paper revision planning
## When NOT to Use
- Code audits — use the Referee 2 agent instead
- Replication verification — use the Referee 2 agent instead
- Quick proofreading — just ask for a read-through
- When you want validation — this skill is designed to challenge, not affirm
## Workflow
1. Understand the claim — Read the paper/argument being evaluated
2. Generate competing hypotheses — If evaluating a research question or design, load `references/competing-hypotheses.md` and generate 3-5 rival explanations before critiquing
3. Run the debate — Use the multi-turn debate protocol below (default) or single-shot mode for quick checks
4. Deliver the verdict — Synthesize surviving critiques with severity ratings
## Multi-Turn Debate Protocol (Default)
Inspired by the simulated scientific debates in Google's AI Co-Scientist. A one-shot critique is easy for an LLM to produce but often superficial. Multi-turn debates force each critique to survive a defense, filtering out weak objections and sharpening the strong ones.
### Round 1: Adversarial Critic
Adopt the persona of a hostile but competent reviewer. Challenge on:
- Theoretical foundations — Are the assumptions justified?
- Methodology — Limitations? Alternative approaches?
- Data — Selection bias? Measurement issues? External validity?
- Causal claims — Alternative explanations? Confounders?
- Contribution — Novel enough? Does it matter?
Produce numbered critiques (aim for 5-8), each with a concrete statement of the problem.
### Round 2: Defense
Switch persona to the paper's author. For each numbered critique, provide the strongest possible defense:
- Cite evidence from the paper that addresses the concern
- Explain design choices that mitigate the issue
- Acknowledge limitations honestly where the defense is weak
- Propose concrete fixes where the critique has merit
### Round 3: Adjudication
Switch to an impartial senior reviewer. For each critique-defense pair, rule:
- Critique stands — the defense is insufficient; this is a real weakness
- Critique partially addressed — defense has merit but issue remains
- Critique resolved — the defense adequately addresses the concern
### Final Synthesis
Produce a structured report with only the surviving critiques (stands + partially addressed), ranked by severity:
```markdown
## Devil's Advocate Report

### Critical (must fix before submission)
1. [Critique] — [Why the defense failed] — [Suggested fix]

### Major (reviewers will likely raise)
2. [Critique] — [What remains after defense] — [Suggested fix]

### Minor (worth acknowledging)
3. [Critique] — [Residual concern] — [How to preempt]

### Dismissed
- [Critiques that were resolved in Round 2, listed briefly for transparency]
```
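The same three-round loop can also be scripted directly. Below is a minimal sketch in Python, assuming only a generic text-in/text-out `complete` callable (hypothetical: wire it to whatever model client you use). It illustrates the protocol, not the skill's actual implementation.

```python
from typing import Callable


def devils_advocate(paper: str, complete: Callable[[str], str]) -> str:
    """Run the three-round debate over `paper` and return the final report.

    `complete` is any text-in/text-out LLM call; supply your own client.
    """
    # Round 1: a hostile but competent reviewer produces numbered critiques.
    critiques = complete(
        "You are a hostile but competent reviewer. Produce 5-8 numbered "
        "critiques of the paper below, covering theory, methodology, data, "
        "causal claims, and contribution. State each problem concretely.\n\n"
        + paper
    )
    # Round 2: the author mounts the strongest possible defense per critique.
    defense = complete(
        "You are the paper's author. For each numbered critique, give the "
        "strongest possible defense, citing the paper where you can and "
        "acknowledging weaknesses honestly.\n\nPAPER:\n" + paper
        + "\n\nCRITIQUES:\n" + critiques
    )
    # Round 3: an impartial senior reviewer adjudicates each pair and writes
    # a report containing only the surviving critiques, ranked by severity.
    return complete(
        "You are an impartial senior reviewer. For each critique-defense "
        "pair, rule: stands, partially addressed, or resolved. Then write a "
        "report with only the surviving critiques, ranked by severity.\n\n"
        "CRITIQUES:\n" + critiques + "\n\nDEFENSE:\n" + defense
    )
```

Passing a different `complete` per round is one way to approximate Council Mode's cross-model tension from a single script.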
## Single-Shot Mode
For quick checks (e.g., "just poke holes in this argument"), skip the multi-turn protocol and produce a direct critique. Use when the user says "quick", "just challenge this", or the input is a paragraph rather than a full paper.
## Example Use
"Play devil's advocate on my research paper about preference drift - specifically challenge my identification strategy and the assumptions about utility functions."
## Council Mode (Optional)
For the highest-stakes arguments, run the devil's advocate debate across multiple LLM providers. Different models have genuinely different reasoning patterns — a critique that Claude finds weak, GPT may find devastating, and vice versa. This produces adversarial tension that a single model cannot replicate internally.
Trigger: "Council devil's advocate" or "thorough challenge"
How it works:
- Each model independently plays Adversarial Critic (Round 1) using the same paper/argument
- Cross-review: each model evaluates the others' critiques — identifying which challenges are strongest
- Chairman synthesis: produces a single report with surviving critiques ranked by cross-model agreement
Invocation (CLI backend):

```bash
cd packages/cli-council
uv run python -m cli_council \
  --prompt-file /tmp/devils-advocate-prompt.txt \
  --context-file /tmp/paper-content.txt \
  --output-md /tmp/devils-advocate-council.md \
  --chairman claude \
  --timeout 180
```
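Here, `--prompt-file` presumably holds the devil's advocate instructions and `--context-file` the paper text. A minimal sketch of a prompt file (the wording is an assumption, not a required format):

```text
You are a hostile but competent reviewer. Produce 5-8 numbered critiques
of the attached paper, covering theory, methodology, data, causal claims,
and contribution. State each problem concretely.
```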
See `skills/shared/council-protocol.md` for the full orchestration protocol.
Value: High — the multi-turn debate protocol (Round 1→2→3) becomes genuinely adversarial when different models play different roles. A critique that survives cross-model scrutiny is almost certainly a real weakness.
## Cross-References
| Skill | When to use instead/alongside |
|---|---|
| `/interview-me` | To develop the idea further through structured interview |
| `/multi-perspective` | For multi-perspective analysis with disciplinary diversity |
| `/proofread` | For language/formatting review rather than argument critique |