idea-tournament
Idea Tournament
A structured framework for generating diverse research ideas through tree-based expansion, then selecting the strongest candidate via Elo-rated pairwise tournaments across four quality dimensions.
When to Use This Skill
- User has a research direction from
research-ideationand needs concrete, ranked ideas - User wants to systematically compare multiple research ideas before committing
- User asks about idea ranking, competitive selection, or proposal generation
- User wants to explore variations of a research concept and select the best one
- User mentions "idea tournament", "rank ideas", "compare approaches", "research proposal", "which idea is best"
From Direction to Proposal
The gap between "I have a research direction" and "I have a concrete proposal" is where most researchers stall. They either commit to their first idea (missing better alternatives) or endlessly brainstorm without converging (analysis paralysis).
The tournament solves both problems. Phase 1 forces breadth — you generate up to N_I=21 candidates (the paper's maximum) by systematically varying technique, domain, and formulation. Phase 2 forces convergence — pairwise Elo comparisons identify the strongest idea without requiring you to hold all candidates in your head simultaneously.
Before starting:
- Load prior knowledge from Ideation Memory (M_I):
- Refer to the evo-memory skill → Read M_I at
/memory/ideation-memory.md - Select the top-2 entries (k_I=2) most relevant to the user's current goal by comparing each entry's Summary and Retrieval Tags against the goal
- Feasible directions become tree seeds — incorporate them as Level 1 branches in Phase 1
- Unsuccessful directions (fundamental failures only) are used during pruning — prune any tree branch that matches a fundamental failure pattern
- If M_I doesn't exist yet (first cycle), skip this step
- Refer to the evo-memory skill → Read M_I at
- Retrieve relevant literature L for the user goal G. The paper defines idea tree search as IdeaTreeSearch(G, L, K_I) — literature is a formal input alongside the user goal and retrieved memory. Use web search or provided papers to ground idea generation in existing work.
Phase 1: Tree-Structured Idea Generation
Expand a seed idea into a tree of candidates by varying one axis per level. The tree structure ensures diversity — each branch explores a fundamentally different variation rather than minor tweaks of the same concept.
The Three Axes
| Level | Axis | What Varies | Example |
|---|---|---|---|
| 0 | Seed | Starting research direction | "Efficient LLM inference" |
| 1 | Technique | The core technical approach | Pruning, quantization, distillation |
| 2 | Domain | The application context | Edge devices, multi-modal, long-context |
| 3 | Formulation | The problem framing | Latency-constrained, memory-constrained, accuracy-preserving |
Expansion Process
Level 0 — Seed (1 node): Start with the research direction from research-ideation. This is your root node.
Level 1 — Technique variants (3 nodes): Generate 3 fundamentally different technical approaches to the seed direction. These should be distinct paradigms, not variations of the same technique. Reflect carefully to verify each is genuinely different.
Level 2 — Domain adaptations (6-9 nodes): For each Level 1 node, generate 2-3 domain-specific adaptations. How does this technique apply differently in different contexts? What domain-specific constraints create new challenges?
Level 3 — Formulation variants (up to N_I=21 total leaves): For each Level 2 node, refine into 1-3 specific problem formulations. A formulation pins down the exact problem statement — the inputs, outputs, constraints, and evaluation criteria. The paper sets N_I=21 as the maximum number of candidate ideas. If the tree produces fewer than 15 leaves, expand Level 2 or Level 3 further. If more than 21, prune to stay within the N_I limit.
Per-Node Cycle: Propose → Review → Refine
For each new node:
- Propose: Write a 2-3 sentence description of the idea
- Review: Evaluate critically — Is this genuinely different from sibling nodes? Is it at least plausible?
- Refine: Sharpen the description based on the review. Remove vague language. Make the novelty claim specific.
Pruning
After expanding each level, prune clearly infeasible branches. A branch is "clearly infeasible" if:
- It requires resources fundamentally unavailable (e.g., proprietary datasets you can't access)
- It contradicts well-established theoretical results
- It duplicates an existing, well-established solution with no meaningful variation
- It appears in
evo-memory's unsuccessful directions as a fundamental failure (not implementation failure)
Important: Pruning removes only the obviously unworkable. Do NOT prune ideas that are risky, unconventional, or outside your current expertise — these are exactly the ideas tournaments are designed to evaluate fairly.
Save the complete tree to /idea-tree.md.
See references/tree-search-protocol.md for detailed expansion rules and diversity metrics.
Phase 2: Elo Tournament Ranking
Rank all leaf candidates through pairwise comparisons on four quality dimensions. Swiss-system pairing keeps the number of comparisons manageable while still producing reliable rankings.
The Four Dimensions
| Dimension | Weight | What It Measures |
|---|---|---|
| Novelty | 25% | How different is this from existing published work? |
| Feasibility | 25% | Can this be implemented and validated within reasonable time and resources? |
| Relevance | 25% | Does this address an important, open problem in the field? |
| Clarity | 25% | Is the idea well-defined enough to start working on immediately? |
All dimensions are weighted equally. Researchers tend to overweight novelty and underweight feasibility — equal weights correct this bias.
Tournament Mechanics
Starting Elo: 1500 for all candidates.
K-factor: 32 (standard for new players; large enough that a few matches significantly move ratings).
Swiss-system pairing (4-5 rounds):
- Round 1: Random pairing
- Subsequent rounds: Pair candidates with similar current Elo ratings, avoiding rematches
- 4-5 rounds is sufficient for 15-21 candidates to produce stable rankings
Per-match process:
- Present both candidates side by side with their full descriptions
- Score each on all 4 dimensions (1-10 scale)
- Compute composite scores (average of 4 dimensions)
- Determine the match winner (higher composite score)
- Update Elo ratings using the standard formula (see elo-ranking-guide.md for the formula, worked example, and convergence criteria)
Save rankings to /idea-rankings.md.
See references/elo-ranking-guide.md for the detailed rubric and convergence criteria.
Phase 3: Direction Summarization
Synthesize the top-3 ranked ideas into a "promising directions" summary. This serves two purposes: it preserves optionality (the best idea may combine elements from multiple candidates), and it feeds into evo-memory for future cycles.
Summarization Process
For each of the top-3 ideas:
- Extract the core research direction (abstract away from specific implementation details)
- Identify the key insight that makes this direction promising
- Note the primary risk or uncertainty
- Check against
evo-memory— has this direction been explored before? What was learned?
Then synthesize across the top-3:
- What common threads run through the top-3? These may suggest an even stronger combined direction.
- What dimensions do the top ideas excel in? Are there patterns (e.g., all top ideas score high on feasibility but moderate on novelty)?
- What's missing? Are there important aspects of the original seed that none of the top ideas address?
Save to /direction-summary.md.
After completion, trigger evo-memory IDE (Idea Direction Evolution) to update Ideation Memory with the promising directions identified.
Phase 4: Proposal Extension
Extend the tournament winner (rank #1) into a full research proposal with enough detail to begin implementation.
Proposal Structure
The paper defines proposal P as containing 5 sections: background, related work, method, experiment plan, and expected results. We extend this with a 6th practical section (risks and mitigations).
1. Background: Define the exact problem — inputs, outputs, constraints, and why existing solutions are insufficient. Be specific: "LLM inference on edge devices with <2GB memory while maintaining >90% of full-model accuracy" is a background statement; "make LLMs faster" is not. Include context and motivation.
2. Related Work: Position the idea within the existing literature. What has been tried? What are the gaps? This should draw on the literature L retrieved during Phase 1 tree generation.
3. Proposed Method: Describe the technical approach at a level of detail sufficient for implementation. Include the key insight that differentiates this from prior work. State assumptions explicitly. List 3 testable contributions.
4. Experiment Plan: Datasets, baselines, metrics, and ablation design. This should align with what experiment-pipeline Stage 4 will need. Include both quantitative metrics and qualitative evaluation where appropriate.
5. Expected Results: Quantitative targets (e.g., "15-20% latency reduction with <2% accuracy loss") and qualitative expectations. Being specific about expected results forces you to think about whether the idea is realistic.
6. Risks and Mitigations (practical extension): Technical risks that could prevent success, and fallback plans for each. A proposal without risks is either dishonest or insufficiently analyzed. This section is not in the paper but is valuable for practical research planning.
Save to /research-proposal.md.
See references/proposal-extension.md for detailed guidance on each section.
Counterintuitive Tournament Rules
Prioritize these rules during idea generation and ranking:
-
Quantity before quality: Generate many candidates before evaluating any. Premature filtering kills diversity. You can't know which idea is strongest until you've seen the alternatives — and the best ideas often emerge from unexpected branches of the tree.
-
Vary one axis per level: Changing multiple axes simultaneously produces ideas that are different but not meaningfully diverse. Each level of the tree should explore ONE dimension of variation, so you understand exactly what makes each branch unique.
-
Feasibility is not optional: Brilliant but infeasible ideas waste entire research cycles. A novel idea that can't be validated within your constraints is not a contribution — it's a thought experiment. Weight feasibility equally with novelty.
-
The tournament finds surprises: Structured pairwise comparison often reveals that your initial favorite isn't actually the strongest idea. Trust the rankings over your gut feeling. If the results surprise you, that means the tournament is working — it's surfacing information you wouldn't have found through intuition alone.
-
Pruning is not selecting: Prune only clearly infeasible branches. The tournament handles quality ranking. If you aggressively prune before the tournament, you're substituting your initial intuition for systematic comparison — exactly the bias the tournament is designed to correct.
-
Top-3, not top-1: Summarizing the top 3 directions (not just the winner) preserves optionality. The best final approach may combine elements from multiple top candidates. Committing to exactly one idea too early discards valuable signal.
Handoff to Planning
When the tournament is complete and the proposal is written, pass these artifacts to paper-planning:
| Artifact | Source Phase | Used By |
|---|---|---|
| Research proposal (5+1 sections) | Phase 4 | Story design, experiment planning |
| Idea tree (full structure) | Phase 1 | Related work positioning |
| Elo rankings with scores | Phase 2 | Justification for chosen direction |
| Direction summary (top-3) | Phase 3 | Fallback directions if primary fails |
| Tournament scorecards | Phase 2 | Understanding idea strengths/weaknesses |
Also pass results to evo-memory for evolution updates:
- Trigger IDE (Idea Direction Evolution) with the top-3 directions from Phase 3
Skill Integration
Before Starting (load memory)
Refer to the evo-memory skill to read Ideation Memory:
→ Read M_I at /memory/ideation-memory.md
After Phase 3 (update memory)
Refer to the evo-memory skill and trigger IDE:
→ Run IDE protocol with /direction-summary.md
After Phase 4 (handoff to planning)
Refer to the paper-planning skill:
→ Pass /research-proposal.md
Reference Navigation
| Topic | Reference File | When to Use |
|---|---|---|
| Tree expansion rules and diversity | tree-search-protocol.md | Generating diverse idea candidates |
| Elo formula, rubric, and pairing | elo-ranking-guide.md | Running the tournament |
| Proposal section guidance | proposal-extension.md | Writing the research proposal |
| Idea candidate template | idea-candidate-template.md | Describing individual ideas |
| Ranking scorecard template | ranking-scorecard-template.md | Recording pairwise comparisons |
| Direction summary template | direction-summary-template.md | Synthesizing top-3 directions |
More from evoscientist/evoskills
paper-review
Guides self-review of YOUR OWN academic paper before submission with adversarial stress-testing. Core method: 5-aspect checklist (contribution sufficiency, writing clarity, results quality, testing completeness, method design), counterintuitive protocol (reject-first simulation, delete unsupported claims, score trust, promote limitations, attack novelty), reverse-outlining, and figure/table quality checks. Use when: user wants to self-review or self-check their own paper draft before submission, stress-test their claims, prepare for reviewer criticism, or mentions 'self-review', 'check my draft', 'is my paper ready'. Do NOT use for writing a peer review of someone else's paper, and do NOT use after receiving actual reviews (use paper-rebuttal instead).
265paper-writing
Guides writing academic papers section by section using an 11-step workflow with LaTeX templates and counterintuitive writing tactics. Covers Abstract, Introduction, Method, Experiments, Related Work, Conclusion, and Supplementary. Use when: user asks to write or draft a paper section, needs LaTeX templates, wants to improve academic writing quality, optimize novelty framing, or mentions 'write introduction', 'draft method', 'paper writing'. Do NOT use for pre-submission review (use paper-review), experiment execution (use experiment-pipeline), or paper planning/story design (use paper-planning).
250paper-rebuttal
Guides writing effective rebuttals after receiving peer review feedback. Covers review diagnosis (score-driven color-coding), response strategy (champion identification, common-theme consolidation), tactical writing (18 rules), and counterintuitive rebuttal principles. Use when: user received reviewer scores/comments, needs to write a rebuttal or author response, wants to respond to specific criticism (e.g. 'limited novelty', 'missing baselines'), mentions 'rebuttal', 'reviewer comments', 'author response', or 'respond to reviewers'. Do NOT use for pre-submission self-review (use paper-review instead).
244paper-planning
Guides pre-writing planning for academic papers with 4 structured steps: story design (task-challenge-insight-contribution-advantage), experiment planning (comparisons + ablations), figure design (pipeline + teaser), and 4-week timeline management. Includes counterintuitive planning tactics (write a mock rejection letter to identify weaknesses before writing, narrow before broad claims, design ablations first). Use when: user wants to plan a paper before writing, design story/contributions, plan experiments, create figure sketches, set a writing timeline, or write a pre-emptive rejection letter for planning purposes. Do NOT use for actual writing (use paper-writing), running experiments (use experiment-pipeline), self-reviewing a finished draft (use paper-review), or finding research problems (use research-ideation).
239research-ideation
End-to-end research ideation pipeline: literature grounding → multi-track idea generation (3 personas: innovator/pragmatist/critic) → iterative refinement → ELO tournament ranking → update evo-memory (IDE) → user selects direction → expand into manuscript-quality proposal. Use when: user wants to find a research direction, brainstorm ideas, evaluate idea novelty, design a novel solution, rank/compare research ideas, or generate a research proposal. Do NOT use for finding/searching/reading papers (use paper-navigator), literature survey reports (use research-survey), or planning a paper (use paper-planning).
236experiment-pipeline
Guides structured 4-stage experiment execution with attempt budgets and gate conditions: Stage 1 initial implementation (reproduce baseline), Stage 2 hyperparameter tuning, Stage 3 proposed method validation, Stage 4 ablation study. Integrates with evo-memory (load prior strategies, trigger IVE/ESE) and experiment-craft (5-step diagnostic on failure). Use when: user has a planned experiment, needs to reproduce baselines, organize experiment workflow, or systematically validate a method. Do NOT use for debugging a specific experiment failure (use experiment-craft) or designing which experiments to run (use paper-planning).
229