Answering Research Questions

Overview

Orchestrate the complete research workflow from query to findings.

Core principle: Systematic, trackable, comprehensive. Search → Evaluate → Traverse → Synthesize.

Announce at start: "I'm using the Answering Research Questions skill to find [specific data] about [topic]."

The Process

Phase 1: Parse Query

Extract from user's request:

Keywords:

  • Main concepts (e.g., "BTK inhibitor", "selectivity")
  • Synonyms and alternatives (e.g., "Bruton tyrosine kinase")
  • Related terms (e.g., "off-target", "kinase panel")

Data types needed:

  • Specific measurements (IC50, KD, EC50, etc.)
  • Methods or protocols
  • Structures or sequences
  • Results or conclusions

Constraints:

  • Date ranges
  • Specific compounds/targets
  • Organisms or systems
  • Publication types

Ask clarifying questions if needed:

  • "Are you looking for in vitro or in vivo data?"
  • "Any specific time frame?"
  • "Which kinases are you most interested in?"
  • "What email address should I use for Unpaywall API requests?" (Required for finding open access papers)

Phase 2: Initialize Research Session

Propose folder name:

research-sessions/YYYY-MM-DD-brief-description/

Example: research-sessions/2025-10-11-btk-inhibitor-selectivity/

Show proposal to user:

πŸ“ Creating research folder: research-sessions/2025-10-11-btk-inhibitor-selectivity/
   Proceed? (y/n)

Create folder structure:

mkdir -p "research-sessions/YYYY-MM-DD-description"/{papers,citations}

Initialize files:

Core files (always create these):

papers-reviewed.json:

{}

citations/citation-graph.json:

{}
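
Both folders and tracking files can be set up in one step. A minimal sketch in Python (the session name is an example; pathlib and json are standard library):

# Sketch: create the session folder tree and empty tracking files
import json
from pathlib import Path

session = Path("research-sessions/2025-10-11-btk-inhibitor-selectivity")  # example name
(session / "papers").mkdir(parents=True, exist_ok=True)
(session / "citations").mkdir(parents=True, exist_ok=True)

# Both tracking files start as empty JSON objects keyed by DOI
(session / "papers-reviewed.json").write_text(json.dumps({}) + "\n")
(session / "citations" / "citation-graph.json").write_text(json.dumps({}) + "\n")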

SUMMARY.md:

# Research Query: [User's question]

**Started:** YYYY-MM-DD HH:MM
**Keywords:** keyword1, keyword2, keyword3
**Data types sought:** IC50 values, selectivity data, synthesis methods

---

## Highly Relevant Papers (Score ≥ 8)

Papers scored using `evaluating-paper-relevance` skill:
- Score 0-10 based on: Keywords (0-3) + Data type (0-4) + Specificity (0-3)
- Score ≥ 8: Highly relevant with significant data
- Score 7: Relevant with useful data
- Score 5-6: Possibly relevant
- Score < 5: Not relevant

(Papers will be added here as found)

Example format:
### [Paper Title](https://doi.org/10.1234/example)
**DOI:** [10.1234/example](https://doi.org/10.1234/example) | **PMID:** [12345678](https://pubmed.ncbi.nlm.nih.gov/12345678/)

---

## Relevant Papers (Score 7)

(Papers will be added here as found)

---

## Possibly Relevant Papers (Score 5-6)

(Noted for potential follow-up)

---

## Search Progress

- Initial PubMed search: X results
- Papers reviewed: Y
- Papers with relevant data: Z
- Citations followed: N

---

## Key Findings

(Synthesized findings will be added as research progresses)

CRITICAL: Always use clickable markdown links for DOIs and PMIDs

Auxiliary files (for large searches >100 papers):

See evaluating-paper-relevance skill for guidance on when to create:

  • README.md - Project overview, methodology, file inventory
  • TOP_PRIORITY_PAPERS.md - Curated priority list organized by tier
  • evaluated-papers.json - Rich structured data for programmatic access

For small searches (<50 papers), stick to core files only. For large searches (>100 papers), auxiliary files add significant organizational value.

Phase 3: Search Literature

Use searching-literature skill:

  1. Construct PubMed query from keywords
  2. Execute search (start with 100 results)
  3. Save results to initial-search-results.json
  4. Report: "🔎 Found N papers matching query"
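
The searching-literature skill owns query construction; purely as an illustration, step 2 against NCBI E-utilities might look like the sketch below (the query string is an example, and the skill's actual implementation may differ):

# Sketch: fetch the first 100 PMIDs for a query via NCBI E-utilities (esearch)
import json
import requests

query = '("BTK" OR "Bruton tyrosine kinase") AND (inhibitor OR "kinase inhibitor") AND (selectivity OR "off-target")'
resp = requests.get(
    "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi",
    params={"db": "pubmed", "term": query, "retmax": 100, "retmode": "json"},
    timeout=30,
)
resp.raise_for_status()
result = resp.json()["esearchresult"]

print(f"🔎 Found {result['count']} papers matching query")
with open("initial-search-results.json", "w") as f:
    json.dump(result["idlist"], f, indent=2)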

Phase 4: Evaluate Papers

Use evaluating-paper-relevance skill:

For each paper:

  1. Check papers-reviewed.json (skip if already processed)
  2. Stage 1: Score abstract (0-10)
  3. If score ≥ 7: Stage 2 deep dive
  4. Extract findings to SUMMARY.md
  5. Download PDF and supplementary materials if available
  6. Update papers-reviewed.json (for ALL papers, even low-scoring ones)
  7. If score ≥ 7: proceed to Phase 5 for this paper

CRITICAL: Add every paper to papers-reviewed.json regardless of score. This prevents re-review and tracks complete search history.
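
A minimal sketch of that tracking logic, assuming the DOI-keyed entry format shown in Phase 8:

# Sketch: consult papers-reviewed.json before screening, record every result
import json

with open("papers-reviewed.json") as f:
    reviewed = json.load(f)  # DOI -> entry, as in the Phase 8 format

def already_reviewed(doi):
    return doi in reviewed

def record_review(doi, entry):
    # Called for EVERY paper, even low-scoring ones, to prevent re-review
    reviewed[doi] = entry
    with open("papers-reviewed.json", "w") as f:
        json.dump(reviewed, f, indent=2)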

Report progress for EVERY paper:

📄 [15/100] Screening: "Paper Title"
   Abstract score: 8 → Fetching full text...
   ✓ Found IC50 data for 8 compounds
   → Added to SUMMARY.md

📄 [16/100] Screening: "Another Paper"
   Abstract score: 3 → Skipping (not relevant)

📄 [17/100] Screening: "Third Paper"
   Abstract score: 7 → Relevant, adding to queue...

Every 10 papers, give a summary update.

Phase 5: Traverse Citations

Use traversing-citations skill:

For papers scoring ≥ 7:

  1. Get references (backward)
  2. Get citations (forward)
  3. Filter for relevance (score ≥ 5)
  4. Add to processing queue
  5. Evaluate queued papers (return to Phase 4)
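
The traversing-citations skill implements steps 1-2; for orientation, backward and forward lookups via the Semantic Scholar Graph API (the citation source named in the methodology section) could be sketched as follows, with the field list and limit as example choices:

# Sketch: backward (references) and forward (citations) lookup via Semantic Scholar
import requests

BASE = "https://api.semanticscholar.org/graph/v1/paper"

def get_neighbors(doi):
    params = {"fields": "title,externalIds", "limit": 100}
    refs = requests.get(f"{BASE}/DOI:{doi}/references", params=params, timeout=30).json()
    cites = requests.get(f"{BASE}/DOI:{doi}/citations", params=params, timeout=30).json()
    # References come back under "citedPaper", citations under "citingPaper"
    backward = [r["citedPaper"] for r in refs.get("data", [])]
    forward = [c["citingPaper"] for c in cites.get("data", [])]
    return backward, forward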

Report progress:

🔗 Following citations from highly relevant paper
   → Found 12 relevant references
   → Found 8 relevant citing papers
   → Adding 20 papers to queue

Phase 6: Checkpoint

Check after:

  • Every 50 papers reviewed
  • Every 5 minutes of processing
  • Queue exhausted

Ask user:

⏸️  Checkpoint: Reviewed 50 papers, found 12 relevant
    Papers with data: 7
    Continue searching? (y/n/summary)

Options:

  • y - Continue processing
  • n - Stop and finalize
  • summary - Show current findings, then decide

Phase 7: Synthesize Findings

When stopping (user says no or queue empty):

Option A: Manual synthesis (small research sessions)

  1. Review SUMMARY.md - Organize by relevance and topic
  2. Extract key findings - Group by data type
  3. Add synthesis section:
## Key Findings Summary

### IC50 Values for BTK Inhibitors
- Compound A: 12 nM (Smith et al., 2023)
- Compound B: 45 nM (Doe et al., 2024)
- [More compounds...]

### Selectivity Data
- Compound A shows >80-fold selectivity vs other kinases
- Tested against panel of 50 kinases (Jones et al., 2023)

### Synthesis Methods
- Lead compounds synthesized via [method]
- Yields: 30-45%
- Full protocols in [papers]

### Gaps Identified
- No data on selectivity vs [specific kinase]
- Limited in vivo data
- Few papers on resistance mechanisms
  4. Update search progress stats
  5. List all files downloaded

Option B: Script-based synthesis (large research sessions >50 papers)

For large research sessions, consider creating a synthesis script:

Create generate_summary.py:

  • Read evaluated-papers.json from helper scripts
  • Aggregate findings by priority and scaffold type
  • Generate comprehensive SUMMARY.md with:
    • Executive summary with statistics
    • Papers grouped by relevance score
    • Priority recommendations for next steps
    • Methodology documentation
  • Include timestamps and reproducibility info
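
A minimal sketch of such a script, assuming evaluated-papers.json maps DOIs to entries with title and score fields (adapt to the actual schema produced by the helper scripts):

# generate_summary.py - sketch: regenerate SUMMARY.md from evaluated-papers.json
import json
from datetime import datetime, timezone

with open("evaluated-papers.json") as f:
    papers = json.load(f)  # assumed: DOI -> {"title": ..., "score": ...}

tiers = [
    ("Highly Relevant Papers (Score ≥ 8)", lambda s: s >= 8),
    ("Relevant Papers (Score 7)", lambda s: s == 7),
    ("Possibly Relevant Papers (Score 5-6)", lambda s: 5 <= s <= 6),
]

lines = [
    "# Research Summary",
    f"**Generated:** {datetime.now(timezone.utc).isoformat()}",
    f"**Papers evaluated:** {len(papers)}",
]
for heading, keep in tiers:
    lines.append(f"\n## {heading}\n")
    for doi, p in sorted(papers.items(), key=lambda kv: -kv[1].get("score", 0)):
        if keep(p.get("score", 0)):
            # Clickable DOI link, per the CRITICAL rule above
            lines.append(f"- [{p.get('title', doi)}](https://doi.org/{doi}) (score {p['score']})")

with open("SUMMARY.md", "w") as f:
    f.write("\n".join(lines) + "\n")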

Benefits:

  • Consistent formatting across sessions
  • Easy to regenerate as more papers are added
  • Can customize grouping/filtering logic
  • Documents complete methodology

Final report:

✅ Research complete!

📊 Summary:
   - Papers reviewed: 127
   - Relevant papers: 30
   - Highly relevant: 12
   - Data extracted: IC50 values for 45 compounds, selectivity data, synthesis methods

📁 All findings in: research-sessions/2025-10-11-btk-inhibitor-selectivity/
   - SUMMARY.md (organized findings)
   - papers/ (18 PDFs + supplementary data)
   - papers-reviewed.json (complete tracking)

Phase 8: Final Consolidation

CRITICAL: Always consolidate findings at the end

1. Create relevant-papers.json

Filter papers-reviewed.json to extract only relevant papers (score ≥ 7):

import json

# Read papers-reviewed.json
with open('papers-reviewed.json') as f:
    all_papers = json.load(f)

# Filter for relevant papers (score >= 7)
relevant_papers = {
    doi: data for doi, data in all_papers.items()
    if data.get('score', 0) >= 7
}

# Save to relevant-papers.json
with open('relevant-papers.json', 'w') as f:
    json.dump(relevant_papers, f, indent=2)

Format:

{
  "10.1234/example1.2023": {
    "pmid": "12345678",
    "title": "Paper title",
    "status": "highly_relevant",
    "score": 9,
    "source": "pubmed_search",
    "timestamp": "2025-10-11T16:00:00Z",
    "found_data": ["IC50 values", "synthesis methods"],
    "chembl_id": "CHEMBL1234567"
  },
  "10.1234/example2.2023": {
    "pmid": "23456789",
    "title": "Another paper",
    "status": "relevant",
    "score": 7,
    "source": "forward_citation",
    "timestamp": "2025-10-11T16:15:00Z",
    "found_data": ["MIC data"]
  }
}

2. Enhance SUMMARY.md with Methodology Section

Add these sections to the TOP of existing SUMMARY.md (before paper listings):

# Research Query: [User's question]

**Date:** 2025-10-11
**Duration:** 2h 15m
**Status:** Complete

---

## Search Strategy

**Keywords:** BTK, Bruton tyrosine kinase, inhibitor, selectivity, off-target, kinase panel, IC50
**Data types sought:** IC50 values, selectivity data, kinase panel screening
**Constraints:** None (open date range)

**PubMed Query:**

("BTK" OR "Bruton tyrosine kinase") AND (inhibitor OR "kinase inhibitor") AND (selectivity OR "off-target")


---

## Screening Methodology

**Rubric:** Abstract scoring (0-10)
- Key terms: +3 pts each (or Keywords 0-3, Data type 0-4, Specificity 0-3 if using old rubric)
- Relevant terms: +1 pt each
- Threshold: ≥7 = relevant

**Sources:**
- Initial PubMed search
- Forward/backward citations via Semantic Scholar

---

## Results Statistics

**Papers Screened:**
- Total reviewed: 127 papers
- Highly relevant (≥8): 12 papers
- Relevant (7): 18 papers
- Possibly relevant (5-6): 23 papers
- Not relevant (<5): 74 papers

**Data Extracted:**
- IC50 values: 45 compounds across 12 papers
- Selectivity data: 8 papers with kinase panel screening
- Full text obtained: 18/30 relevant papers (60%)

**Citation Traversal:**
- Papers with citations followed: 7
- References screened: 45 papers
- Citing papers screened: 38 papers
- Relevant papers found via citations: 8 papers

---

## Key Findings Summary

### IC50 Values for BTK Inhibitors
- Ibrutinib: 0.5 nM (Smith et al., 2023)
- Acalabrutinib: 3 nM (Doe et al., 2024)
- [Additional findings synthesized from papers below]

### Selectivity Patterns
- Most inhibitors show >50-fold selectivity vs other kinases
- Common off-targets: TEC, BMX (other TEC family kinases)

### Gaps Identified
- Limited data on selectivity vs JAK/SYK
- Few papers on resistance mechanisms
- No in vivo selectivity data found

---

## File Inventory

- `SUMMARY.md` - This file (methodology + findings)
- `relevant-papers.json` - 30 relevant papers (score ≥7)
- `papers-reviewed.json` - All 127 papers screened
- `papers/` - 18 PDFs + 5 supplementary files
- `citations/citation-graph.json` - Citation relationships

---

## Reproducibility

**To reproduce:**
1. Use PubMed query above
2. Apply screening rubric (threshold ≥7)
3. Follow citations from highly relevant papers (≥8)
4. Check Unpaywall for paywalled papers

**Software:** Research Superpowers skills v2025-10-11

---

[Existing paper listings follow below...]

## Highly Relevant Papers (Score ≥ 8)

### [Paper Title]...

Report to user:

✅ Research session complete!

📄 Consolidation complete:
   1. SUMMARY.md - Enhanced with methodology, statistics, and findings
   2. relevant-papers.json - 30 relevant papers (score ≥7) in JSON format

📁 All files in: research-sessions/2025-10-11-btk-inhibitor-selectivity/
   - SUMMARY.md (complete: methodology + paper-by-paper findings)
   - relevant-papers.json (30 relevant papers for programmatic access)
   - papers-reviewed.json (127 total papers screened)
   - papers/ (18 PDFs)

🔍 Quick access:
   - Open SUMMARY.md for complete findings and methodology
   - Use relevant-papers.json for programmatic access

💡 Optional: Clean up intermediate files?
   → Use cleaning-up-research-sessions skill to safely remove temporary files

Workflow Checklist

Use TodoWrite to track these steps:

  • Parse user query (keywords, data types, constraints)
  • Propose and create research folder
  • Initialize tracking files (SUMMARY.md, papers-reviewed.json, citation-graph.json)
  • Search PubMed using searching-literature skill
  • For each paper: evaluate using evaluating-paper-relevance skill
  • For relevant papers (≥7): traverse citations using traversing-citations skill
  • Report progress regularly
  • Checkpoint every 50 papers or 5 minutes
  • When done: synthesize findings and enhance SUMMARY.md with methodology
  • Create relevant-papers.json (filtered JSON for programmatic access)
  • Final report with stats and file locations

Integration Points

Skills used:

  1. searching-literature - Initial PubMed search
  2. evaluating-paper-relevance - Score and extract from papers
  3. traversing-citations - Follow citation networks

All skills coordinate through:

  • Shared papers-reviewed.json (deduplication)
  • Shared SUMMARY.md (findings accumulation)
  • Shared citation-graph.json (relationship tracking)

File organization:

  • Small searches (<50 papers): Core files only (papers-reviewed.json, SUMMARY.md, citation-graph.json)
  • All searches: Create relevant-papers.json at end; enhance SUMMARY.md with methodology
  • Large searches (>100 papers): May add auxiliary files (README.md, TOP_PRIORITY_PAPERS.md, evaluated-papers.json) for better organization

Error Handling

No results found:

  • Try broader keywords
  • Remove constraints
  • Check spelling
  • Try different synonyms

API rate limiting:

  • Report to user: "⏸️ Rate limited, waiting..."
  • Wait required time
  • Resume automatically
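
One way to implement the wait-and-resume behavior, sketched under the assumption that the API returns HTTP 429 with a Retry-After header:

# Sketch: retry on HTTP 429, honoring the server's Retry-After hint when present
import time
import requests

def get_with_backoff(url, params=None, max_retries=5):
    for attempt in range(max_retries):
        resp = requests.get(url, params=params, timeout=30)
        if resp.status_code != 429:
            return resp
        wait = int(resp.headers.get("Retry-After", 2 ** attempt))
        print(f"⏸️ Rate limited, waiting {wait}s...")
        time.sleep(wait)
    resp.raise_for_status()  # give up after max_retries
    return resp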

Full text unavailable:

  • Note in SUMMARY.md
  • Continue with abstract-only evaluation
  • Flag for manual retrieval if highly relevant
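
Before flagging a paper for manual retrieval, check Unpaywall for an open-access copy using the email collected in Phase 1. A sketch:

# Sketch: look up an open-access PDF via the Unpaywall API
import requests

def find_oa_pdf(doi, email):
    resp = requests.get(f"https://api.unpaywall.org/v2/{doi}",
                        params={"email": email}, timeout=30)
    if resp.status_code != 200:
        return None
    loc = resp.json().get("best_oa_location") or {}
    return loc.get("url_for_pdf")  # None if no OA copy is available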

Too many results (>500):

  • Suggest narrowing query
  • Process first 100, ask if continue
  • Focus on most recent or most cited

Quick Reference

| Phase | Skill | Output |
|-------|-------|--------|
| Parse | (built-in) | Keywords, data types, constraints |
| Initialize | (built-in) | Folder, SUMMARY.md, tracking files |
| Search | searching-literature | List of papers with metadata |
| Evaluate | evaluating-paper-relevance | Scored papers, extracted findings |
| Traverse | traversing-citations | Additional papers from citations |
| Synthesize | (built-in) | Enhanced SUMMARY.md with methodology + findings |
| Consolidate | (built-in) | relevant-papers.json (filtered to score ≥7) |

Common Mistakes

  • Not tracking all papers: Only adding relevant papers to papers-reviewed.json → Add EVERY paper to prevent re-review and track the complete history
  • Creating unnecessary auxiliary files for small searches: For <50 papers, stick to core files (papers-reviewed.json, SUMMARY.md, citation-graph.json); for large searches (>100 papers), auxiliary files like README.md and TOP_PRIORITY_PAPERS.md add value
  • Silent work: User can't see progress → Report EVERY paper, give updates every 10
  • Non-clickable identifiers: Plain-text DOIs/PMIDs → Always use markdown links
  • Jumping to evaluation without a good search: Too-narrow results → Optimize the search first
  • Not tracking papers: Re-reviewing the same papers → Always use papers-reviewed.json
  • Following all citations: Exponential explosion → Filter before traversing
  • No checkpoints: User loses context → Report and ask every 50 papers
  • Poor synthesis: Just listing papers → Group by data type, extract key findings
  • Batch reporting: Reporting 20 papers at once → Report each one as you go

User Communication (CRITICAL)

NEVER work silently! User needs continuous feedback.

Report frequency:

  • Every paper: Brief status as you screen (📄 [N/Total] Title... Score: X)
  • Every 5-10 papers: Progress summary with counts
  • Every finding: Immediately report what data you found
  • Every decision point: Ask before changing direction

Be specific in progress reports:

  • ✅ "Found IC50 = 12 nM for compound 7 (Table 2)"
  • ❌ "Found data"
  • ✅ "Screening paper 25/127: Not relevant (score 3)"
  • ❌ Silently skip papers

Ask for clarification when needed:

  • ✅ "Are you looking for in vitro or in vivo IC50 values?"
  • ❌ Assume and potentially waste time

Report blockers immediately:

  • ✅ "⚠️ Paper behind paywall - evaluating from abstract only"
  • ❌ Silently skip without mentioning

Periodic summaries (every 10-15 papers):

📊 Progress update:
   - Reviewed: 30/127 papers
   - Highly relevant: 3 (scores 8-10)
   - Relevant: 5 (score 7)
   - Currently: Screening paper 31...

Why: User can course-correct early, knows work is happening, can stop if needed

Success Criteria

Research session successful when:

  • All relevant papers found and evaluated
  • Specific data extracted and organized
  • Citations followed systematically
  • No duplicate processing
  • Clear SUMMARY.md with actionable findings
  • User questions answered with evidence

Next Steps

After completing research:

  • User reviews SUMMARY.md and relevant-papers.json
  • Optional: Run cleaning-up-research-sessions skill to remove intermediate files
  • May request deeper dive into specific papers
  • May request follow-up searches with refined keywords
  • May archive or share research session folder