devtu-auto-discover-apis
Automated Life Science API Discovery & Tool Creation
Discover, create, validate, and integrate life science APIs into ToolUniverse.
Four-Phase Workflow
Gap Analysis → API Discovery → Tool Creation → Validation → Integration
↓ ↓ ↓ ↓ ↓
Coverage Web Search devtu-create devtu-fix Git PR
Human approval gates after: discovery, creation, validation, and before PR.
Phase 1: Discovery & Gap Analysis
1.1 Analyze Current Coverage
Load ToolUniverse, categorize tools by domain (genomics, proteomics, drug discovery, clinical, omics, imaging, literature, pathways, systems biology). Count per category.
1.2 Identify Gap Domains
- Critical Gap: <5 tools in category
- Moderate Gap: 5-15 tools, missing key subcategories
- Emerging Gap: New technologies not represented
Common gaps: single-cell genomics, metabolomics, patient registries, microbial genomics, multi-omics integration, synthetic biology, toxicology.
1.3 Web Search for APIs
For each gap domain, run multiple queries:
"[domain] API REST JSON"— direct API search"[domain] public database"— database discovery"[domain] API 2025 OR 2026"— recent releases"[domain] database" site:nar.oxfordjournals.org— NAR Database Issue
Extract: base URL, endpoints, auth method, parameter schemas, rate limits.
1.4 Score and Prioritize
| Criterion | Max Points |
|---|---|
| Documentation Quality | 20 |
| API Stability | 15 |
| Authentication Simplicity | 15 |
| Coverage | 15 |
| Maintenance | 10 |
| Community | 10 |
| License | 10 |
| Rate Limits | 5 |
High priority (>=70), Medium (50-69), Low (<50).
1.5 Generate Discovery Report
Coverage analysis, prioritized candidates with scores, implementation roadmap.
Phase 2: Tool Creation
For each API, use Skill(skill="devtu-create-tool") or follow these patterns.
Architecture Decision
- Multiple endpoints → multi-operation tool (single class, multiple JSON wrappers)
- Single endpoint → single-operation acceptable
Key Steps
- Design tool class following template — see references/tool-templates.md
- Create JSON config with oneOf return_schema
- Find real test examples (use List endpoint → extract IDs → verify)
- Register in
default_config.py
Critical Requirements
- return_schema MUST have
oneOf(success + error schemas) - test_examples MUST use real IDs (NO placeholders)
- Tool name <= 55 characters
- NEVER raise exceptions in
run()— return error dict - Set timeout on all HTTP requests (30s)
Phase 3: Validation
Full guide: references/validation-guide.md
Quick Validation Checklist
- Schema: oneOf structure, data wrapper, error field
- Placeholders: No TEST/DUMMY/PLACEHOLDER in test_examples
- Loading: 3-step check (class registered, config registered, wrappers generated)
- Integration tests:
python scripts/test_new_tools.py [api_name] -v→ 100% pass
Fix failures with Skill(skill="devtu-fix-tool").
Phase 4: Integration
Use Skill(skill="devtu-github") or:
- Create branch:
feature/add-[api-name]-tools - Stage tool files + default_config.py
- Commit with descriptive message
- Push and create PR with validation results
Processing Patterns
| Pattern | When to Use |
|---|---|
| Batch (multiple APIs → single PR) | Same domain, similar structure |
| Iterative (one API at a time) | Complex auth, novel patterns |
| Discovery-only (report, no tools) | Planning roadmap |
| Validation-only (audit existing) | PR review, quality check |
References
- Tool templates (Python class + JSON config): references/tool-templates.md
- Validation & integration guide: references/validation-guide.md