tooluniverse-target-research
Comprehensive Target Intelligence Gatherer
Gather complete target intelligence by exploring 9 parallel research paths. Supports targets identified by gene symbol, UniProt accession, Ensembl ID, or gene name.
KEY PRINCIPLES:
- Report-first approach - Create report file FIRST, then populate progressively
- Tool parameter verification - Verify params via get_tool_info before calling unfamiliar tools
- Evidence grading - Grade all claims by evidence strength (T1-T4). See EVIDENCE_GRADING.md
- Citation requirements - Every fact must have inline source attribution
- Mandatory completeness - All sections must exist with data minimums or explicit "No data" notes
- Disambiguation first - Resolve all identifiers before research
- Negative results documented - "No drugs found" is data; empty sections are failures
- Collision-aware literature search - Detect and filter naming collisions
- English-first queries - Always use English terms in tool calls, even if the user writes in another language. Translate gene names, disease names, and search terms to English. Only try original-language terms as a fallback if English returns no results. Respond in the user's language
When to Use This Skill
Apply when users:
- Ask about a drug target, protein, or gene
- Need target validation or assessment
- Request druggability analysis
- Want comprehensive target profiling
- Ask "what do we know about [target]?"
- Need target-disease associations
- Request safety profile for a target
When NOT to use: Simple protein lookup, drug-only queries, disease-centric queries, sequence retrieval, structure download -- use specialized skills instead.
Phase 0: Tool Parameter Verification (CRITICAL)
BEFORE calling ANY tool for the first time, verify its parameters:
tool_info = tu.tools.get_tool_info(tool_name="Reactome_map_uniprot_to_pathways")
# Reveals: takes `id` not `uniprot_id`
Known Parameter Corrections
| Tool | WRONG Parameter | CORRECT Parameter |
|---|---|---|
| Reactome_map_uniprot_to_pathways | uniprot_id | id |
| ensembl_get_xrefs | gene_id | id |
| GTEx_get_median_gene_expression | gencode_id only | gencode_id + operation="median" |
| OpenTargets_* | ensemblID | ensemblId (camelCase) |
| STRING_get_protein_interactions | single ID | protein_ids (list), species |
| intact_get_interactions | gene symbol | identifier (UniProt accession) |
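The plain renames in the table above can be applied mechanically before a call. The following is a minimal sketch, assuming a hypothetical helper named `fix_params` and a hand-maintained rename map; neither is part of ToolUniverse, and the rows that need extra arguments (GTEx, STRING, IntAct) are deliberately left out.

```python
# Hypothetical helper: rename known-wrong parameter names before a tool call.
# The map mirrors only the simple renames from the corrections table; it is an
# illustration, not a ToolUniverse feature.
PARAM_RENAMES = {
    "Reactome_map_uniprot_to_pathways": {"uniprot_id": "id"},
    "ensembl_get_xrefs": {"gene_id": "id"},
    "OpenTargets_": {"ensemblID": "ensemblId"},  # trailing "_" marks a prefix match
}


def fix_params(tool_name: str, params: dict) -> dict:
    """Return a copy of params with known-wrong keys renamed."""
    renames = {}
    for key, mapping in PARAM_RENAMES.items():
        is_prefix_rule = key.endswith("_")
        if tool_name == key or (is_prefix_rule and tool_name.startswith(key)):
            renames.update(mapping)
    # Keys without a known correction pass through unchanged.
    return {renames.get(name, name): value for name, value in params.items()}
```

Calling get_tool_info remains the authoritative check; a map like this only guards against the mistakes already catalogued.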
GTEx Versioned ID Fallback (CRITICAL)
GTEx often requires versioned Ensembl IDs. If ENSG00000123456 returns empty, try ENSG00000123456.{version} from ensembl_lookup_gene.
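The fallback can be sketched as follows. This is an illustration only: `query` and `lookup_version` are caller-supplied stand-ins for the GTEx and ensembl_lookup_gene tool calls, and their real signatures and return shapes are not assumed here.

```python
from typing import Callable, Optional, Tuple


def gtex_with_version_fallback(
    ensembl_id: str,
    query: Callable[[str], list],
    lookup_version: Callable[[str], Optional[int]],
) -> Tuple[str, list]:
    """Try the unversioned ID first, then the versioned ID if the result is empty.

    Returns the ID that was finally queried together with its result, so the
    report can state which form of the ID produced the data.
    """
    result = query(ensembl_id)
    if result:
        return ensembl_id, result
    version = lookup_version(ensembl_id)
    if version is None:
        # No version available: report the empty result rather than guessing.
        return ensembl_id, []
    versioned_id = f"{ensembl_id}.{version}"
    return versioned_id, query(versioned_id)
```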
Critical Workflow Requirements
1. Report-First Approach (MANDATORY)
DO NOT show the search process or tool outputs to the user. Instead:
- Create the report file FIRST ([TARGET]_target_report.md) with all section headers and [Researching...] placeholders. See REPORT_FORMAT.md for template.
- Progressively update each section as data is retrieved.
- Methodology in appendix only - create separate [TARGET]_methods_appendix.md if requested.
2. Evidence Grading (MANDATORY)
Grade every claim by evidence strength using T1-T4 tiers. See EVIDENCE_GRADING.md for tier definitions, required locations, and citation format.
Core Strategy: 9 Research Paths
Target Query (e.g., "EGFR" or "P00533")
|
+- IDENTIFIER RESOLUTION (always first)
| +- Check if GPCR -> GPCRdb_get_protein
|
+- PATH 0: Open Targets Foundation (ALWAYS FIRST - fills gaps in all other paths)
|
+- PATH 1: Core Identity (names, IDs, sequence, organism)
| +- InterProScan_scan_sequence for novel domain prediction
+- PATH 2: Structure & Domains (3D structure, domains, binding sites)
| +- If GPCR: GPCRdb_get_structures (active/inactive states)
+- PATH 3: Function & Pathways (GO terms, pathways, biological role)
+- PATH 4: Protein Interactions (PPI network, complexes)
+- PATH 5: Expression Profile (tissue expression, single-cell)
+- PATH 6: Variants & Disease (mutations, clinical significance)
| +- DisGeNET_search_gene for curated gene-disease associations
+- PATH 7: Drug Interactions (known drugs, druggability, safety)
| +- Pharos_get_target for TDL classification (Tclin/Tchem/Tbio/Tdark)
| +- BindingDB_get_ligands_by_uniprot for known ligands
| +- PubChem_search_assays_by_target_gene for HTS data
| +- If GPCR: GPCRdb_get_ligands (curated agonists/antagonists)
| +- DepMap_get_gene_dependencies for target essentiality
+- PATH 8: Literature & Research (publications, trends)
For detailed code implementations of each path, see IMPLEMENTATION.md.
Identifier Resolution (Phase 1)
Resolve ALL identifiers before any research path. Required IDs:
- UniProt accession (for protein data, structure, interactions)
- Ensembl gene ID + versioned ID (for Open Targets, GTEx)
- Gene symbol (for DGIdb, gnomAD, literature)
- Entrez gene ID (for KEGG, MyGene)
- ChEMBL target ID (for bioactivity)
- Synonyms/full name (for collision-aware literature search)
After resolution, check if target is a GPCR via GPCRdb_get_protein. See IMPLEMENTATION.md for resolution and GPCR detection code.
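One way to enforce "resolve everything first" is to hold the required IDs in a single record and refuse to start the research paths while any field is empty. The sketch below is an assumption about how that might look; the class and field names are hypothetical and do not come from IMPLEMENTATION.md.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class TargetIds:
    """Identifiers that should be resolved before any research path runs."""

    uniprot: Optional[str] = None            # protein data, structure, interactions
    ensembl: Optional[str] = None            # Open Targets
    ensembl_versioned: Optional[str] = None  # GTEx
    symbol: Optional[str] = None             # DGIdb, gnomAD, literature
    entrez: Optional[str] = None             # KEGG, MyGene
    chembl_target: Optional[str] = None      # bioactivity
    full_name: Optional[str] = None          # collision-aware literature search
    synonyms: List[str] = field(default_factory=list)

    def missing(self) -> List[str]:
        """Names of identifier fields that are still unresolved."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]
```

A non-empty `missing()` list is itself worth recording in the report, since an identifier that cannot be resolved explains later gaps.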
PATH 0: Open Targets Foundation (ALWAYS FIRST)
Populates baseline data for Sections 5, 6, 8, 9, 10, 11 before specialized queries.
| Endpoint | Report Section | Data Type |
|---|---|---|
| OpenTargets_get_diseases_phenotypes_by_target_ensemblId | 8 | Diseases/phenotypes |
| OpenTargets_get_target_tractability_by_ensemblId | 9 | Druggability assessment |
| OpenTargets_get_target_safety_profile_by_ensemblId | 10 | Safety liabilities |
| OpenTargets_get_target_interactions_by_ensemblId | 6 | PPI network |
| OpenTargets_get_target_gene_ontology_by_ensemblId | 5 | GO annotations |
| OpenTargets_get_publications_by_target_ensemblId | 11 | Literature |
| OpenTargets_get_biological_mouse_models_by_ensemblId | 8/10 | Mouse KO phenotypes |
| OpenTargets_get_chemical_probes_by_target_ensemblId | 9 | Chemical probes |
| OpenTargets_get_associated_drugs_by_target_ensemblId | 9 | Known drugs |
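To see which report sections PATH 0 seeds, the table above can be inverted into a section-to-endpoints map. This is a small sketch with the endpoint names copied as they appear in the table; the function name is hypothetical, and the "8/10" row is treated as feeding both sections.

```python
from collections import defaultdict

# Endpoint -> report sections, transcribed from the PATH 0 table.
PATH0_ENDPOINTS = {
    "OpenTargets_get_diseases_phenotypes_by_target_ensemblId": [8],
    "OpenTargets_get_target_tractability_by_ensemblId": [9],
    "OpenTargets_get_target_safety_profile_by_ensemblId": [10],
    "OpenTargets_get_target_interactions_by_ensemblId": [6],
    "OpenTargets_get_target_gene_ontology_by_ensemblId": [5],
    "OpenTargets_get_publications_by_target_ensemblId": [11],
    "OpenTargets_get_biological_mouse_models_by_ensemblId": [8, 10],
    "OpenTargets_get_chemical_probes_by_target_ensemblId": [9],
    "OpenTargets_get_associated_drugs_by_target_ensemblId": [9],
}


def endpoints_by_section(endpoint_map: dict) -> dict:
    """Group endpoints by the report section they populate, sorted by section."""
    by_section = defaultdict(list)
    for endpoint, sections in endpoint_map.items():
        for section in sections:
            by_section[section].append(endpoint)
    return dict(sorted(by_section.items()))
```

The resulting keys match the sections listed for PATH 0 (5, 6, 8, 9, 10, 11), which makes the map a convenient cross-check when endpoints are added or removed.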
PATH 1: Core Identity
Tools: UniProt_get_entry_by_accession, UniProt_get_function_by_accession, UniProt_get_recommended_name_by_accession, UniProt_get_alternative_names_by_accession, UniProt_get_subcellular_location_by_accession, MyGene_get_gene_annotation
Populates: Sections 2 (Identifiers), 3 (Basic Information)
PATH 2: Structure & Domains
Use the 3-step structure search chain, followed by an AlphaFold check (do NOT rely solely on PDB text search):
- UniProt PDB cross-references (most reliable)
- Sequence-based PDB search (catches missing annotations)
- Domain-based search (for multi-domain proteins)
- AlphaFold (always check)
Tools: UniProt_get_entry_by_accession (PDB xrefs), get_protein_metadata_by_pdb_id, PDB_search_similar_structures, alphafold_get_prediction, InterPro_get_protein_domains, UniProt_get_ptm_processing_by_accession
GPCR targets: Also query GPCRdb_get_structures for active/inactive state data.
Populates: Section 4 (Structural Biology)
See IMPLEMENTATION.md for the 3-step chain code.
PATH 3: Function & Pathways
Tools: GO_get_annotations_for_gene, Reactome_map_uniprot_to_pathways, kegg_get_gene_info, WikiPathways_search, enrichr_gene_enrichment_analysis
Populates: Section 5 (Function & Pathways)
PATH 4: Protein Interactions
Tools: STRING_get_protein_interactions, intact_get_interactions, intact_get_complex_details, BioGRID_get_interactions, HPA_get_protein_interactions_by_gene
Minimum: 20 interactors OR documented explanation.
Populates: Section 6 (Protein-Protein Interactions)
PATH 5: Expression Profile
Query GTEx first, using the versioned ID fallback, with HPA as backup. For comprehensive HPA data, also query the cell line expression comparison.
Tools: GTEx_get_median_gene_expression, HPA_get_rna_expression_by_source, HPA_get_comprehensive_gene_details_by_ensembl_id, HPA_get_subcellular_location, HPA_get_cancer_prognostics_by_gene, HPA_get_comparative_expression_by_gene_and_cellline, CELLxGENE_get_expression_data
Populates: Section 7 (Expression Profile)
See IMPLEMENTATION.md for GTEx fallback and HPA extended expression code.
PATH 6: Variants & Disease
Separate SNVs from CNVs in ClinVar results. Integrate DisGeNET for curated gene-disease association scores.
Tools: gnomad_get_gene_constraints, clinvar_search_variants, OpenTargets_get_diseases_phenotypes_by_target_ensembl, DisGeNET_search_gene, civic_get_variants_by_gene, cBioPortal_get_mutations
Required: All 4 constraint scores (pLI, LOEUF, missense Z, pRec).
Populates: Section 8 (Genetic Variation & Disease)
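The "all 4 constraint scores" requirement pairs naturally with the rule that missing data must be stated explicitly. A minimal sketch, assuming the scores arrive as a flat dict; the key names below are illustrative and may not match what gnomad_get_gene_constraints actually returns.

```python
# Assumed key names for the four required scores; adjust to the real payload.
REQUIRED_CONSTRAINTS = ("pLI", "LOEUF", "missense_Z", "pRec")


def constraint_report(scores: dict) -> dict:
    """Return every required score, with an explicit note where one is missing.

    A value of 0.0 is a real score, so only None or an absent key counts as missing.
    """
    report = {}
    for name in REQUIRED_CONSTRAINTS:
        value = scores.get(name)
        report[name] = value if value is not None else "No data"
    return report
```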
PATH 7: Druggability & Target Validation
Comprehensive druggability assessment including TDL classification, binding data, screening data, and essentiality.
Tools: OpenTargets_get_target_tractability_by_ensemblID, DGIdb_get_gene_druggability, DGIdb_get_drug_gene_interactions, ChEMBL_search_targets, ChEMBL_get_target_activities, Pharos_get_target, BindingDB_get_ligands_by_uniprot, PubChem_search_assays_by_target_gene, DepMap_get_gene_dependencies, OpenTargets_get_target_safety_profile_by_ensemblID, OpenTargets_get_biological_mouse_models_by_ensemblID
GPCR targets: Also query GPCRdb_get_ligands.
Populates: Sections 9 (Druggability), 10 (Safety), 12 (Competitive Landscape)
Key Data Sources for Druggability
| Source | What It Provides |
|---|---|
| Pharos TDL | Tclin/Tchem/Tbio/Tdark classification |
| BindingDB | Experimental Ki/IC50/Kd values |
| PubChem BioAssay | HTS screening hits and dose-response |
| DepMap | CRISPR essentiality across cancer cell lines |
| ChEMBL | Bioactivity records and compound counts |
See IMPLEMENTATION.md for detailed code and REFERENCE.md for complete tool parameter tables.
PATH 8: Literature & Research (Collision-Aware)
- Detect collisions - Check if gene symbol has non-biological meanings
- Build seed queries - Symbol in title with bio context, full name, UniProt accession
- Apply collision filter - Add NOT terms for off-topic meanings
- Expand via citations - For sparse targets (<30 papers), use citation network
- Classify by evidence tier - T1-T4 based on title/abstract keywords
Tools: PubMed_search_articles, PubMed_get_related, EuropePMC_search_articles, EuropePMC_get_citations, PubTator3_LiteratureSearch, OpenTargets_get_publications_by_target_ensemblID
Populates: Section 11 (Literature & Research Landscape)
See IMPLEMENTATION.md for collision-aware search code.
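The seed-query and collision-filter steps can be sketched as a small query builder. This is not the code from IMPLEMENTATION.md; the function name and argument layout are hypothetical, and the field tags follow PubMed conventions ([Title], [Title/Abstract]).

```python
from typing import Sequence


def build_collision_aware_query(
    symbol: str,
    full_name: str,
    context_terms: Sequence[str],
    collision_terms: Sequence[str],
) -> str:
    """Build a boolean literature query that filters out off-topic meanings."""
    # Seed: symbol in the title, or the full protein name anywhere in title/abstract.
    seed = f'("{symbol}"[Title] OR "{full_name}"[Title/Abstract])'
    # Biological context keeps a short symbol from matching unrelated fields.
    context = " OR ".join(f'"{term}"[Title/Abstract]' for term in context_terms)
    query = f"{seed} AND ({context})" if context else seed
    # Collision filter: one NOT clause per off-topic meaning.
    for term in collision_terms:
        query += f' NOT "{term}"[Title/Abstract]'
    return query
```

For a symbol such as CAT (catalase), passing `["feline"]` as the collision terms appends a NOT clause for the animal meaning while leaving the seed intact.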
Retry Logic & Fallback Chains
| Primary Tool | Fallback 1 | Fallback 2 |
|---|---|---|
| ChEMBL_get_target_activities | GtoPdb_get_target_ligands | OpenTargets drugs |
| intact_get_interactions | STRING_get_protein_interactions | OpenTargets interactions |
| GO_get_annotations_for_gene | OpenTargets GO | MyGene GO |
| GTEx_get_median_gene_expression | HPA_get_rna_expression | Document as unavailable |
| gnomad_get_gene_constraints | OpenTargets constraint | - |
| DGIdb_get_drug_gene_interactions | OpenTargets drugs | GtoPdb |
NEVER silently skip failed tools. Always document failures and fallbacks in the report.
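A fallback chain that never fails silently can be sketched as a loop that records every attempt. The helper below is an assumption about structure, not the retry code from IMPLEMENTATION.md; each entry pairs a source name with a zero-argument callable so the sketch stays independent of any real tool signature.

```python
from typing import Any, Callable, List, Tuple


def run_with_fallbacks(
    chain: List[Tuple[str, Callable[[], Any]]],
) -> Tuple[Any, List[str]]:
    """Try each (name, call) in order; return the first non-empty result and a log."""
    log: List[str] = []
    for name, call in chain:
        try:
            result = call()
        except Exception as exc:  # record the failure instead of hiding it
            log.append(f"{name}: failed ({exc})")
            continue
        if result:
            log.append(f"{name}: ok")
            return result, log
        log.append(f"{name}: empty result")
    # Nothing worked: the log still tells the reader what was attempted.
    log.append("all sources exhausted: document as unavailable")
    return None, log
```

The returned log is meant to be copied into the report, so that a section filled from a fallback source says so.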
Completeness Audit (REQUIRED before finalizing)
Run the checklist in EVIDENCE_GRADING.md before finalizing any report:
- Data minimums met for PPIs, expression, diseases, constraints, druggability
- Negative results documented explicitly
- T1-T4 grades in Executive Summary, Disease Associations, Key Papers, Recommendations
- Every data point has source attribution
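Part of the audit can be automated by scanning the report for sections that are empty or still hold the placeholder. The sketch below covers only that structural check; it assumes sections use `## N. Title` headings and the `[Researching...]` placeholder, and it does not replace the checklist in EVIDENCE_GRADING.md.

```python
import re
from typing import List

PLACEHOLDER = "[Researching...]"


def audit_report(markdown: str) -> List[str]:
    """Return titles of sections that are empty or still hold the placeholder."""
    problems = []
    # Splitting on a capturing group yields [preamble, title1, body1, title2, body2, ...].
    parts = re.split(r"^## +(.+)$", markdown, flags=re.MULTILINE)
    for title, body in zip(parts[1::2], parts[2::2]):
        text = body.strip()
        if not text or PLACEHOLDER in text:
            problems.append(title.strip())
    return problems
```

A section containing an explicit "No data" note passes this check, which matches the rule that negative results count as data while empty sections count as failures.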
Report Template
Create [TARGET]_target_report.md with all 15 sections initialized. See REPORT_FORMAT.md for the full template with section headers, table formats, and completeness checklist.
Initial file structure:
## 1. Executive Summary
## 2. Target Identifiers
## 3. Basic Information
## 4. Structural Biology
## 5. Function & Pathways
## 6. Protein-Protein Interactions
## 7. Expression Profile
## 8. Genetic Variation & Disease
## 9. Druggability & Pharmacology
## 10. Safety Profile
## 11. Literature & Research
## 12. Competitive Landscape
## 13. Summary & Recommendations
## 14. Data Sources & Methodology
## 15. Data Gaps & Limitations
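Initializing the file can be sketched in a few lines. This is a minimal illustration, assuming the 15 section titles listed above and the `[Researching...]` placeholder; the function name is hypothetical, and REPORT_FORMAT.md remains the reference for the full template.

```python
from pathlib import Path

SECTIONS = [
    "Executive Summary",
    "Target Identifiers",
    "Basic Information",
    "Structural Biology",
    "Function & Pathways",
    "Protein-Protein Interactions",
    "Expression Profile",
    "Genetic Variation & Disease",
    "Druggability & Pharmacology",
    "Safety Profile",
    "Literature & Research",
    "Competitive Landscape",
    "Summary & Recommendations",
    "Data Sources & Methodology",
    "Data Gaps & Limitations",
]


def create_report_skeleton(target: str, directory: str = ".") -> Path:
    """Write [TARGET]_target_report.md with every section set to a placeholder."""
    lines = []
    for number, title in enumerate(SECTIONS, start=1):
        lines += [f"## {number}. {title}", "[Researching...]", ""]
    path = Path(directory) / f"{target}_target_report.md"
    path.write_text("\n".join(lines), encoding="utf-8")
    return path
```

Writing the skeleton before any tool call means a partially completed run still leaves a report whose unfinished sections are visibly marked.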
Quick Reference: Tool Parameters
| Tool | Parameter | Notes |
|---|---|---|
| Reactome_map_uniprot_to_pathways | id | NOT uniprot_id |
| ensembl_get_xrefs | id | NOT gene_id |
| GTEx_get_median_gene_expression | gencode_id, operation | Try versioned ID if empty |
| OpenTargets_* | ensemblId | camelCase, not ensemblID |
| STRING_get_protein_interactions | protein_ids, species | List format for IDs |
| intact_get_interactions | identifier | UniProt accession |
Reference Files
| File | Contents |
|---|---|
| IMPLEMENTATION.md | Detailed code for identifier resolution, GPCR detection, each PATH implementation, retry logic |
| EVIDENCE_GRADING.md | T1-T4 tier definitions, citation format, completeness audit checklist, data minimums |
| REPORT_FORMAT.md | Full report template with all 15 sections, table formats, section-specific guidance |
| REFERENCE.md | Complete tool reference (225+ tools) organized by category with parameters |
| EXAMPLES.md | Worked examples: EGFR full profile, KRAS druggability, target comparison, CDK4 validation, Alzheimer's targets |