Protein Structure Data Retrieval

Retrieve protein structures with proper disambiguation, quality assessment, and comprehensive metadata.

IMPORTANT: Always use English terms in tool calls (protein names, organism names), even if the user writes in another language. Only try original-language terms as a fallback if English returns no results. Respond in the user's language.

Workflow Overview

Phase 0: Clarify (if needed)
Phase 1: Disambiguate Protein Identity
Phase 2: Retrieve Structures (Internal)
Phase 3: Report Structure Profile

Phase 0: Clarification (When Needed)

Ask the user ONLY if:

  • Protein name matches multiple genes/families (e.g., "kinase" → which kinase?)
  • Organism not specified for conserved proteins
  • Intent is unclear: does the user need an experimental structure or an AlphaFold prediction?

Skip clarification for:

  • Specific PDB IDs (4-character codes)
  • UniProt accessions
  • Unambiguous protein names with organism
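
The skip conditions above depend on recognizing what kind of identifier the user supplied. A minimal sketch of that check follows. The function name `classify_query` is hypothetical and not part of ToolUniverse. The PDB pattern assumes the classic 4-character format (a digit 1-9 followed by three alphanumerics), and the UniProt pattern is the accession regex given in the UniProt help pages. The check is heuristic: a short protein name that happens to fit either pattern would be misclassified.

```python
import re

# Classic PDB IDs: a digit 1-9 followed by three alphanumeric characters.
PDB_ID_RE = re.compile(r"[1-9][A-Za-z0-9]{3}")

# UniProt accession pattern (6 or 10 characters).
UNIPROT_RE = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def classify_query(query: str) -> str:
    """Return 'pdb_id', 'uniprot', or 'name' for a raw user query."""
    text = query.strip()
    if PDB_ID_RE.fullmatch(text):
        return "pdb_id"
    if UNIPROT_RE.fullmatch(text.upper()):
        return "uniprot"
    # Anything else is treated as a protein name and may need clarification.
    return "name"
```

Queries classified as `pdb_id` or `uniprot` can go straight to Phase 1; a `name` result still needs the ambiguity and organism checks.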

Phase 1: Protein Disambiguation

1.1 Resolve Protein Identity

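from tooluniverse import ToolUniverse

tu = ToolUniverse()
tu.load_tools()

# Inputs parsed from the user's request; set whichever one applies.
user_provided_pdb_id = None
user_provided_uniprot = None
user_provided_protein_name = None

# Strategy depends on input type
if user_provided_pdb_id:
    # Direct structure retrieval; normalize the ID to uppercase
    pdb_id = user_provided_pdb_id.upper()

elif user_provided_uniprot:
    # Keep the accession for the later structure search,
    # and fetch the AlphaFold prediction for the same accession
    uniprot_id = user_provided_uniprot
    af_structure = tu.tools.alphafold_get_structure_by_uniprot(
        uniprot_id=uniprot_id
    )

elif user_provided_protein_name:
    # Search by name (protein_name is reused in Phase 2)
    protein_name = user_provided_protein_name
    result = tu.tools.search_structures_by_protein_name(
        protein_name=protein_name
    )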

1.2 Identity Resolution Checklist

  • Protein name/gene identified
  • Organism confirmed
  • UniProt accession (if available)
  • Isoform/variant specified (if relevant)

1.3 Handle Naming Collisions

Common ambiguous terms:

| Term | Ambiguity | Resolution |
|------|-----------|------------|
| "kinase" | Hundreds of kinases | Ask which kinase (EGFR, CDK2, etc.) |
| "receptor" | Many receptor types | Specify receptor family |
| "protease" | Multiple families | Ask serine/cysteine/metallo/etc. |
| "hemoglobin" | Clear | Proceed (α/β chain specified if needed) |
| "insulin" | Clear | Proceed |

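The collision table in section 1.3 can be expressed as a small lookup that returns a clarifying question when the query is nothing more than a generic term. This is only a sketch: `AMBIGUOUS_TERMS` and `clarification_for` are hypothetical names, and the lookup covers just the three ambiguous terms listed in that table.

```python
from typing import Optional

# Generic terms from section 1.3, each mapped to a clarifying question.
AMBIGUOUS_TERMS = {
    "kinase": "Which kinase do you mean (for example EGFR or CDK2)?",
    "receptor": "Which receptor family do you mean?",
    "protease": "Which protease class: serine, cysteine, metallo, or another?",
}


def clarification_for(query: str) -> Optional[str]:
    """Return a question if the query is only a generic term, else None."""
    return AMBIGUOUS_TERMS.get(query.strip().lower())
```

A query such as "EGFR kinase" is not an exact match for a generic term, so it returns `None` and proceeds without a question.
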
Phase 2: Data Retrieval (Internal)

Retrieve all data silently. Do NOT narrate the search process.

2.1 Search Structures

# Search by protein name
result = tu.tools.search_structures_by_protein_name(
    protein_name=protein_name
)

# Filter results by quality
high_res = [
    entry for entry in result["data"]
    if entry.get("resolution") and entry["resolution"] < 2.5
]
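
Filtering alone leaves the results in search order. A short sketch of filtering plus ranking is shown here, assuming each entry is a dict with an optional numeric `resolution` key as in the snippet above. The sample records and their identifiers are made up for illustration, and the real response shape may differ.

```python
def rank_by_resolution(entries, max_resolution=2.5):
    """Keep entries below the resolution cutoff, best (lowest) first."""
    kept = [
        entry for entry in entries
        if isinstance(entry.get("resolution"), (int, float))
        and entry["resolution"] < max_resolution
    ]
    return sorted(kept, key=lambda entry: entry["resolution"])


# Made-up records standing in for a search response.
sample = [
    {"id": "entry_a", "resolution": 2.1},
    {"id": "entry_b", "resolution": None},  # e.g. an NMR entry
    {"id": "entry_c", "resolution": 1.6},
    {"id": "entry_d", "resolution": 3.2},
]
best_first = [entry["id"] for entry in rank_by_resolution(sample)]
```

Entries without a resolution are dropped by this filter, so NMR structures need to be handled separately if they are relevant to the request.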

2.2 Get Structure Details

For each relevant structure:

pdb_id = "4INS"

# Basic metadata
metadata = tu.tools.get_protein_metadata_by_pdb_id(pdb_id=pdb_id)

# Experimental details
exp_details = tu.tools.get_protein_experimental_details_by_pdb_id(
    pdb_id=pdb_id
)

# Resolution (reported for X-ray and cryo-EM; typically unavailable for NMR)
resolution = tu.tools.get_protein_resolution_by_pdb_id(pdb_id=pdb_id)

# Bound ligands
ligands = tu.tools.get_protein_ligands_by_pdb_id(pdb_id=pdb_id)

# Similar structures
similar = tu.tools.get_similar_structures_by_pdb_id(
    pdb_id=pdb_id,
    cutoff=2.0
)

2.3 PDBe Additional Data

# Entry summary
summary = tu.tools.pdbe_get_entry_summary(pdb_id=pdb_id)

# Molecular entities
molecules = tu.tools.pdbe_get_molecules(pdb_id=pdb_id)

# Binding sites
binding_sites = tu.tools.pdbe_get_binding_sites(pdb_id=pdb_id)

2.4 AlphaFold Predictions

# When no experimental structure exists, or for comparison
if uniprot_id:
    af_structure = tu.tools.alphafold_get_structure_by_uniprot(
        uniprot_id=uniprot_id
    )

Fallback Chains

| Primary | Fallback | Notes |
|---------|----------|-------|
| RCSB search | PDBe search | Regional availability |
| get_protein_metadata | pdbe_get_entry_summary | Alternative source |
| Experimental structure | AlphaFold prediction | No experimental structure |
| get_protein_ligands | pdbe_get_binding_sites | Ligand info unavailable |
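
One way to apply these chains is a helper that tries each source in order and returns the first usable result. The helper below is a hypothetical sketch, not part of ToolUniverse. It treats an exception or an empty return value as a failure, which is an assumption about how the tools signal missing data.

```python
def first_available(*sources):
    """Call each zero-argument source in order.

    Returns (index, result) for the first source that neither raises
    nor returns an empty value, or (None, None) if every source fails.
    """
    for index, source in enumerate(sources):
        try:
            result = source()
        except Exception:
            continue  # this source failed; try the next one
        if result:
            return index, result
    return None, None


# Possible usage with the tools named in this document:
# index, info = first_available(
#     lambda: tu.tools.get_protein_metadata_by_pdb_id(pdb_id=pdb_id),
#     lambda: tu.tools.pdbe_get_entry_summary(pdb_id=pdb_id),
# )
```

The returned index records which source answered, so the report can cite the correct database.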

Phase 3: Report Structure Profile

Output Structure

Present as a Structure Profile Report. Hide search process.

# Protein Structure Profile: [Protein Name]

**Search Summary**
- Query: [protein name/PDB ID]
- Organism: [species]
- Structures Found: [N] experimental, [M] AlphaFold

---

## Best Available Structure

### [PDB ID]: [Title]

| Attribute | Value |
|-----------|-------|
| **PDB ID** | [pdb_id] |
| **UniProt** | [uniprot_id] |
| **Organism** | [species] |
| **Method** | X-ray / Cryo-EM / NMR |
| **Resolution** | [X.XX] Å |
| **Release Date** | [date] |

**Quality Assessment**: ●●●● Excellent / ●●●○ High / ●●○○ Good / ●○○○ Moderate / ○○○○ Low

### Experimental Details
| Parameter | Value |
|-----------|-------|
| **Method** | [X-ray crystallography] |
| **Resolution** | [1.9 Å] |
| **R-factor** | [0.18] |
| **R-free** | [0.21] |
| **Space Group** | [P 21 21 21] |

### Structure Composition
| Component | Count | Details |
|-----------|-------|---------|
| **Chains** | [N] | [A (enzyme), B (inhibitor)] |
| **Residues** | [N] | [coverage %] |
| **Ligands** | [N] | [list ligand names] |
| **Waters** | [N] | |
| **Metals** | [N] | [Zn, Mg, etc.] |

### Bound Ligands
| Ligand ID | Name | Type | Binding Site |
|-----------|------|------|--------------|
| [ATP] | Adenosine triphosphate | Substrate | Active site |
| [MG] | Magnesium ion | Cofactor | Catalytic |

### Binding Site Details
For drug discovery applications:

**Site 1: Active Site**
- Location: Chain A, residues 45-89
- Key residues: Asp45, Glu67, His89
- Pocket volume: [X] ų
- Druggability: High/Medium/Low

---

## Alternative Structures

Ranked by quality and relevance:

| Rank | PDB ID | Resolution | Method | Ligands | Notes |
|------|--------|------------|--------|---------|-------|
| 1 | [4INS] | 1.9 Å | X-ray | Zn | Best resolution |
| 2 | [3I40] | 2.1 Å | X-ray | Zn, phenol | With inhibitor |
| 3 | [1TRZ] | 2.3 Å | X-ray | None | Porcine |

---

## AlphaFold Prediction

### AF-[UniProt]-F1

| Attribute | Value |
|-----------|-------|
| **UniProt** | [uniprot_id] |
| **Model Version** | [v4] |
| **Confidence (pLDDT)** | [average score] |

**Confidence Distribution**:
- Very High (>90): [X]% of residues
- High (70-90): [X]% of residues
- Low (50-70): [X]% of residues
- Very Low (<50): [X]% of residues

**Use Cases**:
- ✓ Overall fold reliable
- ✓ Core domain structure
- ⚠ Loop regions uncertain
- ✗ Not suitable for binding site analysis

---

## Structure Comparison

| Property | [PDB_1] | [PDB_2] | AlphaFold |
|----------|---------|---------|-----------|
| Resolution | 1.9 Å | 2.5 Å | N/A (predicted) |
| Completeness | 98% | 85% | 100% |
| Ligands | Yes | No | No |
| Confidence | Experimental | Experimental | High (85 avg) |

---

## Download Links

### Coordinate Files
| Format | PDB ID | Link |
|--------|--------|------|
| PDB | [4INS] | [link] |
| mmCIF | [4INS] | [link] |
| AlphaFold | [UniProt] | [link] |

### Database Links
- RCSB PDB: https://www.rcsb.org/structure/[pdb_id]
- PDBe: https://www.ebi.ac.uk/pdbe/entry/pdb/[pdb_id]
- AlphaFold: https://alphafold.ebi.ac.uk/entry/[uniprot_id]

Retrieved: [date]

Quality Assessment Tiers

Experimental Structures

| Tier | Symbol | Criteria |
|------|--------|----------|
| Excellent | ●●●● | X-ray <1.5 Å, complete, R-free <0.22 |
| High | ●●●○ | X-ray <2.0 Å OR Cryo-EM <3.0 Å |
| Good | ●●○○ | X-ray 2.0-3.0 Å OR Cryo-EM 3.0-4.0 Å |
| Moderate | ●○○○ | X-ray >3.0 Å OR NMR ensemble |
| Low | ○○○○ | >4.0 Å, incomplete, or problematic |
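
The tiers can be assigned mechanically once method, resolution, and R-free are known. The following is a sketch under stated assumptions rather than a definitive rule set: rows are evaluated top-down, an incomplete structure is always Low, X-ray entries between 3.0 and 4.0 Å count as Moderate, a missing R-free blocks the Excellent tier, and unknown methods fall to Low. The function name and the method strings are hypothetical.

```python
TIER_SYMBOLS = {
    "Excellent": "●●●●",
    "High": "●●●○",
    "Good": "●●○○",
    "Moderate": "●○○○",
    "Low": "○○○○",
}


def quality_tier(method, resolution=None, r_free=None, complete=True):
    """Return the tier name for an experimental structure."""
    kind = method.strip().lower()
    if not complete:
        return "Low"
    if kind == "nmr":
        return "Moderate"  # NMR ensembles have no resolution value
    if resolution is None or resolution > 4.0:
        return "Low"
    if kind == "x-ray":
        if resolution < 1.5 and r_free is not None and r_free < 0.22:
            return "Excellent"
        if resolution < 2.0:
            return "High"
        if resolution <= 3.0:
            return "Good"
        return "Moderate"  # assumed: X-ray between 3.0 and 4.0 Å
    if kind == "cryo-em":
        return "High" if resolution < 3.0 else "Good"
    return "Low"  # assumed default for methods not in the table
```

For example, `TIER_SYMBOLS[quality_tier("X-ray", 1.9, r_free=0.21)]` gives the symbol to print in the report's Quality Assessment line.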

Resolution Guide

| Resolution | Use Case |
|------------|----------|
| <1.5 Å | Atomic detail, H-bond analysis |
| 1.5-2.0 Å | Drug design, mechanism studies |
| 2.0-2.5 Å | Structure-based design |
| 2.5-3.5 Å | Overall architecture, fold |
| >3.5 Å | Domain arrangement only |
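
The guide maps directly onto a sorted-boundary lookup. The sketch below uses `bisect_right` from the standard library, and assumes that a value sitting exactly on a boundary (for example 2.0 Å) belongs to the coarser bin, which the table itself does not specify. The function name is hypothetical.

```python
from bisect import bisect_right

# Upper bounds of the first four bins, in Å.
_BOUNDS = [1.5, 2.0, 2.5, 3.5]
_USE_CASES = [
    "Atomic detail, H-bond analysis",
    "Drug design, mechanism studies",
    "Structure-based design",
    "Overall architecture, fold",
    "Domain arrangement only",
]


def use_case_for(resolution):
    """Return the suggested use case for a resolution given in Å."""
    return _USE_CASES[bisect_right(_BOUNDS, resolution)]
```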

AlphaFold Confidence

| pLDDT Score | Interpretation |
|-------------|----------------|
| >90 | Very high confidence, experimental-like |
| 70-90 | Good backbone confidence |
| 50-70 | Uncertain, flexible regions |
| <50 | Low confidence, likely disordered |
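
The Confidence Distribution section of the report needs the share of residues in each band. A minimal sketch follows, assuming a list of per-residue pLDDT values is already available (AlphaFold coordinate files store pLDDT in the B-factor column). The function name and band keys are hypothetical, and scores of exactly 90, 70, or 50 are assigned to the 70-90, 70-90, and 50-70 bands respectively.

```python
def plddt_distribution(scores):
    """Return the percentage of residues in each pLDDT band."""
    counts = {"very_high": 0, "high": 0, "low": 0, "very_low": 0}
    for score in scores:
        if score > 90:
            counts["very_high"] += 1
        elif score >= 70:
            counts["high"] += 1
        elif score >= 50:
            counts["low"] += 1
        else:
            counts["very_low"] += 1
    total = len(scores)
    if total == 0:
        return {band: 0.0 for band in counts}
    return {band: round(100 * n / total, 1) for band, n in counts.items()}


# Made-up per-residue scores for illustration.
example = plddt_distribution([95, 92, 85, 72, 60, 40, 91, 88])
```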

Completeness Checklist

Every structure report MUST include:

For Specific PDB ID (Required)

  • PDB ID and title
  • Experimental method
  • Resolution (or N/A for NMR)
  • Organism
  • Quality assessment
  • Download links

For Protein Name Search (Required)

  • Search summary with result count
  • Top structures with quality ranking
  • Best structure recommendation
  • AlphaFold alternative (if no experimental structure)

Always Include

  • Ligand information (or "No ligands bound")
  • Data sources with links
  • Retrieval date
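
The checklist can be enforced before a report is shown. The sketch below assumes the report is assembled as a dict first. All field names are hypothetical stand-ins for the checklist items and are not defined by ToolUniverse.

```python
# Hypothetical field names mirroring the checklist items.
REQUIRED_FOR_PDB_ID = [
    "pdb_id", "title", "method", "resolution",
    "organism", "quality", "download_links",
]
ALWAYS_REQUIRED = ["ligands", "sources", "retrieved"]


def missing_fields(report, required=REQUIRED_FOR_PDB_ID + ALWAYS_REQUIRED):
    """Return the required fields that are absent or empty in the report."""
    return [
        field for field in required
        if report.get(field) in (None, "", [])
    ]


draft = {
    "pdb_id": "4INS",
    "title": "[Title]",
    "method": "X-ray",
    "resolution": "[X.XX] Å",
    "organism": "[species]",
    "download_links": ["https://www.rcsb.org/structure/4INS"],
    "ligands": "No ligands bound",
    "sources": ["RCSB PDB"],
}
gaps = missing_fields(draft)
```

Explicit placeholders such as "No ligands bound" or "N/A" count as present, which matches the checklist's wording.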

Common Use Cases

Drug Discovery Target

User: "Get structure for EGFR kinase with inhibitor" → Filter for ligand-bound structures, emphasize binding site

Model Building

User: "Find best template for homology modeling of protein X" → High-resolution structures, note sequence coverage

Structure Comparison

User: "Compare available SARS-CoV-2 main protease structures" → All structures with systematic comparison table

AlphaFold When No Experimental

User: "Structure of protein with UniProt P12345" → Check PDB first, then AlphaFold, note confidence


Error Handling

| Error | Response |
|-------|----------|
| "PDB ID not found" | Verify 4-character format, check if obsoleted |
| "No structures for protein" | Offer AlphaFold prediction, suggest similar proteins |
| "Download failed" | Retry once, provide alternative link |
| "Resolution unavailable" | Likely NMR/model, note in assessment |

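The "Download failed" response in the Error Handling section (retry once, then provide an alternative link) can be sketched as a small wrapper. The function name and the shape of the returned dict are hypothetical. `download` stands for any zero-argument callable, such as a lambda wrapping `download_pdb_structure_file`.

```python
def download_with_retry(download, alternative_link):
    """Try the download, retry once on failure, then fall back to a link."""
    last_error = None
    for _attempt in range(2):  # first try plus one retry
        try:
            return {"ok": True, "data": download()}
        except Exception as exc:
            last_error = exc
    return {
        "ok": False,
        "error": str(last_error),
        "alternative": alternative_link,
    }
```

When `ok` is false, the report can present the `alternative` link in the Download Links section instead of failing silently.
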
Tool Reference

RCSB PDB (Experimental Structures)

| Tool | Purpose |
|------|---------|
| search_structures_by_protein_name | Name-based search |
| get_protein_metadata_by_pdb_id | Basic info |
| get_protein_experimental_details_by_pdb_id | Method details |
| get_protein_resolution_by_pdb_id | Quality metric |
| get_protein_ligands_by_pdb_id | Bound molecules |
| download_pdb_structure_file | Coordinate files |
| get_similar_structures_by_pdb_id | Homologs |

PDBe (European PDB)

| Tool | Purpose |
|------|---------|
| pdbe_get_entry_summary | Overview |
| pdbe_get_molecules | Molecular entities |
| pdbe_get_experiment_info | Experimental data |
| pdbe_get_binding_sites | Ligand pockets |

AlphaFold (Predictions)

| Tool | Purpose |
|------|---------|
| alphafold_get_structure_by_uniprot | Get prediction |
| alphafold_search_structures | Search predictions |