tooluniverse-cancer-classification
Cancer Classification via OncoTree
Standardize cancer type nomenclature using the OncoTree ontology. Resolves free-text tumor descriptions to structured codes with UMLS/NCI cross-references, enabling downstream use in OncoKB variant annotation and GDC cohort selection.
When to Use
Apply when a researcher asks questions such as:
- "What is the OncoTree code for [tumor description]?"
- "Find all subtypes of [cancer type]"
- "What cancers originate in [tissue]?"
- "I need the tumor type code for OncoKB annotation"
- "What is the TCGA/COSMIC code for [cancer]?"
- "List all CNS/Brain cancer subtypes"
- "What NCI code corresponds to glioblastoma?"
Key Tools
| Tool | Purpose | Key Params |
|---|---|---|
| `OncoTree_search` | Free-text search for cancer types | query (tumor name or description) |
| `OncoTree_get_type` | Full details for a known OncoTree code | code (e.g., "LUAD", "AML") |
| `OncoTree_list_tissues` | List all 32 tissue categories | (no params) |
| `OncoKB_annotate_variant` | Variant annotation using OncoTree code | gene, variant, tumor_type |
| `GDC_get_mutation_frequency` | Pan-cancer mutation frequency (TCGA) | gene_symbol |
Workflow
Phase 1: Cancer Type Discovery
Start with free-text search to find matching OncoTree codes:
OncoTree_search(query="breast cancer")
-> Returns list: code, name, main_type, tissue, parent, level, external_references
Key response fields:
- `code`: OncoTree code (e.g., "BRCA", "IBC"); use this in OncoKB calls
- `level`: hierarchy depth (1=tissue, 2=main type, 3-5=subtypes)
- `parent`: parent node code for navigating the hierarchy
- `external_references.UMLS`: UMLS CUI list
- `external_references.NCI`: NCI Thesaurus code list
Search tips:
- Broad terms ("lung cancer") return many results; narrow by tissue or level
- Use tissue-specific terms ("invasive breast carcinoma") for precise matching
- Acronyms work: query="GBM" finds glioblastoma, query="AML" finds leukemia types
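The narrowing step in the tips above can be sketched as a small filter over the search results. This is a minimal sketch, assuming each result is a dict carrying the `tissue` and `level` fields listed under the key response fields; `narrow_results` and the sample records are hypothetical and not part of any tool.

```python
from typing import Optional


def narrow_results(results: list[dict], tissue: Optional[str] = None,
                   min_level: Optional[int] = None) -> list[dict]:
    """Keep results that match a tissue and sit at or below a hierarchy depth."""
    kept = []
    for record in results:
        # Drop records from other tissues when a tissue filter is given.
        if tissue is not None and record.get("tissue") != tissue:
            continue
        # Drop records that are shallower (less specific) than min_level.
        if min_level is not None and record.get("level", 0) < min_level:
            continue
        kept.append(record)
    return kept


# Illustrative records only; real values come from OncoTree_search.
sample = [
    {"code": "LUNG", "tissue": "Lung", "level": 1},
    {"code": "NSCLC", "tissue": "Lung", "level": 2},
    {"code": "LUAD", "tissue": "Lung", "level": 3},
    {"code": "BRCA", "tissue": "Breast", "level": 2},
]
lung_subtypes = narrow_results(sample, tissue="Lung", min_level=3)
```

Filtering client-side keeps the search query broad, which helps when the exact subtype name is not known in advance.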
Phase 2: Code Validation and Detail Retrieval
Once you have a candidate code, retrieve full details:
OncoTree_get_type(code="LUAD")
-> Returns: name, main_type, tissue, color, parent, level, history, external_references
Note: Not every familiar acronym is a valid code. "GBM" returns 404; the correct code is "GB" (Glioblastoma, IDH-Wildtype).
Always validate a code via OncoTree_get_type before passing it to downstream tools.
Phase 3: Tissue-Level Exploration
When the user wants all cancers in a tissue category:
OncoTree_list_tissues()
-> Returns 32 tissue names: "Breast", "CNS/Brain", "Lung", "Myeloid", ...
OncoTree_search(query="CNS/Brain")
-> All cancer types with tissue="CNS/Brain"
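When the goal is every subtype under one node rather than a text match, the `parent` field can be walked directly. The following is a minimal sketch, assuming a list of records with `code` and `parent` fields as described in Phase 1; `descendants` and the placeholder codes are hypothetical.

```python
from collections import defaultdict


def descendants(records: list[dict], root_code: str) -> list[str]:
    """Return codes of all nodes below root_code, following parent links."""
    # Index children by their parent code.
    children = defaultdict(list)
    for record in records:
        children[record.get("parent")].append(record["code"])

    found = []
    seen = {root_code}  # guards against accidental cycles in the input
    stack = list(reversed(children[root_code]))
    while stack:
        code = stack.pop()
        if code in seen:
            continue
        seen.add(code)
        found.append(code)
        stack.extend(reversed(children[code]))
    return found


# Placeholder hierarchy; these are not real OncoTree codes.
records = [
    {"code": "ROOT", "parent": None},
    {"code": "X", "parent": "ROOT"},
    {"code": "X1", "parent": "X"},
    {"code": "X2", "parent": "X"},
    {"code": "Y", "parent": "ROOT"},
]
under_x = descendants(records, "X")
```

The traversal is depth-first, so each subtype appears directly after its parent in the output.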
Phase 4: Downstream Use in Variant Annotation
Pass the validated OncoTree code to OncoKB to get cancer-type-specific therapeutic levels:
OncoKB_annotate_variant(gene="EGFR", variant="L858R", tumor_type="LUAD")
-> highestSensitiveLevel: "1" (FDA-approved therapy for this tumor+variant)
Without tumor_type, OncoKB returns pan-cancer levels, which may be less specific.
Tool Parameter Reference
| Tool | Required | Optional | Notes |
|---|---|---|---|
| `OncoTree_search` | `query` | (none) | Free text; returns list sorted by relevance |
| `OncoTree_get_type` | `code` | (none) | Case-sensitive; "BRCA" not "brca". Returns 404 for invalid codes |
| `OncoTree_list_tissues` | (none) | (none) | No params; returns list of 32 tissue strings |
| `OncoKB_annotate_variant` | `gene`, `variant` | `tumor_type` | `tumor_type` is OncoTree code; omit for pan-cancer |
| `GDC_get_mutation_frequency` | `gene_symbol` | (none) | Pan-cancer TCGA only; no per-subtype breakdown |
Common OncoTree Codes (verified working)
| Code | Name | Tissue |
|---|---|---|
| `BRCA` | Invasive Breast Carcinoma | Breast |
| `LUAD` | Lung Adenocarcinoma | Lung |
| `LUSC` | Lung Squamous Cell Carcinoma | Lung |
| `MEL` | Melanoma | Skin |
| `CRC` | Colorectal Cancer | Bowel |
| `PAAD` | Pancreatic Adenocarcinoma | Pancreas |
| `GBM` | (invalid; use GB) | CNS/Brain |
| `GB` | Glioblastoma, IDH-Wildtype | CNS/Brain |
| `AML` | Acute Myeloid Leukemia | Myeloid |
| `PRAD` | Prostate Adenocarcinoma | Prostate |
Common Patterns
# Pattern: Resolve free-text to OncoTree code
results = OncoTree_search(query="pancreatic ductal adenocarcinoma")
# Results are sorted by relevance, so take the top hit.
# A higher level number means a more specific subtype, not a less specific one.
code = results["data"][0]["code"]  # e.g., "PAAD"

# Pattern: Get all subtypes within a main type
results = OncoTree_search(query="Glioma")
subtypes = [r for r in results["data"] if r["main_type"] == "Glioma"]

# Pattern: Validate code before OncoKB call
detail = OncoTree_get_type(code="GB")
if detail["status"] == "success":
    OncoKB_annotate_variant(gene="IDH1", variant="R132H", tumor_type="GB")
Tumor Classification Reasoning (CRITICAL)
LOOK UP DON'T GUESS -- tumor classification determines treatment. Always verify codes and biomarker interpretation via tools rather than relying on memory.
Histological vs Molecular Classification
Tumors are classified on TWO axes -- both matter for treatment selection:
- Histological (what it looks like under the microscope): adenocarcinoma, squamous, small cell, etc. This determines where the tumor sits at OncoTree hierarchy level 3 and deeper.
- Molecular (what mutations/alterations drive it): EGFR-mutant, HER2-amplified, MSI-high, etc. This determines OncoKB therapeutic levels.
A tumor can be histologically identical to another yet molecularly different, requiring different treatment. Example: two lung adenocarcinomas share the code LUAD, but one is EGFR-mutant and the other KRAS-mutant, and each calls for a different targeted therapy. Always check both axes.
Biomarker Interpretation Strategy
When interpreting cancer biomarkers, use OncoKB for actionability:
- HER2: Positive = IHC 3+ or FISH-amplified. Use `OncoKB_annotate_variant(gene="ERBB2", variant="Amplification", tumor_type="BRCA")` for therapeutic level
- ER/PR: Positive = hormone-receptor positive breast cancer. Changes treatment class (endocrine therapy)
- Ki67: Proliferation index. High (>20%) suggests aggressive biology; used to help separate Luminal A from Luminal B breast cancer
- TMB (Tumor Mutational Burden): High TMB (≥10 mut/Mb) predicts immunotherapy response across tumor types. Use `OncoKB_annotate_variant(gene="Other Biomarkers", variant="TMB-H")`
- MSI (Microsatellite Instability): MSI-High is an FDA-approved pan-cancer biomarker for pembrolizumab. Use `OncoKB_annotate_variant(gene="Other Biomarkers", variant="MSI-H")`
Staging vs Grading -- Different Concepts
- Stage (TNM): How far has it spread? T=tumor size, N=lymph nodes, M=metastasis. Stage I-IV. Determines prognosis and surgery eligibility.
- Grade: How abnormal do the cells look? Grade 1 (well-differentiated, slow) to Grade 3 (poorly-differentiated, aggressive). Determines aggressiveness.
- A Stage I, Grade 3 tumor (small but aggressive) has different implications than Stage III, Grade 1 (spread but slow-growing).
Actionability Assessment
After classifying the tumor, assess whether findings are clinically actionable:
- Level 1 (FDA-approved, specific tumor type): Immediate treatment implication. Example: EGFR L858R in LUAD
- Level 2 (Standard care): Strong evidence but context-dependent
- Level 3 (Compelling evidence): Clinical trial candidates
- Level 4 (Biological evidence): Research-stage only
- Always provide the OncoTree code to OncoKB -- without it, you get pan-cancer levels which may understate or overstate actionability for the specific tumor type
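The comparison in the last bullet, tumor-specific versus pan-cancer results, can be made explicit by ranking level strings. This is a minimal sketch, assuming levels arrive as strings such as "1", "2", "3A", "3B", or "4", optionally prefixed with `LEVEL_`; the ordering table and both helpers are hypothetical and not part of OncoKB.

```python
from typing import Optional

# Smaller rank = stronger evidence. Assumed ordering, for illustration only.
LEVEL_RANK = {"1": 1, "2": 2, "3": 3, "3A": 3, "3B": 4, "4": 5}
UNKNOWN_RANK = len(LEVEL_RANK) + 1


def level_rank(level: Optional[str]) -> int:
    """Map a level string to a sortable rank; missing or unknown levels sort last."""
    if not level:
        return UNKNOWN_RANK
    key = level.upper().removeprefix("LEVEL_")
    return LEVEL_RANK.get(key, UNKNOWN_RANK)


def more_actionable(tumor_specific: Optional[str], pan_cancer: Optional[str]) -> str:
    """Report which of two annotation results carries the stronger level."""
    specific_rank = level_rank(tumor_specific)
    pan_rank = level_rank(pan_cancer)
    if specific_rank < pan_rank:
        return "tumor_specific"
    if pan_rank < specific_rank:
        return "pan_cancer"
    return "equal"


# Levels as they might come back with and without tumor_type.
verdict = more_actionable("1", "3B")
```

A difference between the two results is a signal that the tumor type materially changes actionability and should be reported alongside the level.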
Reasoning Framework for Result Interpretation
Evidence Grading
| Grade | Criteria | Example |
|---|---|---|
| Confirmed | Exact OncoTree code validated via `OncoTree_get_type`, UMLS + NCI cross-refs present | LUAD: validated, UMLS C0152013, NCI C3512 |
| Probable | OncoTree search returns match, but code not yet validated or missing cross-refs | Search for "cholangiocarcinoma" returns CHOL with partial external refs |
| Ambiguous | Multiple OncoTree codes match the description at different hierarchy levels | "Breast cancer" matches BRCA (invasive), BREAST (tissue), IBC (inflammatory) |
| Unresolved | No OncoTree match; tumor type too rare or novel for the ontology | Ultra-rare sarcoma subtype not in OncoTree |
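The grading table can be applied mechanically once the search and validation results are in hand. The following is a minimal sketch, assuming search matches are dicts with `code` and `external_references` fields and that validation has already been attempted; `grade_evidence` and the order in which the grades are checked are assumptions.

```python
def grade_evidence(matches: list[dict], validated: bool) -> str:
    """Assign an evidence grade to an OncoTree lookup, following the table above."""
    if not matches:
        return "Unresolved"
    # More than one distinct code means the description is not yet pinned down.
    if len({m["code"] for m in matches}) > 1:
        return "Ambiguous"
    refs = matches[0].get("external_references") or {}
    has_both_refs = bool(refs.get("UMLS")) and bool(refs.get("NCI"))
    if validated and has_both_refs:
        return "Confirmed"
    return "Probable"


# Cross-references taken from the Confirmed row of the table.
luad = {"code": "LUAD",
        "external_references": {"UMLS": ["C0152013"], "NCI": ["C3512"]}}
grade = grade_evidence([luad], validated=True)
```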
Interpretation Guidance
- OncoTree code confidence: Always validate candidate codes with `OncoTree_get_type` before downstream use. Some common acronyms (e.g., "GBM") are NOT valid OncoTree codes (correct code is "GB"). A validated code with UMLS and NCI cross-references is highest confidence.
- UMLS/NCI cross-reference priority: For standardized reporting, NCI Thesaurus codes are preferred for cancer-specific contexts (used by caDSR, GDC). UMLS CUIs are broader (cross-disease) and useful for literature mining. When both are available, report both; when only one exists, NCI is preferred for oncology workflows.
- Tissue hierarchy interpretation: OncoTree levels represent specificity: Level 1 = tissue of origin (e.g., "Lung"), Level 2 = main cancer type (e.g., "Non-Small Cell Lung Cancer"), Level 3+ = histological subtypes (e.g., "Lung Adenocarcinoma"). For OncoKB variant annotation, use the most specific (deepest) level that accurately describes the tumor. For cohort-level analysis (e.g., TCGA), the Level 2-3 code is typically appropriate.
- OncoKB tumor type impact: Providing a tumor type code to OncoKB can change the therapeutic level (e.g., EGFR L858R is Level 1 in LUAD but Level 3B pan-cancer). Always use the validated OncoTree code for the patient's specific tumor type.
- Deprecated or renamed codes: OncoTree evolves across versions. The `history` field in the `OncoTree_get_type` response shows prior names. Always use the current code.
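The hierarchy guidance above, deepest accurate level for annotation and Level 2-3 for cohorts, can be sketched as a selection rule. This is a minimal sketch, assuming every candidate already describes the tumor accurately and carries a `level` field; `pick_code` is hypothetical, and preferring the broadest Level 2-3 code for cohorts is an assumption made here to favor coverage.

```python
from typing import Optional


def pick_code(candidates: list[dict], goal: str) -> Optional[str]:
    """Choose one code from accurate candidates, depending on the analysis goal."""
    if not candidates:
        return None
    if goal == "annotation":
        # Deepest level = most specific subtype.
        return max(candidates, key=lambda c: c["level"])["code"]
    if goal == "cohort":
        # Prefer Level 2-3; fall back to whatever is available.
        mid_levels = [c for c in candidates if 2 <= c["level"] <= 3]
        pool = mid_levels or candidates
        return min(pool, key=lambda c: c["level"])["code"]
    raise ValueError(f"unknown goal: {goal!r}")


# Illustrative candidates mirroring the Lung example in the guidance.
candidates = [
    {"code": "LUNG", "level": 1},
    {"code": "NSCLC", "level": 2},
    {"code": "LUAD", "level": 3},
]
for_annotation = pick_code(candidates, "annotation")
for_cohort = pick_code(candidates, "cohort")
```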
Synthesis Questions
- Does the chosen OncoTree code represent the most specific histological subtype, or could a more precise code provide better therapeutic annotation in OncoKB?
- When the free-text tumor description maps to multiple OncoTree codes, which hierarchy level best balances specificity and coverage for the analysis goal (variant annotation vs cohort selection)?
- Are the UMLS/NCI cross-references consistent with external classifications (WHO, ICD-O), or are there discrepancies that need resolution?
Fallback Chains
| Primary | Fallback | When |
|---|---|---|
| `OncoTree_get_type(code="GBM")` | `OncoTree_search(query="glioblastoma")` | 404 for common aliases |
| `OncoTree_search` (no results) | `OncoTree_list_tissues` + tissue-level search | Very rare/novel tumor types |
| OncoTree code for OncoKB | Omit `tumor_type` param | Code not recognized by OncoKB |
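The first fallback row can be wrapped in one helper that validates a code and, on failure, searches and re-validates the hits. This is a minimal sketch, assuming responses are dicts with a `status` of "success" and search hits under `data`, as in the Common Patterns section; `resolve_code` is hypothetical, and the two `fake_*` functions are stand-ins for the real tools so the example runs on its own.

```python
from typing import Callable, Optional


def resolve_code(code: str, query: str,
                 get_type: Callable[..., dict],
                 search: Callable[..., dict]) -> Optional[str]:
    """Validate code; if that fails, search by free text and validate each hit."""
    if get_type(code=code).get("status") == "success":
        return code
    hits = search(query=query).get("data") or []
    for hit in hits:
        candidate = hit.get("code")
        if candidate and get_type(code=candidate).get("status") == "success":
            return candidate
    return None  # caller may continue with a tissue-level search


# Stand-ins for OncoTree_get_type and OncoTree_search (assumed response shapes).
def fake_get_type(code: str) -> dict:
    return {"status": "success"} if code == "GB" else {"status": "error"}


def fake_search(query: str) -> dict:
    return {"data": [{"code": "GB"}]} if "glioblastoma" in query else {"data": []}


resolved = resolve_code("GBM", "glioblastoma", fake_get_type, fake_search)
```

Passing the tools in as arguments keeps the helper testable without network access; in practice the real tool functions would be supplied instead of the stand-ins.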