tooluniverse-plant-genomics
Plant Genomics & Biology
Pipeline for investigating plant genes, metabolic pathways, species taxonomy, and comparative plant biology using ToolUniverse tools.
Reasoning Strategy
Plant genomes are large (wheat is ~17 Gb, vs. 3 Gb for human) and often polyploid — wheat is hexaploid (AABBDD), meaning there are three homeologous copies of most genes. When comparing plant genes to Arabidopsis, always account for whole-genome duplications: a single Arabidopsis gene may have 2–4 paralogs in a crop species, all potentially with diverged functions. Gene families are massively expanded in plants relative to animals (e.g., receptor-like kinases, cytochrome P450s, transcription factors) — a BLAST hit does not mean functional equivalence. Arabidopsis thaliana is the primary model, but its small genome and rapid life cycle mean some features (wood formation, nitrogen fixation symbiosis, C4 photosynthesis) are absent and must be studied in other species.
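To make the homeolog point concrete, the sketch below groups wheat gene IDs by subgenome, so a missing A, B, or D copy is easy to spot. It assumes the IDs follow the IWGSC RefSeq naming pattern (for example `TraesCS1A02G000100`). The function name `group_by_subgenome` and the example IDs are illustrative only, and the IDs are not claimed to form a real homeolog triad.

```python
import re
from collections import defaultdict

# Assumed ID pattern: "TraesCS" + chromosome group (1-7) + subgenome (A/B/D)
# + two-digit annotation version + "G" + six-digit gene number.
WHEAT_ID = re.compile(r"^TraesCS([1-7])([ABD])\d{2}G\d{6}$")

def group_by_subgenome(gene_ids):
    """Group wheat gene IDs by subgenome; unparseable IDs go under 'unknown'."""
    groups = defaultdict(list)
    for gene_id in gene_ids:
        match = WHEAT_ID.match(gene_id)
        key = match.group(2) if match else "unknown"
        groups[key].append(gene_id)
    return dict(groups)

# Illustrative hit list: three wheat-style IDs plus one Arabidopsis locus ID.
hits = ["TraesCS1A02G000100", "TraesCS1B02G000200",
        "TraesCS1D02G000300", "AT5G13930"]
groups = group_by_subgenome(hits)

# Subgenomes with no hit at all; an empty list means A, B and D are all present.
missing = sorted(set("ABD") - set(groups))
```

A non-empty `missing` list is only a prompt to look closer: the copy may be absent, unannotated, or placed on an unassigned scaffold.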
LOOK UP DON'T GUESS: Do not assume gene function by sequence similarity alone in polyploid species; look up functional validation evidence via UniProt (reviewed entries) or PlantReactome. Do not assume KEGG organism codes — use the table or query kegg_search_pathway with the species name to confirm availability.
Key principles:
- Plant-specific pathways: photosynthesis, secondary (specialized) metabolism, and plant hormone signaling have no counterpart in animal models, so use plant-specific resources for them
- PlantReactome as foundation: curated plant pathway database with cross-species coverage (Oryza, Arabidopsis, Zea mays, etc.)
- Ensembl Plants for genomics: use Ensembl with plant species names for gene lookup and annotation
- KEGG for metabolism: KEGG has plant-specific organism codes (ath=Arabidopsis, osa=rice, zma=maize)
- Evidence grading: T1 = functional validation (mutant phenotype); T2 = expression/localization data; T3 = ortholog-based prediction; T4 = computational annotation only
When to Use
- "What pathway is [plant gene] involved in?"
- "Find genes in the flavonoid biosynthesis pathway"
- "Compare [gene] across Arabidopsis and rice"
- "What species is [plant name]?"
- "Plant hormone signaling pathways"
- "Photosynthesis gene annotation"
Not this skill: For general pathway analysis (human/mouse), use tooluniverse-systems-biology. For phylogenetics, use tooluniverse-phylogenetics.
Core Tools
| Tool | Use For |
|---|---|
| PlantReactome_search_pathways | Search plant-specific pathways by keyword |
| PlantReactome_get_pathway | Get pathway details (genes, reactions, species) |
| PlantReactome_list_species | List all species covered by PlantReactome |
| POWO_search_plants | Search Plants of the World Online (taxonomy, distribution) |
| ensembl_lookup_gene | Gene lookup; use with plant species (e.g., species="arabidopsis_thaliana") |
| kegg_search_pathway | Search KEGG pathways (use plant organism codes: ath, osa, zma) |
| KEGG_get_pathway_genes | Get genes in a plant pathway (e.g., pathway_id="ath00941" for flavonoid in Arabidopsis) |
| UniProt_search | Search plant protein sequences (add taxonomy_id:3702 for Arabidopsis) |
| UniProt_get_function_by_accession | Get protein function annotation |
| PubMed_search_articles | Plant biology literature |
| EnsemblCompara_get_orthologues | Cross-species plant gene comparison |
Workflow
Phase 0: Species & Gene Identification
Species name → POWO taxonomy; Gene symbol → Ensembl/UniProt IDs
|
Phase 1: Gene Function & Annotation
UniProt function, Ensembl annotation, InterPro domains
|
Phase 2: Pathway Analysis
PlantReactome → plant-specific pathways; KEGG → metabolism
|
Phase 3: Cross-Species Comparison
Ensembl Compara → orthologs in other plant species
|
Phase 4: Literature & Report
PubMed → published studies; synthesis
Phase 1: Gene Function
# Look up an Arabidopsis gene
ensembl_lookup_gene(gene_symbol="CHS", species="arabidopsis_thaliana")
# Get protein function
UniProt_search(query="CHS AND taxonomy_id:3702 AND reviewed:true")
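The UniProt query above combines a gene name, a taxonomy filter, and a reviewed-only filter. A small helper can keep those parts consistent across species. This is a sketch only: `build_uniprot_query` and the `TAXON_IDS` dictionary are hypothetical names rather than part of ToolUniverse, and the taxonomy IDs are NCBI identifiers that are worth confirming before use.

```python
# NCBI taxonomy IDs (assumed lookup table for illustration; verify before use).
TAXON_IDS = {
    "arabidopsis_thaliana": 3702,
    "oryza_sativa_japonica": 39947,
    "zea_mays": 4577,
}

def build_uniprot_query(gene, taxonomy_id, reviewed_only=True):
    """Compose a UniProt query string restricted to a single taxon."""
    parts = [gene, f"taxonomy_id:{taxonomy_id}"]
    if reviewed_only:
        # Reviewed (Swiss-Prot) entries carry manually curated annotation.
        parts.append("reviewed:true")
    return " AND ".join(parts)

query = build_uniprot_query("CHS", TAXON_IDS["arabidopsis_thaliana"])
# The resulting string can be passed as UniProt_search(query=query).
```

Dropping `reviewed_only` widens the search to unreviewed entries, which is useful for crops with few curated proteins but lowers the evidence tier of whatever comes back.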
Phase 2: Plant Pathway Analysis
Key plant-specific KEGG pathways:
| Pathway | KEGG ID (Arabidopsis) | Biological Significance |
|---|---|---|
| Photosynthesis | ath00195 | Light reactions, electron transport |
| Carbon fixation (Calvin cycle) | ath00710 | CO2 → sugar |
| Flavonoid biosynthesis | ath00941 | UV protection, pigmentation, defense |
| Carotenoid biosynthesis | ath00906 | Photoprotection, vitamin A precursors |
| Auxin signaling (part of plant hormone signal transduction) | ath04075 | Growth, tropisms |
| Brassinosteroid signaling (same hormone signal transduction map) | ath04075 | Cell elongation, stress response |
| Circadian rhythm (plant) | ath04712 | Photoperiod, flowering time |
| Terpenoid backbone | ath00900 | Secondary metabolite precursors |
| Starch/sucrose metabolism | ath00500 | Carbon partitioning |
| Nitrogen metabolism | ath00910 | Nitrogen assimilation |
# Search PlantReactome for flavonoid pathway
PlantReactome_search_pathways(query="flavonoid")
# Get genes in Arabidopsis flavonoid biosynthesis
KEGG_get_pathway_genes(pathway_id="ath00941")
Phase 3: Species Comparison
KEGG organism codes for major crops:
| Species | Code | Common Name |
|---|---|---|
| Arabidopsis thaliana | ath | Thale cress (model plant) |
| Oryza sativa | osa | Rice |
| Zea mays | zma | Maize/corn |
| Triticum aestivum | taes | Wheat |
| Glycine max | gmx | Soybean |
| Solanum lycopersicum | sly | Tomato |
| Nicotiana tabacum | nta | Tobacco |
| Medicago truncatula | mtr | Barrel medic (legume model) |
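KEGG organism-specific pathway IDs are an organism code followed by a five-digit map number, so an Arabidopsis ID can be rewritten for another species by swapping the prefix. The sketch below shows that string manipulation; `translate_pathway_id` is a hypothetical helper, not a ToolUniverse function. It does not check that the target pathway exists, so the result should still be confirmed with kegg_search_pathway.

```python
import re

# Organism code (3-4 lowercase letters) followed by a 5-digit map number.
KEGG_PATHWAY_ID = re.compile(r"^([a-z]{3,4})(\d{5})$")

def translate_pathway_id(pathway_id, target_code):
    """Swap the organism prefix of a KEGG pathway ID, keeping the map number."""
    match = KEGG_PATHWAY_ID.match(pathway_id)
    if match is None:
        raise ValueError(f"not an organism-specific KEGG pathway ID: {pathway_id!r}")
    return f"{target_code}{match.group(2)}"

# Flavonoid biosynthesis: Arabidopsis ID rewritten for rice and maize.
rice_flavonoid = translate_pathway_id("ath00941", "osa")
maize_flavonoid = translate_pathway_id("ath00941", "zma")
```

The translated ID can then be passed to KEGG_get_pathway_genes, once the pathway is confirmed for that species, to compare gene membership across species.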
Phase 4: Interpretation Framework
Evidence grading: T1 = mutant phenotype confirms function; T2 = expression/localization data; T3 = ortholog has validated function in model species; T4 = computational annotation only (domain/GO term). Prioritize T1/T2 evidence; treat T3/T4 as hypotheses requiring further validation.
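The grading rule can be written down as a small lookup that returns the strongest tier supported by the evidence collected for a gene. This is a minimal sketch: the evidence labels (`"mutant_phenotype"`, `"expression"`, and so on) and the function names are hypothetical and would need mapping to whatever the upstream tools actually return.

```python
from enum import IntEnum

class EvidenceTier(IntEnum):
    T1 = 1  # mutant phenotype confirms function
    T2 = 2  # expression/localization data
    T3 = 3  # ortholog has validated function in a model species
    T4 = 4  # computational annotation only (domain/GO term)

# Hypothetical evidence labels mapped to tiers.
EVIDENCE_TIERS = {
    "mutant_phenotype": EvidenceTier.T1,
    "expression": EvidenceTier.T2,
    "localization": EvidenceTier.T2,
    "validated_ortholog": EvidenceTier.T3,
    "computational_annotation": EvidenceTier.T4,
}

def grade_evidence(evidence_kinds):
    """Return the strongest (lowest-numbered) tier found, or None if none match."""
    tiers = [EVIDENCE_TIERS[kind] for kind in evidence_kinds if kind in EVIDENCE_TIERS]
    return min(tiers) if tiers else None

def is_hypothesis(tier):
    """T3/T4 (or no evidence at all) should be reported as a hypothesis."""
    return tier is None or tier >= EvidenceTier.T3

tier = grade_evidence(["computational_annotation", "expression"])
```

Here `tier` is `EvidenceTier.T2`, because expression data outranks a purely computational annotation.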
Synthesis Questions
- Is the gene plant-specific or conserved? (Plant-specific genes are often in secondary metabolism; conserved genes tend to be in primary metabolism)
- Which tissues/developmental stages express it? (Root vs shoot vs flower vs seed)
- Is there a crop improvement application? (Yield, stress tolerance, nutritional quality)
- What regulatory mechanisms control it? (Hormone-responsive, light-regulated, circadian)
- Are there natural variants with known phenotypes? (Accession diversity in Arabidopsis 1001 Genomes)
Limitations
- No TAIR tool — The Arabidopsis Information Resource has no public REST API. Use Ensembl Plants and UniProt as alternatives for Arabidopsis gene data.
- PlantReactome coverage — Focused on Oryza sativa (rice) with cross-references to Arabidopsis. Not all plant species equally covered.
- No crop breeding tools — This skill covers gene/pathway analysis, not marker-assisted selection or breeding simulation.
- POWO is taxonomy-focused — Plants of the World Online provides species identification and distribution, not genomics data.