tooluniverse-noncoding-rna
Non-Coding RNA Analysis
Pipeline for identifying, annotating, and interpreting non-coding RNAs and their biological roles. Covers microRNAs (miRNAs), long non-coding RNAs (lncRNAs), and other ncRNA classes.
Key principles:
- Class determines function — miRNAs repress target mRNAs (translational repression and mRNA decay); lncRNAs have diverse mechanisms (scaffolds, guides, decoys, enhancers); rRNAs and tRNAs are core components of the translation machinery
- Targets matter more than the ncRNA itself — for miRNAs, the regulated mRNA targets determine the phenotype
- Expression context is critical — ncRNAs are highly tissue/cell-type specific
- Conservation suggests function — deeply conserved ncRNAs (let-7, MALAT1) have well-established roles
- Evidence grading — T1: validated targets (reporter assay, CLIP-seq), T2: high-confidence computational prediction, T3: expression correlation, T4: sequence-based prediction only
Type-based reasoning — look up, don't guess: Non-coding RNA function depends on type: miRNA silences target mRNAs (look up targets in miRTarBase/TargetScan), lncRNA has diverse functions (scaffolding, guiding, decoying — check literature for the specific lncRNA), circRNA may sponge miRNAs.
For any ncRNA query: first identify the class from the name/sequence, then select the appropriate evidence source. Do not assume function based on name alone — a gene named "LINC" may have a characterized mechanism, or none at all. Always search PubMed for the specific ncRNA before interpreting. For miRNAs, validated targets (T1) from miRTarBase outweigh any computational prediction — a predicted target with no experimental support is a hypothesis, not a finding. For lncRNAs, mechanism is almost always determined by experimental studies; use PubMed_search_articles with the lncRNA name + "mechanism" or "function" to find relevant evidence. For circRNAs, miRNA sponging is the most common proposed mechanism but is frequently over-claimed — look for CLIP-seq or reporter assay evidence before asserting it.
When to Use
- "What are the targets of miR-21?"
- "Find lncRNAs associated with breast cancer"
- "Is this lncRNA conserved across species?"
- "What miRNAs regulate TP53?"
- "Annotate these non-coding RNA IDs"
- "Which miRNAs are biomarkers for [disease]?"
Not this skill: For mRNA expression analysis, use tooluniverse-rnaseq-deseq2. For CRISPR screens, use tooluniverse-crispr-screen-analysis.
Core Tools
| Tool | Use For |
|---|---|
| miRBase_search_mirna | Search miRNAs by name, accession, or sequence |
| miRBase_get_mirna | Detailed miRNA info (sequence, genomic location, family) |
| miRBase_get_mature_mirna | Mature miRNA sequences and annotations |
| PubMed_search_articles | Search for validated miRNA targets in literature (e.g., "miR-21 target validation") |
| LNCipedia_search_lncrna | Search lncRNAs by name, gene symbol, or transcript ID |
| LNCipedia_get_lncrna | Detailed lncRNA transcript info (sequence, structure, conservation) |
| LNCipedia_get_lncrna_xrefs | lncRNA gene info with all transcript variants |
| LNCipedia_search_ncrna_by_type | List all transcripts for a lncRNA gene |
| LNCipedia_get_lncrna_publications | lncRNA sequence (FASTA format) |
| RNAcentral_search | Search all ncRNA types across databases |
| RNAcentral_get_rna | Detailed ncRNA annotations from 40+ databases |
| Rfam_get_family | RNA family details (structure, alignment, species distribution) |
| Rfam_search | Search RNA families by keyword |
| DisGeNET_search_gene | ncRNA-disease associations |
| PubMed_search_articles | ncRNA literature |
| GTEx_get_median_gene_expression | Tissue expression of ncRNA genes |
Workflow
Phase 0: ncRNA Identity & Classification
Name/ID → miRBase/LNCipedia/RNAcentral → class, sequence, genomic location
|
Phase 1: Target & Interaction Analysis
miRNA → target mRNAs; lncRNA → interacting proteins/RNAs/chromatin
|
Phase 2: Expression & Tissue Specificity
GTEx/GEO → where is it expressed? Tissue-specific or ubiquitous?
|
Phase 3: Disease Associations
DisGeNET/PubMed/CTD → ncRNA-disease links with evidence
|
Phase 4: Functional Interpretation
Pathway enrichment of targets → biological role → clinical significance
Phase 0: ncRNA Identity & Classification
ncRNA classes by size and database:
- miRNA (~22 nt, miRBase): Post-transcriptional silencing via 3'UTR binding
- lncRNA (>200 nt, LNCipedia): Diverse — chromatin remodeling, transcription regulation, miRNA sponges
- rRNA (120-5000 nt, RNAcentral/Rfam): Ribosome components
- tRNA (~76 nt, RNAcentral): Amino acid delivery
- snoRNA (60-300 nt, Rfam): rRNA modification (methylation, pseudouridylation)
- snRNA (~150 nt, Rfam): Spliceosome components
- piRNA (26-31 nt, RNAcentral): Transposon silencing in germline
- circRNA (variable, RNAcentral): miRNA sponges, protein scaffolds (experimental evidence required)
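When only a sequence length is known, the ranges above can serve as a rough first filter. A minimal sketch, assuming the approximate values ("~22 nt", "~76 nt", "~150 nt") are widened into the illustrative ranges below; NCRNA_LENGTH_RANGES and candidate_classes are hypothetical names, not ToolUniverse tools, and circRNAs are left out because their length is variable.

```python
from typing import List

# Illustrative length ranges (nt) based on the class list above.
# Approximate values such as "~22 nt" are widened here as an assumption.
NCRNA_LENGTH_RANGES = {
    "miRNA": (19, 25),
    "piRNA": (26, 31),
    "tRNA": (70, 95),
    "snoRNA": (60, 300),
    "snRNA": (100, 200),
    "rRNA": (120, 5000),
    "lncRNA": (201, None),  # >200 nt, no upper bound
}


def candidate_classes(length_nt: int) -> List[str]:
    """Return the ncRNA classes whose typical length range includes length_nt.

    Length alone is a weak signal because the ranges overlap; confirm the
    class with a database lookup.
    """
    hits = []
    for ncrna_class, (low, high) in NCRNA_LENGTH_RANGES.items():
        if length_nt >= low and (high is None or length_nt <= high):
            hits.append(ncrna_class)
    return hits


print(candidate_classes(22))   # ['miRNA']
print(candidate_classes(150))  # ['snoRNA', 'snRNA', 'rRNA']
```

Because the ranges overlap, a 150 nt sequence returns three candidates; the identification lookups that follow are what settle the class.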
Identification workflow:
- Name starts with "miR-", "hsa-mir-", or "let-" → search miRBase
- Name starts with "LINC", "MALAT", "HOTAIR", or "XIST", or ends in "-AS1" → search LNCipedia
- Any ncRNA type → search RNAcentral (aggregates 40+ member databases)
- RNA family question → search Rfam
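The name-based routing can be written as a small ordered rule list. This is a sketch: the regexes, the let- rule, the generalization of "-AS1" to "-AS" followed by any number, and the helper route_ncrna are assumptions for illustration. Gene symbols without a hyphenated miRNA prefix (e.g. MIR21, SNORD3A) fall through to RNAcentral.

```python
import re
from typing import Tuple

# Ordered (pattern, putative class, first database) rules mirroring the list above.
ROUTING_RULES = [
    # optional 3-letter species prefix, then mir-/miR-/let-
    (re.compile(r"^([a-z]{3}-)?(mir|let)-", re.IGNORECASE), "miRNA", "miRBase"),
    (re.compile(r"^(LINC|MALAT|HOTAIR|XIST)", re.IGNORECASE), "lncRNA", "LNCipedia"),
    (re.compile(r"-AS\d+$", re.IGNORECASE), "lncRNA", "LNCipedia"),  # antisense lncRNAs
]


def route_ncrna(name: str) -> Tuple[str, str]:
    """Return (putative class, first database to search) for an ncRNA name."""
    for pattern, ncrna_class, database in ROUTING_RULES:
        if pattern.search(name):
            return ncrna_class, database
    # Fallback: RNAcentral aggregates many ncRNA databases
    return "unknown", "RNAcentral"


print(route_ncrna("hsa-miR-21-5p"))  # ('miRNA', 'miRBase')
print(route_ncrna("FOXD3-AS1"))      # ('lncRNA', 'LNCipedia')
```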
Phase 1: Target & Interaction Analysis
For miRNAs — the targets determine the biology:
NOTE: There is no dedicated miRNA target lookup tool in ToolUniverse. To find miRNA targets:
- Literature search (most reliable): PubMed_search_articles(query="miR-21 target validation luciferase")
- Cross-references: miRBase_get_mirna_xrefs(accession="MIMAT0000076") (hsa-miR-21-5p) — may link to external target databases
- Known targets for well-studied miRNAs: use the reference table below, then confirm them in PubMed and explore their pathway context with STRING/Reactome
- For novel miRNAs: search PubMed for "[miRNA] target" and extract validated targets from papers
Well-studied miRNA targets (for common oncomiRs/tumor suppressors):
- miR-21: PTEN, PDCD4, TPM1, RECK, SPRY1, SPRY2, BTG2
- miR-155: SOCS1, SHIP1 (INPP5D), AID (AICDA), TP53INP1
- miR-122: SLC7A1, ADAM17 (also binds the HCV 5' UTR as a replication cofactor)
- let-7: RAS, HMGA2, MYC, LIN28
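miRNA names arrive in several styles (hsa-miR-21-5p, hsa-mir-155, hsa-let-7a-5p), so a lookup against the table above needs some normalization first. A minimal sketch, assuming the hypothetical helpers below; collapsing the -5p/-3p arms is a simplification, since the targets listed are generally reported for the dominant (-5p) arm. Before enrichment, aliases such as SHIP1 and AID should be converted to HGNC symbols (INPP5D, AICDA).

```python
import re
from typing import Dict, List

# Reference table copied from the list above (symbols as written there).
WELL_STUDIED_TARGETS: Dict[str, List[str]] = {
    "miR-21": ["PTEN", "PDCD4", "TPM1", "RECK", "SPRY1", "SPRY2", "BTG2"],
    "miR-155": ["SOCS1", "SHIP1", "AID", "TP53INP1"],
    "miR-122": ["SLC7A1", "ADAM17"],
    "let-7": ["RAS", "HMGA2", "MYC", "LIN28"],
}


def normalize_mirna(name: str) -> str:
    """Map names like 'hsa-miR-21-5p', 'hsa-mir-155' or 'hsa-let-7a-5p' to a table key."""
    # Drop a 3-letter species prefix only when it precedes mir-/miR-/let-
    name = re.sub(r"^[a-z]{3}-(?=(?:mir|miR|let)-)", "", name)
    name = re.sub(r"-[35]p$", "", name)             # drop the arm suffix
    name = re.sub(r"^mir-", "miR-", name)           # precursor style -> mature style
    name = re.sub(r"^let-7[a-z]?$", "let-7", name)  # collapse let-7a ... let-7i
    return name


def known_targets(name: str) -> List[str]:
    """Return reference-table targets, or an empty list if the miRNA is not listed."""
    return WELL_STUDIED_TARGETS.get(normalize_mirna(name), [])


print(known_targets("hsa-miR-122-5p"))  # ['SLC7A1', 'ADAM17']
```

An empty list means "not in this small table", not "no targets"; fall back to PubMed or miRTarBase in that case.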
Target interpretation framework:
- Validated (T1): Luciferase reporter, CLIP-seq, degradome-seq — base conclusions on these
- High-confidence prediction (T2): TargetScan conserved sites, DIANA-microT score > 0.9 — support validated findings
- Lower-tier evidence (T3-T4): expression correlation (T3) or sequence-only prediction with miRanda, PicTar, RNA22 (T4) — hypothesis generation only; do not report as findings
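The grading above can be applied mechanically to the method names attached to each target. A minimal sketch, assuming the keyword lists below (hypothetical; real sources name methods in many ways) and assuming that any TargetScan or DIANA-microT hit already passed the conserved-site or score > 0.9 filter before it reaches this function:

```python
from typing import List, Optional

# Keyword -> tier mapping following the T1-T4 grading in this document.
# Keywords are assumptions; adapt them to the method names your source uses.
TIER_KEYWORDS = {
    "T1": ("luciferase", "reporter", "clip", "degradome"),
    "T2": ("targetscan", "diana-microt"),  # assumes the T2 thresholds were already applied
    "T3": ("expression correlation",),
    "T4": ("miranda", "pictar", "rna22"),
}


def best_tier(methods: List[str]) -> Optional[str]:
    """Return the strongest tier supported by any listed method, or None."""
    found = set()
    for method in methods:
        text = method.lower()
        for tier, keywords in TIER_KEYWORDS.items():
            if any(keyword in text for keyword in keywords):
                found.add(tier)
    # Tier labels sort lexically, so min() picks the strongest ("T1" < "T2" < ...)
    return min(found) if found else None


def is_reportable(methods: List[str]) -> bool:
    """Only T1 (experimentally validated) evidence supports a stated finding."""
    return best_tier(methods) == "T1"


print(best_tier(["miRanda", "TargetScan conserved site"]))  # T2
```

Methods that match no keyword (e.g. qPCR alone) return None, which flags them for manual review rather than silently assigning a tier.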
For lncRNAs — the mechanism varies:
| lncRNA Mechanism | Example | How to Investigate |
|---|---|---|
| Chromatin modifier | HOTAIR, XIST | Check interacting proteins (PRC2, LSD1) via PubMed |
| Transcription regulator | NEAT1, MEG3 | Check nearby genes (cis-regulation) via genomic location |
| miRNA sponge | MALAT1, circRNAs | Search for miRNA binding sites |
| Scaffold | NKILA, BCAR4 | Check protein interactions |
| Enhancer RNA | eRNAs | Check ENCODE enhancer annotations |
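The "How to Investigate" column, together with the name + "mechanism"/"function" search pattern recommended earlier, can be turned into a list of literature queries. A minimal sketch; MECHANISM_TERMS and lncrna_queries are hypothetical, and the extra search terms are assumptions chosen for illustration. Each string would be passed to PubMed_search_articles(query=...).

```python
from typing import List, Optional

# Extra search terms per mechanism, loosely derived from the table above.
# The term choices are assumptions for illustration.
MECHANISM_TERMS = {
    "chromatin modifier": ["PRC2", "LSD1", "chromatin"],
    "transcription regulator": ["cis-regulation", "transcription"],
    "miRNA sponge": ["miRNA sponge", "ceRNA"],
    "scaffold": ["scaffold", "protein interaction"],
    "enhancer RNA": ["enhancer RNA"],
}


def lncrna_queries(lncrna: str, mechanism: Optional[str] = None) -> List[str]:
    """Build query strings: generic mechanism/function queries plus mechanism-specific ones."""
    queries = [f"{lncrna} mechanism", f"{lncrna} function"]
    queries += [f"{lncrna} {term}" for term in MECHANISM_TERMS.get(mechanism, [])]
    return queries


print(lncrna_queries("HOTAIR", "chromatin modifier"))
```

A hit for "miRNA sponge" is only a lead; as with circRNAs, sponge claims still need CLIP-seq or reporter evidence.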
Phase 2: Expression & Tissue Specificity
GTEx_get_median_gene_expression(gene_symbol="MIR21") # miRNA host gene expression
# Note: GTEx bulk RNA-seq does not capture mature (~22 nt) miRNAs; use small RNA-seq (miRNA-seq) data from GEO for mature miRNA levels
Interpretation: Tissue-restricted ncRNAs are often functionally important in that tissue. Ubiquitous ncRNAs (like MALAT1) tend to have housekeeping roles.
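One way to turn "tissue-restricted vs. ubiquitous" into a number is the tau (τ) index, a commonly used tissue-specificity score: τ = Σᵢ (1 − xᵢ / max x) / (n − 1) over n tissues. It is a swapped-in technique, not something the GTEx tool is documented to return. A minimal sketch, assuming you have already collected per-tissue median expression values (for example from GTEx_get_median_gene_expression; its response format is not assumed here) into a plain dict:

```python
import math
from typing import Dict


def tau_index(tissue_expression: Dict[str, float], log_transform: bool = True) -> float:
    """Tissue-specificity index tau: 0 = uniform across tissues, 1 = expressed in one tissue.

    tissue_expression maps tissue name -> median expression (e.g. TPM).
    """
    values = list(tissue_expression.values())
    if len(values) < 2:
        raise ValueError("tau needs expression values for at least two tissues")
    if log_transform:
        # log2(x + 1) damps the influence of very highly expressed tissues
        values = [math.log2(v + 1) for v in values]
    peak = max(values)
    if peak == 0:
        raise ValueError("gene is not expressed in any tissue; tau is undefined")
    return sum(1 - v / peak for v in values) / (len(values) - 1)


print(tau_index({"liver": 100.0, "brain": 0.0, "lung": 0.0}))  # 1.0
```

Values near 1 point to tissue restriction and values near 0 to ubiquitous expression; any cutoff between them is a convention, and tau computed from GTEx reflects gene-level RNA-seq signal, not mature miRNA levels.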
Phase 3: Disease Associations
DisGeNET_search_gene(query="MIR21") # miR-21 disease associations
PubMed_search_articles(query="miR-21 biomarker cancer")
Key ncRNA-disease associations (well-established T1 examples — always verify via DisGeNET or PubMed for the specific ncRNA):
- miR-21: OncomiR in multiple cancers; targets PTEN, PDCD4, TPM1 (hundreds of T1 studies)
- miR-155: B-cell lymphoma, inflammation — immune regulation
- miR-122: Hepatitis C liver disease — HCV replication cofactor; therapeutic target (miravirsen)
- let-7 family: Lung cancer, stem cell differentiation — tumor suppressor targeting RAS, HMGA2
- HOTAIR: Breast/colorectal cancer — recruits PRC2, promotes metastasis
- MALAT1: Lung cancer/metastasis — splicing regulation
- XIST: X-inactivation, cancer — chromatin silencing
- H19: Beckwith-Wiedemann syndrome, cancer — imprinted lncRNA, miR-675 host
- ANRIL: Cardiovascular disease (CVD), diabetes, cancer — CDKN2A/B locus regulation (GWAS-validated)
Phase 4: Functional Interpretation
After identifying miRNA targets (Phase 1), run pathway enrichment:
# Collect validated target gene symbols
targets = ["PTEN", "PDCD4", "TPM1", "RECK", "SPRY1"]  # validated miR-21 targets
# Pathway enrichment: build each tool's identifier string from the same list
ReactomeAnalysis_pathway_enrichment(identifiers=" ".join(targets))
STRING_get_network(identifiers="\r".join(targets), species=9606)  # STRING separates identifiers with \r
Interpretation: If miR-21 targets are enriched in apoptosis and PI3K-AKT signaling → miR-21 is an oncomiR that promotes survival by simultaneously suppressing multiple tumor suppressors.
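Enrichment asks whether more targets fall in a pathway than chance would predict. A common way to score this is a one-sided hypergeometric test; the sketch below shows that calculation with only the standard library. It is illustrative: the pathway counts are hypothetical, and Reactome and STRING compute their own statistics (including multiple-testing correction), which may differ from this single-pathway p-value.

```python
from math import comb


def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided P(X >= k) under the hypergeometric distribution.

    N: genes in the background, K: background genes annotated to the pathway,
    n: target genes submitted, k: submitted targets annotated to the pathway.
    """
    if not 0 <= k <= n <= N or K > N:
        raise ValueError("inconsistent counts")
    # Sum exact integer terms, then divide once to avoid rounding drift
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)


# Hypothetical: 20,000 background genes, a 100-gene pathway, 3 of 5 targets in it
print(hypergeom_enrichment_p(3, 5, 100, 20000))
```

With these hypothetical counts the p-value is on the order of 1e-6. Because many pathways are tested at once, correct the resulting p-values (e.g. Benjamini-Hochberg) before calling any pathway enriched.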
Report structure:
- ncRNA Identity — class, sequence, genomic location, conservation
- Targets/Interactions — validated targets with evidence grades
- Expression Profile — tissue specificity, disease-specific expression changes
- Disease Associations — evidence-graded disease links
- Pathway Analysis — enriched pathways among targets
- Mechanistic Model — how this ncRNA contributes to disease biology
- Clinical Potential — biomarker utility, therapeutic target potential (antagomirs, ASOs)
Computational Procedure: TargetScan Predicted Targets (Download-and-Process)
TargetScan is one of the most widely used sources of computational miRNA target predictions, but it has no REST API. Download the prediction tables and process them locally:
# Step 1: Download TargetScan predicted targets (one-time, ~10MB zipped)
# URL: https://www.targetscan.org/vert_80/vert_80_data_download/Summary_Counts.default_predictions.txt.zip
import io
import zipfile

import pandas as pd
import requests

url = "https://www.targetscan.org/vert_80/vert_80_data_download/Summary_Counts.default_predictions.txt.zip"
resp = requests.get(url, timeout=60)
resp.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error page
with zipfile.ZipFile(io.BytesIO(resp.content)) as z:
    fname = z.namelist()[0]
    df = pd.read_csv(z.open(fname), sep='\t')
df.columns = df.columns.str.strip()
# Column names below assume the TargetScan 8.0 Summary_Counts header
# ("miRNA family", "Gene Symbol", "Species ID", ...); check df.columns if they differ.
if 'Species ID' in df.columns:
    df = df[df['Species ID'] == 9606]  # keep human rows only
# Step 2: Query for a specific miRNA family
# TargetScan groups miRNAs into families such as "miR-21-5p/590-5p"; anchor the match
# so that "miR-21" does not also pick up miR-210, miR-214, etc.
mirna = "miR-21-5p"
targets = df[df['miRNA family'].str.match(r"miR-21(?:-5p)?(?:/|$)", na=False)]
# Step 3: Rank by cumulative weighted context++ score (more negative = stronger)
targets_ranked = targets.sort_values('Cumulative weighted context++ score', ascending=True)
print(f"Top 20 predicted targets of {mirna}:")
for _, row in targets_ranked.head(20).iterrows():
    print(f"  {row['Gene Symbol']:10s} score={row['Cumulative weighted context++ score']:.3f} "
          f"sites={row['Total num conserved sites']}")
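Interpretation: A more negative cumulative weighted context++ score means stronger predicted repression. Targets with at least one conserved site (Total num conserved sites ≥ 1) are higher confidence than those with only nonconserved sites, and additional conserved sites raise confidence further.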
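When filtering TargetScan's family column by name, a plain substring test is easy to get wrong. The toy example below uses illustrative family labels in the "family/member" style (not rows read from the real file) to show the difference between substring and anchored matching:

```python
import re

# Illustrative family labels; a substring test for "miR-21" matches all of them.
families = ["miR-21-5p/590-5p", "miR-210-3p", "miR-214-3p", "miR-216a-5p"]

substring_hits = [f for f in families if "miR-21" in f]

# Anchored: "miR-21", optional "-5p", then either "/" or the end of the label
exact_pattern = re.compile(r"^miR-21(?:-5p)?(?:/|$)")
exact_hits = [f for f in families if exact_pattern.match(f)]

print(substring_hits)  # all four labels
print(exact_hits)      # ['miR-21-5p/590-5p']
```

The same reasoning applies to miRTarBase names such as hsa-miR-21-5p versus hsa-miR-210-3p.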
Computational Procedure: miRTarBase Validated Targets (Download-and-Process)
miRTarBase has Cloudflare protection blocking programmatic access. Use the R/Bioconductor data package or bulk download:
# Option 1: Download from miRTarBase bulk export (requires browser download first)
# Go to: https://mirtarbase.cuhk.edu.cn/~miRTarBase/miRTarBase_2025/
# Download: hsa_MTI.xlsx (human miRNA-target interactions)
# Option 2: Use the GitHub data dump
# https://github.com/jorainer/mirtarbase — R package with cached data
# Once you have the file:
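import pandas as pd

# Reading .xlsx requires the openpyxl package; use pd.read_csv(..., sep='\t') for a TSV export
mti = pd.read_excel("hsa_MTI.xlsx")
# Filter for your miRNA; anchor the pattern so hsa-miR-210, hsa-miR-214, ... are excluded
mir21_targets = mti[mti['miRNA'].str.match(r'hsa-miR-21(?:-[35]p)?$', na=False)]
print(f"miR-21 interactions: {len(mir21_targets)} rows, "
      f"{mir21_targets['Target Gene'].nunique()} unique target genes")
# Filter by evidence strength: assay names live in the 'Experiments' column
# ('Support Type' holds labels such as "Functional MTI")
strong = mir21_targets[mir21_targets['Experiments'].str.contains(
    'Luciferase|Reporter|Western|CLIP', case=False, na=False
)]
print(f"  Strong evidence (reporter/Western/CLIP): {strong['Target Gene'].nunique()} genes")
for _, row in strong.drop_duplicates('Target Gene').head(10).iterrows():
    print(f"  {row['Target Gene']:10s} — {row['Experiments']}")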
When download is not available: Use the built-in reference table in Phase 1 for well-studied miRNAs, or search PubMed for validated targets.
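Because miRTarBase typically lists one row per miRNA, target, and reference combination, counting rows overstates the number of distinct targets. A stdlib-only sketch of collapsing rows into per-target evidence, assuming hsa_MTI-style column names ("miRNA", "Target Gene", "Experiments", "References (PMID)") and "//" as the separator between assay names; both are assumptions to check against your download. The rows, gene names, and PMIDs are placeholders, not real miRTarBase records.

```python
import csv
import io
import re
from collections import defaultdict

# Placeholder rows shaped like an hsa_MTI export (column names assumed)
TSV = """miRNA\tTarget Gene\tExperiments\tReferences (PMID)
hsa-miR-21-5p\tGENE_A\tLuciferase reporter assay//Western blot\t1
hsa-miR-21-5p\tGENE_A\tqRT-PCR\t2
hsa-miR-21-5p\tGENE_B\tMicroarray\t3
hsa-miR-210-3p\tGENE_C\tLuciferase reporter assay\t4
"""

STRONG = re.compile(r"luciferase|reporter|western|clip", re.IGNORECASE)
MIR21 = re.compile(r"^hsa-miR-21(-[35]p)?$")  # excludes hsa-miR-210, -214, ...


def summarize(tsv_text):
    """Collapse per-reference rows into target -> experiments, PMIDs, strong flag."""
    evidence = defaultdict(lambda: {"experiments": set(), "pmids": set()})
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        if not MIR21.match(row["miRNA"]):
            continue
        entry = evidence[row["Target Gene"]]
        entry["experiments"].update(row["Experiments"].split("//"))
        entry["pmids"].add(row["References (PMID)"])
    return {
        gene: {**e, "strong": any(STRONG.search(x) for x in e["experiments"])}
        for gene, e in evidence.items()
    }


summary = summarize(TSV)
print(sorted(summary))  # ['GENE_A', 'GENE_B']
```

Here GENE_A counts once despite two rows, and only it carries strong (reporter/Western/CLIP) evidence; the miR-210 row is excluded by the anchored pattern.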
Limitations
- miRNA target prediction is noisy — even the best algorithms have >50% false positive rates; always prioritize experimentally validated targets
- lncRNA function is poorly characterized — only ~5% of annotated lncRNAs have known functions
- Expression measurement varies — miRNA-seq, RNA-seq, and microarray capture different ncRNA classes; check the assay type
- Species differences — miRNAs are often conserved but lncRNAs are frequently species-specific; cross-species lncRNA comparisons are unreliable