Bioinformatics Visualization
iTOL Dataset Formats and Troubleshooting
Choosing the Right Dataset Type
DATASET_BINARY (Recommended for markers/symbols):
- More reliable than DATASET_SYMBOL
- All species must be listed with binary values (0 or 1)
- Simpler format, better iTOL compatibility
- Use for: presence/absence markers, technology indicators, categorical highlights
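Format example (because of SEPARATOR TAB, each keyword and its values, and each species and its value, must be separated by a single tab character, not spaces):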
DATASET_BINARY
SEPARATOR TAB
DATASET_LABEL CLR Technology
COLOR #ff0000
LEGEND_TITLE Sequencing Technology
LEGEND_SHAPES 2
LEGEND_COLORS #ff0000
LEGEND_LABELS CLR (PacBio)
FIELD_SHAPES 2
FIELD_COLORS #ff0000
FIELD_LABELS CLR
DATA
Species_name_1 1
Species_name_2 0
Species_name_3 1
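A file in this format can also be generated programmatically, which avoids tab/space mistakes. The following is a minimal sketch, assuming presence/absence values are held in a plain dict; the helper name `build_binary_dataset` and its parameters are hypothetical and not part of iTOL.

```python
def build_binary_dataset(presence, label, color="#ff0000", shape=2):
    """Return DATASET_BINARY text for a {species: bool} mapping.

    Hypothetical helper: the name and parameters are illustrative only.
    Every species in the mapping is written, with 1 for True and 0 for False.
    """
    header = [
        "DATASET_BINARY",
        "SEPARATOR TAB",
        f"DATASET_LABEL\t{label}",
        f"COLOR\t{color}",
        f"FIELD_SHAPES\t{shape}",
        f"FIELD_COLORS\t{color}",
        f"FIELD_LABELS\t{label}",
        "DATA",
    ]
    rows = [
        f"{name}\t{1 if present else 0}"
        for name, present in sorted(presence.items())
    ]
    return "\n".join(header + rows) + "\n"


example = build_binary_dataset(
    {"Species_name_1": True, "Species_name_2": False, "Species_name_3": True},
    label="CLR",
)
print(example)
```

Writing the result with `open("clr_binary.txt", "w").write(example)` (file name assumed) gives a file that can be uploaded alongside the tree.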
DATASET_SYMBOL (Less reliable):
- Can be finicky about format
- Per-species shape, size, and color specifications are complex
- May not display correctly even when the format appears valid
- Avoid unless DATASET_BINARY does not meet your needs
DATASET_COLORSTRIP (Good for gradients):
- Reliable for color gradients (e.g., temporal data, continuous values)
- Only species with data need to be listed
- Good for non-binary categorical or continuous data
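As an illustration of the color strip format, the sketch below writes one `species<tab>color` row per species that has data. It assumes colors are already computed as hex strings; the helper name `build_colorstrip_dataset` is hypothetical.

```python
def build_colorstrip_dataset(colors, label):
    """Return DATASET_COLORSTRIP text for a {species: hex_color} mapping.

    Hypothetical helper; species without data are simply left out.
    """
    header = [
        "DATASET_COLORSTRIP",
        "SEPARATOR TAB",
        f"DATASET_LABEL\t{label}",
        "COLOR\t#000000",
        "DATA",
    ]
    rows = [f"{name}\t{color}" for name, color in sorted(colors.items())]
    return "\n".join(header + rows) + "\n"


strip = build_colorstrip_dataset(
    {"Species_name_1": "#ffffcc", "Species_name_3": "#b10026"},
    label="Year",
)
print(strip)
```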
Common iTOL Errors and Fixes
Error: "Unknown variable 'SYMBOL_SHAPE'"
- Cause: Mixing global symbol settings with per-species data
- Fix: Switch to DATASET_BINARY format
Error: "Invalid color '1' for node X"
- Cause: DATASET_SYMBOL data format mismatch
- Fix: Use DATASET_BINARY instead, format:
species<tab>0_or_1
Symbols not appearing on tree:
- Likely cause: DATASET_SYMBOL format issues
- Fix: Convert to DATASET_BINARY
- Verify: Check that all species in config exist in tree file
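Several of these errors can be caught before uploading. The check below is a rough sketch, assuming a tab-separated DATASET_BINARY file and the 0/1 convention used in this document; `check_binary_data` is a hypothetical name, and the check is not a full iTOL parser.

```python
def check_binary_data(config_text):
    """Return a list of problems found in the DATA section.

    Assumes SEPARATOR TAB and that every value must be 0 or 1,
    following the convention described in this document.
    """
    lines = config_text.splitlines()
    if "DATA" not in lines:
        return ["no DATA line found"]
    problems = []
    start = lines.index("DATA") + 1  # index of the first data row
    for number, line in enumerate(lines[start:], start=start + 1):
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split("\t")
        if len(fields) < 2:
            problems.append(f"line {number}: expected species<tab>value")
        elif any(value not in {"0", "1"} for value in fields[1:]):
            problems.append(f"line {number}: value must be 0 or 1")
    return problems


good = "DATASET_BINARY\nSEPARATOR TAB\nDATA\nAlca_torda\t1\nUria_aalge\t0\n"
bad = "DATASET_BINARY\nSEPARATOR TAB\nDATA\nAlca_torda\t#ff0000\nUria_aalge 1\n"
print(check_binary_data(good))
print(check_binary_data(bad))
```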
Species Name Compatibility
Critical: Species names must match exactly between tree and annotation files
Common issues:
- Case sensitivity: "Alca_Torda" vs "Alca_torda"
- Spaces vs underscores: Always use underscores in tree format
- Subspecies names: Handle three-part names carefully
Fix for case sensitivity:
# Convert scientific names to tree format with case normalization
df['species_tree'] = df['scientific_name'].str.replace(' ', '_')
# Fix uppercase after underscore (Alca_Torda -> Alca_torda)
df['species_tree'] = df['species_tree'].str.replace(
    r'_([A-Z])',
    lambda m: '_' + m.group(1).lower(),
    regex=True
)
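For names outside a DataFrame, or for three-part subspecies names, the same normalization can be written as a plain function. This is a sketch that assumes the tree uses a capitalized genus followed by lowercase epithets; `to_tree_name` is a hypothetical helper, and the subspecies name in the example is made up for illustration.

```python
def to_tree_name(scientific_name):
    """Convert 'Genus species [subspecies]' to 'Genus_species[_subspecies]'.

    Assumes the tree capitalizes only the genus; adjust if your tree differs.
    """
    parts = scientific_name.strip().split()  # also collapses repeated spaces
    if not parts:
        return ""
    genus = parts[0].capitalize()
    epithets = [part.lower() for part in parts[1:]]
    return "_".join([genus] + epithets)


print(to_tree_name("Alca Torda"))
print(to_tree_name("alca  torda Islandica"))
```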
Validation pattern:
# Always validate species compatibility (df comes from the snippet above)
import re

# Extract species names from the Newick tree file
with open('tree.nwk') as f:
    tree_content = f.read()
# Match Genus_species and three-part Genus_species_subspecies names
tree_species = set(re.findall(r'[A-Z][a-z]+(?:_[a-z]+){1,2}', tree_content))

# Check config species against the tree
config_species = set(df['species_tree'])
missing = config_species - tree_species
if missing:
    print(f"Species in config but not in tree: {missing}")
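Because DATASET_BINARY expects every species to be listed, it can help to compare in both directions and to flag names that differ only by case. The following is a sketch under that assumption; `compare_names` is a hypothetical helper operating on two collections of name strings.

```python
def compare_names(tree_names, config_names):
    """Report mismatches between tree and config species names."""
    tree_names, config_names = set(tree_names), set(config_names)
    # Map lowercased tree names back to their original spelling
    lowered = {name.lower(): name for name in tree_names}
    missing_from_tree = config_names - tree_names
    case_mismatches = {
        name: lowered[name.lower()]
        for name in missing_from_tree
        if name.lower() in lowered
    }
    return {
        "missing_from_tree": sorted(missing_from_tree),
        "missing_from_config": sorted(tree_names - config_names),
        "case_mismatches": case_mismatches,
    }


report = compare_names({"Alca_torda", "Uria_aalge"}, {"Alca_Torda"})
print(report)
```

Entries under `case_mismatches` map each config name to the tree name it most likely should have been.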
Color Gradients for Temporal Data
Effective color schemes:
Temporal progression (old → new):
- Light Yellow → Dark Red (ColorBrewer YlOrRd)
- Clearly shows progression from past to present
- Example:
#ffffcc (2019) → #b10026 (2025)
Avoid:
- Blue → Yellow → Red (confusing middle point)
- Diverging palettes for sequential data
ColorBrewer palettes for sequential data:
- YlOrRd: Yellow-Orange-Red (temporal, intensity)
- YlGn: Yellow-Green (growth, vegetation)
- PuBuGn: Purple-Blue-Green (water, depth)
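One way to assign a color to each year is to interpolate between the two endpoint colors from the example above. The sketch below uses simple linear interpolation in RGB, which is an assumption made for illustration: it does not reproduce ColorBrewer's intermediate classes exactly. The function names and default year range are hypothetical, and `first_year` must differ from `last_year`.

```python
def hex_to_rgb(color):
    """Convert '#rrggbb' to an (r, g, b) tuple of ints."""
    color = color.lstrip("#")
    return tuple(int(color[i:i + 2], 16) for i in (0, 2, 4))


def year_to_color(year, first_year=2019, last_year=2025,
                  start="#ffffcc", end="#b10026"):
    """Linearly interpolate between start and end colors for a given year."""
    fraction = (year - first_year) / (last_year - first_year)
    fraction = min(1.0, max(0.0, fraction))  # clamp years outside the range
    channels = [
        round(a + (b - a) * fraction)
        for a, b in zip(hex_to_rgb(start), hex_to_rgb(end))
    ]
    return "#{:02x}{:02x}{:02x}".format(*channels)


for year in range(2019, 2026):
    print(year, year_to_color(year))
```

The resulting hex strings can be used directly as the color column of a DATASET_COLORSTRIP file.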
Debugging Workflow
- Generate config file
- Upload to iTOL (https://itol.embl.de)
- If errors: Save error messages to file
- Check format: BINARY vs SYMBOL vs COLORSTRIP
- Validate species names: Match against tree file
- Test with minimal dataset: 5-10 species first
- Switch formats if needed: SYMBOL → BINARY usually works
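The "test with minimal dataset" step can be scripted by keeping the header and only the first few data rows. This is a sketch, assuming the config contains a line that reads exactly `DATA`; `minimal_dataset` is a hypothetical helper.

```python
def minimal_dataset(config_text, n=5):
    """Return the config with its header intact and only the first n data rows."""
    lines = config_text.splitlines()
    split = lines.index("DATA") + 1  # header ends with the DATA line
    header = lines[:split]
    rows = [line for line in lines[split:] if line.strip()]
    return "\n".join(header + rows[:n]) + "\n"


full = "DATASET_BINARY\nSEPARATOR TAB\nDATA\n" + "".join(
    f"Sp_{i}\t1\n" for i in range(10)
)
small = minimal_dataset(full, n=3)
print(small)
```

Since DATASET_BINARY expects all species to be listed, a reduced config is best tested against a correspondingly small tree.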
Related Skills
- data-visualization: General visualization best practices
- bioinformatics/fundamentals: Core bioinformatics concepts
- bioinformatics/phylogenetics: Phylogenetic analysis workflows
Repository: delphine-l/claude_global