galaxy-tool-wrapping
Galaxy Tool Wrapping Expert
Expert knowledge for developing Galaxy tool wrappers. Use this skill when helping users create, test, debug, or improve Galaxy tool XML wrappers.
Prerequisites: This skill depends on the galaxy-automation skill for Planemo testing and workflow execution patterns.
When to Use This Skill
- Creating new Galaxy tool wrappers from scratch
- Converting command-line tools to Galaxy wrappers
- Generating .shed.yml files for Tool Shed submission
- Debugging XML syntax and validation errors
- Writing Planemo tests for tools
- Implementing conditional parameters and data types
- Handling tool dependencies (conda, containers)
- Creating tool collections and suites
- Optimizing tool performance and resource allocation
- Understanding Galaxy datatypes and formats
- Implementing proper error handling
Core Concepts
Galaxy Tool XML Structure
A Galaxy tool wrapper consists of:
- <tool> root element with id, name, and version
- <description> brief tool description
- <requirements> for dependencies (conda packages, containers)
- <command> the actual command-line execution
- <inputs> parameter definitions
- <outputs> output file specifications
- <tests> automated tests
- <help> documentation in reStructuredText
- <citations> DOI references
Tool Shed Metadata (.shed.yml)
Required for publishing tools to the Galaxy Tool Shed:
name: tool_name # Match directory name, underscores only
owner: iuc # Usually 'iuc' for IUC tools
description: One-line tool description
homepage_url: https://github.com/tool/repo
long_description: |
  Multi-line detailed description.
  Can include features, use cases, and tool suite contents.
remote_repository_url: https://github.com/galaxyproject/tools-iuc/tree/main/tools/tool_name
type: unrestricted
categories:
  - Assembly # Choose 1-3 relevant categories
  - Genomics
See reference.md for comprehensive .shed.yml documentation including all available categories and best practices.
Key Components
Command Block:
- Use Cheetah templating: $variable_name or ${variable_name}
- Conditional logic: #if $param ... #end if
- Loop constructs: #for $item in $collection ... #end for
- CDATA sections for complex commands
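The constructs above can be combined in a single command block. The following is a minimal sketch under stated assumptions: tool_command and its flags are placeholders, and the parameter names min_length (an optional integer) and input_files (a data parameter with multiple="true") are hypothetical.

```xml
<command detect_errors="exit_code"><![CDATA[
    tool_command
    ## Conditional: emit the flag only when the optional integer has a value
    #if str($min_length)
        --min-length $min_length
    #end if
    ## Loop: one --input flag per selected dataset (hypothetical multiple="true" param)
    #for $f in $input_files
        --input '$f'
    #end for
    > '$output'
]]></command>
```

Testing str($min_length) rather than the bare variable keeps the flag when the user enters 0, since the string "0" is non-empty while an unset optional value renders as an empty string.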
Cheetah Template Best Practices:
Working around path handling issues in conda packages:
<command detect_errors="exit_code"><![CDATA[
## Add trailing slash if script concatenates paths without separator
tool_command
-o 'output_dir/' ## Quoted with trailing slash
## Script does: output_dir + 'file.txt' → 'output_dir/file.txt' ✓
## Without slash: output_dir + 'file.txt' → 'output_dirfile.txt' ✗
]]></command>
When to use quotes in Cheetah:
- Always quote user inputs: '$input_file'
- Quote literal strings with special chars: 'output_dir/'
- Use bare variables only for values that cannot contain spaces or shell metacharacters (for example integers and floats): $variable
Input Parameters:
- <param> elements with type, name, label
- Types: text, integer, float, boolean, select, data, data_collection
- Optional vs required parameters
- Validators and sanitizers
- Conditional parameter display
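As an illustration of conditional display, validators, and sanitizers, the sketch below shows an inputs block. All parameter names (mode, mode_selector, seed_length, sample_name), option values, and numeric bounds are hypothetical and chosen only for the example.

```xml
<inputs>
    <!-- Conditional display: seed_length appears only in "sensitive" mode -->
    <conditional name="mode">
        <param name="mode_selector" type="select" label="Run mode">
            <option value="fast" selected="true">Fast</option>
            <option value="sensitive">Sensitive</option>
        </param>
        <when value="fast"/>
        <when value="sensitive">
            <param name="seed_length" type="integer" value="11" min="4" max="31" label="Seed length"/>
        </when>
    </conditional>
    <!-- Validator rejects unexpected characters; sanitizer strips anything outside the allowed set -->
    <param name="sample_name" type="text" value="sample" label="Sample name">
        <validator type="regex" message="Use letters, digits, and underscores only">^[A-Za-z0-9_]+$</validator>
        <sanitizer invalid_char="">
            <valid initial="string.letters,string.digits">
                <add value="_"/>
            </valid>
        </sanitizer>
    </param>
</inputs>
```

In the command block, values nested in a conditional are addressed through its name, for example $mode.mode_selector and $mode.seed_length.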
Outputs:
- <data> elements for output files
- Dynamic output naming with label and name
- Format discovery and conversion
- Filters for conditional outputs
- Collections for multiple outputs
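The output features listed above might look like the following sketch. It assumes a hypothetical boolean input named write_log and a tool that writes one .tsv file per sample into a results directory; neither comes from a real wrapper.

```xml
<outputs>
    <!-- Always produced -->
    <data name="report" format="tabular" label="${tool.name} on ${on_string}: report"/>
    <!-- Created only when the hypothetical boolean parameter write_log is checked -->
    <data name="log" format="txt" label="${tool.name} on ${on_string}: log">
        <filter>write_log</filter>
    </data>
    <!-- Collection whose elements are discovered after the job finishes -->
    <collection name="per_sample" type="list" label="${tool.name} on ${on_string}: per-sample tables">
        <discover_datasets pattern="(?P&lt;designation&gt;.+)\.tsv" directory="results" format="tabular"/>
    </collection>
</outputs>
```

The designation group in the pattern becomes the element identifier, so results/sampleA.tsv would appear in the collection as sampleA.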
Tests:
- Input parameters and files
- Expected output files or assertions
- Test data location and organization
- See testing.md for detailed testing strategies including large file handling
Best Practices
- Always include tests - planemo lint warns about tools that define none, and planemo test has nothing to verify without them
- Use semantic versioning - Increment tool version on changes
- Specify exact dependencies - Pin conda package versions
- Add clear help text - Document all parameters
- Handle errors gracefully - Check exit codes, validate inputs
- Use collections - For multiple related files
- Follow IUC standards - If contributing to the Intergalactic Utilities Commission (IUC) repository
- Plan for large output files - Before creating tests, check expected output sizes. If over 1MB, use assertion-based tests (has_size, has_line) instead of full file comparison (see testing.md)
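An assertion-based test for a large output could be sketched as follows. The file name, the size and tolerance in bytes, and the expected line and text are placeholder values to be replaced with figures from an actual run of the tool.

```xml
<tests>
    <test expect_num_outputs="1">
        <param name="input" value="test_input.txt"/>
        <output name="output">
            <assert_contents>
                <!-- Size in bytes, with an allowed deviation instead of an exact match -->
                <has_size value="5000000" delta="500000"/>
                <!-- An exact line that must be present somewhere in the output -->
                <has_line line="expected header line"/>
                <!-- A substring that must be present somewhere in the output -->
                <has_text text="expected_keyword"/>
            </assert_contents>
        </output>
    </test>
</tests>
```

With this shape no expected output file needs to be stored in test-data, only the input.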
Common Planemo Commands
# Test tool locally
planemo test tool.xml
# Serve tool in local Galaxy
planemo serve tool.xml
# Lint tool for best practices
planemo lint tool.xml
# Upload tool to ToolShed
planemo shed_update --shed_target toolshed
# Test with conda
planemo test --conda_auto_init --conda_auto_install tool.xml
Output Routing with Symlinks
When a tool writes output to a filename it constructs internally (not $output), use
symlinks in the command block to route the file to Galaxy's output variable.
Pattern: Symlink before command execution
<command detect_errors="exit_code"><![CDATA[
## Create symlink so tool output lands where Galaxy expects it
ln -s '$output_variable' 'expected_tool_output_name' &&
tool_command --input '$input' -o 'expected_tool_output_name'
]]></command>
Pattern: Prefix-based output naming
Some tools use --out-prefix where the output filename is prefix + input_filename.
The tool constructs the filename internally, so you must predict it and symlink:
<command detect_errors="exit_code"><![CDATA[
#import re
## Sanitize the element identifier so it is safe to use as a filename, then append the extension
#set $mangled_input = re.sub(r"[^\w\-\s]", "_", str($input.element_identifier)) + "." + str($input.ext)
ln -s '$input' '$mangled_input' &&
ln -s '$output_var' 'myprefix${mangled_input}' &&
tool_command --input-reads '$mangled_input' -p myprefix
]]></command>
Key points:
- Symlink is created before running the tool -- the tool writes through it
- Must match the exact filename the tool will produce
- For prefix mode: output = prefix + getFileName(input), so mangle the input name to match
Using format_source for dynamic output formats
When output format should match the input format (e.g., subsampled reads):
<data name="subsampled_outfile" format_source="input_reads" label="Subsampled reads">
<filter>output_options["output_type"]["type_selector"] == "subsampled_reads"</filter>
</data>
This is preferable to change_format when the output is always the same format as input.
Use change_format when the user explicitly selects the output format.
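For comparison, a change_format output could be sketched like this. The select parameter out_format and its option values are hypothetical; the output defaults to tabular and switches to csv only when the user picks that option.

```xml
<inputs>
    <param name="out_format" type="select" label="Output format">
        <option value="tabular" selected="true">Tabular</option>
        <option value="csv">CSV</option>
    </param>
</inputs>
<outputs>
    <!-- format="tabular" is the default; change_format overrides it per selection -->
    <data name="table" format="tabular" label="${tool.name} on ${on_string}">
        <change_format>
            <when input="out_format" value="csv" format="csv"/>
        </change_format>
    </data>
</outputs>
```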
XML Template Example
<tool id="tool_id" name="Tool Name" version="1.0.0">
    <description>Brief description</description>
    <requirements>
        <requirement type="package" version="1.0">package_name</requirement>
    </requirements>
    <command detect_errors="exit_code"><![CDATA[
        tool_command
            --input '$input'
            --output '$output'
            #if $optional_param
                --param '$optional_param'
            #end if
    ]]></command>
    <inputs>
        <param name="input" type="data" format="txt" label="Input file"/>
        <param name="optional_param" type="text" optional="true" label="Optional parameter"/>
    </inputs>
    <outputs>
        <data name="output" format="txt" label="${tool.name} on ${on_string}"/>
    </outputs>
    <tests>
        <test>
            <param name="input" value="test_input.txt"/>
            <output name="output" file="expected_output.txt"/>
        </test>
    </tests>
    <help><![CDATA[
**What it does**

Describe what the tool does.

**Inputs**

- Input file: description

**Outputs**

- Output file: description
    ]]></help>
    <citations>
        <citation type="doi">10.1234/example.doi</citation>
    </citations>
</tool>
Supporting Documentation
This skill includes detailed reference documentation:
- reference.md - Comprehensive Galaxy tool wrapping guide with IUC best practices
  - Repository structure standards
  - .shed.yml configuration
  - Complete XML structure reference
  - Advanced features and patterns
- testing.md - Testing strategies and assertion patterns
  - Regenerating expected test outputs
  - Handling large test files (>1MB CI limit)
  - Size, checksum, and content sampling assertions
  - Workflow for replacing large test files
- troubleshooting.md - Practical troubleshooting guide
  - Reading tool_test_output.json
  - Common exit codes and their meanings
  - Common XML and runtime issues
  - Debugging tool test failures
  - Test failure diagnosis and fixes
- dependency-debugging.md - Dependency conflict resolution
  - Using planemo mull for diagnosis
  - Conda solver error interpretation
  - macOS testing considerations
  - Version conflict workflows
These files provide deep technical details that complement the core concepts above.
Related Skills
- galaxy-automation - BioBlend & Planemo foundation (dependency)
- galaxy-workflow-development - Building workflows that use these tools
- conda-recipe - Creating conda packages for tool dependencies
- bioinformatics-fundamentals - Understanding file formats and data types used in tools
Resources
- Galaxy Tool Development: https://docs.galaxyproject.org/en/latest/dev/
- Planemo Documentation: https://planemo.readthedocs.io/
- IUC Standards: https://galaxy-iuc-standards.readthedocs.io/
- Galaxy Training: https://training.galaxyproject.org/