skills/delphine-l/claude_global/galaxy-tool-wrapping

galaxy-tool-wrapping


Galaxy Tool Wrapping Expert

Expert knowledge for developing Galaxy tool wrappers. Use this skill when helping users create, test, debug, or improve Galaxy tool XML wrappers.

Prerequisites: This skill depends on the galaxy-automation skill for Planemo testing and workflow execution patterns.

When to Use This Skill

  • Creating new Galaxy tool wrappers from scratch
  • Converting command-line tools to Galaxy wrappers
  • Generating .shed.yml files for Tool Shed submission
  • Debugging XML syntax and validation errors
  • Writing Planemo tests for tools
  • Implementing conditional parameters and data types
  • Handling tool dependencies (conda, containers)
  • Creating tool collections and suites
  • Optimizing tool performance and resource allocation
  • Understanding Galaxy datatypes and formats
  • Implementing proper error handling

Core Concepts

Galaxy Tool XML Structure

A Galaxy tool wrapper consists of:

  • <tool> root element with id, name, and version
  • <description> brief tool description
  • <requirements> for dependencies (conda packages, containers)
  • <command> the actual command-line execution
  • <inputs> parameter definitions
  • <outputs> output file specifications
  • <tests> automated tests
  • <help> documentation in reStructuredText
  • <citations> references, usually DOIs (BibTeX entries are also supported)

Tool Shed Metadata (.shed.yml)

Required for publishing tools to the Galaxy Tool Shed:

name: tool_name                  # Match directory name, underscores only
owner: iuc                       # Usually 'iuc' for IUC tools
description: One-line tool description
homepage_url: https://github.com/tool/repo
long_description: |
  Multi-line detailed description.
  Can include features, use cases, and tool suite contents.
remote_repository_url: https://github.com/galaxyproject/tools-iuc/tree/main/tools/tool_name
type: unrestricted
categories:
- Assembly                       # Choose 1-3 relevant categories
- Genomics

See reference.md for comprehensive .shed.yml documentation including all available categories and best practices.
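Before submitting, it can help to sanity-check .shed.yml against the conventions above. The sketch below is a hypothetical helper, not part of Planemo or the Tool Shed. It assumes the file has already been parsed into a dict (for example with PyYAML's yaml.safe_load, which is not in the standard library). It also assumes the required keys are the ones in the example above and that names use lowercase letters, digits and underscores.

```python
import re

# Hypothetical helper (not part of planemo): checks a parsed .shed.yml mapping.
# Assumption: the keys from the example above are the ones we expect to see.
REQUIRED_KEYS = ("name", "owner", "description", "homepage_url",
                 "remote_repository_url", "type", "categories")


def check_shed_metadata(meta: dict, directory_name: str) -> list[str]:
    """Return human-readable problems; an empty list means nothing was flagged."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in meta]

    name = meta.get("name", "")
    # Assumed convention: lowercase letters, digits and underscores only.
    if name and not re.fullmatch(r"[a-z0-9_]+", name):
        problems.append(f"name {name!r} should use lowercase letters, digits and underscores")
    # The repository name should match the tool directory name.
    if name and name != directory_name:
        problems.append(f"name {name!r} does not match directory {directory_name!r}")

    # The example above recommends choosing 1-3 categories.
    categories = meta.get("categories", [])
    if not 1 <= len(categories) <= 3:
        problems.append(f"expected 1-3 categories, found {len(categories)}")
    return problems
```

Usage: `check_shed_metadata(yaml.safe_load(open(".shed.yml")), "tool_name")`. This is a quick pre-check only; `planemo shed_lint` remains the authoritative check.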

Key Components

Command Block:

  • Use Cheetah templating: $variable_name or ${variable_name}
  • Conditional logic: #if $param ... #else ... #end if
  • Loop constructs: #for $item in $collection... #end for
  • Wrap the command in a CDATA section (<![CDATA[ ... ]]>) so characters such as <, > and & need no XML escaping

Cheetah Template Best Practices:

Working around path-handling bugs in conda-packaged scripts that join a directory and a filename without a separator:

<command detect_errors="exit_code"><![CDATA[
    ## Add trailing slash if script concatenates paths without separator
    tool_command
        -o 'output_dir/'  ## Quoted with trailing slash

    ## Script does: output_dir + 'file.txt' → 'output_dir/file.txt' ✓
    ## Without slash: output_dir + 'file.txt' → 'output_dirfile.txt' ✗
]]></command>
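The failure mode the trailing slash works around can be reproduced in a few lines. The functions below are illustrative stand-ins for an upstream script, not code from any real package. `naive_output_path` concatenates strings the way the buggy script does. `safe_output_path` uses `posixpath.join`, which is the robust fix if you can patch upstream.

```python
import posixpath


def naive_output_path(output_dir: str, filename: str) -> str:
    # What the buggy script does: plain string concatenation, no separator.
    return output_dir + filename


def safe_output_path(output_dir: str, filename: str) -> str:
    # posixpath.join inserts "/" only when output_dir does not already end with one.
    return posixpath.join(output_dir, filename)


print(naive_output_path("output_dir", "file.txt"))   # output_dirfile.txt
print(naive_output_path("output_dir/", "file.txt"))  # output_dir/file.txt
print(safe_output_path("output_dir", "file.txt"))    # output_dir/file.txt
```

Passing `'output_dir/'` from the wrapper makes the naive version behave correctly without modifying the package.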

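When to use quotes in the command block (the quotes protect values from the shell, not from Cheetah):

  • Always single-quote data inputs, outputs and text parameters: '$input_file'
  • Quote literal strings with special characters: 'output_dir/'
  • Leave a variable bare only when its value cannot contain spaces or shell metacharacters, such as integer, float or fixed select parameters: $variable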
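To see why quoting matters, the sketch below uses Python's `shlex.split`, which follows POSIX shell word-splitting rules. The path with a space is a hypothetical value. Galaxy's own dataset paths rarely contain spaces, but text parameters and collection element identifiers can.

```python
import shlex

# Hypothetical value containing a space, e.g. from a text parameter.
path = "/data/my sample.fastq"

unquoted = f"tool_command --input {path}"
quoted = f"tool_command --input '{path}'"

# Unquoted: the shell splits the value into two separate arguments.
print(shlex.split(unquoted))  # ['tool_command', '--input', '/data/my', 'sample.fastq']
# Quoted: the value reaches the tool as a single argument.
print(shlex.split(quoted))    # ['tool_command', '--input', '/data/my sample.fastq']

# shlex.quote is the Python-side equivalent when you build a command string yourself.
print(shlex.quote(path))      # '/data/my sample.fastq'
```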

Input Parameters:

  • <param> elements with type, name, label
  • Types: text, integer, float, boolean, select, data, data_collection
  • Optional vs required parameters
  • Validators and sanitizers
  • Conditional parameter display
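Optional parameters usually need an `#if` guard in the command block, so it helps to audit which ones a wrapper declares. The sketch below is a hypothetical helper built on the standard library's ElementTree, and the embedded tool XML is a made-up example. It lists every `<param>` under `<inputs>`, including those nested inside a `<conditional>`, together with its type and `optional` flag. It does not expand macros.

```python
import xml.etree.ElementTree as ET

# Made-up wrapper fragment for illustration.
TOOL_XML = """
<tool id="demo" name="Demo" version="1.0.0">
  <inputs>
    <param name="input" type="data" format="fasta" label="Input"/>
    <conditional name="mode">
      <param name="selector" type="select" label="Mode">
        <option value="fast">Fast</option>
        <option value="slow">Slow</option>
      </param>
      <when value="fast"/>
      <when value="slow">
        <param name="iterations" type="integer" value="10" optional="true" label="Iterations"/>
      </when>
    </conditional>
  </inputs>
</tool>
"""


def list_params(tool_xml: str) -> list[tuple[str, str, bool]]:
    """Return (name, type, optional) for each <param> under <inputs>, in document order."""
    inputs = ET.fromstring(tool_xml.strip()).find("inputs")
    rows = []
    for param in inputs.iter("param"):  # iter() also walks nested conditionals/sections
        optional = param.get("optional", "false").lower() == "true"
        rows.append((param.get("name"), param.get("type"), optional))
    return rows


for name, ptype, optional in list_params(TOOL_XML):
    print(f"{name:12} {ptype:8} optional={optional}")
```

Note that a parameter inside a conditional is referenced with the conditional's name in Cheetah (here `$mode.selector`), which this flat listing does not show.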

Outputs:

  • <data> elements for output files
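  • Output naming: name is the variable used in <command>; label is the dataset name shown in the history and may use ${tool.name} and ${on_string}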
  • Format discovery and conversion
  • Filters for conditional outputs
  • Collections for multiple outputs

Tests:

  • Input parameters and files
  • Expected output files or assertions
  • Test data location and organization
  • See testing.md for detailed testing strategies including large file handling

Best Practices

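  1. Always include tests - planemo lint warns when a tool has none, and IUC review expects them
  2. Use consistent versioning - Increment the tool version on every change; IUC tools typically pair the upstream version with a +galaxyN suffix (e.g. 1.2.3+galaxy0) and bump only the suffix for wrapper-only changes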
  3. Specify exact dependencies - Pin conda package versions
  4. Add clear help text - Document all parameters
  5. Handle errors gracefully - Check exit codes, validate inputs
  6. Use collections - For multiple related files
  7. Follow IUC standards - If contributing to the Intergalactic Utilities Commission (IUC) repository
  8. Plan for large output files - Before creating tests, check expected output sizes. If over 1MB, use assertion-based tests (has_size, has_line) instead of full file comparison (see testing.md)
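To apply point 8 to an existing wrapper, a small scan of the test-data directory can show which expected outputs are candidates for assertion-based tests. This is a hypothetical helper. It treats "1MB" as 1 MiB, which is an assumption, so adjust `limit` to your CI's exact threshold.

```python
from pathlib import Path

ONE_MB = 1024 * 1024  # assumption: "1MB" read as 1 MiB


def oversized_test_files(test_data_dir: str, limit: int = ONE_MB) -> list[tuple[str, int]]:
    """Return (relative path, size in bytes) for files larger than `limit`, largest first."""
    root = Path(test_data_dir)
    hits = [
        (str(p.relative_to(root)), p.stat().st_size)
        for p in root.rglob("*")  # recurse into subdirectories
        if p.is_file() and p.stat().st_size > limit
    ]
    return sorted(hits, key=lambda item: item[1], reverse=True)
```

Usage: `oversized_test_files("test-data")`. Each file it returns is a candidate for replacement with has_size or has_line assertions (see testing.md).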

Common Planemo Commands

# Test tool locally
planemo test tool.xml

# Serve tool in local Galaxy
planemo serve tool.xml

# Lint tool for best practices
planemo lint tool.xml

# Update an existing Tool Shed repository (run in the directory containing .shed.yml)
planemo shed_update --shed_target toolshed

# Test with conda
planemo test --conda_auto_init --conda_auto_install tool.xml
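By default, `planemo test` writes its results to tool_test_output.json (troubleshooting.md covers reading it). The sketch below counts results by status. It assumes each entry in the top-level "tests" list carries its outcome under `data["status"]` (for example "success" or "failure"). That layout is an assumption, so check it against the report produced by your Planemo version.

```python
import json
from collections import Counter


def summarize_test_output(path: str = "tool_test_output.json") -> Counter:
    """Count test statuses and print the id of every test that did not succeed.

    Assumed layout: {"tests": [{"id": ..., "data": {"status": ...}}, ...]}
    """
    with open(path) as handle:
        report = json.load(handle)
    statuses = Counter()
    for test in report.get("tests", []):
        status = test.get("data", {}).get("status", "unknown")
        statuses[status] += 1
        if status != "success":
            print(f"{test.get('id', '?')}: {status}")
    return statuses
```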

Output Routing with Symlinks

When a tool writes output to a filename it constructs internally (not $output), use symlinks in the command block to route the file to Galaxy's output variable.

Pattern: Symlink before command execution

<command detect_errors="exit_code"><![CDATA[
    ## Create symlink so tool output lands where Galaxy expects it
    ln -s '$output_variable' 'expected_tool_output_name' &&
    tool_command --input '$input' -o 'expected_tool_output_name'
]]></command>
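The symlink pattern relies on ordinary POSIX behaviour: opening a symlink for writing follows the link, and creates the target if it does not yet exist. The simulation below uses made-up filenames in a temporary directory. It also shows the pattern's limit: a tool that writes a temporary file and renames it onto its output name replaces the link itself, so the new content never reaches Galaxy's dataset.

```python
import os
import tempfile


def demo_symlink_routing() -> tuple[str, str, bool]:
    """Return (dataset after tool write, dataset after a rename, is the name still a link?)."""
    with tempfile.TemporaryDirectory() as workdir:
        # Stand-ins: dataset_42.dat plays $output_variable; the other name is the tool's fixed output.
        galaxy_output = os.path.join(workdir, "dataset_42.dat")
        tool_name = os.path.join(workdir, "expected_tool_output_name")
        os.symlink(galaxy_output, tool_name)  # ln -s '$output_variable' 'expected_tool_output_name'

        with open(tool_name, "w") as fh:      # the tool opens its own filename for writing;
            fh.write("result\n")              # the write follows the link into the dataset
        with open(galaxy_output) as fh:
            after_write = fh.read()

        # Limitation: rename() replaces the directory entry (the link), not its target.
        tmp = os.path.join(workdir, "tmp_out")
        with open(tmp, "w") as fh:
            fh.write("new\n")
        os.replace(tmp, tool_name)
        with open(galaxy_output) as fh:
            after_rename = fh.read()
        return after_write, after_rename, os.path.islink(tool_name)


print(demo_symlink_routing())  # ('result\n', 'result\n', False)
```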

Pattern: Prefix-based output naming

Some tools use --out-prefix where the output filename is prefix + input_filename. The tool constructs the filename internally, so you must predict it and symlink:

<command detect_errors="exit_code"><![CDATA[
    ## make Python's re module available to the template
    #import re
    #set $mangled_input = re.sub(r"[^\w\-\s]", "_", str($input.element_identifier)) + "." + str($input.ext)
    ln -s '$input' '$mangled_input' &&
    ln -s '$output_var' 'myprefix${mangled_input}' &&
    tool_command --input-reads '$mangled_input' -p myprefix
]]></command>

Key points:

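  • The symlink is created before the tool runs, so the tool writes through it into Galaxy's dataset
  • The link name must match the exact filename the tool will produce
  • For prefix mode, output = prefix + getFileName(input), so mangle the input name first and derive the output name from it
  • This only works if the tool opens the file for writing; if it deletes the file first or renames a temporary file onto it, the link is replaced and Galaxy's dataset never receives the output. In that case, run the tool and then mv its output to the Galaxy variable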
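Predicting the prefixed filename is easier to get right when you can test the mangling outside Galaxy. The function below mirrors the `#set` line of the prefix example with the same regular expression. It assumes, as described above, that the tool concatenates the prefix and the input basename with no separator. The function name and sample values are hypothetical.

```python
import re


def predicted_output_name(element_identifier: str, ext: str, prefix: str) -> tuple[str, str]:
    """Return (mangled input name, filename the tool is expected to write)."""
    # Same substitution as the wrapper: anything that is not a word character,
    # hyphen or whitespace (dots included) becomes an underscore.
    mangled = re.sub(r"[^\w\-\s]", "_", element_identifier) + "." + ext
    return mangled, prefix + mangled


print(predicted_output_name("sample 1.fq", "fastqsanger", "myprefix"))
# ('sample 1_fq.fastqsanger', 'myprefixsample 1_fq.fastqsanger')
```

If the real tool strips extensions or adds its own separator, adjust the second return value to match. The link name must equal what the tool writes.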

Using format_source for dynamic output formats

When output format should match the input format (e.g., subsampled reads):

<data name="subsampled_outfile" format_source="input_reads" label="Subsampled reads">
    <filter>output_options["output_type"]["type_selector"] == "subsampled_reads"</filter>
</data>

This is preferable to change_format when the output is always the same format as input. Use change_format when the user explicitly selects the output format.
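A simplified mental model of the two attributes in this example is sketched below. The model is an assumption and does not reproduce Galaxy's actual code path. The `<filter>` body is treated as a Python expression over the tool's parameter values, and the output exists only when it evaluates to true. `format_source` copies the extension of the named input. The parameter dicts and the "stats" selector value are hypothetical.

```python
# Toy model, not Galaxy's implementation.
FILTER = 'output_options["output_type"]["type_selector"] == "subsampled_reads"'


def output_is_created(filter_expr: str, params: dict) -> bool:
    # Empty builtins limits the toy evaluator to the parameter dict (not a security boundary).
    return bool(eval(filter_expr, {"__builtins__": {}}, params))


def resolve_format(output_attrs: dict, input_exts: dict) -> str:
    # format_source copies the named input's extension; otherwise use the fixed format.
    if "format_source" in output_attrs:
        return input_exts[output_attrs["format_source"]]
    return output_attrs["format"]


subsample = {"output_options": {"output_type": {"type_selector": "subsampled_reads"}}}
stats_only = {"output_options": {"output_type": {"type_selector": "stats"}}}

print(output_is_created(FILTER, subsample))   # True
print(output_is_created(FILTER, stats_only))  # False
print(resolve_format({"format_source": "input_reads"}, {"input_reads": "fastqsanger.gz"}))  # fastqsanger.gz
```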

XML Template Example

<tool id="tool_id" name="Tool Name" version="1.0.0">
    <description>Brief description</description>

    <requirements>
        <requirement type="package" version="1.0">package_name</requirement>
    </requirements>

    <command detect_errors="exit_code"><![CDATA[
        tool_command
            --input '$input'
            --output '$output'
            #if $optional_param
                --param '$optional_param'
            #end if
    ]]></command>

    <inputs>
        <param name="input" type="data" format="txt" label="Input file"/>
        <param name="optional_param" type="text" optional="true" label="Optional parameter"/>
    </inputs>

    <outputs>
        <data name="output" format="txt" label="${tool.name} on ${on_string}"/>
    </outputs>

    <tests>
        <test>
            <param name="input" value="test_input.txt"/>
            <output name="output" file="expected_output.txt"/>
        </test>
    </tests>

    <help><![CDATA[
**What it does**

Describe what the tool does.

**Inputs**

- Input file: description

**Outputs**

- Output file: description
    ]]></help>

    <citations>
        <citation type="doi">10.1234/example.doi</citation>
    </citations>
</tool>
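Before running `planemo lint`, a quick structural check can catch a missing section in a template like the one above. The sketch below is a hypothetical pre-check, not a replacement for the real linter. It tests only for the root attributes and child elements listed under "Galaxy Tool XML Structure". Because it does not expand macros, a wrapper that pulls `<requirements>` or `<citations>` from macros.xml will be flagged.

```python
import xml.etree.ElementTree as ET

# Elements and attributes taken from the structure list at the top of this skill.
REQUIRED_ATTRS = ("id", "name", "version")
EXPECTED_CHILDREN = ("description", "requirements", "command", "inputs",
                     "outputs", "tests", "help", "citations")


def quick_check(tool_xml: str) -> list[str]:
    """Return structural problems found in a tool XML string (macros are not expanded)."""
    root = ET.fromstring(tool_xml.strip())
    if root.tag != "tool":
        return [f"root element is <{root.tag}>, expected <tool>"]
    problems = [f"missing attribute: {a}" for a in REQUIRED_ATTRS if not root.get(a)]
    problems += [f"missing element: <{c}>" for c in EXPECTED_CHILDREN if root.find(c) is None]
    if root.find("tests") is not None and root.find("tests/test") is None:
        problems.append("<tests> contains no <test>")
    return problems
```

Usage: `quick_check(open("tool.xml").read())`. An empty list means only that the skeleton is present; `planemo lint` still needs to pass.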

Supporting Documentation

This skill includes detailed reference documentation:

  • reference.md - Comprehensive Galaxy tool wrapping guide with IUC best practices

    • Repository structure standards
    • .shed.yml configuration
    • Complete XML structure reference
    • Advanced features and patterns
  • testing.md - Testing strategies and assertion patterns

    • Regenerating expected test outputs
    • Handling large test files (>1MB CI limit)
    • Size, checksum, and content sampling assertions
    • Workflow for replacing large test files
  • troubleshooting.md - Practical troubleshooting guide

    • Reading tool_test_output.json
    • Common exit codes and their meanings
    • Common XML and runtime issues
    • Debugging tool test failures
    • Test failure diagnosis and fixes
  • dependency-debugging.md - Dependency conflict resolution

    • Using planemo mull for diagnosis
    • Conda solver error interpretation
    • macOS testing considerations
    • Version conflict workflows

These files provide deep technical details that complement the core concepts above.

Related Skills

  • galaxy-automation - BioBlend & Planemo foundation (dependency)
  • galaxy-workflow-development - Building workflows that use these tools
  • conda-recipe - Creating conda packages for tool dependencies
  • bioinformatics-fundamentals - Understanding file formats and data types used in tools

First seen: Jan 24, 2026