# architecture-design

Architecture Design - ML Project Template
This skill defines the standard code architecture for machine learning projects based on the template structure. When modifying or extending code, follow these patterns to maintain consistency.
## Overview
The project follows a modular, extensible architecture with clear separation of concerns. Each module (data, model, trainer, analysis) is independently organized using factory and registry patterns for maximum flexibility.
## When to Use

Use this skill when:

- Creating a new Dataset class that needs `@register_dataset`
- Creating a new Model class that needs `@register_model`
- Creating a new module directory with `__init__.py` factory wiring
- Initializing a new ML project structure from scratch
- Adding new component types such as Augmentation, CollateFunction, or Metrics
## When Not to Use
Do not use this skill when:
- Modifying existing functions or methods
- Fixing bugs in existing code
- Adding helper functions or utilities
- Refactoring without adding new registrable components
- Making simple code changes to a single file
- Modifying configuration files
- Reading or understanding existing code
Key indicator: if the task does not require a `@register_*` decorator or a factory pattern, skip this skill.
## Core Design Patterns

### Factory Pattern

Each module uses a factory to create instances dynamically:

```python
# Example from data_module/dataset/__init__.py
from typing import Dict

DATASET_FACTORY: Dict = {}

def DatasetFactory(data_name: str):
    dataset = DATASET_FACTORY.get(data_name, None)
    if dataset is None:
        print(f"{data_name} dataset is not implemented; falling back to the simple dataset")
        dataset = DATASET_FACTORY.get('simple')
    return dataset
```

For detailed guidance, refer to `references/factory_pattern.md`.
### Registry Pattern

Components register themselves via decorators:

```python
# Example from data_module/dataset/simple_dataset.py
@register_dataset("simple")
class SimpleDataset(Dataset):
    def __init__(self, data):
        self.data = data
```

For detailed guidance, refer to `references/registry_pattern.md`.
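The `@register_dataset` decorator itself lives in the module's `__init__.py` next to the factory dict. It is not reproduced in this skill, but a minimal sketch of the usual wiring (names follow the factory example above; the template's actual implementation may differ) looks like:

```python
from typing import Dict

DATASET_FACTORY: Dict[str, type] = {}

def register_dataset(name: str):
    """Class decorator factory: register the decorated class under `name`."""
    def decorator(cls: type) -> type:
        DATASET_FACTORY[name] = cls
        return cls  # return the class unchanged so normal usage still works
    return decorator

@register_dataset("simple")
class SimpleDataset:
    def __init__(self, data):
        self.data = data
```

Because the decorator returns the class unchanged, registration is a pure side effect: the class can still be imported and instantiated directly, and `DatasetFactory("simple")` resolves to the same object.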
### Auto-Import Pattern

Modules automatically discover and import their submodules:

```python
# Example from data_module/dataset/__init__.py
import os

models_dir = os.path.dirname(__file__)
import_modules(models_dir, "src.data_module.dataset")
```

For detailed guidance, refer to `references/auto_import.md`.
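The `import_modules` helper belongs to the template's shared utilities and is not shown here. A plausible standard-library sketch, assuming it simply imports every non-underscore `.py` file in the directory so that top-level `@register_*` decorators run, is:

```python
import importlib
import os

def import_modules(modules_dir: str, namespace: str) -> None:
    """Import every non-underscore .py file in `modules_dir` under `namespace`,
    triggering top-level @register_* decorators as a side effect."""
    for filename in sorted(os.listdir(modules_dir)):
        if filename.endswith(".py") and not filename.startswith("_"):
            importlib.import_module(f"{namespace}.{filename[:-3]}")
```

This is what makes the registry pattern zero-maintenance: dropping a new file into the directory is enough, with no explicit import list to update.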
## Directory Structure
```text
project/
├── run/
│   ├── pipeline/            # Main workflow scripts
│   │   ├── training/        # Training pipelines
│   │   ├── prepare_data/    # Data preparation pipelines
│   │   └── analysis/        # Analysis pipelines
│   └── conf/                # Hydra configuration files
│       ├── training/        # Training configs
│       ├── dataset/         # Dataset configs
│       ├── model/           # Model configs
│       ├── prepare_data/    # Data prep configs
│       └── analysis/        # Analysis configs
│
├── src/
│   ├── data_module/         # Data processing module
│   │   ├── dataset/         # Dataset implementations
│   │   ├── augmentation/    # Data augmentation
│   │   ├── collate_fn/      # Collate functions
│   │   ├── compute_metrics/ # Metrics computation
│   │   ├── prepare_data/    # Data preparation logic
│   │   ├── data_func/       # Data utility functions
│   │   └── utils.py         # Module-specific utilities
│   │
│   ├── model_module/        # Model implementations
│   │   ├── brain_decoder/   # Brain decoder models
│   │   └── model/           # Alternative model location
│   │
│   ├── trainer_module/      # Training logic
│   ├── analysis_module/     # Analysis and evaluation
│   ├── llm/                 # LLM-related code
│   └── utils/               # Shared utilities
│
├── data/
│   ├── raw/                 # Original, immutable data
│   ├── processed/           # Cleaned, transformed data
│   └── external/            # Third-party data
│
├── outputs/
│   ├── logs/                # Training and evaluation logs
│   ├── checkpoints/         # Model checkpoints
│   ├── tables/              # Result tables
│   └── figures/             # Plots and visualizations
│
├── pyproject.toml           # Project configuration
├── uv.lock                  # Dependency lock file
├── TODO.md                  # Task tracking
├── README.md                # Project documentation
└── .gitignore               # Git ignore rules
```
For a detailed directory structure with file descriptions, refer to `references/structure.md`.
## Module Organization

### Creating a New Dataset

When adding a new dataset:

1. Create the file in `src/data_module/dataset/`
2. Use the `@register_dataset("name")` decorator
3. Inherit from `torch.utils.data.Dataset`
4. Implement `__init__`, `__len__`, and `__getitem__`
```python
from typing import Dict

import torch
from torch.utils.data import Dataset

from src.data_module.dataset import register_dataset


@register_dataset("custom")
class CustomDataset(Dataset):
    def __init__(self, data):
        self.data = data

    def __len__(self):
        return len(self.data)

    def __getitem__(self, i: int) -> Dict[str, torch.Tensor]:
        return self.data[i]
```
### Creating a New Model

**CRITICAL: Models use a config-driven pattern.**

When adding a new model:

1. Create the file in `src/model_module/model/` or the appropriate module subdirectory
2. Use the `@register_model('ModelName')` decorator
3. `__init__` accepts ONLY the `cfg` parameter - all hyperparameters come from config
4. `forward()` returns a dict: `{"loss": loss, "labels": labels, "logits": logits}`
5. Handle training vs. inference modes using `self.training`
```python
import torch.nn as nn

from src.model_module.brain_decoder import register_model


@register_model('MyModel')
class MyModel(nn.Module):
    def __init__(self, cfg):
        super().__init__()
        self.cfg = cfg
        self.task = cfg.dataset.task
        # ALL parameters come from cfg
        self.hidden_dim = cfg.model.hidden_dim
        self.output_dim = cfg.dataset.target_size[cfg.dataset.task]

    def forward(self, x, labels=None, **kwargs):
        if self.training:
            # Training logic: compute logits and the loss against labels
            ...
        else:
            # Inference logic: compute logits; loss may be None
            ...
        return {"loss": loss, "labels": labels, "logits": logits}
```
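Since `__init__` receives only `cfg`, every hyperparameter must be reachable by attribute access on the config tree. A stand-in built from `types.SimpleNamespace` (purely illustrative; the real object is a Hydra config) demonstrates the access pattern used in the skeleton above:

```python
from types import SimpleNamespace

# Hypothetical config tree mirroring cfg.model.* and cfg.dataset.*
cfg = SimpleNamespace(
    dataset=SimpleNamespace(task="classify", target_size={"classify": 10}),
    model=SimpleNamespace(hidden_dim=128),
)

hidden_dim = cfg.model.hidden_dim                        # hidden size from the model config
output_dim = cfg.dataset.target_size[cfg.dataset.task]   # output size keyed by the active task
```

Keeping every constructor argument inside `cfg` means a model can be swapped by changing config files alone, with no edits to the training pipeline.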
### Adding Data Augmentation

When adding augmentation:

1. Create the file in `src/data_module/augmentation/`
2. Implement the transformation function
3. Register it with the factory if needed
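As a sketch of the shape such a component can take (a hypothetical jitter transform on plain Python sequences; the template's real augmentations would operate on tensors and register via a decorator like the dataset example):

```python
import random

def jitter(sample, scale=0.01, rng=None):
    """Return a copy of a 1-D numeric sample with uniform noise in [-scale, scale] added."""
    rng = rng or random.Random(0)  # fixed seed here only so the sketch is reproducible
    return [x + rng.uniform(-scale, scale) for x in sample]
```

An augmentation should be a pure function of its input (plus an explicit RNG), which keeps it easy to register, compose, and unit-test.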
## Code Style Guidelines

For comprehensive style guidelines, refer to `references/code_style.md`.

Key principles:

- Always use type hints for function signatures
- Follow import order: standard library → third-party → local
- Module `__init__.py` files contain the factory/registry logic
- Model classes must be config-driven
## Configuration Management

The project uses Hydra for configuration management:

- Config files in `run/conf/` are organized by module
- Each stage (training, analysis) has its own config structure
- Use YAML files for all configuration
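For illustration, a dataset config under `run/conf/dataset/` might look like the following (all field names here are hypothetical, not taken from the template; `examples/config_example.yaml` holds a real one):

```yaml
# run/conf/dataset/custom.yaml (hypothetical)
name: custom
task: classify
target_size:
  classify: 10
batch_size: 32
```

Fields placed here surface on the config object as `cfg.dataset.task`, `cfg.dataset.target_size`, and so on, matching the config-driven model pattern described above.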
## When Working on This Project

### Before Modifying Code
- Read the relevant module's factory/registry pattern
- Check existing implementations for consistency
- Follow the established directory structure
- Use registration decorators for new components
### Adding New Features
- Determine which module the feature belongs to
- Check if similar functionality exists
- Follow factory/registry pattern if creating new component types
- Add configuration files if needed
- Update documentation
### Code Review Checklist
- Uses factory/registry pattern appropriately
- Follows module directory structure
- Has proper type annotations
- Imports are correctly ordered
- Registration decorator is used
- Configuration files are added if needed
## Additional Resources

### Reference Files
For detailed information, consult:
- `references/structure.md` - Detailed directory structure with file descriptions
- `references/factory_pattern.md` - Factory pattern in-depth explanation
- `references/registry_pattern.md` - Registry pattern in-depth explanation
- `references/auto_import.md` - Auto-import pattern in-depth explanation
- `references/code_style.md` - Comprehensive code style guidelines
### Example Files

Working examples in `examples/`:
- `examples/custom_dataset.py` - Custom dataset implementation
- `examples/custom_model.py` - Custom model implementation
- `examples/augmentation_example.py` - Data augmentation example
- `examples/config_example.yaml` - Configuration file example
- `examples/pipeline_example.sh` - Pipeline script example