ComfyUI LoRA Training

Guide the user through dataset preparation, training configuration, and evaluation for character LoRAs.

When to Train vs Zero-Shot

| Scenario | Recommendation |
|---|---|
| Need absolute consistency across many images | Train LoRA |
| Building a character series or ongoing project | Train LoRA |
| Quick one-off generation | Use zero-shot (InstantID/PuLID) |
| Limited references (1-5 images) | Use zero-shot |
| Testing concepts | Use zero-shot first, train if committing |

Training Pipeline

1. DATASET PREP
   |-- Collect/generate 15-30 reference images
   |-- Preprocess (crop, resize, diversify styles)
   |-- Caption with trigger word + descriptions
   |
2. CONFIGURE TRAINING
   |-- Select training tool (Kohya/AI-Toolkit/FluxGym)
   |-- Set hyperparameters based on model type
   |-- Configure checkpointing
   |
3. TRAIN
   |-- Monitor loss curve
   |-- Save checkpoints every 250-500 steps
   |
4. EVALUATE
   |-- Test each checkpoint with identical prompts
   |-- Check identity accuracy, flexibility, overfitting
   |-- Select best checkpoint
   |
5. INTEGRATE
   |-- Copy to ComfyUI models/loras/
   |-- Update character profile with trigger word + strength
   |-- Test in full workflow (LoRA + identity method)

Dataset Preparation

Image Requirements

| Aspect | Minimum | Optimal | Maximum |
|---|---|---|---|
| Count | 10-15 | 20-30 | 50+ |
| Resolution | 512x512 | 1024x1024 | - |
| Format | PNG / high-quality JPEG | PNG | - |

Content Diversity Checklist

  • Multiple angles (front, 3/4, profile, back)
  • Various expressions (neutral, smile, serious, laugh, etc.)
  • Different lighting conditions (studio, natural, dramatic)
  • Varied backgrounds (or transparent/solid)
  • Multiple outfits/contexts
  • Some close-ups, some medium shots
  • If from 3D renders: include style variations (see below)

Preprocessing 3D Renders

Problem: Training directly on 3D renders bakes in the "3D" aesthetic.

Solution: Generate style variations first:

  1. Run each render through img2img with varied style prompts
  2. Mix the final dataset: 60% style variations, 40% original renders
  3. Train on the mix so the LoRA learns identity, not the render style
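
A minimal sketch of planning that mix, assuming all original renders are kept and img2img jobs are added until variations reach the 60% share. The function name `plan_style_mix` and the round-robin pairing of renders with style prompts are illustrative assumptions, not part of any training tool; the prompts are the ones this section lists.

```python
from itertools import cycle

# Style prompts taken from this section.
STYLE_PROMPTS = [
    "photorealistic portrait, dslr photo",
    "oil painting portrait",
    "digital illustration",
    "pencil sketch",
    "watercolor portrait",
]


def plan_style_mix(renders, variation_share=0.6):
    """Plan img2img jobs so variations make up `variation_share` of the dataset.

    Hypothetical helper: keeps every original render and returns
    (kept_renders, jobs), where each job is a (render, style_prompt) pair.
    """
    if not 0 < variation_share < 1:
        raise ValueError("variation_share must be between 0 and 1")
    # variations / (variations + originals) = share  =>  solve for variations
    n_variations = round(len(renders) * variation_share / (1 - variation_share))
    sources = cycle(renders)
    styles = cycle(STYLE_PROMPTS)
    jobs = [(next(sources), next(styles)) for _ in range(n_variations)]
    return list(renders), jobs


if __name__ == "__main__":
    kept, jobs = plan_style_mix([f"render_{i:02d}.png" for i in range(8)])
    print(len(kept), "originals,", len(jobs), "style variations")
```

With 8 renders this plans 12 variations, giving a 20-image dataset at the 60/40 split.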

Style prompts for variation:

"photorealistic portrait, dslr photo"
"oil painting portrait"
"digital illustration"
"pencil sketch"
"watercolor portrait"

Captioning Rules

Trigger word: ALWAYS use a unique token as the first word of every caption.

  • Good: sage_character, ohwx_sage, sks_person
  • Bad: woman, redhead, character (too generic)

Caption structure:

{trigger}, {subject type}, {clothing}, {pose}, {setting}, {lighting}, {style}

DO NOT describe face features (let the model learn them):

  • Bad: "woman with green eyes, freckles, auburn hair, defined cheekbones"
  • Good: "sage_character, woman, indoor portrait, wearing blue sweater"

DO describe everything else: clothing, pose, background, lighting, expression.
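
The rules above could be enforced with a small helper like the following sketch. `build_caption`, `face_terms_in`, and the `FACE_TERMS` word list are hypothetical names chosen for illustration; the word list is a rough assumption based on the "Bad" example and should be extended for real use.

```python
# Illustrative word list (an assumption), drawn from the "Bad" example above.
FACE_TERMS = {"eyes", "freckles", "hair", "cheekbones", "nose", "lips", "jawline"}


def build_caption(trigger, subject, *details):
    """Join caption parts with the trigger word first, skipping empty parts."""
    if not trigger or not trigger.strip():
        raise ValueError("a trigger word is required")
    parts = [trigger, subject, *details]
    return ", ".join(p.strip() for p in parts if p and p.strip())


def face_terms_in(caption):
    """Return face-feature words found in a caption, sorted alphabetically."""
    words = {w.strip(",.").lower() for w in caption.split()}
    return sorted(words & FACE_TERMS)


if __name__ == "__main__":
    caption = build_caption(
        "sage_character", "woman", "wearing blue sweater", "indoor portrait"
    )
    print(caption)
    print(face_terms_in("woman with green eyes, freckles, auburn hair"))
```

Any caption for which `face_terms_in` returns a non-empty list is worth a manual look before training.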

Folder Structure

dataset/{character_name}/{repeats}_{trigger_word}/
  001.png + 001.txt
  002.png + 002.txt
  ...

Folder naming: 10_sage_character = each image repeated 10x per epoch.
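
Before training, the layout can be sanity-checked with a short script. This is a sketch that assumes the `{repeats}_{trigger_word}` naming shown above and one `.txt` caption per image; `parse_folder_name` and `check_dataset` are hypothetical helpers, not part of Kohya or AI-Toolkit.

```python
from pathlib import Path

IMAGE_EXTS = {".png", ".jpg", ".jpeg"}


def parse_folder_name(name):
    """Split '10_sage_character' into (10, 'sage_character')."""
    repeats, _, trigger = name.partition("_")
    if not repeats.isdigit() or not trigger:
        raise ValueError(f"expected '<repeats>_<trigger>', got {name!r}")
    return int(repeats), trigger


def check_dataset(folder):
    """Return a list of problems found in one '<repeats>_<trigger>' folder."""
    folder = Path(folder)
    _, trigger = parse_folder_name(folder.name)
    problems = []
    images = sorted(p for p in folder.iterdir() if p.suffix.lower() in IMAGE_EXTS)
    for image in images:
        caption = image.with_suffix(".txt")
        if not caption.exists():
            problems.append(f"{image.name}: missing caption file")
        elif not caption.read_text(encoding="utf-8").strip().startswith(trigger):
            problems.append(f"{image.name}: caption does not start with {trigger!r}")
    return problems


if __name__ == "__main__":
    import sys

    for problem in check_dataset(sys.argv[1]):
        print(problem)
```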

Training Configurations

FLUX LoRA (AI-Toolkit) - Recommended

network:
  type: lora
  linear: 16              # Rank (16-32 for characters)
  linear_alpha: 16         # Alpha = rank for FLUX

train:
  batch_size: 1
  gradient_accumulation_steps: 4
  steps: 1500              # FLUX converges faster
  lr: 4e-4                 # Higher than SDXL
  optimizer: adamw8bit
  dtype: bf16

datasets:
  - resolution: [1024]
    caption_ext: "txt"

sample:
  sample_every: 250
  prompts:
    - "{trigger}, photorealistic portrait"

FLUX training notes:

  • Converges 2-3x faster than SDXL
  • 1000-2000 steps usually sufficient
  • Watch for overfitting (quality plateaus early)
  • Standard training needs about 24GB VRAM; NF4 quantization (SimpleTuner) brings that down to about 9GB

SDXL LoRA (Kohya_ss) - Proven

pretrained_model: "RealVisXL_V5.0.safetensors"
network_dim: 32            # Rank (16-64)
network_alpha: 16          # Usually dim/2
resolution: "1024,1024"
train_batch_size: 1
gradient_accumulation_steps: 4
learning_rate: 0.0001      # 1e-4
lr_scheduler: "cosine_with_restarts"
lr_scheduler_num_cycles: 3
max_train_epochs: 10
optimizer_type: "AdamW8bit"
mixed_precision: "bf16"
enable_bucket: true
min_snr_gamma: 5

Step calculation:

total_steps = (images x repeats x epochs) / batch_size
Target: 1500-3000 steps for SDXL
Example: 20 images x 10 repeats x 5 epochs / 1 = 1000 steps (below target; 8-15 epochs gives 1600-3000 steps)
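
The same arithmetic as a small sketch, useful for finding how many epochs reach the target range. It follows the formula above as written and does not account for gradient accumulation; the function names are illustrative.

```python
import math


def total_steps(images, repeats, epochs, batch_size=1):
    """total_steps = (images x repeats x epochs) / batch_size"""
    return images * repeats * epochs // batch_size


def epochs_for_target(images, repeats, target_steps, batch_size=1):
    """Smallest epoch count that reaches at least `target_steps`."""
    steps_per_epoch = images * repeats / batch_size
    return math.ceil(target_steps / steps_per_epoch)


if __name__ == "__main__":
    print(total_steps(20, 10, 5))            # the worked example above
    print(epochs_for_target(20, 10, 1500))   # epochs needed for the low end of the target
```

For the 20-image, 10-repeat dataset, 8 epochs (1600 steps) is the first setting inside the 1500-3000 target.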

Low VRAM Training (FluxGym / SimpleTuner)

For 12-16GB VRAM:

use_8bit_adam: true
gradient_checkpointing: true
cache_latents_to_disk: true
max_data_loader_n_workers: 0
train_batch_size: 1
gradient_accumulation_steps: 8
quantize_base_model: nf4    # SimpleTuner only

Evaluation Protocol

Test Each Checkpoint

Use identical prompts across all checkpoints:

Prompt 1: "{trigger}, photorealistic portrait, neutral expression"
Prompt 2: "{trigger}, photorealistic portrait, smiling, outdoor"
Prompt 3: "{trigger}, wearing formal suit, standing, office"
Prompt 4: "a person standing in a park"  (WITHOUT trigger - should NOT produce character)
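
One way to keep the comparison fair is to generate the full checkpoint x prompt grid up front. The sketch below assumes a fixed seed per job (an addition not stated above, so that checkpoints differ only in weights); `build_eval_jobs` and the checkpoint file names are hypothetical.

```python
from itertools import product

# The four test prompts from this section; the last one is the control.
EVAL_PROMPTS = [
    "{trigger}, photorealistic portrait, neutral expression",
    "{trigger}, photorealistic portrait, smiling, outdoor",
    "{trigger}, wearing formal suit, standing, office",
    "a person standing in a park",
]


def build_eval_jobs(checkpoints, trigger, seed=42):
    """Return one job dict per (checkpoint, prompt) pair."""
    jobs = []
    for checkpoint, template in product(checkpoints, EVAL_PROMPTS):
        jobs.append({
            "checkpoint": checkpoint,
            "prompt": template.format(trigger=trigger),
            "seed": seed,  # assumption: same seed everywhere
            "is_control": "{trigger}" not in template,
        })
    return jobs


if __name__ == "__main__":
    # Hypothetical checkpoint names, for illustration only.
    for job in build_eval_jobs(["sage_000500", "sage_000750"], "sage_character"):
        print(job)
```

Jobs flagged `is_control` are the ones that should NOT show the character.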

Quality Indicators

Good training:

  • Character recognizable from trigger word alone
  • Responds to different prompts/contexts
  • Doesn't always produce same pose/expression
  • Prompt 4 does NOT produce the character

Overfitting signs:

  • Same exact pose/expression regardless of prompt
  • Training backgrounds appearing in outputs
  • Ignores clothing/setting prompts
  • Prompt 4 produces the character (too strong)

Best Checkpoint Selection

With 1500 steps and a checkpoint saved every 250 steps (matching sample_every: 250):

  • Checkpoint 250: Usually underfit
  • Checkpoint 500-750: Often sweet spot for FLUX
  • Checkpoint 1000-1500: May be overfitting

Compare the samples visually and select the checkpoint with the best balance of identity accuracy and prompt flexibility.
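
If the visual comparison is recorded as manual scores, the selection can be made repeatable. This is a rough heuristic sketch, not an established metric: it assumes 0-5 ratings for identity and flexibility, rejects any checkpoint whose control prompt leaked the character, and prefers the earlier checkpoint on ties. All names are hypothetical.

```python
def checkpoint_steps(total_steps, save_every):
    """List the step numbers at which checkpoints are saved."""
    return list(range(save_every, total_steps + 1, save_every))


def pick_checkpoint(ratings):
    """Pick a step from {step: (identity, flexibility, control_leaked)}.

    "Balance" is scored as min(identity, flexibility); checkpoints where the
    no-trigger control prompt produced the character are excluded.
    """
    scores = {
        step: min(identity, flexibility)
        for step, (identity, flexibility, leaked) in ratings.items()
        if not leaked
    }
    if not scores:
        return None
    # Highest balance wins; on ties the earlier (less trained) step wins.
    return max(scores, key=lambda step: (scores[step], -step))


if __name__ == "__main__":
    print(checkpoint_steps(1500, 250))
    manual_ratings = {  # example scores, made up for illustration
        250: (2, 5, False),
        500: (4, 4, False),
        750: (4, 4, False),
        1000: (5, 3, False),
        1500: (5, 1, True),
    }
    print(pick_checkpoint(manual_ratings))
```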

Post-Training Integration

  1. Copy best checkpoint to {ComfyUI}/models/loras/
  2. Update character profile:
    lora:
      trained: true
      model_file: "sage_character_flux.safetensors"
      trigger_word: "sage_character"
      best_strength: 0.8
    
  3. Test in full workflow: LoRA (0.7-0.9) + PuLID/IP-Adapter (0.5-0.7)
  4. Record successful settings in character's generation_history
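
Steps 1 and 2 can be scripted. The sketch below assumes the standard `models/loras` folder under the ComfyUI root and returns the profile entry as a plain dict; `install_lora` is a hypothetical helper, and writing the dict into the character profile file is left to the caller.

```python
import shutil
from pathlib import Path


def install_lora(checkpoint, comfyui_root, trigger_word, best_strength=0.8):
    """Copy a trained LoRA into ComfyUI and return the profile entry for it."""
    checkpoint = Path(checkpoint)
    lora_dir = Path(comfyui_root) / "models" / "loras"
    lora_dir.mkdir(parents=True, exist_ok=True)
    target = lora_dir / checkpoint.name
    shutil.copy2(checkpoint, target)  # keeps the original file in place
    return {
        "lora": {
            "trained": True,
            "model_file": target.name,
            "trigger_word": trigger_word,
            "best_strength": best_strength,
        }
    }


if __name__ == "__main__":
    import sys

    # usage: python install_lora.py <checkpoint> <ComfyUI root> <trigger word>
    print(install_lora(sys.argv[1], sys.argv[2], sys.argv[3]))
```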

Combining LoRA with Zero-Shot Methods

Best practice: LoRA as base identity, zero-shot for enhancement.

[Load Checkpoint] → [Load LoRA (0.7-0.9)] → [Apply PuLID/IP-Adapter (0.5-0.7)] → [Generate]

Keeping both weights below 1.0 prevents the two methods from conflicting while still reinforcing identity.
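
The first two stages of that chain can be sketched as a fragment in ComfyUI's API (prompt) format, using the built-in `CheckpointLoaderSimple` and `LoraLoader` nodes. This assumes an SDXL-style single-file checkpoint; FLUX setups often load the model through other loader nodes. The PuLID/IP-Adapter stage is omitted because those nodes come from custom node packs with their own inputs. `lora_graph` and the LoRA file name in the usage example are hypothetical.

```python
def lora_graph(ckpt_name, lora_name, lora_strength=0.8):
    """Build [Load Checkpoint] -> [Load LoRA] as an API-format node dict.

    Links are [source_node_id, output_index]; CheckpointLoaderSimple outputs
    MODEL at index 0 and CLIP at index 1.
    """
    return {
        "1": {
            "class_type": "CheckpointLoaderSimple",
            "inputs": {"ckpt_name": ckpt_name},
        },
        "2": {
            "class_type": "LoraLoader",
            "inputs": {
                "model": ["1", 0],
                "clip": ["1", 1],
                "lora_name": lora_name,
                "strength_model": lora_strength,
                "strength_clip": lora_strength,
            },
        },
    }


if __name__ == "__main__":
    import json

    graph = lora_graph("RealVisXL_V5.0.safetensors", "sage_character_sdxl.safetensors")
    print(json.dumps(graph, indent=2))
```

Downstream nodes (identity adapter, text encoders, KSampler) would then take their model and clip inputs from node "2" rather than from the checkpoint loader.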

Troubleshooting

| Issue | Solution |
|---|---|
| LoRA not activating | Check trigger word spelling; ensure the LoRA is loaded before KSampler |
| Identity drift at angles | Add more angle variety to dataset, reduce network_dim |
| Overfitting | Reduce epochs, increase dataset, lower network_dim |
| Style contamination | Diversify dataset styles; caption each image's style (per the caption structure) so it is not absorbed into the trigger word |
| Poor quality/artifacts | Check training images for compression, reduce LR |

Reference

  • references/lora-training.md - Full parameter reference
  • references/models.md - Training tool download links
  • Character profiles in projects/ for trigger words and reference images