ComfyUI LoRA Training

Guide the user through dataset preparation, training configuration, and evaluation for character LoRAs.

When to Train vs Zero-Shot

Scenario	Recommendation
Need absolute consistency across many images	Train LoRA
Building a character series or ongoing project	Train LoRA
Quick one-off generation	Use zero-shot (InstantID/PuLID)
Limited references (1-5 images)	Use zero-shot
Testing concepts	Use zero-shot first; train once you commit to the character

Training Pipeline

1. DATASET PREP
   |-- Collect/generate 15-30 reference images
   |-- Preprocess (crop, resize, diversify styles)
   |-- Caption with trigger word + descriptions
   |
2. CONFIGURE TRAINING
   |-- Select training tool (Kohya/AI-Toolkit/FluxGym)
   |-- Set hyperparameters based on model type
   |-- Configure checkpointing
   |
3. TRAIN
   |-- Monitor loss curve
   |-- Save checkpoints every 250-500 steps
   |
4. EVALUATE
   |-- Test each checkpoint with identical prompts
   |-- Check identity accuracy, flexibility, overfitting
   |-- Select best checkpoint
   |
5. INTEGRATE
   |-- Copy to ComfyUI models/loras/
   |-- Update character profile with trigger word + strength
   |-- Test in full workflow (LoRA + identity method)

Dataset Preparation

Image Requirements

Aspect	Minimum	Optimal	Maximum
Count	10-15	20-30	50+
Resolution	512x512	1024x1024	-
Format	PNG/high JPEG	PNG	-

Content Diversity Checklist

Multiple angles (front, 3/4, profile, back)
Various expressions (neutral, smile, serious, laugh, etc.)
Different lighting conditions (studio, natural, dramatic)
Varied backgrounds (or transparent/solid)
Multiple outfits/contexts
Some close-ups, some medium shots
If from 3D renders: include style variations (see below)

Preprocessing 3D Renders

Problem: Training directly on 3D renders bakes in the "3D" aesthetic.

Solution: Generate style variations first:

Run each render through img2img with varied style prompts
Mix: 60% style variations, 40% original renders
This teaches identity, not style
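As a rough illustration of the 60/40 mix, the sketch below splits a target dataset size into style variations and original renders, and cycles this section's style prompts across render files. The helper names `plan_style_mix` and `assign_style_prompts` are hypothetical, and the img2img step itself is not shown.

```python
from itertools import cycle

# Style prompts copied from this section's list.
STYLE_PROMPTS = [
    "photorealistic portrait, dslr photo",
    "oil painting portrait",
    "digital illustration",
    "pencil sketch",
    "watercolor portrait",
]


def plan_style_mix(total_images: int, variation_share: float = 0.6) -> dict:
    """Split a target dataset size into style variations and original renders."""
    if total_images <= 0:
        raise ValueError("total_images must be positive")
    variations = round(total_images * variation_share)
    return {
        "style_variations": variations,
        "original_renders": total_images - variations,
    }


def assign_style_prompts(render_names: list) -> list:
    """Pair each render with a style prompt, cycling through STYLE_PROMPTS."""
    return list(zip(render_names, cycle(STYLE_PROMPTS)))
```

For a 20-image dataset, `plan_style_mix(20)` yields 12 style variations and 8 original renders.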

Style prompts for variation:

"photorealistic portrait, dslr photo"
"oil painting portrait"
"digital illustration"
"pencil sketch"
"watercolor portrait"

Captioning Rules

Trigger word: ALWAYS use a unique token as the first word of every caption.

Good: sage_character, ohwx_sage, sks_person
Bad: woman, redhead, character (too generic)

Caption structure:

{trigger}, {subject type}, {clothing}, {pose}, {setting}, {lighting}, {style}

DO NOT describe facial features; leaving them uncaptioned lets the model bind them to the trigger word:

Bad: "woman with green eyes, freckles, auburn hair, defined cheekbones"
Good: "sage_character, woman, indoor portrait, wearing blue sweater"

DO describe everything else: clothing, pose, background, lighting, expression.
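The captioning rules above can be sketched as a small helper that puts the trigger word first and rejects generic triggers. This is a minimal sketch: `build_caption` is a hypothetical function, and `GENERIC_TRIGGERS` only contains the "Bad" examples from this section, not an exhaustive list.

```python
# Generic words taken from the "Bad" trigger examples above (illustrative only).
GENERIC_TRIGGERS = {"woman", "redhead", "character"}


def build_caption(trigger: str, subject_type: str, *details: str) -> str:
    """Join caption parts as: trigger, subject type, then any other details.

    Details are things like clothing, pose, setting, lighting, and style.
    Facial features should be left out by the caller.
    """
    if trigger.lower() in GENERIC_TRIGGERS:
        raise ValueError(f"trigger word {trigger!r} is too generic")
    parts = [trigger, subject_type, *(d for d in details if d)]
    return ", ".join(parts)
```

Calling `build_caption("sage_character", "woman", "indoor portrait", "wearing blue sweater")` reproduces the "Good" caption shown above.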

Folder Structure

dataset/{character_name}/{repeats}_{trigger_word}/
  001.png + 001.txt
  002.png + 002.txt
  ...

Folder naming: 10_sage_character = each image repeated 10x per epoch.
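One way to check this layout before training is sketched below. It builds the folder path from its parts and lists images that have no matching .txt caption. The function names and the set of accepted image extensions are assumptions made for illustration.

```python
from pathlib import Path


def dataset_folder(root, character_name: str, repeats: int, trigger_word: str) -> Path:
    """Build dataset/{character_name}/{repeats}_{trigger_word}/ under root."""
    return Path(root) / character_name / f"{repeats}_{trigger_word}"


def images_missing_captions(folder, image_exts=(".png", ".jpg", ".jpeg")) -> list:
    """Return names of images in folder that lack a same-named .txt caption."""
    folder = Path(folder)
    return sorted(
        p.name
        for p in folder.iterdir()
        if p.suffix.lower() in image_exts and not p.with_suffix(".txt").exists()
    )
```

An empty list from `images_missing_captions` means every image has a caption file next to it.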

Training Configurations

FLUX LoRA (AI-Toolkit) - Recommended

network:
  type: lora
  linear: 16              # Rank (16-32 for characters)
  linear_alpha: 16         # Alpha = rank for FLUX

train:
  batch_size: 1
  gradient_accumulation_steps: 4
  steps: 1500              # FLUX converges faster
  lr: 4e-4                 # Higher than SDXL
  optimizer: adamw8bit
  dtype: bf16

datasets:
  - resolution: [1024]
    caption_ext: "txt"

sample:
  sample_every: 250
  prompts:
    - "{trigger}, photorealistic portrait"

FLUX training notes:

Converges 2-3x faster than SDXL
1000-2000 steps usually sufficient
Watch for overfitting: quality plateaus early, so later checkpoints are not necessarily better
VRAM: about 24GB for standard training, about 9GB with NF4 quantization (SimpleTuner)

SDXL LoRA (Kohya_ss) - Proven

pretrained_model: "RealVisXL_V5.0.safetensors"
network_dim: 32            # Rank (16-64)
network_alpha: 16          # Usually dim/2
resolution: "1024,1024"
train_batch_size: 1
gradient_accumulation_steps: 4
learning_rate: 0.0001      # 1e-4
lr_scheduler: "cosine_with_restarts"
lr_scheduler_num_cycles: 3
max_train_epochs: 10
optimizer_type: "AdamW8bit"
mixed_precision: "bf16"
enable_bucket: true
min_snr_gamma: 5

Step calculation:

total_steps = (images x repeats x epochs) / batch_size
Target: 1500-3000 steps for SDXL
Example: 20 images x 10 repeats x 5 epochs / 1 = 1000 steps (below the SDXL target; 10 epochs gives 2000 steps)
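The step formula can be checked with a short sketch like the one below. It mirrors the formula as written and does not account for gradient_accumulation_steps; how a given trainer counts accumulated steps is an open assumption here. Both function names are hypothetical.

```python
import math


def total_steps(images: int, repeats: int, epochs: int, batch_size: int = 1) -> int:
    """total_steps = (images x repeats x epochs) / batch_size, rounded up."""
    return math.ceil(images * repeats * epochs / batch_size)


def epochs_for_target(images: int, repeats: int, target_steps: int, batch_size: int = 1) -> int:
    """Smallest whole number of epochs that reaches target_steps."""
    steps_per_epoch = math.ceil(images * repeats / batch_size)
    return math.ceil(target_steps / steps_per_epoch)
```

With 20 images and 10 repeats, `epochs_for_target(20, 10, 1500)` returns 8, which gives 1600 steps at batch size 1.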

Low VRAM Training (FluxGym / SimpleTuner)

For 12-16GB VRAM:

use_8bit_adam: true
gradient_checkpointing: true
cache_latents_to_disk: true
max_data_loader_n_workers: 0
train_batch_size: 1
gradient_accumulation_steps: 8
quantize_base_model: nf4    # SimpleTuner only

Evaluation Protocol

Test Each Checkpoint

Use identical prompts across all checkpoints:

Prompt 1: "{trigger}, photorealistic portrait, neutral expression"
Prompt 2: "{trigger}, photorealistic portrait, smiling, outdoor"
Prompt 3: "{trigger}, wearing formal suit, standing, office"
Prompt 4: "a person standing in a park"  (WITHOUT trigger - should NOT produce character)

Quality Indicators

Good training:

Character recognizable from trigger word alone
Responds to different prompts/contexts
Doesn't always produce same pose/expression
Prompt 4 does NOT produce the character

Overfitting signs:

Same exact pose/expression regardless of prompt
Training backgrounds appearing in outputs
Ignores clothing/setting prompts
Prompt 4 produces the character (too strong)

Best Checkpoint Selection

If using sample_every: 250 with 1500 steps (and saving checkpoints at the same interval):

Checkpoint 250: Usually underfit
Checkpoint 500-750: Often sweet spot for FLUX
Checkpoint 1000-1500: May be overfitting

Compare visually and select the checkpoint with best identity + prompt flexibility balance.
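If the visual comparison is recorded as manual scores, the "balance" criterion could be approximated as below. This is only a sketch: the 1-5 scoring scale, the field names, and the rule of ranking by the weaker of the two scores are all assumptions, not part of the protocol above.

```python
def pick_checkpoint(scores: dict) -> int:
    """Pick the step whose weaker score (identity vs flexibility) is highest.

    scores maps step -> {"identity": number, "flexibility": number,
                         "leaks_without_trigger": bool}.
    Checkpoints that produce the character without the trigger are skipped.
    Ties are broken by the higher combined score, then by the earlier step.
    """
    candidates = {
        step: s for step, s in scores.items() if not s["leaks_without_trigger"]
    }
    if not candidates:
        raise ValueError("every checkpoint leaks the character without the trigger word")

    def balance(step):
        s = candidates[step]
        return (
            min(s["identity"], s["flexibility"]),
            s["identity"] + s["flexibility"],
            -step,
        )

    return max(candidates, key=balance)
```

Ranking by the weaker score favors checkpoints that are acceptable on both axes over ones that excel at identity but ignore prompts.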

Post-Training Integration

Copy best checkpoint to {ComfyUI}/models/loras/

Update character profile:

lora:
  trained: true
  model_file: "sage_character_flux.safetensors"
  trigger_word: "sage_character"
  best_strength: 0.8

Test in full workflow: LoRA (strength 0.7-0.9) + PuLID/IP-Adapter (weight 0.5-0.7)
Record successful settings in the character's generation_history
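The copy and profile steps might be scripted roughly as follows. This is a sketch under stated assumptions: `install_lora` and `lora_profile_entry` are hypothetical helpers, the ComfyUI root path is supplied by the caller, and the returned dictionary simply mirrors the profile fields shown above without writing any file.

```python
import shutil
from pathlib import Path


def install_lora(checkpoint, comfyui_root) -> Path:
    """Copy a trained checkpoint into {ComfyUI}/models/loras/ and return its new path."""
    dest_dir = Path(comfyui_root) / "models" / "loras"
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / Path(checkpoint).name
    shutil.copy2(checkpoint, dest)
    return dest


def lora_profile_entry(model_file: str, trigger_word: str, best_strength: float) -> dict:
    """Build the lora section of a character profile as a plain dictionary."""
    return {
        "lora": {
            "trained": True,
            "model_file": model_file,
            "trigger_word": trigger_word,
            "best_strength": best_strength,
        }
    }
```

The dictionary can then be merged into the character profile in whatever format the project uses.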

Combining LoRA with Zero-Shot Methods

Best practice: LoRA as base identity, zero-shot for enhancement.

[Load Checkpoint] → [Load LoRA (0.7-0.9)] → [Apply PuLID/IP-Adapter (0.5-0.7)] → [Generate]

Lowering the weights on both prevents them from conflicting while still reinforcing identity.

Troubleshooting

Issue	Solution
LoRA not activating	Check the trigger word spelling and ensure the LoRA loader is connected upstream of the KSampler
Identity drift at angles	Add more angle variety to dataset, reduce network_dim
Overfitting	Reduce epochs, increase dataset, lower network_dim
Style contamination	Diversify dataset styles and name the style in each caption so it is not absorbed into the trigger word
Poor quality/artifacts	Check training images for compression, reduce LR

Reference

references/lora-training.md - Full parameter reference
references/models.md - Training tool download links
Character profiles in projects/ for trigger words and reference images
