🤖 AI Summary
A hobby project documented the author’s experiments finetuning an open-source latent diffusion model (Flux.1-dev, ~12B) so it would reliably generate images of the author. Rather than training from scratch, they used existing tooling (SimpleTuner) and tried three finetuning strategies: DreamBooth/full finetune, LoRA, and another full finetune with a lower LR. Data was small (3–50 face images), captions were auto-generated with BLIP and manually adjusted, and training ran on a single A100. Key runs: a DreamBooth run with an accidentally high LR (1e-3) severely overfit; a LoRA (rank 32, LR 1e-4, batch 8, 2k steps) trained overnight produced the best balance of likeness and generalization; a conservative full finetune (1e-7) produced little change. Generation used only ~20 denoising steps (suboptimal), causing artifacts.
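The efficiency gap between the full-finetune and LoRA runs comes down to parameter counts. A minimal NumPy sketch of a rank-32 LoRA update on a single linear layer illustrates it; the layer width (3072) is an illustrative assumption, not the actual Flux.1-dev dimension:

```python
import numpy as np

# Hypothetical layer size for illustration; the real Flux.1-dev
# transformer dimensions are an assumption here.
d_in, d_out, rank = 3072, 3072, 32
alpha = 32  # common choice: alpha == rank, giving scaling factor 1.0

rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in)) * 0.02  # frozen base weight
A = rng.standard_normal((rank, d_in)) * 0.02   # trainable down-projection
B = np.zeros((d_out, rank))                    # trainable up-projection, zero-init

def lora_forward(x):
    # Base path plus low-rank update, scaled by alpha/rank.
    return x @ W.T + (alpha / rank) * (x @ A.T @ B.T)

x = rng.standard_normal((1, d_in))
# With B zero-initialized, the adapter starts as a no-op on the base model:
assert np.allclose(lora_forward(x), x @ W.T)

full_params = W.size                 # what a full finetune would update
lora_params = A.size + B.size        # what the LoRA run updates
print(f"full finetune params/layer: {full_params:,}")
print(f"LoRA rank-{rank} params/layer: {lora_params:,} "
      f"({lora_params / full_params:.1%} of full)")
```

At these sizes the rank-32 adapter trains roughly 2% of the layer's parameters, which is why the overnight single-A100 LoRA run was feasible and also less prone to catastrophic forgetting than the full finetunes.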
For the AI/ML community this is a compact, practical comparison of personalization strategies under realistic constraints (single GPU, small dataset). It highlights common pitfalls: catastrophic forgetting from full finetunes, strong sensitivity to learning rate, dataset and caption bias (ambient lighting or clothing getting memorized), and style-transfer limitations when finetuning only on realistic photos. Useful technical takeaways include favoring LoRA for lightweight personalization, interleaving base-model “grounding” images to regularize training, careful caption design, and that tooling (SimpleTuner/accelerate) can be brittle. The author plans to experiment with grounding datasets, combining LoRAs, and testing newer models (e.g., Qwen Image).
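The "interleaving base-model grounding images" takeaway amounts to a dataset-mixing scheme: cycle the small personal set through a larger stream of regularization images. The sketch below is a plain-Python illustration under assumed file names and an assumed 1:3 personal-to-grounding ratio, not the author's actual SimpleTuner configuration:

```python
import itertools

# Placeholder file lists; names and sizes are illustrative assumptions.
personal = [f"face_{i:02d}.jpg" for i in range(12)]      # small personal set
grounding = [f"ground_{i:04d}.jpg" for i in range(48)]   # base-model-style images

def interleave(personal, grounding, ratio=3):
    """Yield one personal image per `ratio` grounding images, cycling the
    smaller personal set so one pass covers all grounding data."""
    cycled = itertools.cycle(personal)
    stream = iter(grounding)
    while True:
        chunk = list(itertools.islice(stream, ratio))
        if not chunk:
            break
        yield next(cycled)
        yield from chunk

mixed = list(interleave(personal, grounding))
print(len(mixed), mixed[:5])
```

Keeping generic images in every batch window like this nudges the model back toward its base distribution between personalization updates, which is the regularization effect the summary describes.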