🤖 AI Summary
Photoroom has open‑sourced PRX, a text‑to‑image model family, along with its full training pipeline, releasing weights and checkpoints in Hugging Face Diffusers under an Apache‑2.0 license. The release includes multiple variants (base, SFT, distilled) and VAEs at 256/512 px, plus a preview 1024‑px checkpoint. PRX ships as a ready‑to‑use pipeline (PRXPipeline), and the team is publishing a multi‑part blog series documenting architecture choices, acceleration tricks, and post‑training recipes so that the entire process is reproducible, not just the final weights.
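The release names PRXPipeline as the Diffusers entry point; a minimal usage sketch follows. The Hub repo id, dtype, and step count below are illustrative assumptions, not values taken from the release.

```python
def generate(prompt: str, repo_id: str = "Photoroom/prx-512-t2i"):
    """Hedged sketch of text-to-image generation with Diffusers' PRXPipeline.

    PRXPipeline is the class named in the release; `repo_id`, the bfloat16
    dtype, and the 28-step setting are assumptions for illustration only.
    Imports are kept inside the function so the sketch can be read without
    diffusers installed.
    """
    import torch
    from diffusers import PRXPipeline

    pipe = PRXPipeline.from_pretrained(repo_id, torch_dtype=torch.bfloat16)
    pipe = pipe.to("cuda")  # requires a CUDA GPU and the downloaded weights
    return pipe(prompt=prompt, num_inference_steps=28).images[0]
```

Running this downloads the checkpoint from the Hugging Face Hub on first use, following the standard `from_pretrained` pattern shared by all Diffusers pipelines.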
Technically, the 1024‑px preview is a 1.3B‑parameter model trained for ~1.7M steps in under 10 days on 32 H200 GPUs, using REPA with DINOv2 features, the Flux VAE, and T5‑Gemma text embeddings. Photoroom explored several architectures (DiT, UViT, MMDiT, DiT‑Air, and its own MMDiT‑like PRX variant) and training techniques including REPA‑E, contrastive flow matching, TREAD, uniform RoPE, Immiscible, and the Muon optimizer; post‑pretraining work covers LADD distillation, supervised fine‑tuning, and DPO. For researchers and practitioners this matters because it provides an openly documented, efficient training recipe and a benchmarked design space for high‑resolution diffusion models, accelerating reproducible research, community audits, and downstream adaptation and alignment experiments.
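Two of the ingredients above lend themselves to a compact sketch: the linear (rectified‑flow) interpolation that supplies the regression target in flow‑matching training, and a REPA‑style alignment loss against features from a frozen encoder (DINOv2 in PRX's case). The NumPy toy below is a minimal illustration under stated assumptions (linear path, cosine alignment with the projection head omitted); it is not Photoroom's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def interpolate(x0, x1, t):
    """Rectified-flow path x_t = (1 - t) * x0 + t * x1.
    The network's regression target is the constant velocity x1 - x0."""
    t = t[:, None]
    return (1.0 - t) * x0 + t * x1, x1 - x0

def repa_alignment(hidden, feats, eps=1e-8):
    """REPA-style loss: 1 - cosine similarity between transformer hidden
    states and frozen-encoder features, averaged over the batch.
    (The learned projection head from the REPA recipe is omitted here.)"""
    num = np.sum(hidden * feats, axis=-1)
    den = np.linalg.norm(hidden, axis=-1) * np.linalg.norm(feats, axis=-1) + eps
    return float(np.mean(1.0 - num / den))

B, D = 4, 16
noise = rng.standard_normal((B, D))    # x0 ~ N(0, I)
latents = rng.standard_normal((B, D))  # x1: VAE latents of training images
t = rng.uniform(size=B)                # per-sample timesteps in [0, 1)

xt, v_target = interpolate(noise, latents, t)
# A DiT-style model would predict v_hat = f(xt, t, text_emb) and minimize
# ||v_hat - v_target||^2 plus a weighted repa_alignment(...) term.
```

Contrastive flow matching extends this objective with a term that pushes the predicted velocity away from the velocities of mismatched (condition, image) pairs in the batch, which is where the "contrastive" in the name comes from.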