Pico-Banana-400K: A Large-Scale Dataset for Text-Guided Image Editing (arxiv.org)

🤖 AI Summary
Pico-Banana-400K is a newly released, large-scale dataset for text-guided image editing: it contains 400K real-image edit examples derived from OpenImages, in which a multimodal model (Nano-Banana) generated diverse edit pairs guided by natural-language instructions. The authors emphasize quality and diversity over prior synthetic collections, applying a fine-grained editing taxonomy, MLLM-based (multimodal LLM) quality scoring to enforce content preservation and instruction faithfulness, and careful curation.

The release also includes three targeted subsets: a 72K multi-turn set for sequential editing and planning research, a 56K preference set for alignment and reward-model training, and paired long-short instruction examples to support instruction rewriting and summarization.

For the AI/ML community this matters because it provides a large, real-image foundation for training and benchmarking next-generation text-to-edit models and alignment techniques, filling a gap left by smaller or fully synthetic datasets. Key technical implications include better supervised training for multi-step editing, stronger evaluation data for instruction-following and faithfulness metrics, and high-quality preference data for fine-tuning reward models. By combining scale, curated diversity, and multi-turn/preference splits, Pico-Banana-400K lowers a practical barrier to developing models that robustly interpret and execute complex, iterative image-editing instructions.
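To make the subset structure concrete, here is a minimal Python sketch of what records in the three splits might look like and how a consumer could filter them. The field names (`subset`, `turns`, `chosen`, `rejected`, etc.) are illustrative assumptions, not the dataset's actual schema.

```python
# Hypothetical record shapes for the three Pico-Banana-400K subsets.
# All field names below are assumptions for illustration only.
records = [
    # Single-turn edit: one instruction, one edited result.
    {"subset": "single_turn", "image_id": "oi_000123",
     "instruction": "Replace the cloudy sky with a clear sunset.",
     "edited_image_id": "oi_000123_edit1"},
    # Multi-turn sequence: instructions applied in order,
    # useful for sequential editing and planning research.
    {"subset": "multi_turn", "image_id": "oi_000456",
     "turns": ["Crop to the dog.", "Add a red collar.", "Brighten the scene."]},
    # Preference pair: two candidate edits with one preferred,
    # the shape reward-model training typically expects.
    {"subset": "preference", "image_id": "oi_000789",
     "instruction": "Remove the background crowd.",
     "chosen": "edit_a", "rejected": "edit_b"},
]

def by_subset(records, name):
    """Return only the records belonging to the named subset."""
    return [r for r in records if r["subset"] == name]

multi = by_subset(records, "multi_turn")
print(len(multi), multi[0]["turns"][0])
```

The point of the shape distinction is that the preference split pairs a chosen and a rejected edit per instruction, while the multi-turn split stores an ordered instruction list, so training pipelines for reward modeling and sequential editing consume them differently.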