Apple: Pico-Banana-400K: A Large-Scale Dataset for Text-Guided Image Editing (www.arxiv.org)

🤖 AI Summary
Apple released Pico-Banana-400K, a large-scale, shareable dataset of roughly 400K text-guided image-edit examples built from real OpenImages photos to accelerate research on instruction-based image editing. The edits were generated with Nano-Banana (a Gemini-family image editor) and quality-filtered using Gemini-2.5-Pro as an automated judge, with manual curation and retries for failed attempts; the production cost is reported at ≈$100K. Pico-Banana-400K addresses a major bottleneck, the lack of high-quality, diverse editing data based on real images, by providing a systematically curated resource that reduces domain shift and supports reproducible training and benchmarking. The collection comprises ~258K single-turn examples for supervised fine-tuning, 56K preference pairs for alignment and reward-model training, and 72K multi-turn sequences (2–5 edits each) for research on iterative, planned editing. Edits are organized into a 35-type taxonomy across eight categories (pixel/photometric, object semantic, scene composition, stylistic, text/symbol, human-centric, scale, spatial). Each example includes two instruction formats (detailed training prompts from Gemini-2.5-Flash and concise user-style rewrites from Qwen2.5-7B-Instruct) and automated scores on four dimensions: instruction compliance, edit quality, content preservation, and technical quality. Because failed edits are preserved as negatives, Pico-Banana-400K also supports robustness testing, preference learning (e.g., DPO), instruction rewriting, and multi-turn evaluation, making it a pragmatic foundation for next-generation multimodal editing models.
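As a rough illustration of how scored attempts and retained failures could become preference pairs for DPO-style training, here is a minimal Python sketch. It is not the dataset's actual schema or Apple's pipeline: the `EditAttempt` fields, the equal weights, the 0.7 threshold, and the file names are all hypothetical, since the summary names the four scoring dimensions but not how they are combined.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical: equal weights over the four dimensions named in the summary.
WEIGHTS: Dict[str, float] = {
    "instruction_compliance": 0.25,
    "edit_quality": 0.25,
    "content_preservation": 0.25,
    "technical_quality": 0.25,
}
THRESHOLD = 0.7  # hypothetical acceptance cutoff


@dataclass
class EditAttempt:
    """One attempted edit of a source image (illustrative fields only)."""
    source_image: str
    instruction: str
    edited_image: str
    scores: Dict[str, float]  # judge scores in [0, 1], keyed like WEIGHTS


def overall_score(attempt: EditAttempt) -> float:
    """Weighted sum of the per-dimension judge scores."""
    return sum(w * attempt.scores[name] for name, w in WEIGHTS.items())


def preference_pairs(attempts: List[EditAttempt]) -> List[Dict[str, str]]:
    """Pair accepted and rejected edits of the same (image, instruction)."""
    groups: Dict[Tuple[str, str], List[EditAttempt]] = {}
    for a in attempts:
        groups.setdefault((a.source_image, a.instruction), []).append(a)

    pairs: List[Dict[str, str]] = []
    for (image, instruction), group in groups.items():
        accepted = [a for a in group if overall_score(a) >= THRESHOLD]
        rejected = [a for a in group if overall_score(a) < THRESHOLD]
        for good in accepted:
            for bad in rejected:
                pairs.append({
                    "source_image": image,
                    "prompt": instruction,
                    "chosen": good.edited_image,
                    "rejected": bad.edited_image,
                })
    return pairs


if __name__ == "__main__":
    demo = [
        EditAttempt("img1.jpg", "make the sky pink", "img1_a.jpg",
                    {k: 0.9 for k in WEIGHTS}),
        EditAttempt("img1.jpg", "make the sky pink", "img1_b.jpg",
                    {k: 0.4 for k in WEIGHTS}),
    ]
    for pair in preference_pairs(demo):
        print(pair["prompt"], pair["chosen"], pair["rejected"])
```

Instructions that have only successes, or only failures, produce no pairs in this sketch, which is one plausible reason the preference subset (56K) would be much smaller than the single-turn subset (~258K).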