Pico-Banana-400k (github.com)

0 points 273 days ago

🤖 AI Summary

Pico-Banana-400K is a large-scale dataset of roughly 400K text–image–edit triplets built to accelerate research in text-guided image editing. Each sample pairs an Open Images source image with a concise, human-like edit instruction (generated by Gemini-2.5-Flash) and an edited result produced by the Nano-Banana model. The collection covers 35 edit operations across eight semantic categories (object-level, scene composition, human-centric, stylistic, text & symbol, pixel/photometric, scale & perspective, spatial/layout), at image resolutions from 512 to 1024 px.

The data is split into ~257K successful single-turn examples for SFT, ~72K successful multi-turn examples, and ~56K “failure” cases retained for preference/reward learning. Edits were filtered by an automatic quality pipeline in which Gemini-2.5-Pro scores each result on Instruction Compliance (40%), Editing Realism (25%), Preservation Balance (20%) and Technical Quality (15%), with a pass threshold of about 0.7.

The dataset's scale, semantic breadth and structured quality control make it useful for supervised fine-tuning, preference-model training, and multi-turn conversational editing research. Retaining failed edits as negative examples supports preference and reward modeling, and the Gemini→Nano-Banana pipeline is an example of a reproducible multimodal generation+evaluation workflow researchers can build on. The dataset is hosted on Apple's CDN and released under CC BY-NC-ND 4.0 (source images under Open Images CC BY 2.0); it provides downloadable manifests and mapping tools for accessing the original imagery under license constraints.
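To make the quality filter concrete, here is a minimal sketch of how the four weighted criteria could combine into a single pass/fail decision. Only the weights and the ~0.7 threshold come from the summary. The sub-score range of 0 to 1, the linear weighted sum, the dictionary keys, and the function and label names (`quality_score`, `route`, `"sft"`, `"preference_negative"`) are assumptions made for illustration, not the dataset's actual pipeline.

```python
# Weights as stated in the summary. The key names are hypothetical.
WEIGHTS = {
    "instruction_compliance": 0.40,
    "editing_realism": 0.25,
    "preservation_balance": 0.20,
    "technical_quality": 0.15,
}

# Approximate pass threshold from the summary ("~0.7").
PASS_THRESHOLD = 0.7


def quality_score(scores: dict[str, float]) -> float:
    """Weighted sum of per-criterion scores, each assumed to lie in [0, 1]."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing criteria: {sorted(missing)}")
    return sum(weight * scores[name] for name, weight in WEIGHTS.items())


def route(scores: dict[str, float]) -> str:
    """Send passing edits to the SFT pool and failing ones to the negatives pool."""
    if quality_score(scores) >= PASS_THRESHOLD:
        return "sft"
    return "preference_negative"


if __name__ == "__main__":
    example = {
        "instruction_compliance": 0.9,
        "editing_realism": 0.8,
        "preservation_balance": 0.7,
        "technical_quality": 0.6,
    }
    print(round(quality_score(example), 2), route(example))
```

Because Instruction Compliance carries the largest weight, an edit that ignores the instruction is hard to rescue with realism or technical quality alone under this scheme.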
