🤖 AI Summary
A new benchmark evaluates image-guided generative editing models on one-shot transfers. Each transfer combines a “fit pic” (the subject/garment) with a “query” flat‑lay pose and uses a templated text prompt, and it must preserve identity-defining details such as text, logos, printed graphics and complex patterns. Tasks span graphic reconstruction, pattern reconstruction, small-segment enhancement and multi-image fusion. All models were tested under the same controlled conditions: the same prompt, best-of-3 generations, 1024×1024 JPEGs, square crops and identical test images. Nano Banana Pro led with 8 of 12 cases passing, followed by Nano Banana (7/12), GPT Image‑1 (4/12), Seedream 4 (3/12) and Qwen (2/12). The benchmark favors consistency over isolated best outputs in order to stress true one-shot performance.
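To make the scoreboard easier to compare, the reported pass counts can be turned into pass rates. The sketch below uses only the counts stated above (12 cases per model). The helper name `leaderboard` is hypothetical, and the sketch does not model how the benchmark scores each best-of-3 attempt, because the summary does not specify that.

```python
# Passing cases out of 12, as reported in the benchmark summary.
# The scoring rule behind each pass/fail is NOT modeled here.
TOTAL_CASES = 12
PASSES = {
    "Nano Banana Pro": 8,
    "Nano Banana": 7,
    "GPT Image-1": 4,
    "Seedream 4": 3,
    "Qwen": 2,
}


def leaderboard(passes, total):
    """Return (model, passes, pass_rate_percent) rows, best first (hypothetical helper)."""
    rows = [(model, n, 100.0 * n / total) for model, n in passes.items()]
    # Sort by pass count, highest first.
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    for model, n, rate in leaderboard(PASSES, TOTAL_CASES):
        print(f"{model:<16} {n:>2}/{TOTAL_CASES}  {rate:5.1f}%")
```

With only 12 cases, one case is worth about 8.3 percentage points. The 66.7% vs 58.3% gap between Nano Banana Pro and Nano Banana therefore amounts to a single test case.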
Technically, the Nano Banana family excels at preserving logos and patterns and at pose transfer in single-shot edits, with strong color fidelity and clean background transfer. Its failure modes include minor edge sharpening, placement and scale errors, and trouble reconstructing text seen from multiple angles. GPT Image‑1 was strong at multi-image alignment but weaker on graphic and text fidelity, which suggests it may benefit from few-shot inputs. Key implications: the current top models are production‑viable for one-shot e‑commerce edits but trade quality against cost and latency (Nano Banana Pro’s quality advantage is likely offset by its higher inference cost), and robust few‑shot/multi‑image handling remains the biggest unmet need. Future work should expand few‑shot and multi‑image tests to better reflect real app scenarios.