
Apple has released Pico-Banana-400K, a massive, high-quality dataset of nearly 400,000 image editing examples. The new dataset, detailed in an academic paper posted on October 23, 2025, was built by Apple researchers including Yusu Qian, Jialing Tong, and Zhe Gan. This matters because the AI community has been held back by a lack of large-scale, open, and realistic datasets: most previous ones were synthetic, low-quality, or built with proprietary models. Apple’s new resource, built from real photographs, is designed to be a robust foundation for training the next generation of text-guided image editing models, capable of everything from simple touch-ups to complex, multi-step creative projects.
How Pico-Banana-400K was built

Instead of the old, expensive method of paying humans to manually edit hundreds of thousands of images, Apple’s team created a sophisticated, automated pipeline using other powerful AI models. First, they sourced real photographs from the OpenImages collection. Then, they used Google’s Nano-Banana model to generate a diverse range of edits based on a comprehensive taxonomy of 35 different edit types, from “change color” to “apply seasonal transformation.”
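In code terms, the generation loop is simple. The sketch below is illustrative only: the two taxonomy entries shown are the examples named above, and edit_model is a hypothetical callable standing in for a request to Nano-Banana, not Apple’s actual pipeline code.

```python
import random

# Illustrative sketch of the automated generation loop. EDIT_TAXONOMY shows
# 2 of the 35 edit types; edit_model is a hypothetical callable standing in
# for the Nano-Banana editing model.
EDIT_TAXONOMY = [
    "change color",
    "apply seasonal transformation",
    # ...the remaining 33 edit types in the taxonomy
]

def generate_example(source_image, edit_model):
    """Sample an edit type, phrase it as an instruction, and apply the edit."""
    edit_type = random.choice(EDIT_TAXONOMY)
    instruction = f"Edit this photo: {edit_type}."
    edited_image = edit_model(image=source_image, prompt=instruction)
    return {
        "source": source_image,   # real photograph from OpenImages
        "edit_type": edit_type,
        "instruction": instruction,
        "edited": edited_image,   # candidate edit, scored in the next step
    }
```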
But here’s the clever part: to ensure quality, they used another AI, Gemini-2.5-Pro, as an automated “judge.” This judge scored every single edit on four weighted criteria: Instruction Compliance (40%), Seamlessness (25%), Preservation Balance (20%), and Technical Quality (15%). Edits whose weighted score exceeded a 0.7 threshold were labeled “successful”; edits that fell short were kept as “negative examples” rather than discarded. This process produced a high-quality dataset without a single human annotator, at a total cost of about $100,000.
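The weighted rubric translates directly into a few lines of arithmetic. In this sketch, the four weights and the 0.7 threshold come from the paper; the field names and function shapes are illustrative.

```python
# Rubric weights and acceptance threshold as reported in the paper.
WEIGHTS = {
    "instruction_compliance": 0.40,
    "seamlessness": 0.25,
    "preservation_balance": 0.20,
    "technical_quality": 0.15,
}
THRESHOLD = 0.7

def overall_score(scores: dict) -> float:
    """Weighted sum of the judge's per-criterion scores, each in [0, 1]."""
    return sum(WEIGHTS[k] * scores[k] for k in WEIGHTS)

def label_edit(scores: dict) -> str:
    """Edits above the threshold are successes; the rest become negatives."""
    return "successful" if overall_score(scores) > THRESHOLD else "negative_example"

# An edit that narrowly clears the bar:
judge_rating = {
    "instruction_compliance": 0.9,
    "seamlessness": 0.7,
    "preservation_balance": 0.6,
    "technical_quality": 0.5,
}
print(overall_score(judge_rating))  # 0.36 + 0.175 + 0.12 + 0.075 = 0.73
print(label_edit(judge_rating))     # successful
```

Note how the weighting encodes priorities: a seamless but disobedient edit still scores poorly, because instruction compliance alone carries 40% of the grade.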
More than just single edits

The real power of Pico-Banana-400K isn’t just its size; it’s the specialized subsets designed to solve complex research problems. The full dataset includes:

- A single-turn collection of instruction/edit pairs for standard supervised fine-tuning.
- A multi-turn subset in which each instruction builds on the previous edit, supporting the complex, multi-step projects mentioned above.
- A preference subset that pairs successful edits with the failed “negative examples” kept by the judging pipeline, useful for reward modeling and alignment research.
By analyzing the “success rates” of its own pipeline, the Apple team also created a clear map of what AI image editors are good at and where they still fail. Global edits like “add a vintage filter” (90% success) are easy. Object-level edits like “remove this car” (83% success) are pretty good. But edits requiring precise spatial control or symbolic understanding remain “brittle” and are now open problems for researchers to solve.
The hardest tasks? Relocating an object (59% success), changing a font (57% success), and generating caricatures (58% success). By open-sourcing this dataset, Apple is essentially giving the entire AI community a high-quality “gym” to train their models and a clear list of challenges to tackle next.
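Per-type success rates like these fall out of a simple tally over the judge’s verdicts. A minimal sketch, with hypothetical record fields:

```python
from collections import defaultdict

def success_rates(records):
    """Fraction of judged edits labeled successful, per edit type."""
    totals = defaultdict(int)
    wins = defaultdict(int)
    for r in records:
        totals[r["edit_type"]] += 1
        if r["label"] == "successful":
            wins[r["edit_type"]] += 1
    return {t: wins[t] / totals[t] for t in totals}

records = [
    {"edit_type": "relocate object", "label": "successful"},
    {"edit_type": "relocate object", "label": "negative_example"},
    {"edit_type": "add vintage filter", "label": "successful"},
]
print(success_rates(records))
# {'relocate object': 0.5, 'add vintage filter': 1.0}
```

Run over nearly 400,000 judged examples instead of three, a tally like this is how the team produced the difficulty map above.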