🤖 AI Summary
Researchers introduced Labeling Copilot, a “deep research agent” that automates large-scale, domain-specific dataset curation for computer vision by orchestrating a multimodal large language model and specialized tools. The agent handles discovery, synthesis, and annotation in a single pipeline: Calibrated Discovery finds in-distribution examples from massive unlabeled pools; Controllable Synthesis generates filtered rare-case data; and Consensus Annotation coordinates multiple foundation models with a novel consensus scheme (non-maximum suppression plus voting) to produce accurate labels. Framed as the first end-to-end data curation agent, it targets the main bottleneck in deploying robust vision systems—balancing quality, diversity, and cost when mining web-scale data.
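The summary describes Consensus Annotation only at a high level (proposals from several foundation models fused with non-maximum suppression plus voting), so the following is a minimal sketch of what such a fusion step could look like. The helper names (`consensus_annotate`, `iou`) and the parameters `iou_thr` and `min_votes` are illustrative assumptions, not the authors' actual interface.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def consensus_annotate(per_model_proposals, iou_thr=0.5, min_votes=2):
    """Fuse detections from several foundation models into consensus labels.

    per_model_proposals: one list per model, each containing
        (box, label, score) tuples with box = [x1, y1, x2, y2].
    Returns detections that survive greedy NMS and are supported by at
    least `min_votes` distinct models (a simple cross-model voting rule).
    """
    # Pool all proposals, remembering which model produced each one.
    pool = [(box, label, score, m)
            for m, props in enumerate(per_model_proposals)
            for (box, label, score) in props]
    # Greedy per-label NMS over the pooled proposals, highest score first.
    pool.sort(key=lambda p: p[2], reverse=True)
    kept = []
    for box, label, score, model in pool:
        merged = False
        for entry in kept:
            if entry["label"] == label and iou(box, entry["box"]) >= iou_thr:
                entry["voters"].add(model)  # same object seen by another model
                merged = True
                break
        if not merged:
            kept.append({"box": box, "label": label,
                         "score": score, "voters": {model}})
    # Voting: keep only detections confirmed by enough models.
    return [e for e in kept if len(e["voters"]) >= min_votes]
```

As a quick check, two models that both propose an overlapping "dog" box are merged by NMS and pass the two-model vote, while a box reported by only one model is dropped:

```python
proposals = [
    [([10, 10, 50, 50], "dog", 0.9)],                              # model A
    [([12, 11, 52, 49], "dog", 0.8), ([0, 0, 5, 5], "cat", 0.4)],  # model B
]
print(consensus_annotate(proposals))  # one consensus "dog" box; the lone "cat" box is filtered out
```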
The paper reports large-scale validations with concrete gains: Consensus Annotation yields an average of 14.2 candidate proposals per image on dense COCO images (vs. 7.4 ground-truth objects) and reaches a final annotation mAP of 37.1%. On Open Images, the system discovered 903 new bounding-box categories, expanding labeled class coverage to over 1,500 classes. Calibrated Discovery, tested at the 10M-sample scale, uses an active learning strategy that is up to 40× more computationally efficient than comparable methods. Taken together, these results demonstrate that an agentic, multi-step workflow can scale dataset curation, improve rare-class coverage, and materially reduce compute and annotation costs for industrial-scale vision pipelines.