🤖 AI Summary
Researchers introduced Labeling Copilot, a “deep research agent” that automates large-scale, domain-specific dataset curation for computer vision by orchestrating a multimodal large language model and specialized tools. The agent handles discovery, synthesis, and annotation in a single pipeline: Calibrated Discovery finds in-distribution examples from massive unlabeled pools; Controllable Synthesis generates filtered rare-case data; and Consensus Annotation coordinates multiple foundation models with a novel consensus scheme (non-maximum suppression plus voting) to produce accurate labels. Framed as the first end-to-end data curation agent, it targets the main bottleneck in deploying robust vision systems—balancing quality, diversity, and cost when mining web-scale data.
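The summary describes Consensus Annotation only at a high level (proposals from several foundation models fused with non-maximum suppression plus voting), so the following is a minimal sketch of what such a fusion step could look like. The helper names (`consensus_annotate`, `iou`) and the parameters `iou_thr` and `min_votes` are illustrative assumptions, not the authors' actual interface.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def consensus_annotate(per_model_proposals, iou_thr=0.5, min_votes=2):
    """Fuse detections from several foundation models into consensus labels.

    per_model_proposals: one list per model, each containing
        (box, label, score) tuples with box = [x1, y1, x2, y2].
    Returns detections that survive greedy NMS and are supported by at
    least `min_votes` distinct models (a simple cross-model voting rule).
    """
    # Pool all proposals, remembering which model produced each one.
    pool = [(box, label, score, m)
            for m, props in enumerate(per_model_proposals)
            for (box, label, score) in props]
    # Greedy per-label NMS over the pooled proposals, highest score first.
    pool.sort(key=lambda p: p[2], reverse=True)
    kept = []
    for box, label, score, model in pool:
        merged = False
        for entry in kept:
            if entry["label"] == label and iou(box, entry["box"]) >= iou_thr:
                entry["voters"].add(model)  # same object seen by another model
                merged = True
                break
        if not merged:
            kept.append({"box": box, "label": label,
                         "score": score, "voters": {model}})
    # Voting: keep only detections confirmed by enough models.
    return [e for e in kept if len(e["voters"]) >= min_votes]
```

As a quick check, two models that both propose an overlapping "dog" box are merged by NMS and pass the two-model vote, while a box reported by only one model is dropped:

```python
proposals = [
    [([10, 10, 50, 50], "dog", 0.9)],                              # model A
    [([12, 11, 52, 49], "dog", 0.8), ([0, 0, 5, 5], "cat", 0.4)],  # model B
]
print(consensus_annotate(proposals))  # one consensus "dog" box; the lone "cat" box is filtered out
```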
The paper reports large-scale validations with concrete gains: Consensus Annotation yields an average of 14.2 candidate proposals per image on dense COCO images (vs. 7.4 ground-truth objects) and reaches a final annotation mAP of 37.1%. On Open Images, the system discovered 903 new bounding-box categories, expanding labeled class coverage to over 1,500 classes. Calibrated Discovery, tested at the 10M-sample scale, uses an active learning strategy that is up to 40× more computationally efficient than comparable methods. Taken together, these results demonstrate that an agentic, multi-step workflow can scale dataset curation, improve rare-class coverage, and materially reduce compute and annotation costs for industrial-scale vision pipelines.