RefCOCO-M: a refreshed RefCOCO segmentation dataset with better data (huggingface.co)

🤖 AI Summary
RefCOCO-M is a refreshed version of the RefCOCO-style referring-expression segmentation benchmark that rehomes and cleans RefCOCO annotations on COCO_train2014 images. The release bundles per-instance COCO-style metadata (image IDs, category/supercategory, bounding boxes), multiple referring sentences per instance, and pixel masks encoded with COCO RLE (counts + size). Sample entries show diverse object classes (people, cats, furniture, kitchenware) with precise bounding boxes and high-resolution RLE masks, indicating a focus on pixel-accurate supervision for referring segmentation tasks.

Why this matters: noisy or inconsistent masks and mismatched text–mask pairs have long limited progress on referring segmentation and multimodal grounding. RefCOCO-M's improvements (cleaner, more consistent segmentation masks; corrected boxes and instance IDs; and standardized annotation fields) reduce label noise and improve measurement fidelity for IoU/mIoU and grounding metrics.

Practically, this makes training and benchmarking vision–language segmentation models more reliable and reproducible, supports pixel-level supervision for transformer- and CNN-based architectures, and eases integration with existing COCO/RefCOCO splits thanks to retained image IDs and formats. Researchers should see more stable evaluation curves and better transfer when using RefCOCO-M for referring-expression understanding and related multimodal tasks.
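For readers who want to poke at the data, the sketch below shows one plausible way to load the release and decode a COCO RLE mask. It is a minimal example under stated assumptions: the Hub repo id ("RefCOCO-M") and field names ("image_id", "category", "bbox", "sentences", "segmentation") are illustrative stand-ins for whatever the actual dataset card specifies, not confirmed by the release; only the RLE decoding via pycocotools reflects the standard COCO convention mentioned in the summary.

```python
# Minimal sketch: load the dataset and decode one COCO RLE mask.
# Assumptions (not confirmed by the release): the Hub repo id and the
# exact field names below are hypothetical; check the dataset card.
from datasets import load_dataset
from pycocotools import mask as mask_utils

ds = load_dataset("RefCOCO-M", split="train")  # hypothetical repo id

ex = ds[0]
rle = ex["segmentation"]               # COCO RLE: {"counts": ..., "size": [H, W]}
binary_mask = mask_utils.decode(rle)   # H x W uint8 array, 1 = object pixels

print(ex["image_id"], ex["category"], ex["bbox"])   # per-instance COCO-style metadata
print(ex["sentences"])                               # referring expressions for this instance
print("mask pixels:", int(binary_mask.sum()), "of", binary_mask.size)
```

A decoded binary mask like this can be compared directly against model predictions with standard intersection-over-union, which is the measurement the cleaner masks are meant to make more trustworthy.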