YOLOE: Real-Time Seeing Anything (docs.ultralytics.com)

0 points 222 days ago

🤖 AI Summary

YOLOE, the latest addition to the YOLO family, adds open-vocabulary detection and segmentation: it identifies arbitrary object classes in real time from text prompts, image prompts, or an internal vocabulary that requires no prompt. Built on the YOLOv10 architecture and influenced by YOLO-World, it improves on its predecessor, YOLO-Worldv2, by 3.5 AP while using a third of the training resources and running inference 1.4× faster. YOLOE-v8-large likewise surpasses YOLOv8-L by 0.1 mAP with nearly four times less training time.

Three architectural components make this possible: Re-parameterizable Region-Text Alignment (RepRTA) for text-prompted detection, the Semantic-Activated Visual Prompt Encoder (SAVPE) for visual prompts, and Lazy Region-Prompt Contrast (LRPC) for prompt-free open-set recognition. Instance segmentation is built in and runs in real time alongside detection. Developers can fine-tune existing YOLOE models or pass text or visual prompts directly from their applications.
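To make "open-vocabulary" concrete, here is a minimal sketch of the general idea behind text-prompted detection: each candidate region is labelled by the prompt whose embedding it most resembles. This is not YOLOE's implementation or the Ultralytics API. The function names, the 0.5 threshold, and the hand-made 3-D vectors are all assumptions for illustration; a real model would produce the embeddings with learned text and image encoders.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def label_regions(region_embeddings, prompt_embeddings, threshold=0.5):
    """Assign each region the best-matching prompt name, or None if no
    prompt scores above `threshold` (hypothetical cutoff)."""
    labels = []
    for region in region_embeddings:
        best_name, best_score = None, threshold
        for name, prompt in prompt_embeddings.items():
            score = cosine(region, prompt)
            if score > best_score:
                best_name, best_score = name, score
        labels.append(best_name)
    return labels


# Toy embeddings, invented for this example only.
prompts = {"person": [1.0, 0.0, 0.0], "bus": [0.0, 1.0, 0.0]}
regions = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1], [0.0, 0.0, 1.0]]

print(label_regions(regions, prompts))  # ['person', 'bus', None]
```

Because the class list is just a dictionary of prompt embeddings, adding a new class means adding one entry rather than retraining, which is the property the summary attributes to prompt-driven detection.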
