🤖 AI Summary
Roboflow released an open-source pipeline that detects, tracks, and identifies NBA players in game video by chaining several state-of-the-art vision models. It targets the core difficulties of game footage: motion blur, occlusion, near-identical uniforms, and moving cameras. RF-DETR-S, fine-tuned on 10 classes, handles multi-class detection. SAM2 handles segmentation-based tracking; it is prompted with the RF-DETR boxes, and its masks are cleaned of disconnected artifacts. SigLIP embeddings, reduced with UMAP and clustered with K-means, group players into teams without supervision. Jersey numbers are read with a fine-tuned SmolVLM2 and a ResNet-32 classifier.
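To make the mask-cleaning step concrete, here is a minimal sketch in plain Python. It assumes that "disconnected artifacts" can be handled by keeping only the largest 4-connected region of a binary mask; the project itself may use a different rule, and the function name `keep_largest_component` is hypothetical.

```python
from collections import deque


def keep_largest_component(mask):
    """Return a copy of a binary mask (list of rows of 0/1) that keeps only
    its largest 4-connected region. Assumption: smaller regions are artifacts."""
    height = len(mask)
    width = len(mask[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    best = []  # pixels of the largest region found so far

    for r in range(height):
        for c in range(width):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill from this unvisited foreground pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            region = []
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(region) > len(best):
                best = region

    cleaned = [[0] * width for _ in range(height)]
    for y, x in best:
        cleaned[y][x] = 1
    return cleaned


# A 2x2 player blob plus one stray pixel on the right edge.
example_mask = [
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
]
cleaned_mask = keep_largest_component(example_mask)
```

In practice the same idea would be applied to each SAM2 mask array per frame; a library routine for connected components would be faster than this pure-Python loop.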
Key technical results and practicalities: player crops are center-cropped and embedded with SigLIP to group teams without manual labels. SmolVLM2, fine-tuned on a 3.6k jersey dataset, rose from 56% to 86% accuracy, while a ResNet-32 classifier reached 93% on the same test set. Detected numbers are matched to SAM2 masks by Intersection over Smaller area (IoS ≥ 0.9), and identities are stabilized by sampling every 5 frames and requiring three consecutive identical predictions. The pipeline runs at roughly 1–2 FPS on an NVIDIA T4, with SAM2 as the main bottleneck. By open-sourcing the code, the project provides a practical reference for sports analytics and a blueprint for combining detection, tracking, vision-language embeddings, and classic classifiers in hard, real-world CV tasks.
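The matching and stabilization rules can be sketched as follows. This is an illustration under stated assumptions, not the project's code: IoS is computed on axis-aligned boxes `(x1, y1, x2, y2)` rather than on SAM2 masks, and the names `ios`, `NumberValidator`, and `SAMPLE_EVERY` are hypothetical. Only the 0.9 threshold, the 5-frame sampling interval, and the three-in-a-row rule come from the summary.

```python
IOS_THRESHOLD = 0.9   # from the summary
SAMPLE_EVERY = 5      # read numbers on every 5th frame (from the summary)


def box_area(box):
    x1, y1, x2, y2 = box
    return max(0, x2 - x1) * max(0, y2 - y1)


def ios(a, b):
    """Intersection over Smaller area for two (x1, y1, x2, y2) boxes."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    smaller = min(box_area(a), box_area(b))
    return (inter_w * inter_h) / smaller if smaller > 0 else 0.0


class NumberValidator:
    """Confirm a jersey number for a track only after `required`
    consecutive identical predictions."""

    def __init__(self, required=3):
        self.required = required
        self.streaks = {}    # track_id -> (last_number, streak_length)
        self.confirmed = {}  # track_id -> confirmed number

    def update(self, track_id, number):
        if track_id in self.confirmed:
            return self.confirmed[track_id]
        last, streak = self.streaks.get(track_id, (None, 0))
        streak = streak + 1 if number == last else 1
        self.streaks[track_id] = (number, streak)
        if streak >= self.required:
            self.confirmed[track_id] = number
        return self.confirmed.get(track_id)


# A number box fully inside a player box scores 1.0; a box that only
# clips the player's edge scores well below the threshold.
player_box = (100, 50, 200, 300)
inside_number = (130, 100, 170, 150)
clipping_number = (190, 100, 230, 150)
inside_score = ios(player_box, inside_number)
clipping_score = ios(player_box, clipping_number)

# One misread ("28") resets the streak, so confirmation needs three more reads.
validator = NumberValidator(required=3)
history = [validator.update(7, n) for n in ["23", "23", "28", "23", "23", "23"]]
```

IoS suits this pairing better than IoU because a jersey number is much smaller than the player it sits on: a number fully contained in a player region has a low IoU but an IoS of 1.0. In a frame loop, `validator.update` would be called only when `frame_index % SAMPLE_EVERY == 0` and `ios(...) >= IOS_THRESHOLD`.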