🤖 AI Summary
Researchers introduce Generative Sign-description Prompts (GSP), a novel way to integrate generative LLMs into skeleton-based sign language recognition (SLR). GSP uses retrieval-augmented generation with domain-specific LLMs and expert-validated knowledge bases to produce multipart, redundancy-reduced textual descriptions of signs. Those descriptions feed a Multi-positive Contrastive (MC) learning framework that aligns part-specific skeleton features with multiple text positives via a text-conditioned multi-positive alignment and a hierarchical part contrastive loss. The pipeline, comprising a part-specific skeleton encoder, a text encoder, and a composite training objective, yields state-of-the-art accuracy on isolated-sign benchmarks (97.1% on Chinese SLR-500 and 97.07% on Turkish AUTSL) while keeping inference entirely skeleton-based for real-time efficiency.
This approach is significant because it grounds skeleton representations in fine-grained, semantically rich text generated by LLMs, improving disambiguation of visually similar signs and enabling many-to-many mappings between motions and descriptions. Key implications: (1) LLM-generated prompts act as multiple positive samples in contrastive learning, boosting robustness and generalization across languages; (2) the LLM/text encoder expense is a one-time offline cost used only in training, preserving lightweight inference; (3) limitations include evaluation on isolated signs (not continuous SLR), sensitivity to pose-estimation quality, and limited cross-linguistic testing. Future work aims to extend to continuous sign recognition and robustness under real-world pose noise.
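The core training idea, treating several LLM-generated descriptions as simultaneous positives for one skeleton embedding, can be sketched as a multi-positive InfoNCE-style loss. This is an illustrative simplification, not the paper's exact formulation: the function name, the NumPy implementation, and the single-part (non-hierarchical) setup are all assumptions for clarity.

```python
import numpy as np

def multi_positive_nce(skel_feat, text_feats, pos_mask, temperature=0.07):
    """Hypothetical sketch: contrast one skeleton embedding against a
    batch of text-description embeddings, where several entries (marked
    by pos_mask) are positives for the same sign."""
    # L2-normalize so dot products become cosine similarities
    s = skel_feat / np.linalg.norm(skel_feat)
    t = text_feats / np.linalg.norm(text_feats, axis=1, keepdims=True)
    logits = t @ s / temperature
    # log-softmax over all candidate descriptions in the batch
    log_prob = logits - np.log(np.exp(logits - logits.max()).sum()) - logits.max()
    # average negative log-likelihood over all positive descriptions
    return -log_prob[pos_mask].mean()

# Toy check: a skeleton embedding aligned with two of three descriptions
skel = np.array([1.0, 0.0])
texts = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
aligned = multi_positive_nce(skel, texts, np.array([True, True, False]))
misaligned = multi_positive_nce(skel, texts, np.array([False, False, True]))
```

With the aligned positive mask the loss is lower than with the misaligned one, which is the gradient signal that pulls part-specific skeleton features toward their multiple text positives during training.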