🤖 AI Summary
Amazon announced Nova Multimodal Embeddings, a unified embedding model on Amazon Bedrock that converts text, documents, images, video and audio into a single semantic space to power cross‑modal retrieval, agentic RAG, and semantic search. Unlike modality‑specific encoders, Nova lets teams index and query mixed‑media content (e.g., product images, brochures with interleaved text/graphics, or audio‑video recordings) with leading out‑of‑the‑box accuracy, reducing the need to stitch together separate pipelines or custom fusion layers.
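To make the unified-space idea concrete, here is a minimal retrieval sketch under stated assumptions: the vectors are random placeholders standing in for real Nova embeddings of a text query, a product image, and a video segment, and the 3072 dimension is simply the example size mentioned below. Because all modalities land in one semantic space, cross-modal ranking reduces to a single cosine-similarity pass.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Placeholder vectors standing in for embeddings produced by the same model
# from different modalities; real values would come from the embeddings API.
query_text_emb = np.random.rand(3072)      # text query embedding (placeholder)
product_image_emb = np.random.rand(3072)   # product image embedding (placeholder)
video_segment_emb = np.random.rand(3072)   # 15-second video segment embedding (placeholder)

candidates = {"product_image": product_image_emb, "video_segment": video_segment_emb}
ranked = sorted(
    candidates.items(),
    key=lambda kv: cosine_similarity(query_text_emb, kv[1]),
    reverse=True,
)
print(ranked[0][0])  # nearest neighbor to the text query, regardless of modality
```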
Technically, Nova supports up to an 8K‑token context, text in roughly 200 languages, synchronous and asynchronous APIs (asynchronous invocation is required for videos larger than 25 MB), and segmentation/chunking for long text, audio, or video (examples show 15‑second video segments). It offers four embedding output dimensions trained with Matryoshka Representation Learning (MRL) to trade off latency against accuracy (the example uses 3072 dimensions), and embeddingPurpose presets (GENERIC_INDEX, DOCUMENT_RETRIEVAL) to optimize indexing versus query embeddings. The model integrates with AWS primitives such as S3 Vectors for scalable vector storage and the Bedrock runtime for invocation, making it practical to build multimodal RAG systems that retrieve related content across modalities and feed it to downstream LLM agents. For ML teams, Nova simplifies cross‑modal search and retrieval at scale while preserving fine‑grained temporal/structural segmentation for long‑form assets.
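Below is a hedged sketch of generating an embedding through the Bedrock runtime with boto3. The invoke_model call is a standard Bedrock runtime operation, and the GENERIC_INDEX preset and 3072 dimension are taken from the summary above; the model ID and the request/response field names are assumptions for illustration only and should be checked against the Bedrock API reference.

```python
import json
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

# Model ID and request/response field names are assumptions for illustration;
# consult the Bedrock model catalog and API reference for the exact schema.
MODEL_ID = "amazon.nova-multimodal-embeddings-v1:0"  # hypothetical identifier

request = {
    "taskType": "SINGLE_EMBEDDING",                  # assumed task-type field
    "singleEmbeddingParams": {
        "embeddingPurpose": "GENERIC_INDEX",         # preset named in the summary
        "embeddingDimension": 3072,                  # one of the four MRL output sizes
        "text": {"value": "Waterproof hiking boots with ankle support"},
    },
}

response = bedrock.invoke_model(
    modelId=MODEL_ID,
    body=json.dumps(request),
    contentType="application/json",
    accept="application/json",
)
payload = json.loads(response["body"].read())

# The response is assumed to carry the vector under an "embeddings" key.
embedding = payload["embeddings"][0]["embedding"]
print(len(embedding))  # expect 3072 with the dimension requested above
```

For query-time embeddings, the DOCUMENT_RETRIEVAL preset mentioned above would presumably replace GENERIC_INDEX, keeping index-side and query-side vectors aligned with their intended purpose; the resulting vectors can then be stored and searched in a vector store such as S3 Vectors.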