EmbeddingGemma: Architecture and Recipe (developers.googleblog.com)

🤖 AI Summary
EmbeddingGemma is a new embedding model built by adapting a pretrained 300M-parameter Gemma 3 into an encoder-focused model (via the T5Gemma conversion) and initializing the embedding model from that encoder. It is exposed as a SentenceTransformers-style pipeline with a maximum sequence length of 2048 tokens: an encoder-only transformer with bidirectional attention produces 768-d token vectors, mean pooling collapses them into a single fixed-size vector, a two-stage linear projection maps 768 → 3072 → 768, and a final Euclidean (L2) normalization (cheaper than RMSNorm) yields the output embedding. The result is an off-the-shelf embedding generator for search, retrieval-augmented generation (RAG), and semantic similarity; a minimal sketch of this head appears below.

Technically, EmbeddingGemma is trained with a three-part loss: an NCE-style contrastive loss with hard negatives to teach fine-grained similarity distinctions; a dispersion regularizer that spreads embeddings across the space, improving robustness to quantization and approximate nearest-neighbor (ANN) search; and L2 distillation from a larger Gemini Embedding teacher to transfer its capability. It also uses MRL (Matryoshka Representation Learning): the losses are applied not only to the full 768-d vector but also to nested prefixes of 512, 256, and 128 dimensions, so truncated embeddings retain high quality (a loss sketch follows the architecture example below).

A single model can therefore serve multiple latency/size trade-offs while inheriting world knowledge from the base Gemma without full retraining, which makes it well suited to efficient, quantization-friendly production retrieval and on-device scenarios.
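To make the pipeline concrete, here is a minimal, hypothetical PyTorch sketch of the pooling-and-projection head described above. The class and variable names are made up for illustration; only the dimensions (768-d token vectors, a 768 → 3072 → 768 projection, unit L2 normalization) come from the summary.

```python
# Hypothetical sketch of the EmbeddingGemma head: mean pooling over encoder
# token vectors, a 768 -> 3072 -> 768 projection, and normalization to unit
# Euclidean (L2) length. Names are illustrative, not the actual implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class EmbeddingHead(nn.Module):
    def __init__(self, hidden_dim: int = 768, proj_dim: int = 3072):
        super().__init__()
        self.up = nn.Linear(hidden_dim, proj_dim)    # 768 -> 3072
        self.down = nn.Linear(proj_dim, hidden_dim)  # 3072 -> 768

    def forward(self, token_vectors: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
        # Mask out padding, then mean-pool token vectors into one fixed-size vector.
        mask = attention_mask.unsqueeze(-1).float()                       # (batch, seq, 1)
        pooled = (token_vectors * mask).sum(1) / mask.sum(1).clamp(min=1e-9)
        projected = self.down(self.up(pooled))                            # two-stage projection
        return F.normalize(projected, p=2, dim=-1)                        # unit L2 norm


# Toy usage with random stand-ins for encoder outputs of shape (batch, seq_len, 768).
head = EmbeddingHead()
tokens = torch.randn(2, 16, 768)
mask = torch.ones(2, 16)
embeddings = head(tokens, mask)   # (2, 768), each row has norm 1
```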
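The training objective can be pictured with a rough sketch as well. The post does not give exact formulations or weights, so the helper functions and the 0.1 loss weights below are assumptions; they only illustrate the shape of an NCE-style contrastive loss with hard negatives, a dispersion regularizer, L2 distillation from a teacher, and Matryoshka-style application over nested prefix dimensions.

```python
# Illustrative, assumed formulations of the three-part loss plus nested (MRL) losses.
import torch
import torch.nn.functional as F


def info_nce(query, positive, negatives, temperature=0.05):
    """NCE-style contrastive loss: pull the query toward its positive, push away hard negatives."""
    q = F.normalize(query, dim=-1)
    pos = F.normalize(positive, dim=-1)
    neg = F.normalize(negatives, dim=-1)                  # (batch, n_neg, dim)
    pos_sim = (q * pos).sum(-1, keepdim=True)             # (batch, 1)
    neg_sim = torch.einsum("bd,bnd->bn", q, neg)          # (batch, n_neg)
    logits = torch.cat([pos_sim, neg_sim], dim=-1) / temperature
    targets = torch.zeros(q.size(0), dtype=torch.long, device=q.device)  # positive at index 0
    return F.cross_entropy(logits, targets)


def dispersion(embeddings):
    """Assumed regularizer: spread embeddings apart by penalizing pairwise cosine similarity."""
    e = F.normalize(embeddings, dim=-1)
    sims = e @ e.T
    off_diag = sims - torch.eye(e.size(0), device=e.device)
    return off_diag.pow(2).mean()


def distillation(student, teacher):
    """L2 distillation toward embeddings from a larger teacher (e.g. Gemini Embedding)."""
    return F.mse_loss(student, teacher)


def mrl_total_loss(query, positive, negatives, teacher, dims=(768, 512, 256, 128)):
    """Apply the combined loss to the full vector and to nested prefixes (Matryoshka)."""
    total = 0.0
    for d in dims:
        q, p, n, t = query[..., :d], positive[..., :d], negatives[..., :d], teacher[..., :d]
        total = total + info_nce(q, p, n) + 0.1 * dispersion(q) + 0.1 * distillation(q, t)
    return total / len(dims)


# Toy tensors standing in for pooled student and teacher embeddings.
q, p = torch.randn(4, 768), torch.randn(4, 768)
n, t = torch.randn(4, 8, 768), torch.randn(4, 768)
print(mrl_total_loss(q, p, n, t))
```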
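On the consumption side, a hedged usage sketch: it assumes sentence-transformers 3.x (whose SentenceTransformer constructor accepts truncate_dim and whose models expose a similarity helper) and the Hugging Face model id google/embeddinggemma-300m; adjust both to whatever your environment actually provides.

```python
from sentence_transformers import SentenceTransformer

# truncate_dim keeps only the first 256 of the 768 output dimensions,
# trading a little quality for smaller, faster vectors (the MRL trade-off).
model = SentenceTransformer("google/embeddinggemma-300m", truncate_dim=256)

docs = ["EmbeddingGemma is a ~300M-parameter embedding model.",
        "Mean pooling turns token vectors into one fixed-size vector."]
query = "How does EmbeddingGemma pool token embeddings?"

doc_emb = model.encode(docs)      # shape (2, 256)
query_emb = model.encode(query)   # shape (256,)
print(model.similarity(query_emb, doc_emb))   # cosine similarities for retrieval
```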