Innovative Recommendation Applications Using Two Tower Embeddings at Uber (www.uber.com)

🤖 AI Summary
Uber’s ML team announced the production rollout of Two-Tower Embeddings (TTE) in Michelangelo to power recommendations, starting with the Eats homefeed, after about a year of development. The move replaces a brittle, per-city DeepMF approach (thousands of Spark jobs) with a single, scalable TTE pipeline, yielding major performance and maintenance wins. Practically, Uber now precomputes store/item embeddings offline and serves eater/query embeddings in real time, using Approximate Nearest Neighbor (ANN) search over Uber’s SIA inverted index for fast retrieval. This reduces expensive O(q × M) real-time DL inference to O(M) offline plus O(q) real-time model calls followed by fast dot-product ANN lookups, enabling personalized retrieval in roughly hundreds of milliseconds.

Technically, the model pairs a query tower (user/context features) with an item tower (store/item/geo features), trained jointly on engagement labels using a similarity metric (dot product or cosine) and in-batch negatives to optimize recall at large N. Key design choices include spatial indexing to respect geographic constraints; localization (0/1-hop neighborhoods), which allows features to be pre-sampled and disk-backed rather than processed as an in-memory graph (avoiding GNN complexity); and straightforward extensibility to numeric and NLP features.

Beyond retrieval, TTE can act as a lightweight final ranker and generate transferable embeddings for downstream tasks, accelerating reuse and opening broader productization across Eats, Groceries, Maps, and more.
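The two-tower objective described above can be sketched in a few lines. This is a minimal NumPy illustration, not Uber's implementation: the tower shapes, feature dimensions, and random weights are all assumptions. It shows the core idea that both towers map into a shared embedding space, similarity is a dot product (equal to cosine here because outputs are L2-normalized), and each batch row's positive item is scored against the other rows' items as in-batch negatives via a softmax cross-entropy loss.

```python
import numpy as np

rng = np.random.default_rng(0)

def tower(x, w1, w2):
    """Toy two-layer MLP tower; output is L2-normalized so the
    dot product of two embeddings equals their cosine similarity."""
    h = np.maximum(x @ w1, 0.0)  # ReLU hidden layer
    z = h @ w2
    return z / np.linalg.norm(z, axis=1, keepdims=True)

# Hypothetical dimensions: 16-dim query features, 24-dim item features,
# both mapped into a shared 8-dim embedding space.
q_w1, q_w2 = rng.normal(size=(16, 32)), rng.normal(size=(32, 8))
i_w1, i_w2 = rng.normal(size=(24, 32)), rng.normal(size=(32, 8))

batch = 4
q = tower(rng.normal(size=(batch, 16)), q_w1, q_w2)  # query embeddings
v = tower(rng.normal(size=(batch, 24)), i_w1, i_w2)  # item embeddings

# In-batch negatives: for row k, item k is the positive and every other
# item in the batch is a negative. The loss is softmax cross-entropy
# over each row of the (batch x batch) similarity matrix, with the
# diagonal as the target.
logits = q @ v.T
logits -= logits.max(axis=1, keepdims=True)  # numerical stability
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
loss = -np.log(np.diag(probs)).mean()
```

In a real trainer the weights would of course be learned by backpropagating this loss; the sketch only shows the forward pass and objective.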
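The serving-cost argument (O(M) offline + O(q) online instead of O(q × M) online inference) can also be made concrete. The sketch below is illustrative only: item embeddings are random stand-ins for the offline batch job's output, and the brute-force dot-product scan stands in for Uber's SIA ANN index, which avoids scoring all M items per query.

```python
import numpy as np

rng = np.random.default_rng(1)

# Offline: embed all M items once with the item tower (O(M) model
# calls) and store the resulting matrix. Random vectors here stand in
# for real item-tower outputs.
M, d = 1000, 8
item_emb = rng.normal(size=(M, d))
item_emb /= np.linalg.norm(item_emb, axis=1, keepdims=True)

def retrieve(query_emb, k=5):
    """Online: one query-tower call per request (O(q) total), then a
    cheap dot-product top-k over precomputed item embeddings. A real
    system would use an ANN index instead of this full scan."""
    scores = item_emb @ query_emb               # (M,) similarities
    top = np.argpartition(-scores, k)[:k]       # unordered top-k ids
    return top[np.argsort(-scores[top])]        # ids sorted by score

query_emb = rng.normal(size=d)
query_emb /= np.linalg.norm(query_emb)
top_ids = retrieve(query_emb, k=5)
```

Because the heavy DL inference happens offline, the per-request cost is one query-tower call plus an index lookup, which is what makes sub-second personalized retrieval over large catalogs feasible.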