How to Train an LLM-RecSys Hybrid for Steerable Recs (eugeneyan.com)

🤖 AI Summary
Researchers built a “bilingual” LLM-recommender hybrid by extending an LLM's vocabulary with semantic-ID tokens (e.g., <|sid_0|>, <|sid_1|>), then continuing pretraining and finetuning on item metadata and user interaction sequences so the model natively generates catalog IDs as well as natural language. The result can recommend catalog items from a user's interaction history, be steered via conversational prompts, explain its choices, and even name bundles, combining the catalog awareness and precision of recommender systems with the steerability and reasoning of LLMs.

The team demoed the idea on the Amazon Video Games Reviews dataset (137k products filtered down to 66k; ~737k behavior records yielding ~78.6k user sequences with a mean length of 6.5), noting that the prototype is small and finetuning-limited, so prompting still matters.

Technically, semantic IDs are produced by a Residual-Quantized VAE (RQ-VAE): item-metadata embeddings (1,024-d, from Qwen3-Embedding-0.6B) are hierarchically quantized into a sequence of discrete tokens, one per codebook level, so similar items share common prefixes and the ID space forms a tree (see the sketch below). The training loss is a reconstruction term plus a quantization term (a codebook loss plus a β-weighted commitment loss). One practical issue: a 3-level design with 256 codes per level left ~10% of items colliding on the same ID, fixed by adding a fourth level to guarantee uniqueness.

They compared semantic-ID SASRec baselines against a Qwen3-8B finetuned to accept semantic IDs directly. The tradeoff: the approach enables steerable, explainable recommendations and a unified model for search, chat, and recs, but it currently won't match the raw precision of an optimized multi-stage recsys without larger-scale finetuning and infrastructure.
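To make the RQ-VAE step concrete, here is a minimal PyTorch sketch of residual quantization with the combined loss described above. It assumes 1,024-d input embeddings and four codebooks of 256 codes each, per the summary; the encoder/decoder widths, latent size, and β=0.25 are hypothetical choices, not the post's code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RQVAE(nn.Module):
    """Sketch of an RQ-VAE: encode, residual-quantize across levels, decode."""
    def __init__(self, in_dim=1024, latent_dim=128, levels=4, codes=256, beta=0.25):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 512), nn.ReLU(), nn.Linear(512, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 512), nn.ReLU(), nn.Linear(512, in_dim))
        self.codebooks = nn.ParameterList(
            nn.Parameter(torch.randn(codes, latent_dim)) for _ in range(levels)
        )
        self.beta = beta

    def quantize(self, z):
        # Each level quantizes the residual left by the previous levels,
        # so similar items share common prefix codes (a tree-like ID space).
        residual, z_q = z, torch.zeros_like(z)
        ids, cb_loss, commit_loss = [], 0.0, 0.0
        for cb in self.codebooks:
            idx = torch.cdist(residual, cb).argmin(dim=-1)            # nearest code
            code = cb[idx]
            cb_loss = cb_loss + F.mse_loss(code, residual.detach())   # pull codes toward encoder
            commit_loss = commit_loss + F.mse_loss(residual, code.detach())  # commit encoder to codes
            ids.append(idx)
            z_q = z_q + code
            residual = residual - code
        return torch.stack(ids, dim=-1), z_q, cb_loss, commit_loss

    def forward(self, x):
        z = self.encoder(x)
        ids, z_q, cb_loss, commit_loss = self.quantize(z)
        z_q = z + (z_q - z).detach()  # straight-through estimator for gradients
        recon = F.mse_loss(self.decoder(z_q), x)
        # Loss = reconstruction + codebook + β·commitment, as in the summary.
        loss = recon + cb_loss + self.beta * commit_loss
        return ids, loss
```

The returned `ids` tensor holds one code index per level per item; those per-level indices become the item's semantic ID.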
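The ~10% collision figure can be measured by counting items whose code tuples coincide; this helper and its names are hypothetical, not from the post.

```python
from collections import Counter

def collision_rate(sem_ids):
    """sem_ids: list of per-item code tuples, e.g. [(12, 45, 201), ...]."""
    counts = Counter(sem_ids)
    collided = sum(c for c in counts.values() if c > 1)
    return collided / len(sem_ids)

# Per the summary, a 3-level, 256-code design gave a rate near 0.10;
# appending a fourth level drives it to zero by making IDs unique.
```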
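Finally, a sketch of the vocabulary-extension step, assuming the Hugging Face transformers API and the Qwen/Qwen3-8B checkpoint named in the summary; the per-level token offset (level × 256 + code) is one plausible ID-to-token mapping, not confirmed by the post.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B")
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-8B")

# 4 levels x 256 codes per level = 1,024 new semantic-ID tokens.
sid_tokens = [f"<|sid_{i}|>" for i in range(4 * 256)]
tokenizer.add_tokens(sid_tokens, special_tokens=True)
model.resize_token_embeddings(len(tokenizer))  # new rows get trained during continued pretraining

def item_to_sid_string(code_ids):
    """Map one item's per-level codes, e.g. [12, 45, 201, 7], to a token
    string like '<|sid_12|><|sid_301|><|sid_713|><|sid_775|>'."""
    return "".join(f"<|sid_{level * 256 + c}|>" for level, c in enumerate(code_ids))
```

With items serialized this way, user histories become ordinary token sequences, and the finetuned LLM generates the next item's semantic-ID tokens just as it generates words.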