🤖 AI Summary
Feast has deepened its Ray integration to make distributed embedding generation a first-class transformation, and added Kubernetes-ready deployment via KubeRay. The release includes a Ray RAG template that scaffolds a complete retrieval-augmented generation pipeline (parallel document processing, distributed embedding computation, and vector search) so teams can process millions of documents in production. This makes embedding generation a declared, managed step within Feast feature pipelines, turning one of the heaviest ML workloads into a scalable stage that supports low-latency retrieval for RAG systems.
Technically, Feast uses Ray Data to partition datasets and apply a user-defined EmbeddingProcessor across Ray workers via map_batches, with tunable parameters such as max_workers=8 and batch_size=2500; the example uses the SentenceTransformer model "all-MiniLM-L6-v2". Feature definitions use BatchFeatureView with mode="ray" and can store vector-indexed embeddings for online retrieval, which is exposed through FeatureStore calls that accept a query embedding and return the top-k matches. Deployment modes cover local development, remote Ray clusters (via ray_address), and Kubernetes via KubeRay, with configuration knobs such as storage_path and parallelism. Install with pip install feast[ray] and bootstrap a project with feast init -t ray_rag. The architecture separates Ray compute from the Ray offline store, so I/O and compute can scale independently, which matters for production ML infrastructure where resource efficiency and low latency are priorities.
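
As a rough sketch of the distributed embedding step, the pattern described above maps onto Ray Data roughly as follows. The model name, worker count, and batch size come from the post; the dataset paths, column names, and the body of the EmbeddingProcessor class are illustrative assumptions, not the template's actual code.

```python
import ray
from sentence_transformers import SentenceTransformer


class EmbeddingProcessor:
    """Stateful Ray Data actor: each worker loads the model once, then encodes batches."""

    def __init__(self):
        # Model name taken from the post; loaded once per worker process.
        self.model = SentenceTransformer("all-MiniLM-L6-v2")

    def __call__(self, batch: dict) -> dict:
        # "text" and "embedding" are illustrative column names.
        batch["embedding"] = self.model.encode(list(batch["text"]))
        return batch


ray.init()  # or ray.init(address=...) for a remote cluster

docs = ray.data.read_parquet("documents.parquet")  # hypothetical input path
embedded = docs.map_batches(
    EmbeddingProcessor,
    batch_size=2500,   # batch size mentioned in the post
    concurrency=8,     # parallel actors, analogous to max_workers=8
    batch_format="numpy",
)
embedded.write_parquet("document_embeddings/")  # hypothetical output path
```

Using a callable class rather than a plain function keeps the model resident on each worker, so it is loaded once per actor instead of once per batch.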
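On the feature-definition side, the sketch below shows roughly what a vector-indexed BatchFeatureView and a top-k retrieval call might look like. mode="ray" is from the release notes; the entity, source, field names, and the retrieval call (retrieve_online_documents) are assumptions based on Feast's general vector-search API, and the transformation UDF is omitted for brevity.

```python
from datetime import timedelta

from feast import Entity, FeatureStore, Field, FileSource
from feast.batch_feature_view import BatchFeatureView
from feast.types import Array, Float32, String

# Illustrative entity and batch source; names and paths are assumptions.
document = Entity(name="document", join_keys=["document_id"])
documents_source = FileSource(
    path="document_embeddings/",   # output of the Ray embedding job above
    timestamp_field="event_timestamp",
)

document_embeddings = BatchFeatureView(
    name="document_embeddings",
    entities=[document],
    mode="ray",                    # run the transformation on Ray, per the release
    ttl=timedelta(days=30),
    schema=[
        Field(name="text", dtype=String),
        Field(name="embedding", dtype=Array(Float32), vector_index=True),
    ],
    source=documents_source,
)

# Online retrieval: hand Feast a query embedding, get the top-k nearest documents.
store = FeatureStore(repo_path=".")
query_embedding = [0.1] * 384      # vector from the same all-MiniLM-L6-v2 model
response = store.retrieve_online_documents(
    feature="document_embeddings:embedding",
    query=query_embedding,
    top_k=5,
)
print(response.to_dict())
```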