Show HN: OSS implementation of Test Time Diffusion that runs on a 24gb GPU (github.com)

0 points 8 hours ago | visit original

🤖 AI Summary

TTD-RAG is an open-source implementation of the "Test-Time Diffusion" research-agent framework (TTD-DR), submitted to the MMU-RAG competition. The system treats long-form report generation as an iterative denoising pipeline: it produces an initial “noisy” draft, generates targeted search queries from the draft’s gaps, retrieves and reranks web chunks, synthesizes answers, and progressively refines the draft. Component-wise self-evolution (generate variants, critique, merge) is used to improve planning and synthesis steps. The repo includes a FastAPI service, vLLM-based model serving for throughput/low latency, and Docker/Docker Compose deployment; it is explicitly designed to run on a single NVIDIA GPU with 24GB VRAM. Technically notable choices include using Qwen/Qwen3-4B-Instruct as the generator and tomaarsen/Qwen3-Reranker-0.6B-seq-cls for chunk reranking, the FineWeb Search API for retrieval, and SSE streaming for dynamic, real-time evaluation (plus a static /evaluate endpoint) to meet competition rules. The architecture emphasizes reducing hallucination and information loss by guiding retrieval from an evolving draft, which makes it well suited for multi-hop, evidence-heavy queries. The repo also provides local_test.py and ECR push instructions, making it turnkey for researchers who want a competition-compliant, open implementation of test-time diffusion that runs on commodity 24GB GPUs.
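The draft-refinement loop described above can be sketched roughly as follows. This is a minimal illustration and not the repository's code: `denoise_report`, `DraftState`, `Step`, and every callable parameter are hypothetical names, and the model, search, and reranker calls are left as injected functions so that nothing is assumed about the actual vLLM or FineWeb Search interfaces.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Step:
    """One retrieval round: the query asked and the answer synthesized for it."""
    query: str
    answer: str


@dataclass
class DraftState:
    """The evolving report plus the search history that shaped it."""
    question: str
    draft: str
    history: List[Step] = field(default_factory=list)


def denoise_report(
    question: str,
    write_draft: Callable[[str], str],
    next_query: Callable[[DraftState], str],
    retrieve: Callable[[str], List[str]],
    rerank: Callable[[str, List[str]], List[str]],
    synthesize: Callable[[str, List[str]], str],
    revise: Callable[[DraftState, Step], str],
    max_steps: int = 3,
    top_k: int = 5,
) -> DraftState:
    """Refine a 'noisy' first draft with retrieval guided by the draft's gaps."""
    # Start from an initial draft written without any retrieved evidence.
    state = DraftState(question=question, draft=write_draft(question))
    for _ in range(max_steps):
        # Ask for a query targeting a gap in the current draft.
        query = next_query(state)
        if not query:
            break  # an empty query signals that no gaps remain
        # Retrieve candidate chunks, rerank them, and keep the best top_k.
        chunks = rerank(query, retrieve(query))[:top_k]
        step = Step(query=query, answer=synthesize(query, chunks))
        # Fold the new evidence back into the draft (one "denoising" step).
        state.draft = revise(state, step)
        state.history.append(step)
    return state
```

In a real deployment each callable would presumably wrap an LLM prompt, a search request, or a reranker call. The summary's "self-evolution" step (generate variants, critique, merge) could be layered on by wrapping `next_query` or `synthesize` in a function that produces several candidates and merges them, though how the repository actually structures that is not stated here.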
