ST-Raptor requires no additional fine-tuning (github.com)

🤖 AI Summary
ST-Raptor is an open-source system for question answering over semi-structured tables (Excel, HTML, Markdown, CSV) that needs only an Excel-formatted table and a natural-language question, with no additional fine-tuning. It couples a vision-language model (VLM) with an HO-Tree construction algorithm to parse diverse layouts, plugs into different LLMs for reasoning, and applies a two-stage validation mechanism to improve answer reliability.

The authors also release SSTQA, a curated benchmark of 102 tables and 764 questions drawn from 2,031 real-world tables, plus a raw dataset of 2k+ tables covering scenarios such as HR, finance, inventory, academic records, and application forms. On SSTQA and related benchmarks, ST-Raptor leads prior methods: for example, 72.39% accuracy and 52.19% ROUGE-L on SSTQA versus roughly 62.12% / 43.86% for GPT-4o and lower scores for other VLM/LLM baselines. The reference configuration uses Deepseek-V3 (LLM) + InternVL2.5 (VLM) + Multilingual-E5 embeddings, which the paper notes can require ~160 GB of GPU memory, though these components can be swapped for APIs or smaller models.

Significance: ST-Raptor shows that robust table QA across heterogeneous, semi-structured layouts is achievable without costly task-specific fine-tuning, lowering deployment barriers for enterprise and research use cases and enabling more reliable extraction and reasoning from messy real-world tables.
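The pipeline described above (parse the table into a tree, hand it to a pluggable LLM for reasoning, then validate the answer against the table) can be sketched in toy form. Everything below is illustrative: the names `HONode`, `build_ho_tree`, `llm_answer`, and `validate` are hypothetical stand-ins, not ST-Raptor's actual API, and the tree construction and validation are drastically simplified versions of what the real system does with a VLM and its two-stage check.

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class HONode:
    """One node of a hierarchical (HO-Tree-like) table representation."""
    label: str
    value: str | None = None
    children: list[HONode] = field(default_factory=list)

def build_ho_tree(rows: list[list[str]]) -> HONode:
    """Toy construction: first column becomes a flat label hierarchy.
    The real system recovers nested layout from rendered tables via a VLM."""
    root = HONode("table")
    for row in rows:
        root.children.append(HONode(row[0], value=", ".join(row[1:])))
    return root

def llm_answer(question: str, tree: HONode) -> str:
    """Stub for the pluggable LLM reasoning step (e.g. Deepseek-V3 via API)."""
    for child in tree.children:
        if child.label.lower() in question.lower():
            return child.value or ""
    return "unknown"

def validate(answer: str, tree: HONode) -> bool:
    """Toy stand-in for two-stage validation: accept only answers
    that are grounded in values actually present in the table."""
    return any(answer == c.value for c in tree.children)

def answer_question(rows: list[list[str]], question: str) -> str:
    tree = build_ho_tree(rows)
    ans = llm_answer(question, tree)
    return ans if validate(ans, tree) else "unverified"

rows = [["Salary", "5000"], ["Department", "Finance"]]
print(answer_question(rows, "What is the Salary?"))  # grounded answer: 5000
```

The point of the validation step is that an answer the model cannot trace back to the table is downgraded rather than returned as-is, which mirrors how the system trades a little recall for reliability.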