Lightweight, highly accurate line and paragraph detection (arxiv.org)

🤖 AI Summary
The authors propose a unified, two-level approach to document layout parsing that jointly detects text lines and paragraphs with a graph convolutional network (GCN). Starting from word-level text detection boxes, the model treats lines as clusters of boxes and paragraphs as clusters of lines, forming a two-level tree that captures the document layout. A GCN predicts pairwise relations between detected text boxes; those relation scores are then used to group boxes into lines and lines into paragraphs. The pipeline is designed to be lightweight and efficient while maintaining high accuracy, and the paper reports state-of-the-art quality for paragraph detection on public benchmarks and on varied real-world images. Rather than relying on isolated line and paragraph heuristics or on separate models, the work models the hierarchical, relational structure directly with a GCN, which is intended to improve robustness to layout variation and reduce pipeline complexity. Key technical implications: pairwise relation prediction drives the clustering; the output is an explicit tree that downstream OCR, information extraction, and document understanding can consume; and the module is efficient enough to slot into end-to-end document analysis systems. The approach also suggests that graph-based hierarchical clustering may apply to other structured layout tasks.
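To make the two-level clustering step concrete, here is a minimal sketch of how pairwise relation scores could be turned into a line/paragraph tree. It is not the authors' implementation: the score dictionaries stand in for GCN outputs, and the 0.5 threshold, the connected-components grouping via union-find, and the names `cluster_by_relation` and `build_layout_tree` are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Pair = Tuple[int, int]


def cluster_by_relation(
    n: int, scores: Dict[Pair, float], threshold: float = 0.5
) -> List[List[int]]:
    """Group items 0..n-1 into connected components, linking every pair
    whose relation score is at least `threshold` (assumed cutoff)."""
    parent = list(range(n))

    def find(i: int) -> int:
        # Follow parent pointers to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for (a, b), score in scores.items():
        if score >= threshold:
            ra, rb = find(a), find(b)
            if ra != rb:
                # Keep the smaller index as the root for stable output.
                parent[max(ra, rb)] = min(ra, rb)

    groups: Dict[int, List[int]] = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def build_layout_tree(
    n_words: int,
    word_scores: Dict[Pair, float],
    line_scores: Dict[Pair, float],
    threshold: float = 0.5,
) -> List[List[List[int]]]:
    """Return paragraphs -> lines -> word-box indices.

    `line_scores` is keyed by line index, i.e. the position of each line
    in the sorted output of the word-level clustering (hypothetical input).
    """
    lines = cluster_by_relation(n_words, word_scores, threshold)
    paragraphs = cluster_by_relation(len(lines), line_scores, threshold)
    return [[lines[i] for i in para] for para in paragraphs]


# Hypothetical scores for five word boxes (stand-ins for GCN predictions).
word_scores = {(0, 1): 0.9, (1, 2): 0.2, (2, 3): 0.8, (3, 4): 0.1}
# Scores between the three resulting lines: [0, 1], [2, 3], [4].
line_scores = {(0, 1): 0.7, (1, 2): 0.3}

tree = build_layout_tree(5, word_scores, line_scores)
print(tree)  # [[[0, 1], [2, 3]], [[4]]]
```

In this toy input, words 0-1 and 2-3 form two lines that merge into one paragraph, while word 4 ends up as a single-line paragraph of its own. A real system would score only candidate neighbor pairs rather than a hand-written dictionary, and may use a different grouping rule than plain thresholding.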