End-to-End Table Transformer
최윤영, 김태훈, 김남욱, 이태희, 조성호
Conference/Journal
International Conference on Document Analysis and Recognition (ICDAR)
Year
2024
Research Area
Foundation Models
Abstract
Table extraction (TE) from document images is an important deep learning task for conveying structured information. TE is commonly decomposed into three subtasks: table detection (TD), table structure recognition (TSR), and functional analysis (FA). Most previous research has focused on developing models tailored to each of these subtasks individually, leading to challenges in computational cost, model size, and performance limitations. Transformer-based object detection models have been applied successfully to TE subtasks, yet they face inherent challenges due to the one-to-one set matching used to detect objects. This matching assigns only a few queries as positive samples, which diminishes training efficacy and leads to a performance bottleneck. Prior research in object detection has therefore introduced modifications to the Detection Transformer (DETR), adding extra queries and training schemes that improve performance. In this work, we introduce the End-to-End Table Transformer (ETT), a specialized transformer-based object detection model designed to perform TE from document images with a single model. Our model comprises three key components: a backbone, the Deformable DETR (DDETR) model, and a novel layout analysis module with a table layout loss. The layout analysis module leverages explicit relationships between table objects to improve extraction performance on images containing multiple tables. We conduct rigorous experiments on table extraction benchmark datasets, comparing our model with other DETR variants, including vanilla DETR, DDETR, and H-DETR. Empirical evaluations show that our model efficiently achieves state-of-the-art results on the TE task.
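The one-to-one set matching bottleneck described above can be made concrete with a small sketch. The following is a simplified, hypothetical illustration of DETR-style bipartite (Hungarian) matching, not the paper's implementation: the matching cost here is a toy combination of class probability and L1 box distance, and the query and ground-truth counts are chosen only for illustration. It shows how, with the usual hundred object queries and only a few ground-truth tables, most queries receive no positive supervision in a given image.

# Hypothetical sketch of DETR-style one-to-one set matching (not from the paper).
import numpy as np
from scipy.optimize import linear_sum_assignment

num_queries, num_gt = 100, 3  # typical DETR query count vs. a few ground-truth tables
rng = np.random.default_rng(0)

cls_prob = rng.random(num_queries)        # per-query probability of the "table" class
pred_box = rng.random((num_queries, 4))   # predicted boxes (cx, cy, w, h), normalized
gt_box = rng.random((num_gt, 4))          # ground-truth boxes

# Toy matching cost: reward confident queries, penalize L1 box error
# (DETR's full cost also includes a generalized IoU term).
cost = -cls_prob[:, None] + np.abs(pred_box[:, None, :] - gt_box[None, :, :]).sum(-1)

# Hungarian assignment: each ground-truth object is matched to exactly one query.
q_idx, gt_idx = linear_sum_assignment(cost)
print(f"positive queries: {len(q_idx)} / {num_queries}")  # 3 of 100; the rest are negatives

Variants such as H-DETR relax this constraint during training by adding auxiliary queries matched one-to-many, which is the line of work the abstract builds on.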