Document Analysis
Abstract
This proposal addresses several aspects of text recognition across four projects:
(1) Handwritten text recognition (HTR) remains challenging because writing styles differ from person to person and because each language poses its own recognition difficulties. We will explore several aspects of HTR. The first path aims to design an efficient HTR model by extending the current state-of-the-art approach, the Convolutional Recurrent Neural Network (CRNN), with new multi-task learning methods. Our previous work focused on modern HTR in English and Portuguese and on historical HTR for Norwegian and cipher manuscripts. We will continue our work on new regularization techniques and extend our method from the line level to the page level.
The second path addresses recognition with limited training samples. To overcome the data limitation, we will use machine learning (ML) to generate large-scale training data, with the aim of improving the performance of architectures that have a large number of parameters.
The third path will visualize the hidden layers of neural networks for HTR. Using visualization techniques, we aim to reveal how the network processes and learns features at different layers. By identifying the key components that contribute to HTR performance, we can strengthen critical areas while reducing less important ones. Our goal is to improve recognition accuracy while reducing the number of parameters.
The fourth path aims to convert historical documents into printed text using ML. We will explore state-of-the-art image-to-image translation models and evaluate their effectiveness for this task.
(2) Open-set text recognition: This project addresses the problem of transcribing samples that contain unknown characters from various languages and scripts, with the added difficulty of diverse writing directions and styles. So far we have achieved an 8% improvement in line accuracy and gained an edge over commercial LLMs on minor scripts. Phase 3 will focus on improving routing techniques and adding more diverse architectures.
(3) Visual question answering (VQA) project: In the current phase, we successfully developed the hierarchical structure extraction module, specifically a semi-supervised learning pipeline for detection, which also applies to UAV images. It has been tested qualitatively on historical documents and quantitatively on the Nordic UAV task, where it improves YOLO11n by over 20% mAP. The next stage is to establish quantitative testing on historical images and to integrate the extraction module into the VQA pipeline.
(4) Authorship analysis project: Authorship analysis is the process of examining the characteristic features of a text to determine its authorship; it has its roots in stylometry, a branch of linguistic research. It comprises several related research fields, such as author attribution, author verification, author profiling, and authorship detection. This proposal aims to extend author attribution and verification to prominent authors from previous centuries, taking into account authorial style, linguistic features, vocabulary, and other relevant characteristics. We will use ML and deep learning (DL) models, as well as LLMs, for authorship analysis. We will address open questions in author attribution, specifically for historical documents, and will attempt to identify the authors of unknown manuscripts and documents.