Document Analysis
Title: Document Analysis
DNr: Berzelius-2024-309
Project Type: LiU Berzelius
Principal Investigator: Elisa Hope Barney Smith <elisa.barney@ltu.se>
Affiliation: Luleå tekniska universitet
Duration: 2024-09-01 – 2025-03-01
Classification: 10207
Homepage: https://www.ltu.se/en/research/research-subjects/machine-learning
Keywords:

Abstract

This proposal addresses aspects of text recognition across several projects.

(1) Handwriting recognition (HTR) remains challenging because each person has a different writing style and because of language-related recognition difficulties. Several aspects of HTR will be explored. One path aims to design an efficient HTR model by extending the current state-of-the-art methods (Convolutional Recurrent Neural Network, Vertical Attention Network …) with new attention mechanisms and multi-task learning methods. This work extends our previous work, which focused on modern HTR in English and Portuguese. We will add new regularization mechanisms and training strategies to obtain a more generic method. The expected outcome at the end of the project is improved performance in modern and historical HTR, especially in low-resource scenarios (little training data). Another approach to HTR will use object detection.

(2) Open-set text recognition aims to transcribe samples that may contain unknown characters from various languages, and faces the challenge of diverse writing directions, styles, spacing, and writing systems. In phase 1 (the previous Berzelius proposal, which resulted in a publication acknowledging Berzelius), we had some success with routing through experts as a whole. Phase 2 will involve module-level routing and feature-level ensembling. Note that phase 2 still does not seek full control from the LLM.

(3) VQA project: In phase 1 (the previous proposal) we achieved some level of end-to-end document understanding with sphinx. During phase 2 we will expand the scope to include the scene understanding task, resulting in a unified image understanding framework. This is feasible because both tasks involve detecting and analyzing elements that form a hierarchical structure in an image (parts→objects, character→text line→paragraph). The novelty lies in partially decoupling element representation (know-what) from relation-based inference (know-how), which further gives us a lever on lifelong learning by implanting corresponding representations for new elements.

(4) Authorship analysis project: Authorship analysis is the process of examining the characteristic features of a piece of text to determine its authorship; it has its roots in a branch of linguistic research called stylometry. The subject area has several related fields of research, such as author attribution, author verification, author profiling, and authorship detection. This proposal aims to extend author profiling, attribution, and verification to prominent authors from previous centuries, taking into account authorial style, linguistic features, vocabulary, and other relevant characteristics. The approach should also be applicable to general historical documents, to compare authors and influencers. We will use machine learning (ML) and deep learning (DL) models for authorship analysis. Large language models (LLMs) will also be leveraged for the analysis and to determine the authors and scribes of unknown manuscripts. The project will address open questions in author attribution, specifically for historical documents, and will try to find the authors of unknown manuscripts and documents using ML, DL, and LLMs.