Title: TinyML - Document Analysis
DNr: Berzelius-2024-464
Project Type: LiU Berzelius
Principal Investigator: Hui Han <hui.han@ltu.se>
Affiliation: Luleå tekniska universitet
Duration: 2024-11-27 – 2025-06-01
Classification: 20206
Keywords:

Abstract

With the advent of digitization, automated document image analysis (DIA) has become essential, especially for processing complex documents, which pose unique challenges due to variations in background, fonts, and layouts. Traditional data collection methods are costly and time-intensive, and they often fail to provide the extensive annotations needed to train Deep Learning (DL) models effectively for comprehensive analysis and understanding tasks.

This project places Tiny Machine Learning (TinyML) at the forefront, exploring how advances in DL, particularly large-scale models such as Diffusion Models (DMs) and Masked Autoencoders (MAEs), can enhance TinyML deployment for DIA. TinyML holds immense promise for enabling ML tasks on resource-constrained devices such as mobile phones, embedded systems, and IoT hardware. However, deploying DL models, which are often computationally intensive and memory-demanding, poses significant challenges. By studying and addressing these bottlenecks, the project aims to bridge this gap, adapting large-scale models for TinyML applications.

The project will leverage DMs and MAEs to tackle challenging DIA tasks such as text recognition, document generation, and layout analysis. While effective, these models are inherently computationally intensive; their limitations on resource-constrained devices present an opportunity to study bottlenecks and to guide the development of compressed versions tailored for TinyML.

To achieve this, the project will use cutting-edge techniques such as pruning, quantization, and knowledge distillation (brief illustrative sketches of these techniques follow the abstract). Pruning reduces model size by eliminating redundant parameters; quantization lowers the precision of weights and activations, enabling efficient computation on edge devices; and knowledge distillation trains smaller, efficient "student" models to replicate the functionality of larger "teacher" models. Additionally, split computing strategies will be explored, offloading resource-intensive computations to powerful servers while enabling lightweight processing on edge devices.

DMs will play a crucial role in generating synthetic datasets that mimic the complexities of real-world documents. These datasets will form the foundation for training and fine-tuning TinyML-compatible models, ensuring efficiency on tasks such as OCR, real-time handwriting enhancement, and layout correction. This dual approach, leveraging large models to generate data while optimizing smaller, efficient models, ensures that TinyML deployments are practical and task-specific.

Extensive evaluations will validate the feasibility and effectiveness of deploying these optimized models on TinyML platforms. Metrics such as accuracy, latency, memory usage, and energy efficiency will be analyzed across various edge devices, including microcontroller configurations. These evaluations will demonstrate the potential of TinyML to handle complex DIA tasks while highlighting pathways to improved scalability in real-world environments.

This project seeks to deepen our understanding of large-scale DL models and their limitations, using these insights to pioneer innovative approaches for TinyML. By addressing computational challenges and adapting advanced models for edge deployment, the project will contribute significantly to DIA advancements and to the growing field of TinyML. Findings will be shared at high-impact conferences (e.g., CVPR, ICCV), offering scalable, efficient solutions that democratize access to advanced document processing technologies. This work not only expands the frontiers of TinyML but also ensures that its benefits reach a broader audience, particularly in resource-limited settings.
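As a first illustration, the sketch below applies magnitude-based unstructured pruning with PyTorch's torch.nn.utils.prune utilities. The two-layer toy network and the 60% sparsity target are placeholder assumptions for demonstration, not the project's actual DIA models or settings.

import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Toy stand-in for a DIA backbone; real models would be DM/MAE components.
model = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
)

# Zero out the 60% of weights with the smallest L1 magnitude in each Linear layer.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.6)
        prune.remove(module, "weight")  # bake the pruning mask into the weights

# Report the overall parameter sparsity achieved (biases stay dense).
total = sum(p.numel() for p in model.parameters())
zeros = sum((p == 0).sum().item() for p in model.parameters())
print(f"overall sparsity: {zeros / total:.1%}")

Note that unstructured sparsity alone does not shrink the stored tensors; pairing it with structured pruning or a sparsity-aware runtime is what yields real savings on microcontrollers.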
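Quantization can be prototyped in a similar spirit. The sketch below uses PyTorch's post-training dynamic quantization, which stores Linear weights in int8 and quantizes activations at runtime; the toy model is again an assumed stand-in, and microcontroller targets would more likely need static or quantization-aware schemes.

import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10)).eval()

# Convert Linear weights to int8; activations are quantized on the fly.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 784)
print(quantized(x).shape)  # same interface, smaller weights, faster CPU inference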
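For knowledge distillation, a minimal sketch of the standard softened-logits objective (Hinton et al.) is given below; the temperature T and mixing weight alpha are illustrative defaults rather than values fixed by the project.

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Blend a soft-target KL term (teacher -> student) with hard-label CE."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # rescale so gradients stay comparable across temperatures
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Usage with random tensors standing in for a real batch:
student_logits = torch.randn(8, 10)
teacher_logits = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))
print(distillation_loss(student_logits, teacher_logits, labels).item())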
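Split computing can be sketched as partitioning a network into an on-device head and a server-side tail. The toy backbone, the chosen split point, and the omitted network transport below are all assumptions made for illustration.

import torch
import torch.nn as nn

# Toy convolutional backbone; in practice this would be a DM or MAE encoder.
backbone = nn.Sequential(
    nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),    # cheap early layers
    nn.Conv2d(8, 32, 3, padding=1), nn.ReLU(),   # heavier later layers
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 10),
)

split_at = 2
edge_head = backbone[:split_at]    # runs on the edge device
server_tail = backbone[split_at:]  # runs on a powerful server

x = torch.randn(1, 1, 64, 64)      # one grayscale document patch
features = edge_head(x)            # computed locally on the device
# ...features would be compressed and transmitted over the network here...
logits = server_tail(features)     # computation completed remotely
print(logits.shape)

The split point trades on-device compute against the size of the transmitted features, which is exactly the kind of bottleneck the planned evaluations could quantify.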
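Synthetic document generation with a DM could start from an off-the-shelf text-to-image pipeline, as in the sketch below using the Hugging Face diffusers library. The checkpoint name and prompt are placeholders, a document-specific fine-tuned DM would presumably replace this general-purpose model, and annotation generation is omitted entirely.

import torch
from diffusers import StableDiffusionPipeline

# Placeholder checkpoint; a document-tuned DM would be used in practice.
pipe = StableDiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

prompt = "scanned historical document page, dense handwritten text, aged paper"
for i in range(4):
    image = pipe(prompt).images[0]  # one synthetic document image per call
    image.save(f"synthetic_doc_{i}.png")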
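Finally, latency and memory footprint can be measured with a simple harness like the one below, which uses on-disk state-dict size and averaged CPU wall-clock time as rough proxies; real microcontroller numbers, including energy, would require on-target profiling.

import os
import time
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10)).eval()
x = torch.randn(1, 784)

# On-disk size of the weights as a proxy for flash footprint.
torch.save(model.state_dict(), "model.pt")
size_kib = os.path.getsize("model.pt") / 1024

# Average single-sample CPU latency over repeated runs.
with torch.no_grad():
    for _ in range(10):  # warm-up iterations
        model(x)
    runs = 100
    start = time.perf_counter()
    for _ in range(runs):
        model(x)
    latency_ms = (time.perf_counter() - start) / runs * 1e3

print(f"size: {size_kib:.1f} KiB, latency: {latency_ms:.3f} ms")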