Exploring and Enhancing Trustworthy Machine Learning Techniques
DNr: Berzelius-2025-200
Project Type: LiU Berzelius
Principal Investigator: Genghua Dong <genghua@kth.se>
Affiliation: Kungliga Tekniska högskolan
Duration: 2025-06-16 – 2026-01-01
Classification: 10210
Keywords:

Abstract

As machine learning technologies become increasingly prevalent across diverse sectors, their reliability within decision-making processes has become a pivotal concern for both research and deployment. This project is dedicated to exploring and enhancing the theoretical foundations, methodologies, and real-world applications of Trustworthy Machine Learning (TML). Strategies for achieving TML include improving the interpretability of ML models, quantifying uncertainty, formulating fairness-aware algorithms, incorporating privacy-preserving techniques, and strengthening robustness and security. These strategies, however, often involve a trade-off between trustworthiness and predictive performance, or demand additional computational resources.

From a practical standpoint, our experiments require extracting embeddings for large numbers of input examples, which entails frequent model inference. Validating our techniques also requires real-world datasets, such as medical images and traffic images, that are large either in the number of examples or in the size of each individual example. GPU resources are therefore essential for conducting these experiments efficiently.

With the support of the Berzelius GPU cluster, we aim to develop TML frameworks along two main directions. One is to improve the interpretability of foundation models, such as transformers. The other is to detect adversarial behavior relevant to AI security. Interpretability and adversarial attack detection are complementary techniques that can jointly improve both the robustness and the transparency of machine learning models. We hope this work will contribute to real-world AI applications, such as medical analysis and fraud detection, making them more reliable and explainable in today's complex cyberspace.
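As a minimal illustration of how embedding extraction and adversarial-behavior detection fit together, the sketch below batches examples through a toy encoder and then flags embeddings that lie far from the clean distribution. Everything here is an assumption for illustration: the linear projection stands in for a real transformer's forward pass, the names (`embed`, `extract_embeddings`, `fit_detector`) are hypothetical, and Mahalanobis distance is one common embedding-space detector, not necessarily the method this project will adopt.

```python
import numpy as np

# Toy encoder: a linear projection standing in for a transformer's
# forward pass, so the sketch runs without a deep-learning framework.
def embed(batch, W):
    return batch @ W

def extract_embeddings(examples, W, batch_size=64):
    """Extract embeddings in batches, mirroring how a large dataset is
    pushed through a model on a GPU cluster."""
    chunks = [embed(examples[i:i + batch_size], W)
              for i in range(0, len(examples), batch_size)]
    return np.concatenate(chunks, axis=0)

def fit_detector(clean_embeddings, eps=1e-6):
    """Fit the mean and (regularized) precision matrix of clean embeddings."""
    mu = clean_embeddings.mean(axis=0)
    cov = np.cov(clean_embeddings, rowvar=False)
    prec = np.linalg.inv(cov + eps * np.eye(cov.shape[0]))
    return mu, prec

def mahalanobis(e, mu, prec):
    """Mahalanobis distance of one embedding from the clean distribution;
    large values suggest an out-of-distribution or perturbed input."""
    d = e - mu
    return float(np.sqrt(d @ prec @ d))

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 32))   # 200 synthetic input examples
W = rng.standard_normal((32, 8))     # projection to an 8-dim embedding space
E = extract_embeddings(X, W)         # shape: (200, 8)

mu, prec = fit_detector(E)
in_dist = E[0]                       # a clean embedding
shifted = in_dist + 50.0             # crude stand-in for a perturbed embedding
print(mahalanobis(shifted, mu, prec) > mahalanobis(in_dist, mu, prec))  # True
```

In a real experiment the clean embeddings would come from repeated GPU inference over a large dataset, which is exactly the workload that motivates the cluster request; the detector itself is cheap once those embeddings exist.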