Towards interpretable Android Malware Detection: fusing manifest, apicall and opcode features into lightweight transformer-driven model

Title:	Towards interpretable Android Malware Detection: fusing manifest, apicall and opcode features into lightweight transformer-driven model
DNr:	Berzelius-2025-217
Project Type:	LiU Berzelius
Principal Investigator:	Hantang Zhang <hantang.zhang@umu.se>
Affiliation:	Umeå universitet
Duration:	2025-06-23 – 2026-01-01
Classification:	10208
Keywords:

Abstract

Android malware detection has traditionally relied on models trained on a relatively narrow set of features, such as selected API calls or minimal manifest information. Such approaches may overlook critical cues scattered across larger code segments. In this project, we propose a method that detects Android malware with a large BERT model trained on a broad feature space extracted from application manifests, API calls, and opcodes. By parsing each file in detail, we generate sequences that can exceed 100,000 tokens, enabling the model to learn nuanced patterns indicative of malicious behavior. Because of the size of the feature space and the length of each sequence, training such a model requires significant computational resources. We therefore seek high-performance computing capacity to process large-scale datasets efficiently and to run extended training iterations. We also plan to incorporate model explainability techniques, such as attention heatmaps or feature attribution analyses, to pinpoint which tokens most strongly influence the classification outcome. This interpretability step will help us understand how our BERT model identifies malware and will aid in refining security measures against evolving Android threats. By exposing a richer set of discriminative features, our approach aims to raise detection accuracy and deepen insight into how malicious code operates.
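
As a rough illustration of how the three feature sources could be fused into one token sequence and cut into pieces that a BERT-style encoder can consume, here is a minimal Python sketch. The abstract does not say how sequences beyond 100,000 tokens are handled, so the marker tokens, the function names `fuse_features` and `sliding_windows`, and the 512-token window with a 384-token stride are all assumptions made for illustration, not the project's actual method.

```python
from typing import Dict, List

# Hypothetical marker tokens; the abstract does not specify a vocabulary.
SEP = "[SEP]"
SOURCE_TAGS = {"manifest": "[MANIFEST]", "api": "[API]", "opcode": "[OPCODE]"}


def fuse_features(features: Dict[str, List[str]]) -> List[str]:
    """Concatenate per-source token lists into one sequence.

    Each source is prefixed with its tag and closed with SEP so that a
    downstream model (or an attribution step) can tell the sources apart.
    """
    fused: List[str] = []
    for source in ("manifest", "api", "opcode"):
        fused.append(SOURCE_TAGS[source])
        fused.extend(features.get(source, []))
        fused.append(SEP)
    return fused


def sliding_windows(
    tokens: List[str], size: int = 512, stride: int = 384
) -> List[List[str]]:
    """Split a long sequence into overlapping windows of at most `size` tokens.

    Consecutive windows start `stride` tokens apart, so neighbouring windows
    share `size - stride` tokens of context.
    """
    if size <= 0 or stride <= 0 or stride > size:
        raise ValueError("require 0 < stride <= size")
    windows: List[List[str]] = []
    start = 0
    while True:
        windows.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break
        start += stride
    return windows


if __name__ == "__main__":
    # Toy input standing in for features parsed from one APK.
    app = {
        "manifest": ["android.permission.SEND_SMS", "android.permission.INTERNET"],
        "api": ["SmsManager.sendTextMessage", "URL.openConnection", "Cipher.doFinal"],
        "opcode": ["invoke-virtual", "move-result", "const-string", "if-eqz"] * 250,
    }
    fused = fuse_features(app)
    windows = sliding_windows(fused)
    print(len(fused), "tokens split into", len(windows), "windows")
```

Windowing is only one way to fit very long sequences into a fixed-length encoder; per-window predictions would still need to be aggregated per application, and a long-context architecture could avoid the split entirely.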

National Supercomputer Centre at Linköping University
