Towards a Fully Autonomous Pentester
DNr: Berzelius-2024-125
Project Type: LiU Berzelius
Principal Investigator: Christian Gehrmann
Affiliation: Lunds universitet
Duration: 2024-03-28 – 2024-10-01
Classification: 20206


The purpose of this proposal is to request the GPU resources needed to develop and evaluate three new Large Language Models (LLMs) for automated penetration testing: WizardRed, DeepseekRed, and OpenCodeRed. These models build on state-of-the-art open-source code LLMs (WizardCoder, DeepseekCoder, and OpenCodeInterpreter) and require substantial computational resources for fine-tuning and benchmarking.

Justification:

- Fine-tuning LLMs: Fine-tuning is essential to adapt the base models to penetration testing. It requires additional training on task-specific datasets, a computationally intensive process that GPUs are needed to accelerate. Fine-tuning ensures that the models can effectively identify vulnerabilities, generate exploit code, and provide actionable insights for remediation.

- Benchmarking and evaluation: Assessing the performance of WizardRed, DeepseekRed, and OpenCodeRed requires comprehensive benchmarking: comparing them against industry-standard penetration testing tools and evaluating their accuracy, efficiency, and effectiveness in identifying and exploiting vulnerabilities across a wide range of target systems and applications. These assessments involve running the models on many test cases and datasets, which demands significant GPU resources to complete in a timely manner.

- Model size and complexity: The foundational models (WizardCoder, DeepseekCoder, and OpenCodeInterpreter) are large and complex, with billions of parameters and extensive training on diverse datasets. Fine-tuning and adapting them for penetration testing will likely yield models of similar or greater size and complexity, necessitating powerful GPU resources to train and run efficiently.
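To illustrate the scale of the resource requirement, a back-of-the-envelope memory estimate for full fine-tuning can be sketched as below. The assumptions are ours, not part of the proposal: fp16 weights and gradients, fp32 Adam optimizer states plus an fp32 master copy of the weights, and activation memory and parameter-efficient methods (e.g. LoRA) ignored.

```python
def finetune_memory_gb(n_params_billion: float) -> float:
    """Rough GPU memory (GiB) for full fine-tuning with mixed-precision Adam.

    Per parameter (assumed breakdown):
      2 bytes fp16 weights + 2 bytes fp16 gradients
      + 4 bytes fp32 master weights + 4 + 4 bytes fp32 Adam moments (m, v).
    Activations, framework overhead, and data parallel replication are excluded.
    """
    bytes_per_param = 2 + 2 + 4 + 4 + 4  # = 16 bytes per parameter
    return n_params_billion * 1e9 * bytes_per_param / 2**30

# A hypothetical 15B-parameter base model (roughly the scale of the
# open-source code LLMs named in the proposal):
print(round(finetune_memory_gb(15), 1))   # hundreds of GiB -> multi-GPU territory
```

Under these assumptions a 15B-parameter model needs on the order of 220 GiB for optimizer state alone, far beyond a single accelerator's memory, which is consistent with the proposal's request for cluster-scale GPU resources.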