DiffusionAttack: A study of adversarial editing of Stable Diffusion models
Title: DiffusionAttack: A study of adversarial editing of Stable Diffusion models
DNr: Berzelius-2024-315
Project Type: LiU Berzelius
Principal Investigator: Minxing Liu <minxing.liu@liu.se>
Affiliation: Linköpings universitet
Duration: 2024-09-01 – 2025-03-01
Classification: 10201
Keywords:

Abstract

Our primary goal is to develop "DiffusionAttack", an adversarial strategy for performing unauthorized concept editing within Stable Diffusion models. The project builds on the principles of Stable Diffusion and Google's Null-Text Inversion editing, employing latent-space editing techniques that enable precise manipulation of text-to-image synthesis outputs. Text-to-image diffusion models, combined with editing techniques such as Null-Text Inversion, have revolutionized the ability to generate hyper-realistic images from a few reference pictures. However, this capability also opens the door to new forms of digital manipulation and content alteration, raising concerns over the authenticity of digital media. DiffusionAttack aims to exploit these models for controlled editing purposes, pushing the boundaries of content creation while highlighting the need for robust defenses against potential misuse.

Upon completion, we expect to deliver a system capable of performing subtle yet impactful modifications in the latent space of images. The system will demonstrate how targeted adjustments can significantly alter the output of editing pipelines such as Null-Text Inversion, enabling users to create or modify content with high precision. Our approach is intended to be compatible with multiple iterations of text-to-image models, showcasing its versatility and potential for widespread application. The project's findings will be documented in a detailed research article.

To achieve these objectives, we will use the Berzelius computing resources for intensive experimentation with latent-space editing techniques. This involves developing novel algorithms for latent-space manipulation that leverage adversarial training methods and optimization strategies. Our evaluation will span multiple publicly available datasets to verify the effectiveness of the approach across different text-to-image models. The technological framework for this project is based on Python and machine learning libraries such as PyTorch and TensorFlow, and incorporates code from research groups including Stability AI, Google Research, and Microsoft Research for foundational insights.
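As a rough illustration of the kind of latent-space manipulation described above, the sketch below perturbs a Stable Diffusion VAE latent with gradient steps so that the decoded image drifts toward a target image. It is a minimal sketch only, not the project's method: the Hugging Face diffusers library, the runwayml/stable-diffusion-v1-5 checkpoint, the L2 objective, and all hyperparameters are assumptions chosen for illustration.

import torch
from diffusers import AutoencoderKL

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load only the VAE component of Stable Diffusion (checkpoint name assumed)
# and freeze its weights; only the latent itself will be optimized.
vae = AutoencoderKL.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="vae"
).to(device)
vae.requires_grad_(False)

def encode(image):
    # image: (1, 3, 512, 512) tensor scaled to [-1, 1]
    return vae.encode(image).latent_dist.mean * vae.config.scaling_factor

def decode(latent):
    return vae.decode(latent / vae.config.scaling_factor).sample

def latent_attack(source, target, steps=50, lr=0.05):
    # Gradient-based latent edit: nudge the source latent so the decoded
    # image moves toward the target image (hyperparameters illustrative only).
    latent = encode(source).detach().clone().requires_grad_(True)
    opt = torch.optim.Adam([latent], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = torch.nn.functional.mse_loss(decode(latent), target)
        loss.backward()
        opt.step()
    return latent.detach()

In the project itself, the simple L2 objective above would be replaced by concept-editing losses applied across the full denoising and inversion pipeline (e.g., Null-Text Inversion) rather than the VAE alone; the sketch only conveys the basic pattern of adversarial optimization over a latent representation.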