Researchers at the Massachusetts Institute of Technology (MIT) have proposed an approach to mitigating the risks of malicious artificial-intelligence-based image editing: immunising images to make them resistant to manipulation. Such immunisation relies on the ‘injection of imperceptible adversarial perturbations designed to disrupt the operation of the targeted diffusion models, forcing them to generate unrealistic images’.
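To make the idea of an imperceptible adversarial perturbation more concrete, the following is a minimal sketch and not the researchers' implementation. It assumes a toy linear `encoder` (a random matrix) standing in for the image encoder of a diffusion model, and a hypothetical `target_latent` taken from a plain grey image. The perturbation is found by projected gradient descent, with every pixel change capped at `eps` so that the edit stays visually negligible. The names `immunize` and `latent_distance` are illustrative only.

```python
import numpy as np

def immunize(image, encoder, target_latent, eps=8 / 255, step=1 / 255, iters=100):
    """Projected gradient descent on a perturbation with max-norm at most eps.

    `encoder` is a hypothetical linear map standing in for a real model's
    image encoder; a real attack would backpropagate through the network.
    """
    x = image.ravel()
    delta = np.zeros_like(x)
    for _ in range(iters):
        residual = encoder @ (x + delta) - target_latent
        grad = 2.0 * encoder.T @ residual         # gradient of ||W(x + delta) - z||^2
        delta = delta - step * np.sign(grad)      # signed gradient step
        delta = np.clip(delta, -eps, eps)         # stay within the imperceptibility budget
        delta = np.clip(x + delta, 0.0, 1.0) - x  # keep pixel values in [0, 1]
    return (x + delta).reshape(image.shape)

def latent_distance(image, encoder, target_latent):
    """Distance between the encoded image and the target latent."""
    return float(np.linalg.norm(encoder @ image.ravel() - target_latent))

rng = np.random.default_rng(0)
image = rng.uniform(0.2, 0.8, size=(8, 8))         # toy 8x8 greyscale "photo"
encoder = rng.normal(size=(16, 64)) / 8.0          # assumed stand-in encoder
target_latent = encoder @ np.full(64, 0.5)         # latent of a plain grey image
immunized = immunize(image, encoder, target_latent)
```

In this toy setting the immunised image differs from the original by at most `eps` per pixel, yet its encoding is pulled towards that of the grey image, which is the kind of disruption the quoted passage describes.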
The researchers note that the procedure has certain limitations, but suggest possible ways to address them. For instance, malicious actors could try to remove the disruptive effect of the immunisation by cropping the image, adding filters to it, or rotating it. One way to address such challenges could be to develop adversarial perturbations that can withstand a broad range of image modifications and noise manipulations.
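One plausible way to pursue such robustness is to optimise the perturbation against several transformed copies of the image at once, so that it remains effective after the image is modified. The sketch below illustrates this under strong simplifying assumptions: the same kind of toy linear `encoder` as a stand-in for a real model, and a small hypothetical set of transformations (flip, 90-degree rotation, one-pixel shift) in place of arbitrary crops, filters and rotations. It is not a method described by the researchers, and all function names are illustrative.

```python
import numpy as np

# Each entry is (transform, adjoint). These are pixel permutations,
# so the adjoint needed for the chain rule is simply the inverse.
TRANSFORMS = [
    (lambda a: a, lambda g: g),
    (np.fliplr, np.fliplr),
    (lambda a: np.rot90(a, 1), lambda g: np.rot90(g, -1)),
    (lambda a: np.roll(a, 1, axis=1), lambda g: np.roll(g, -1, axis=1)),
]

def loss_and_grad(image, encoder, target_latent):
    """Squared latent distance to the target and its gradient w.r.t. the image."""
    residual = encoder @ image.ravel() - target_latent
    grad = (2.0 * encoder.T @ residual).reshape(image.shape)
    return float(residual @ residual), grad

def robust_immunize(image, encoder, target_latent, eps=8 / 255, step=1 / 255, iters=200):
    """Projected gradient descent on the loss summed over all transformations."""
    delta = np.zeros_like(image)
    for _ in range(iters):
        grad = np.zeros_like(image)
        for transform, adjoint in TRANSFORMS:
            _, g = loss_and_grad(transform(image + delta), encoder, target_latent)
            grad += adjoint(g)  # map the gradient back through the transformation
        delta = np.clip(delta - step * np.sign(grad), -eps, eps)
        delta = np.clip(image + delta, 0.0, 1.0) - image  # keep pixels in [0, 1]
    return image + delta

def mean_loss(image, encoder, target_latent):
    """Average loss over every transformation in TRANSFORMS."""
    return float(np.mean([loss_and_grad(t(image), encoder, target_latent)[0]
                          for t, _ in TRANSFORMS]))

rng = np.random.default_rng(1)
image = rng.uniform(0.2, 0.8, size=(8, 8))      # toy 8x8 greyscale "photo"
encoder = rng.normal(size=(16, 64)) / 8.0       # assumed stand-in encoder
target_latent = encoder @ np.full(64, 0.5)      # latent of a plain grey image
robust = robust_immunize(image, encoder, target_latent)
```

Because the perturbation is optimised against all of the listed transformations together, flipping, rotating or shifting the immunised image in these specific ways does not simply undo its effect. Covering the much wider range of modifications available to a real attacker remains the open challenge the researchers point to.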
Similarly, the proposed method may not be effective against future versions of image editing AI models. Here, the proposed response is to ‘encourage— or compel—via policy means, a collaboration between organizations that develop large diffusion models, end-users, as well as data hosting and dissemination platforms. Specifically, this collaboration would involve the developers providing APIs that allow the users and platforms to immunize their images against manipulation by the diffusion models the developers create. Importantly, these APIs should guarantee “forward compatibility”, i.e., effectiveness of the offered immunization against models developed in the future. This can be accomplished by planting, when training such future models, the current immunizing adversarial perturbations as backdoors.’