Adversa AI CEO, Alex Polyakov, and a small number of computer scientists and security researchers are developing jailbreaks and prompt injection attacks against generative AI systems, like OpenAI’s GPT-4.
Polyakov claims he can make the model spout homophobic statements, create phishing emails, and support violence with a few prompts designed to bypass OpenAI’s safety systems. The process of jailbreaking involves designing prompts that can make the chatbot bypass rules around producing hateful content or writing about illegal acts. Prompt injection attacks can quietly insert malicious data or instructions into AI models. While jailbreaks and prompt injection attacks are largely used to get around content filters, security researchers warn that the rush to roll out generative AI systems opens up the possibility of data being stolen and cybercriminals causing havoc across the web.
The ability to jailbreak or inject malicious prompts into generative AI systems like ChatGPT has serious implications for security and safety. If successful, these attacks can bypass safety rules and generate harmful or illegal content. For instance, in the case of Polyakov’s “universal” jailbreak, the system was tricked into generating detailed instructions on creating meth and hotwiring a car, which could be used for criminal activities.
Moreover, with the increasing use of generative AI systems in various applications, including personal assistants and chatbots, the risks of such attacks become even more severe. If these systems are given access to critical data, successful prompt injection attacks can lead to data theft or even spread like worms across the internet. The consequences of such attacks could be catastrophic, especially in industries like healthcare, finance, or national security, where sensitive data is at stake. Therefore, the rush to roll out generative AI systems without considering the security implications opens up the possibility of cybercriminals causing havoc across the web.
Open AI already realizes this and has started a program that can be seen as a form of crowd-sourced quality assurance. By incentivizing researchers and enthusiasts to search for bugs and other issues in the ChatGPT system, OpenAI is able to leverage a large and diverse group of people to find and report issues that may have been missed during the development and testing process.
This program encourages individuals with various skill levels to participate in the hunt, including those who may have different perspectives and experiences with language models, which can lead to a more comprehensive understanding of ChatGPT’s capabilities and limitations. Additionally, by offering financial incentives, OpenAI is likely to attract a larger pool of talented individuals who are motivated to identify and report bugs.
Ultimately, the goal of the program is to identify and address any issues with ChatGPT that may affect its performance, reliability, or security. By doing so, OpenAI can improve the quality and safety of ChatGPT for its users and help mitigate potential risks associated with its use.