“It took Alex Polyakov just a couple of hours to break GPT-4. When OpenAI released the latest version of its text-generating chatbot in March, Polyakov sat down in front of his keyboard and started entering prompts designed to bypass OpenAI’s safety systems. Soon, the CEO of security firm Adversa AI had GPT-4 spouting homophobic statements, creating phishing emails, and supporting violence. Polyakov is one of a small number of security researchers, technologists, and computer scientists developing jailbreaks and prompt injection attacks against ChatGPT and other generative AI systems.
The process of jailbreaking aims to design prompts that make the chatbots bypass rules around producing hateful content or writing about illegal acts, while closely-related prompt injection attacks can quietly insert malicious data or instructions into AI models. Both approaches try to get a system to do something it isn’t designed to do.
The attacks are essentially a form of hacking—albeit unconventionally—using carefully crafted and refined sentences, rather than code, to exploit system weaknesses. While the attack types are largely being used to get around content filters, security researchers warn that the rush to roll out generative AI systems opens up the possibility of data being stolen and cybercriminals causing havoc across the web.”