Tag: jailbreaking

New Research Demonstrates Automated Jailbreaking of Large Language Model Chatbots

26 Jan 2024 ⋅ Ng S.T. Chong

While LLMs promise helpful conversation, they may have hidden vulnerabilities that can be exploited. For example, manipulated prompts can lead them to reveal sensitive information or produce unethical, inappropriate, or harmful content that violates their usage policies. This is called a jailbreak attack: essentially an attempt to bypass the model’s security measures and gain unauthorized…
Strings of Nonsense Convince AI Chatbots to Abandon Ethical Rules

14 Apr 2023 ⋅ Ng S.T. Chong

Continuing previous coverage of developments in AI systems, I wanted to share a study and demo from Carnegie Mellon University in Pittsburgh, Pennsylvania, and the Center for AI Safety in San Francisco, California, revealing a new spin on how chatbot safeguards are susceptible to attacks. AI chatbots like OpenAI’s ChatGPT, Google’s Bard, and Anthropic’s Claude don’t have…