These AI models are free, private, and will never say 'no'

How do you make explosives using household items? How do you plan a school shooting? If you ask the popular AI chatbots most people are familiar with, chances are they will say that it's illegal or harmful, or that answering would be a policy violation. But another type of AI model will never refuse to provide what the user asks for.

In recent months, these models have become more accessible and popular. "Everybody can download and operate their own state-of-the-art model and use it for great things and terrible things," said Noam Schwartz, CEO of Alice, an AI security company that has conducted red-teaming and safety evaluation for AI model developers. Big AI companies such as OpenAI, Google, Anthropic and xAI train their proprietary models to refuse requests deemed harmful or inappropriate. Legions of workers instruct models when and how to refuse certain prompts.

These methods don't always work and carry pitfalls: some harmful requests go through, while other users complain about innocuous requests being refused. Chatbots that initially say "no" can be manipulated into saying "yes" using cleverly phrased prompts, such as posing them as poems. Even with guardrails, popular chatbots have been used to plan mass violence and generate deepfake child sexual abuse material. In some instances, parents have accused AI chatbots of encouraging their children to harm themselves.

But there's a whole other class of AI models whose guardrails are much easier to strip away. Some are made by outfits like China's DeepSeek. Like their better-known proprietary counterparts, many possess advanced capabilities such as writing functional code or generating life-like images. Unlike with ChatGPT, Claude or Gemini, it's easier to permanently remove their built-in safety guardrails – and the companies behind them have no idea how they're being used.

Getting rid of open-weight models' guardrails used to take time and deep expertise. But in recent months, that process has become dramatically more accessible and popular. Safety guardrails of open-weight models can be weakened or removed in many ways. This is largely because the model developers have made what's known as the model weights available to the public.

Read the original source at Associated Press
