Unrestricted LLM Interaction is Unsafe

Don't ship raw chatbots to your users.

2026-01-05 at 6:0

commentaryllmsecuritysocietysoftware-eng

People are using Grok LLMs on X (formerly Twitter) to harass women: when a woman uploads a photo, they request the LLM to transform the photo into one depicting sexual situations or violence.

Maggie Harrison Dupré for Futurism on 2026-01-02:
Earlier this week, a troubling trend emerged on X-formerly-Twitter as people started asking Elon Musk’s chatbot Grok to unclothe images of real people. This resulted in a wave of nonconsensual pornographic images flooding the largely unmoderated social media site, with some of the sexualized images even depicting minors.

When we dug through this content, we noticed another stomach-churning variation of the trend: Grok, at the request of users, altering images to depict real women being sexually abused, humiliated, hurt, and even killed.

We've also seen instances of LLMs encouraging suicidal people to go through with it.

Nadine Yousif for BBC News on 2025-08-27:
The family included chat logs between Adam, who died in April, and ChatGPT that show him explaining he has suicidal thoughts. They argue the programme validated his "most harmful and self-destructive thoughts".

I think Sean Goedecke is correct in his ethical analysis in Grok is enabling mass sexual harassment on Twitter — that xAI is unethically exposing LLM functionality that is being used to harm thousands of women, in order to receive more engagement. However, I am skeptical of the distinction he draws between xAI and OpenAI/Gemini, namely that the latter "have popular image models that do not let you do this kind of thing." Fine-grained, semantically oriented control of LLM output is basically impossible. Because LLMs simply can't distinguish between different "kinds" of text at a fundamental level, prompt injection where the user gets the LLM to treat input they provide as unrestrictedly actionable, is inevitable. You can play with these kinds of naive controls in Lakera's Gandalf game, 7 levels of different attempts at hardening, which are all bypassable.

Without a breakthrough on the prompt injection problem, I think it's inappropriate bordering on unethical to expose unrestricted LLM chat directly to the general public. Either input or output must be strictly controlled. Further, I think software engineers should be doing what they can to avoid "chat" as an interface for general populace users — use LLMs to enable flexible input, or context-aware output, but keep them constrained to the specific domain of your software.

For example, model your system's supported functionality as "tool calls" using JSON Schema, and use an LLM to model the user's input using the JSON Schema, then execute the resulting "tool calls" within your system. This architecture enables flexibility (natural language) in user input, but constrains the available actions the system will take. To ensure that your system can communicate naturally, but won't say anything inappropriate, you only use an LLM to generate output based on known prompts and contextual data. In fact, both of these techniques are how customer service "AI-based" chatbots operate — users are never actually exposed to an unconstrained LLM chat speaking for the company, their questions are mapped to appropriate topics, and answers are given using harmless prompts.

I do think unrestricted LLM interaction is appropriate for sophisticated, expert users, within a trusted context. Remove the expertise or the trusted context, and you must restrict LLM interaction to keep people safe.