
Following the release of GPT-5.5 last week, people noticed something humorous about OpenAI’s newest model. In its Codex coding app, the company left a system prompt instructing GPT-5.5 to avoid mention of goblins, gremlins and other creatures. Yes, you read that right. “Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it’s absolutely and unambiguously relevant to the user’s query,” the prompt reads.
Apparently, enough people began talking about ChatGPT’s creature obsession that OpenAI felt the need to provide an accounting of where the goblins came from. In a blog post published Wednesday, the company explains it began to notice a change in ChatGPT following the release of GPT-5.1 last November. After one security researcher asked OpenAI to include the words “goblin” and “gremlin” in an investigation into the chatbot’s verbal tics, the company found ChatGPT’s usage of “goblin” increased by 175 percent after the release of GPT-5.1. Meanwhile, “gremlin” usage had risen by 52 percent over that same period.
This is an actual line that was added to the official system prompt for Codex for GPT-5.5 by OpenAI. Usually the system prompt is as minimal as possible, so I assume it would otherwise mention goblins a lot.
AIs are weird.
— Ethan Mollick (@emollick.bsky.social) 2026-04-28T06:14:22.988Z
“A single ‘little goblin’ in an answer could be harmless, even charming. Across model generations, though, the habit became hard to miss: the goblins kept multiplying, and we needed to figure out where they came from,” OpenAI says. After the release of GPT-5.4, the company (and some users) noticed an even bigger uptick in goblin references. At that point, an investigation was able to pinpoint what OpenAI describes as “the first connection to the root cause.”
For some time now, ChatGPT has included a personality feature that allows users to customize the style and tone of the chatbot’s responses. Prior to March of this year, one option people could select was “nerdy.” Part of the system prompt for that personality read as follows: “The world is complex and strange, and its strangeness must be acknowledged, analyzed, and enjoyed. Tackle weighty subjects without falling into the trap of self-seriousness.”
When OpenAI mapped goblin mentions to different ChatGPT personalities, it found the nerdy personality was disproportionately responsible for using that one word. Despite only accounting for 2.5 percent of all ChatGPT responses, it made 66.7 percent of all goblin mentions generated by the chatbot. Further investigation revealed that reinforcement learning was to blame for the uptick in goblin and gremlin usage. Specifically, OpenAI found that a single reward mechanism was responsible for teaching the nerdy personality to consistently favor creature language.
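To put the 2.5 percent and 66.7 percent figures in perspective, here is a back-of-the-envelope calculation. It is only a sketch: the function name is made up for illustration, and it assumes both percentages are measured over the same pool of responses, which the article does not spell out.

```python
def overrepresentation(share_of_mentions: float, share_of_responses: float) -> float:
    """Ratio of a group's share of mentions to its share of all responses.

    A value of 1.0 means the group mentions the word exactly as often as
    its share of traffic would predict.
    """
    if share_of_responses <= 0:
        raise ValueError("share_of_responses must be positive")
    return share_of_mentions / share_of_responses


# Figures reported in the article, expressed as fractions.
nerdy_share_of_responses = 0.025
nerdy_share_of_goblin_mentions = 0.667

ratio = overrepresentation(nerdy_share_of_goblin_mentions, nerdy_share_of_responses)
print(f"Overrepresentation factor: {ratio:.1f}x")  # prints "Overrepresentation factor: 26.7x"
```

In other words, under that assumption the nerdy personality produced goblin mentions at roughly 27 times the rate its share of responses alone would suggest.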
“Across all datasets in the audit, the Nerdy personality reward showed a clear tendency to score outputs to the same problem with ‘goblin’ or ‘gremlin’ higher than outputs without, with positive uplift in 76.2 percent of datasets,” the company explains.
Subsequently, OpenAI found that, because of how reinforcement learning can work, the nerdy personality’s love of goblins had transferred to other parts of its models. “The rewards were applied only in the Nerdy condition, but reinforcement learning does not guarantee that learned behaviors stay neatly scoped to the condition that produced them,” the company explains. “Once a style tic is rewarded, later training can spread or reinforce it elsewhere, especially if those outputs are reused in supervised fine-tuning or preference data.”
OpenAI began training GPT-5.5 before it identified the cause of ChatGPT’s affinity for goblins, which is why there’s a prompt instructing Codex to avoid creature language. “Codex is, after all, quite nerdy,” OpenAI notes. In hunting down ChatGPT’s goblins, the company notes it has devised new tools to audit and fix model behavior. If it were up to me, I wouldn’t use these tools. Keep AI weird, I say.
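For readers curious what a “positive uplift” audit might look like in practice, the following is a minimal sketch, not OpenAI’s actual tooling. It assumes each dataset provides paired reward scores for outputs to the same problem, one containing a creature word and one without. The function names, dataset names and scores are all hypothetical and invented for illustration.

```python
from statistics import mean


def mean_uplift(pairs):
    """Average of (score_with - score_without) over paired outputs to the same problem."""
    return mean(with_word - without_word for with_word, without_word in pairs)


def positive_uplift_share(datasets):
    """Return the fraction of datasets with strictly positive mean uplift, plus the uplifts."""
    uplifts = {name: mean_uplift(pairs) for name, pairs in datasets.items()}
    positive = sum(1 for u in uplifts.values() if u > 0)
    return positive / len(uplifts), uplifts


# Hypothetical, made-up scores: (reward with "goblin"/"gremlin", reward without).
toy_datasets = {
    "dataset_a": [(0.82, 0.74), (0.67, 0.61)],
    "dataset_b": [(0.55, 0.58), (0.49, 0.52)],
    "dataset_c": [(0.91, 0.80), (0.73, 0.75)],
    "dataset_d": [(0.64, 0.60), (0.70, 0.66)],
}

share, uplifts = positive_uplift_share(toy_datasets)
print(f"Positive uplift in {share:.0%} of toy datasets")  # prints "Positive uplift in 75% of toy datasets"
```

The 76.2 percent figure OpenAI reports would correspond to `share` in this sketch, computed over its real audit datasets rather than these toy numbers.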