When Shekhar Natarajan walked off the stage at the World Economic Forum in Davos earlier this year, the audience was on its feet. A few weeks later, in New Delhi, it happened again. The hall at Bharat Mandapam, packed with policymakers and technology leaders from around the world, rose for a sustained ovation after he opened his speech with a single line.
“The entire world is debating how to govern AI after the fact. We are putting fences around a horse that has already left the barn.”
For a man who arrived in the United States with thirty-four dollars in his pocket, it has been an unusual few months. In April, the University of Oxford awarded him the Bodleian Medal for his work on artificial intelligence in the public interest. The same month, he published a technical paper that has begun to circulate among engineers and regulators trying to make sense of where AI is going next.
His argument is simple to state. The AI industry, he says, has spent years solving the wrong problem.
The cage and the animal
Almost every approach to AI safety today works the same way. Companies build a powerful model first. Then they wrap it in filters, rules, and warnings designed to stop it from doing harmful things. The technical details vary. The basic shape does not. Build the system. Add the safeguards afterward.
Natarajan thinks this is exactly backwards. He uses a blunt metaphor to explain why. Every existing safety system, he writes, “assumes a model that wants to do harmful things and is stopped. The cage is always fighting the animal inside it.”
This is why, he argues, attempts to break AI systems keep succeeding. Every few weeks, somebody discovers a clever way of phrasing a question that gets a chatbot to say something it was supposed to refuse. The companies patch the hole. A new one appears. The cycle never ends.
It never ends, in his telling, because the underlying drive is still there. Remove one constraint and the model still wants to do the thing. Add another constraint and a determined person will eventually find a way around it. The cage gets stronger. The animal inside does not change.
The question the industry has not been asking, he says, is the one that comes first. Why does the model want these things in the first place?
A different kind of machine
His answer is not philosophical. It is mechanical. Today’s AI systems learn by reading enormous quantities of human writing. The writing contains the full range of human behavior — generosity and cruelty, honest argument and manipulation, kindness and coercion. The model learns from all of it without distinction.
The harmful patterns that result are not bugs. They are exactly what you would expect from a system that absorbed everything humans have ever written down. The cage exists, in this account, because the developers built the very thing they then had to cage.
Natarajan’s proposal is to build a different kind of machine. Not one with stronger filters. One whose foundation is structured so that the harmful patterns never form in the first place.
The first objection most people raise is the obvious one. Whose values get to be the foundation? Whose ethics? Which culture decides?
His answer is the cleanest part of his argument. Every moral tradition humans have built, he points out, agrees on a small set of basic anchors. Courage. Honesty. Justice. Care for others. These show up everywhere, under different names, with different emphases. Where cultures and traditions differ is in how they weigh these anchors against one another in real situations. They do not differ on whether the anchors matter.
That distinction, he argues, makes the problem solvable. You do not need to encode the whole of human moral reasoning into a machine. You only need the anchors. The reasoning between them happens in the moment.
The personal story behind the argument
Some of the attention Natarajan has received has to do with where he came from. He grew up in southern India in a family that had no electricity. He studied under streetlights. His mother, by his own account, once pawned her wedding ring for thirty rupees to pay his school fees, and stood outside a headmaster’s office for a full year to win him a place in school.
He arrived in America with almost nothing and, during lean stretches, slept in his car. He spent the next twenty-five years inside the technology operations of some of the largest companies in the world — Walmart, Disney, Coca-Cola, PepsiCo, Target, American Eagle Outfitters — accumulating more than two hundred patents along the way.
It is an unusual background for someone now telling the AI industry it has been getting the foundations wrong. It may also be why people listen.
“My mother stood outside a headmaster’s office for 365 days so I could get an education,” he told the audience in New Delhi. “That kind of love, that sacrifice, is what I want to encode into the machines we build. If AI cannot understand dignity, it has no business making decisions about human lives.”
What it means
Natarajan is careful in the paper itself to acknowledge what he has and has not proven. The architecture he proposes is a design, not a working system. The empirical tests that would confirm it have not yet been run. He lists, openly, the ways his claims could be shown to be wrong.
What the paper offers is something other than a finished product. It names a problem the industry has been circling without quite seeing. Every safety failure of the past three years, in his telling, points back to the same underlying gap. The fix is not in the filters. It lies much earlier, in the way the machine is built in the first place.
The paper closes with a sentence that has begun to travel beyond the technical world it was written for.
“It does not build a bigger cage. It builds a model that does not need one.”
Whether his specific design is the one that gets the industry there is a question for the years ahead. The argument that matters, the one that drew standing ovations in Davos and New Delhi, is simpler. The AI industry has spent a decade trying to control the animal it built. The work that comes next, if Natarajan is right, will be about building a different animal.