Artificial intelligence

What It Takes to Make an AI Assistant Remember the Conversation

By Gerrita Bikker

Posted on May 20, 2026

Generative AI has gone from novelty to enterprise infrastructure faster than almost any consumer technology before it. Traffic from generative AI tools to U.S. retail websites had climbed 4,700% year over year as of July 2025, a sign that users now expect to research products, weigh options, and decide through natural conversation rather than keyword search. That shift hands retailers a difficult engineering bill. A user who asks three follow-up questions expects the assistant to remember all three, instantly, while millions of other users do the same thing at the same moment.

Aditi Patodiya has spent her career inside that bill. A Senior Software Engineer with more than 10 years in large-scale distributed systems, she builds the backend services that let conversational AI hold a thread across many turns without slowing to a crawl. She is a Senior Member of the IEEE, a distinction reserved for engineers with a sustained record of professional accomplishment. Her focus sits at the layer most users never see: how a system stores, prunes, and moves the conversation history that makes an assistant feel intelligent.

Why conversational AI forgets

The reason an AI assistant struggles to remember is physical, not conceptual. Every time a user asks a follow-up, the model reprocesses the entire prior conversation, and that history lives in GPU memory as a structure called the key-value cache. A single 128,000-token context for one user can consume roughly 40 GB of memory on a 70-billion-parameter model, and that cost grows linearly with every additional user. Multiply it across potentially millions of simultaneous sessions and the hardware bill, along with the lag the user feels, climbs fast.
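The 40 GB figure can be sanity-checked with a back-of-the-envelope calculation. The sketch below is not taken from any production system; it assumes a 70-billion-parameter model shaped like common open-weight designs (80 layers, 8 key-value heads under grouped-query attention, a head dimension of 128, and 16-bit values). Those four numbers are assumptions, and a different architecture would give a different total.

```python
def kv_cache_bytes(tokens, layers, kv_heads, head_dim, bytes_per_value=2):
    """Approximate key-value cache size for one session.

    Each layer stores two tensors (keys and values), and each of them
    holds kv_heads * head_dim values for every token in the context.
    """
    return 2 * layers * kv_heads * head_dim * bytes_per_value * tokens


# Assumed model shape (not from the article): 80 layers, 8 KV heads,
# head dimension 128, 16-bit (2-byte) values.
LAYERS, KV_HEADS, HEAD_DIM = 80, 8, 128

per_token = kv_cache_bytes(1, LAYERS, KV_HEADS, HEAD_DIM)          # 327,680 bytes
per_session = kv_cache_bytes(128_000, LAYERS, KV_HEADS, HEAD_DIM)  # ~41.9 GB

# The cost is linear in the number of concurrent sessions.
thousand_sessions = 1_000 * per_session

print(f"{per_token / 1024:.0f} KiB per token")
print(f"{per_session / 1e9:.1f} GB per 128,000-token session")
print(f"{thousand_sessions / 1e12:.1f} TB for 1,000 such sessions")
```

Under these assumptions one full-length session lands at about 42 GB, in line with the "roughly 40 GB" cited above, and a thousand of them would need about 42 TB.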

Patodiya took this on directly as the technical owner of a 90-day, leadership-mandated program for the company’s generative-AI assistant, built to close the highest-impact context defects in multi-turn conversations. She authored the program’s master design document and the four-class defect taxonomy (unprocessable, missing, incorrect, and other context) that the engineering, science, and product teams now use to classify all context-related work. The scale was unforgiving. In a single baseline week, the assistant handled close to 11.9 million conversations in the U.S., and roughly 18% of those were multi-turn conversations, the exact population where context failures tend to emerge. An annotator study measured a 24% defect rate on prior-conversation context in those multi-turn samples. Context handling ranked as the single largest customer-deterring issue in the company’s own user research, which meant the program was not a tuning exercise but a prerequisite for any feature that built on conversation history.
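For readers who think in code, the taxonomy and the headline numbers can be sketched in a few lines. The class names below mirror the four categories named above, but the enum itself, the `defect_rate` helper, and the 100-conversation sample are hypothetical. The per-class split in the sample is invented purely so the total matches the reported 24%; the article gives no breakdown by class.

```python
from collections import Counter
from enum import Enum


class ContextDefect(Enum):
    """The four defect classes named in the article (enum is illustrative)."""
    UNPROCESSABLE = "unprocessable"
    MISSING = "missing"
    INCORRECT = "incorrect"
    OTHER = "other"


def defect_rate(labels):
    """Share of annotated conversations carrying any context defect.

    `labels` holds one entry per conversation: a ContextDefect, or None
    when the annotator found no context defect.
    """
    if not labels:
        return 0.0
    return sum(1 for label in labels if label is not None) / len(labels)


# Figures from the article: ~11.9 million weekly conversations, ~18% multi-turn.
weekly_conversations = 11_900_000
multi_turn = round(weekly_conversations * 0.18)  # about 2.14 million

# Made-up annotated sample: 24 defects in 100 conversations.
sample_labels = (
    [ContextDefect.MISSING] * 10
    + [ContextDefect.INCORRECT] * 8
    + [ContextDefect.UNPROCESSABLE] * 4
    + [ContextDefect.OTHER] * 2
    + [None] * 76
)
by_class = Counter(label.value for label in sample_labels if label is not None)

print(multi_turn, defect_rate(sample_labels), dict(by_class))
```

The arithmetic alone shows why the program mattered: 18% of 11.9 million is roughly 2.14 million multi-turn conversations in a single week.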

“The model is only as good as the context you hand it,” Aditi Patodiya says. “If the history arrives garbled or incomplete, the smartest model in the world still gives a confused answer. Most of the hard work is upstream of the model, in how you carry that history cleanly.”

Building a pipeline that carries context

Conversational AI has become a market worth defending. It reached an estimated $11.58 billion in 2024 and is projected to grow to $41.39 billion by 2030, a trajectory built on the assumption that these systems can sustain natural, multi-turn dialogue at scale. That assumption only holds if the plumbing underneath keeps the conversation intact as it travels between the many services that make up a production assistant.

Before Patodiya’s work, evidence moved between the assistant’s services as binary-encoded data, which the language model could not read, so context was quietly dropped along the way. She designed a unified, model-readable schema and drove its adoption across six interdependent backend services, acting as the single engineer accountable for reconciling contract proposals among seven partner teams. The new contracts cover evidence, hydration, multimodal content, and system-injected content under one design, and the rollout consolidated more than 80 engineering-days of cross-team effort into one coordinated pipeline.
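The article does not publish the schema itself, so the following is only a minimal sketch of what "model-readable" can mean in practice. The four `kind` values come from the content types listed above; the `ContextItem` class, its field names, the JSON wire format, and the sample items are all assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass
from typing import Literal

# The four content types named in the article; everything else is hypothetical.
Kind = Literal["evidence", "hydration", "multimodal", "system_injected"]


@dataclass(frozen=True)
class ContextItem:
    kind: Kind
    turn: int
    text: str  # plain text a language model can read directly


def to_wire(items):
    """Serialize context for transport between services as readable JSON."""
    return json.dumps([asdict(item) for item in items], ensure_ascii=False)


def from_wire(payload):
    """Rebuild the same items on the receiving service."""
    return [ContextItem(**raw) for raw in json.loads(payload)]


def render_for_prompt(items):
    """Lay the history out in turn order for prompt construction."""
    ordered = sorted(items, key=lambda item: item.turn)
    return "\n".join(f"[turn {i.turn}] {i.kind}: {i.text}" for i in ordered)


items = [
    ContextItem("evidence", 2, "Blue model is in stock in size 9."),
    ContextItem("system_injected", 1, "User is browsing running shoes."),
]
round_tripped = from_wire(to_wire(items))
prompt_text = render_for_prompt(round_tripped)
print(prompt_text)
```

The point of a single contract like this is that every service reads and writes the same fields, so nothing has to be decoded or guessed at the boundary between teams.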

“The interesting part was never any single service,” Patodiya notes. “It was getting seven teams to agree on one contract for how context should look, so nothing got lost in translation between them.”

The failures that hide in the edge cases

The failures that matter most in conversational systems rarely show up in a demo. They appear in the awkward middle of a real session, when a user abandons a query halfway, when an answer comes back empty, or when one service finishes a beat after another expects it. These partial states are where context silently corrupts, and they are far harder to design for than the clean path everyone tests first.

Patodiya built the pipeline to survive those moments. She implemented handling for partial-turn conversations where a user ends early, including placeholder semantics so an empty answer does not corrupt the next turn’s context, and she added gating logic with a fallback safety path so the new behavior could be switched off instantly if it misbehaved in production. That instinct for how systems break informs work she does beyond her day job. She is also a peer reviewer for the 2026 International Conference on Applied Artificial Intelligence, evaluating submissions for the same rigor she brings to her own designs.
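Two of those ideas, placeholder semantics and a gated fallback, translate naturally into code. The sketch below is an illustration under stated assumptions, not the production logic: the `Turn` class, the placeholder string, the `placeholders_enabled` flag, and the legacy behavior of skipping unanswered turns are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical marker that tells the model a turn had no usable answer.
EMPTY_ANSWER_PLACEHOLDER = "[no answer was returned for this turn]"


@dataclass
class Turn:
    question: str
    answer: Optional[str] = None  # None or blank when the turn ended early


def build_history_legacy(turns):
    """Assumed fallback path: keep only turns that have a real answer."""
    return [
        f"User: {t.question}\nAssistant: {t.answer}"
        for t in turns
        if t.answer and t.answer.strip()
    ]


def build_history_with_placeholders(turns):
    """New path: keep every turn and mark empty answers explicitly."""
    history = []
    for t in turns:
        answered = bool(t.answer and t.answer.strip())
        answer = t.answer if answered else EMPTY_ANSWER_PLACEHOLDER
        history.append(f"User: {t.question}\nAssistant: {answer}")
    return history


def build_history(turns, placeholders_enabled):
    """Gate the new behavior behind a flag, with a safe fallback."""
    if not placeholders_enabled:
        return build_history_legacy(turns)
    try:
        return build_history_with_placeholders(turns)
    except Exception:
        # If the new path misbehaves, fall back rather than fail the request.
        return build_history_legacy(turns)


session = [
    Turn("Do you have trail shoes?", "Yes, here are three options."),
    Turn("Which is lightest?"),  # user left before an answer arrived
]
with_flag = build_history(session, placeholders_enabled=True)
without_flag = build_history(session, placeholders_enabled=False)
```

With the flag on, the abandoned second question stays in the history with an explicit marker, so the next turn is not silently paired with the wrong answer. With the flag off, the system behaves as it did before.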

“Most teams optimize for the happy path and treat the edge cases as cleanup later,” Patodiya observes. “I think that is backward. The edge cases are where customers actually lose trust, so that is where I start.”

Measuring what used to be invisible

For years, teams building conversational AI had no reliable way to measure whether the system was actually holding context. Quality was judged by spot checks and proxy signals, which meant a defect could affect a large share of conversations before anyone could prove it existed, let alone fix it. You cannot improve what you cannot see.

Part of Patodiya’s program was building that visibility. The work delivered end-to-end context tracing across the assistant’s services for the first time, integrated with the team’s root-cause analysis tooling, which largely eliminated reliance on manual annotation for diagnosis. It also created a baseline traffic dataset and a model-based evaluation method, giving the organization a repeatable way to measure context defect rates rather than guess at them. Moving the evidence-parsing logic out of the model’s prompt-construction code removed an entire class of silent context-loss bugs at the same time. With a number attached to the problem, the team could finally rank context defects against everything else competing for engineering time and prove when a fix had actually landed.
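What end-to-end tracing buys a team can be shown with a toy example. This is a sketch only: the `ContextTrace` class, the service names, and the idea of counting context items at each hop are assumptions chosen for illustration, not a description of the company's tooling.

```python
from dataclasses import dataclass, field


@dataclass
class ContextTrace:
    """Follows one conversation's context as it passes through services."""
    trace_id: str
    hops: list = field(default_factory=list)  # (service name, item count)

    def record(self, service, items):
        """Note how many context items a service saw for this trace."""
        self.hops.append((service, len(items)))

    def first_loss(self):
        """Return (service, items lost) for the first hop where the count
        drops, or None if every hop saw at least as many items as the last."""
        for (_, prev_count), (service, count) in zip(self.hops, self.hops[1:]):
            if count < prev_count:
                return service, prev_count - count
        return None


# Hypothetical three-service path for a single conversation.
trace = ContextTrace("conv-001")
context = ["turn 1", "turn 2", "turn 3"]
trace.record("orchestrator", context)
trace.record("evidence_service", context)
trace.record("prompt_builder", context[:2])  # one item went missing here

loss = trace.first_loss()
print(loss)
```

A real system would also need to tell deliberate pruning apart from accidental loss, which a simple count cannot do, but even this much turns "the assistant forgot something" into a specific service to inspect.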

“Once you can measure a problem honestly, the politics around it fall away,” Patodiya explains. “The number tells you where to spend your time. Before that, everyone is just arguing from anecdotes.”

The foundation under the next wave

The hardest problem in applied AI right now is not building a capable model. It is running one at the scale of a global consumer product without the economics or the latency collapsing. Stateful, multi-turn conversation is the part of that problem that most directly shapes whether people trust an assistant enough to lean on it for everyday decisions.

The pipeline Patodiya designed has become the foundation on which other features depend. Because it dictates how conversation data is stored, moved, and streamed, the assistant’s later capabilities rely heavily on it, and the contracts she defined now govern how every team adds to the conversation. Her approach offers a working pattern for any organization trying to take a powerful model out of the lab and run it sustainably for hundreds of millions of users.

“We are past the question of whether AI can hold a conversation,” Patodiya reflects. “The real work now is making it remember reliably, for everyone, at once, without anyone noticing the machinery. That is the standard I am building toward.”