People can now place and receive orders by talking to a voice assistant developed by Dhaval Hemant Shah for a major delivery company. His work has simplified tasks not only for restaurant staff, but also for couriers, engineers, and even customers with disabilities.
In October 2025, DoorDash announced the expansion of its DashLink program, a logistics platform now used by major U.S. retailers, including Macy’s, Sephora, and PetSmart. Over the past year alone, DoorDash has grown by 24%, reaching $2.9 billion in revenue in the fourth quarter of 2024, and DashLink, the company’s new white-label platform for retail logistics, handles hundreds of thousands of deliveries per month. This is part of a global trend: according to McKinsey, the market for “last-mile logistics” and AI automation in delivery is projected to grow by more than 30% through 2030.
When DoorDash launched its AI assistant for restaurants and the DashLink system for logistics, few expected these projects to become a benchmark for the entire delivery industry. Behind their reliability and scale is an engineer who combines machine learning, voice technology, and business logic into a single system – Dhaval Hemant Shah. He is a Fellow of Hackathon Raptors and a member of IEEE. In this interview, he talks about how to build sustainable AI products and what engineers can learn from real businesses and from each other.
– Dhaval, DoorDash recently announced the expansion of the DashLink platform, which, according to the company, already handles a substantial volume of deliveries. What is behind these numbers?
– DashLink is our “invisible engine” for delivering goods not only from restaurants, but also from major retailers like Macy’s, Sephora, and PetSmart. The platform now handles a large number of parcel deliveries every month, making it the fastest-growing new business line at DoorDash. Technically, it’s an event-driven orchestration system that connects the journey’s first, middle, and last mile, from the warehouse to the customer’s door. We’ve built a shipment state machine with idempotent APIs, dead-letter queues, circuit breakers, and SLA monitoring, ensuring reliability even during peak traffic. Thanks to this, the platform withstands peaks without losing orders and maintains consistent service even on Black Friday, when order volume is several times higher than on other days.
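To make the pattern concrete, here is a minimal Python sketch of a shipment state machine with idempotent event handling. The states, events, and schema are hypothetical illustrations, not DashLink’s actual design:

```python
# A toy shipment state machine with idempotent event application.
# States and event names are hypothetical, for illustration only.
from enum import Enum

class ShipmentState(Enum):
    CREATED = "created"
    AT_WAREHOUSE = "at_warehouse"
    IN_TRANSIT = "in_transit"
    OUT_FOR_DELIVERY = "out_for_delivery"
    DELIVERED = "delivered"

# Allowed transitions: (current state, event type) -> next state.
TRANSITIONS = {
    ("created", "picked_up"): ShipmentState.AT_WAREHOUSE,
    ("at_warehouse", "departed"): ShipmentState.IN_TRANSIT,
    ("in_transit", "arrived_local"): ShipmentState.OUT_FOR_DELIVERY,
    ("out_for_delivery", "delivered"): ShipmentState.DELIVERED,
}

class Shipment:
    def __init__(self, shipment_id: str):
        self.shipment_id = shipment_id
        self.state = ShipmentState.CREATED
        self.seen_events: set[str] = set()  # processed event IDs, for idempotency

    def apply(self, event_id: str, event_type: str) -> ShipmentState:
        # Idempotency: replaying an event (e.g. after a retry) is a no-op.
        if event_id in self.seen_events:
            return self.state
        next_state = TRANSITIONS.get((self.state.value, event_type))
        if next_state is None:
            raise ValueError(f"illegal transition {self.state.value} + {event_type}")
        self.state = next_state
        self.seen_events.add(event_id)
        return self.state
```

The key property is that re-delivering the same event after a retry leaves the shipment in the same state, which is what makes the surrounding APIs safe to retry.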
– How does the system manage to handle such a large number of requests without failing?
– We have created an architecture that does not break under load. The key is resilience by design. Every failed request is retried automatically, and nothing gets lost. We maintain idempotency at every layer and have built contract tests to validate the APIs of all retail partners before production launches. We also use cost-aware dispatch policies to optimize routes and carrier selection, ensuring that deliveries remain fast and economically viable. Deep operational dashboards give us visibility into every parcel, from pickup to delivery confirmation.
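As a rough illustration of the “nothing gets lost” guarantee, here is a hedged sketch of retry-with-backoff feeding a dead-letter queue. The handler and queue are hypothetical placeholders (a plain list stands in for a real queue), not the production implementation:

```python
# Retry a message handler with exponential backoff; park the message
# in a dead-letter queue instead of dropping it once retries run out.
import time

def process_with_retries(message, handler, dead_letter_queue,
                         max_attempts=5, base_delay=0.5):
    for attempt in range(1, max_attempts + 1):
        try:
            # The handler must be idempotent: it may see the same
            # message more than once across retries.
            return handler(message)
        except Exception as exc:
            if attempt == max_attempts:
                # Exhausted retries: keep the message for later
                # inspection and replay rather than losing it.
                dead_letter_queue.append({"message": message, "error": str(exc)})
                return None
            time.sleep(base_delay * 2 ** (attempt - 1))  # exponential backoff
```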
– The DashLink system is truly unique. But among your developments there is also a Voice Assistant, which the company has already put into active use. Your Voice AI for the delivery service became one of the first generative systems to actually work with customers. How did you achieve natural interaction while avoiding the mistakes that so often plague chatbots?
– We realized that the restaurant environment is noisy, conversations are short, and every second counts. So we implemented streaming speech recognition (streaming ASR) with noise reduction, and a natural-language understanding (NLU) layer that is ‘aware’ of each restaurant’s menu. If a customer says “add a Coke to the burger”, the system knows for sure that this particular drink is available in this establishment. The dialog is handled by a business-rules module; for example, it always offers a dessert or a drink. This resulted in 100% compliance with corporate upsell scenarios. If the model’s confidence drops, the conversation is instantly handed off to a human. Today, Voice AI is in pilot mode across various chains in the USA and processes hundreds of orders per day. We are planning to expand Voice AI to customer and Dasher support use cases.
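The confidence-based handoff can be sketched in a few lines. The NLU interface and the 0.85 threshold below are illustrative assumptions, not the production values:

```python
# Menu-grounded intent handling with a human fallback when the
# model's confidence is too low. All names here are hypothetical.
CONFIDENCE_THRESHOLD = 0.85  # assumed value; tuned per deployment in practice

def handle_utterance(transcript: str, menu: set[str], nlu, escalate_to_human):
    item, confidence = nlu(transcript, menu)  # menu-aware NLU returns (item, score)
    if confidence < CONFIDENCE_THRESHOLD or item not in menu:
        # Low confidence or off-menu request: hand the call to a person
        # instead of guessing.
        return escalate_to_human(transcript)
    return {"action": "add_to_order", "item": item}
```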
– How do you measure the effectiveness of such a solution, and what does its rollout deliver for customers?
– Voice AI ensures absolute compliance with corporate sales rules. A customer may forget something, but the voice assistant will always walk them through the menu and offer additional items. This ensures uniform quality of service across the entire network and makes customers feel heard and understood. The technology also relieves the burden on staff: previously, one employee had to take drive-thru orders and help in the kitchen at the same time; now AI takes over the calls, and the team can focus on cooking. The result is higher throughput, meaning faster customer service: we see 15–25% throughput growth during peak hours, and restaurants process more orders without adding staff. Customers spend less time queuing.
And what is especially important to me is that Voice AI not only saves time, but also makes the DoorDash ecosystem more inclusive and accessible. We are already integrating voice control directly into the DoorDash app so that users with physical disabilities can place orders without touching the screen. This is a step towards a friendly interface where technology genuinely helps people rather than getting in their way.
– Alongside your work in delivery, you have also been speaking at conferences. At NVIDIA GTC 2022, you talked about scaling the DoorDash ML platform using GPU clusters, and at Ray Summit 2023, about distributed orchestration using Ray on Kubernetes. What exactly did you present at these events, and how did the professional community receive it?
– At NVIDIA GTC, I showed how we made our machine learning systems work faster and cheaper. We used GPU clusters, powerful computing servers that train artificial intelligence models, and redesigned the way they share resources; as a result, we reduced training costs by about 30%, and models reach production faster and with fewer errors. At Ray Summit 2023, my focus was on distributed orchestration that speeds up large model training jobs. When dozens of engineers are simultaneously training and launching models, it’s easy to lose common standards, and results may vary. We created a common platform where all experiments follow the same rules and can be easily verified or replicated. The community received this experience very warmly; it was important for many engineers to see how a company of DoorDash’s scale combines speed, transparency, and sustainability in its AI systems.
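As a rough illustration of the orchestration idea, here is a minimal Ray sketch that fans hypothetical training runs out across a shared GPU pool. The `train_model` function and its configs are placeholders; the per-task resource request is what lets a shared cluster schedule work fairly:

```python
# Fan experiments out across a GPU cluster with Ray.
# Assumes a Ray cluster with GPUs is available (e.g. Ray on Kubernetes);
# on a GPU-less laptop the tasks would wait for resources.
import ray

ray.init()  # in production this connects to the shared cluster

@ray.remote(num_gpus=1)  # each task reserves one GPU from the pool
def train_model(config: dict) -> float:
    # Real training code would go here; we return a dummy metric.
    return sum(config.values()) / len(config)

# Launch experiments in parallel; Ray schedules them onto free GPUs.
configs = [{"lr": 1e-3, "batch": 64}, {"lr": 3e-4, "batch": 128}]
futures = [train_model.remote(c) for c in configs]
results = ray.get(futures)  # blocks until all workers finish
```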
– You started sharing your experience long before these conferences. You came to DoorDash from the gaming industry, where you did innovative work on speech synthesis and voice cloning at Electronic Arts. How has that experience proved useful in building logistics and delivery systems?
– My experience at EA taught me the value of iteration speed. I worked on neural text-to-speech (TTS) and voice cloning models for NPCs. We added speaker embeddings so the AI could reproduce a specific actor’s voice from just a few samples. Developers could instantly test new dialogue lines, which shortened iteration cycles by weeks and cut recording costs by up to 40–50%. In game development, it is important that scriptwriters can change a line and instantly hear the result, so we built flexible, quickly retrainable TTS models. I carried this principle over to DoorDash: you need to quickly test hypotheses, build prototypes, collect feedback, and ship improvements without breaking the system. This is no less critical in logistics and AI assistants.
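To make the speaker-embedding idea concrete, here is a toy PyTorch sketch of how a learned per-speaker vector can condition a TTS decoder so one network speaks in many voices. The module, names, and dimensions are illustrative, not EA’s actual model:

```python
# Toy speaker-conditioned TTS decoder: a per-speaker embedding is
# concatenated with text features before predicting mel frames.
import torch
import torch.nn as nn

class SpeakerConditionedTTS(nn.Module):
    def __init__(self, n_speakers=100, text_dim=256, spk_dim=64, mel_dim=80):
        super().__init__()
        self.speaker_embedding = nn.Embedding(n_speakers, spk_dim)
        self.decoder = nn.Linear(text_dim + spk_dim, mel_dim)

    def forward(self, text_features, speaker_id):
        # text_features: (batch, time, text_dim); speaker_id: (batch,)
        spk = self.speaker_embedding(speaker_id)             # (batch, spk_dim)
        spk = spk.unsqueeze(1).expand(-1, text_features.size(1), -1)
        return self.decoder(torch.cat([text_features, spk], dim=-1))  # mel frames
```

In few-shot voice cloning, the embedding for a new actor would be estimated from a handful of samples rather than looked up in a fixed table; this sketch only shows the conditioning mechanism.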
– You worked at EA from 2018 to 2021. At that time, the wider world knew little about the possibilities of AI, let alone artificially generated voices. Would it be fair to say that you were building a prototype of modern voice-AI systems back then?
– Yes, in many ways. At that time, it was a new direction. We didn’t just synthesize speech; we tried to control prosody, tempo, intonation, and emotional tone. I built tools that let scriptwriters choose a “cheerful” or “tired” version of a line. This became part of Electronic Arts’ internal TTS toolkit, which later formed the basis of the company’s proprietary technology. Perhaps it seeded similar transformations at other companies as well.
– Your work has earned you recognition in the industry: you are an IEEE member and a Fellow of Hackathon Raptors, and you are actively engaged in mentoring. Why is this important to you, despite your busy schedule and demanding projects?
– Mentoring is part of my engineering philosophy. When you’re responsible for systems that millions of people use, you realize that quality doesn’t scale by itself – people scale it. I help junior engineers through code review and analysis of architectural decisions. It is important for me not just to correct a mistake, but to explain why a particular decision was made. This builds a collective mindset and raises the level of the whole team. In practice, it means fewer incidents, shorter development cycles, and greater system reliability. I also help onboard new employees entering complex fields such as Voice AI or logistics, through collaboration, pair programming, and documented best practices. Knowledge transfer is a way to multiply influence, and that’s what I see as the point of engineering.
– And finally, which technologies inspire you the most today? Where do you see the next wave of innovation?
– I am inspired by real-time systems where decisions are made in milliseconds and directly improve the human experience. That could be delivery, customer support, or medicine. The main thing is that the technology is not only fast, but also reliable and inclusive. We are already seeing models that can hear, see, and speak. The next step is AI that understands context: what is appropriate, what is ethical, what is genuinely useful to a person. I think the real innovations will lie not so much in model architectures as in the infrastructure of responsible AI: transparent, understandable, and safe for millions of users. That’s what I want to keep working on. I also want to mentor junior engineers and help them grow in their careers, and I would love to transition into an engineering manager role in the future to drive even more impact by guiding multiple engineers.