As the global race toward full autonomy accelerates, software and AI infrastructure have never mattered more: companies are moving out of the R&D phase and into delivery. The challenge, especially for automakers operating on thin margins, is to optimize every part of software delivery without compromising on safety.
Vijay Jaishanker has been at the forefront of this shift during his time as a software engineer at Woven by Toyota, a software-first subsidiary of Toyota Motor Corporation, where he focuses on platforms, infrastructure and MLOps. TechBullion reached out to him after coming across his work in this domain through blog posts, publications and open-source contributions. In this article, we discuss how to reduce autonomous vehicle model training time, what to keep in mind when choosing a training platform, how to optimize large-scale parallel tasks and what to consider when running inference on edge devices.
The High Cost of Cloud-Scale Training
One of the biggest challenges in building autonomous vehicle software is the sheer cost of training the AI models that power it. Depending on the size of the team, the training cost of machine learning pipelines can quickly reach six or seven figures if not approached strategically. Yet, Vijay points out, many of the jobs that run on expensive GPU machines do not actually need that much parallel processing power.
“Most teams default to GPUs due to the ease of maintenance and mere habit,” says Vijay. “But if you analyze the jobs you are running on the cloud thoroughly, you will often be able to split and schedule most of them onto CPU-only clusters.”
This has been a central theme of Vijay’s work. In a recent article, he described how cloud costs can spiral out of control when teams throw GPU power at every stage of the workflow, and he highlights common pitfalls such as GPU endpoints, oversized orchestration clusters and underutilized hardware on cloud instances that quietly drain budgets while adding very little value.
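As a rough illustration of that split, the sketch below uses Flyte (the orchestrator covered in the next section) to declare a CPU-only preprocessing task alongside a GPU training task, so only the step that genuinely needs an accelerator is scheduled onto GPU nodes. The task names, resource sizes and pipeline shape are assumptions for the example, not a description of Woven by Toyota's actual workloads.

```python
# Illustrative sketch: request GPU resources only for the step that needs them.
from typing import List

from flytekit import Resources, task, workflow


@task(requests=Resources(cpu="4", mem="8Gi"))  # CPU-only: no GPU requested
def preprocess(raw_paths: List[str]) -> List[str]:
    # Decode, filter and shard raw sensor logs; pure CPU work.
    return [p + ".cleaned" for p in raw_paths]


@task(requests=Resources(cpu="8", mem="32Gi", gpu="1"))  # GPU reserved for training only
def train(shards: List[str]) -> str:
    # Placeholder for the actual model-training step.
    return f"model trained on {len(shards)} shards"


@workflow
def pipeline(raw_paths: List[str]) -> str:
    return train(shards=preprocess(raw_paths=raw_paths))
```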
Streamlining Workflows with Flyte Map Tasks
When it comes to streamlining and scaling data pipelines for autonomy use cases, Vijay points out that parallelization is not optional; it is the backbone of modern MLOps. One of his key contributions in this domain is advancing the use of Flyte map tasks, a container-native alternative to Spark that simplifies distributed workflows and makes debugging easier without compromising performance.
“We migrated offline perception pipelines from Spark to Flyte map tasks and saw a massive reduction in debugging time,” Vijay recalls. “Our teams could iterate faster without having to manage Spark config files, and the linear logs were easier to read.” For workflows that process millions of media files or telemetry records in parallel every week, this approach has proven far easier to manage than Spark’s complex DAGs.
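A minimal sketch of the pattern, assuming a hypothetical per-frame perception task (the actual Woven by Toyota pipelines are not public): Flyte's map_task fans a single container task out over a list of inputs, and each element runs, retries and logs independently.

```python
# Illustrative sketch of a Flyte map task: fan one container task out over many inputs.
from typing import List

from flytekit import map_task, task, workflow


@task
def detect_objects(frame_uri: str) -> str:
    # Hypothetical per-frame perception step; each invocation runs in its own container.
    return f"{frame_uri}: objects detected"


@workflow
def offline_perception(frame_uris: List[str]) -> List[str]:
    # map_task schedules one isolated execution per element, so a failed frame can be
    # retried on its own and its logs read in isolation, with no Spark DAG to manage.
    return map_task(detect_objects)(frame_uri=frame_uris)
```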
Vijay’s work on this gained wider recognition when he published an article on Flyte map tasks alongside his colleagues from Woven by Toyota. The piece appeared on Flyte’s official blog and quickly became a reference point for practitioners looking to optimize large-scale pipelines. It was even shared by Ketan Umare, the founder of Flyte, on his LinkedIn page, amplifying its reach across the MLOps community.
“Users were surprised at how often Spark was being used for workloads that really didn’t need it,” Vijay explains. “Adoption of Flyte map tasks became more organic once users realized this.” He has cemented his role as a thought leader not only by improving workflows at Woven but also by influencing the broader open-source MLOps community.
From Cloud to Edge: Parallelization at the Device Level
Training a complex autonomous driving model in the cloud is only part of the story: the model must also run reliably on vehicles in real time. To explore these challenges, Vijay built an open-source project showing how 3D object detection can be performed and scaled on low-cost hardware.
In the project, he demonstrated how Intel’s Neural Compute Stick 2 (NCS2) devices running the OpenVINO framework could handle LIDAR-based object detection pipelines that usually require more powerful processors or GPUs. His work showed that inference can be parallelized across multiple lightweight hardware devices to improve throughput while adding resilience: when one of the sticks went down, the rest of the setup kept working.
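A minimal sketch of that idea, assuming OpenVINO's current Python API and hypothetical model and input names (the original project may use an older API): compile the model once per detected MYRIAD stick, round-robin frames across the sticks, and skip any device whose inference call fails.

```python
# Illustrative sketch (hypothetical model path and inputs): parallelize inference
# across several Intel NCS2 sticks with OpenVINO and tolerate a failed device.
import itertools

import numpy as np
from openvino.runtime import Core

core = Core()
model = core.read_model("lidar_detector.xml")  # hypothetical LIDAR detection model

# One compiled model per Neural Compute Stick 2 ("MYRIAD" devices).
sticks = [d for d in core.available_devices if d.startswith("MYRIAD")]
compiled = [core.compile_model(model, device_name=d) for d in sticks]


def infer(frame: np.ndarray, _pool=itertools.cycle(range(len(compiled)))):
    # Round-robin frames across sticks; if one stick fails, fall back to the next.
    for _ in range(len(compiled)):
        idx = next(_pool)
        try:
            return compiled[idx](frame)  # CompiledModel is callable on a single input
        except RuntimeError:
            continue  # device unavailable; try another stick
    raise RuntimeError("no healthy NCS2 device available")
```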
“Edge parallelization is not just about speed, it is about resilience,” says Vijay. “In real-world autonomy use cases, you cannot afford to compromise the entire system due to a single failure, as it can have fatal consequences.” The project was shared openly on GitHub and drew interest from engineers and researchers looking for affordable ways to experiment with edge inference.
Conclusion
The future of autonomous driving software is shifting from building bigger, more complex models to making them more reliable, scalable and deployable. The infrastructure and MLOps practices adopted by the companies working on this technology will determine how quickly and effectively they can deliver. Balancing cost, speed and resilience remains a major challenge, and engineers who can bridge that gap are shaping the trajectory today. Vijay Jaishanker represents a new generation of engineers who make autonomy practical, not just possible.
