
Diptanu Choudhury’s Journey with Tensorlake in Revolutionizing AI Infrastructure

Diptanu Choudhury’s journey into AI and distributed systems began long before he founded Tensorlake. Born in India and educated in Electronics and Communication Engineering at the National Institute of Technology, Jalandhar, he was fascinated by technology from an early age. At Cisco, he honed his software engineering skills, tackling complex middleware problems and building a foundation in large-scale systems. His expertise deepened at Netflix, LinkedIn, Facebook, and HashiCorp, where the challenges he faced shaped the innovative approaches he later brought to Tensorlake.

At Netflix and HashiCorp, Diptanu co-led the development of large-scale distributed cluster schedulers, learning the intricacies of scaling and reliability. At LinkedIn, he emphasized high availability and robustness in scalable cluster schedulers, recalling, “Designing data systems is all about being reliable so that customers can trust you with their data.” His deep dive into AI/ML at Facebook as the tech lead for the real-time speech inference engine involved tackling the complexities of deploying deep learning models at scale, providing critical insights into optimizing neural network performance.

This blend of distributed systems and AI/ML expertise laid the groundwork for his visionary leadership at Tensorlake.

The birth and mission of Tensorlake

The genesis of Tensorlake lay in Diptanu’s observations of the evolving data systems landscape. Having lived through the transitions from traditional RDBMS to NoSQL databases, and then to Hadoop and Spark for analytics workloads, he identified a gap where these diverse worlds rarely intersected. The advent of Large Language Models (LLMs), which can analyze and make decisions over arbitrary data rather than just structured tabular data, made that gap impossible to ignore.

He remarked, “Applications went from using RDBMS systems in the mid-2000s to distributed SQL and NoSQL during the mid-2010s. With the advent of LLMs, which can reason over arbitrary data and make business decisions, we are looking at not only tabular data kept in traditional databases but also unstructured data like video, PDFs, and images.”

Guided by this insight, Tensorlake was born with a mission to bridge this chasm and empower developers by democratizing AI infrastructure through open-source systems. The company’s vision goes beyond merely storing data; it aims to make unstructured data directly usable for real-time applications. He emphasized, “At Tensorlake, we are building data infrastructure that makes unstructured data usable by applications serving consumers and enterprise users directly. Our systems are open source at the core, because we want to democratize infrastructure that developers can use to build AI systems.”

Tensorlake’s open-source ethos and commitment to versatility are at the heart of its approach, ensuring that developers worldwide have the tools to innovate and create sophisticated AI systems without the constraints imposed by proprietary solutions.

Experience-driven innovation in scalable systems

As noted earlier, Diptanu’s tenures at Facebook and LinkedIn profoundly shaped his approach to building scalable systems at Tensorlake. At Facebook, he led development of the real-time speech inference engine, working on distributed inference for deep learning models. “While I worked on it, I understood the importance of having flexibility to choose the right hardware platform for running deep learning models,” he explains. That adaptability in hardware selection is a principle he brought to Tensorlake, where systems run on hybrid hardware with various accelerators, letting developers choose the optimal platform for each model.
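
The idea of letting each model land on the most suitable hardware can be illustrated with a small, generic device-selection sketch in PyTorch; this is an assumption-level illustration, not Tensorlake’s actual scheduling code, and a real scheduler would also weigh model size, quantization support, and cost.

```python
import torch

def pick_device() -> torch.device:
    """Choose the best available accelerator at runtime (generic sketch)."""
    if torch.cuda.is_available():          # NVIDIA GPUs
        return torch.device("cuda")
    if torch.backends.mps.is_available():  # Apple Silicon
        return torch.device("mps")
    return torch.device("cpu")             # CPU fallback

# Place a toy model on whatever accelerator the host actually has.
model = torch.nn.Linear(16, 4).to(pick_device())
print(next(model.parameters()).device)
```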

At LinkedIn, Diptanu developed next-generation cluster schedulers for data systems serving production traffic, where high availability and reliability were non-negotiable. That conviction, that customers will only trust you with their data if your systems are dependable, is now a cornerstone of Tensorlake’s systems.

A breakthrough moment at Facebook further honed his expertise. A highly effective language model he developed couldn’t be deployed because its sheer number of parameters made the hardware cost of serving it prohibitive. “My model didn’t make it to production as we would have to spend tens of millions of dollars to get enough hardware to run this slightly slower but more accurate model,” he recounts. The experience led him to dig into neural network runtimes in PyTorch and implement optimization techniques such as layer fusion and dynamic quantization, preparing him to tackle complex AI infrastructure problems at Tensorlake.
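
For readers unfamiliar with the techniques named above, the sketch below shows what dynamic quantization looks like in PyTorch on a toy stand-in module; the models Diptanu worked with are not public, so everything here is purely illustrative.

```python
import torch
import torch.nn as nn

# Toy stand-in for the feed-forward blocks of a much larger model
# (hypothetical; used only to demonstrate the API).
class TinyEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.ff = nn.Sequential(
            nn.Linear(512, 2048),
            nn.ReLU(),
            nn.Linear(2048, 512),
        )

    def forward(self, x):
        return self.ff(x)

model = TinyEncoder().eval()

# Dynamic quantization stores Linear weights as int8 and dequantizes them
# on the fly, shrinking the memory footprint and often improving CPU
# latency. Layer fusion (e.g. torch.quantization.fuse_modules for
# conv/bn/relu stacks) is a complementary optimization not shown here.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 512)
print(quantized(x).shape)  # torch.Size([1, 512])
```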

Engineering challenges at Tensorlake

Developing Tensorlake’s structured extraction engine presented unique challenges, especially given its multi-modal nature. “Structured extraction is easy when the schema for extraction is well-known,” explains Diptanu. For straightforward tasks like identifying cups in images, a predefined schema with specific columns makes the process relatively simple. However, Tensorlake aims to handle diverse data types such as videos, PDFs, images, text, and audio, and extract various types of information from them. “We are building a multi-modal structured extraction engine, which means developers can ingest any type of data, and want to extract arbitrary information from them,” he adds.

To accommodate these diverse use cases, Diptanu and his team had to ensure the system could utilize arbitrary models and expose a flexible API for data retrieval. “The schema had to be defined on the fly, and we had to build a custom SQL interface on top of a storage system that stores semi-structured data,” he elaborates. This dynamic approach allows Tensorlake to support a wide range of applications, from identifying key faces in videos to detecting hate speech in audio, showcasing the flexibility and robustness of the engine.
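
A rough sense of what “schemas defined on the fly” can look like is given by the sketch below, which uses pydantic to build a schema at runtime and validate extracted records against it. The field names and the extraction source are hypothetical, and this is not Tensorlake’s API; it only illustrates the general pattern of runtime-defined structured extraction.

```python
from pydantic import create_model

# Hypothetical field specification supplied by a developer at runtime,
# e.g. for flagging hate speech in audio segments.
field_spec = {
    "speaker": (str, ...),
    "contains_hate_speech": (bool, ...),
    "start_time_s": (float, ...),
}

# Build the schema on the fly rather than hard-coding a table definition.
AudioSegment = create_model("AudioSegment", **field_spec)

# Output from an extraction model (an LLM or a task-specific model) arrives
# as loosely structured JSON; validating it yields typed rows that a SQL
# layer over semi-structured storage could then query.
raw = {"speaker": "A", "contains_hate_speech": False, "start_time_s": 12.5}
row = AudioSegment(**raw)
print(row.model_dump())  # pydantic v2; use .dict() on v1
```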

Leveraging distributed systems for AI

With extensive experience in large-scale cluster schedulers at Netflix, LinkedIn, and HashiCorp, combined with work on AI systems at Facebook, Diptanu is uniquely positioned to address the needs of LLM-based applications at Tensorlake. “I don’t think I would have built this company had I not worked on really large-scale cluster schedulers in the past and then on large-scale AI systems at Facebook,” Diptanu reflects. This blend of distributed systems and AI expertise is crucial: LLMs carry an unprecedented number of parameters, and the real-time applications built on them demand low latency, a combination that poses significant engineering challenges.

At Tensorlake, the integration of distributed systems with AI technologies allows for effective LLM inference, which is essential for real-time applications used by end-users. “It’s mostly a systems engineering problem, but without the foundations in AI, it’s hard to understand the problems,” Diptanu explains. The complexity lies not only in the inference process but also in the data systems that feed into LLMs, which resemble complex analytics systems with added demands for real-time and low-latency performance. By leveraging his background in both fields, Diptanu ensures that Tensorlake can meet these rigorous demands, creating robust and efficient solutions for AI-driven applications.

Advice for future AI innovators

Diptanu emphasizes the importance of mastering the fundamentals for aspiring engineers and entrepreneurs in AI infrastructure. “Learn the basics of how computers work,” he advises, highlighting the long-term benefits of understanding compilers, computer architecture, and operating systems. Although these areas might seem distant from building cutting-edge applications, they form the foundation for more advanced work. “If you want to work on AI systems, knowing the fundamental maths of deep learning helps a ton,” he adds, as tackling complex problems often involves delving into the architecture of models.

Diptanu also offers specific guidance for entrepreneurs: “Knowing the problems of customers or consumers and figuring out what to build that lasts for the next 10 years is super important to building a good company.” By combining deep technical knowledge with a keen awareness of customer needs, future AI innovators can pave the way for significant advancements in the field.

Future prospects and long-term vision

Looking ahead, Diptanu explains, “AI Infrastructure is going to have to support real-time data systems that can handle petabytes of data not only for training but for inference as well.” The goal is to integrate structured and unstructured data seamlessly, supporting applications from video analysis to text extraction.

Diptanu envisions a transformative impact across industries such as healthcare and finance, driven by Tensorlake’s AI/ML advancements. “Every human will interact with assistants to be more productive, and thus AI systems have to remember information about their users,” he comments, highlighting the need for powerful, contextually aware AI systems. Tensorlake aims to streamline routine tasks and push the boundaries of previous software generations, leading the charge in enhancing human productivity.

AI’s broader impact on society

Diptanu believes that AI will be as transformative and beneficial to humanity as past technological breakthroughs. “AI will make us more productive, and a lot of work that doesn’t require a lot of thinking but execution will be automated with far more ease with AI than previous generations of software systems,” he explains. This increased productivity will allow humans to focus on more complex and creative tasks, while routine and repetitive jobs are efficiently handled by AI systems.

However, Diptanu acknowledges the challenges ahead in defining how these AI systems should operate to best serve humanity. By developing advanced AI infrastructure at Tensorlake, he aims to contribute to this positive impact. “There are challenges in front of us for sure; we have to define how these systems should work to help humans,” Diptanu notes. Tensorlake’s work in AI and distributed systems is geared toward creating tools that not only advance technological capabilities but also enhance societal productivity and well-being.

Diptanu’s journey from his early fascination with technology to his leadership at Tensorlake highlights his significant contributions to AI and distributed systems. By leveraging his extensive experience from leading tech companies, he has built Tensorlake into a pioneering force in AI infrastructure. His innovative approaches to handling diverse data types and integrating distributed systems with AI/ML technologies ensure that Tensorlake remains at the forefront of technological advancements. As Diptanu continues to drive Tensorlake forward, his vision of democratizing AI infrastructure and enhancing productivity across various industries is set to make a lasting impact on society, paving the way for a more efficient and technologically advanced future.
