Aleksei Naumov, Lead AI Engineer at Terra Quantum—a European deeptech leader with over $100 million in funding—shares his journey from academic roots to industry leadership. With a degree in Physics from Lomonosov Moscow State University, Aleksei moved from university AI projects to a pivotal role in advancing AI technology.
Aleksei explains why the energy consumption of large language models (LLMs) could grow to match the combined compute capacity of 160 companies the size of Meta, and how model optimization can help prevent this outcome.
The interview explores Aleksei’s research experience, spanning from optimizing computer vision models to presenting a groundbreaking project on compression of LLMs at an IEEE conference in California. He offers a professional outlook on emerging AI trends and the future of AI optimization in the industry.
Aleksei, can you share your journey into AI and what initially attracted you to deep learning?
My journey into deep learning started during my university years. I earned a bachelor’s degree in physics from Lomonosov Moscow State University (ranked #37 in QS World University Rankings by Physics & Astronomy) with a specialization in robotics and applied mathematics. This academic background meant I frequently worked with data analysis and machine learning throughout my studies.
My first deep learning project was my bachelor’s thesis, where I developed an algorithm for automatic quadcopter landing using computer vision.
After graduating, I joined the AI research team at the Swiss company Terra Quantum. Eventually, I came to lead the team, and we published several research projects on AI model optimization (covering both LLMs and computer vision) using tensor decomposition and tensor network methods. Our latest paper was recently published at the 2024 IEEE 7th International Conference on Multimedia Information Processing and Retrieval (MIPR), which I highly recommend reading.
Currently, I also lead a product development team specializing in large language models. Last year, I spoke about my journey in an interview with Michael Perelshtein (PhD in Quantum Physics and Director of Technology) and Artem Melnikov (Head of Applied Research): Interview Link (in Russian).
As someone who specializes in efficient deep learning, how do you see the future of this field evolving?
Let me share my thoughts specifically regarding large language models (LLMs). Currently, companies and research labs developing LLMs (such as OpenAI, Meta, and Google) are in a race to create a universal, large-scale model that encompasses as much knowledge and capability as possible. This certainly drives innovation, but I don’t think this approach will remain dominant indefinitely.
Imagine a future where people genuinely depend on large language models in their daily routines, using them through chat interfaces, recommendation systems, and more, and spending, let’s say, 5% of their time interacting with these technologies. This isn’t far-fetched. Serving that demand with GPT-4 would require around 100 million H100 GPUs. The computational demand is massive: roughly the entire capacity of 160 companies like Meta.
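A quick back-of-envelope check of that ratio. The ~600,000-GPU estimate of Meta's fleet below is an outside assumption based on Meta's public statements, not a figure from the interview:

```python
# Rough sanity check of the "160 Metas" figure.
# Assumption: Meta's fleet is on the order of 600,000 H100-equivalent GPUs,
# per Meta's public statements for end of 2024 (an outside estimate).
gpus_for_demand = 100_000_000   # H100s needed to serve the hypothetical GPT-4 demand
meta_fleet = 600_000            # approximate H100-equivalents operated by Meta

print(gpus_for_demand / meta_fleet)  # ~167, i.e. roughly 160 Meta-sized fleets
```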
Relying solely on massive models for every request is not energy-efficient. Whether we ask an LLM to solve a simple calculation like 2×2 or conduct complex research, we consume a similar amount of resources for both tasks (I’m oversimplifying, but that’s the general idea). Why consume so much energy for simple tasks when smaller models can handle them?
I anticipate that more LLM use cases will shift towards smaller, specialized models over time. I believe this shift will occur through knowledge distillation—transferring knowledge from larger models to smaller ones, which can then be used for specific tasks like copywriting, programming, or mathematics.
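For readers unfamiliar with the mechanics, here is a minimal PyTorch sketch of the classic soft-label distillation objective (Hinton et al., 2015). It is a generic illustration, not Terra Quantum's specific recipe; the temperature and alpha values are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Blend of soft-label distillation (student matches the teacher's
    softened output distribution) and ordinary cross-entropy on hard labels."""
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    # The T^2 factor keeps soft-loss gradients on the same scale as the hard loss.
    soft_loss = F.kl_div(log_probs, soft_targets,
                         reduction="batchmean") * temperature ** 2
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1 - alpha) * hard_loss
```

For LLMs, the same loss is applied per output token, with the large teacher's logits computed offline or on the fly; the small student can then serve its narrow task at a fraction of the inference cost.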
What innovations are you most proud of in your career so far, and what impact have they had on the field or the projects you’ve worked on?
I’m proud to lead a strong team specializing in AI and tensor networks, collaborating with some of the brightest minds in this field. There are two projects I’m especially proud of:
- TQCompressor: We developed an innovative method for compressing LLMs, reducing the size of GPT-2 by about 35% with minimal loss in quality. Moreover, thanks to improvements in our training method, we used only 3% of the original dataset, cutting training time, cost, and compute roughly 33-fold. To contribute to AI research, we’ve made the algorithm and the resulting TQCompressedGPT-2 model publicly available. (A minimal sketch of the general idea behind decomposition-based compression appears after this list.)
Aleksei presenting “TQCompressor: Improving Tensor Decomposition Methods in Neural Networks via Permutations” at the IEEE MIPR 2024 conference in San Jose, CA, USA.
- TetraAML: We created a comprehensive framework for optimizing computer vision models, covering everything from model development to compression for efficient deployment on resource-limited devices. Our algorithm compressed ResNet-18 by a factor of 14.5 with minimal quality loss.
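To give a feel for how decomposition-based compression works in general, below is a minimal PyTorch sketch that factorizes one dense layer with a truncated SVD. This is the simplest member of the decomposition family and deliberately not the TQCompressor or TetraAML algorithm itself (TQCompressor builds on permutation-enhanced Kronecker decomposition, which is considerably more involved).

```python
import torch

def factorize_linear(layer: torch.nn.Linear, rank: int) -> torch.nn.Sequential:
    """Replace one dense layer with two thin ones via truncated SVD.
    Parameter count drops from in*out to rank*(in + out) when rank is small."""
    W = layer.weight.data                          # shape: (out_features, in_features)
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    U_r = U[:, :rank] * S[:rank]                   # fold singular values into U
    V_r = Vh[:rank, :]

    down = torch.nn.Linear(layer.in_features, rank, bias=False)
    up = torch.nn.Linear(rank, layer.out_features, bias=layer.bias is not None)
    down.weight.data = V_r                         # (rank, in_features)
    up.weight.data = U_r                           # (out_features, rank)
    if layer.bias is not None:
        up.bias.data = layer.bias.data
    return torch.nn.Sequential(down, up)
```

After a replacement like this, the model is typically fine-tuned briefly to recover accuracy; the permutation step in TQCompressor exists precisely to make such decompositions lose less quality in the first place.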
What emerging AI trends excite you the most, and what role would you like to play in shaping these areas?
First, I’m excited that foundational LLM developers are increasingly focusing on on-device deployment use cases. For example, in one of its recent releases, Meta introduced Llama 3.2-1B and Llama 3.2-3B, specifically tailored for smartphone deployment, along with example applications like a mobile writing assistant: Meta Blog on Llama 3.2.
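As a hedged illustration of how lightweight these models are to work with, the sketch below loads the 1B variant with Hugging Face transformers. The repository is gated, so it assumes you have accepted Meta's license and authenticated; the prompt is an invented example echoing the writing-assistant use case.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Gated repo: requires accepting Meta's license on Hugging Face and logging in.
model_id = "meta-llama/Llama-3.2-1B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# A writing-assistant style request (illustrative prompt).
prompt = "Draft a two-sentence polite reply declining a meeting invitation."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```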
Second, I anticipate significant advancements in image and video generation. The recently released FLUX model for image generation has demonstrated incredible results and garnered strong enthusiasm within the AI community. The release of video generation model APIs by providers like Runway and Kling will finally allow AI developers worldwide to integrate video generation capabilities into their products. I expect this sector to experience tremendous growth, with new models and products emerging for applications in fields ranging from cinema to consumer apps and graphic design.
If you could envision AI five to ten years from now, what advancements or changes do you think will define the field?
Since my expertise is in efficient AI and on-device deployment, I’ll focus on my vision for the future in this area.
I foresee the mass adoption of on-device LLMs and the shift of LLM use cases from cloud GPUs to user devices. For users, this will mean enhanced data security and a fully integrated experience with their devices, making LLM capabilities as familiar as auto-correct is today. However, there will still be cases requiring cloud-based processing, either for known applications or new, yet-to-be-discovered scenarios.
I also anticipate the emergence of specialized AI hardware. This will include dedicated hardware for training clusters, GPU-like architectures for cloud AI inference, and specialized mobile chips for on-device AI. Additionally, I expect to see optimized hardware for generative neural networks tailored to image and video generation applications.
