As artificial intelligence (AI) and machine learning (ML) continue to advance, the demand for sophisticated silicon chips powering these technologies is rising. The systems-on-chip (SoCs) that run neural-network, convolutional, and generative-AI workloads are complex enough to require advanced characterization and production methods to ensure reliability, performance, and scalability. Sriharsha Vinjamury, a principal engineer with the Solutions Engineering group at Arm Inc., is a leader in this field, making significant progress in automated test equipment (ATE) testing and the manufacturability of complex AI chips.
Key Challenges Faced in the Industry
Testing complex AI and ML-based chips presents several key challenges. One of the primary difficulties is ensuring comprehensive fault coverage due to the sheer number of transistors and the intricate architectures involved. Mark Papermaster, CTO of AMD, has noted that “as the complexity of chip architectures increases, the challenge of achieving comprehensive fault coverage becomes more significant.” Signal integrity and timing analysis are also critical, as even minor deviations can lead to significant performance issues. Lisa Su, CEO of AMD, emphasizes that “maintaining signal integrity and precise timing is crucial, as these factors directly impact the performance and reliability of AI chips.” Scalability in testing is another challenge, as AI chips require adaptable frameworks to manage increasing complexity and data loads. Jensen Huang, CEO of NVIDIA, points out that “the scalability of testing frameworks is essential to keep pace with the growing complexity of AI workloads.” Additionally, thermal management and power efficiency must be meticulously tested to prevent overheating and ensure reliable operation. Finally, real-time data analysis is essential for identifying and correcting potential failures quickly, which is particularly challenging given the fast-paced evolution of AI technology.
New Approaches to Chip Testing
Traditional methods of testing silicon chips struggle to keep up with the complexities of AI and ML applications. “Silicon chips designed for AI and ML are highly dense, containing over 50 billion transistors. This necessitates new and innovative testing approaches,” explains Sriharsha Vinjamury, a recognized expert in VLSI testing. In his article “Productising Complex Silicon Chips Using Partial-Binning Techniques” published in CMM magazine, Vinjamury discusses how chips used in Gen-AI, LLM, and DL often face high defectivity rates due to their immense complexity.
Vinjamury has developed advanced testing methodologies to overcome these challenges. These include partial-good (sometimes called partial-binning) characterization suites that speed up multi-core characterization, validation of complex ATE cards that support high power, on-chip analysis, and predictive frequency and voltage scaling. These strategies use real-time data and adaptive pass/fail criteria to identify and correct issues efficiently, and are estimated to improve test efficiency by over 20% and reduce test costs by over 15%.
By incorporating partial-binning techniques, Vinjamury has reduced costs and accelerated the time-to-market for these complex silicon chips. His groundbreaking approaches are setting new industry standards, demonstrating a forward-thinking response to the demands of next-generation semiconductor technologies.
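To make the partial-binning idea concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not Vinjamury's actual test program: it assumes an eight-core die whose cores are tested individually, and the bin names, core-count thresholds, and the `DieResult` and `assign_bin` names are all hypothetical.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical bin table: (minimum number of passing cores, bin name),
# ordered from the most capable product bin to the least capable one.
BIN_TABLE = [
    (8, "BIN1_FULL"),
    (6, "BIN2_PARTIAL_6C"),
    (4, "BIN3_PARTIAL_4C"),
]
REJECT_BIN = "BIN9_REJECT"


@dataclass
class DieResult:
    die_id: str
    core_pass: List[bool]  # per-core pass/fail reported by the test program
    uncore_pass: bool      # shared logic that every product bin depends on


def assign_bin(result: DieResult) -> str:
    """Return the best bin this die qualifies for (assumed binning policy)."""
    # A failure in shared logic cannot be fused off, so the die is rejected.
    if not result.uncore_pass:
        return REJECT_BIN
    good_cores = sum(result.core_pass)
    for min_cores, bin_name in BIN_TABLE:
        if good_cores >= min_cores:
            return bin_name
    return REJECT_BIN
```

For example, `assign_bin(DieResult("d1", [True] * 6 + [False] * 2, True))` returns `"BIN2_PARTIAL_6C"`: a die with two defective cores is sold as a six-core part instead of being scrapped, which is where the cost saving comes from.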
Using AI and Automation in Testing
Adding AI to testing has improved efficiency and accuracy, especially in ATE and system-level test (SLT) environments. “Using AI in the test programs helps identify semi-random silicon defects and helps shift left the latent or dormant errors,” Vinjamury explains. By automating routine tests, engineers can focus on more complex issues, increasing overall efficiency.
AI-driven analytics predict potential failures and optimize testing by analyzing data patterns to find anomalies. This ensures that only the most critical tests are performed, optimizing coverage and reducing cycle times. “AI can also improve fault coverage by dynamically adjusting test vectors based on real-time data,” Vinjamury adds, highlighting the benefits of speed and cost reduction.
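The article does not describe the models behind these analytics, so the sketch below substitutes a much simpler stand-in: a robust-statistics outlier screen over one parametric measurement. It flags dies whose reading sits far from the population median, using the modified z-score based on the median absolute deviation (MAD). The function name, the sample readings, and the 3.5 cutoff (a commonly used rule of thumb) are assumptions for illustration.

```python
import statistics
from typing import Dict, List


def flag_outliers(readings: Dict[str, float], threshold: float = 3.5) -> List[str]:
    """Return IDs of dies whose reading is a statistical outlier.

    Uses the modified z-score, 0.6745 * (x - median) / MAD, which is less
    distorted by the outliers themselves than a mean/stdev z-score.
    """
    values = list(readings.values())
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        # No spread in the population, so no meaningful score can be computed.
        return []
    return [
        die_id
        for die_id, value in readings.items()
        if abs(0.6745 * (value - median) / mad) > threshold
    ]


# Hypothetical leakage-current readings (arbitrary units) for six dies.
sample = {
    "die01": 1.02, "die02": 0.98, "die03": 1.00,
    "die04": 1.01, "die05": 0.99, "die06": 1.45,
}
flagged = flag_outliers(sample)  # ["die06"]
```

A die flagged this way may still pass its fixed limits, which is the point of adaptive criteria: it can be routed to extra testing while the rest of the lot runs a shorter flow.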
Scalable Solutions for a Growing Industry
As AI and ML applications grow, so do the complexity and volume of silicon chips. Vinjamury has developed scalable testing frameworks to handle increased data loads and complexity without compromising accuracy. “Scalability is a major concern in the production of AI and ML silicon. With growing transistor density, defectivity grows, and getting viable yields is a huge challenge,” he notes. His modular testing approaches allow for incremental scaling, ensuring the testing infrastructure can evolve with the technology.
This scalability supports the industry’s growth, allowing for the seamless introduction of more advanced chips. Vinjamury’s work ensures that testing frameworks can grow and adapt with technological advancements, meeting increasing demands while maintaining high performance and reliability standards.
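The yield concern Vinjamury raises can be illustrated with the textbook Poisson yield model, in which the probability that a block of silicon is defect-free falls exponentially with its area times the defect density. The sketch below is a simplification that assumes defects land independently and uniformly; the function and parameter names are hypothetical, and no real process data is implied.

```python
import math


def poisson_yield(area_cm2: float, d0_per_cm2: float) -> float:
    """Probability that a block of the given area has zero defects."""
    return math.exp(-area_cm2 * d0_per_cm2)


def partial_good_yield(
    n_cores: int,
    min_good: int,
    core_area_cm2: float,
    uncore_area_cm2: float,
    d0_per_cm2: float,
) -> float:
    """Probability that the uncore is clean and at least min_good cores work."""
    p_core = poisson_yield(core_area_cm2, d0_per_cm2)
    p_uncore = poisson_yield(uncore_area_cm2, d0_per_cm2)
    # Binomial tail: exactly k good cores, summed for k = min_good .. n_cores.
    p_enough_cores = sum(
        math.comb(n_cores, k) * p_core**k * (1 - p_core) ** (n_cores - k)
        for k in range(min_good, n_cores + 1)
    )
    return p_uncore * p_enough_cores
```

Setting `min_good` equal to `n_cores` reproduces the plain Poisson yield of the whole die. Lowering it, for instance accepting six of eight cores, can only raise the sellable fraction, which is why larger and denser dies lean on partial-good strategies.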
Ensuring Reliability and Performance
Vinjamury has established detailed metrics for evaluating the reliability and performance of these chips. “Reliability and performance are paramount for AI and ML silicon chips, where factors such as signal integrity, thermal management, and power efficiency play critical roles. Ensuring low latency and high throughput in tensor processing units (TPUs) and neural network accelerators is vital for maintaining optimal compute performance under varying workloads,” he says. He has developed comprehensive metrics to assess power consumption, processing speed, thermal performance, and error rates, ensuring that chips perform as expected under various conditions.
These metrics maintain the high standards required in AI and ML applications. Vinjamury’s focus on reliability and performance ensures that each chip meets and exceeds industry standards, setting high expectations for quality and dependability in AI and ML chip production.
Collaboration for Progress
Recognizing the importance of collaboration between industry and academia, Vinjamury has led joint research initiatives and partnerships that have driven advancements in testing methods. “Collaboration is key to progress,” he emphasizes. His efforts have developed new testing tools and techniques, keeping the industry ahead.
These collaborations have enhanced testing methods and fostered a community of shared knowledge and resources. Vinjamury has enabled new ideas and solutions that benefit the entire industry by bridging the gap between academic research and practical application.
Building a Legacy of Technical Expertise
With extensive experience in the VLSI space and roles at NXP, Qualcomm, Tesla, and Arm Inc., Vinjamury has developed advanced test strategies and robust frameworks for AI and ML silicon chips. His ability to foresee and address challenges in testing AI chips sets him apart in the industry.
At Qualcomm, he refined a solution that enhanced device segregation and raised throughput by over 30%. He also introduced a bridge-tester system called Introspect, bridging gaps between ATE and SLT platforms. These contributions reflect his expertise and practical thinking.
A Career of Expertise and Dedication
Starting his career with a master’s degree in electrical engineering, Vinjamury rapidly became a formidable presence in semiconductor testing. Over the past 12 years, he has led teams, built state-of-the-art test labs, and pioneered innovative processes at industry giants like Qualcomm, Tesla, and Arm, playing a pivotal role in the success of numerous high-profile projects.
Vinjamury’s unwavering dedication and sharp foresight distinguish him in the industry. His ability to anticipate challenges and devise practical, effective solutions has made him an invaluable asset to every team he’s part of. With a strong commitment to continuous learning and adaptation, Vinjamury remains at the cutting edge of semiconductor testing, solidifying his reputation as a key figure and thought leader.
Shaping the Future of Semiconductor Testing
The future of AI and ML silicon chips depends on efficient testing and production. Vinjamury’s insights and practical approach are critical for navigating these challenges. “We are on the brink of a new era in silicon chip testing and production,” he highlights. “By embracing new techniques and collaboration, we can ensure that these complex chips meet the high standards required for AI and ML applications, driving progress in many fields.”
Vinjamury’s contributions continue to shape the future of semiconductor technology, highlighting the importance of adapting to new realities in the VLSI space. His work and leadership underscore the critical role of new solutions and collaboration in advancing technology. His vision for the future includes a continuous pursuit of excellence and a commitment to fostering progress through collaborative efforts.
