Streaming Entity Resolution at Web Scale: How Modern Intelligence Is Built From Billions of Noisy Signals

In a world of automation, personalization, and real-time decision-making, companies depend on data infrastructure that can represent reality with precision. Behind the dashboards, models, and sales intelligence tools that enterprises use every day sits a technical problem most people never notice: stitching together billions of fragmented, inconsistent, and often contradictory signals into a single, reliable view of a company. The field calls this entity resolution, but its consequences reach far past the label. It affects whether a business makes the right call, whether an AI model behaves the way it should, and whether a data system bends or breaks under scale.

Few engineers have lived at this intersection of intelligence, distributed systems, and machine learning as long as Rohit Muthyala, a Principal Software Engineer at ZoomInfo and a seasoned data engineering leader. Today, he helps shape the real-time data backbone behind ZoomInfo’s global intelligence platform—an architecture that refreshes firmographic and technographic data in seconds rather than hours. His work blends production-grade system design with academic rigor, informed in part by peer-reviewed research like his IEEE-published paper, Data-Driven Job Search Engine Using Skills and Company Attribute Filters, which explored how structured signals can sharpen large-scale matching systems.

“Real business intelligence doesn’t start with dashboards,” Rohit says. “It starts much earlier—at the point where you decide what counts as the truth. If a system cannot unify its view of reality, everything built on top of it collapses.”

That view comes from years spent designing and scaling one of the industry’s most influential entity-resolution platforms, beginning during his time at EverString.

Turning Internet-Scale Chaos Into Consistent Truth

From May 2019 to November 2020, Rohit built EverString’s first production-grade web-scale ER platform—a system that ingested vendor feeds, scraped web data, processed unstructured signals, and transformed them into precise, reliable company profiles used across enterprise products. This was not incremental work. At the time, record unification was still largely batch-oriented, error-prone, and unable to support the rapid refresh cycles that customers increasingly demanded.

Rohit approached the problem by combining rigorous modeling with fault-tolerant, cloud-native system design. The platform he architected could normalize disparate global data sources, generate candidate pairs selectively rather than comparing every record against every other (a cost that grows with the square of the number of records), score potential matches using supervised learning, and consolidate them through graph-based clustering that enforced strict safeguards to prevent erroneous merges. It also maintained complete provenance, an essential feature for enterprise auditability and regulatory compliance.
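The stages described above can be illustrated with a minimal Python sketch. It is not the EverString implementation: the records, the blocking key (domain), the string-similarity score standing in for a supervised model, and the country-conflict merge guard are all assumptions chosen for illustration.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical records; field names and values are illustrative only.
RECORDS = [
    {"id": "a", "name": "Acme Corp.", "domain": "acme.com", "country": "US"},
    {"id": "b", "name": "ACME Corporation", "domain": "acme.com", "country": "US"},
    {"id": "c", "name": "Acme Corp", "domain": "acme.com", "country": "DE"},
    {"id": "d", "name": "Globex", "domain": "globex.io", "country": "US"},
]

SUFFIXES = {"corp", "corporation", "inc", "llc", "ltd"}


def normalize(name):
    """Lowercase, strip punctuation, and drop common legal suffixes."""
    tokens = name.lower().replace(".", "").replace(",", "").split()
    return " ".join(t for t in tokens if t not in SUFFIXES)


def candidate_pairs(records):
    """Blocking: only records sharing a domain are compared, not all pairs."""
    blocks = defaultdict(list)
    for rec in records:
        blocks[rec["domain"]].append(rec)
    for block in blocks.values():
        yield from combinations(block, 2)


def score(a, b):
    """Stand-in for a supervised match model: similarity of normalized names."""
    return SequenceMatcher(None, normalize(a["name"]), normalize(b["name"])).ratio()


def resolve(records, threshold=0.85):
    """Cluster matches with union-find, refusing merges that fail a guard."""
    parent = {r["id"]: r["id"] for r in records}
    countries = {r["id"]: {r["country"]} for r in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in candidate_pairs(records):
        if score(a, b) < threshold:
            continue
        root_a, root_b = find(a["id"]), find(b["id"])
        if root_a == root_b:
            continue
        # Assumed guard: never merge clusters whose countries disagree.
        if countries[root_a] != countries[root_b]:
            continue
        parent[root_b] = root_a

    clusters = defaultdict(list)
    for rec in records:
        clusters[find(rec["id"])].append(rec["id"])
    return sorted(sorted(ids) for ids in clusters.values())


print(resolve(RECORDS))
```

In this toy data, records a and b merge, while c stays separate despite an identical normalized name, because the guard blocks the merge. Blocking reduces the work from six possible pairs to three.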

“The internet is full of near-duplicates, outdated pages, translated content, and conflicting attributes,” he says. “The job of entity resolution is not merely to decide what matches—but to decide what doesn’t. That’s where engineering judgment really shows up.”

Even more challenging was the requirement for continuous, incremental refresh. Legacy systems would rebuild their entire knowledge graph every few months; Rohit engineered pipelines—leveraging technologies such as Apache Hudi—that could incorporate new signals continuously. What once took months now happened in hours. What once could not be trusted became a foundation for revenue-generating products.
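The difference between a full rebuild and a continuous refresh can be sketched in plain Python. This is only an illustration of keyed upsert semantics of the kind table formats such as Apache Hudi provide; it does not use Hudi's API, and the field names (key, ts, employees) are hypothetical.

```python
def upsert(table, batch):
    """Merge a batch into the table by key; the record with the latest ts wins.

    Returns the set of keys that changed, so that downstream steps can
    reprocess only the affected entities instead of the whole table.
    """
    touched = set()
    for rec in batch:
        current = table.get(rec["key"])
        if current is None or rec["ts"] >= current["ts"]:
            table[rec["key"]] = rec
            touched.add(rec["key"])
    return touched


# Hypothetical existing state and an incoming batch of signals.
table = {"acme.com": {"key": "acme.com", "ts": 1, "employees": 100}}
batch = [
    {"key": "acme.com", "ts": 2, "employees": 120},   # newer: replaces
    {"key": "globex.io", "ts": 1, "employees": 40},   # new key: inserted
    {"key": "acme.com", "ts": 0, "employees": 90},    # stale: ignored
]
touched = upsert(table, batch)
print(sorted(touched), table["acme.com"]["employees"])
```

Only the touched keys need to be re-matched and re-clustered, which is what makes the refresh incremental.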

The result was transformative. The platform became the core of EverString’s intelligence stack and materially contributed to nearly $10 million in annual recurring revenue, supporting enterprise expansion and ultimately strengthening the company’s position ahead of its acquisition by ZoomInfo in 2020.

Redefining Real-Time Intelligence at Scale

At ZoomInfo, Rohit applied the same engineering rigor at an even greater scale. He designed real-time streaming pipelines capable of processing billions of signals through Kafka, Flink, Spark, and cloud-native orchestration layers. He led the transformation of monolithic systems into resilient microservices governed by clear SLI/SLO targets. He implemented observability for data freshness, completeness, and long-tail precision—critical guardrails for customers who depend on ZoomInfo for daily business decisions.
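As a rough illustration of what freshness and completeness indicators might measure, the sketch below computes both as simple ratios over a set of records. The required fields, the 24-hour freshness window, and the sample data are assumptions, not ZoomInfo's actual definitions or targets.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical fields a profile must carry to count as complete.
REQUIRED_FIELDS = ("name", "domain", "employee_count")


def freshness_sli(records, now, max_age):
    """Fraction of records updated within max_age of now."""
    if not records:
        return 1.0
    fresh = sum(1 for r in records if now - r["updated_at"] <= max_age)
    return fresh / len(records)


def completeness_sli(records, required=REQUIRED_FIELDS):
    """Fraction of records with every required field populated."""
    if not records:
        return 1.0
    complete = sum(
        1 for r in records if all(r.get(f) not in (None, "") for f in required)
    )
    return complete / len(records)


now = datetime(2024, 1, 2, tzinfo=timezone.utc)
records = [
    {"name": "Acme", "domain": "acme.com", "employee_count": 120,
     "updated_at": now - timedelta(hours=1)},
    {"name": "Globex", "domain": "globex.io", "employee_count": None,
     "updated_at": now - timedelta(hours=3)},
    {"name": "Initech", "domain": "", "employee_count": 50,
     "updated_at": now - timedelta(hours=5)},
    {"name": "Umbrella", "domain": "umbrella.co", "employee_count": 900,
     "updated_at": now - timedelta(days=3)},
]

freshness = freshness_sli(records, now, max_age=timedelta(hours=24))
completeness = completeness_sli(records)
print(freshness, completeness)
```

An SLO would then be a target on these ratios, for example alerting when either falls below an agreed threshold.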

His work on the company’s streaming entity-resolution system enabled updates within seconds and freed teams from slow, brittle full-refresh cycles. By aligning model evaluation, infrastructure design, and data governance, he helped lay the groundwork for a platform that can evolve continuously, absorbing new global signals without sacrificing accuracy.

“Real-time intelligence is not about speed alone,” Rohit says. “It’s about trust. A system updating every second is useless if it updates with the wrong truth.”

This ethos—precision before speed—shapes every architecture he touches.

Engineering the Future of Intelligence

As the demand for accurate, real-time intelligence grows, the systems Rohit designs are becoming essential infrastructure for the modern data economy. They enable global companies to target markets, understand customers, and power AI-driven automation with confidence. They unify billions of signals into a single coherent view of reality.

And they are built by engineers like Rohit who understand that intelligence is not just an output—it is an architecture.

“Every insight a customer sees,” he says, “begins as a messy signal somewhere in the world. Our job is to make it precise. Once you do that, everything else becomes possible.”

Beyond technical systems, Rohit is known for mentoring engineers, shaping engineering culture, and translating academic techniques into production-grade workflows. His earlier research contributions, including the paper Quantum Interference for Counting Clusters, influence how he designs clustering logic in real-world environments where ambiguity is the norm rather than the exception.

His work continues to push the boundaries of what real-time data systems can achieve, setting the stage for a future where enterprise intelligence is not just fast—but impeccably accurate, gracefully scalable, and deeply trustworthy.
