AI-Driven Bidding Systems: Why Training Data Quality Now Determines Advertising Outcomes

A consumer submits a lead form in less than a minute, compares options across providers, and then leaves the browser. Days later, a transaction closes through a call center, a CRM workflow, or an in-person interaction. For the business, the outcome is clear. For the advertising system that initiated the interaction, the attribution is far less certain. The system must determine which auction influenced that decision, often without a direct or observable path connecting the two events.

This growing distance between real-time bidding decisions and delayed business outcomes has become a defining constraint in performance advertising. Machine learning systems now evaluate billions of auctions in milliseconds, estimating the likelihood that each impression will drive a conversion or generate value. These systems are highly optimized, continuously retraining on data to refine their predictions. Yet the effectiveness of this optimization no longer depends primarily on model sophistication. It hinges on whether the system can access accurate, complete, and timely signals that reflect real outcomes.

Shrey Hatle, a director of product and technical leader at PubMatic and a judge for the Americas Conference on Information Systems (AMCIS) 2026, has spent more than a decade building large-scale performance advertising systems and has focused his work on strengthening this connection between outcome and optimization. Rather than treating bidding as an isolated decision layer, his approach centers on the integrity of the data feeding into those decisions. “Bidding systems are only as good as the input signals they can learn from,” he explains. “When that feedback loop is incomplete, optimization becomes an approximation of value rather than a reflection of it.” His work has consistently addressed the structural gaps that prevent systems from learning effectively at scale.

The importance of this shift becomes clearer when considering the scale of automated advertising. Programmatic systems now account for more than 90% of digital display advertising spend in the United States, meaning that the majority of media investment is allocated through machine-driven decisions. At that scale, even small gaps in training data can compound into significant inefficiencies, as models optimize toward signals that only partially represent business outcomes. The challenge is no longer building systems that can make decisions quickly, but ensuring that those decisions are grounded in reliable data.

Model Capability vs Signal Reliability

Machine learning has fundamentally reshaped how advertising decisions are made. Modern bidding systems incorporate a wide range of contextual and behavioral inputs, including device characteristics, historical engagement patterns, and predicted conversion value. These models continuously update their parameters, learning from incoming data to improve accuracy and efficiency. From a technical perspective, the decisioning layer has reached a high level of maturity, capable of processing vast datasets and adapting in real time.

However, this progress has not been matched by equivalent improvements in signal quality. The data feeding these systems is often incomplete, delayed, or fragmented across multiple environments. Cross-device behavior introduces gaps in user journeys, while privacy constraints further limit the availability of persistent identifiers. In industries where conversions occur offline, the disconnect becomes even more pronounced, as the system may observe the initial interaction but fail to capture the outcome.

This creates a structural imbalance. Models are designed to optimize toward outcomes, yet they are frequently trained on proxy signals such as clicks or form submissions rather than on the events that actually matter to businesses, such as a closed sale. Over time, this reliance on incomplete signals can distort optimization, leading systems to prioritize actions that are measurable rather than those that are meaningful. “The model can evaluate millions of signals per second,” Hatle notes, “but it cannot compensate for signals that were never captured or never attributed to an outcome.” The limitation, therefore, is not computational. It is informational.

Rebuilding the Feedback Loop

Hatle’s work on Enhanced Conversions for Leads provides a concrete example of how this challenge can be addressed at scale. As Global Product Lead for lead generation measurement and bidding systems, he led the design and global launch of a privacy-resilient infrastructure that fundamentally improved how offline conversion signals are captured and integrated into machine learning models.

Lead generation represents a significant portion of digital advertising, particularly in industries such as insurance, financial services, healthcare, and education, where transactions often close offline. Historically, these offline outcomes were difficult to connect to the online interactions that initiated them, resulting in incomplete training data for bidding systems. As privacy regulations tightened and traditional tracking mechanisms weakened, this gap became more pronounced, directly affecting the quality of model optimization.

Enhanced Conversions for Leads addressed this limitation by enabling advertisers to use their own first-party data, transmitted in a privacy-compliant manner, to establish deterministic links between digital interactions and offline outcomes. Instead of relying on fragile identifiers or tedious manual processes, the system introduced a structured pipeline for ingesting, matching, and validating conversion data, allowing these signals to be incorporated into model training at scale.
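To make the matching step concrete, here is a minimal Python sketch of how a deterministic link of this kind could work. It is an illustration only, not the product's actual implementation: it assumes the shared key is a SHA-256 hash of a normalized email address, and the names `hash_identifier`, `match_offline_conversions`, `click_id`, and `hashed_email` are hypothetical.

```python
import hashlib


def normalize_email(email: str) -> str:
    """Trim whitespace and lowercase so both sides hash the same string."""
    return email.strip().lower()


def hash_identifier(email: str) -> str:
    """One-way SHA-256 hash of the normalized email (assumed matching key)."""
    return hashlib.sha256(normalize_email(email).encode("utf-8")).hexdigest()


def match_offline_conversions(leads, conversions):
    """Join offline conversions back to the ad click that produced the lead.

    leads:       dicts with 'email' and 'click_id', captured at form submit.
    conversions: dicts with 'hashed_email' and 'value', uploaded later
                 by the advertiser from a CRM or call center system.
    Returns (matched, unmatched) so unmatched records can be inspected
    rather than silently dropped.
    """
    index = {hash_identifier(lead["email"]): lead["click_id"] for lead in leads}
    matched, unmatched = [], []
    for conv in conversions:
        click_id = index.get(conv["hashed_email"])
        if click_id is None:
            unmatched.append(conv)
        else:
            matched.append({"click_id": click_id, "value": conv["value"]})
    return matched, unmatched
```

In this sketch the raw email never needs to travel with the conversion record; only the hash does, and a conversion either matches an earlier lead exactly or is set aside as unmatched.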

The impact of this work was both measurable and systemic. Enhanced Conversions for Leads improved the accuracy of advertiser conversion measurement. More importantly, the gains were driven by improved training data quality rather than changes to the bidding algorithm itself, demonstrating that optimization performance is fundamentally tied to the integrity of the signals being learned from.

This aligns with broader industry trends. In 2026, 82% of marketers report that AI-powered optimization is essential to their advertising strategy, reflecting widespread reliance on automated systems. However, adoption alone does not guarantee effectiveness. Without high-quality training data, these systems cannot fully realize their potential. “When outcome signals are incomplete, models optimize toward what is visible, not what is valuable,” Hatle explains. “Closing that gap changes how the system learns.” By restoring the connection between signals and outcomes, the system redefined how performance optimization operates at scale.

Scaling Signal Integrity Across Fragmented Channels

As advertising expands across channels such as connected television, mobile ecosystems, and privacy-constrained environments, maintaining signal integrity becomes more complex. Each new channel introduces additional fragmentation, making it more difficult to capture and connect user interactions across the full conversion journey. Identity resolution becomes less deterministic, attribution becomes less direct, and feedback loops become slower.

This shift requires a rethinking of performance infrastructure. Rather than treating measurement and bidding as separate functions, modern systems must integrate these components into a unified framework where data flows seamlessly from signal capture to model training and back into decisioning. First-party data strategies play a central role in this transformation, providing a more reliable foundation for signal collection while aligning with evolving privacy requirements.

In his current role, Hatle focuses on building performance advertising systems that operate across these fragmented environments, processing large volumes of data while ensuring accuracy, compliance, and scalability. The challenge is not simply to collect more data, but to ensure that the data collected is eligible for model training, arrives within a useful timeframe, and accurately represents real-world outcomes. “The advantage is no longer in who has the most data,” he observes. “It is in who can turn fragmented signals into reliable training inputs fast enough to influence bidding decisions.”
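The three conditions in that paragraph (eligible, timely, and tied to a real outcome) can be read as a filter applied before training. The Python sketch below is one hypothetical way to express it; the function name `is_training_eligible` and the 30-day default window are assumptions chosen for illustration, not values taken from any system described here.

```python
from datetime import datetime, timedelta


def is_training_eligible(
    click_time: datetime,
    conversion_time: datetime,
    matched: bool,
    max_delay: timedelta = timedelta(days=30),  # assumed window, illustrative only
) -> bool:
    """Return True if a conversion record should be used as a training label.

    A record qualifies only if it was matched to an ad interaction and the
    outcome arrived after the click and within the allowed delay.
    """
    if not matched:
        return False
    delay = conversion_time - click_time
    return timedelta(0) <= delay <= max_delay
```

A shorter window gives the model fresher labels but discards slow-closing sales; a longer one keeps more outcomes at the cost of slower feedback.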

This challenge is becoming more urgent as budgets shift toward environments with less deterministic measurement. Approximately 45% of marketers are reallocating spend from linear television to connected TV, a channel that offers significant reach but introduces new attribution complexities. Without robust signal infrastructure, optimization in these environments risks being driven by incomplete data, limiting the effectiveness of even the most advanced bidding systems.

The Future of Bidding Is Decided Before the Bid

Performance advertising has long been framed as a problem of decision-making, focused on placing the right bid at the right moment. That framing is increasingly insufficient. Bidding systems are trained systems, and their effectiveness depends on the quality of the data they learn from. When that data is incomplete or inaccurate, optimization is constrained, regardless of how sophisticated the underlying model is.

The next phase of performance advertising will be defined by the strength of its data foundations. Signal recovery, attribution accuracy, and training data integrity will determine how effectively systems can operate in increasingly complex environments. The auction itself is not where performance is decided but where it is expressed, reflecting the cumulative impact of upstream data and modeling decisions.

Hatle’s work illustrates this shift, demonstrating how improvements in data quality can drive meaningful gains in performance without requiring fundamental changes to bidding algorithms. By focusing on the systems that enable learning rather than the mechanics of bidding alone, he has contributed to a broader redefinition of what it means to optimize at scale.

“The next phase of performance advertising will not be defined by faster bidding,” he concludes. “It will be defined by how well we understand outcomes, and how effectively we translate that understanding into data the system can work with and trust.”

 
