As an Applied Scientist at Uber working on Causal Inference, Experimental Design, and Machine Learning, I encounter the limitations of certain statistical approaches every day. In this article, I want to explore the strengths and weaknesses of Bayesian and frequentist methods in real-world data analysis, highlighting where each approach excels and where it falls short.
Frequentist and Bayesian methods represent fundamentally different philosophies of statistical inference. The difference becomes particularly evident when similar experiments are estimated repeatedly under different conditions, a kind of experimentation often run at large technology companies. To demonstrate, let us focus on a simple use case: measuring the marginal efficiency of a certain incentive lever.
Example:
Let’s say a company’s advertising budget is allocated weekly based on the efficiency of last week’s campaign. Efficiency can be defined as the incremental sales derived from incremental spend. This quantity is typically estimated by continuously running small perturbations (“wiggles”) to the intensity of treatment (exposure to advertising, in this case). Every week, customers are randomly divided into a high and a low group, and the high group gets a slightly higher advertising budget per customer than the low group. At the end of the week, the marginal efficiency is the expected extra sales generated by customers in the high group relative to the low group, divided by the extra spend that the high group incurred relative to the low group:
ME = (sales_high − sales_low) / (cost_high − cost_low)
where ME is the marginal efficiency of advertising; sales_high and sales_low are the population mean weekly sales per customer in the high and low cohorts, and cost_high and cost_low are the population mean weekly advertising spend per customer in the high and low cohorts.
Differences between the Bayesian and frequentist approaches to estimation
There are multiple statistical approaches to estimating the marginal efficiency, but let’s focus on the so-called sample-analog method: we estimate sales_high, sales_low, cost_high, and cost_low separately by their sample means (denoted with hats below) and substitute these into the analog formula:
MÊ = (ŝales_high − ŝales_low) / (ĉost_high − ĉost_low)
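To make the estimator concrete, here is a minimal sketch in Python on simulated cohorts. Everything in it is invented for illustration: the cohort size, the spend distributions, and the data-generating process, which is constructed so that each extra ad dollar yields $1.20 of incremental sales (i.e., the true ME is 1.20).

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000  # customers per cohort (illustrative)

# Advertising spend per customer: the high cohort gets ~$0.50 extra.
cost_low = rng.gamma(shape=2.0, scale=1.0, size=n)
cost_high = rng.gamma(shape=2.0, scale=1.0, size=n) + 0.5

# Weekly sales per customer: by construction, each extra ad dollar
# yields $1.20 of incremental sales, so the true ME is 1.20.
sales_low = 10.0 + 1.2 * cost_low + rng.normal(0.0, 5.0, n)
sales_high = 10.0 + 1.2 * cost_high + rng.normal(0.0, 5.0, n)

def marginal_efficiency(sales_hi, sales_lo, cost_hi, cost_lo):
    """Plug the four sample means into the analog formula."""
    return (sales_hi.mean() - sales_lo.mean()) / (cost_hi.mean() - cost_lo.mean())

me_hat = marginal_efficiency(sales_high, sales_low, cost_high, cost_low)
print(f"ME estimate: {me_hat:.3f}  (true value by construction: 1.200)")
```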
The frequentist approach treats every sample of data as coming from a general population (a random experiment). The objective of a frequentist estimator is to minimize the deviation of the sample-based estimate from the true population value, where the expectation is taken over all possible samples from the population. For this reason, the frequentist approach is inherently based only on the data in the sample.
The Bayesian approach formulates the statistical problem slightly differently: it postulates knowledge of a prior distribution for the population parameter, i.e., we know the distribution from which the true parameter value is drawn. Then, given the observed data, the goal is to minimize the deviation of our estimate from the true population parameter, with the expectation taken over the prior distribution of the parameter.
The Bayesian approach combines the prior knowledge that the researcher possesses with the data and a model of the data-generating process. It admits an extra source of information, and I want to argue that this enables sharing insights between experiments whenever different experiments can reasonably be assumed to share the same prior distribution of parameter values. This can lead to more precise parameter estimates, as sketched below.
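As a hedged illustration of this sharing, the sketch below shrinks a set of hypothetical weekly ME estimates toward a common prior fitted from the history of experiments themselves, a simple empirical-Bayes, normal-normal model. All numbers are made up, and the known-standard-error assumption is a simplification.

```python
import numpy as np

me_hat = np.array([1.35, 0.90, 1.10, 1.60, 1.05])  # weekly ME estimates (invented)
se = np.array([0.10, 0.08, 0.12, 0.15, 0.07])      # their standard errors (invented)

# Fit a normal prior N(mu0, tau2) across experiments by method of moments:
# the spread of the estimates minus average sampling noise approximates tau2.
mu0 = np.average(me_hat, weights=1 / se**2)
tau2 = max(me_hat.var(ddof=1) - np.mean(se**2), 1e-6)

# Posterior mean for each week: precision-weighted blend of prior and data.
shrink = tau2 / (tau2 + se**2)  # weight placed on the week's own estimate
posterior = shrink * me_hat + (1 - shrink) * mu0
print(np.round(posterior, 3))
```

Weeks with noisier estimates (larger standard errors) are pulled harder toward the shared prior mean, which is exactly the cross-experiment borrowing of strength described above.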
Constraints of Frequentist Techniques
Frequentist estimation often employs techniques like the maximum likelihood estimator (MLE). The MLE is favored for its desirable properties, such as consistency: it converges to the true parameter value as more data is collected. For instance, estimating the average time a driver waits for a passenger using the MLE provides an estimate based on all observed trips. Although the MLE converges to the truth in large samples, any single estimate is subject to sampling variability, which is why confidence intervals are used to indicate the range in which the true parameter likely falls.
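For concreteness, here is a small sketch of that wait-time example under an assumed exponential model, on simulated data; the true mean of 4 minutes and the sample size are invented. For the exponential distribution, the MLE of the mean happens to be the sample mean.

```python
import numpy as np

rng = np.random.default_rng(7)
waits = rng.exponential(scale=4.0, size=2_000)  # minutes; true mean = 4.0

# MLE of the exponential mean is the sample mean.
mle = waits.mean()
se = waits.std(ddof=1) / np.sqrt(len(waits))  # normal-approximation standard error
ci = (mle - 1.96 * se, mle + 1.96 * se)
print(f"MLE: {mle:.2f} min, 95% CI: ({ci[0]:.2f}, {ci[1]:.2f})")
```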
As Uber collects more data, such as new rides or changing traffic patterns, frequentist methods can recalibrate estimates and confidence intervals, leading to increasingly precise predictions. For example, frequentist techniques could continuously update the estimate of average ride duration based on accumulating trip data. However, a key limitation of the frequentist approach is its inability to incorporate prior knowledge or update beliefs about a parameter as new data arrives. Each dataset is treated independently, which can be a drawback in a rapidly changing environment like Uber’s, where historical data and expert insights could provide valuable context.
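A minimal sketch of that recalibration, on simulated ride durations: each week the estimate and its confidence interval are recomputed from all data pooled so far, and the interval narrows as data accumulates. Note that nothing from previous weeks except the raw data is reused; there is no notion of a carried-over belief.

```python
import numpy as np

rng = np.random.default_rng(1)
pooled = np.array([])
for week in range(1, 5):
    new_trips = rng.normal(loc=18.0, scale=6.0, size=500)  # minutes (simulated)
    pooled = np.concatenate([pooled, new_trips])
    # Recompute estimate and 95% CI from scratch on the pooled data.
    est = pooled.mean()
    half = 1.96 * pooled.std(ddof=1) / np.sqrt(len(pooled))
    print(f"week {week}: mean duration {est:.2f} ± {half:.2f} min")
```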
The Dynamic Nature of Bayesian Inference
In contrast, the Bayesian approach provides a more dynamic perspective. Bayesian inference combines prior beliefs about parameters with observed data to produce updated beliefs, or posterior distributions. For example, if Uber has historical data on rider demand during holidays, Bayesian methods can update this prior information with current data to refine real-time estimates.
Bayesian estimation focuses on the entire posterior distribution of the parameter, not just point estimates or confidence intervals. This distribution reflects updated beliefs after considering the data. For example, when estimating the likelihood of a surge pricing event, Bayesian methods provide a range that describes uncertainty, rather than a single estimate. Point estimates, such as the posterior mean or median, can be derived from this distribution.
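A hedged sketch of that surge example with a conjugate Beta-Binomial model; the prior counts and the observed data are illustrative, not real Uber numbers. The posterior is a full distribution, from which both a point estimate and a credible interval fall out directly.

```python
from scipy import stats

prior_a, prior_b = 2, 8   # prior belief: surge in a given hour is fairly rare
surges, hours = 13, 60    # observed: 13 surge hours out of 60 (invented)

# Beta prior + binomial data -> Beta posterior.
post = stats.beta(prior_a + surges, prior_b + hours - surges)
low, high = post.ppf([0.025, 0.975])
print(f"posterior mean: {post.mean():.3f}")
print(f"95% credible interval: ({low:.3f}, {high:.3f})")
```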
Benefits of Bayesian Methods
Bayesian inference enables flexible decision-making by continuously updating the posterior distribution as new data arrives. For example, during rush hour, Bayesian models can adjust ride time predictions based on new traffic data, improving accuracy. This approach is particularly useful for real-time scenarios and when prior knowledge is crucial. Additionally, Bayesian methods quantify uncertainty at each stage, which is valuable in high-stakes situations like optimizing Uber’s supply and demand.
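Here is a sketch of that sequential updating with a normal-normal conjugate model: the posterior after each batch of traffic data becomes the prior for the next. The per-trip variance is treated as known, and all values (prior, batch sizes, the rush-hour slowdown) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
mu, var = 20.0, 25.0  # prior on mean ride time (minutes)
obs_var = 16.0        # assumed known per-trip variance

for batch in range(4):
    trips = rng.normal(loc=26.0, scale=4.0, size=50)  # rush-hour slowdown (simulated)
    n, xbar = len(trips), trips.mean()
    # Conjugate update: precision-weighted combination of prior and data.
    post_var = 1.0 / (1.0 / var + n / obs_var)
    mu = post_var * (mu / var + n * xbar / obs_var)
    var = post_var
    print(f"batch {batch + 1}: posterior mean {mu:.2f}, sd {np.sqrt(var):.2f}")
```

Each print shows the belief shifting toward the new traffic regime while the posterior standard deviation shrinks, which is the uncertainty quantification mentioned above.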
A key difference between frequentist and Bayesian approaches is their interpretation of probability. Frequentist methods view probability as a long-run frequency, interpreting estimates and intervals based on how they perform over many trials. Bayesian methods, however, treat probability as a measure of belief or uncertainty, allowing direct statements about the likelihood of a parameter value given the data. This can be powerful for integrating past experiences with current observations.
Handling New Data: Frequentist vs. Bayesian
Another distinction is how each approach handles new data. Frequentist methods treat each dataset independently, recalculating estimates as if starting from scratch each time. In contrast, Bayesian methods view data accumulation as a cumulative process, continuously refining existing knowledge with each new dataset. This makes the Bayesian approach particularly effective for continuous learning and adaptation, such as adjusting pricing algorithms or predicting demand in different neighborhoods.
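One way to see the cumulative view, under a normal model with known observation variance: updating the posterior batch by batch yields exactly the same answer as a single update on all the pooled data. The sketch below checks this with simulated numbers; it is a property of this conjugate model, not of Bayesian methods universally.

```python
import numpy as np

rng = np.random.default_rng(5)
batches = [rng.normal(30.0, 5.0, 40) for _ in range(3)]
mu0, var0, obs_var = 25.0, 100.0, 25.0  # prior and known noise (illustrative)

def update(mu, var, data):
    """Normal-normal conjugate update with known observation variance."""
    n, xbar = len(data), data.mean()
    post_var = 1.0 / (1.0 / var + n / obs_var)
    return post_var * (mu / var + n * xbar / obs_var), post_var

# Sequential: yesterday's posterior is today's prior.
mu, var = mu0, var0
for b in batches:
    mu, var = update(mu, var, b)

# One-shot update on the pooled data gives the identical posterior.
mu_pooled, var_pooled = update(mu0, var0, np.concatenate(batches))
print(np.isclose(mu, mu_pooled), np.isclose(var, var_pooled))  # True True
```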
Complementary Approaches
The frequentist approach’s reliance on fixed procedures and confidence levels offers a clear, objective framework for statistical inference but may not fully utilize prior knowledge. Bayesian methods introduce subjectivity through prior distributions, which can be a disadvantage but also beneficial when prior knowledge is valuable or decisions must be made amid uncertainty.
In conclusion, frequentist and Bayesian approaches offer complementary perspectives on statistical inference, especially in real-time data environments like Uber. The frequentist approach provides a robust framework with clear procedures, while the Bayesian approach offers flexibility and a dynamic method for updating beliefs as new information emerges. The choice between these methods depends on the context, including the need for real-time decisions and the availability of prior information. Together, these approaches form a comprehensive toolkit for addressing challenges in data analysis and decision-making within a technology-driven company like Uber.
