Integrating Algorithms and Analysis for Adaptively Randomized Experiments

John Langford Chair
Microsoft Research
 
Sofia Villar Discussant
University of Cambridge
 
Aaditya Ramdas Organizer
Carnegie Mellon University
 
Joseph Jay Williams Organizer
University of Toronto Statistics, Computer Science
 
Tong Li Organizer
 
Sunday, Aug 6: 4:00 PM - 5:50 PM
1423 
Invited Paper Session 
Metro Toronto Convention Centre 
Room: CC-701B 

Applied: Yes

Main Sponsor

Section on Statistical Learning and Data Science

Co Sponsors

Association for the Advancement of Artificial Intelligence
Section on Statistical Computing

Presentations

A Bandit Approach to Multiple Testing with False Discovery Control

We propose an adaptive sampling approach for multiple testing which aims to maximize statistical power while ensuring anytime false discovery control. We consider n distributions whose means are partitioned by whether they are below or equal to a baseline (nulls) versus above the baseline (actual positives). In addition, each distribution can be sequentially and repeatedly sampled. Inspired by the multi-armed bandit literature, we provide an algorithm that takes as few samples as possible to exceed a target true positive proportion (i.e., the proportion of actual positives discovered) while giving anytime control of the false discovery proportion (nulls predicted as actual positives). Our sample complexity results match known information-theoretic lower bounds, and through simulations we show a substantial performance improvement over uniform sampling and an adaptive elimination-style algorithm. Given the simplicity and sample efficiency of the approach, the method has promise for wide adoption in the biological sciences, clinical testing for drug discovery, and online A/B/n testing problems. 
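
The core loop can be illustrated with a toy sketch (a hypothetical simplification, not the authors' algorithm: it uses crude union-bound Hoeffding intervals and a Bonferroni split, i.e. family-wise control, rather than the paper's tighter anytime bounds with false discovery control). Each round it samples the undecided arm with the largest upper confidence bound and declares a discovery once an arm's lower bound clears the baseline.

    import numpy as np

    rng = np.random.default_rng(0)

    def anytime_radius(n, delta):
        # Crude anytime-valid Hoeffding radius via a union bound over sample counts;
        # the paper uses sharper law-of-the-iterated-logarithm style bounds.
        return np.sqrt(np.log(4.0 * n * n / delta) / (2.0 * n))

    def adaptive_discovery(means, baseline=0.5, delta=0.05, budget=20000):
        """Toy adaptive sampler: pull the undecided arm with the highest upper
        confidence bound; report an arm as a discovery once its lower bound
        exceeds the baseline."""
        k = len(means)
        counts = np.zeros(k, dtype=int)
        sums = np.zeros(k)
        discovered, rejected = set(), set()
        for i in range(k):                       # one initial sample per arm
            sums[i] += rng.binomial(1, means[i]); counts[i] += 1
        for _ in range(budget):
            undecided = [i for i in range(k) if i not in discovered and i not in rejected]
            if not undecided:
                break
            mu_hat = sums / counts
            rad = anytime_radius(counts, delta / k)   # Bonferroni split across arms
            i = max(undecided, key=lambda j: mu_hat[j] + rad[j])
            sums[i] += rng.binomial(1, means[i]); counts[i] += 1
            r = anytime_radius(counts[i], delta / k)
            if sums[i] / counts[i] - r > baseline:
                discovered.add(i)
            elif sums[i] / counts[i] + r < baseline:
                rejected.add(i)
        return discovered, counts

    disc, counts = adaptive_discovery(np.array([0.5, 0.5, 0.6, 0.7]))
    print("discoveries:", sorted(disc), "samples per arm:", counts)
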

Speaker

Kevin Jamieson, University of Washington

A review of adaptive experiments conducted at Meta

I review examples of adaptive experiments conducted at Meta, describing the kinds of statistical and computational problems that arise when running adaptive experiments at scale in products that reach billions of users.
I present work on using Bayesian optimization to conduct large-scale experimentation more efficiently, and I discuss open problems and other challenging issues that statisticians and machine learning researchers face in running adaptive experiments at technology companies. I have helped establish open-source frameworks for adaptive experimentation that give statisticians and machine learning researchers the opportunity to understand the problems being solved and the best solutions available to industry, as well as to contribute their own techniques and methods.
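
As a point of reference, the following is a minimal, self-contained Bayesian-optimization loop in plain numpy/scipy (a generic sketch under simplifying assumptions, not Meta's production systems or open-source tooling): fit a Gaussian-process surrogate to observed (configuration, outcome) pairs and choose the next configuration by expected improvement.

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(1)

    def rbf_kernel(a, b, length=0.2, var=1.0):
        d = a.reshape(-1, 1) - b.reshape(1, -1)
        return var * np.exp(-0.5 * (d / length) ** 2)

    def gp_posterior(x_obs, y_obs, x_new, noise=1e-3):
        """Gaussian-process posterior mean/std at x_new given noisy observations."""
        K = rbf_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
        K_s = rbf_kernel(x_obs, x_new)
        K_ss = rbf_kernel(x_new, x_new)
        solve = np.linalg.solve(K, np.column_stack([y_obs, K_s]))
        mean = K_s.T @ solve[:, 0]
        cov = K_ss - K_s.T @ solve[:, 1:]
        return mean, np.sqrt(np.clip(np.diag(cov), 1e-12, None))

    def expected_improvement(mean, std, best):
        z = (mean - best) / std
        return (mean - best) * norm.cdf(z) + std * norm.pdf(z)

    def outcome(x):                    # stand-in for a noisy experiment outcome
        return np.sin(6 * x) * x + rng.normal(scale=0.05, size=np.shape(x))

    grid = np.linspace(0, 1, 200)
    x_obs = rng.uniform(0, 1, 3)       # small initial batch of configurations
    y_obs = outcome(x_obs)
    for _ in range(15):                # sequential batches of size one
        mean, std = gp_posterior(x_obs, y_obs, grid)
        x_next = grid[np.argmax(expected_improvement(mean, std, y_obs.max()))]
        x_obs = np.append(x_obs, x_next)
        y_obs = np.append(y_obs, outcome(x_next))
    print("best configuration found:", x_obs[np.argmax(y_obs)], "value:", y_obs.max())
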
 

Speaker

Eytan Bakshy, Facebook

Algorithms for Adaptive Experiments that Trade-off Statistical Analysis with Reward: Adaptively Combining Uniform Random Assignment and Thompson Sampling

Multi-armed bandit algorithms like Thompson Sampling (TS) can be used to conduct adaptive experiments to improve user experience. However, such assignment strategies can often increase the False Positive Rate (FPR) and decrease statistical power. We tackle this by introducing a novel heuristic algorithm, called TS-PostDiff (Posterior Probability of Difference), which takes a Bayesian approach to mixing TS and Uniform Random (UR) strategies, allowing for more UR exploration when there is little or no reward to be gained. We evaluate TS-PostDiff against state-of-the-art strategies. The empirical and simulation results help characterize the trade-offs of these approaches between reward, FPR, and statistical power, as well as the circumstances under which each is effective. We quantify the advantage of TS-PostDiff in performing well across multiple differences in arm means (effect sizes), showing the benefits of adaptively changing randomization/exploration in TS in a "Statistically Considerate" manner. This highlights important considerations for future algorithm development and analysis to better balance reward and statistical analysis. 
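
A minimal two-arm Beta-Bernoulli sketch of the mixing idea, based only on the abstract (the threshold c and the Monte Carlo posterior check are illustrative assumptions, not the exact TS-PostDiff specification): estimate the posterior probability that the arm means are within c of each other, and use that probability to decide between a Uniform Random and a Thompson Sampling assignment.

    import numpy as np

    rng = np.random.default_rng(2)

    def ts_postdiff_assign(successes, failures, c=0.05, n_mc=1000):
        """Choose an arm: with probability equal to the posterior probability that
        the two arm means are within c of each other, assign uniformly at random;
        otherwise use a Thompson Sampling draw (toy two-arm Beta-Bernoulli version)."""
        draws = np.array([rng.beta(successes[a] + 1, failures[a] + 1, n_mc) for a in (0, 1)])
        p_close = np.mean(np.abs(draws[0] - draws[1]) < c)   # posterior P(|p0 - p1| < c)
        if rng.random() < p_close:
            return rng.integers(2)                            # uniform random exploration
        return int(rng.beta(successes[1] + 1, failures[1] + 1) >
                   rng.beta(successes[0] + 1, failures[0] + 1))  # Thompson Sampling

    # simulate one adaptive experiment with true arm means 0.5 and 0.55
    true_means = [0.5, 0.55]
    succ, fail = np.zeros(2), np.zeros(2)
    for _ in range(2000):
        arm = ts_postdiff_assign(succ, fail)
        reward = rng.binomial(1, true_means[arm])
        succ[arm] += reward
        fail[arm] += 1 - reward
    print("allocation counts:", succ + fail, "posterior means:", (succ + 1) / (succ + fail + 2))
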

Co-Author

Joseph Jay Williams, University of Toronto Statistics, Computer Science

Speaker

Tong Li

Anytime-valid off-policy inference for contextual bandits

Contextual bandit algorithms are becoming ubiquitous tools for adaptive sequential experimentation in healthcare and the tech industry. This adaptivity raises interesting but hard statistical inference questions: e.g., how would our algorithm have done if we had used a policy different from the logging policy that was used to collect the data, a problem known as "off-policy evaluation" (OPE).
Using modern martingale techniques, we present a suite of methods for OPE inference that can be used even while the original experiment is still running (that is, not necessarily post-hoc), when the logging policy is data-adaptively changing due to learning, and even if the context distributions are drifting over time.
Concretely, we derive confidence sequences (sequences of confidence intervals that are uniformly valid over time) for various functionals, including (time-varying) off-policy mean reward values as well as the entire CDF of the off-policy rewards. All of our methods (a) are valid at arbitrary stopping times, (b) only make nonparametric assumptions, (c) do not require known bounds on the maximal importance weights, and (d) adapt to the empirical variance of our estimators. 
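
The flavor of such a construction can be conveyed with a deliberately crude sketch (assumptions: i.i.d. logged data, bounded importance weights, a fixed target policy value, and a conservative union-bound Hoeffding boundary, none of which the authors' variance-adaptive confidence sequences require): form importance-weighted pseudo-rewards and maintain an interval that is valid simultaneously at every sample size.

    import numpy as np

    rng = np.random.default_rng(3)

    def hoeffding_cs(pseudo_rewards, w_max, alpha=0.05):
        """Very conservative anytime-valid interval for the off-policy value:
        apply Hoeffding's bound at each time t with error budget alpha / (t (t + 1)),
        so the union over all t still sums to alpha. Assumes pseudo-rewards in [0, w_max]."""
        x = np.asarray(pseudo_rewards, dtype=float)
        t = np.arange(1, len(x) + 1)
        mean = np.cumsum(x) / t
        radius = w_max * np.sqrt(np.log(2.0 * t * (t + 1) / alpha) / (2.0 * t))
        return mean - radius, mean + radius

    # toy logged bandit data: two actions, a logging policy and a target policy
    n = 5000
    logging_probs = np.array([0.7, 0.3])
    target_probs = np.array([0.2, 0.8])
    actions = rng.choice(2, size=n, p=logging_probs)
    rewards = rng.binomial(1, np.where(actions == 1, 0.6, 0.4))
    weights = target_probs[actions] / logging_probs[actions]    # importance weights
    pseudo = weights * rewards                                   # unbiased for target value
    lo, hi = hoeffding_cs(pseudo, w_max=weights.max())
    print("CS after all data: [%.3f, %.3f]  (true target value = %.3f)"
          % (lo[-1], hi[-1], 0.2 * 0.4 + 0.8 * 0.6))
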

Speaker

Ian Waudby-Smith, Carnegie Mellon University

Efficient Inference Without Trading-off Regret in Bandits: An Allocation Probability Test for Thompson Sampling

Using bandit algorithms to conduct adaptive randomised experiments can minimise regret, but it poses major challenges for statistical inference. Recent attempts to address these challenges typically impose restrictions on the exploitative nature of the bandit algorithm (trading off regret) and require large sample sizes to ensure asymptotic guarantees. However, large experiments generally follow a successful pilot study, which is tightly constrained in its size or duration. Increasing power in such small pilot experiments, without limiting the adaptive nature of the algorithm, can allow promising interventions to reach a larger experimental phase. In this work we introduce a novel hypothesis test, based solely on the allocation probabilities of the bandit algorithm, that neither constrains the algorithm's exploitative nature nor requires a minimum experiment size. We characterise our Allocation Probability Test when applied to Thompson Sampling, presenting its theoretical properties and illustrating its finite-sample performance compared to state-of-the-art approaches. We show the regret and inferential advantages of our approach in both simulations and a real-world experiment. 
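
One way to see the idea (a hypothetical sketch, not the authors' calibrated test: here the rejection threshold comes from simulating the null rather than from their theoretical characterisation): run Thompson Sampling, record the final posterior allocation probability for the experimental arm, and reject the null when that statistic is unusually large relative to its distribution under equal arm means.

    import numpy as np

    rng = np.random.default_rng(4)

    def run_ts(p0, p1, horizon=100, n_mc=500):
        """Two-arm Beta-Bernoulli Thompson Sampling; return the final allocation
        probability P(arm 1 is better | data), estimated by Monte Carlo."""
        s = np.zeros(2); f = np.zeros(2)
        p = [p0, p1]
        for _ in range(horizon):
            arm = int(rng.beta(s[1] + 1, f[1] + 1) > rng.beta(s[0] + 1, f[0] + 1))
            r = rng.binomial(1, p[arm])
            s[arm] += r; f[arm] += 1 - r
        draws0 = rng.beta(s[0] + 1, f[0] + 1, n_mc)
        draws1 = rng.beta(s[1] + 1, f[1] + 1, n_mc)
        return np.mean(draws1 > draws0)

    # calibrate a rejection threshold by simulating the null (equal arm means)
    null_stats = np.array([run_ts(0.5, 0.5) for _ in range(200)])
    threshold = np.quantile(null_stats, 0.95)        # roughly 5% type I error in this toy setup

    # apply the test to one experiment with a real effect
    stat = run_ts(0.5, 0.65)
    print("allocation-probability statistic: %.3f, threshold: %.3f, reject: %s"
          % (stat, threshold, stat > threshold))
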

Speaker

Nina Deliu, Sapienza University of Rome

Statistical Inference After Using Online Reinforcement Learning on Longitudinal Data

Online reinforcement learning algorithms are increasingly used in digital intervention experiments to optimize treatment delivery for users over time. In this work, we focus on longitudinal user data collected by a large class of adaptive sampling algorithms that are designed to optimize treatment decisions online using accruing data from multiple users. Combining or "pooling" data across users allows adaptive sampling algorithms to potentially learn faster. However, by pooling, these algorithms induce dependence between the sampled user data trajectories; this can cause standard variance estimators for i.i.d. data to underestimate the true variance of estimators on this data type. We develop novel methods to perform a variety of statistical analyses on such adaptively sampled data via Z-estimation. Specifically, we introduce the adaptive sandwich variance estimator, a corrected sandwich estimator that leads to consistent variance estimates under adaptive sampling. Additionally, I discuss how we plan to use this approach for the primary analysis of Oralytics, a mobile health study to encourage oral self-care that will use a pooling reinforcement learning algorithm. 
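
For orientation, here is a generic sandwich-variance sketch for a least-squares Z-estimator with user-level clustering (an illustration of the usual A^{-1} B A^{-T} form only; it is not the adaptive sandwich estimator developed in the talk, which corrects for the dependence induced by pooled online RL rather than simply clustering by user).

    import numpy as np

    rng = np.random.default_rng(5)

    def sandwich_ols(X, y, user_ids):
        """Least-squares Z-estimator with a user-clustered sandwich variance:
        bread A = average derivative of the estimating function, meat B = average
        outer product of per-user score sums (users treated as independent units)."""
        n, d = X.shape
        beta = np.linalg.solve(X.T @ X, X.T @ y)      # solves sum_i x_i (y_i - x_i'beta) = 0
        resid = y - X @ beta
        A = X.T @ X / n                                # bread
        B = np.zeros((d, d))                           # meat: cluster scores by user
        for u in np.unique(user_ids):
            g_u = X[user_ids == u].T @ resid[user_ids == u]
            B += np.outer(g_u, g_u)
        B /= n
        A_inv = np.linalg.inv(A)
        cov = A_inv @ B @ A_inv.T / n
        return beta, np.sqrt(np.diag(cov))

    # toy longitudinal data: 50 users, 20 decision points each, correlated within user
    users = np.repeat(np.arange(50), 20)
    x = np.column_stack([np.ones(1000), rng.binomial(1, 0.5, 1000)])   # intercept + treatment
    user_effect = rng.normal(0, 0.5, 50)[users]
    y = 1.0 + 0.3 * x[:, 1] + user_effect + rng.normal(0, 1, 1000)
    beta, se = sandwich_ols(x, y, users)
    print("estimates:", beta, "cluster-robust SEs:", se)
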

Speaker

Kelly Zhang