Statistical Learning with Practical Constraints - by COPSS Leadership Academy Award Recipients

Amita Manatunga Chair
Emory University
Amita Manatunga Organizer
Emory University
Monday, Aug 7: 8:30 AM - 10:20 AM
Invited Paper Session 
Metro Toronto Convention Centre 
Room: CC-718B 



Main Sponsor

Committee of Presidents of Statistical Societies


Advances in Distribution Compression

This talk will introduce two new tools for summarizing a probability distribution more effectively than independent sampling or standard Markov chain Monte Carlo thinning:

1. Given an initial n point summary (for example, from independent sampling or a Markov chain), kernel thinning finds a subset of only square-root n points with comparable worst-case integration error across a reproducing kernel Hilbert space.

2. If the initial summary suffers from biases due to off-target sampling, tempering, or burn-in, Stein thinning simultaneously compresses the summary and improves the accuracy by correcting for these biases.

These tools are especially well-suited for tasks that incur substantial downstream computation costs per summary point like organ and tissue modeling in which each simulation consumes 1000s of CPU hours.  


Lester Mackey, Microsoft Research New England

An Automatic Finite-Sample Robustness Metric: Can Dropping a Little Data Change Conclusions?

One hopes that data analyses will be used to improve people's health, finances, and well-being. But analyzed data may systematically differ from data where decisions are ultimately applied. E.g., suppose we analyze data in one country and conclude that microcredit alleviates poverty, so we decide to distribute microcredit in other locations and times. We might ask: can we trust our conclusion to apply under new conditions? If we found that a very small percentage of the original data was instrumental to the original conclusion, we might worry about generalization. So we propose a method to assess the sensitivity of data analyses to the removal of a very small fraction of the data. Analyzing all possible data subsets of a certain size is computationally prohibitive, so we provide an approximation. Our approximation is automatically computable, theoretically supported, and works for common estimators. We show that any non-robustness our method finds is conclusive. We find that while some real-world applications are robust, in others the sign of a treatment effect can be changed by dropping less than 0.1 percent of the data, even in simple models and when standard errors are small. 


Tamara Broderick, MIT

Generalizing Warner's randomized response for private collection of bounded data

Warner's "randomized response" is a very simple mechanism to privately collect Yes/No responses. On being asked a question like "Did you use drugs before 18?", the respondent is instructed to first privately flip a coin with bias (say) 75%. If it comes up heads, they tell the truth, else they lie. Since the induced bias is known, it is easy to correct for it during statistical inference (eg: estimating teenage population-level drug usage). We present a simple nonparametric generalization of this mechanism that can be used to privately collect bounded, non-binary, data (eg: hours per day spent on a mobile phone, or age, or frequency of alcohol consumption). Our mechanism recovers Warner's as a special case, and can similarly be used for efficient statistical inference. We show how the IT industry can thus perform private A/B testing, and more generally how to do private two-sample, independence or conditional independence tests.  


Aaditya Ramdas, Carnegie Mellon University

Theoretical Guarantees for Sparse PCA based on the Elastic Net

Sparse principal component analysis (SPCA) has been widely used for dimensionality reduction and feature extraction in high-dimensional data analysis. Despite there being many methodological and theoretical developments of SPCA in the past two decades, the theoretical guarantees of the popular SPCA algorithm proposed by Zou, Hastie, and Tibshirani (2006) based on the elastic net are still unknown. To close this important theoretical gap, we first revisit the SPCA algorithm of Zou et al. (2006) and present our implementation. Also, we study a computationally more efficient variant of the SPCA algorithm in Zou et al. (2006) that can be considered as the limiting case. We provide the guarantees of convergence to a stationary point for both versions of SPCA. We prove that, under a sparse spiked covariance model, both algorithms can recover the principal subspace consistently under mild regularity conditions. We show that their estimation error bounds match the best available bounds of existing works or the minimax rates up to some logarithmic factors. Moreover, we demonstrate the numerical performance of both algorithms in simulation studies. 


Teng Zhang
Haoyi Yang, Penn State University


Lingzhou Xue, Pennsylvania State University