Tuesday, Aug 8: 4:00 PM - 5:50 PM
Invited Paper Session
Metro Toronto Convention Centre
Sparse dictionary learning has a long history; when fed natural image patches, it produces wavelet-like filters that resemble the receptive fields of the V1 primary visual cortex in the human brain. Wavelets, as local Fourier transforms, are interpretable in the physical sciences and beyond. In this talk, we first describe adaptive wavelet distillation (AWD), which makes black-box deep learning models interpretable in cosmology and cellular-biology problems while improving predictive performance. We then present theoretical results showing that, under a very simple sparse dictionary model, gradient descent on an auto-encoder converges to a point on a manifold of global minima, and which minimum it reaches depends on the batch size. In particular, we show that a small batch size, as in stochastic gradient descent (SGD), induces a qualitatively different type of "feature selection" than full-batch gradient descent.
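To give a concrete sense of the sparse dictionary learning setup the abstract refers to, here is a minimal NumPy sketch (not the speaker's code): it alternates a sparse-coding step (an ISTA-style gradient step with soft thresholding) and a dictionary gradient step on synthetic data standing in for image patches. All sizes and step parameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "patches": sparse combinations of a hidden ground-truth dictionary.
# (Illustrative stand-in for natural image patches.)
n_atoms, dim, n_samples = 20, 16, 500
D_true = rng.normal(size=(dim, n_atoms))
D_true /= np.linalg.norm(D_true, axis=0)
codes_true = rng.normal(size=(n_atoms, n_samples)) * (rng.random((n_atoms, n_samples)) < 0.1)
X = D_true @ codes_true

# Learned dictionary, randomly initialized with unit-norm atoms (columns).
D = rng.normal(size=(dim, n_atoms))
D /= np.linalg.norm(D, axis=0)

lam, step = 0.05, 0.1          # sparsity penalty and step size (assumed values)
A = np.zeros((n_atoms, n_samples))

def recon_error(D, A, X):
    return np.mean((X - D @ A) ** 2)

err0 = recon_error(D, A, X)
for _ in range(100):
    # Sparse-coding step: one ISTA update (gradient step + soft threshold).
    A = A - step * D.T @ (D @ A - X)
    A = np.sign(A) * np.maximum(np.abs(A) - step * lam, 0.0)
    # Dictionary step: gradient step on the reconstruction loss, then renormalize atoms.
    D = D - step * (D @ A - X) @ A.T / n_samples
    D /= np.linalg.norm(D, axis=0) + 1e-12

err1 = recon_error(D, A, X)
print(err0, err1)  # reconstruction error drops as the dictionary adapts
```

On real image patches (and with batched updates, as in the SGD regime discussed in the talk), the learned atoms take on localized, oriented, wavelet-like shapes.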
, University of California at Berkeley