Monday, Aug 7: 8:30 AM - 10:20 AM
Topic-Contributed Paper Session
Metro Toronto Convention Centre
International Indian Statistical Association
Section on Physical and Engineering Sciences
Section on Statistics and the Environment
Most general-purpose classification methods, such as the support-vector machine (SVM) and random forest (RF), fail to account for an unusual characteristic of astronomical data: known measurement uncertainties. In astronomical data sets, this information is often provided with the data but discarded because popular machine learning classifiers cannot incorporate it. We propose a simulation-based approach that incorporates heteroscedastic measurement error into an existing classification method to better quantify uncertainty in classification. The proposed method first simulates perturbed realizations of the data from a Bayesian posterior predictive distribution of a Gaussian measurement error model. A chosen classifier is then fit to each simulated data set. The variation across the simulations naturally reflects the uncertainty propagated from the measurement errors in both the labeled and unlabeled data sets. We demonstrate this approach with two numerical studies. The first is a thorough simulation study applying the proposed procedure to SVM and RF, which are well-known hard and soft classifiers, respectively. The second applies the procedure to a realistic classification problem.
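The simulate-then-classify loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes independent Gaussian errors with known per-feature standard deviations and a flat prior on the latent true values, so each perturbed realization draws every feature from N(x_obs, σ²). To keep the example within the Python standard library, it also substitutes a hypothetical nearest-centroid classifier for SVM or RF. All function names are hypothetical.

```python
import random
from collections import Counter

def perturb(points, sigmas, rng):
    """One simulated realization: each coordinate drawn from N(observed value, known sigma^2)."""
    return [[rng.gauss(x, s) for x, s in zip(p, sig)] for p, sig in zip(points, sigmas)]

def fit_nearest_centroid(X, y):
    """Hypothetical stand-in for SVM/RF: store the mean feature vector of each class."""
    counts = Counter(y)
    sums = {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def predict(centroids, x):
    """Assign x to the class whose centroid is nearest in squared Euclidean distance."""
    return min(centroids, key=lambda label: sum((a - b) ** 2 for a, b in zip(x, centroids[label])))

def simulate_classification(X_lab, s_lab, y_lab, X_new, s_new, n_sims=200, seed=0):
    """Perturb labeled and unlabeled data, refit and predict once per simulation,
    and return, for each unlabeled point, the fraction of simulations voting for each class."""
    rng = random.Random(seed)
    votes = [Counter() for _ in X_new]
    for _ in range(n_sims):
        model = fit_nearest_centroid(perturb(X_lab, s_lab, rng), y_lab)
        for i, x in enumerate(perturb(X_new, s_new, rng)):
            votes[i][predict(model, x)] += 1
    return [{label: c / n_sims for label, c in v.items()} for v in votes]

# Toy data: two well-separated classes with small, known errors.
X_lab = [[0.0, 0.0], [0.2, -0.1], [3.0, 3.0], [2.8, 3.1]]
s_lab = [[0.1, 0.1]] * 4
y_lab = ["A", "A", "B", "B"]
# One precisely measured point near class A, one noisy point near the class boundary.
X_new = [[0.1, 0.0], [1.5, 1.5]]
s_new = [[0.1, 0.1], [1.0, 1.0]]

probs = simulate_classification(X_lab, s_lab, y_lab, X_new, s_new)
print(probs)
```

The precisely measured point far from the boundary lands in class A in essentially every simulation. The noisy point near the boundary splits its votes between A and B. That spread is the measurement-error-driven classification uncertainty the abstract describes.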
Model fitting and validation with Poisson counting processes have been adopted and implemented in software packages for high energy physics. Because a heterogeneous Poisson counting process typically has a large number of energy bins with zero counts, traditional large-sample approximations are inappropriate in practice. Numerical solutions have been proposed in the astrophysics literature, and astronomers have long been interested in theoretical guarantees for the procedures they adopt. We study the goodness-of-fit problem with rigorous statistical methods and illustrate the practical implications of our results with numerical studies.
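The abstract does not specify the authors' procedure, so the following is only a generic illustration of why simulation helps when many bins have zero counts. It computes the Poisson deviance (the form used by the Cash statistic) for a fully specified model and calibrates it by parametric bootstrap instead of a chi-square approximation. The model values and function names are hypothetical, as is the assumption that no parameters are fitted. With fitted parameters, each simulated data set would also need to be refit.

```python
import math
import random

def cash_deviance(counts, model):
    """Poisson deviance 2 * sum(m - n + n*ln(n/m)); zero-count bins contribute 2*m."""
    total = 0.0
    for n, m in zip(counts, model):
        total += m - n
        if n > 0:
            total += n * math.log(n / m)
    return 2.0 * total

def poisson_draw(lam, rng):
    """Knuth's multiplication method; adequate for the small expected counts used here."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def bootstrap_pvalue(counts, model, n_sims=2000, seed=1):
    """Calibrate the deviance by simulating data from the (fully specified) model."""
    rng = random.Random(seed)
    observed = cash_deviance(counts, model)
    exceed = sum(
        cash_deviance([poisson_draw(m, rng) for m in model], model) >= observed
        for _ in range(n_sims)
    )
    # Add-one correction keeps the Monte Carlo p-value strictly positive.
    return observed, (exceed + 1) / (n_sims + 1)

# Hypothetical spectrum: 50 energy bins with 0.2 expected counts each, so most simulated bins are zero.
model = [0.2] * 50
counts = [5] * 50  # a grossly misfit observation
stat, p = bootstrap_pvalue(counts, model)
print(round(stat, 1), p)
```

In this example, the observed deviance far exceeds every simulated value, so the p-value falls to its floor of 1/(n_sims + 1).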
I discuss examples of how we at the CHASC Astrostatistics Collaboration have tailored analyses of high-energy astronomical data (spatial, spatial+temporal, spatial+spectral, spectral+temporal, and spatial+spectral+temporal) to the specific astronomical problems of interest. The differences among these approaches highlight which compromises were necessary to make progress and which trade-offs were needed to maximize the utility of the results. The methods include Bayesian, frequentist, computer vision, and machine learning techniques, several of them used in combination.
, Center for Astrophysics | Harvard & Smithsonian
Astronomers often deal with data in which both the covariates and the dependent variable are measured with heteroskedastic, non-Gaussian errors. Although techniques exist for estimating regression parameters from data with heteroskedasticity and measurement errors, most lack model-validation procedures, such as checks of structural assumptions. Using ideas from conformal prediction, we develop a model validation test that is invariant to heteroskedasticity and measurement errors. We demonstrate empirically that this test gives finite-sample control of Type I error probabilities under a variety of assumptions about the measurement errors in the observed data, whereas other prediction intervals do not. We further demonstrate how our conformal prediction approach can be used to test the structural assumptions of planet mass-radius relations proposed in the literature.
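As background for the conformal approach, the sketch below shows standard split conformal prediction. It is not the authors' test. A straight-line model is fit on one subset, and the absolute residuals on a separate calibration subset set the interval half-width. The sketch then checks the interval's coverage on fresh data. The linear model, the heteroskedastic noise level 0.2 + 0.1·x, and the function names are hypothetical, and the example ignores measurement error in the covariate.

```python
import math
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (a hypothetical structural model)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def split_conformal(train, calib, alpha=0.1):
    """Fit on train; a calibration residual quantile gives the interval half-width q."""
    a, b = fit_line(*zip(*train))
    scores = sorted(abs(y - (a + b * x)) for x, y in calib)
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))  # finite-sample corrected rank
    q = math.inf if k > n else scores[k - 1]
    return a, b, q

def coverage(model, test):
    """Fraction of held-out points whose y lies in [a + b*x - q, a + b*x + q]."""
    a, b, q = model
    return sum(abs(y - (a + b * x)) <= q for x, y in test) / len(test)

rng = random.Random(42)
xs = [rng.uniform(0, 10) for _ in range(3000)]
# Heteroskedastic noise: the standard deviation grows with x.
data = [(x, 1 + 2 * x + rng.gauss(0, 0.2 + 0.1 * x)) for x in xs]
train, calib, test = data[:500], data[500:1000], data[1000:]

model = split_conformal(train, calib, alpha=0.1)
cov = coverage(model, test)
print(f"half-width {model[2]:.3f}, test coverage {cov:.3f}")
```

Under exchangeability, a split conformal interval covers a new point with probability at least 1 − α in finite samples, even though the noise here is heteroskedastic. This guarantee is marginal over x rather than conditional on it. It is the kind of distribution-free, finite-sample property that makes conformal ideas attractive for a test with Type I error control.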