Monday, Aug 7: 8:35 AM - 8:55 AM
Topic-Contributed Paper Session
Metro Toronto Convention Centre
Most general-purpose classification methods, such as support vector machines (SVM)
and random forests (RF), fail to account for an unusual characteristic of astronomical
data: known measurement uncertainties. This information is often provided alongside
the measurements but discarded because popular machine learning classifiers
cannot incorporate it. We propose a simulation-based approach that incorporates
heteroscedastic measurement error into an existing classification method to better
quantify uncertainty in classification. The proposed method first simulates perturbed
realizations of the data from a Bayesian posterior predictive distribution of a Gaussian
measurement error model. Then, a chosen classifier is fit to each simulation. The
variation across the simulations naturally reflects the uncertainty propagated from the
measurement errors in both labeled and unlabeled data sets. We demonstrate the use
of this approach via two numerical studies. The first is a thorough simulation study
applying the proposed procedure to SVM and RF, which are well-known hard and soft
classifiers, respectively. The second applies the procedure to a realistic classification problem.
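
The simulate-then-classify procedure described above can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: each perturbed realization is drawn as N(x_obs, sigma^2) per measurement, which corresponds to the posterior of the true value under a flat prior and may differ from the posterior predictive distribution actually used; a small nearest-centroid classifier stands in for SVM or RF so that the example depends only on NumPy; and the names perturb, NearestCentroid, and class_frequencies are hypothetical.

```python
import numpy as np


def perturb(x_obs, sigma, rng):
    """Draw one perturbed realization, elementwise x ~ N(x_obs, sigma^2).

    Assumption: flat prior on the true value, so the posterior is centered
    on the observation with the reported (heteroscedastic) standard error.
    """
    return rng.normal(loc=x_obs, scale=sigma)


class NearestCentroid:
    """Stand-in hard classifier; any object with fit/predict could be used."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.array(
            [X[y == c].mean(axis=0) for c in self.classes_]
        )
        return self

    def predict(self, X):
        # Squared Euclidean distance from every point to every centroid.
        diff = X[:, None, :] - self.centroids_[None, :, :]
        dist = (diff ** 2).sum(axis=2)
        return self.classes_[dist.argmin(axis=1)]


def class_frequencies(X_train, s_train, y_train, X_test, s_test,
                      n_sims=200, seed=0):
    """Fit the classifier to each simulated data set and tally predictions.

    Both the labeled (training) and unlabeled (test) measurements are
    perturbed in every simulation. Returns the class labels and, for each
    test point, the fraction of simulations assigning it to each class.
    """
    rng = np.random.default_rng(seed)
    classes = np.unique(y_train)
    counts = np.zeros((X_test.shape[0], classes.size))
    for _ in range(n_sims):
        clf = NearestCentroid().fit(perturb(X_train, s_train, rng), y_train)
        pred = clf.predict(perturb(X_test, s_test, rng))
        counts += pred[:, None] == classes[None, :]
    return classes, counts / n_sims
```

The per-point frequencies returned by class_frequencies summarize the variation across simulations: a frequency near 0 or 1 indicates a classification that is stable under the measurement errors, while intermediate values flag objects whose label is sensitive to them. For a soft classifier, the predicted probabilities from each simulation could be collected instead of hard labels.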