Modeling techniques for astrostatistical datasets

Sujit Ghosh, Chair
North Carolina State University
Jonathan Williams, Discussant
North Carolina State University
Jonathan Williams, Organizer
North Carolina State University
Monday, Aug 7: 8:30 AM - 10:20 AM
Topic-Contributed Paper Session 
Metro Toronto Convention Centre 
Room: CC-206F 



Main Sponsor

International Indian Statistical Association

Co-Sponsors

Section on Physical and Engineering Sciences
Section on Statistics and the Environment


Incorporating Measurement Error in Astronomical Object Classification

Most general-purpose classification methods, such as the support vector machine (SVM)
and random forest (RF), fail to account for an unusual characteristic of astronomical
data: known measurement uncertainties. This information is often provided with
astronomical data but discarded because popular machine learning classifiers cannot
incorporate it. We propose a simulation-based approach that incorporates
heteroscedastic measurement error into an existing classification method to better
quantify uncertainty in classification. The proposed method first simulates perturbed
realizations of the data from a Bayesian posterior predictive distribution of a Gaussian
measurement error model. A chosen classifier is then fit to each simulated data set. The
variation across the simulations naturally reflects the uncertainty propagated from the
measurement errors in both the labeled and unlabeled data sets. We demonstrate the
approach in two numerical studies. The first is a thorough simulation study applying
the proposed procedure to SVM and RF, which are well-known hard and soft classifiers,
respectively. The second applies the procedure to a realistic classification problem.


Hyungsuk Tak, Pennsylvania State University
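The perturb-and-refit loop described in this abstract can be sketched in a few lines of Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: each realization simply draws every measured value from a normal distribution centered at the observed value with its reported standard error (a simplification of the Bayesian posterior predictive step), and a hypothetical nearest-centroid classifier stands in for SVM or RF so the example needs no extra libraries. All function names and the toy data are illustrative.

```python
import numpy as np

def nearest_centroid_fit(X, y):
    """Hypothetical stand-in classifier: one centroid per class label."""
    labels = np.unique(y)
    centroids = np.array([X[y == k].mean(axis=0) for k in labels])
    return labels, centroids

def nearest_centroid_predict(model, X):
    labels, centroids = model
    # Squared Euclidean distance from each row of X to each class centroid.
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return labels[np.argmin(d2, axis=1)]

def perturbed_class_probabilities(X_lab, err_lab, y_lab, X_unl, err_unl,
                                  n_sims=200, seed=0):
    """Fraction of perturbed realizations assigning each unlabeled object to each class.

    Simplification (assumption): each realization draws X_tilde ~ N(X_obs, err^2)
    independently per entry instead of a full Bayesian posterior predictive draw.
    """
    rng = np.random.default_rng(seed)
    labels = np.unique(y_lab)
    counts = np.zeros((X_unl.shape[0], labels.size))
    for _ in range(n_sims):
        # Perturb labeled and unlabeled data, each with its own reported errors.
        Xl = X_lab + rng.normal(size=X_lab.shape) * err_lab
        Xu = X_unl + rng.normal(size=X_unl.shape) * err_unl
        # Refit the classifier to this realization and classify the unlabeled objects.
        model = nearest_centroid_fit(Xl, y_lab)
        pred = nearest_centroid_predict(model, Xu)
        counts[np.arange(Xu.shape[0]), np.searchsorted(labels, pred)] += 1
    return labels, counts / n_sims

# Usage on toy data: two well-separated classes, three unlabeled objects.
rng = np.random.default_rng(1)
X_lab = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(4, 1, (50, 2))])
y_lab = np.repeat([0, 1], 50)
err_lab = np.full(X_lab.shape, 0.3)
X_unl = np.array([[0.0, 0.0], [2.0, 2.0], [4.0, 4.0]])
err_unl = np.array([[0.1, 0.1], [1.5, 1.5], [0.1, 0.1]])
labels, probs = perturbed_class_probabilities(X_lab, err_lab, y_lab, X_unl, err_unl)
```

The per-object class frequencies play the role of classification uncertainty: an object with small errors sitting inside one cluster is assigned consistently, while an object with large errors midway between the clusters is split between classes. Any classifier with separate fit and predict steps could be dropped into the loop in place of the nearest-centroid stand-in.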

Model fitting and goodness-of-fit in astrophysics

Procedures for fitting and validating models of Poisson counting processes have been developed and implemented in software packages for high-energy physics. Because heterogeneous Poisson counting processes often involve a large number of energy bins with zero counts, traditional large-sample approximations are inappropriate in practice. Numerical solutions have been proposed in the astrophysics literature, and astronomers have long been interested in the theoretical guarantees of the procedures they adopt. We study the goodness-of-fit problem with rigorous statistical methods and illustrate the practical implications of our results with numerical studies.


Yang Chen, University of Michigan
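The abstract does not name a specific statistic or procedure. As one illustration of how the large-sample problem can be sidestepped numerically, the sketch below assumes the Cash statistic (a Poisson likelihood-ratio deviance widely used in X-ray astronomy) and calibrates it by a parametric bootstrap under a fully specified model, instead of relying on a chi-squared approximation. The function names and the toy low-count spectrum are hypothetical, and the sketch ignores parameter estimation, which a real test would have to account for.

```python
import numpy as np

def cash_statistic(counts, model):
    """Cash-type deviance 2 * sum(mu - n + n*log(n/mu)); bins with n = 0 contribute 2*mu."""
    counts = np.asarray(counts, dtype=float)
    model = np.asarray(model, dtype=float)
    # n*log(n/mu) -> 0 as n -> 0, so evaluate the log term only where n > 0.
    log_term = np.zeros_like(counts)
    pos = counts > 0
    log_term[pos] = counts[pos] * np.log(counts[pos] / model[pos])
    return 2.0 * np.sum(model - counts + log_term)

def bootstrap_gof_pvalue(counts, model, n_boot=2000, seed=0):
    """Parametric-bootstrap p-value for a fully specified Poisson model (no refitting)."""
    rng = np.random.default_rng(seed)
    model = np.asarray(model, dtype=float)
    observed = cash_statistic(counts, model)
    # Simulate replicate spectra from the hypothesized expected counts in each bin.
    sims = rng.poisson(model, size=(n_boot, model.size))
    boot = np.array([cash_statistic(s, model) for s in sims])
    # Add-one correction keeps the Monte Carlo p-value strictly positive.
    return observed, (1 + np.sum(boot >= observed)) / (n_boot + 1)

# Usage: a hypothetical low-count model in which most bins are empty.
mu = np.full(100, 0.2)
n_obs = np.random.default_rng(3).poisson(mu)
stat, p = bootstrap_gof_pvalue(n_obs, mu)
```

Because every bootstrap replicate is drawn from the same expected counts, the p-value does not depend on large-sample theory, which matters when most bins contain zero counts. If the model parameters were fitted to the data, the model would need to be refit to each replicate as well.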

Pragmatic Approaches to Modeling Multi-Dimensional Astronomical Data

I discuss examples of how we in the CHASC Astrostatistics Collaboration have analyzed several kinds of high-energy astronomical data (spatial, spatial+temporal, spatial+spectral, spectral+temporal, and spatial+spectral+temporal), with each analysis tailored to the specific astronomical problem of interest. The differences between approaches highlight which compromises were necessary to make progress and which trade-offs were needed to maximize the utility of the results. The methods include Bayesian, frequentist, computer vision, and machine learning techniques, several of them used in combination.


Vinay Kashyap, Center for Astrophysics | Harvard & Smithsonian

Model Validation under Heteroskedastic Measurement Error Models

Astronomers often deal with data in which both the covariates and the dependent variable are measured with heteroskedastic, non-Gaussian errors. While techniques have been developed for estimating regression parameters from data with heteroskedasticity and measurement errors, most methods lack procedures for model validation, such as checking structural assumptions. We develop a model validation test, based on ideas from conformal prediction, that is invariant to heteroskedasticity and measurement errors. We demonstrate empirically that this test gives finite-sample control of the Type I error probability under a variety of assumptions on the measurement errors in the observed data, whereas other prediction intervals do not. We further show how our conformal prediction approach can be used to test the structural assumptions of models proposed in the literature relating planet mass to planet radius.


Naomi Giertych, North Carolina State University
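The authors' test is not described in enough detail to reproduce, so the following is only a generic split-conformal sketch of the finite-sample ingredient that conformal methods rely on. Assumptions: a straight-line fit stands in for the regression model, only the dependent variable carries a reported error `sigma_y` (covariate errors are ignored), and the conformity score divides each absolute residual by that reported error. Function names and the simulated data are hypothetical.

```python
import numpy as np

def split_conformal_interval(x, y, sigma_y, x_new, sigma_new, alpha=0.1, seed=0):
    """Split-conformal intervals with residuals scaled by reported per-point errors.

    Assumptions (illustrative, not the authors' test): a straight-line fit via
    np.polyfit stands in for the regression model, and the conformity score is
    |y - f(x)| / sigma_y using the reported measurement error of y.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))
    fit_idx, cal_idx = idx[: len(x) // 2], idx[len(x) // 2:]
    # Fit the model on one half of the data only.
    coef = np.polyfit(x[fit_idx], y[fit_idx], deg=1)
    # Scaled absolute residuals on the held-out calibration half.
    scores = np.abs(y[cal_idx] - np.polyval(coef, x[cal_idx])) / sigma_y[cal_idx]
    n = len(cal_idx)
    # Finite-sample conformal quantile: the ceil((n+1)(1-alpha))-th smallest score,
    # or an unbounded interval if that rank exceeds the calibration size.
    k = int(np.ceil((n + 1) * (1 - alpha)))
    q = np.inf if k > n else np.sort(scores)[k - 1]
    center = np.polyval(coef, x_new)
    return center - q * sigma_new, center + q * sigma_new

# Usage: linear truth with heteroskedastic Gaussian noise (hypothetical data).
rng = np.random.default_rng(7)
x = rng.uniform(0, 10, 400)
sigma_y = rng.uniform(0.5, 2.0, 400)
y = 1.0 + 0.5 * x + rng.normal(0, sigma_y)
x_t = rng.uniform(0, 10, 1000)
s_t = rng.uniform(0.5, 2.0, 1000)
y_t = 1.0 + 0.5 * x_t + rng.normal(0, s_t)
lo, hi = split_conformal_interval(x, y, sigma_y, x_t, s_t, alpha=0.1)
coverage = np.mean((y_t >= lo) & (y_t <= hi))
```

When the calibration and test triples (x, sigma_y, y) are exchangeable, each interval covers a new response with probability at least 1 - alpha for any sample size, whether or not the straight line is the right model. Turning such intervals into a test of structural assumptions, as the abstract describes, requires additional steps that are not shown here.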