Hierarchical models for survey data

Chair

Akhil Vaish, RTI International
 
Monday, Aug 7: 10:30 AM - 12:20 PM
0046 
Contributed Papers 
Metro Toronto Convention Centre 
Room: CC-713B 

Main Sponsor

Survey Research Methods Section

Presentations

Accounting for dose uncertainty in dose-response curve estimation using hierarchical Bayes models

The relationship between the noise exposure level of a supersonic flight event and the probability that an individual is highly annoyed by the event is quantified by a dose-response curve. Dose-response curve estimation is based on logistic regression modeling, and it is subject to bias when the dose is measured with error. The National Aeronautics and Space Administration (NASA) is planning to conduct a set of supersonic aircraft annoyance surveys in select U.S. communities to measure public perception of reduced sonic booms. Some measurement error is unavoidable in these upcoming noise measurements, so an estimation approach that accounts for this error when estimating the dose-response curve is essential. In this paper, we conduct an evaluation study to assess the impact of measurement error on dose-response curve estimation. For this, hierarchical Bayes models are fit to data collected by NASA in 2018 as part of a risk-reduction study of supersonic flights affecting Galveston, Texas. The estimated dose-response curve appears sensitive to uncertainty in dose measurements and subject to attenuation bias.
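The attenuation bias mentioned above can be illustrated with a small simulation. This is not the authors' hierarchical Bayes model; it is a sketch under assumed values (true intercept -1, true slope 1.5, a standardized true dose, and independent normal measurement error with standard deviation 1) that fits an ordinary logistic regression by Newton-Raphson, once with the true dose and once with the error-prone dose. The function names are hypothetical.

```python
import math
import random


def fit_logistic(x, y, iters=25):
    """Fit P(y = 1) = 1 / (1 + exp(-(b0 + b1 * x))) by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = 0.0            # score vector
        h00 = h01 = h11 = 0.0    # information matrix X'WX
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + (X'WX)^{-1} * score
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (-h01 * g0 + h00 * g1) / det
    return b0, b1


def simulate(n=5000, b0=-1.0, b1=1.5, sigma_u=1.0, seed=1):
    """Assumed data-generating process: annoyance depends on the true dose only."""
    rng = random.Random(seed)
    x_true, x_obs, y = [], [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)                      # true dose (standardized)
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
        y.append(1 if rng.random() < p else 0)       # highly annoyed or not
        x_true.append(x)
        x_obs.append(x + rng.gauss(0.0, sigma_u))    # dose measured with error
    return x_true, x_obs, y


x_true, x_obs, y = simulate()
_, slope_true = fit_logistic(x_true, y)
_, slope_naive = fit_logistic(x_obs, y)
print(f"slope with true dose:  {slope_true:.3f}")
print(f"slope with noisy dose: {slope_naive:.3f}")
```

With these assumed values the error variance equals the dose variance, so the slope fitted to the noisy dose comes out well below the true slope; this is the bias that motivates modeling the dose error explicitly.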

Keywords

community response

measurement error

noise level

supersonic aircraft 

Co-Author

Jean Opsomer, Westat

First Author

Andreea Erciulescu, Westat

Presenting Author

Andreea Erciulescu, Westat

WITHDRAWN Small Area Estimation using Samplics, a Python Package for Survey Analysis

Survey samples are often selected from finite populations using predefined probabilistic methods. Complex sampling designs (e.g., stratification, clustering, multistage sampling) facilitate fieldwork and keep costs under control, but they result in samples with unequal selection probabilities. Sample selection, weight adjustment, and analysis must therefore account for the complexity of the sampling design. Until the development of samplics, Python users did not have a comprehensive library for designing and analyzing complex survey samples. The Python package samplics provides modules for sample size calculation, sample selection and weighting, and population parameter estimation, including small area estimation. Small area estimation methods are increasingly used to provide local estimates that inform and guide public policy decisions. In this presentation, we will describe the samplics modules dedicated to producing small area estimates, including the validation of those estimates. We will review both area-level and unit-level models under the empirical Bayes framework and motivate the discussion with applications from developing countries.
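Why unequal selection probabilities must be accounted for can be seen in a toy example. This sketch does not use the samplics API; it is plain Python with a hypothetical `hajek_mean` helper and made-up stratum sizes and values. Each sampled unit is weighted by the inverse of its inclusion probability, so units from the undersampled stratum count for more.

```python
def hajek_mean(y, pi):
    """Design-weighted mean with weights equal to inverse inclusion probabilities."""
    w = [1.0 / p for p in pi]
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)


# Hypothetical stratified sample:
#   stratum 1: N1 = 1000 units, 10 sampled (inclusion probability 0.01)
#   stratum 2: N2 = 100 units, 10 sampled (inclusion probability 0.1)
y = [8, 9, 10, 11, 12, 8, 9, 10, 11, 12] + [46, 48, 50, 52, 54, 46, 48, 50, 52, 54]
pi = [0.01] * 10 + [0.1] * 10

unweighted = sum(y) / len(y)
weighted = hajek_mean(y, pi)
print(unweighted, weighted)
```

Here the unweighted mean is 30.0, while the design-weighted mean is about 13.64, the value implied by the stratum sizes; ignoring the design overstates the mean because the small, high-valued stratum is oversampled.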

Keywords

Small area estimation

Empirical best prediction

Survey sampling

Python

Samplics 

First Author

Mamadou Diallo, Samplics LLC

Mixture Model and Its Application

Federal statistical agencies are required to produce estimates for subpopulations with small sample sizes. Traditional design-based estimates that use only the sample data from each subpopulation are unreliable. For over forty years, the Fay-Herriot model has been widely used to produce reliable small area statistics. This model predicts the small area quantity of interest through a linear regression on auxiliary variables. In the Fay-Herriot model, the model errors (area random effects) are treated as independent, normally distributed, zero-mean random variables with an unknown variance. The model is sensitive to outliers because outliers can lead to overestimation of the model variance. In this talk, we propose a new robust approach to estimating small area parameters. Robustness is achieved by replacing the standard normality assumption for the model errors with a mixture of two normal distributions with different variances, which makes the mixture model less sensitive to outliers. Finally, we compare estimates from the proposed mixture model with existing alternative methods using data from the Cash Rents Survey conducted by the National Agricultural Statistics Service (NASS).
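The abstract gives no formulas, so the following is a sketch in assumed notation, not the talk's method. Write the Fay-Herriot model for area i as θ̂_i = x_i'β + v_i + e_i, where θ̂_i is the direct estimate and e_i has known sampling variance ψ_i. With v_i ~ N(0, σ_v²), the best predictor shrinks θ̂_i toward the synthetic value x_i'β with weight γ_i = σ_v²/(σ_v² + ψ_i). Assuming the two-component mixture π N(0, σ_1²) + (1 - π) N(0, σ_2²) is placed on v_i, the best predictor for known parameters averages the two component-wise shrinkage predictors, weighted by each component's posterior probability given θ̂_i. The function names are hypothetical, and estimating β, π, and the variances (which the talk addresses) is omitted.

```python
import math


def normal_pdf(z, var):
    """Density of N(0, var) evaluated at z."""
    return math.exp(-0.5 * z * z / var) / math.sqrt(2.0 * math.pi * var)


def fh_predictor(theta_hat, synth, psi, sigma2_v):
    """Fay-Herriot best predictor: shrink the direct estimate toward x'beta."""
    gamma = sigma2_v / (sigma2_v + psi)
    return gamma * theta_hat + (1.0 - gamma) * synth


def mixture_predictor(theta_hat, synth, psi, pi1, sigma2_1, sigma2_2):
    """Best predictor when v_i ~ pi1*N(0, sigma2_1) + (1 - pi1)*N(0, sigma2_2).

    All parameters are treated as known (an assumption of this sketch).
    """
    resid = theta_hat - synth
    # Marginally, theta_hat - x'beta ~ N(0, sigma2_k + psi) within component k
    d1 = pi1 * normal_pdf(resid, sigma2_1 + psi)
    d2 = (1.0 - pi1) * normal_pdf(resid, sigma2_2 + psi)
    w1 = d1 / (d1 + d2)  # posterior probability of the small-variance component
    return (w1 * fh_predictor(theta_hat, synth, psi, sigma2_1)
            + (1.0 - w1) * fh_predictor(theta_hat, synth, psi, sigma2_2))


# Hypothetical values: synthetic estimate 100, sampling variance 25
typical = mixture_predictor(102.0, 100.0, 25.0, 0.9, 4.0, 400.0)
outlier = mixture_predictor(160.0, 100.0, 25.0, 0.9, 4.0, 400.0)
print(typical, outlier)
```

An outlying direct estimate is assigned almost entirely to the large-variance component and is shrunk far less than it would be under a single normal with the small variance, while typical areas are still shrunk strongly.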

Keywords

Mixture model

Small area estimation

Fay-Herriot model

Cash Rents Survey 

Co-Author(s)

Lu Chen, NISS
Gauri Datta, University of Georgia

First Author

Yang Cheng, National Agricultural Statistics Service

Presenting Author

Yang Cheng, National Agricultural Statistics Service

Comparison of Measurement Error Models with Differentially Private Covariates

The Fay-Herriot model is a popular approach for estimating small area means using a linear mixed effects model. It is used in applications such as the U.S. Census Bureau's Small Area Income and Poverty Estimates (SAIPE) program to produce poverty estimates for subpopulations such as counties. These models account for the uncertainty due to sampling error; however, privacy protection applied to data sources used as covariates can introduce additional noise. One such method of protecting respondent privacy is differential privacy, which the U.S. Census Bureau adopted beginning with the 2020 Decennial Census. In this work, we compare small area measurement error models for covariates that have been treated with differential privacy. We focus on the case of discrete noise added to count data at various levels.
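The abstract does not name the noise mechanism or the specific measurement error models compared, so the following is an illustration under stated assumptions. It assumes the two-sided geometric mechanism for integer noise on counts, P(k) ∝ exp(-ε|k|), whose variance 2a/(1 - a)², with a = exp(-ε), is known in advance. A known noise variance c of this kind can enter a measurement error version of the Fay-Herriot model such as that of Ybarra and Lohr (2008), where, for a single covariate with slope β, the shrinkage weight becomes (σ_v² + β²c)/(σ_v² + β²c + ψ_i) instead of σ_v²/(σ_v² + ψ_i). All function names and parameter values are hypothetical.

```python
import math
import random


def geometric_noise_variance(epsilon):
    """Variance of two-sided geometric noise with P(k) proportional to exp(-epsilon*|k|)."""
    a = math.exp(-epsilon)
    return 2.0 * a / (1.0 - a) ** 2


def two_sided_geometric(epsilon, rng):
    """Draw integer noise as the difference of two iid geometric variables."""
    a = math.exp(-epsilon)

    def geom():
        # Inverse-CDF draw from P(k) = (1 - a) * a**k, k = 0, 1, 2, ...
        return int(math.floor(math.log(1.0 - rng.random()) / math.log(a)))

    return geom() - geom()


def gamma_naive(sigma2_v, psi):
    """Standard Fay-Herriot shrinkage weight (ignores covariate noise)."""
    return sigma2_v / (sigma2_v + psi)


def gamma_ybarra_lohr(sigma2_v, psi, beta, c):
    """Shrinkage weight when one covariate carries known error variance c."""
    extra = beta * beta * c
    return (sigma2_v + extra) / (sigma2_v + extra + psi)


rng = random.Random(7)
draws = [two_sided_geometric(1.0, rng) for _ in range(100000)]
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / (len(draws) - 1)
c = geometric_noise_variance(1.0)
print(f"empirical variance {var:.3f}, theoretical {c:.3f}")
print(gamma_naive(4.0, 25.0), gamma_ybarra_lohr(4.0, 25.0, 0.5, c))
```

Noisy covariates make the synthetic part of the predictor less reliable, so the measurement error weight puts more weight on the direct estimate than the naive weight does.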

Keywords

Small Area Estimation

Fay Herriot

Differential Privacy 

First Author

Kyle Irimata

Presenting Author

Kyle Irimata

WITHDRAWN Data Integration For Small Areas using Dirichlet Process

We consider the problem of integrating a probability sample and a nonprobability sample. Survey weights are available for the probability sample but not for the nonprobability sample. We use a matching method to calculate weights for the nonprobability sample and then use the Fay-Herriot model to predict the finite population mean for each area. Bayesian predictive inference for small area estimation is studied by embedding a parametric model in a nonparametric model. The Dirichlet process has attractive properties, such as clustering, that permit borrowing of information. For sampling, we use a stick-breaking algorithm.
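The stick-breaking construction mentioned above represents a Dirichlet process draw G as a mixture of point masses with weights w_j = V_j ∏_{l<j}(1 - V_l), where V_j ~ Beta(1, α). The sketch below is not the authors' sampler; it assumes a truncation at K sticks, a concentration parameter α, and a standard normal base measure, all illustrative choices, and the function names are hypothetical. Because G is discrete, repeated draws from it share values, which is the clustering property that lets areas borrow information.

```python
import random


def stick_breaking_weights(alpha, k, rng):
    """Truncated stick-breaking: w_j = V_j * prod_{l<j} (1 - V_l), V_j ~ Beta(1, alpha)."""
    weights, remaining = [], 1.0
    for _ in range(k - 1):
        v = rng.betavariate(1.0, alpha)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)  # last stick takes what is left so weights sum to 1
    return weights


def draw_truncated_dp(alpha, k, rng, base=lambda r: r.gauss(0.0, 1.0)):
    """Approximate draw G = sum_j w_j * delta(atom_j), atoms from the base measure."""
    weights = stick_breaking_weights(alpha, k, rng)
    atoms = [base(rng) for _ in range(k)]
    return weights, atoms


rng = random.Random(3)
w, atoms = draw_truncated_dp(alpha=1.0, k=50, rng=rng)
draws = rng.choices(atoms, weights=w, k=50)  # 50 draws from the discrete G
print(len(set(draws)), "distinct values among 50 draws")
```

Larger values of α spread the weight over more sticks and produce fewer ties; the truncation level K should be large enough that the leftover stick is negligible.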

Keywords

Big data

Covariate

Finite population mean

Dirichlet process

Small area

Selection bias 

Co-Author

Balgobin Nandram, Worcester Polytechnic Institute

First Author

Yang Liu, Worcester Polytechnic Institute

Comparison of Small Area Procedures based on Gamma Distributions Extending to Informative Sampling

The gamma distribution is a useful model for small area prediction of a skewed response variable, and we study its use for this purpose. We emphasize a model, called the gamma-gamma model, in which the area random effects have gamma distributions, and we compare it to a generalized linear mixed model. These two models were proposed independently in the literature but have not yet been formally compared. We evaluate the properties of two mean square error estimators for the gamma-gamma model, both of which incorporate corrections for the bias of the estimator of the leading term. Finally, we extend the gamma-gamma model to informative sampling. We conduct thorough simulation studies to assess the properties of the alternative predictors and apply the proposed methods to data from an agricultural survey.
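The abstract does not give the gamma-gamma specification. One conjugate parameterization, assumed here purely for illustration, is y_ij | λ_i ~ Gamma(shape a, rate λ_i) for the n_i sampled units in area i, with area effect λ_i ~ Gamma(shape b, rate c). Then λ_i | y ~ Gamma(b + n_i a, c + Σ_j y_ij), and the posterior mean of the area mean a/λ_i is a(c + Σ_j y_ij)/(b + n_i a - 1), a weighted average of the prior mean ac/(b - 1) and the sample mean. The sketch treats a, b, and c as known; in practice they would be estimated, and the MSE estimators and informative sampling extension described above are not covered. The function name is hypothetical.

```python
def gamma_gamma_predictor(y, a, b, c):
    """Posterior mean of the area mean a / lambda_i under the assumed conjugate model.

    y: responses observed in one small area
    a: known gamma shape of the responses
    b, c: known shape and rate of the gamma prior on the area effect lambda_i
    """
    n = len(y)
    shape = b + n * a  # posterior shape of lambda_i
    if shape <= 1.0:
        raise ValueError("posterior shape must exceed 1 for E[1/lambda] to exist")
    # E[1/lambda] = rate / (shape - 1) for lambda ~ Gamma(shape, rate)
    return a * (c + sum(y)) / (shape - 1.0)


print(gamma_gamma_predictor([10.0], 2.0, 3.0, 4.0))  # 7.0
```

With one observation of 10 and a prior mean of 4, the prior and data receive equal weight here, giving 7.0; as the area sample grows, the predictor moves toward the sample mean.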

Keywords

Small area

Informative sampling

Agricultural survey

Parametric bootstrap

Gamma distribution 

Co-Author

Emily Berg

First Author

Yanghyeon Cho

Presenting Author

Yanghyeon Cho