Monday, Aug 7: 10:30 AM - 12:20 PM

0046

Contributed Papers

Metro Toronto Convention Centre

Room: CC-713B

Survey Research Methods Section

The relationship between the noise exposure level of a supersonic flight event and the probability that an individual is highly annoyed by the event is quantified by a dose-response curve. Dose-response curve estimation is based on logistic regression modeling, and it is subject to bias when the dose is measured with error. The National Aeronautics and Space Administration (NASA) is planning to conduct a set of supersonic aircraft annoyance surveys in select U.S. communities to measure public perception of reduced sonic booms. Some estimation error is unavoidable in these upcoming noise measurements, so an estimation approach that accounts for this error when estimating the dose-response curve is imperative. In this paper, we conduct an evaluation study to assess the impact of measurement error on dose-response curve estimation. To do so, we fit hierarchical Bayes models to data collected by NASA in 2018 as part of a risk-reduction study of supersonic flights affecting Galveston, Texas. The estimated dose-response curve appears sensitive to uncertainty in the dose measurements and is subject to attenuation bias.

community response

measurement error

noise level

supersonic aircraft
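
To illustrate the attenuation bias described above, the following Python sketch simulates "highly annoyed" responses from a logistic dose-response curve, adds classical measurement error to the dose, and refits the curve by Newton-Raphson. The standardized dose scale, the true coefficients, and the error standard deviation are hypothetical values chosen for illustration. This is a simple frequentist toy example, not the hierarchical Bayes models fit in the study.

```python
import math
import random


def fit_logistic(x, y, iters=25):
    """Fit P(y = 1 | x) = 1 / (1 + exp(-(b0 + b1 * x))) by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = 0.0          # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0  # Fisher information matrix entries
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: (b0, b1) += I^{-1} g, using the closed-form 2x2 inverse
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def simulate(n=5000, beta0=-1.0, beta1=1.5, error_sd=1.0, seed=1):
    """Simulate true doses, error-prone doses, and binary 'highly annoyed' outcomes."""
    rng = random.Random(seed)
    true_dose, observed_dose, annoyed = [], [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)            # hypothetical standardized true dose
        w = x + rng.gauss(0.0, error_sd)   # dose recorded with classical error
        p = 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))
        true_dose.append(x)
        observed_dose.append(w)
        annoyed.append(1 if rng.random() < p else 0)
    return true_dose, observed_dose, annoyed


if __name__ == "__main__":
    x, w, y = simulate()
    _, slope_true = fit_logistic(x, y)
    _, slope_noisy = fit_logistic(w, y)
    print(f"slope fitted to true dose:  {slope_true:.2f}")
    print(f"slope fitted to noisy dose: {slope_noisy:.2f}")
```

With an error standard deviation equal to the standard deviation of the true dose, the slope fitted to the noisy dose comes out noticeably smaller than the slope fitted to the true dose, so the estimated curve is flattened toward no dose effect.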

Survey samples are often selected from finite populations using predefined probabilistic methods. Complex sampling designs (e.g., stratification, clustering, and multistage sampling) facilitate fieldwork and keep costs under control, but they produce samples with unequal selection probabilities. Sample selection, weight adjustment, and sample analysis must therefore account for the complexity of the sampling design. Until the development of samplics, Python users did not have a comprehensive library for designing and analyzing complex survey samples. The Python package samplics provides modules for sample size calculation, sample selection and weighting, and population parameter estimation, including small area estimation (SAE). SAE methods are increasingly used to provide local estimates that inform and guide public policy decisions. In this presentation, we present the samplics modules dedicated to producing small area estimates, including the validation of those estimates. We review both area-level and unit-level models under the empirical Bayes framework, and we motivate the discussion with applications from developing countries.

Small area estimation

Empirical best prediction

Survey sampling

Python

Samplics
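
Why unequal selection probabilities matter for analysis can be seen in the Horvitz-Thompson estimator of a total and the Hájek estimator of a mean, both of which weight each unit by the inverse of its inclusion probability. The sketch below is plain Python with hypothetical data. It is not the samplics API and only illustrates the kind of weighting such a library handles.

```python
def horvitz_thompson_total(y, pi):
    """Estimate a population total; each sampled unit stands for 1 / pi units."""
    return sum(yi / p for yi, p in zip(y, pi))


def hajek_mean(y, pi):
    """Estimate a population mean as a ratio of weighted sums."""
    weights = [1.0 / p for p in pi]  # design weights = inverse inclusion probabilities
    return sum(wt * yi for wt, yi in zip(weights, y)) / sum(weights)


if __name__ == "__main__":
    y = [10.0, 20.0, 30.0]   # hypothetical sampled values
    pi = [0.5, 0.25, 0.1]    # hypothetical inclusion probabilities
    print(horvitz_thompson_total(y, pi))  # 400.0
    print(hajek_mean(y, pi))              # 25.0
    print(sum(y) / len(y))                # unweighted mean: 20.0
```

The unit with value 30 had the smallest chance of selection, so it receives the largest weight and pulls the weighted mean above the unweighted one.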

Federal statistical agencies are required to produce estimates for subpopulations with small sample sizes. Traditional design-based estimates that use only the sample data from each subpopulation are unreliable in this setting. For over forty years, the Fay-Herriot model has been widely used to produce reliable small area statistics. The model predicts the quantity of interest for each small area using a linear regression on auxiliary variables. The random area effects in the Fay-Herriot model are treated as independent, normally distributed, zero-mean random variables with an unknown variance. Under this assumption, the model is sensitive to outliers, because outliers can lead to overestimation of the model variance. In this talk, we propose a new robust approach to small area estimation. Robustness is achieved by replacing the standard normality assumption for the model errors with a mixture of two normal distributions with different variances, which makes the model less sensitive to outliers. Finally, we compare estimates from the proposed mixture model with those from existing alternative methods using data from the Cash Rents Survey conducted by the National Agricultural Statistics Service (NASS).

Mixture model

Small area estimation

Fay-Herriot model

Cash Rents Survey
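
As a baseline for the robust approach, the following sketch computes the standard normal-theory Fay-Herriot empirical best linear unbiased predictor (EBLUP). To keep it short, it assumes an intercept-only model instead of a regression on auxiliary variables and uses the Prasad-Rao moment estimator of the model variance; both are simplifying choices for illustration. It does not implement the proposed two-component normal mixture, and the example data are hypothetical.

```python
def fay_herriot_eblup(theta_hat, psi):
    """Intercept-only Fay-Herriot EBLUP (illustrative sketch).

    theta_hat: direct design-based estimates, one per area
    psi: known, strictly positive sampling variances of those estimates
    Returns (EBLUPs, estimated model variance).
    """
    m = len(theta_hat)
    mean = sum(theta_hat) / m
    # Prasad-Rao moment estimator of the model variance, truncated at zero;
    # with an intercept-only model each leverage h_ii equals 1 / m.
    ss = sum((t - mean) ** 2 for t in theta_hat)
    sigma2_u = max(0.0, (ss - (1.0 - 1.0 / m) * sum(psi)) / (m - 1))
    # Weighted least squares estimate of the common mean given sigma2_u
    w = [1.0 / (sigma2_u + p) for p in psi]
    beta = sum(wi * t for wi, t in zip(w, theta_hat)) / sum(w)
    # Shrink each direct estimate toward beta by gamma_i = sigma2_u / (sigma2_u + psi_i)
    eblup = []
    for t, p in zip(theta_hat, psi):
        gamma = sigma2_u / (sigma2_u + p)
        eblup.append(gamma * t + (1.0 - gamma) * beta)
    return eblup, sigma2_u


if __name__ == "__main__":
    psi = [4.0, 4.0, 4.0, 4.0]
    clean, s2_clean = fay_herriot_eblup([10.0, 12.0, 8.0], psi[:3])
    with_outlier, s2_out = fay_herriot_eblup([10.0, 12.0, 8.0, 30.0], psi)
    print(s2_clean, [round(e, 2) for e in clean])
    print(round(s2_out, 2), [round(e, 2) for e in with_outlier])
```

For the three clean areas, the estimated model variance is essentially zero and every area is shrunk to the common mean. Adding a single outlying area inflates the estimated model variance to about 98.7, so almost no shrinkage is applied to any area. This is the sensitivity to outliers that a heavier-tailed mixture for the model errors is meant to reduce.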

The Fay-Herriot model is a popular approach for estimating small area means using a linear mixed effects model. It is used in applications such as the U.S. Census Bureau's Small Area Income and Poverty Estimates (SAIPE) program, which produces poverty estimates for various subpopulations, for example at the county level. These models account for the uncertainty due to sampling error; however, privacy protection applied to the data sources used as covariates can introduce additional noise. One such method of protecting respondent privacy is differential privacy, which the U.S. Census Bureau adopted beginning with the 2020 Decennial Census. In this work, we compare small area measurement error models for covariates that have been treated with differential privacy. We focus on the case of discrete noise added to count data at various levels.

Small Area Estimation

Fay-Herriot

Differential Privacy
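
In the measurement error setting described above, a covariate used in the Fay-Herriot model is released as a noisy count rather than its true value. The sketch below adds integer noise from the two-sided geometric mechanism, one standard discrete mechanism for counting queries. It is assumed here for illustration and is not necessarily the mechanism studied in this work or the one used by the Census Bureau. The counts and the privacy-loss parameter epsilon are hypothetical, and each count is assumed to have sensitivity 1.

```python
import math
import random


def two_sided_geometric(epsilon, rng):
    """Integer noise Z with P(Z = z) proportional to exp(-epsilon * |z|)."""
    alpha = math.exp(-epsilon)

    def geometric():
        # Failures before the first success, success probability 1 - alpha,
        # drawn by inverse transform: P(G >= k) = alpha ** k.
        u = 1.0 - rng.random()  # u lies in (0, 1], so log(u) is defined
        return math.floor(math.log(u) / math.log(alpha))

    # The difference of two i.i.d. geometric draws is two-sided geometric.
    return geometric() - geometric()


def privatize_counts(counts, epsilon, seed=0):
    """Add independent two-sided geometric noise to each count (sensitivity 1)."""
    rng = random.Random(seed)
    return [c + two_sided_geometric(epsilon, rng) for c in counts]


if __name__ == "__main__":
    true_counts = [120, 45, 3, 0, 870]  # hypothetical county-level counts
    print(privatize_counts(true_counts, epsilon=0.5))
```

The noise has mean zero and variance 2α/(1 - α)^2 with α = exp(-ε), so a smaller ε gives stronger privacy and noisier covariates. Noisy counts can also be negative, which a downstream small area model has to accommodate.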

We consider the problem of integrating a probability sample with a nonprobability sample. Survey weights are available for the probability sample but not for the nonprobability sample. We use a matching method to calculate weights for the nonprobability sample and then use the Fay-Herriot model to predict the finite population mean for each area. We study Bayesian predictive inference for small area estimation by embedding a parametric model in a nonparametric model. The Dirichlet process has attractive properties, such as clustering, that permit borrowing of information. For sampling, we use a stick-breaking algorithm.

Big data

Covariate

Finite population mean

Dirichlet process

Small area

Selection bias
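
The stick-breaking construction mentioned above represents a Dirichlet process draw as a discrete distribution whose weights come from repeatedly breaking off Beta(1, α) fractions of a unit-length stick. The sketch below truncates the construction at a finite number of atoms and uses a standard normal base measure; both are assumptions for illustration, and it is not the authors' full sampler for the small area model.

```python
import random


def stick_breaking_weights(alpha, n_atoms, rng):
    """Truncated stick-breaking weights for a Dirichlet process with concentration alpha."""
    weights = []
    remaining = 1.0  # length of the stick not yet broken off
    for _ in range(n_atoms - 1):
        v = rng.betavariate(1.0, alpha)  # fraction of the remaining stick
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)  # last atom absorbs the leftover mass (truncation)
    return weights


def sample_truncated_dp(alpha, n_atoms, n_draws, seed=0):
    """Draw n_draws values from one realization of a truncated DP(alpha, N(0, 1))."""
    rng = random.Random(seed)
    weights = stick_breaking_weights(alpha, n_atoms, rng)
    atoms = [rng.gauss(0.0, 1.0) for _ in range(n_atoms)]  # draws from the base measure
    return rng.choices(atoms, weights=weights, k=n_draws), weights


if __name__ == "__main__":
    draws, weights = sample_truncated_dp(alpha=1.0, n_atoms=50, n_draws=20)
    # Repeated values in `draws` are the clustering that lets units share information.
    print(len(set(draws)), "distinct values among", len(draws), "draws")
```

A small concentration α puts most of the mass on a few atoms, producing many ties among the draws; a large α spreads the mass and yields more distinct values.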

The gamma distribution is a useful model for small area prediction of a skewed response variable, and we study its use for this purpose. We emphasize a model, called the gamma-gamma model, in which the area random effects have gamma distributions, and we compare it with a generalized linear mixed model. Each of these two models has been proposed independently in the literature, but the two have not yet been formally compared. We evaluate the properties of two mean square error estimators for the gamma-gamma model, both of which incorporate corrections for the bias of the estimator of the leading term. We also extend the gamma-gamma model to informative sampling. We conduct thorough simulation studies to assess the properties of the alternative predictors and apply the proposed methods to data from an agricultural survey.

Small area

Informative sampling

Agricultural survey

Parametric bootstrap

Gamma distribution
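
The abstract does not state the parameterization of the gamma-gamma model, so the sketch below assumes one conjugate version: gamma responses whose rate parameter is an area-level gamma random effect. Under that assumption, the Bayes predictor of the area mean has a closed form. The hyperparameters and data are hypothetical, and this is not presented as the authors' specification.

```python
def gamma_gamma_predictor(y, k, a, b):
    """Posterior mean of the area mean k / lambda_i under an assumed conjugate model.

    Assumed model (hypothetical parameterization):
        y_ij | lambda_i ~ Gamma(shape=k, rate=lambda_i)
        lambda_i        ~ Gamma(shape=a, rate=b)
    with k, a, b treated as known.
    """
    n = len(y)
    post_shape = a + n * k   # posterior shape of lambda_i
    post_rate = b + sum(y)   # posterior rate of lambda_i
    if post_shape <= 1.0:
        raise ValueError("E[1 / lambda_i | y] requires posterior shape > 1")
    # If lambda ~ Gamma(shape s, rate r), then E[1 / lambda] = r / (s - 1).
    return k * post_rate / (post_shape - 1.0)


if __name__ == "__main__":
    k, a, b = 2.0, 3.0, 4.0
    y = [10.0, 14.0]  # hypothetical skewed responses from one area
    print(gamma_gamma_predictor([], k, a, b))  # prior mean k * b / (a - 1) = 4.0
    print(gamma_gamma_predictor(y, k, a, b))   # between 4.0 and the sample mean 12.0
```

Under this assumed model, the predictor is a weighted average with weight nk/(a - 1 + nk) on the sample mean and the remaining weight on the prior mean kb/(a - 1), so areas with few observations are shrunk more. In an empirical Bayes version, k, a, and b would be estimated from all areas; that step, the mean square error estimators, and the informative-sampling extension are not shown.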