Weighting adjustments

Chair

Evrim Oral, LSUHSC School of Public Health
 
Wednesday, Aug 9: 2:00 PM - 3:50 PM
0141 
Contributed Papers 
Metro Toronto Convention Centre 
Room: CC-717B 

Main Sponsor

Survey Research Methods Section

Presentations

Introducing Educational Attainment to the Poststratification Adjustment in the NSDUH

In Quarter 4 of 2020, multimode (web and in-person) data collection was introduced in the National Survey on Drug Use and Health (NSDUH). NSDUH provides national estimates of substance use and mental health among the civilian, noninstitutionalized population aged 12 or older in the United States. Adult web respondents had higher levels of educational attainment than adult in-person respondents, and educational attainment is correlated with survey outcomes in NSDUH. To correct the imbalance in the education distributions across survey modes, educational attainment was added to the poststratification adjustment in the 2020 and 2021 NSDUH data. Proportions of educational attainment calculated from 1-year American Community Survey (ACS) data were used to derive control totals for the main effect and two-way interactions of educational attainment by demographic variables and state. Two methods for calculating educational attainment proportions, marginal distribution and cell distribution, were compared for accuracy across domains and summation of subdomains. The impact of excluding the institutionalized and active-duty population from ACS data was also investigated and is discussed. 
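The core of a poststratification adjustment of this kind can be sketched in miniature: within each adjustment cell, weights are scaled so that their weighted sum matches an external control total. The cells, weights, and totals below are illustrative, not NSDUH or ACS figures.

```python
# Illustrative poststratification sketch: scale weights within each cell so
# that the weighted total equals the cell's control total. Cell labels,
# weights, and control totals are made up for illustration.

def poststratify(weights, cells, control_totals):
    """Scale weights so each cell's weighted sum equals its control total."""
    # current weighted total per cell
    cell_sums = {}
    for w, c in zip(weights, cells):
        cell_sums[c] = cell_sums.get(c, 0.0) + w
    # adjustment factor = control total / current weighted total
    factors = {c: control_totals[c] / s for c, s in cell_sums.items()}
    return [w * factors[c] for w, c in zip(weights, cells)]

weights = [100.0, 120.0, 80.0, 150.0]
cells = ["HS or less", "HS or less", "Some college+", "Some college+"]
controls = {"HS or less": 300.0, "Some college+": 200.0}
adjusted = poststratify(weights, cells, controls)
# weighted totals per cell now equal the control totals exactly
```

In practice the cells would be formed from interactions of education with demographic variables and state, as the abstract describes.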

Keywords

NSDUH

Weighting

Poststratification Adjustment

Educational Attainment 

Co-Author(s)

P. Mae Cooper, SAMHSA
Rong Cai, SAMHSA
Jennifer Hoenig, SAMHSA
Kathryn Spagnola, RTI International
Patrick Chen, RTI International
Lanting Dai, RTI International

First Author

Devon Cribb, RTI International

Presenting Author

Devon Cribb, RTI International

On calibration to estimated totals in survey sampling

Calibration of totals estimated from one survey to totals estimated from another survey is used in survey practice for consistency between these estimates and for reducing survey errors, e.g., undercoverage and nonresponse. We discuss issues of estimation efficiency of the resulting regression estimates for variables used in calibration and particularly for the rest of the survey variables, as well as practical problems with variance estimation. We point out that a calibration procedure that is statistically and operationally more efficient is possible when micro-data from the other survey are available. In this procedure, the combined survey data are calibrated simultaneously, so that estimated totals for common variables in the two surveys are calibrated to each other. We show that the improved efficiency of the regression estimates generated by this calibration procedure is due to the fact that the regression coefficients are approximately variance-minimizing coefficients incorporating data from the two surveys. We also indicate that computations and variance estimation are greatly facilitated. An empirical study confirming the merits of the proposed calibration is also presented. 
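As a minimal illustration of calibration to a known total (not the authors' combined-survey procedure), the linear (GREG-type) adjustment with a single auxiliary variable has a closed form: design weights d_i become w_i = d_i(1 + λx_i), with λ chosen so the calibrated total hits the benchmark. The data below are made up.

```python
# Minimal linear calibration sketch with one auxiliary variable x:
# adjust design weights d_i to w_i = d_i * (1 + lam * x_i) so that
# sum(w_i * x_i) equals a known total T. Values are illustrative.

def calibrate_linear(d, x, total):
    """Linear (GREG-type) calibration to a single known total."""
    t_hat = sum(di * xi for di, xi in zip(d, x))       # current estimate of T
    denom = sum(di * xi * xi for di, xi in zip(d, x))  # sum of d_i * x_i^2
    lam = (total - t_hat) / denom
    return [di * (1.0 + lam * xi) for di, xi in zip(d, x)]

d = [2.0, 3.0, 5.0]   # design weights
x = [1.0, 2.0, 4.0]   # auxiliary variable
w = calibrate_linear(d, x, total=30.0)
# the calibrated weighted total of x now equals 30 exactly
```

With several auxiliary variables the same idea generalizes by solving a small linear system for the vector λ.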

Keywords

calibration estimator

regression estimator

survey data combination

survey errors

aligned estimates 

First Author

Takis Merkouris, Athens University of Economics & Business

Presenting Author

Takis Merkouris, Athens University of Economics & Business

2020 Decennial Census Effect on the NSDUH Substance Use and Mental Health Estimates

The National Survey on Drug Use and Health (NSDUH) provides national estimates of substance use and mental health among the civilian, noninstitutionalized population aged 12 or older in the United States. NSDUH person-level analysis weights are calibrated to estimated totals of the target population to reduce coverage bias and variance of survey estimates. The U.S. Census Bureau produces estimates annually, using current data on births, deaths, and migration to adjust the most recent decennial census data. These postcensal estimates become less accurate with each passing year. The person-level analysis weights used in the 2020 NSDUH estimates were calibrated to the 2020 postcensal population estimates based on the 2010 decennial census. Beginning with the 2021 NSDUH, the person-level analysis weights were calibrated to population estimates based on the 2020 decennial census. The 2020 decennial census effect study examined whether, and to what extent, 2020 NSDUH estimates would have differed if person-level weights were calibrated to the 2020 decennial census data instead of the postcensal population estimates anchored to the 2010 decennial census. 

Keywords

2020 population

census effect

NSDUH

weighting

substance use

mental health 

Co-Author(s)

Neeraja Sathe, RTI International
Patrick Chen, RTI International
Devon Cribb, RTI International
Jennifer Hoenig, SAMHSA
Jingsheng Yan, SAMHSA
Rong Cai, SAMHSA

First Author

Kathryn Spagnola, RTI International

Presenting Author

Kathryn Spagnola, RTI International

Consistency of survey estimates through adjusted integer weights

Calibrating survey weights can improve estimates by enforcing consistency constraints derived from known external benchmarks, i.e., the weighted sums of certain variables should equal their known population totals. Although robust, popular calibration methods often fail when multiple benchmarks must be satisfied simultaneously. In this paper, new extensions of the integer calibration method adopted by NASS for its 2017 US Census of Agriculture are discussed. New constraint relaxation techniques and new "distances" between calibrated and design weights are introduced for big-data applications in which millions of records are processed by simultaneously benchmarking several thousand variables. The same algorithm can also be adopted to produce fractional adjusted weights with simple arithmetic operations. The consistency of the estimator and the accuracy of the results are investigated through a simulation study using the R package inca. An application using NASS Farm Labor Survey data illustrates how extreme weights are redistributed among other records while maintaining benchmarking relationships. 
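One ingredient of integer calibration can be sketched in isolation: rounding fractional calibrated weights to integers while exactly preserving a single benchmark total. The largest-remainder rounding below is illustrative only; the inca package handles many benchmarks simultaneously, which this one-constraint sketch does not attempt.

```python
# Illustrative largest-remainder rounding: convert fractional weights to
# integers while preserving their (integer) sum, a single benchmark.
import math

def round_preserving_sum(weights):
    floors = [math.floor(w) for w in weights]
    deficit = round(sum(weights)) - sum(floors)
    # hand the remaining units to the weights with the largest fractional parts
    order = sorted(range(len(weights)),
                   key=lambda i: weights[i] - floors[i],
                   reverse=True)
    result = floors[:]
    for i in order[:deficit]:
        result[i] += 1
    return result

rounded = round_preserving_sum([1.4, 2.7, 3.9])
# fractional sum is 8.0; the integer weights also sum to 8
```

Satisfying several thousand benchmarks at once requires integer programming rather than this simple rule, which is the setting the paper addresses.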

Keywords

Calibration

Weighting

Survey

Consistency

Estimation

Integer programming 

Co-Author

Lu Chen, NISS

First Author

Luca Sartore, National Institute of Statistical Sciences

Presenting Author

Luca Sartore, National Institute of Statistical Sciences

Propensity score weighting in survey data with survival outcomes

Survival is one of the main outcomes in oncology research. Propensity score (PS) methods, such as propensity score weighting (PSW), are widely adopted to estimate treatment effects on survival outcomes in observational studies. However, there is a dearth of guidance on how to use PSW methods in the context of population-based surveys with survival outcomes. In this paper, we describe how PSW can be integrated into survey data analysis to estimate absolute and relative treatment effects on survival outcomes at the population level. Depending on how the survey weights are incorporated, PSW analysis can be performed using three methods: the no-weight, single-weight, and double-weight methods. We then conducted an extensive series of Monte Carlo simulations to compare the performance of these methods in terms of mean absolute bias and coverage probability, considering different scenarios by varying the treatment effect, treatment prevalence, and censoring rate. We found that both weighted methods outperformed the no-weight method under all scenarios, while the performance of the single-weight and double-weight methods varied depending on the time point at which absolute treatment effects are measured. 
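The double-weight idea can be illustrated in miniature: the survey weight is multiplied by an inverse-probability-of-treatment (ATE) weight derived from the propensity score, and the product serves as the analysis weight. The scores and weights below are made up; in practice the propensity scores would be estimated from covariates.

```python
# Illustrative "double-weight" combination: survey weight times the ATE
# inverse-probability-of-treatment weight. Propensity scores are taken as
# given here; all numbers are made up for illustration.

def double_weights(survey_w, treated, ps):
    """Combine survey weights with ATE inverse-probability weights."""
    out = []
    for w, z, e in zip(survey_w, treated, ps):
        iptw = 1.0 / e if z == 1 else 1.0 / (1.0 - e)  # ATE weight
        out.append(w * iptw)
    return out

sw = [10.0, 20.0, 15.0]   # survey weights
z = [1, 0, 1]             # treatment indicators
ps = [0.5, 0.25, 0.75]    # estimated propensity scores
w = double_weights(sw, z, ps)
```

The single-weight variant would use the survey weights only when estimating the propensity scores, and the no-weight variant would ignore them entirely; the abstract compares all three.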

Keywords

Survey data

propensity score weighting

provider-patient discussion 

Co-Author(s)

Lihua Li, Icahn School of Medicine At Mount Sinai
Chen Yang, Icahn School of Medicine at Mount Sinai

First Author

Wei Zhang, University of Arkansas At Little Rock

Presenting Author

Wei Zhang, University of Arkansas At Little Rock

Comparison of Different Algorithms on Propensity Score Weighting with Survey Data: A Simulation Study

Propensity score (PS) weighting is frequently used to estimate the average treatment effect (ATE) and the average treatment effect on the treated (ATT) in complex survey data. Several studies have compared different machine learning (ML) algorithms for estimating PS in non-survey settings; however, it is unclear which algorithm performs best in the context of survey studies. We conduct a simulation study to compare the performance of five PS estimation methods with binary outcomes: logistic regression (LR), covariate balancing propensity score (CBPS), generalized boosted modeling (GBM), classification and regression trees (CART), and random forest (RF). We consider twelve scenarios with varying treatment effects, degrees of non-linearity and non-additivity in the association between covariates and exposure, and levels of PS overlap. The performance of each method is assessed by mean relative bias and coverage probability for the population ATE/ATT. Preliminary results suggest that the ML algorithms performed better than LR and CBPS under non-additive and non-linear associations. Using the Health and Retirement Study, we illustrate these methods in analyzing postoperative cognitive decline after two surgical procedures. 
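The mean relative bias criterion used to assess each method can be stated compactly: the average, over simulation replicates, of each estimate's deviation from the true effect relative to that truth. The estimates and truth below are illustrative, not the study's results.

```python
# Illustrative mean relative bias across simulation replicates, relative to
# a known true treatment effect. Values are made up for illustration.

def mean_relative_bias(estimates, truth):
    """Average of (estimate - truth) / truth over replicates."""
    return sum((e - truth) / truth for e in estimates) / len(estimates)

bias = mean_relative_bias([1.1, 0.9, 1.05], truth=1.0)
# a small positive bias: the estimates overshoot the truth on average
```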

Keywords

Propensity Score Weighting

Survey Design

Machine Learning

Causal Inference

Epidemiology 

Co-Author(s)

Chen Yang, Icahn School of Medicine at Mount Sinai
John Boscardin, Division of Geriatrics, University of California, San Francisco
Lihua Li, Icahn School of Medicine At Mount Sinai

First Author

Bocheng Jing

Presenting Author

Bocheng Jing

Identifying covariates to adjust for selection bias in national estimates of web-based panel surveys

Web-based panel surveys have become increasingly prevalent due to their growing availability and low cost. While these panels use a probability-based sampling approach to recruit panelists, they are still subject to the potential selection bias associated with web surveys due to lower coverage and response rates. One common method to adjust for selection bias in these surveys is to calibrate the weights, using an approach such as propensity score weighting or raking, to balance specified covariates between the panel survey and a high-quality reference survey. Li et al. (2022) assessed variable selection for propensity score models based on logistic regression and found that model selection should be performed to inform the specification of the model, including variables associated with both the outcome of interest and the selection indicator of the survey. This study evaluates various methods for identifying and selecting key covariates for adjusting the panel survey weights to reduce selection bias in population mean estimation. Findings are demonstrated using the National Center for Health Statistics Research and Development Survey and the National Health Interview Survey. 
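Raking, one of the calibration approaches mentioned above, can be sketched as iterative proportional fitting: weights are alternately rescaled so the weighted margin of each covariate matches its reference-survey margin, cycling until the margins stabilize. The covariates and margins below are illustrative, not RANDS or NHIS figures.

```python
# Illustrative raking (iterative proportional fitting) sketch: alternately
# scale weights so each covariate's weighted margin matches a target margin.
# Covariate categories and target margins are made up for illustration.

def rake(weights, covariates, margins, iters=50):
    """covariates: one list of per-unit category labels per dimension;
    margins: one dict per dimension mapping category -> target total."""
    w = list(weights)
    for _ in range(iters):
        for cats, target in zip(covariates, margins):
            totals = {}
            for wi, c in zip(w, cats):
                totals[c] = totals.get(c, 0.0) + wi
            w = [wi * target[c] / totals[c] for wi, c in zip(w, cats)]
    return w

w = rake([1.0, 1.0, 1.0, 1.0],
         [["m", "m", "f", "f"], ["y", "o", "y", "o"]],
         [{"m": 3.0, "f": 1.0}, {"y": 2.0, "o": 2.0}])
# weighted margins now match the targets for both covariates
```

The covariate-selection question the abstract studies is which variables to include in `covariates` (or in a propensity model) so that the adjustment actually reduces selection bias for the outcome of interest.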

Keywords

machine learning

mean estimation

National Health Interview Survey

Research and Development Survey

web surveys 

Co-Author(s)

Yan Li, University of Maryland, College Park
Yulei He, National Center for Health Statistics

First Author

Katherine Irimata, National Center for Health Statistics

Presenting Author

Katherine Irimata, National Center for Health Statistics