Thursday, Aug 10: 10:30 AM - 12:20 PM
Metro Toronto Convention Centre
Section on Bayesian Statistical Science
The authors propose a new model-based Bayesian approach for Segmentation-Targeting-Positioning analysis. Specifically, the proposed method extends the current Bayesian vector multidimensional scaling approaches by introducing overlapping segmentation via a novel mixture model that allows a single consumer to belong to multiple segments simultaneously. It also performs factor analysis on attribute data to permit a direct interpretation of the derived joint space map even when the number of attributes is excessive. In addition, the authors adopt innovative Bayesian techniques that infer both the joint space dimensionality and number of market segments directly from data. They have devised an efficient MCMC algorithm to implement their method and proposed a post-processing procedure to deal with model identification issues. In a real data example, the proposed approach provides an intuitive interpretation of the derived configuration while uncovering heterogeneity in consumer preferences. It brings more consistent estimates of space dimensionality and the number of segments, more accurate prediction of consumer preferences, and more effective positioning and repositioning strategies.
Market Structure Analysis
Alzheimer's disease (AD) is a neurological disorder with no cure. It is of interest to define the preclinical phase of AD in order to develop effective therapies. A promising approach to define the preclinical phase of AD is by estimating when longitudinal biomarkers for AD demonstrate change points. However, this estimation task is complex, as multiple longitudinal biomarkers for AD are typically correlated, have an unknown number of change points, and may have missing data. We propose a Bayesian change point model to address these complexities. Through a simulation study, we found that our proposed model estimates the true change point times of multiple longitudinal biomarkers well. Furthermore, by an analysis of multiple longitudinal neuroimaging biomarkers from a study of AD, our proposed model identified a brain region which demonstrated a change point. Thus, our proposed model provides a method to estimate the change point times of multiple longitudinal biomarkers. This is practically relevant in terms of potentially defining the preclinical phase of AD, and therefore helping to develop effective disease-modifying therapies.
Change point model
We introduce Bayesian estimation of a hierarchical linear model (HLM) where a continuous response R and a mixture of continuous and categorical covariates C are partially observed. Our C comprises cluster-level covariates. With C having linear effects, the HLM may be efficiently estimated by available methods. When C has interactive or other polynomial effects, however, it is difficult to estimate the HLM in a way that guarantees compatibility and unbiased estimation. An available Gibbs sampler is based on a joint distribution of R, C and parameters compatible with the HLM, but it imputes a missing covariate that has a nonlinear effect by a Metropolis algorithm whose proposal density has constant variance, while the target posterior has nonconstant variance. Therefore, the sampler is not guaranteed to produce unbiased estimation. We derive a compatible Gibbs sampler that imputes a missing covariate either directly from the target posterior or by a Metropolis algorithm whose proposal density matches the variance of the target posterior. We estimate the HLM efficiently given multiple imputations (MI) from our sampler, and compare the estimates with those of competitors by simulated and real data analyses.
Missing data, Multiple Imputation, Compatibility, Hierarchical Linear Model, Interaction effect, Nonlinear effect, Bayesian estimation
We develop a new longitudinal count data regression model to account for zero-inflation and spatio-temporal correlation across responses. This project is motivated by an analysis of the Iowa Fluoride Study (IFS), a longitudinal cohort study with data on caries scores measured for each tooth across 5 time points. To that end, we use a hurdle model for zero-inflation with two parts: the logistic regression presence model for whether a count is non-zero and the severity model that considers the non-zero counts through a shifted negative binomial distribution with overdispersion. To incorporate dependence across teeth and time, these marginal models are embedded within a Gaussian copula that introduces a spatio-temporal correlation. Standard Bayesian sampling from such a model is quite challenging, so we use approximate Bayesian computing (ABC) for inference. Our ABC strategy involves an initial MCMC-based algorithm under an independence assumption to obtain a proposal distribution for the ABC-MCMC algorithm. This approach is implemented on the IFS data to gain insight into the risk factors for cavities and the correlation structure across teeth and time.
Approximate Bayesian computing
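The two-part hurdle structure described in the abstract above can be illustrated with a minimal simulation sketch. It assumes independent observations, so it leaves out the Gaussian copula and the ABC inference entirely. The function name, argument names, and parameter values are hypothetical and are not taken from the authors' work.

```python
import numpy as np

def simulate_hurdle_counts(x, beta_presence, mean_excess, dispersion, rng):
    """Draw independent counts from a two-part hurdle model (illustrative only)."""
    n_obs = x.shape[0]
    # Presence part: logistic regression for whether the count is non-zero.
    prob_nonzero = 1.0 / (1.0 + np.exp(-(x @ beta_presence)))
    is_nonzero = rng.random(n_obs) < prob_nonzero
    # Severity part: 1 + negative binomial, so non-zero counts start at 1.
    # NumPy parameterises the negative binomial by (n, p) with mean n * (1 - p) / p,
    # so this choice of p gives the unshifted part a mean of `mean_excess`.
    p = dispersion / (dispersion + mean_excess)
    severity = 1 + rng.negative_binomial(dispersion, p, size=n_obs)
    return np.where(is_nonzero, severity, 0)

# Hypothetical usage: an intercept plus one standard normal covariate.
rng = np.random.default_rng(0)
x = np.column_stack([np.ones(1000), rng.normal(size=1000)])
counts = simulate_hurdle_counts(x, np.array([-0.5, 1.0]),
                                mean_excess=2.0, dispersion=1.5, rng=rng)
print("share of zeros:", np.mean(counts == 0))
```

Smaller values of `dispersion` give heavier overdispersion in the non-zero counts. Adding dependence across teeth and time would require replacing the independent draws with draws linked through a copula, which this sketch does not attempt.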
There is substantial evidence that air pollution and weather mixtures are associated with increased risk of mortality and morbidity, and that maternal exposures negatively impact birth outcomes including preterm birth and low birth weight. Often, interest lies in exposures on multiple days or weeks prior to the assessment of a health endpoint. These effects are commonly estimated with a distributed lag model, where an outcome is regressed on repeated measures of exposure. Yet there are currently no established methods for estimating the lagged effects of exposure mixtures with count data. We develop a method of estimating the relationship between repeated measures of a mixture and a count outcome. Our method estimates exposure effects with an ensemble of time-partitioning binary trees that add structure to the lagged effects in a zero-inflated negative binomial model. We show via simulation that our method outperforms a splines-based method for a single exposure and excels with mixture exposures. We apply the method to examine the exposure effects of weekly average maternal exposure to fine particulate matter and temperature on birth outcomes in a Colorado administrative dataset.
Distributed lag model
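As a point of reference for the abstract above, the basic distributed lag idea of regressing an outcome on repeated measures of exposure can be sketched as follows. This is a simplified illustration with a single exposure, a continuous outcome, and ordinary least squares. It is not the tree-ensemble, zero-inflated negative binomial method the authors propose, and all names and values are hypothetical.

```python
import numpy as np

def fit_distributed_lag(exposure, outcome):
    """Least-squares fit of the outcome on an intercept plus one column per lag.

    exposure: array of shape (n_subjects, n_lags); column l holds the exposure
              measured l + 1 periods before the outcome was assessed.
    outcome:  array of shape (n_subjects,).
    Returns (intercept, lag_effects), with lag_effects of shape (n_lags,).
    """
    design = np.column_stack([np.ones(exposure.shape[0]), exposure])
    coef, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return coef[0], coef[1:]

# Hypothetical usage with simulated data: only lags 2 and 3 carry an effect.
rng = np.random.default_rng(0)
exposure = rng.normal(size=(300, 6))
true_effects = np.array([0.0, 0.8, 0.5, 0.0, 0.0, 0.0])
outcome = 1.0 + exposure @ true_effects + rng.normal(scale=0.5, size=300)
intercept, lag_effects = fit_distributed_lag(exposure, outcome)
print(np.round(lag_effects, 2))
```

An unconstrained fit like this estimates each lag separately. Methods in this area usually add structure across lags, for example with splines or, as in the abstract, with time-partitioning trees.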
Model development for sequential count-valued data characterized by small counts and non-stationarities is essential for broader applicability and appropriate inference in the scientific community. Specifically, we introduce global-local shrinkage priors into a Bayesian dynamic generalized linear model to adaptively estimate both changepoints and a smooth trend for count time series. We utilize a parsimonious state-space approach to identify a dynamic signal with local parameters to track smoothness of the local mean at each time-step. This setup provides a flexible framework to detect unspecified changepoints in complex series, such as those with large interruptions in local trends. We detail the extension of our approach to time-varying parameter estimation within dynamic Negative Binomial regression analysis to identify structural breaks. Finally, we illustrate our algorithm with empirical examples in the social sciences.