Innovations in estimation approaches for biomedical data

Chair

Lu Chen, USDA/NASS
 
Sunday, Aug 6: 4:00 PM - 5:50 PM
0019 
Contributed Papers 
Metro Toronto Convention Centre 
Room: CC-809 

Main Sponsor

Biometrics Section

Presentations

Adaptive Bayesian Sum of Trees Model for Covariate-dependent Spectral Analysis

This work introduces a flexible and adaptive nonparametric method for estimating the association between multiple covariates and power spectra of multiple time series. The approach uses a Bayesian sum of trees model to capture complex dependencies and interactions between covariates and the power spectrum, which are often observed in biomedical studies. Local power spectra corresponding to terminal nodes within trees are estimated nonparametrically using Bayesian penalized linear splines. Trees are random and fit using a Bayesian backfitting MCMC algorithm that sequentially considers tree modifications via reversible-jump techniques. For high-dimensional covariates, a sparsity-inducing Dirichlet hyperprior on tree splitting proportions provides sparse estimation of covariate effects and efficient variable selection. By averaging over the posterior distribution of trees, the proposed method can recover both smooth and abrupt changes in the power spectrum across multiple covariates. The proposed methodology is used to study gait maturation in young children by evaluating age-related changes in power spectra of stride interval time series in the presence of other covariates. 
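
For background on the spectral quantity being modeled, the Whittle likelihood named in the keywords approximates the Gaussian time-series likelihood in the frequency domain. A standard form, offered here only as context and not as this paper's exact formulation, is

$$ \ell_W(f) \approx -\sum_{k} \left\{ \log f(\omega_k) + \frac{I(\omega_k)}{f(\omega_k)} \right\}, \qquad I(\omega_k) = \frac{1}{n}\left|\sum_{t=1}^{n} x_t e^{-2\pi i \omega_k t}\right|^2, $$

where f is the power spectrum, I is the periodogram, and the ω_k = k/n are the Fourier frequencies; in the covariate-dependent setting, the sum of trees governs how f varies with the covariates.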

Keywords

Bayesian backfitting

gait variability

multiple time series

reversible jump Markov Chain Monte Carlo

Whittle likelihood 

Co-Author(s)

Zeda Li, Baruch College CUNY
Scott Bruce, Texas A&M University

First Author

Yakun Wang

Presenting Author

Scott Bruce, Texas A&M University

Confidence Intervals for the Difference of Binomial Proportions with Missing Observations

Interval estimation for the difference between effect measures is commonly used in a variety of applications. The method of variance estimates recovery (MOVER) is a useful approach for constructing confidence intervals for such differences. Interval estimation with missing data has been widely studied in recent years, since missing values can arise during data collection. In this study, we propose two proper multiple imputation procedures for the MOVER to construct confidence intervals for the difference of two binomial proportions, not only under missing at random but also under missing not at random mechanisms. A simulation study shows that the coverage probabilities of the proposed intervals are closer to the nominal level than those of existing intervals in most cases. The multiple imputation confidence intervals are illustrated with a real data example.
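
As background, MOVER combines separate confidence limits for each proportion into limits for their difference. Below is a minimal Python sketch using Wilson score intervals for the individual proportions; the multiple imputation step for missing observations proposed in the abstract is not shown, and the interval choice, function names, and example counts are illustrative assumptions only.

import math

def wilson_ci(x, n, z=1.959964):
    """Wilson score interval for a single binomial proportion."""
    p = x / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

def mover_diff_ci(x1, n1, x2, n2):
    """MOVER confidence interval for p1 - p2 built from the two one-sample intervals."""
    p1, p2 = x1 / n1, x2 / n2
    l1, u1 = wilson_ci(x1, n1)
    l2, u2 = wilson_ci(x2, n2)
    lower = p1 - p2 - math.sqrt((p1 - l1)**2 + (u2 - p2)**2)
    upper = p1 - p2 + math.sqrt((u1 - p1)**2 + (p2 - l2)**2)
    return lower, upper

# Illustrative example: 18/50 events in group 1 vs 10/50 in group 2
print(mover_diff_ci(18, 50, 10, 50))

The square-root recombination uses the variance implied by each one-sample limit near its own boundary, which is the usual rationale for MOVER's good coverage when the sampling distributions are skewed.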

Keywords

Binomial distribution

Coverage probability

Incomplete data

Missing data

Missing not at random

Missing value 

Co-Author

Hsiuying Wang, National Yang Ming Chiao Tung University

First Author

Chung-Han Lee

Presenting Author

Chung-Han Lee

Effects of Complexity in Pseudo-likelihood Estimation for Graphical Models

Graphical and sparse covariance models have found widespread use due to their immediate appeal in modern sample-starved high-dimensional applications. Part of this appeal stems from the significantly lower sample sizes required for the existence of maximum likelihood and pseudo-likelihood estimators, especially in comparison with the classical full covariance model. For undirected Gaussian graphical models, the minimum sample size required for the existence of MLEs had been an open question since their introduction in the late 1970s and was only recently settled. The same question for pseudo-likelihood estimators has remained unsolved since their introduction in the 1970s. Pseudo-likelihood estimators have recently received renewed attention because they impose less restrictive assumptions and offer better computational tractability, improved statistical performance, and suitability for modern high-dimensional applications, reviving interest in this longstanding problem. In this work, we undertake a comprehensive study of this open problem within the context of pseudo-likelihood methods proposed in the literature.
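
For readers unfamiliar with the construction, the Gaussian pseudo-likelihood replaces the joint likelihood with a product of node-wise conditional densities. In standard textbook notation (not the specific estimators studied in this work),

$$ L_{\mathrm{PL}}(\Omega) = \prod_{j=1}^{p}\prod_{i=1}^{n} p\left(x_{ij} \mid x_{i,-j}; \Omega\right), \qquad X_j \mid X_{-j} \sim N\!\left(-\sum_{k\neq j}\frac{\omega_{jk}}{\omega_{jj}}\, X_k,\; \omega_{jj}^{-1}\right), $$

where Ω = (ω_jk) is the precision matrix whose sparsity pattern encodes the graph; the open problem concerns the smallest sample size n for which maximizers of such objectives exist.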

Keywords

Pseudolikelihood

Graphical Models

Covariance Estimation 

First Author

Benjamin Roycraft, UC Davis

Presenting Author

Benjamin Roycraft, UC Davis

Estimating Menstrual Cycle Phase Using Single-time Biomarker Measurements

Understanding the impact of environmental exposures on health outcomes in women is complicated by the timing of measurement in the menstrual cycle: how exposures are metabolized and measured depends on hormone levels which vary over time. Knowing the cycle time at which measurements are obtained would allow this to be incorporated in subsequent analyses, but the self-reported start of last menstrual cycle has proven unreliable. Moreover, repeated biomarker assessment during a single cycle -- which could establish cycle times -- is burdensome for participants. We therefore propose a method which allows the estimation of cycle phase given hormone levels in a single urine sample. We leverage repeated assessments in a reference sample to model within-subject variability in hormone levels over the course of a menstrual cycle, and then estimate current cycle time given a single value for a new participant. Relying on this novel, data-driven estimation of cycle time allows researchers to have a more accurate benchmark when evaluating environmental factors. 
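
A minimal sketch of the underlying idea in Python; the reference trajectories, hormone scale, pointwise Gaussian summary, and 28-day grid below are illustrative assumptions rather than the authors' functional-data implementation.

import numpy as np

# Hypothetical reference data: one hormone measured daily over a 28-day
# cycle for 50 reference participants (rows = participants).
rng = np.random.default_rng(0)
days = np.arange(28)
reference = np.sin(2 * np.pi * (days - 12) / 28) + rng.normal(0, 0.2, (50, 28))

# Step 1: summarize the reference sample by pointwise mean and SD curves.
mean_curve = reference.mean(axis=0)
sd_curve = reference.std(axis=0, ddof=1)

# Step 2: for a new participant with a single measurement, score each
# candidate cycle day by a Gaussian log-likelihood and pick the best one.
def estimate_cycle_day(y_new):
    loglik = -0.5 * ((y_new - mean_curve) / sd_curve) ** 2 - np.log(sd_curve)
    return days[np.argmax(loglik)]

print(estimate_cycle_day(0.8))  # estimated day of the cycle

In practice, modeling several hormones jointly and using the full within-subject covariance, as the abstract describes, would disambiguate cycle days that share similar levels of a single marker.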

Keywords

Functional Data

Environmental Research

Biomarkers

Precision Medicine

Dimension Reduction 

Co-Author(s)

Jeff Goldsmith, Columbia University
Lauren Houghton, Columbia University
Julie Herbstman, Columbia University

First Author

Madison Stoms

Presenting Author

Madison Stoms

Maximum softly-penalized likelihood for mixed effects logistic regression

Maximum likelihood estimation in mixed effects logistic regression often results in estimates on the boundary of the parameter space. Such estimates, including infinite values for fixed effects or singular variance component matrices, can wreak havoc on numerical estimation procedures and inference. We add a scaled penalty to the log-likelihood function that penalizes the fixed effects by Jeffreys' invariant prior for the model with no random effects and the variance components by a composition of negative Huber loss functions. The maximum penalized likelihood estimates are shown to lie in the interior of the parameter space. Appropriate scaling of the penalty preserves the optimal asymptotic properties expected of the maximum likelihood estimator, namely consistency, asymptotic normality, and Cramér-Rao efficiency. Our choice of penalties and scaling factor preserves equivariance of the fixed effects estimates under linear transformations of the model parameters, such as contrasts. The method's superior finite sample performance over other prevalent approaches is demonstrated on real-data examples and in comprehensive simulation studies.
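
Schematically, a penalized objective of this kind can be written as below, where the first penalty term is the log of Jeffreys' invariant prior for the fixed-effects-only logistic model; the composition of negative Huber losses, denoted pen_H, and the scaling factor c are left abstract because the abstract does not specify them, so this is a sketch rather than the authors' exact objective:

$$ \ell_p(\beta, \Sigma) = \ell(\beta, \Sigma) + c\left[\tfrac{1}{2}\log\det\!\left\{X^\top W(\beta)\, X\right\} - \mathrm{pen}_H(\Sigma)\right], \qquad W(\beta) = \mathrm{diag}\{\pi_i(\beta)(1-\pi_i(\beta))\}, $$

with c presumably chosen so that the penalty becomes asymptotically negligible relative to the log-likelihood, which is what would preserve consistency, asymptotic normality, and efficiency.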

Keywords

logistic regression

infinite estimates

singular variance components

data separation

Jeffreys' prior 

Co-Author

Ioannis Kosmidis, University of Warwick

First Author

Philipp Sterzinger

Presenting Author

Philipp Sterzinger

Models for Destructive Samples and Inference

In fetoscopic surgery for spina bifida, a polymer patch is placed over the gap in the skin covering the vertebrae. Researchers want to measure the roughness of the patch at 0, 4, 8, 12, and 16 weeks, but obtaining a roughness measurement destroys the patch, so each patch yields a single measurement at one time point. The basic question is how to obtain a profile of roughness over time using only the marginal data observed at these times. We propose models under which the joint distribution can be estimated from the marginal data, and we then estimate the parameters of the joint distribution using the Newton-Raphson method and a version of the EM algorithm, comparing their performance.
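
For reference, the two generic updates mentioned above take the standard forms (written for an arbitrary log-likelihood ℓ, not the paper's specific destructive-sampling model):

$$ \theta^{(t+1)} = \theta^{(t)} - \left[\nabla^2 \ell\!\left(\theta^{(t)}\right)\right]^{-1} \nabla \ell\!\left(\theta^{(t)}\right) \quad\text{(Newton-Raphson)}, $$

$$ \theta^{(t+1)} = \arg\max_{\theta}\; \mathbb{E}\!\left[\log L(\theta; Y_{\mathrm{complete}}) \mid Y_{\mathrm{observed}}, \theta^{(t)}\right] \quad\text{(EM)}, $$

where, in this setting, the EM expectation would treat the unobserved joint outcomes as missing data; the comparison concerns how these two routes behave for the proposed joint models.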

Keywords

Destructive Samples

Marginal data

Joint Distribution

Newton-Raphson Method

EM Algorithm 

Co-Author(s)

Tianyuan Guan
Koffi Wima

First Author

Marepalli Rao, University of Cincinnati

Presenting Author

Marepalli Rao, University of Cincinnati

Zero-Inflated Bivariate Generalized Linear Mixed Model for Meta-Analysis of Double-Zero-Event Studies

Double-zero-event studies pose a challenge for accurately estimating effect sizes in meta-analysis due to the absence of events in both control and treatment groups. To address this issue, we introduce a zero-inflated bivariate generalized linear mixed model (ZIBGLMM) and develop both frequentist and Bayesian versions of it. This model is a two-component finite mixture model that includes a subpopulation with extremely low risk. Through extensive simulation studies and real-world meta-analysis case studies, we demonstrate that ZIBGLMM outperforms traditional meta-analysis methods and the standard BGLMM in estimating the true effect size, with substantially less bias and comparable coverage probability. In particular, in a real-world meta-analysis case study where the existence of an extremely low-risk subpopulation is clinically justifiable, ZIBGLMM performs better than the standard BGLMM in terms of AIC and DIC. Our findings suggest that ZIBGLMM can properly account for extremely low-risk subpopulations and accurately estimate the effect size in meta-analyses with a substantial number of double-zero studies.
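
One plausible way to write the two-component mixture described above (a sketch only; the authors' parameterization and priors may differ): for study i with arm sizes n_{i0}, n_{i1} and event counts y_{i0}, y_{i1},

$$ y_{ik} \sim \mathrm{Binomial}(n_{ik}, p_{ik}), \qquad k = 0, 1, $$

where with probability π the study belongs to an extremely low-risk subpopulation in which both event probabilities are essentially zero, and with probability 1 - π

$$ \begin{pmatrix} \operatorname{logit} p_{i0} \\ \operatorname{logit} p_{i1} \end{pmatrix} \sim N\!\left(\begin{pmatrix}\mu_0\\ \mu_1\end{pmatrix}, \Sigma\right), $$

so that μ_1 - μ_0 corresponds to an overall log odds ratio; the zero-inflation component is what allows double-zero studies to contribute information rather than being excluded or continuity-corrected.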

Keywords

Bivariate Generalized Linear Mixed Models

Double-zero-event studies

Generalized Linear Mixed Models

Meta-Analysis

Zero-inflation 

Co-Author(s)

Lifeng Lin
Joseph Cappelleri, Pfizer Inc
Haitao Chu, Pfizer
Yong Chen, University of Pennsylvania, Perelman School of Medicine

First Author

Lu Li

Presenting Author

Lu Li