Bayesian nonparametric generative machine learning (ML) for missing outcome data bias correction in randomized trials
Published in Providence, RI, 2024
Project Summary:
Randomized controlled trials (RCTs) play a vital role in evidence-based medicine and are widely regarded as the most reliable form of scientific evidence for the effectiveness of healthcare interventions. However, covariate-adjusted analyses of RCTs often rely on rigid parametric models, such as linear regression, that may not capture the complex relationships between covariates and outcomes. Such misspecification sacrifices efficiency and, when outcome data are missing, can bias estimates of treatment effects, reducing the reliability and validity of trial results.

We propose to study causal inference from a Bayesian perspective. We will focus on the performance of Bayesian nonparametric (BNP) generative machine learning (ML), specifically the Dirichlet Process Mixture Model (DPMM), in correcting bias from adjustment-model misspecification and missing outcome data in RCTs. We will compare this approach with Bayesian Additive Regression Trees (BART), linear regression, and the unadjusted estimator. We will also develop a Bayesian sensitivity analysis approach for data Missing Not At Random (MNAR). We aim to show that Bayesian methods offer a flexible approach to covariate adjustment that improves efficiency and reduces bias in estimated treatment effects relative to simpler models.

Integrating Bayesian statistical methods into the primary analyses of future trials can improve efficiency and mitigate bias induced by missing data, even when the correct functional form of the adjustment covariates is unknown. This study will highlight the distinctive advantages of Bayesian ML for improving the efficiency of treatment-effect estimates through covariate adjustment in RCTs. It will also provide a framework to guide the use of advanced Bayesian ML methods when estimating treatment effects in the primary analyses of future randomized trials.
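The bias mechanism described above can be illustrated with a small simulation. This is a minimal sketch under hypothetical settings. The data-generating process, the missingness model, and the helper names `simulate_trial`, `g_computation`, `linear_design`, and `flexible_design` are illustrative assumptions, not part of the project. It also swaps in frequentist least squares with an arm-specific cubic basis as a stand-in for a flexible outcome model, so it is not the DPMM or BART approach proposed here. Outcomes are Missing At Random given treatment arm and a covariate that enters the outcome nonlinearly. Under these assumptions, the complete-case unadjusted estimator and a linear adjustment model (averaged by g-computation) both miss the true effect of 1, while the flexible model lands close to it.

```python
import numpy as np


def simulate_trial(n=200_000, seed=0):
    """Hypothetical data-generating process (illustrative assumption):
    true treatment effect = 1, the outcome depends on X nonlinearly (X**2),
    and outcomes are Missing At Random (MAR) given treatment A and X."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    a = rng.integers(0, 2, size=n)  # simple 1:1 randomization
    y = 1.0 * a + x**2 + rng.normal(size=n)
    # MAR: in the treated arm, participants with extreme X are more likely
    # to have a missing outcome; in the control arm, 80% are observed.
    p_obs = np.where(a == 1, 1.0 / (1.0 + np.exp(-(1.0 - x**2))), 0.8)
    observed = rng.random(n) < p_obs
    return x, a, y, observed


def g_computation(design, x, a, y, observed):
    """Fit an outcome model on complete cases by least squares, then average
    predictions under A=1 and A=0 over all randomized participants."""
    beta, *_ = np.linalg.lstsq(
        design(x[observed], a[observed]), y[observed], rcond=None
    )
    mu1 = design(x, np.ones_like(a)) @ beta
    mu0 = design(x, np.zeros_like(a)) @ beta
    return float(np.mean(mu1 - mu0))


def linear_design(x, a):
    # Misspecified adjustment model: intercept, treatment, linear X term.
    return np.column_stack([np.ones_like(x), a, x])


def flexible_design(x, a):
    # Stand-in flexible model: arm-specific cubic polynomial in X.
    basis = np.column_stack([x, x**2, x**3])
    return np.column_stack([np.ones_like(x), a, basis, a[:, None] * basis])


x, a, y, observed = simulate_trial()
unadjusted = y[observed & (a == 1)].mean() - y[observed & (a == 0)].mean()
linear = g_computation(linear_design, x, a, y, observed)
flexible = g_computation(flexible_design, x, a, y, observed)
print(f"unadjusted={unadjusted:.3f} linear={linear:.3f} flexible={flexible:.3f}")
```

The cubic basis only succeeds here because it happens to contain the assumed true outcome model. A BNP model such as a DPMM, or BART, would take the place of `flexible_design` without requiring the functional form to be chosen in advance.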
Keywords: Bayesian modeling, Causal inference, Missing data, Dirichlet Process Mixture Models (DPMM), Generative models, BART.