Data fusion learning methods for individualized treatment rules (Shu Yang)

Prerequisites: Probability, statistical computing.

Outline: The project is motivated by the 21st Century Cures Act, which emphasizes (i) harnessing real-world data for data-driven evidence and (ii) precision medicine, which aims to learn individualized treatment regimes tailored to individual patients’ characteristics. Heterogeneous data sources are increasingly used for estimating treatment effects and learning individualized treatment regimes. Importantly, parallel randomized controlled trial data and large observational data present complementary features (see the review paper (1)). For example, randomized controlled trial data are free of confounding because treatment is randomized by design, but they may be small and less representative of the real-world patient population. Observational data, on the other hand, contain rich information in the patient portfolios but suffer from data quality issues such as confounding and missingness.
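
The complementary features described above can be illustrated with a toy simulation. This is only a minimal sketch under assumed settings, not a method from the project: the function name `naive_effect`, the sample sizes, the true effect of 1.0, and the treatment probabilities are all hypothetical choices made for illustration. It compares the naive difference in mean outcomes from a small randomized trial with that from a large observational sample in which an unmeasured confounder drives both treatment and outcome.

```python
import random
import statistics

TRUE_EFFECT = 1.0  # assumed treatment effect, chosen for illustration only


def naive_effect(n, randomized, seed):
    """Difference in mean outcomes (treated minus control) on simulated data."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        u = rng.gauss(0, 1)  # confounder, treated as unmeasured
        if randomized:
            p = 0.5  # trial: treatment assigned independently of u
        else:
            p = 0.8 if u > 0 else 0.2  # observational: u drives treatment
        a = rng.random() < p
        y = TRUE_EFFECT * a + 2.0 * u + rng.gauss(0, 1)
        (treated if a else control).append(y)
    return statistics.fmean(treated) - statistics.fmean(control)


# Small trial: unconfounded but noisy. Large observational sample: precise but biased.
rct_estimate = naive_effect(n=500, randomized=True, seed=1)
obs_estimate = naive_effect(n=20000, randomized=False, seed=2)
print(f"true effect:            {TRUE_EFFECT:.2f}")
print(f"small trial estimate:   {rct_estimate:.2f}")
print(f"observational estimate: {obs_estimate:.2f}")
```

In this setup the trial estimate fluctuates around the true effect, while the observational estimate is stable but shifted upward by the confounder. That trade-off is what data integration methods aim to exploit.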

Research objectives: This project will review existing data integration methods (2-4) and develop new ones for estimating treatment effects and individualized treatment regimes, leveraging the unique strengths of each data source.

Outcomes: The project will result in new methods for data fusion to improve treatment effect estimation.

  1. Colnet B, Mayer I, Chen G, Dieng A, Li R, Varoquaux G, Vert J, Josse J, Yang S. Causal inference methods for combining randomized trials and observational studies: a review. arXiv. 2020;2011.08047.
  2. Yang S, Zeng D, Wang X. Improved inference for heterogeneous treatment effects using real-world data subject to hidden confounding. arXiv. 2020;2007.12922.
  3. Yang S, Zeng D, Wang X. Elastic integrative analysis of randomized trial and real-world data for treatment heterogeneity estimation. arXiv. 2020;2005.10579.
  4. Dong L, Yang S, Wang X, Zeng D, Cai J. Integrative analysis of randomized clinical trials with real world evidence studies. arXiv. 2020;2003.01242.