Data fusion learning methods for individualized treatment rules

Faculty Mentors:

  • Shu Yang (Statistics, NCSU)
  • Wenyu Ye (Eli Lilly, Indianapolis, IN)

Prerequisites: Probability, statistical computing.

Outline: The project is motivated by the 21st Century Cures Act, which emphasizes (i) harnessing real-world data to generate data-driven evidence and (ii) precision medicine, which aims to learn individualized treatment regimes tailored to each patient's characteristics. Heterogeneous data sources are increasingly used to estimate treatment effects and to learn individualized treatment regimes. Importantly, parallel randomized controlled trial data and large observational data have complementary features [1]. For example, randomized controlled trial data are free of confounding because treatment is randomized by design, but trial samples may be small and less representative of the real-world patient population [2]. Observational data, on the other hand, contain rich information in patient portfolios but suffer from data quality issues such as confounding and missingness [1, 3].

Objectives: This project will review and develop data integration methods for estimating treatment effects and interpretable individualized treatment rules (ITRs), such as linear rules and tree-based rules, that leverage the unique strengths of each data source [4, 5]. The research aims to implement transfer learning to recover the optimal ITRs from a biased sample, such as trial data that are not representative of the target population.
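
As a rough illustration of what a linear ITR looks like, the sketch below fits an outcome regression with treatment-by-covariate interactions on simulated randomized trial data and treats a patient whenever the estimated treatment effect is positive. Everything in it is an assumption made for illustration: the data-generating model, the sample sizes, and the helper names fit_linear_itr and linear_rule are hypothetical, and the sketch does not implement the data fusion or transfer learning methods of [4, 5].

```python
import numpy as np


def fit_linear_itr(X, A, Y):
    """Least-squares fit of Y on [1, X] and A * [1, X].

    Returns the treatment-interaction coefficients (intercept first),
    which define the linear rule "treat if [1, x] @ beta > 0".
    """
    n = X.shape[0]
    X1 = np.column_stack([np.ones(n), X])            # add intercept
    design = np.column_stack([X1, A[:, None] * X1])  # main effects + interactions
    coef, *_ = np.linalg.lstsq(design, Y, rcond=None)
    return coef[X1.shape[1]:]                        # keep interaction part only


def linear_rule(beta, X):
    """Apply the linear rule: 1 = treat, 0 = do not treat."""
    X1 = np.column_stack([np.ones(X.shape[0]), X])
    return (X1 @ beta > 0).astype(int)


# Simulated trial (hypothetical): treatment is randomized with probability 0.5,
# so there is no confounding by design.
rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 2))
A = rng.integers(0, 2, size=n)
true_effect = 1.0 * X[:, 0] - 0.5 * X[:, 1]          # assumed treatment effect
Y = X[:, 0] + X[:, 1] + A * true_effect + rng.normal(size=n)

beta_hat = fit_linear_itr(X, A, Y)

# Compare the estimated rule with the true optimal rule on fresh covariates.
X_new = rng.normal(size=(5000, 2))
optimal = (1.0 * X_new[:, 0] - 0.5 * X_new[:, 1] > 0).astype(int)
agreement = np.mean(linear_rule(beta_hat, X_new) == optimal)
print("estimated rule coefficients:", np.round(beta_hat, 2))
print("agreement with the true optimal rule:", round(float(agreement), 3))
```

In this sketch the fresh covariates are drawn from the same distribution as the trial. When the target population differs from the trial population, a rule learned this way may no longer be optimal within the chosen rule class, which is the setting the transfer learning objective addresses.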

Outcomes: The project will result in new methods for data fusion to improve treatment effect estimation.


[1] Colnet, B., et al., Causal inference methods for combining randomized trials and observational studies: a review. arXiv, 2020. 2011.08047.

[2] Lee, D., et al., Improving trial generalizability using observational studies. Biometrics, 2022. doi:10.1111/biom.13609.

[3] Chu, J., W. Lu, and S. Yang, Targeted optimal treatment regime learning using summary statistics. arXiv, 2022. 2201.06229.

[4] Wu, L. and S. Yang, Transfer learning of individualized treatment rules from experimental to real-world data. arXiv, 2021. 2201.06229.

[5] Wu, L. and S. Yang, Integrative R-learner of heterogeneous treatment effects combining experimental and observational studies. Conference on Causal Learning and Reasoning (CLeaR), Proceedings of Machine Learning Research, 2022.