Statistical Tests and Machine Learning for Pooling and Analyzing Multi-site Datasets
How can one efficiently combine experimental and observational data from different laboratories into a single predictive model? How can one transfer a predictive model trained on one dataset to another dataset? In our work, we provide sufficient conditions under which these problems are identifiable, together with machine learning algorithms for carrying out these tasks. Compared with classical transfer learning or multi-task learning approaches, our algorithms also have good statistical properties: uncertainty can be quantified, and p-values and confidence intervals can be derived. We also provide a framework that lets different laboratories communicate efficiently, without compromising privacy, to evaluate the benefits of a multi-site collaboration. The methods have been applied to Alzheimer’s disease studies, where they improved the understanding of the relationship between cerebrospinal fluid and Alzheimer’s disease. These tools add to the armamentarium of the scientific experimenter and data analyst for efficiently combining information from diverse sources.