How can hospitals, clinics, or researchers collaborate to learn from biomedical data when privacy concerns or regulations prohibit sharing that data? Machine learning approaches that require data to be copied to a single location cannot be used in such circumstances. Federated learning (FL) enables training deep learning models over distributed sites while keeping the data private at each site. Our architecture shares no data, only aggregated parameters that are protected by encryption. The data is further protected from insiders by training under specific gradient noise models. The learning performance of encrypted models is similar to that of non-encrypted models, with little overhead. However, federated training is only half of the problem. Silos often have different schemata, data formats, data values, and access patterns. The field of data integration has developed many methods to address these challenges, including techniques for data exchange and query rewriting using declarative schema mappings, and entity linkage.
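The training scheme above can be sketched as a single federated averaging round: each site takes a noisy local gradient step, and only the resulting parameters are aggregated centrally. This is a minimal illustration, not the authors' implementation; the linear model, noise scale, and simulated sites are hypothetical placeholders, and the parameter encryption step is omitted.

```python
import numpy as np

def local_update(weights, X, y, lr=0.1, noise_std=0.01, rng=None):
    """One gradient step of linear regression at a single site,
    with Gaussian noise added to the gradient for privacy."""
    rng = rng or np.random.default_rng()
    grad = 2 * X.T @ (X @ weights - y) / len(y)          # MSE gradient
    grad += rng.normal(0.0, noise_std, size=grad.shape)  # gradient noise
    return weights - lr * grad

def federated_round(weights, sites, **kw):
    """Each site trains locally; only parameters leave the site
    and are averaged by the aggregator (encryption omitted here)."""
    updates = [local_update(weights.copy(), X, y, **kw) for X, y in sites]
    return np.mean(updates, axis=0)

# Usage: two simulated sites whose local data follow y = 2x.
rng = np.random.default_rng(0)
sites = []
for _ in range(2):
    X = rng.normal(size=(50, 1))
    sites.append((X, 2.0 * X[:, 0]))

w = np.zeros(1)
for _ in range(200):
    w = federated_round(w, sites, rng=rng)
print(w)  # approaches [2.0] despite no site sharing its raw data
```

Raw `X` and `y` never leave a site; only the updated weight vectors are exchanged, which is the property that makes the approach viable under data-sharing restrictions.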
We propose an architecture for Federated Learning and Integration (FLINT), incorporating the critical steps of data harmonization and data imputation. We will illustrate these methods on neuroimaging tasks.
Dr. Jose-Luis Ambite is an Associate Research Professor of Computer Science, and a Research Team Leader at the Information Sciences Institute, both at the University of Southern California. His research interests include data integration (query rewriting under constraints, learning schema mappings, entity linkage, information extraction), databases, semantic web, federated learning, and biomedical data science and genetics.