You’re worried you are at risk for a neurological disease, so you get an MRI scan of your brain.
Option #1: Your doctor could predict your risk based on all the MRIs she’s seen in her career.
Option #2: Using artificial intelligence, a machine learning (ML) model trained on all the MRI scans stored in the medical records of a single hospital could predict your risk.
Option #3: Using the ML technique of Federated Learning, a group of hospitals could collaborate and jointly train an ML model for risk prediction using all the MRI scans from all the hospitals within the group, while protecting patient privacy.
In general, the more medical data an ML model is trained on, the better the predictions will be, leading to improved patient care and better outcomes. (In other words, choose option three!)
A team at USC Viterbi’s Information Sciences Institute (ISI), led by Professor Jose-Luis Ambite and Dimitris Stripelis, a Ph.D. candidate in the USC Department of Computer Science, has proposed a novel architecture to address some of the most critical challenges observed in Federated Learning environments. They will present this architecture at the 2023 International Workshop on Health Intelligence (W3PHIAI 2023), held in Washington, D.C. on Feb. 13-14, 2023, in association with the 37th AAAI Conference on Artificial Intelligence.
The Challenges in Federated Learning
Federated Learning (FL) is a method used to train a machine learning model collaboratively, using data from various distributed sources (“data silos”), without sharing the data. FL is often used when the privacy of the data must be maintained.
In the example above, the data silos are the individual hospitals. The private data are the MRI scans.
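To make the idea concrete, here is a minimal Python sketch of one widely used FL aggregation scheme, federated averaging. It is an illustration only and not necessarily how the ISI team's platform works: the hospital names, sample counts, and weight vectors are hypothetical, and the local training step at each hospital is omitted. The point is that only model weights and counts are shared, never the scans.

```python
from typing import Dict, List, Tuple


def federated_average(updates: Dict[str, Tuple[List[float], int]]) -> List[float]:
    """Average model weights across silos, weighting each by its sample count."""
    total_samples = sum(count for _, count in updates.values())
    dim = len(next(iter(updates.values()))[0])
    global_weights = [0.0] * dim
    for weights, count in updates.values():
        for i, w in enumerate(weights):
            global_weights[i] += w * count / total_samples
    return global_weights


# Hypothetical round: each hospital trains locally, then reports
# (model weights, number of local training samples). No MRI data is sent.
local_updates = {
    "hospital_a": ([1.0, 2.0], 100),
    "hospital_b": ([3.0, 4.0], 300),
}
global_model = federated_average(local_updates)
```

In this toy round, hospital_b has three times as many scans as hospital_a, so its weights count three times as much in the shared model.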
The ISI team proposes a general-purpose federated learning platform architecture, called Federated Learning INTegration (FLINT), that addresses three specific challenges of federated learning that frequently arise in the biomedical space.
Challenge #1: The Learning Task
A successful FL system must first be able to perform the “learning task.” That is, using the information gathered from the various silos, the collaborative model must be able to make predictions with reasonable accuracy.
The ISI research team used MetisFL, the FL platform they have developed with significant funding from DARPA and NIH. They have used MetisFL in neuroimaging tasks, such as Brain Age Gap Estimation (BrainAGE) and Alzheimer’s Disease prediction.
The BrainAGE learning task is to predict the age of a human brain from a structural MRI scan. The difference between the predicted age and the person’s chronological age is a biomarker of brain pathologies.
Stripelis, co-author of the paper, explained, “You have MRI scans distributed across hospitals and you want to analyze what is the difference between the true chronological and the predictive chronological age of the subject. Because the larger the difference between those two values is, the greater the risk of developing a neurological disease. It’s an indicator – or biomarker – of a neurological disease.”
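The arithmetic behind the biomarker is simple, and a short Python sketch may help. The subject identifiers and ages below are hypothetical, and the sign convention (predicted minus chronological) is an assumption made for this illustration; the predicted age would come from the trained model, which is not shown.

```python
def brain_age_gap(predicted_age: float, chronological_age: float) -> float:
    """Return predicted brain age minus chronological age, in years."""
    return predicted_age - chronological_age


# Hypothetical subjects: (model-predicted brain age, chronological age).
subjects = {
    "subject_1": (71.5, 64.0),
    "subject_2": (58.0, 59.0),
}
gaps = {sid: brain_age_gap(pred, chron) for sid, (pred, chron) in subjects.items()}
```

Here subject_1's brain looks several years older than the person is, which is the kind of gap the researchers treat as a warning sign.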
This was the learning task: can the model accurately predict the BrainAGE given an MRI, if it has been trained within the proposed FLINT architecture?
Stripelis said, “We show that yes, using our system, you can actually learn that task.” Ambite added, “And the architecture is secure. No private data leaves a hospital and the models are trained under homomorphic encryption.” (Homomorphic encryption allows computations to be performed on data without decrypting it.)
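For readers curious how computing on encrypted data can work at all, the following Python sketch implements a toy version of the Paillier cryptosystem, a classic additively homomorphic scheme. This is a swapped-in textbook example, not the scheme the ISI team uses (the article does not name one), and the tiny primes make it completely insecure. It shows two encrypted numbers being added without ever being decrypted.

```python
import math
import random

# Toy key generation with tiny primes: for illustration only, NOT secure.
p, q = 53, 61
n = p * q
n_sq = n * n
g = n + 1
lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
mu = pow(lam, -1, n)  # modular inverse of lam mod n (Python 3.8+)


def encrypt(m: int) -> int:
    """Encrypt an integer 0 <= m < n with fresh randomness."""
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n_sq) * pow(r, n, n_sq)) % n_sq


def decrypt(c: int) -> int:
    """Recover the plaintext using the private values lam and mu."""
    x = pow(c, lam, n_sq)
    return ((x - 1) // n * mu) % n


def add_encrypted(c1: int, c2: int) -> int:
    """Multiplying ciphertexts adds the hidden plaintexts (mod n)."""
    return (c1 * c2) % n_sq


# A party without the private key can combine the two ciphertexts.
c_sum = add_encrypted(encrypt(12), encrypt(30))
result = decrypt(c_sum)  # the key holder recovers 12 + 30
```

In a federated setting, the same property would let a coordinator combine encrypted model updates from several hospitals without seeing any individual update in the clear.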
Challenge #2: Schema Harmonization
Stripelis continued, “The second challenge has to do with the data integration component, and specifically schema and data harmonization.” Silos often have different data formats, values, schemata and access patterns. In other words, different silos have different characteristics and specifications. So the researchers need to make sure that the data used for learning can be made compatible.
Stripelis gave an example: “Let’s say you have one column that is called ‘DOB’ in one table and ‘birth_date’ in another table. They represent exactly the same attribute, under a different name. Or one site measures weight in kilograms and another in pounds. You have to harmonize the attributes and values in order to do meaningful analysis; that’s data integration.”
The field of data integration has developed many methods to address this challenge, which are used by the ISI team in the FLINT architecture.
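Stripelis's example can be sketched in a few lines of Python. The site names, column mappings, and records below are hypothetical and are not taken from FLINT; the sketch only illustrates renaming attributes to a shared schema and converting units so that records from different silos become comparable.

```python
LB_TO_KG = 0.45359237  # kilograms per pound (international definition)

# Hypothetical per-site mappings from local column names to a shared schema.
SITE_SCHEMAS = {
    "site_a": {
        "columns": {"DOB": "birth_date", "weight": "weight_kg"},
        "weight_unit": "kg",
    },
    "site_b": {
        "columns": {"birth_date": "birth_date", "wt_lb": "weight_kg"},
        "weight_unit": "lb",
    },
}


def harmonize(record: dict, site: str) -> dict:
    """Rename a site's columns to the shared schema and convert weight to kg."""
    schema = SITE_SCHEMAS[site]
    out = {
        schema["columns"][key]: value
        for key, value in record.items()
        if key in schema["columns"]
    }
    if schema["weight_unit"] == "lb" and "weight_kg" in out:
        out["weight_kg"] = round(out["weight_kg"] * LB_TO_KG, 2)
    return out


record_a = harmonize({"DOB": "1950-03-14", "weight": 70.0}, "site_a")
record_b = harmonize({"birth_date": "1948-11-02", "wt_lb": 154.0}, "site_b")
```

After harmonization, both records use the same attribute names, `birth_date` and `weight_kg`, and both weights are in kilograms.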
Challenge #3: Missing Values
The third challenge is missing values. “One interesting aspect that arises in federated data silos is that, with all these different datasets, some might have missing values, even completely missing attributes,” said Stripelis.
Missing values are a critical problem in federated learning and in machine learning in general. Stripelis explained, “Often, if a single value is missing from a record, we need to either drop the record altogether or impute (fill in) the missing value.”
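The two options Stripelis mentions can be compared in a short Python sketch. The records are hypothetical, and mean imputation is used only as the simplest possible fill-in strategy; it is an assumption for illustration, not the imputation technique the team proposes.

```python
from statistics import mean
from typing import List


def drop_incomplete(records: List[dict], field: str) -> List[dict]:
    """Option 1: discard any record where the field is missing."""
    return [r for r in records if r.get(field) is not None]


def impute_mean(records: List[dict], field: str) -> List[dict]:
    """Option 2: fill missing values with the mean of the observed ones."""
    observed = [r[field] for r in records if r.get(field) is not None]
    fill = mean(observed)
    return [
        {**r, field: r[field] if r.get(field) is not None else fill}
        for r in records
    ]


# Hypothetical records; one patient has no recorded weight.
records = [
    {"id": 1, "weight_kg": 70.0},
    {"id": 2, "weight_kg": None},
    {"id": 3, "weight_kg": 80.0},
]
dropped = drop_incomplete(records, "weight_kg")
imputed = impute_mean(records, "weight_kg")
```

Dropping loses a third of this tiny dataset, while imputing keeps every record at the cost of filling in an estimated value.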
The ISI team proposes an architecture to address this.
The Proposed Architecture
“We propose to solve this problem through principled data integration and imputation techniques, so that learning can be done over data that ‘makes sense.’ This is an exciting and ambitious vision that we wanted to share with the research community, but the work is still in progress, so we are presenting it first in a workshop,” said Stripelis.
The team will present the FLINT architecture at the upcoming 2023 International Workshop on Health Intelligence (W3PHIAI 2023). This year marks the seventh annual W3PHIAI, focused on “Saving Lives with AI.”
The workshop program is an extension of the 37th AAAI Conference on Artificial Intelligence. Run by the largest professional organization in the field, the Association for the Advancement of Artificial Intelligence (AAAI), this conference aims to promote research in artificial intelligence and scientific exchange among AI researchers, practitioners, scientists, and engineers in affiliated disciplines.
Published on February 14th, 2023
Last updated on February 14th, 2023