Comparing Data While Keeping It Private

by Julia Cohen

Published on August 12th, 2022

Last updated on August 12th, 2022

Tanmay Ghai, a research engineer in the Networking and Cybersecurity Division at USC’s Information Sciences Institute (ISI), and recent ISI alumnus, is the recipient of the 2022 Viterbi Master’s Student Award for Best Research in the Computer Science Department. 

The Viterbi Master’s Student Awards recognize graduating master’s degree students across all eight departments of USC’s Viterbi School of Engineering for excellence in research, service, leadership and academics. Ghai won the Best Research Award for his work in privacy-preserving entity resolution.

The importance of preserving privacy in entity resolution

In his research, Ghai studied how to keep data private while resolving entities and identifying relationships between them across datasets. For example, suppose your hospital and your bank each hold records about you: health records in one database, financial records in the other. Entity resolution is the task of recognizing and linking the records that refer to you. It’s a surprisingly difficult problem, made more difficult by the fact that data like this is often highly sensitive.

Ghai explained, “In the example, comparing data between a hospital and a bank would obviously require them to share the data with each other so they could perform the comparison. This is a privacy concern because now we are leaking information from one entity to another, and while in certain cases it may just be a ‘name’ or ‘username,’ in more complicated scenarios it could be an address, social security number, or even a bank account number among other possibilities.”

Keeping highly sensitive data private adds a level of difficulty to entity resolution because the data must be obfuscated to preserve privacy, making similarity comparisons difficult and costly. This is especially the case for “fuzzy” or “approximate” matching – matching that accounts for differences in naming conventions and formats.  
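To make “fuzzy” matching concrete, here is a minimal sketch in Python of one common approach: comparing two names by the Jaccard similarity of their character bigrams. It is an illustration only, not the method described in the AMPPERE paper. The function names, the bigram choice, and the 0.5 threshold are all assumptions made for this example, and the comparison runs on plain, unprotected text, which is exactly what a privacy-preserving approach has to avoid.

```python
def char_ngrams(text, n=2):
    """Return the set of character n-grams of a normalized string."""
    # Normalize: lowercase and keep only letters and digits, so that
    # spacing and punctuation differences do not affect the comparison.
    normalized = "".join(ch for ch in text.lower() if ch.isalnum())
    return {normalized[i:i + n] for i in range(len(normalized) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity: size of the intersection over size of the union."""
    if not a and not b:
        return 1.0  # two empty sets are treated as identical
    return len(a & b) / len(a | b)


def is_match(left, right, threshold=0.5):
    """Hypothetical matching rule: similarity at or above the threshold."""
    return jaccard(char_ngrams(left), char_ngrams(right)) >= threshold


if __name__ == "__main__":
    print(is_match("Jon Smith", "John Smith"))   # likely the same person
    print(is_match("Jon Smith", "Mary Jones"))   # likely different people
```

Even this simple rule needs both parties’ values side by side in readable form. Once the values are obfuscated, the same set intersection and union have to be computed without revealing the underlying strings, which is where the extra difficulty and cost come from.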

A novel methodology: AMPPERE

Tanmay Ghai holding his Viterbi award

In their paper, Ghai and his co-authors presented a methodology called AMPPERE: A Universal Abstract Machine for Privacy-Preserving Entity Resolution Evaluation. It’s a computational model that uses similarity measurements and privacy tools. By implementing AMPPERE with two different privacy tools over real-world datasets, they showed that two parties can perform entity resolution over their data without leaking sensitive information.

Ghai was pleased with the level of success AMPPERE achieved. “Perhaps the most surprising outcome was the ability to be as accurate as non-privacy-preserving entity resolution algorithms with our two implementations while preserving sensitive information from being leaked.” 

From the start, it was important for the research team to make AMPPERE universal and platform-agnostic so that it can be used in a variety of applications, and for further research. “Our abstract model should be able to support many possible and promising new directions, and as the privacy tools we incorporated become more efficient, solving privacy-preserving entity resolution will become even more computationally feasible,” said Ghai. 

Ghai is honored to be recognized for his research

“I am truly humbled and grateful to win the Viterbi Graduate Award for best research and would like to deeply thank my advising professor Prof. Srivatsan Ravi for his guidance and support throughout my research journey at USC. I started at USC in the midst of the pandemic and when I look back, the community that I found at ISI, tackling cutting-edge problems, was one that truly fostered a new passion of mine — research. I am excited to keep contributing to the ongoing efforts at ISI in the Networking and Cybersecurity Division, continuing to work at the intersection of distributed and secure systems and machine learning.” 

This is not the first time Ghai has received recognition for this research. In 2021, the AMPPERE paper was published at the 30th ACM International Conference on Information and Knowledge Management (CIKM), a top-tier international conference on information and knowledge management and recent advances in data and knowledge bases. The conference’s acceptance rate that year was 21.7 percent.

Ghai and his collaborators are continuing their research in privacy-preserving entity resolution. They currently have a new paper on the subject out for review at this year’s edition of CIKM and are planning on future directions involving other privacy tools and their applications in the entity resolution space.

Published on August 18th, 2022

Last updated on August 18th, 2022
