Intelligent Systems Division


ISI computer scientist Mayank Kejriwal has won the 2017 Semantic Web Science Association (SWSA) Distinguished Dissertation Award.

His dissertation, "Populating a Linked Data Entity Name System," tackles an important information integration problem called entity resolution: the science of connecting people, places and things by linking or grouping records that refer to the same real-world entity.

For example, how many email addresses do you have? How many spellings of your name exist? How many addresses have you lived at in your lifetime?

Uniting and resolving numerous data elements into a single holistic view is a complex and tedious task for machines. Yet it has important applications in numerous areas, including government and public health, law enforcement, web searches and comparison shopping.

Kejriwal's dissertation describes a set of algorithms to automatically populate an Entity Name System (ENS), a core data integration component, which could speed up the process significantly.

"We want to automate solutions to this problem because the web contains far too much data for manual annotations to be practical," says Kejriwal, who earned his PhD in computer science at the University of Texas at Austin.

"I showed that my unsupervised algorithms approached the performance of supervised systems, and in some cases, outperformed them."

These algorithms are now helping hundreds of law enforcement officials combat human trafficking as part of the Domain-specific Insight Graph (DIG) system, funded by ISI's DARPA MEMEX project and led by Kejriwal's supervisor, Pedro Szekely.

In addition, Kejriwal is collaborating with social scientists to analyze a corpus of more than 100 million human trafficking webpages, using scientific tools to measure the extent of the problem.

"Without reliable entity resolution, we cannot do even simple operations like counting, since we could end up counting an entity, in this case a trafficked worker, multiple times," says Kejriwal.
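To make the counting problem concrete, here is a minimal sketch in Python. It is not Kejriwal's method: the dissertation's algorithms are unsupervised and far more sophisticated. This toy version assumes that two records refer to the same entity whenever they share a phone number or an email address, and the records, the field names and the `resolve_entities` function are all hypothetical.

```python
from collections import defaultdict


def resolve_entities(records):
    """Group record indices that share a phone or email value.

    Toy matching rule (an assumption for illustration only): any exact,
    case-insensitive match on an identifier field links two records.
    Links are merged transitively with a small union-find structure.
    """
    parent = list(range(len(records)))

    def find(i):
        # Follow parent pointers to the cluster root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_j] = root_i

    first_seen = {}  # (field, normalized value) -> index of first record
    for idx, record in enumerate(records):
        for field in ("phone", "email"):
            value = record.get(field)
            if not value:
                continue
            key = (field, value.strip().lower())
            if key in first_seen:
                union(first_seen[key], idx)
            else:
                first_seen[key] = idx

    clusters = defaultdict(list)
    for idx in range(len(records)):
        clusters[find(idx)].append(idx)
    return list(clusters.values())


# Hypothetical records: the first three describe the same person.
records = [
    {"name": "Jane Doe", "phone": "555-0100", "email": "jane@example.com"},
    {"name": "J. Doe", "phone": "555-0100", "email": None},
    {"name": "Jane D.", "phone": None, "email": "JANE@example.com"},
    {"name": "John Roe", "phone": "555-0199", "email": "john@example.com"},
]

clusters = resolve_entities(records)
naive_count = len(records)      # counts every record as a separate entity
resolved_count = len(clusters)  # counts each resolved entity once

print(naive_count, resolved_count)
```

In this sketch the naive count treats every record as a distinct person, while the resolved count collapses the three "Jane" records into one entity. Real data rarely offers such clean shared identifiers, which is why automated, learned approaches to entity resolution matter at web scale.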

The approach also works on datasets from numerous domains, spanning social media, movies, books, publications and locations, and including large-scale knowledge bases containing millions of facts and entities.

Contributions from Kejriwal's dissertation have led to at least eight peer-reviewed publications in several top-tier Semantic Web and knowledge discovery conferences and journals, including the Institute of Electrical and Electronics Engineers (IEEE) International Conference on Data Mining (ICDM), the International Semantic Web Conference (ISWC), the Extended Semantic Web Conference (ESWC), the Journal of Web Semantics (JWS), the Association for the Advancement of Artificial Intelligence (AAAI) conference and the IEEE Big Data Conference.

Kejriwal's dissertation was also published as a book in IOS Press's Studies in the Semantic Web series. He is currently co-authoring a textbook about knowledge graphs with his ISI supervisors Pedro Szekely and Craig Knoblock, which is slated to be published by MIT Press in 2018.

About the Semantic Web Science Association (SWSA) Distinguished Dissertation Awards

The 2017 Semantic Web Science Association (SWSA) Distinguished Dissertation Award, which includes a prize of 1,000 euros, will be presented at the International Semantic Web Conference (ISWC), to be held in Vienna, Austria, October 21-25, 2017.

The award recognizes dissertations that present innovative research results related to semantics, data and the web. Winners are selected based on the originality, significance and impact of their work, including publications in highly selective conferences and journals.