Finding Answers (About the Best Way to Find Answers)

USC computer scientists evaluate data representation methods for a number of applications.

by Julia Cohen

November 6, 2023

Knowledge graphs (KGs) are all around us. They’re used to answer a Google query, recommend a show on Netflix, or suggest a Facebook friend. They do this by capturing and organizing data in a way that is easy to understand and access. So, for example, a suggestion like “Customers who bought this also bought…” is powered by algorithms analyzing your past purchases, browsing history, and similar users’ preferences using a KG, and as a result you get to discover products tailored to you.

KGs consist of entities (such as people, places, and things) and the relationships between them. These relationships are represented as edges connecting the entities, forming a graph-like structure. There are different ways to do this, resulting in different types of KG representations. A person who wants to use a KG in their application therefore has options to choose from. Determining which type to use, however, is a bit of a mystery: there is currently no consensus on which type of KG representation best supports each use case.

It was with that in mind that computer scientist Jay Pujara, a research assistant professor of Computer Science at the USC Viterbi School of Engineering and the director of the Center on Knowledge Graphs at the USC Information Sciences Institute (ISI), along with a team of researchers, set out to find answers.

Does the Task Dictate the Tool?

“Many different KG representations have been proposed, but no one knows which are the best for humans who are exploring knowledge, writing queries, or building machine learning models,” said Pujara. These are the three tasks the researchers focused on based on applications they see in the real world:

Exploring knowledge. “Like when you browse Wikipedia, you’re just getting a sense of what’s going on at a high level. That’s the data exploration problem,” said Pujara. To assess this, the researchers ran a user study where participants interacted with a web browser and query to determine the KG type that resulted in the best answers to real-world questions.
Writing queries. The second use-case is “when you have some sort of report you want to get or set of knowledge that you need, and you want to write a query that gets you that knowledge,” Pujara explained. The researchers tested several queries using synthetic and real-world data to compare the KGs.
Building machine learning models. “If you want to take all this data and package it in a way that some machine learning model can use it and make predictions,” that’s the third scenario, said Pujara. To look at this, the researchers trained a selection of machine learning models using different KG representations and evaluated the results.

Which Representation Is Up to the Task?

The researchers looked at how four types of knowledge graph representations performed across the three tasks. “Basically, there was not a clear winner,” said Pujara. “This is not a situation where you can say a certain type of representation is always best for a certain type of task.”

However, there was a stand-out, a type of KG representation called Qualifiers. This is a method of representing data that assigns information to the edges connecting the entities, allowing it to convey additional facts and making it good for complicated knowledge. Pujara said, “One of the findings was that Qualifiers — which is what the popular public knowledge graph Wikidata uses — seem to work well in all scenarios, but there’s still a case where each of these proposed representations might have some benefit.”
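To make the idea of attaching information to edges concrete, here is a minimal Python sketch. It is an illustration only, not the researchers' code and not Wikidata's actual data model: the `Statement` class, the `find` helper, and all names and values in the toy data are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Statement:
    """One edge of a toy knowledge graph, with extra facts attached to it."""
    subject: str
    relation: str
    obj: str
    # Qualifiers describe the edge itself (when, in what role, and so on).
    qualifiers: dict = field(default_factory=dict)


# Hypothetical example data; every name and value here is made up.
statements = [
    Statement("Alice", "employed_by", "ExampleCorp",
              {"start_year": 2019, "role": "engineer"}),
    Statement("Alice", "employed_by", "SampleLabs",
              {"start_year": 2022, "role": "manager"}),
]


def find(statements, subject, relation, **quals):
    """Return objects of matching edges whose qualifiers equal the given values."""
    return [
        s.obj
        for s in statements
        if s.subject == subject
        and s.relation == relation
        and all(s.qualifiers.get(key) == value for key, value in quals.items())
    ]


print(find(statements, "Alice", "employed_by"))
print(find(statements, "Alice", "employed_by", role="manager"))
```

Without the qualifiers, the two edges would only say that Alice was employed by two organizations. With them, a query can tell the two apart by role or start year, which is the kind of additional detail that suits complicated knowledge.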

What’s Next?

The team is presenting their paper, Comparison of Knowledge Graph Representations for Consumer Scenarios, at the 2023 International Semantic Web Conference (ISWC) in Athens, Greece, November 6 – 10, 2023. ISWC is the premier venue for presenting fundamental research, innovative technology, and applications concerning semantics, data, and the Web.

Alongside Pujara, the team consisted of Ana Iglesias-Molina, a visiting student from Universidad Politécnica de Madrid (UPM) in Spain, and Kian Ahrabian, a USC Computer Science Ph.D. student in Pujara’s group at ISI. The work was done in collaboration with Iglesias-Molina’s ISI mentor Filip Ilievski, a research lead at ISI and research assistant professor at the USC Viterbi School of Engineering, and Oscar Corcho, Iglesias-Molina’s UPM supervisor.

Last updated on May 16th, 2024