Extracting and Aligning Quantitative Data with Text

Friday, April 27, 2018, 3:00 pm - 4:00 pm PDT
Conf. Rm #1135
This event is open to the public.
NL Seminar
Jay Pujara (USC/ISI)

Abstract: Quantitative data, such as time series and numerical attribute data, often play a crucial role in understanding the world and validating factual statements. Unfortunately, quantitative datasets are often expressed in diverse formats that exhibit significant variation, posing difficulties for machine reading approaches. Furthermore, the scant context that accompanies these data often makes it difficult to relate the quantitative data to broader ideas. Finally, the vast amount of quantitative data makes it difficult for humans to find, understand, or access. In this talk, I highlight my recent work, which focuses on developing general approaches to extracting quantitative data from structured sources, creating high-level descriptions of these sources, and aligning quantitative data with textual and ontological labels.

Bio: Jay Pujara is a research scientist at the University of Southern California's Information Sciences Institute whose principal areas of research are machine learning, artificial intelligence, and data science. He completed a postdoc at UC Santa Cruz, earned his PhD at the University of Maryland, College Park, and received his MS and BS at Carnegie Mellon University. Prior to his PhD, Jay spent six years at Yahoo! working on mail spam detection, and he has also worked at Google, LinkedIn, and Oracle. Jay is the author of over thirty peer-reviewed publications and has received three best paper awards for his work. He is a recognized authority on knowledge graphs and has organized the Automatic Knowledge Base Construction (AKBC) and Statistical Relational AI (StaRAI) workshops, presented tutorials on knowledge graph construction at AAAI and WSDM, and had his work featured in AI Magazine. For more information, visit https://www.jaypujara.org
