Identity of long-tail entities in text: a knowledge perspective

Thursday, August 29, 2019, 11:00 am - 12:00 pm PDT
10th floor conference room (1016)
This event is open to the public.
AI Seminar
Filip Ilievski, Vrije Universiteit (VU) Amsterdam

Entity linking systems face a complex M-to-N mapping between surface forms in text and instances in a knowledge base, caused by the ambiguity of surface forms, the variance of the instances, and the interplay between their frequency and popularity, as explained by pragmatic principles such as the Gricean maxims (Grice, 1975). Although current entity linkers report high accuracy scores, in this talk I will describe phenomena that reveal large differences in performance between ‘head’ and ‘tail’ entities. To improve performance on tail entities, I will argue that we need to revisit evaluation (part I) and to employ knowledge, and reason over it, in a more systematic way (part II).

During the first half, I will show how current evaluation datasets and the metrics employed obfuscate the difference between head and tail, and discourage focus on tail entities. I will then propose recommended actions and examples for long-tail-focused evaluation.

In the second half of my talk, I will present our efforts to generate expectations on long-tail entities through building neural profiling machines on top of background knowledge from Wikidata. In addition to an intrinsic evaluation, these profiling techniques are evaluated extrinsically on clustering NIL entities. I will discuss how an extension of this work can be used to capture commonsense knowledge and act as an active component in future reading machines.


Filip Ilievski is a Postdoctoral Researcher in Natural Language Processing at Vrije Universiteit (VU) Amsterdam, and is closely affiliated with the Knowledge Representation and Reasoning group at the same university. His research investigates how systematic and extensive use of knowledge can help machines deal with the ‘long tail’ (knowledge scarcity and ambiguity) of human communication, integrating ideas from Information Extraction, Knowledge Graphs, and Machine Learning.

He developed LOTUS, the largest publicly available index over the Linked Data cloud at the time, which received an award at the Semantics conference in 2016. Later, he collaborated with Prof. Ed Hovy at CMU on building neural generalization models (‘profiling machines’) over Linked Data knowledge and applying them to cluster long-tail entities. As part of his research on measuring and reducing biases in NLP evaluations, he co-organized a SemEval competition on ‘Counting Events and Participants in the Long Tail’ in 2018.

Filip Ilievski has authored more than 20 publications on these topics in peer-reviewed international journals and conference proceedings, including COLING, ESWC, and SWJ.
