WWW'22 Tutorial: KGTK: Tools for Creating and Exploiting Large Knowledge Graphs

The Knowledge Graph Toolkit (KGTK) is a comprehensive framework for the creation and exploitation of large KGs, designed for simplicity, scalability, and interoperability. KGTK represents KGs in tab-separated (TSV) files with four columns: edge-identifier, head, edge-label, and tail. All KGTK commands consume and produce KGs represented in this simple format, so they can be composed into pipelines to perform complex transformations on KGs. The simplicity of its data model also allows KGTK operations to be easily integrated with existing tools, like Pandas or graph-tool. KGTK provides a suite of commands to import Wikidata, RDF (e.g., DBpedia), and popular graph representations into the KGTK format. A rich collection of transformation commands makes it easy to clean, union, filter, and sort KGs, while the KGTK graph combination commands support efficient intersection, subtraction, and joining of large KGs. Its advanced functionality includes a variant of the Cypher query language (called “Kypher”), which has been optimized for querying KGs stored on disk with minimal indexing overhead; graph analytics commands that support scalable computation of centrality metrics such as PageRank, degrees, connected components, and shortest paths; lexicalization of graph nodes; and computation of multiple variants of text and graph embeddings over the whole graph. In addition, a suite of export commands supports the transformation of KGTK KGs into commonly used formats, including the Wikidata JSON format, RDF triples, JSON documents for ElasticSearch indexing, and graph-tool. Finally, KGTK allows browsing locally stored KGs using a variant of SQID, and includes a development environment using Jupyter notebooks that provides seamless integration with Pandas. KGTK can process Wikidata-sized KGs, with billions of edges, on a laptop computer.
We have used KGTK in multiple settings, focusing primarily on the construction of subgraphs of Wikidata, analysis of over 300 Wikidata dumps since the inception of the Wikidata project, linking tables to Wikidata, construction of a consolidated commonsense KG combining multiple existing sources, creation of an extension of Wikidata for food security, and creation of an extension of Wikidata for the pharmaceutical industry.
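As a rough illustration of the four-column edge format described above, the following Python sketch reads a small TSV edge table and applies two simple operations, a label filter and an out-degree count. It uses only the standard library and does not call KGTK itself. The header names (`id`, `node1`, `label`, `node2`), the sample edges, and the helper functions are assumptions made for this example.

```python
import csv
import io
from collections import Counter

# Hypothetical sample edges. The four columns play the roles of
# edge-identifier, head, edge-label, and tail; the header names are assumed.
SAMPLE_TSV = (
    "id\tnode1\tlabel\tnode2\n"
    "e1\talice\tknows\tbob\n"
    "e2\talice\tworks_at\tacme\n"
    "e3\tbob\tknows\tcarol\n"
)


def read_edges(text):
    """Parse tab-separated edge rows into a list of dicts keyed by column name."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    return list(reader)


def filter_by_label(edges, label):
    """Keep only the edges whose edge-label column equals `label`."""
    return [edge for edge in edges if edge["label"] == label]


def out_degrees(edges):
    """Count the outgoing edges of each head node."""
    return Counter(edge["node1"] for edge in edges)


edges = read_edges(SAMPLE_TSV)
knows_edges = filter_by_label(edges, "knows")
degrees = out_degrees(edges)
```

Because every step takes and returns the same list of four-field edges, the steps can be chained in any order, which is the property that lets commands over this format be composed into pipelines.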

Presenter names: Filip Ilievski, Daniel Garijo, Hans Chalupsky, Pedro Szekely

WWW'22 Tutorial Page

ISWC 2021 Tutorial: KGTK: Tools for Creating and Exploiting Large Knowledge Graphs

Knowledge Graphs (KGs) have become the de facto method for representing, sharing, and using knowledge, but exploiting KGs in AI applications is challenging for most researchers and developers, as it requires knowledge of a variety of approaches, tools, and formats. Our tutorial will showcase the Knowledge Graph Toolkit (KGTK), a comprehensive framework for creating and exploiting large KGs such as Wikidata. KGTK is designed for ease of use, scalability, and speed, and can process Wikidata-size KGs on a laptop. In the first half of the tutorial, we will introduce and experiment with a wide range of import, curation, transformation, analysis, and export commands, which can be flexibly chained into streaming pipelines through the command line. In the second half, we will show its applicability to three common and diverse KG use cases. This tutorial will introduce AI researchers and practitioners to effective tools for addressing a wide range of KG creation and exploitation use cases, and inform us on how to bring KGTK closer to its users.

ISWC 2021 Tutorial Page

KDD 2021 Tutorial: From Tables to Knowledge: Recent Advances in Table Understanding

A wealth of human knowledge is expressed in structured tables, across web pages, scientific articles, spreadsheets, and databases. This wealth of knowledge is mirrored by diversity in the vast number of layout structures, content types, formats, and surface forms used to express tables. Recent advances in representation learning and knowledge representation have made progress in exploiting structural regularities in tabular data to unlock this knowledge. In this tutorial, we provide a survey of these advances for a host of table understanding tasks, including table segmentation, semantic typing of cells, transforming tables to knowledge graphs, entity linking, and table retrieval tasks for question answering.

KDD 2021 Tutorial Page

AAAI 2021 Tutorial: Commonsense Knowledge Acquisition and Representation

The tutorial consists of four main components, each covered by one of the presenters, followed by a discussion session. We start by introducing theories on an axiomatization of commonsense knowledge. Next, we cover efforts to harmonize nodes and relations across heterogeneous commonsense sources, as well as the impact of such consolidation on downstream reasoning tasks. Third, we discuss how commonsense knowledge can be automatically extracted from text, as well as quantitatively and qualitatively contextualized. Finally, we discuss how large-scale models, such as BERT, GPT-2, and T5, learn to implicitly represent an abundance of commonsense knowledge from reading the web, and how this knowledge can be extracted through carefully designed language prompting or through fine-tuning on knowledge graph tuples. We conclude the tutorial with a discussion of the way forward, and propose to combine language models, knowledge graphs, and axiomatization in the next generation of commonsense reasoning techniques. Prior knowledge expected from participants is minimal. Some knowledge of machine learning and language modeling is helpful, but not compulsory: we introduce relevant machine learning concepts so that everyone has an opportunity to follow along.

AAAI 2021 Tutorial Page

ASONAM 2020 Tutorial: Knowledge Graphs: A Practical Introduction across Disciplines

Knowledge Graphs (KGs) like Wikidata, NELL and DBpedia have recently played instrumental roles in several machine learning applications, including search and information retrieval, natural language processing, and data mining. The simplest definition of a KG is as a directed, labeled multi-network. Yet, despite being ubiquitous in the communities mentioned above, KGs have not received much research attention in the network science and social network communities. With the rapid rise in Web data, there are interesting opportunities to construct domain-specific knowledge graphs, including over social media data. This tutorial provides a detailed and rigorous introduction to KGs, and a synthesis of KG research and applications in multiple areas of computer science and AI, including e-commerce, social media analytics and biology.
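To make the "directed, labeled multi-network" definition concrete, here is a minimal Python sketch. It is an illustration only: the class name, its methods, and the sample nodes are hypothetical and are not taken from the tutorial. The point it shows is that edges have a direction, carry a label, and that two nodes may be connected by several edges with different labels.

```python
from collections import defaultdict


class LabeledMultiDigraph:
    """Toy directed multigraph whose edges carry labels (hypothetical helper)."""

    def __init__(self):
        # Maps each head node to a list of (label, tail) pairs.
        # A list is used so that parallel edges between the same nodes are kept.
        self._out = defaultdict(list)

    def add_edge(self, head, label, tail):
        """Add one directed edge head -[label]-> tail."""
        self._out[head].append((label, tail))

    def labels_between(self, head, tail):
        """Return the labels of all edges going from head to tail, in insertion order."""
        return [label for label, target in self._out.get(head, []) if target == tail]


g = LabeledMultiDigraph()
g.add_edge("alice", "knows", "bob")
g.add_edge("alice", "works_with", "bob")  # second edge between the same pair
g.add_edge("bob", "knows", "carol")
```

Here `g.labels_between("alice", "bob")` returns both labels, while `g.labels_between("bob", "alice")` returns an empty list because the edges are directed.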

ASONAM 2020 Slides

ISWC 2020 Tutorial: Common Sense Knowledge Graphs (CSKGs)

Commonsense reasoning is an important aspect of building robust AI systems and is receiving significant attention in the natural language understanding, computer vision, and knowledge graphs communities. At present, a number of valuable commonsense knowledge sources exist, with different foci, strengths and weaknesses. Our tutorial will survey the most important commonsense knowledge resources, and introduce a new commonsense knowledge graph (CSKG) to integrate several existing resources. The tutorial will also introduce several tools to work with CSKG, including query mechanisms, knowledge graph embeddings, and a framework to create commonsense question answering systems. In a hands-on session, participants will use the framework and tools to build a question answering application using CSKG and language models.

ISWC 2020 Tutorial Page

Mining Knowledge Graphs from Text

Knowledge graphs have become an increasingly crucial component in machine intelligence systems, powering ubiquitous digital assistants and inspiring several large-scale academic projects across the globe. Our tutorial explains why knowledge graphs are important, how knowledge graphs are constructed, and where new research opportunities exist for improving the state of the art. In this tutorial, we cover the many sophisticated approaches that complete and correct knowledge graphs.

Tutorial Page

WWW 2018 Tutorial: Scalable Construction and Querying of Massive Knowledge Bases

In today's computerized and information-based society, people are inundated with vast amounts of text data, ranging from news articles, social media posts, and scientific publications to a wide range of textual information from various vertical domains (e.g., corporate reports, advertisements, legal acts, medical reports). How to turn such massive and unstructured text data into structured, actionable knowledge, and how to enable effective and user-friendly access to such knowledge, is a grand challenge for the research community.

WWW 2018 Tutorials Page

AAAI 2018 Tutorial: Knowledge Graph Construction from Web Corpora

Knowledge Graphs (KGs) like Wikidata, NELL and DBpedia have recently played instrumental roles in several machine learning applications, including search and information retrieval, information extraction, and data mining. Constructing knowledge graphs is a difficult problem typically studied for natural language documents. With the rapid rise in Web data, there are interesting opportunities to construct domain-specific knowledge graphs over corpora that have been crawled or acquired through techniques like focused crawling. In this tutorial, we survey the techniques for knowledge graph construction from domain-specific Web corpora.

AAAI 2018 Tutorials Page

KDD 2017 Tutorial: Data mining in unusual domains with information-rich knowledge graph construction, inference and search

The growth of the Web is a success story that has spurred much research in knowledge discovery and data mining. Data mining over Web domains that are unusual is an even harder problem. There are several factors that make a domain unusual. In particular, such domains have significant long tails and exhibit concept drift, and are characterized by high levels of heterogeneity. Notable examples of unusual Web domains include both illicit domains, such as human trafficking advertising, illegal weapons sales, counterfeit goods transactions, patent trolling and cyberattacks, and also non-illicit domains such as humanitarian and disaster relief. Data mining in such domains has the potential for widespread social impact, and is also very challenging technically. In this tutorial, we provide an overview, using demos, examples and case studies, of the research landscape for data mining in unusual domains, including recent work that has achieved state-of-the-art results in constructing knowledge graphs in a variety of unusual domains, followed by inference and search using both command line and graphical interfaces.

KDD 2017 Tutorials Page

ISWC 2017 Tutorial: Constructing Domain-specific Knowledge Graphs (KGC)

The vast amounts of ontologically unstructured information on the Web, including semi-structured HTML, XML and JSON documents, natural language documents, tweets, blogs, markups, and even structured documents like CSV tables, all contain useful knowledge that can present a tremendous advantage to Semantic Web researchers if extracted robustly, efficiently and semi-automatically as an RDF knowledge graph. Domain-specific Knowledge Graph Construction (KGC) is an active research area that has recently witnessed impressive advances due to machine learning techniques like deep neural networks and word embeddings. This tutorial will synthesize and present KGC techniques, especially information extraction (IE), in a manner that is accessible to Semantic Web researchers. The presenters of the tutorial will use their experience as instructors and Semantic Web researchers, as well as lessons from actual IE implementations, to accomplish this purpose through visually intuitive and example-driven slides, accessible high-level overviews of related work, instructor demos, and at least five participatory IE activities that attendees will be able to set up on their laptops.

ISWC 2017 Tutorials Page