FreebaseTools Overview

FreebaseTools is a small, lightweight toolkit to pre-process, filter, index and store the Freebase knowledge base in a fast and relatively "small" Lucene index. It can also handle KB variants such as BaseKB Gold, which is used as the reference KB for NIST's TAC-KBP entity discovery and linking (EDL) evaluations.

Main Features


How It Works

Index generation proceeds in two phases:
  1. Pre-processing and shrinking: this phase abbreviates and normalizes URIs so that they can be easily referenced in Lucene queries, and ignores unwanted triples (customized by a number of "ignore-*" files).
  2. Indexing: the normalized, shrunk and sorted triples file gets indexed via Lucene. Each subject and all of its predicates become a Lucene document. Each predicate and its values become stored fields in the document. A small, customizable set of those fields is also indexed to allow efficient querying for names and variants, descriptions, type fields, etc. Lucene efficiently compresses all field data, which results in a relatively small index size.
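
The two phases can be illustrated with a minimal Python sketch. It is not the toolkit's code and it does not use Lucene: plain dictionaries stand in for Lucene documents, and the "fb:" abbreviation, the ignored predicate, and the sample identifiers (m.0001, m.0002) are all assumptions made up for illustration. The toolkit's actual abbreviation scheme and ignore-file contents may differ.

```python
from itertools import groupby

# Assumed namespace and abbreviation; the toolkit's real scheme may differ.
FREEBASE_NS = "http://rdf.freebase.com/ns/"
ABBREV = "fb:"

# Hypothetical stand-in for the contents of an ignore-* file.
IGNORED_PREDICATES = {"fb:type.object.key"}


def abbreviate(term):
    """Shorten <http://rdf.freebase.com/ns/X> to fb:X; leave other terms unchanged."""
    if term.startswith("<" + FREEBASE_NS) and term.endswith(">"):
        return ABBREV + term[len(FREEBASE_NS) + 1:-1]
    return term


def preprocess(lines):
    """Phase 1: normalize URIs, drop ignored predicates, sort triples by subject."""
    triples = []
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) < 3:
            continue  # skip malformed lines
        subj, pred, obj = (abbreviate(p) for p in parts[:3])
        if pred in IGNORED_PREDICATES:
            continue
        triples.append((subj, pred, obj))
    return sorted(triples)


def to_documents(sorted_triples):
    """Phase 2: one document-like dict per subject, one multi-valued field per predicate."""
    docs = []
    for subj, group in groupby(sorted_triples, key=lambda t: t[0]):
        fields = {}
        for _, pred, obj in group:
            fields.setdefault(pred, []).append(obj)
        docs.append({"subject": subj, "fields": fields})
    return docs


# Made-up sample triples in tab-separated N-Triples style.
NS = "<" + FREEBASE_NS
sample = [
    NS + "m.0002>\t" + NS + "type.object.name>\t\"Example Two\"@en\t.",
    NS + "m.0001>\t" + NS + "type.object.name>\t\"Example One\"@en\t.",
    NS + "m.0001>\t" + NS + "type.object.key>\t\"/en/example_one\"\t.",
    NS + "m.0001>\t" + NS + "common.topic.alias>\t\"Ex One\"@en\t.",
]

docs = to_documents(preprocess(sample))
for doc in docs:
    print(doc["subject"], sorted(doc["fields"]))
```

Sorting before grouping matters here: because all triples for a subject are adjacent after the sort, each document can be assembled in a single pass without holding the whole knowledge base in memory at once (the sketch sorts in memory only to stay short).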

Pre-built Indexes

As of version 1.2.0, a couple of pre-built indexes are available for download so you can get started right away without having to build your own. See the README file for how to download and install them.

System Requirements

Java 1.7 and the supporting libraries, which are already provided in the lib directory. Different versions might work as well, but your mileage may vary. All code and scripts have been developed and tested under Linux only (specifically openSUSE). They should generally work on MacOS as well but will definitely require adaptation for Windows.

If you want to use the Python interface, you will need Python 2.7 and a compatible version of the jnius package.



Installation and Use

See the README file.

Questions, Suggestions and Comments

Please send any questions or comments to Hans Chalupsky (hans AT isi . edu).