This document describes FreebaseTools 1.2.0 or later.

Overview
========

FreebaseTools is a small toolkit to pre-process, filter, index and store Google's Freebase knowledge base in a fast and relatively "small" Lucene index. KB variants such as BaseKB Gold, which is used as the reference KB for TAC-KBP, can also be handled.

Main Features:

- significant size reduction: reduces the official Google RDF dump from about 3B triples to around 880M (or from 400GB uncompressed to about 60GB)
- relatively small deployment size: the English-only view of TAC-KBP's BaseKB Gold can be stored in an 11.5 GB index directory, which can be searched efficiently on a standard desktop machine without huge amounts of memory
- powerful ranked Lucene free-text and fuzzy search on Freebase's description/name/label/etc. text fields, combined with hard restrictions on fields such as types
- simple command-line API to explore the data and extract relevant views (e.g., all subjects of type person)
- simple Python interface for easy integration and interactive use

Caveats:

- this is not a replacement for a triple store: while many interesting and useful queries can be expressed and executed via Lucene, it is not a database system, and some more complex SPARQL queries can only be emulated via code (which might not be very efficient). There is also no inference.
- the toolkit is an early prototype and does not yet have a fully mature set of features and APIs

Pre-built indexes: as of version 1.2.0, a couple of pre-built indexes are available for download, so you can get started right away without having to build your own. See below on how to download and install them.

Custom index generation proceeds in two phases:

(1) Pre-processing and shrinking: this phase abbreviates and normalizes URIs so that they can be easily referenced in Lucene queries, and ignores unwanted triples (customized by a number of `ignore-*' files).
In particular:

- common namespace prefixes such as http://rdf.freebase.com/ns/ and http://rdf.basekb.com/ns/ become short prefixes such as `f_' or `rs_', etc.
- redundant triples (e.g., and ) are eliminated (BaseKB already omits these)
- triples with string values in ignored languages are eliminated
- triples referencing Wikipedia pages in ignored languages are eliminated
- a number of other predicates that are likely useless for your task, such as ISBN numbers, MusicBrainz track listings, various keys, etc., are eliminated

(2) The normalized, shrunk and sorted triples file gets indexed via Lucene. Each subject and all of its predicates become a Lucene document. Each predicate and its values become stored fields in the document. A small, customizable set of those fields is also indexed to allow efficient querying for names and variants, descriptions, type fields, etc. Lucene efficiently compresses all field data, which results in a relatively small index size.

System Requirements
===================

Java 1.7 and the following libraries, which are already provided in the lib directory:

- lucene-5.2.1/core/lucene-core-5.2.1.jar
- lucene-5.2.1/analysis/common/lucene-analyzers-common-5.2.1.jar
- lucene-5.2.1/queryparser/lucene-queryparser-5.2.1.jar
- args4j/2.0.23/args4j-2.0.23.jar
- openrdf-sesame-2.8.4-onejar.jar

Different versions might work as well, but your mileage may vary.

All code and scripts have been developed and tested under Linux only (specifically openSUSE). They should generally work on MacOS as well, but will definitely require adaptation for Windows.

If you want to use the Python interface, you will need Python 2.7 and a compatible version of the jnius package.
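To make the two phases from the Overview more concrete, here is a small standalone Python 3 sketch (not the toolkit's code, and independent of its Python 2.7 interface). It abbreviates namespace URIs in a few N-Triples lines and then groups the subject-sorted triples into one {predicate: [values]} document per subject, which is the shape of a Lucene document described in phase (2). The prefix table is a hypothetical excerpt inferred from keys such as f_m.0h54qv8, rs_label and r_type in the examples below; the real mapping lives in the fbt-abbrev-*-uris.sh scripts. The line parser assumes simple, single-space-separated triples.

```python
from itertools import groupby

# Hypothetical prefix table, inferred from keys such as f_m.0h54qv8, rs_label
# and r_type in this README; the real mapping is in fbt-abbrev-*-uris.sh.
PREFIXES = {
    "http://rdf.freebase.com/ns/": "f_",
    "http://rdf.basekb.com/ns/": "f_",
    "http://www.w3.org/2000/01/rdf-schema#": "rs_",
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#": "r_",
}


def abbreviate(term):
    """Phase 1: turn <namespace + local name> into prefix + local name."""
    if term.startswith("<") and term.endswith(">"):
        uri = term[1:-1]
        for namespace, short in PREFIXES.items():
            if uri.startswith(namespace):
                return short + uri[len(namespace):]
    return term  # literals and unknown namespaces are left unchanged


def parse_triple(line):
    """Split a simple N-Triples line (single spaces, trailing ' .')."""
    body = line.strip()
    if body.endswith(" ."):
        body = body[:-2]
    subj, pred, obj = body.split(" ", 2)  # the object may contain spaces
    return abbreviate(subj), abbreviate(pred), abbreviate(obj)


def documents(sorted_lines):
    """Phase 2 (grouping only, no Lucene): one {predicate: [values]} per subject.

    groupby only merges adjacent lines, which is why the input must be
    sorted by subject (cf. the *.shrink.sort.gz file names).
    """
    triples = (parse_triple(line) for line in sorted_lines if line.strip())
    for subj, group in groupby(triples, key=lambda t: t[0]):
        doc = {}
        for _, pred, obj in group:
            doc.setdefault(pred, []).append(obj)
        yield subj, doc


example = [
    '<http://rdf.basekb.com/ns/m.0h54qv8> '
    '<http://www.w3.org/2000/01/rdf-schema#label> "Henry Higgins"@en .',
    '<http://rdf.basekb.com/ns/m.0h54qv8> '
    '<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> '
    '<http://rdf.basekb.com/ns/people.person> .',
]
for subject, doc in documents(example):
    print(subject, doc)
```

Running it prints a single document for f_m.0h54qv8 with an rs_label and an r_type entry, mirroring the lookup output shown later in this document.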
Installation
============

- Unpack the tar file somewhere with a good amount of free disk space (50GB or more is recommended); let's call this directory $FBT_HOME.
- If they are not already provided, copy the required Java libraries to the $FBT_HOME/lib directory.

Using Pre-built Indexes
=======================

To get started quickly and experiment right away, you can download the indexes used in the examples below from this location:

    https://drive.google.com/drive/folders/0B5Cp0viUdlRxUnl3M0V2UWV6VzQ

To install them, simply copy the downloaded archive(s) to the $FBT_HOME/data directory and unpack them there. IMPORTANT: first rename any pre-existing index directories there that you might have built yourself, so that you do not lose any information. Unpacking should create one or more of the following directories, depending on which indexes you downloaded:

    $FBT_HOME/data/basekb-gold-jan-2015.shrink.sort.index
    $FBT_HOME/data/basekb-gold-jan-2015.trilingual.shrink.sort.ml.index

Then make sure the LUCENE_INDEX variable in your config.dat file points to the proper index directory (the distribution defaults already use the proper names). At this point you should be able to run queries. Test the index with something like this:

    % ./fbt-lookup.sh -q f_m.0h54qv8 -v | head
    Loading index...
    Run time: setup=638ms, query=40ms, display=0ms
    f_m.0h54qv8:
    f_common.topic.article: f_m.0h54qvd
    f_common.topic.description: "Henry Hugh Higgins was an English botanist, bryologist, geologist, curator and clergyman. He is cited as an authority in scientific classification, as Higgins. He was inspector of the National Schools in Liverpool from 1842 to 1848 and chaplain to the Rainhill Asylum, also in Liverpool. He was also president of the Liverpool Field Naturalists' Club from 1861 to 1881. He especially worked on the Ravenhead collections, almost wholly made up of Upper Carboniferous flora, fish, bivalves and insect remains.
    Higgins had suggested that Ravenhead donate his collections to the Liverpool Museum and the donation gained a home with the construction of the railway in 1870, which exposed two Carboniferous seams known as the Upper and Lower Ravenhead. Most of Liverpool Museum's collections survived the Liverpool Blitz of May 1941 which practically destroyed the Museum itself, but the entire Ravenhead collection was lost in the fire."@en
    f_common.topic.description: "Rev. Henry Hugh Higgins fue un botánico, briólogo, clérigo, geólogo, y curador inglés. Fue inspector de Escuelas Nacionales, de Liverpool, de 1842 a 1848. Desde 1853 a 1886, fue capellán del Asilo Rainhill, también de Liverpool. Trabajó especialmente en las colecciones Ravenhead, compuestas sobre todo de flora del Carbonífero superior Langsettiano, peces y bivalvos y restos de insectos. El colector fue Liverpool Museum voluntarios reverendo Henry Higgins Hugh y la reunión se hizo desde un sitio de recolecta fue con la construcción del ferrocarril en 1870, donde se exponen dos vetas carboníferas conocidas como el Alto y el Bajo Ravenhead. La mayor parte de las colecciones, sobrevivió al bombardeo de mayo 1941 que prácticamente destruyó al museo de Liverpool; mas por desgracia, todo el material de Ravenhead se perdió en el incendio."@es
    f_common.topic.notable_for: f_g.125crzjzl
    f_common.topic.notable_types: f_m.022tfrk
    f_common.topic.topic_equivalent_webpage: http://es.wikipedia.org/wiki/Henry_Hugh_Higgins
    f_common.topic.topic_equivalent_webpage: http://es.wikipedia.org/wiki/index.html?curid=4127666
    f_common.topic.topic_equivalent_webpage: we_Henry_Higgins_(botanist)
    f_common.topic.topic_equivalent_webpage: we_index.html?curid=32997517

Building Custom Indexes
=======================

If the pre-built indexes don't work for your needs, you can build your own using the following process.
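The process consists of two commands per KB: shrinking, then indexing. If you prefer to drive them from a script, the Python 3 sketch below only assembles the command lines used in this README and runs them in order. The function names are hypothetical, the sketch assumes it is run from $FBT_HOME, and it hard-codes the flag set from the examples (-f -o -nn -v) rather than exposing every option.

```python
import glob
import subprocess

# Classpath as in the examples; the quoted 'lib/*' wildcard is expanded by
# the java launcher itself, so it is passed through unchanged here.
JAVA_CP = ".:bin:lib/*"


def shrink_cmd(output, inputs, full=False):
    """Shrinking step; without full=True the script only runs a 1M-triple test batch."""
    return (["./fbt-shrink-freebase.sh"] + (["-f"] if full else [])
            + ["-o", output] + list(inputs))


def index_cmd(triples, index_dir):
    """Indexing step with the same flags as the examples in this README."""
    return ["java", "-cp", JAVA_CP, "edu.isi.kres.FreebaseTools",
            "-T", triples, "-I", index_dir, "-c", "index", "-f", "-o", "-nn", "-v"]


def build_steps(base, inputs):
    """Full shrink + index commands for data/<base>.shrink.sort.{gz,index}."""
    shrunk = "data/%s.shrink.sort.gz" % base
    return [shrink_cmd(shrunk, inputs, full=True),
            index_cmd(shrunk, "data/%s.shrink.sort.index" % base)]


def run(base, pattern, dry_run=True):
    """Expand the input glob, print each step and, unless dry_run, execute it."""
    inputs = sorted(glob.glob(pattern))  # replaces the shell's *.nt.gz expansion
    if not inputs:
        raise FileNotFoundError("no input files match %s" % pattern)
    for cmd in build_steps(base, inputs):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step
```

For example, run("basekb-gold-jan-2015", "/data/LDC2015E42/data/*.nt.gz") only prints the two commands; passing dry_run=False executes them. You should still run and inspect the test batch (without -f) first, as described below.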
Shrinking the Triples File(s)

In the instructions below, we give examples for both the TAC-KBP BaseKB Gold version and the full Google RDF dump. Simply skip the parts for the KB version you are not using.

The first time you run this, you should probably use the default configuration, which eliminates all non-English language information. Once you have successfully created an index and find it useful, you can change the configuration for your own needs (e.g., use the tri-lingual setup, preserve other languages of interest, ignore more or fewer predicates, etc.). If you feel adventurous, you can modify the config.dat file right away to suit your needs (see also the Customization section below).

To shrink the BaseKB Gold version that is used as the TAC-KBP reference knowledge base, run the following commands (substitute the proper LDC KB data directory):

    % cd $FBT_HOME
    % ./fbt-shrink-freebase.sh -o data/basekb-gold-jan-2015.shrink.sort.test.gz /data/LDC2015E42/data/*.nt.gz

This will process a small test batch of 1M triples. You should inspect data/basekb-gold-jan-2015.shrink.sort.test.gz to make sure it looks ok (properly substituted namespaces, filtered triples, ignored languages, etc.). Once you are confident of that, you can run the script on the full data by giving the -f flag:

    % ./fbt-shrink-freebase.sh -f -o data/basekb-gold-jan-2015.shrink.sort.gz /data/LDC2015E42/data/*.nt.gz

This should take approximately 3.5 hours and produce a result file of approximately 7GB.

If you want to work with the full Google RDF dump, run these commands (substitute the proper input data file):

    % cd $FBT_HOME
    % ./fbt-shrink-freebase.sh -o data/freebase-rdf-latest.shrink.sort.test.gz data/freebase-rdf-latest.gz

This will process a small test batch of 1M triples. You should inspect data/freebase-rdf-latest.shrink.sort.test.gz to make sure it looks ok (properly substituted namespaces, filtered triples, etc.).
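Inspecting a test file by eye works, but a quick scripted check can complement it. The hypothetical Python 3 helper below (not part of the toolkit) reads the first lines of a shrunk .gz file and counts lines that still contain full Freebase/BaseKB namespace URIs, as well as the root language tags of literals. It assumes the shrunk file is line-oriented text with N-Triples-style literals such as "..."@en; the tag regex is a heuristic and could be fooled by escaped quotes inside literals.

```python
import gzip
import re
from collections import Counter

# Full Freebase/BaseKB namespace URIs should have been abbreviated away.
UNABBREVIATED = re.compile(r"http://rdf\.(?:freebase|basekb)\.com/ns/")
# Heuristic: a language tag right after a closing quote, e.g. "..."@en-GB.
LANG_TAG = re.compile(r'"@([A-Za-z]+(?:-[A-Za-z0-9]+)*)')


def inspect_shrunk(path, max_lines=100000):
    """Summarize the first max_lines lines of a shrunk, gzipped triples file."""
    stats = {"lines": 0, "unabbreviated": 0, "langs": Counter()}
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for i, line in enumerate(f):
            if i >= max_lines:
                break
            stats["lines"] += 1
            if UNABBREVIATED.search(line):
                stats["unabbreviated"] += 1
            for tag in LANG_TAG.findall(line):
                stats["langs"][tag.split("-")[0].lower()] += 1  # en-GB -> en
    return stats
```

With the default English-only configuration, inspect_shrunk("data/freebase-rdf-latest.shrink.sort.test.gz") would be expected to report 0 unabbreviated lines and only "en" among the languages.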
Once you are confident of that, you can run the script on the full data by giving the -f flag:

    % ./fbt-shrink-freebase.sh -f -o data/freebase-rdf-latest.shrink.sort.gz data/freebase-rdf-latest.gz

This should take approximately 6 hours to run and produce a result file of approximately 8GB.

Building the Lucene Index

Once you have an appropriately pre-processed and shrunk triples file, you can build the Lucene index for it. For the TAC-KBP BaseKB, using the file names from above, this looks as follows:

    % cd $FBT_HOME
    % java -cp '.:bin:lib/*' edu.isi.kres.FreebaseTools -T data/basekb-gold-jan-2015.shrink.sort.gz -I data/basekb-gold-jan-2015.shrink.sort.index -c index -f -o -nn -v

This should take approximately 50 minutes and build an 11GB index directory. After it has finished, you can test it with the following command, which should display a single record about Henry Higgins:

    % java -cp '.:bin:lib/*' edu.isi.kres.FreebaseTools -I data/basekb-gold-jan-2015.shrink.sort.index -c lookup -q f_m.0h54qv8 -v

You can also use the fbt-lookup.sh script, which is configured to use data/basekb-gold-jan-2015.shrink.sort.index as the index directory (if you used a different name, you have to edit the LUCENE_INDEX variable in the config.dat file accordingly):

    % ./fbt-lookup.sh -q f_m.0h54qv8 -v

Building the index for a pre-processed full Google RDF dump looks very similar, just with different file and directory names:

    % java -cp '.:bin:lib/*' edu.isi.kres.FreebaseTools -T data/freebase-rdf-latest.shrink.sort.gz -I data/freebase-rdf-latest.shrink.sort.index -c index -f -o -nn -v

This will take approximately 70 minutes and build a 13GB index directory. To test it, run the following query:

    % java -cp '.:bin:lib/*' edu.isi.kres.FreebaseTools -I data/freebase-rdf-latest.shrink.sort.index -c lookup -q f_m.0h54qv8 -v

To use the `fbt-lookup.sh' and `fbt-search.sh' scripts with this index, you have to edit LUCENE_INDEX in config.dat to point to the appropriate index directory.
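The index commands above pass -nn. As explained under Encoding and Normalization, the indexer decodes N-Triples escape sequences back into Unicode and, with -nn, turns newlines into spaces. The Python 3 sketch below illustrates that decoding using the escape rules of the W3C N-Triples specification; it is not the toolkit's Java code, and treating a CRLF pair as a single space is an assumption about -nn.

```python
import re

# ECHAR (\t \b \n \r \f \" \' \\) and UCHAR (\uXXXX, \UXXXXXXXX) escapes
# from the W3C N-Triples grammar.
_ESCAPE = re.compile(r'\\(u[0-9A-Fa-f]{4}|U[0-9A-Fa-f]{8}|[tbnrf"\'\\])')
_SIMPLE = {"t": "\t", "b": "\b", "n": "\n", "r": "\r", "f": "\f",
           '"': '"', "'": "'", "\\": "\\"}


def unescape(literal):
    """Decode N-Triples escape sequences (ECHAR and UCHAR) into Unicode text."""
    def replace(match):
        esc = match.group(1)
        if esc[0] in "uU":
            return chr(int(esc[1:], 16))  # numeric code point escape
        return _SIMPLE[esc]
    # A single left-to-right pass, so an escaped backslash is never re-read.
    return _ESCAPE.sub(replace, literal)


def normalize_newlines(text):
    """Replace each CRLF, CR or LF by one space (assumed behavior of -nn)."""
    return re.sub(r"\r\n|\r|\n", " ", text)


raw = r'Caf\u00e9 \"Liverpool\"\nMuseum'
print(normalize_newlines(unescape(raw)))  # prints: Café "Liverpool" Museum
```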
Once the created index(es) are functioning appropriately, you can optionally delete the pre-processed and shrunk triples files they are based on. However, it might be useful to keep them for recreating the index later (e.g., with a different set of indexed predicates), since they take significant time to regenerate.

Encoding and Normalization

The N-Triples files use UTF-8 character encoding; however, they also encode certain Unicode characters as well as newlines, quotes, etc. as escape sequences such as \uNNNN, \n, \r and \". During indexing, escape sequences are translated back into UTF-8 before the text is analyzed and stored by Lucene.

Newlines can be normalized to spaces by supplying the -nn command line option. This is useful to avoid line breaks in the predicate values returned by queries. Newline normalization can be done at indexing time, which is more efficient, but then the newlines cannot be recovered later. Alternatively, it can be done at query time only, by supplying the -nn option to the query command instead (at the small cost of some extra run-time computation).

Customization

The following files are used to control the pre-processing and shrinking phase for an English-only version of the KB:

- ignore-langs.lst
- ignore-preds.lst
- ignore-values.lst

The following files are used for a tri-lingual English/Spanish/Chinese version:

- ignore-langs-trilingual.lst
- ignore-preds-trilingual.lst
- ignore-values-trilingual.lst

Edit the config.dat file to use the appropriate ignore files. If you need your own versions, make copies and edit them to ignore more or less. Be very careful, since these are pattern files that contain TAB characters; make sure the TABs are preserved in your edited versions.

Namespace abbreviation is currently hardcoded into the fbt-abbrev-basekb-uris.sh and fbt-abbrev-freebase-uris.sh scripts. Edit them if you need additional or different abbreviations. If you change prefixes, various other files such as ignore-preds.lst, indexed-preds.lst, etc.
will need to be adjusted as well. After you are done customizing these files, rerun the fbt-shrink-freebase.sh script. Note that the official Freebase dump and BaseKB use different namespace prefixes as well as slightly different predicate names; the pre-configured ignore files are designed to handle both variants.

The following file controls which predicates are indexed by Lucene:

- indexed-preds.lst

The predicates in this file can be referenced in queries; all other predicates are stored, printed and retrievable, but they cannot be queried on directly. If there are additional predicates you want to use in your Lucene queries, add them to this list. If you make changes to this list, you need to rerun the index creation step.

Indexing Options

The Lucene index supports three basic functions:

(1) storing all subject keys and their associated predicates and values
(2) looking up all predicate values for a specific subject based on its key
(3) searching for subjects by matching textual information, such as the words in a label, name, alias, description, etc., using the Lucene query language

To support the search functionality, predicates with text data must be indexed appropriately at index creation time. The toolkit supports three different text indexing options that can be used individually or in combination:

(1) per-predicate indexing: in this mode, each indexed predicate that is text-valued (e.g., rs_label or f_type.object.name) becomes a searchable Lucene field of the same name. This mode allows very fine-grained Lucene search expressions that search for different information in different fields, for example: 'rs_label:"Barack Obama" AND f_common.topic.description:President'. This type of indexing is selected with the -ip option during index creation.
(2) text indexing: in this mode, the text values of all indexed, text-valued predicates of a subject are indexed via a single, combined "text" field (they are still stored separately for fine-grained lookup via a subject). This allows searches that span all text values about a subject, for example: 'text:"President Barack Obama"'. This type of indexing is selected with the -it option.

(3) language indexing: in this mode, text values with language designations are indexed via separate language-qualified fields. This only applies to languages for which specific analyzers have been configured in config.dat (more on that below). For example: 'rs_label@zh:巴拉克·歐巴馬 +rs_label@en:Barack'. This allows the use of language-appropriate Lucene text analyzers instead of a one-size-fits-all approach. Language indexing applies to both per-predicate and text indexing, for example: 'text@zh:巴拉克·歐巴馬'. This type of indexing is selected with the -il option.

Per-predicate and text indexing can be used individually or in combination. Language indexing applies to one or both of them for the languages configured in config.dat. If none of the -ip/-it/-il options is given, all three are used together. To save some index space, you can down-select by specifying only the desired options.

Lucene uses a standard TF-IDF ranking scheme to select the most relevant matches; however, some of its normalizations (e.g., for document length) might produce counter-intuitive results. Matches in short fields are boosted so that they are not shadowed by long fields that contain many relevant search terms. This can produce ranking issues with the "text" fields of subjects that have lots of information stored about them (such as celebrities). These issues can usually be remedied by adding per-field restrictions (as long as the index was created with -ip). The -d debug option explains how Lucene computed the rank score for each returned result.
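The interplay of the three options can be summarized as a small function that lists the searchable field names a single text value contributes to. This Python 3 sketch models the naming scheme described above and is not the toolkit's code. Two points are assumptions: that a language variant such as zh-TW is indexed under the root-language field (rs_label@zh), and that a language-tagged value is also indexed in the unqualified field.

```python
def indexed_fields(pred, lang=None, options=(), analyzer_langs=()):
    """Searchable field names for one text value of an indexed predicate.

    options: subset of {"-ip", "-it", "-il"}; empty means all three (the default).
    analyzer_langs: root languages with a LUCENE_INDEX_ANALYZER_xx entry.
    """
    opts = set(options) or {"-ip", "-it", "-il"}
    bases = []
    if "-ip" in opts:
        bases.append(pred)    # per-predicate field, e.g. rs_label
    if "-it" in opts:
        bases.append("text")  # combined text field
    fields = list(bases)
    # Assumption: variants such as zh-TW map to the root language's field.
    root = lang.split("-")[0].lower() if lang else None
    if "-il" in opts and root in analyzer_langs:
        # Language indexing only qualifies fields that -ip/-it created.
        fields += ["%s@%s" % (base, root) for base in bases]
    return fields


print(indexed_fields("rs_label", "zh-TW", analyzer_langs=("en", "es", "zh")))
# ['rs_label', 'text', 'rs_label@zh', 'text@zh']
```

A value in a language without a configured analyzer (e.g. "fr" in the tri-lingual setup) only lands in rs_label and text under this model, which matches the statement that language indexing applies only to configured languages.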
Multi-Lingual Shrinking, Indexing and Search

In the default mono-lingual setup, only English text values are preserved, and all indexed text predicates such as rs_label, f_type.object.name, etc. are analyzed using LUCENE_INDEX_ANALYZER_DEFAULT, which by default uses Lucene's StandardAnalyzer. In a multi-lingual setup, predicate values in multiple languages of interest are preserved, and language-specific Lucene analyzers are used for those values to enable proper tokenization and search.

For example, to enable a tri-lingual English, Spanish, Chinese configuration, uncomment the variables in the tri-lingual section of config.dat. This will enable the proper ignore files for the shrinking step. Run the shrinking script just like above, but with different file names to indicate the tri-lingual setup. For example (we only show the BaseKB scenario here; substitute the proper LDC data directory):

    % ./fbt-shrink-freebase.sh -f -o data/basekb-gold-jan-2015.trilingual.shrink.sort.gz /data/LDC2015E42/data/*.nt.gz

Once this has completed successfully, you can build the Lucene index just as in the mono-lingual setting:

    % java -cp '.:bin:lib/*' edu.isi.kres.FreebaseTools -T data/basekb-gold-jan-2015.trilingual.shrink.sort.gz -I data/basekb-gold-jan-2015.trilingual.shrink.sort.ml.index -c index -f -o -nn -v

The only difference here is the configuration of different analyzers for one or more languages in config.dat. For each language of interest, a specific analyzer is configured via a LUCENE_INDEX_ANALYZER_xx variable, where "xx" is the upper-cased code of the particular language. For example, LUCENE_INDEX_ANALYZER_ZH defines the analyzer to be used for Chinese text strings. Analyzers are specified by their fully qualified Java class names and have to be on the classpath at run time. By default, language variants (e.g., zh-TW, es-419, etc.) are all handled by the analyzer configured for the root language.
If necessary, this can be overridden by defining an analyzer for a language variant of interest.

In multi-lingual mode, indexed predicates are indexed separately for each language of interest. For example, if rs_label is an indexed predicate, the Lucene fields rs_label@en, rs_label@es and rs_label@zh will be created in the tri-lingual setup so that values in specific languages can be queried. These language-qualified fields are not stored; they are only used for querying. For example, the following query can be used to look for Barack Obama using a Chinese search string (assuming LUCENE_INDEX in config.dat points to the appropriate tri-lingual index):

    % ./fbt-search.sh -s rs_label@zh -q '巴拉克·歐巴馬 +r_type:f_people.person' -v

For this to work, the shell variable LANG must be set to a UTF-8 encoding for your locale. Use the -v option to make sure Lucene reports the proper UTF-8 encoded characters in its "Searching for:" string.

You can search across different language fields simultaneously using Lucene's field-specific query syntax. For example:

    % ./fbt-search.sh -q 'rs_label@zh:巴拉克·歐巴馬 +rs_label@en:Barack +r_type:f_people.person'

The multi-lingual indexing and search strategy implemented here is just one example of how this might be done. Alternatively, one could build separate mono-lingual indexes for each language of interest, or build a tri-lingual index that uses the same analyzer across fields from different languages, etc. Which option is best will require some experimentation and depends on the particular application.

Performance Notes
=================

The FreebaseTools command line utility used in the examples above and below incurs a startup overhead of about 0.7 seconds to load the index. If you need to run many queries, you should use the toolkit programmatically, so that you only pay for index loading once. Use the -v option to see times for index loading, querying and result display.
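When the toolkit is called from another program, one simple way to pay the startup cost once per batch rather than once per key is to send all keys to a single fbt-lookup.sh process, using the `-q -' stdin mode described under Example Lookup Queries below. The Python 3 sketch assumes it is run from $FBT_HOME, that each -p output line starts with the subject key, and that values contain no raw newlines (e.g. an index built with -nn); the helper name batch_lookup is hypothetical.

```python
import subprocess
from collections import defaultdict


def batch_lookup(keys, preds, cmd=("./fbt-lookup.sh",)):
    """Look up many subject keys with ONE lookup process, so the index loads once.

    Keys are sent on stdin ('-q -'); each output line of the -p format is
    assumed to start with the subject key, followed by predicate(s) and value.
    """
    proc = subprocess.run(
        list(cmd) + ["-q", "-", "-p", ",".join(preds)],
        input="".join(key + "\n" for key in keys),
        stdout=subprocess.PIPE,  # -v progress output would go to stderr
        universal_newlines=True,
        check=True,
    )
    results = defaultdict(list)
    for line in proc.stdout.splitlines():
        parts = line.split(None, 1)  # subject key, then the rest of the row
        if parts:
            results[parts[0]].append(parts[1] if len(parts) > 1 else "")
    return dict(results)
```

For instance, batch_lookup(["f_m.0h54qv8", "f_m.03wwvwm"], ["f_type.object.name", "r_type"]) would group rows like those in the lookup examples below by subject key.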
Using the Python API is a good and simple way to pay for index loading only once.

The more predicates of a retrieved subject you display, the more expensive the operation becomes, since more data needs to be accessed and decompressed. Therefore, always restrict the display to the data you actually need (e.g., with the -p option or by using the appropriate API function).

Queries with very large result sets (e.g., all subjects of type person) are moderately expensive. Retrieving the 3M keys of all person entities takes about 2 minutes on my desktop. Lucene generally shines with queries that produce small to medium-size result sets with good ranking discrimination. If you ask for all "Smith"s, you will get large result sets (still pretty fast, though) that will require significant filtering to get to what you want.

The smaller the index and the fewer different predicates there are, the better the performance will be. Therefore, try to ignore as much as possible for your application during the shrinking phase. Installing the index files on an SSD further improves performance, since random-access read times will be significantly shorter.

Troubleshooting
===============

- make sure things work as expected with the provided pre-built index files
- before you make any modifications, try to run the test and full versions of the shrinking and indexing steps using the default configuration, preferably with the LDC BaseKB files, which are very stable
- if you are having problems with the fbt-shrink-freebase.sh script, try to run the pipeline at the end of the script directly in your shell (substituting the appropriate abbreviation script and input/output file names)
- if you have problems with Unicode search strings, make sure the LANG variable is set to a proper UTF-8 encoding (e.g., en_US.UTF-8 for a US locale)
- use the -v and -d options to get more information about the configuration options and search strings used

Questions, Suggestions and Comments
===================================

Send mail to Hans Chalupsky

Example Lookup Queries
======================

In the examples below we use the fbt-lookup.sh script, which is preconfigured to use data/basekb-gold-jan-2015.shrink.sort.index as the index. If you generated an index with a different name, you have to edit the config.dat file first to point the LUCENE_INDEX variable to the right index directory.

Lookup queries are the most basic operations: they look up information about a subject based on its key (or MID). For example:

    % cd $FBT_HOME
    % ./fbt-lookup.sh -q f_m.0h54qv8
    f_m.0h54qv8:
    f_common.topic.article: f_m.0h54qvd
    f_common.topic.description: "Henry Hugh Higgins was an English botanist, bryologist, geologist, curator and clergyman. He is cited as an authority in scientific classification, as Higgins. He was inspector of the National Schools in Liverpool from 1842 to 1848 and chaplain to the Rainhill Asylum, also in Liverpool. He was also president of the Liverpool Field Naturalists' Club from 1861 to 1881. He especially worked on the Ravenhead collections, almost wholly made up of Upper Carboniferous flora, fish, bivalves and insect remains. Higgins had suggested that Ravenhead donate his collections to the Liverpool Museum and the donation gained a home with the construction of the railway in 1870, which exposed two Carboniferous seams known as the Upper and Lower Ravenhead.
    Most of Liverpool Museum's collections survived the Liverpool Blitz of May 1941 which practically destroyed the Museum itself, but the entire Ravenhead collection was lost in the fire."@en
    f_common.topic.notable_for: f_g.125crzjzl
    f_common.topic.notable_types: f_m.022tfrk
    f_common.topic.topic_equivalent_webpage: we_Henry_Higgins_(botanist)
    f_common.topic.topic_equivalent_webpage: we_index.html?curid=32997517
    f_people.deceased_person.date_of_death: "1893"^^
    f_people.person.date_of_birth: "1814"^^
    f_people.person.gender: f_m.05zppz
    f_people.person.profession: f_m.036n1
    f_type.object.name: "Henry Higgins"@en
    fk_key.wikipedia.en: "Henry_Higgins_$0028botanist$0029"
    fk_key.wikipedia.en: "Henry_Hugh_Higgins"
    fk_key.wikipedia.en_id: "32997517"
    fk_key.wikipedia.en_title: "Henry_Higgins_$0028botanist$0029"
    r_type: f_common.topic
    r_type: f_people.deceased_person
    r_type: f_people.person
    rs_label: "Henry Higgins"@en

The -v option adds some additional progress and timing information. This information is printed to stderr, while results are printed to stdout, so results can be redirected to a file without being polluted by progress information.

    % ./fbt-lookup.sh -q f_m.0h54qv8 -v
    Loading index...
    f_m.0h54qv8:
    f_common.topic.article: f_m.0h54qvd
    f_common.topic.description: "Henry Hugh Higgins was an English botanist, bryologist, geologist, curator and clergyman. He is cited as an authority in scientific classification, as Higgins. He was inspector of the National Schools in Liverpool from 1842 to 1848 and chaplain to the Rainhill Asylum, also in Liverpool. He was also president of the Liverpool Field Naturalists' Club from 1861 to 1881. He especially worked on the Ravenhead collections, almost wholly made up of Upper Carboniferous flora, fish, bivalves and insect remains.
    Higgins had suggested that Ravenhead donate his collections to the Liverpool Museum and the donation gained a home with the construction of the railway in 1870, which exposed two Carboniferous seams known as the Upper and Lower Ravenhead. Most of Liverpool Museum's collections survived the Liverpool Blitz of May 1941 which practically destroyed the Museum itself, but the entire Ravenhead collection was lost in the fire."@en
    f_common.topic.notable_for: f_g.125crzjzl
    f_common.topic.notable_types: f_m.022tfrk
    f_common.topic.topic_equivalent_webpage: we_Henry_Higgins_(botanist)
    f_common.topic.topic_equivalent_webpage: we_index.html?curid=32997517
    f_people.deceased_person.date_of_death: "1893"^^
    f_people.person.date_of_birth: "1814"^^
    f_people.person.gender: f_m.05zppz
    f_people.person.profession: f_m.036n1
    f_type.object.name: "Henry Higgins"@en
    fk_key.wikipedia.en: "Henry_Higgins_$0028botanist$0029"
    fk_key.wikipedia.en: "Henry_Hugh_Higgins"
    fk_key.wikipedia.en_id: "32997517"
    fk_key.wikipedia.en_title: "Henry_Higgins_$0028botanist$0029"
    r_type: f_common.topic
    r_type: f_people.deceased_person
    r_type: f_people.person
    rs_label: "Henry Higgins"@en
    Run time: setup=740ms, query=36ms, display=0ms

Instead of displaying all the information about a subject, we can restrict output to specific predicates. For example:

    % ./fbt-lookup.sh -q f_m.0h54qv8 -p 'f_type.object.name, f_common.topic.description, r_type'
    f_m.0h54qv8 f_type.object.name "Henry Higgins"@en
    f_m.0h54qv8 f_common.topic.description "Henry Hugh Higgins was an English botanist, bryologist, geologist, curator and clergyman. He is cited as an authority in scientific classification, as Higgins. He was inspector of the National Schools in Liverpool from 1842 to 1848 and chaplain to the Rainhill Asylum, also in Liverpool. He was also president of the Liverpool Field Naturalists' Club from 1861 to 1881.
    He especially worked on the Ravenhead collections, almost wholly made up of Upper Carboniferous flora, fish, bivalves and insect remains. Higgins had suggested that Ravenhead donate his collections to the Liverpool Museum and the donation gained a home with the construction of the railway in 1870, which exposed two Carboniferous seams known as the Upper and Lower Ravenhead. Most of Liverpool Museum's collections survived the Liverpool Blitz of May 1941 which practically destroyed the Museum itself, but the entire Ravenhead collection was lost in the fire."@en
    f_m.0h54qv8 r_type f_common.topic
    f_m.0h54qv8 r_type f_people.deceased_person
    f_m.0h54qv8 r_type f_people.person

We can use predicate chains to access information that is one or more links away from a subject. For example, in the query below we look up the labels of the objects that define the person's gender and professions:

    % ./fbt-lookup.sh -q f_m.0h54qv8 -p 'f_type.object.name, f_common.topic.description, f_people.person.gender>rs_label, f_people.person.profession>rs_label, r_type'
    f_m.0h54qv8 f_type.object.name "Henry Higgins"@en
    f_m.0h54qv8 f_common.topic.description "Henry Hugh Higgins was an English botanist, bryologist, geologist, curator and clergyman. He is cited as an authority in scientific classification, as Higgins. He was inspector of the National Schools in Liverpool from 1842 to 1848 and chaplain to the Rainhill Asylum, also in Liverpool. He was also president of the Liverpool Field Naturalists' Club from 1861 to 1881. He especially worked on the Ravenhead collections, almost wholly made up of Upper Carboniferous flora, fish, bivalves and insect remains. Higgins had suggested that Ravenhead donate his collections to the Liverpool Museum and the donation gained a home with the construction of the railway in 1870, which exposed two Carboniferous seams known as the Upper and Lower Ravenhead.
    Most of Liverpool Museum's collections survived the Liverpool Blitz of May 1941 which practically destroyed the Museum itself, but the entire Ravenhead collection was lost in the fire."@en
    f_m.0h54qv8 f_people.person.gender f_m.05zppz rs_label "Male"@en
    f_m.0h54qv8 f_people.person.profession f_m.036n1 rs_label "Geologist"@en
    f_m.0h54qv8 f_people.person.profession f_m.036n1 rs_label "Geologist"@en-GB
    f_m.0h54qv8 r_type f_common.topic
    f_m.0h54qv8 r_type f_people.deceased_person
    f_m.0h54qv8 r_type f_people.person

Predicate chains trigger individual term lookup queries for the objects that are followed. This is ok for small result sets but becomes expensive for larger ones. A database or triple store would run a more efficient join operation for those, but joins are one of the operations not available to us with Lucene. Note that if a predicate or chain is not defined for a particular subject, a null value will be displayed.

We can look up more than one subject at the same time, which amortizes the index loading time. For example:

    % ./fbt-lookup.sh -q f_m.0h54qv8,f_m.03wwvwm -p 'f_type.object.name,r_type'
    f_m.0h54qv8 f_type.object.name "Henry Higgins"@en
    f_m.0h54qv8 r_type f_common.topic
    f_m.0h54qv8 r_type f_people.deceased_person
    f_m.0h54qv8 r_type f_people.person
    f_m.03wwvwm f_type.object.name "John R Owens"@en
    f_m.03wwvwm r_type f_common.topic
    f_m.03wwvwm r_type f_people.person

On Unix we can also pipe subject keys in via stdin, one per line, by giving `-' as the query string. Those keys might have been generated by a different query or other operation.
For example (printf is used because it interprets \n portably across shells, unlike a plain echo in bash):

    % printf 'f_m.0h54qv8\nf_m.03wwvwm\n' | ./fbt-lookup.sh -q - -p 'f_type.object.name,r_type'
    f_m.0h54qv8 f_type.object.name "Henry Higgins"@en
    f_m.0h54qv8 r_type f_common.topic
    f_m.0h54qv8 r_type f_people.deceased_person
    f_m.0h54qv8 r_type f_people.person
    f_m.03wwvwm f_type.object.name "John R Owens"@en
    f_m.03wwvwm r_type f_common.topic
    f_m.03wwvwm r_type f_people.person

Example Search Queries
======================

In the examples below we use the fbt-search.sh script, which is preconfigured to use data/basekb-gold-jan-2015.shrink.sort.index as the index. If you generated an index with a different name, you have to edit the config.dat file first to point the LUCENE_INDEX variable to the right index directory.

Search queries can be used to find subjects in the knowledge graph by matching text strings such as names, labels and descriptions. They use the Lucene query syntax to search the index (see https://lucene.apache.org/core/2_9_4/queryparsersyntax.html for more information on Lucene queries). For example:

    % cd $FBT_HOME
    % ./fbt-search.sh -q 'Claude AND Parsons AND r_type:f_people.person' -v
    Loading index...
    Index contains 107692853 documents
    Searching for: +claude +parsons +r_type:f_people.person
    Found 1 matching subject(s)
    Printing results...
    f_m.02rk97l: [score=9.111403]
    f_common.topic.alias: "Claude Parsons"@en
    f_common.topic.article: f_m.02rk97p
    f_common.topic.description: "Claude VanCleve Parsons was a U.S. Representative from Illinois. Born on a farm near McCormick, Pope County, Illinois, Parsons attended the public schools. He taught in the rural schools of Pope County, Illinois from 1914 to 1922. He was graduated from Southern Illinois State Normal School at Carbondale in 1923. He moved to Golconda, Illinois, in 1922 to become county superintendent of schools, in which capacity he served until 1930. He was also engaged as an editor and newspaper publisher from 1924 to 1930.
Parsons was elected on November 4, 1930, as a Democrat to the Seventy-first Congress to fill the vacancy caused by the resignation of Thomas S. Williams and on the same day was elected to the Seventy-second Congress. He was reelected to the Seventy-third and to the three succeeding Congresses and served from November 4, 1930, to January 3, 1941. He was an unsuccessful candidate for reelection in 1940 to the Seventy-seventh Congress. He was appointed first assistant administrator of the United States Housing Authority February 14, 1941, and served until his death in Washington, D.C., May 23, 1941. He was interred in Zion Church Cemetery, near Ozark, Illinois."@en f_common.topic.notable_for: f_g.125cz1ny1 f_common.topic.notable_types: f_m.05kpwk1 f_common.topic.topic_equivalent_webpage: http://bioguide.congress.gov/scripts/biodisplay.pl?index=P000086 f_common.topic.topic_equivalent_webpage: we_Claude_V._Parsons f_common.topic.topic_equivalent_webpage: we_index.html?curid=11586221 f_government.u_s_congressperson.thomas_id: "P000086" f_people.deceased_person.date_of_death: "1941-05-23"^^ f_people.person.date_of_birth: "1895-10-07"^^ f_people.person.gender: f_m.05zppz f_type.object.name: "Claude V. Parsons"@en fk_key.base.uspolitician.thomas_id: "P000086" fk_key.en: "claude_v_parsons" fk_key.wikipedia.en: "Claude_Parsons" fk_key.wikipedia.en: "Claude_V$002E_Parsons" fk_key.wikipedia.en_id: "11586221" fk_key.wikipedia.en_title: "Claude_V$002E_Parsons" r_type: f_common.topic r_type: f_government.politician r_type: f_government.u_s_congressperson r_type: f_people.deceased_person r_type: f_people.person rs_label: "Claude V. Parsons"@en Run time: setup=764ms, query=144ms, display=9ms In this query we looked for objects with name "Claude Parsons" that were also of type person. All query terms in a Lucene query are matched against a field. If no field is specified for a field, a default search field is used. 
In our implementation, the default field is `rs_label'. It can be changed in config.dat, or via the -s command line option, which overrides the value in the configuration file. To match a term against a field other than the default field, use the syntax `field:value', as we did for the type restriction.

Remember that in our index, each predicate becomes a field of the same name in the Lucene document describing a particular subject key. All predicates of a subject are stored, so we can see their values when a subject is accessed, but only a small number of them are indexed (as defined in indexed-preds.lst). `r_type' is one of the indexed predicates, which is why we can use it in the query; `f_people.person.gender' is not. So, adding a clause restricting the gender makes the query fail:

% ./fbt-search.sh -q 'Claude AND Parsons AND r_type:f_people.person AND f_people.person.gender:f_m.05zppz' -v
Loading index...
Index contains 107692853 documents
Searching for: +claude +parsons +r_type:f_people.person +(f_people.person.gender:f_m f_people.person.gender:05zppz)
Found 0 matching subject(s)
Printing results...
Run time: setup=806ms, query=131ms, display=0ms

Lucene had no special handling for the f_people.person.gender field, so it analyzed the value as ordinary text, splitting f_m.05zppz into the tokens f_m and 05zppz; since the field is not indexed, neither token can match. To restrict queries by gender, we could simply add f_people.person.gender to indexed-preds.lst and rebuild the index.

In the query below we use a more complex construct to look for matches in both the default `rs_label' field and the `f_common.topic.description' field. This time we get additional matches that mention both Claude and Parsons in their description string; however, their scores are significantly lower than the score of the first (correct) match. Note that we could also have used Lucene's proximity search to look for matches to Claude and Parsons within a maximum distance.
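Query strings like the ones above can also be assembled programmatically before passing them to fbt-search.sh -q or to the Python interface described later. The following is a minimal, hypothetical Python sketch (the helper names are illustrative and not part of FreebaseTools): it ANDs default-field terms with `field:value` restrictions, builds a proximity clause, and backslash-escapes the characters that are special in the Lucene query parser syntax (including `/`, which marks regular expressions in Lucene 4 and later, e.g. in a key like `/m/0h54qv8`). As explained above, field restrictions only take effect for predicates listed in indexed-preds.lst.

```python
# Characters with special meaning in the Lucene query parser syntax.
LUCENE_SPECIAL = set('\\+-!():^[]"{}~*?|&/')


def escape_term(term):
    """Backslash-escape query syntax characters in a single term."""
    return "".join("\\" + ch if ch in LUCENE_SPECIAL else ch for ch in term)


def field_clause(field, value):
    """Restrict an escaped value to one field, e.g. r_type:f_people.person."""
    return "%s:%s" % (field, escape_term(value))


def proximity_clause(words, slop, field=None):
    """Phrase whose words may be up to `slop` positions apart, optionally on a field."""
    phrase = '"%s"~%d' % (" ".join(escape_term(w) for w in words), slop)
    return "%s:%s" % (field, phrase) if field else phrase


def build_query(default_terms, restrictions=None):
    """AND together default-field terms and field:value restrictions.

    Restrictions only match on predicates listed in indexed-preds.lst.
    """
    clauses = [escape_term(t) for t in default_terms]
    for field, value in sorted((restrictions or {}).items()):
        clauses.append(field_clause(field, value))
    return " AND ".join(clauses)


if __name__ == "__main__":
    print(build_query(["Claude", "Parsons"], {"r_type": "f_people.person"}))
    print(proximity_clause(["Claude", "Parsons"], 3, "f_common.topic.description"))
```

The first call reproduces the query string used in the Claude Parsons example above; the second yields `f_common.topic.description:"Claude Parsons"~3`, a proximity query that matches descriptions in which the two words occur within three words of each other.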
Also note that, similar to lookup queries, we can use -p to restrict what is printed for each result:

% ./fbt-search.sh -q '((Claude AND Parsons) OR (f_common.topic.description:Claude AND f_common.topic.description:Parsons)) AND r_type:f_people.person' -p 'f_type.object.name, f_common.topic.description' -v
Loading index...
Index contains 107692853 documents
Searching for: +((+claude +parsons) (+f_common.topic.description:claude +f_common.topic.description:parsons)) +r_type:f_people.person
Found 5 matching subject(s)
Printing results...
f_m.02rk97l 7.690588 f_type.object.name "Claude V. Parsons"@en
f_m.02rk97l 7.690588 f_common.topic.description "Claude VanCleve Parsons was a U.S. Representative from Illinois. Born on a farm near McCormick, Pope County, Illinois, Parsons attended the public schools. He taught in the rural schools of Pope County, Illinois from 1914 to 1922. He was graduated from Southern Illinois State Normal School at Carbondale in 1923. He moved to Golconda, Illinois, in 1922 to become county superintendent of schools, in which capacity he served until 1930. He was also engaged as an editor and newspaper publisher from 1924 to 1930. Parsons was elected on November 4, 1930, as a Democrat to the Seventy-first Congress to fill the vacancy caused by the resignation of Thomas S. Williams and on the same day was elected to the Seventy-second Congress. He was reelected to the Seventy-third and to the three succeeding Congresses and served from November 4, 1930, to January 3, 1941. He was an unsuccessful candidate for reelection in 1940 to the Seventy-seventh Congress. He was appointed first assistant administrator of the United States Housing Authority February 14, 1941, and served until his death in Washington, D.C., May 23, 1941. He was interred in Zion Church Cemetery, near Ozark, Illinois."@en
f_m.06x5hk 2.0857942 f_type.object.name "T. Claude Ryan"@en
f_m.06x5hk 2.0857942 f_common.topic.description "Tubal Claude Ryan was an Irish-American aviator born in Parsons, Kansas. Ryan was best known for founding several airlines and aviation factories."@en
f_m.025xzn6 1.4879408 f_type.object.name "Curly Putman"@en
f_m.025xzn6 1.4879408 f_common.topic.description "Claude "Curly" Putman, Jr. is an American songwriter, based in Nashville. His biggest success was "Green, Green Grass of Home", which was covered by Roger Miller, Elvis Presley, Kenny Rogers, Johnny Darrell, Gram Parsons, Joan Baez, Jerry Lee Lewis, The Grateful Dead, Johnny Cash, Roberto Leal, Merle Haggard, Bobby Bare, Joe Tex, Nana Mouskouri, and Tom Jones. The Paul McCartney & Wings hit "Junior's Farm" was inspired by their short stay at Putman's farm in rural Wilson County, Tennessee in 1974."@en
f_m.015wwg 1.402533 f_type.object.name "James Burton"@en
f_m.015wwg 1.402533 f_common.topic.description "James Burton is a film actor."@en
f_m.015wwg 1.402533 f_common.topic.description "James Burton is an American guitarist. A member of the Rock and Roll Hall of Fame since 2001, Burton has also been recognized by the Rockabilly Hall of Fame and the Musicians Hall of Fame and Museum. Critic Mark Demming writes that "Burton has a well-deserved reputation as one of the finest guitar pickers in either country or rock ... Burton is one of the best guitar players to ever touch a fretboard." James Burton is also known as the "Master of the Telecaster." Since the 1950s, Burton has recorded and performed with an array of notable singers, including Bob Luman, Dale Hawkins, Ricky Nelson, Elvis Presley, Johnny Cash, Merle Haggard, Glen Campbell, John Denver, Gram Parsons, Emmylou Harris, Judy Collins, Jerry Lee Lewis, Claude King, Elvis Costello, Joe Osborn, Roy Orbison, Joni Mitchell, Vince Gill, Suzi Quatro and Allen "Puddler" Harris."@en
f_m.01qf23 1.2317178 f_type.object.name "Susan McMaster"@en
f_m.01qf23 1.2317178 f_common.topic.description "Susan McMaster is a Canadian poet, literary editor, spoken word/performance poet, and 2011-12 President of the League of Canadian Poets. She lives in Ottawa, Ontario. Her recent poetry books are Paper Affair: Poems Selected and New, Pith & Wry: Canadian Poetry, and Crossing Arcs: Alzheimer's, My Mother, and Me, which was a finalist for the 2010 Acorn-Plantos People's Poetry Prize, the 2010 Ottawa Book Awards, and the 2010 Archibald Lampman Poetry Prize. She is the author of several wordmusic collections, performance poetry recordings, and scripts; has edited poetry anthologies and series; and was the founding editor of the national feminist and art magazine Branching Out. McMaster was an original member of the intermedia group First Draft, with members including Andrew McClure, Colin Morton, Alrick Huebener, Roberta Huebener, Claude Dupuis, Peter Thomas, and David Parsons. Together, they recorded, published, and performed some 40 times across Canada in the 1980s. Since 1996, she has been the wordsmith in Geode Music & Poetry, making four spoken word and music recordings with Jennifer Giles on keyboards, Alrick Huebener on bass, Gavin McLintock on sax, and friends, including Dave Broscoe, Jamie Gullikson, Mike Essoudry, Petr Cancura, Mark Molnar, John Higney, Linsey Wellman, Penn Kemp, Colin Morton, and Max Middle.
She has performed and recorded with SugarBeat and Geode at 50-plus venues, including the Banff Centre, the National Library, the Kingston Fringe Jazz Festival, Rasputin's, the Blue Skies Music Festival, the Ottawa Folk Festival, the Elora Music Festival, Artscape, WordBeat, Morningside, Go, the National Arts Center Fourth Stage, and the Ottawa International Writers Festival, and has read and performed at festivals and venues in France and Italy."@en
Run time: setup=779ms, query=287ms, display=12ms

The following query uses predicate chains in the -p option to print the family relationships of the persons named Barack Obama (there are two in the KB, father and son):

% ./fbt-search.sh -q '+Barack +Obama +r_type:f_people.person' -p 'f_type.object.name, f_common.topic.alias, f_people.person.sibling_s>f_people.sibling_relationship.sibling>rs_label, f_people.person.spouse_s>f_people.marriage.spouse>rs_label, f_people.person.children>rs_label' -v
Loading index...
Index contains 107692853 documents
Searching for: +barack +obama +r_type:f_people.person
Found 2 matching subject(s)
Printing results...
f_m.02mjmr 10.13228 f_type.object.name "Barack Obama"@en
f_m.02mjmr 10.13228 f_common.topic.alias "Bama"@en
f_m.02mjmr 10.13228 f_common.topic.alias "Barack H. Obama II"@en
f_m.02mjmr 10.13228 f_common.topic.alias "Barack Hussein Obama II"@en
f_m.02mjmr 10.13228 f_common.topic.alias "Barack Hussein Obama"@en
f_m.02mjmr 10.13228 f_common.topic.alias "Barack Hussein Obama, Jr."@en
f_m.02mjmr 10.13228 f_common.topic.alias "Barack Obama II"@en
f_m.02mjmr 10.13228 f_common.topic.alias "Barak Obama"@en
f_m.02mjmr 10.13228 f_common.topic.alias "Barry"@en
f_m.02mjmr 10.13228 f_common.topic.alias "No Drama Obama"@en
f_m.02mjmr 10.13228 f_common.topic.alias "Obama"@en
f_m.02mjmr 10.13228 f_common.topic.alias "Obomber"@en
f_m.02mjmr 10.13228 f_common.topic.alias "President Barack H. Obama"@en
f_m.02mjmr 10.13228 f_common.topic.alias "President Barack Hussein Obama II"@en
f_m.02mjmr 10.13228 f_common.topic.alias "President Barack Obama"@en
f_m.02mjmr 10.13228 f_common.topic.alias "President Obama"@en
f_m.02mjmr 10.13228 f_common.topic.alias "Rock"@en
f_m.02mjmr 10.13228 f_common.topic.alias "Sen Barack Obama"@en
f_m.02mjmr 10.13228 f_common.topic.alias "Sen. Barack Obama"@en
f_m.02mjmr 10.13228 f_common.topic.alias "Senator Barack Obama"@en
f_m.02mjmr 10.13228 f_common.topic.alias "The One"@en
f_m.02mjmr 10.13228 f_people.person.sibling_s f_m.044_q0m f_people.sibling_relationship.sibling f_m.03w9f63 rs_label "Maya Soetoro-Ng"@en
f_m.02mjmr 10.13228 f_people.person.sibling_s f_m.0kv53ff f_people.sibling_relationship.sibling f_m.0kv53fg rs_label "George Obama"@en
f_m.02mjmr 10.13228 f_people.person.sibling_s f_m.0kv53fr f_people.sibling_relationship.sibling f_m.04vi1tj rs_label "Mark Okoth Obama Ndesandjo"@en
f_m.02mjmr 10.13228 f_people.person.sibling_s f_m.0n4sqh8 f_people.sibling_relationship.sibling f_m.04ct7vv rs_label "Malik Abongo Obama"@en
f_m.02mjmr 10.13228 f_people.person.sibling_s f_m.0n4sqk2 f_people.sibling_relationship.sibling f_m.0n4sqk3 rs_label "Bernard Obama"@en
f_m.02mjmr 10.13228 f_people.person.sibling_s f_m.0n4sqkd f_people.sibling_relationship.sibling f_m.04vy21h rs_label "David Ndesandjo"@en
f_m.02mjmr 10.13228 f_people.person.sibling_s f_m.0n4sqkk f_people.sibling_relationship.sibling f_m.0n4sqkl rs_label "Abo Obama"@en
f_m.02mjmr 10.13228 f_people.person.sibling_s f_m.0n4sql0 f_people.sibling_relationship.sibling f_m.0h4np5z rs_label "Auma Obama"@en
f_m.02mjmr 10.13228 f_people.person.spouse_s f_m.02nqglv f_people.marriage.spouse f_m.025s5v9 rs_label "Michelle Obama"@en
f_m.02mjmr 10.13228 f_people.person.children f_m.02nqgyw rs_label "Natasha Obama"@en
f_m.02mjmr 10.13228 f_people.person.children f_m.0gh6dh1 rs_label "Malia Ann Obama"@en
f_m.03qccxj 10.13228 f_type.object.name "Barack Obama Sr."@en
f_m.03qccxj 10.13228 f_common.topic.alias "Barack Hussein Obama"@en
f_m.03qccxj 10.13228 f_common.topic.alias "Barack Obama, Sr."@en
f_m.03qccxj 10.13228 f_common.topic.alias "Baraka Obama"@en
f_m.03qccxj 10.13228 f_common.topic.alias "barack_obama_sr"@en
f_m.03qccxj 10.13228 f_people.person.sibling_s f_m.0k6m6kc f_people.sibling_relationship.sibling f_m.04y93y0 rs_label "Zeituni Onyango"@en
f_m.03qccxj 10.13228 f_people.person.sibling_s f_m.0wzjdyf f_people.sibling_relationship.sibling f_m.0wzj04v rs_label "Sarah Obama"@en
f_m.03qccxj 10.13228 f_people.person.spouse_s f_m.040fvqp f_people.marriage.spouse f_m.03hfxq_ rs_label "Ann Dunham"@en
f_m.03qccxj 10.13228 f_people.person.spouse_s f_m.0j4l4y8 f_people.marriage.spouse f_m.040fvrd rs_label "Ruth Nidesand"@en
f_m.03qccxj 10.13228 f_people.person.spouse_s f_m.0j4l4yg f_people.marriage.spouse f_m.040fvr1 rs_label "Kezia Obama"@en
f_m.03qccxj 10.13228 f_people.person.children f_m.02mjmr rs_label "Barack Obama"@en
f_m.03qccxj 10.13228 f_people.person.children f_m.04ct7vv rs_label "Malik Abongo Obama"@en
f_m.03qccxj 10.13228 f_people.person.children f_m.04vy1tj rs_label "Mark Okoth Obama Ndesandjo"@en
f_m.03qccxj 10.13228 f_people.person.children f_m.04vy21h rs_label "David Ndesandjo"@en
f_m.03qccxj 10.13228 f_people.person.children f_m.0h4np5z rs_label "Auma Obama"@en
f_m.03qccxj 10.13228 f_people.person.children f_m.0kv53fg rs_label "George Obama"@en
f_m.03qccxj 10.13228 f_people.person.children f_m.0n4sqk3 rs_label "Bernard Obama"@en
f_m.03qccxj 10.13228 f_people.person.children f_m.0n4sqkl rs_label "Abo Obama"@en
Run time: setup=787ms, query=147ms, display=59ms

By default, each search query retrieves at most 10 results. This limit can be raised or lowered with the -m option. For example, to retrieve 5 arbitrary person keys, we can run the following:

% ./fbt-search.sh -q r_type:f_people.person -p subject -m 5 -v
Loading index...
Index contains 107692853 documents
Searching for: r_type:f_people.person
Found 3073576 matching subject(s)
Printing results...
f_m.03cdrkk 4.5564413
f_m.03cdrkx 4.5564413
f_m.03cdrnp 4.5564413
f_m.03cdrp0 4.5564413
f_m.03cdrpc 4.5564413
Run time: setup=1161ms, query=153ms, display=12ms

We can get all results by supplying -1 for -m. For example, to retrieve all 3073576 person keys we can do this (which takes about 1.5 minutes):

% ./fbt-search.sh -q r_type:f_people.person -p subject -m -1 -v > /tmp/persons.lst
Loading index...
Index contains 107692853 documents
Searching for: r_type:f_people.person
Found 3073576 matching subject(s)
Collecting results...
Printing results...
Run time: setup=771ms, query=2883ms, display=87807ms
% wc -l /tmp/persons.lst
3073576 /tmp/persons.lst


Installing and Using the Python Interface
=========================================

First make sure FreebaseTools (FBT) is properly configured and works from the command line, for example by running some of the fbt-lookup.sh or fbt-search.sh examples above.

The fbtools.py package was developed and tested with Python 2.7 only; with Python 3 your mileage may vary. It should not be too hard to make it work with Python 3, but that work remains to be done.

The fbtools.py package calls the Java implementation of FBT via the jnius package, so make sure jnius is available on your system, or install it via 'sudo pip install jnius' or similar. Make sure you install a version of jnius that works with Python 2.7.

Add the $FBT_HOME directory to your PYTHONPATH, for example (csh/tcsh syntax):

% setenv PYTHONPATH ${FBT_HOME}:${PYTHONPATH}

(or, in bash: export PYTHONPATH=${FBT_HOME}:${PYTHONPATH})

Either permanently edit the 'freebaseToolsHome' and 'freebaseToolsConfig' variables in fbtools.py, or call fbt.configure immediately after the import to configure the package dynamically, as shown in the example dialog below.
Usage Example

>>> import fbtools as fbt
# point to your own $FBT_HOME directory (and config file if it differs from the default):
>>> fbt.configure(home='/home/hans/projects/nlp/code/freebase', config='config.dat.dist')
# ignore the warning:
>>> fbi = fbt.FreebaseIndex()
WARN: problem reading config file `/home/hans/projects/nlp/code/freebase/config.dat.dist': null
>>> fbi.describe()
Number of indexed documents: 107692853
Configuration:
SORT_DIR /home/hans/projects/nlp/code/freebase/sort
IGNORE_LANGS /home/hans/projects/nlp/code/freebase/ignore-langs.lst
IGNORE_PREDS /home/hans/projects/nlp/code/freebase/ignore-preds.lst
IGNORE_VALUES /home/hans/projects/nlp/code/freebase/ignore-values.lst
LUCENE_DEFAULT_FIELD rs_label
LUCENE_INDEX /home/hans/projects/nlp/code/freebase/data/basekb-gold-jan-2015.shrink.sort.index
LUCENE_INDEXED_PREDS /home/hans/projects/nlp/code/freebase/indexed-preds.lst
LUCENE_INDEX_ANALYZER_DEFAULT org.apache.lucene.analysis.standard.StandardAnalyzer
LUCENE_INDEX_OPTIONS "-nn -v"
>>> fbi.lookup('f_m.0h54qv8')
>
>>> fbi.getFieldValue('f_m.0h54qv8', 'rs_label')
'"Henry Higgins"@en'
>>> fbi.getFieldValues('f_m.0h54qv8', 'r_type')
['f_common.topic', 'f_people.deceased_person', 'f_people.person']
>>> fbi.getFieldValues('/m/0h54qv8', 'r_type')
['f_common.topic', 'f_people.deceased_person', 'f_people.person']
>>> fbi.getDocumentId('f_m.0h54qv8')
69442668
>>> fbi.getDocument(69442668)
>
>>> import pprint
>>> pp = pprint.PrettyPrinter(indent=4)
>>> pp.pprint(fbi.fetch('f_m.0h54qv8'))
{ 'f_common.topic.article': 'f_m.0h54qvd', 'f_common.topic.description': '"Henry Hugh Higgins was an English botanist, bryologist, geologist, curator and clergyman. He is cited as an authority in scientific classification, as Higgins.\\nHe was inspector of the National Schools in Liverpool from 1842 to 1848 and chaplain to the Rainhill Asylum, also in Liverpool.
He was also president of the Liverpool Field Naturalists\' Club from 1861 to 1881.\\nHe especially worked on the Ravenhead collections, almost wholly made up of Upper Carboniferous flora, fish, bivalves and insect remains. Higgins had suggested that Ravenhead donate his collections to the Liverpool Museum and the donation gained a home with the construction of the railway in 1870, which exposed two Carboniferous seams known as the Upper and Lower Ravenhead. Most of Liverpool Museum\'s collections survived the Liverpool Blitz of May 1941 which practically destroyed the Museum itself, but the entire Ravenhead collection was lost in the fire."@en', 'f_common.topic.notable_for': 'f_g.125crzjzl', 'f_common.topic.notable_types': 'f_m.022tfrk', 'f_common.topic.topic_equivalent_webpage': [ 'we_Henry_Higgins_(botanist)', 'we_index.html?curid=32997517'], 'f_people.deceased_person.date_of_death': '"1893"^^', 'f_people.person.date_of_birth': '"1814"^^', 'f_people.person.gender': 'f_m.05zppz', 'f_people.person.profession': 'f_m.036n1', 'f_type.object.name': '"Henry Higgins"@en', 'fk_key.wikipedia.en': [ '"Henry_Higgins_$0028botanist$0029"', '"Henry_Hugh_Higgins"'], 'fk_key.wikipedia.en_id': '"32997517"', 'fk_key.wikipedia.en_title': '"Henry_Higgins_$0028botanist$0029"', 'r_type': [ 'f_common.topic', 'f_people.deceased_person', 'f_people.person'], 'rs_label': '"Henry Higgins"@en', 'subject': 'f_m.0h54qv8'} >>> fbi.search('Henry Higgins AND r_type:f_people.person', maxHits=5) [(69442668, 8.640470504760742), (93746332, 8.640470504760742), (3290194, 4.759587287902832), (15812, 4.0124993324279785), (63493, 4.0124993324279785)] >>> pp.pprint(fbi.fetch(69442668)) { 'f_common.topic.article': 'f_m.0h54qvd', 'f_common.topic.description': '"Henry Hugh Higgins was an English botanist, bryologist, geologist, curator and clergyman. 
He is cited as an authority in scientific classification, as Higgins.\\nHe was inspector of the National Schools in Liverpool from 1842 to 1848 and chaplain to the Rainhill Asylum, also in Liverpool. He was also president of the Liverpool Field Naturalists\' Club from 1861 to 1881.\\nHe especially worked on the Ravenhead collections, almost wholly made up of Upper Carboniferous flora, fish, bivalves and insect remains. Higgins had suggested that Ravenhead donate his collections to the Liverpool Museum and the donation gained a home with the construction of the railway in 1870, which exposed two Carboniferous seams known as the Upper and Lower Ravenhead. Most of Liverpool Museum\'s collections survived the Liverpool Blitz of May 1941 which practically destroyed the Museum itself, but the entire Ravenhead collection was lost in the fire."@en', 'f_common.topic.notable_for': 'f_g.125crzjzl', 'f_common.topic.notable_types': 'f_m.022tfrk', 'f_common.topic.topic_equivalent_webpage': [ 'we_Henry_Higgins_(botanist)', 'we_index.html?curid=32997517'], 'f_people.deceased_person.date_of_death': '"1893"^^', 'f_people.person.date_of_birth': '"1814"^^', 'f_people.person.gender': 'f_m.05zppz', 'f_people.person.profession': 'f_m.036n1', 'f_type.object.name': '"Henry Higgins"@en', 'fk_key.wikipedia.en': [ '"Henry_Higgins_$0028botanist$0029"', '"Henry_Hugh_Higgins"'], 'fk_key.wikipedia.en_id': '"32997517"', 'fk_key.wikipedia.en_title': '"Henry_Higgins_$0028botanist$0029"', 'r_type': [ 'f_common.topic', 'f_people.deceased_person', 'f_people.person'], 'rs_label': '"Henry Higgins"@en', 'subject': 'f_m.0h54qv8'} >>> pp.pprint(fbi.retrieve('Henry Higgins AND r_type:f_people.person', maxHits=5)) [ { '_docid': 69442668, '_score': 8.640470504760742, 'f_common.topic.article': 'f_m.0h54qvd', 'f_common.topic.description': '"Henry Hugh Higgins was an English botanist, bryologist, geologist, curator and clergyman. 
He is cited as an authority in scientific classification, as Higgins.\\nHe was inspector of the National Schools in Liverpool from 1842 to 1848 and chaplain to the Rainhill Asylum, also in Liverpool. He was also president of the Liverpool Field Naturalists\' Club from 1861 to 1881.\\nHe especially worked on the Ravenhead collections, almost wholly made up of Upper Carboniferous flora, fish, bivalves and insect remains. Higgins had suggested that Ravenhead donate his collections to the Liverpool Museum and the donation gained a home with the construction of the railway in 1870, which exposed two Carboniferous seams known as the Upper and Lower Ravenhead. Most of Liverpool Museum\'s collections survived the Liverpool Blitz of May 1941 which practically destroyed the Museum itself, but the entire Ravenhead collection was lost in the fire."@en', 'f_common.topic.notable_for': 'f_g.125crzjzl', 'f_common.topic.notable_types': 'f_m.022tfrk', 'f_common.topic.topic_equivalent_webpage': [ 'we_Henry_Higgins_(botanist)', 'we_index.html?curid=32997517'], 'f_people.deceased_person.date_of_death': '"1893"^^', 'f_people.person.date_of_birth': '"1814"^^', 'f_people.person.gender': 'f_m.05zppz', 'f_people.person.profession': 'f_m.036n1', 'f_type.object.name': '"Henry Higgins"@en', 'fk_key.wikipedia.en': [ '"Henry_Higgins_$0028botanist$0029"', '"Henry_Hugh_Higgins"'], 'fk_key.wikipedia.en_id': '"32997517"', 'fk_key.wikipedia.en_title': '"Henry_Higgins_$0028botanist$0029"', 'r_type': [ 'f_common.topic', 'f_people.deceased_person', 'f_people.person'], 'rs_label': '"Henry Higgins"@en', 'subject': 'f_m.0h54qv8'}, { '_docid': 93746332, '_score': 8.640470504760742, 'f_common.topic.article': 'f_m.0ll1ywd', 'f_common.topic.description': '"Henry Higgins was an English bullfighter, who was born in Bogot\\u00E1, Colombia in 1944. He died as a result of a hang-gliding accident, while demonstrating it by jumping off a 200 ft high hill in 1978. 
He was educated at King Williams College in the Isle of Man."@en', 'f_common.topic.notable_for': 'f_g.12q4p343t', 'f_common.topic.notable_types': 'f_m.04kr', 'f_common.topic.topic_equivalent_webpage': [ 'we_Henry_Higgins_(bullfighter)', 'we_index.html?curid=36799328'], 'f_people.person.profession': 'f_m.01kr58', 'f_type.object.name': '"Henry Higgins"@en', 'fk_key.wikipedia.en': '"Henry_Higgins_$0028bullfighter$0029"', 'fk_key.wikipedia.en_id': '"36799328"', 'fk_key.wikipedia.en_title': '"Henry_Higgins_$0028bullfighter$0029"', 'r_type': ['f_common.topic', 'f_people.person'], 'rs_label': '"Henry Higgins"@en', 'subject': 'f_m.0ll1yw8'}, { '_docid': 3290194, '_score': 4.759587287902832, 'f_common.topic.article': 'f_m.04j19m', 'f_common.topic.description': '"Terence Langley Higgins, Baron Higgins KBE DL PC is a retired British Conservative politician and Commonwealth Games silver medalist winner for England.\\nHiggins was Member of Parliament for Worthing from 1964 to 1997, and Financial Secretary to the Treasury between 1972 and 1974.\\nHe served in the RAF from 1946 to 1948, and was a member of British Olympic Team in 1948 and 1952. He was created a life peer as Baron Higgins, of Worthing in the County of West Sussex on 28 October 1997. While in opposition, he served as the Conservative shadow minister for work and pensions in the House of Lords. 
He was appointed a Knight Commander of the Order of the British Empire in the 1993 New Years Honours List."@en', 'f_common.topic.notable_for': 'f_g.1257q_vsp', 'f_common.topic.notable_types': 'f_m.02xlh55', 'f_common.topic.topic_equivalent_webpage': [ 'we_Terence_Higgins,_Baron_Higgins', 'we_index.html?curid=1215819'], 'f_government.politician.government_positions_held': 'f_m.04ntzv0', 'f_government.politician.party': 'f_m.04htc0_', 'f_people.person.date_of_birth': '"1928-01-18"^^', 'f_people.person.gender': 'f_m.05zppz', 'f_people.person.nationality': 'f_m.07ssc', 'f_type.object.name': '"Terence Higgins, Baron Higgins"@en', 'fk_key.en': '"terence_higgins_baron_higgins"', 'fk_key.wikipedia.en': [ '"Baron_Higgins"', '"Lord_Higgins"', '"Terence_Higgins$002C_Baron_Higgins"', '"Terence_Langley_Higgins"'], 'fk_key.wikipedia.en_id': '"1215819"', 'fk_key.wikipedia.en_title': '"Terence_Higgins$002C_Baron_Higgins"', 'r_type': [ 'f_common.topic', 'f_government.politician', 'f_people.person', 'f_royalty.chivalric_order_member', 'f_royalty.noble_person'], 'rs_label': '"Terence Higgins, Baron Higgins"@en', 'subject': 'f_m.04j19g'}, { '_docid': 15812, '_score': 4.0124993324279785, 'f_common.topic.article': 'f_m.03chhdq', 'f_common.topic.description': '"Debra Elaine Higgins is a Canadian provincial politician, who was the Saskatchewan New Democratic Party member of the Legislative Assembly of Saskatchewan for the constituency of Moose Jaw Wakamow from 1999 to 2011. She is currently the mayor of Moose Jaw, Saskatchewan, having been elected as the city\'s first female mayor in the Saskatchewan municipal elections, 2012.\\nShe was first elected in the 1999 election and was re-elected in the 2003 and 2007 elections. 
Higgins served in the cabinet of Lorne Calvert as the Minister of Labour and later as the Minister of Learning.\\nAfter the defeat of the NDP government in the 2007 election, Higgins has served as the NDP critic for municipal affairs, liquor and gaming, and women\'s issues.\\nOn January 30, 2009, she announced her bid to succeed Calvert as Saskatchewan NDP leader at the party\'s June 2009 leadership convention. Higgins ran on the theme of renewal and defeating Premier Brad Wall. In the end she finished last of four candidates with Dwain Lingenfelter being the victor.\\nIn the 2011 election Higgins was defeated in her riding by Greg Lawrence of the Saskatchewan Party.\\nHiggins got her start in politics when she became involved with the UFCW union in 1982 while working at a Safeway grocery store. She later served as the President of the UFCW Council from 1993 to 1999, during which period she also served as a table officer for the Moose Jaw & District Labour Council."@en', 'f_common.topic.notable_for': 'f_g.12556xmf4', 'f_common.topic.notable_types': 'f_m.04kr', 'f_common.topic.topic_equivalent_webpage': [ 'we_Deb_Higgins', 'we_index.html?curid=13762292'], 'f_government.politician.party': 'f_m.0lr0_qy', 'f_people.person.date_of_birth': '"1954"^^', 'f_people.person.gender': 'f_m.02zsn', 'f_people.person.places_lived': 'f_m.0wllybw', 'f_type.object.name': '"Deb Higgins"@en', 'fk_key.en': '"deb_higgins"', 'fk_key.source.videosurf': '"125328"', 'fk_key.wikipedia.en': '"Deb_Higgins"', 'fk_key.wikipedia.en_id': '"13762292"', 'fk_key.wikipedia.en_title': '"Deb_Higgins"', 'r_type': ['f_common.topic', 'f_people.person'], 'rs_label': '"Deb Higgins"@en', 'subject': 'f_m.03chhdl'}, { '_docid': 63493, '_score': 4.0124993324279785, 'f_common.topic.article': 'f_m.03d0fnj', 'f_common.topic.description': '"Terence John Higgins is Chief Justice of the Australian Capital Territory, a territory of Australia."@en', 'f_common.topic.notable_for': 'f_g.125dtp53k', 
'f_common.topic.notable_types': 'f_m.04kr', 'f_common.topic.topic_equivalent_webpage': [ 'we_Terence_Higgins_(judge)', 'we_index.html?curid=14320511'], 'f_people.person.date_of_birth': '"1943"^^', 'f_people.person.education': 'f_m.0sw2b_6', 'f_people.person.gender': 'f_m.05zppz', 'f_people.person.place_of_birth': 'f_m.0chghy', 'f_type.object.name': '"Terence Higgins"@en', 'fk_key.en': '"terence_john_higgins"', 'fk_key.wikipedia.en': [ '"Terence_Higgins_$0028judge$0029"', '"Terence_John_Higgins"', '"Terrence_John_Higgins"'], 'fk_key.wikipedia.en_id': '"14320511"', 'fk_key.wikipedia.en_title': '"Terence_Higgins_$0028judge$0029"', 'r_type': ['f_common.topic', 'f_people.person'], 'rs_label': '"Terence Higgins"@en', 'subject': 'f_m.03d0fnd'}]
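As noted in the Overview, queries that would need joins can only be emulated in code. The helpers below are a minimal sketch of following a predicate chain such as f_people.person.children>rs_label on top of the Python interface. They accept any fetch-like callable and assume, based on the dialog above, that fbi.fetch(key) returns a dict whose single values are strings and whose multiple values are lists. The names as_list, follow_chain and parse_literal are hypothetical and not part of fbtools.py. Each followed object costs one more fetch, which is why chains become expensive for large result sets.

```python
def as_list(value):
    """Normalize a field value: fetch() returns single values as strings, multiple as lists."""
    if value is None:
        return []
    if isinstance(value, list):
        return value
    return [value]


def follow_chain(fetch, key, chain):
    """Follow the predicates in `chain` starting at `key`; return the final values.

    `fetch` is any callable mapping a subject key to a dict of field values
    (assumed to behave like fbi.fetch). Every intermediate object triggers
    one more fetch() call, i.e. one more index lookup.
    """
    frontier = [key]
    for predicate in chain:
        next_values = []
        for subject in frontier:
            doc = fetch(subject)
            if doc:  # skip subjects that could not be fetched
                next_values.extend(as_list(doc.get(predicate)))
        frontier = next_values
    return frontier


def parse_literal(value):
    """Split a literal like '"Henry Higgins"@en' into (text, language tag).

    Non-literal values (subject keys) are returned unchanged with tag None.
    """
    if not value.startswith('"'):
        return value, None
    end = value.rfind('"')
    suffix = value[end + 1:]
    lang = suffix[1:] if suffix.startswith("@") else None
    return value[1:end], lang


if __name__ == "__main__":
    # Toy stand-in for the index, using keys and labels from the examples above.
    toy_index = {
        "f_m.03qccxj": {"f_people.person.children": ["f_m.02mjmr", "f_m.04ct7vv"]},
        "f_m.02mjmr": {"rs_label": '"Barack Obama"@en'},
        "f_m.04ct7vv": {"rs_label": '"Malik Abongo Obama"@en'},
    }
    labels = follow_chain(toy_index.get, "f_m.03qccxj",
                          ["f_people.person.children", "rs_label"])
    print([parse_literal(label)[0] for label in labels])
```

With a loaded index this could be called as `follow_chain(fbi.fetch, 'f_m.03qccxj', ['f_people.person.children', 'rs_label'])`. How fbi.fetch behaves for a key that does not exist is not shown above, so the sketch simply skips empty results.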