Record linkage is the problem of identifying the matching records between two data sources. As the data sources grow, comparing every pair of records becomes difficult and expensive. Blocking addresses this by efficiently generating candidate matches, which can then be examined in detail to determine whether or not they are true matches. Blocking is therefore a preprocessing step that makes record linkage more scalable. Its goal is to create a candidate set that contains as many true matches as possible while minimizing the number of false matches (false positives).
The BSL system presented here performs blocking in the supervised setting of record linkage: given some training matches, it discovers rules (a blocking scheme) that efficiently generate candidate matches between the two data sources. We recommend reading the paper first to get an idea of how the algorithm works, especially if you are unfamiliar with record linkage and blocking.
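To make the idea of blocking concrete, here is a minimal sketch in Python. It is not the BSL algorithm and does not learn anything: it applies one hand-written blocking rule (records must share a zip code) to a few made-up records, then measures how many record pairs were avoided and how many true matches survived. The records, the field names, the `candidate_pairs` helper, and the choice of rule are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical toy records; field names and values are illustrative only.
source_a = [
    {"id": "a1", "first": "John", "last": "Smith", "zip": "90007"},
    {"id": "a2", "first": "Mary", "last": "Jones", "zip": "90210"},
    {"id": "a3", "first": "Robert", "last": "Brown", "zip": "10001"},
]
source_b = [
    {"id": "b1", "first": "Jon", "last": "Smith", "zip": "90007"},
    {"id": "b2", "first": "Mary", "last": "Jonas", "zip": "90210"},
    {"id": "b3", "first": "Bob", "last": "Brown", "zip": "10001"},
    {"id": "b4", "first": "Alice", "last": "White", "zip": "90007"},
]

# Known true matches, as would be available in a supervised setting.
true_matches = {("a1", "b1"), ("a2", "b2"), ("a3", "b3")}


def candidate_pairs(records_a, records_b, key_fn):
    """Pair up records from the two sources that share a blocking key."""
    # Index the second source by blocking key so lookups avoid a full scan.
    index = defaultdict(list)
    for rec in records_b:
        index[key_fn(rec)].append(rec["id"])
    # Each record in the first source is paired only with records in its block.
    return {
        (rec["id"], b_id)
        for rec in records_a
        for b_id in index.get(key_fn(rec), [])
    }


# Hand-written blocking rule (an assumption, not a learned scheme): same zip.
candidates = candidate_pairs(source_a, source_b, lambda rec: rec["zip"])

# Fraction of all possible pairs that blocking let us skip.
total_pairs = len(source_a) * len(source_b)
reduction_ratio = 1 - len(candidates) / total_pairs

# Fraction of true matches that made it into the candidate set.
pairs_completeness = len(candidates & true_matches) / len(true_matches)

print(sorted(candidates))
print(reduction_ratio, pairs_completeness)
```

In this toy data the rule keeps every true match but also admits one false positive (a1 paired with b4, which share a zip code). That trade-off between coverage and candidate set size is what a blocking scheme has to balance.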
After filling out the form below, download BSL.zip and unzip it. The README.txt file gives detailed instructions for configuring and running the system. Note that you will need Java and MySQL to run the software. Please direct all inquiries here.
This License Agreement (the "Agreement") is entered, effective this date, by and between University of Southern California, and the individual executing this Agreement below as "Licensee" (hereinafter, the "Licensee").
WHEREAS, USC has developed the Blocking Scheme Learner (BSL) package and related documentation (the "Software"); and
WHEREAS, Licensee desires, and USC is willing to grant to Licensee, a license to use the Software in accordance with this Agreement;
NOW, in consideration of the foregoing, the mutual covenants hereinafter set forth, and for other good and valuable consideration, the receipt and sufficiency of which is hereby acknowledged, the parties agree as follows:
1. USC hereby grants Licensee a royalty-free, non-exclusive, non-transferable right to use the Software as follows solely for a
(a) Licensee may prepare derivative works (the "Derivative Works") which are based on or incorporate all or part of the Software, including, without limitation, works (the "Adaptations") which
2. All copies of the Software and Derivative Works prepared in accordance with paragraph 1 shall retain the copyright notice appearing in the Software. If the Software includes computer programs in object code form, Licensee shall not de-compile, reverse engineer or disassemble such programs.
3. As used in this Agreement, "Non-Commercial Purpose" means use of the Software and Derivative Works solely for education or research. "Non-Commercial Purpose" excludes, without limitation, any use of the Software or Derivative Works for, as part of, or in any way in connection with a product (including software) or service which is sold, offered for sale, licensed, leased, loaned or rented.
4. Licensee hereby grants USC a non-exclusive, royalty-free, fully paid-up, worldwide, perpetual license to:
(a) Reproduce, prepare derivative works based on and distribute all or part of the Adaptations; and
5. This Agreement is personal between USC and Licensee. No ownership interest in the Software (or the copy of which is provided by USC pursuant to paragraph 1) is transferred to Licensee. Licensee's interest in the Derivative Works is limited solely to Licensee's additions and the Derivative Works are subject in their entirety to USC's intellectual property rights. USC may assign or transfer to any company or person, or grant to any company or person a license or sublicense under, all or part of its interest in any rights to the Software, this Agreement, or any license granted to USC hereunder. Licensee may not assign, transfer or sublicense Licensee's rights hereunder without the written consent of USC.
6. USC may terminate this Agreement at any time by sending written notice of termination to Licensee at the address specified below. Termination shall be effective as provided in the notice. Unless the notice shall provide otherwise, upon termination, Licensee shall destroy all copies of the Software and Derivative Works. Licensee's obligations under this Agreement, including any rights granted to USC pursuant to paragraph 5, shall survive and continue after termination.
7. Licensee shall not, directly or indirectly, export the Software or any Derivative Work to any country to which such export is prohibited by law.
8. Licensee agrees to comply with all export laws and restrictions and regulations of the United States or foreign agencies or authorities, and not to export or re-export the Software or any direct product thereof in violation of any such restrictions, laws or regulations, or without all necessary approvals. Neither the Software nor the underlying information or technology may be downloaded or otherwise exported or re-exported
(i) into Cuba, Iran, Iraq, Libya, North Korea, Sudan, Syria, Serbia, Taliban-controlled portions of Afghanistan or any other country subject to U.S. trade sanctions covering the Software, to individuals or entities controlled by such countries, or to nationals or residents of such countries other than nationals who are citizens or lawfully admitted permanent residents of the United States and not currently domiciled in countries subject to such sanctions; or
9. USC has no obligation to support or maintain the Software and grants Licensee this right to use the Software "AS IS". LICENSEE ASSUMES TOTAL RESPONSIBILITY AND RISK FOR LICENSEE'S USE OF THE SOFTWARE. USC DOES NOT MAKE, AND EXPRESSLY DISCLAIMS, ANY EXPRESS OR IMPLIED WARRANTIES, REPRESENTATIONS OR ENDORSEMENTS OF ANY KIND WHATSOEVER, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, AND THE WARRANTIES OF TITLE OR NON-INFRINGEMENT. IN NO EVENT SHALL USC BE LIABLE FOR
(a) ANY INCIDENTAL, CONSEQUENTIAL, OR INDIRECT DAMAGES (INCLUDING, WITHOUT LIMITATION, DAMAGES FOR LOSS OF PROFITS, BUSINESS INTERRUPTION, LOSS OF PROGRAMS OR INFORMATION, AND THE LIKE) ARISING OUT OF THE USE OF OR INABILITY TO USE THE SOFTWARE, EVEN IF USC OR ANY OF ITS AUTHORIZED REPRESENTATIVES HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES,
10. This Agreement shall be governed by and construed in accordance with the laws of the State of California, USA, applicable to agreements made and to be performed wholly therein without regard to its conflicts of law rules. Any cause of action or claim Licensee may have with respect to the Software must be brought within one (1) year after the claim or cause of action arises or such claim or cause of action is barred. USC's failure to insist upon or enforce strict performance of any provision of this Agreement is not a waiver of any provision or right.
11. If a dispute arises out of, or relates to, this Agreement or the subject matter of this Agreement, either party may submit the dispute to a sole mediator selected by the parties or, at any time prior to selection of a sole mediator, to mediation by the American Arbitration Association ("AAA"). If not thus resolved, it shall be referred to a sole arbitrator selected by the parties or to the AAA for arbitration. The arbitration shall be governed by the United States Arbitration Act, shall be conducted in the County of Los Angeles, California, USA, and judgment on the award may be entered by any court having jurisdiction. The arbitrator shall not limit, expand or modify the terms of the Agreement nor award damages in excess of compensatory damages, and each party waives any claim to excess damages. A request by a party to a court for interim protection shall not affect either party's obligation hereunder to mediate and arbitrate. Each party shall bear its own expenses and an equal share of all cost and fees of the mediation and/or arbitration. Any arbitrator selected shall be competent in the legal and technical aspects of the subject matter of this Agreement. The content and result of mediation and/or arbitration shall be held in confidence by all participants.
I certify that I am not a national or a resident of Cuba, Iran, Iraq, Libya, North Korea, Sudan, Syria, Serbia, Taliban-controlled portions of Afghanistan or any other country subject to U.S. trade sanctions, nor, to the best of my knowledge, have I been designated a Specially Designated National, Blocked Person, or otherwise been denied export-related privileges by the United States Government.