LANDER:as to org mapping subsidiary linkage-20101019

README version: 3020, last modified: 2012-06-25.

This file describes the trace dataset "as_to_org_mapping_subsidiary_linkage-20101019" provided by the LANDER project. The most recent version of this file can be found on-line at http://wiki.isi.edu/predict/index.php/LANDER:as_to_org_mapping_subsidiary_linkage-20101019.

This is a derived dataset processed on 2012-06-18, with data obtained from the sources below:

  * U.S. Securities and Exchange Commission's (SEC) EDGAR database. http://www.sec.gov/investor/pubs/edgarguide.htm, 2010.
  * Regional Internet Registry (RIR) WHOIS databases. http://www.afrinic.net/, http://www.apnic.net/, http://www.arin.net/, http://www.lacnic.net/, http://www.ripe.net/, October 2010.

LANDER Metadata
(http://wiki.isi.edu/predict/index.php/LANDER:as_to_org_mapping_subsidiary_linkage-20101019/landermeta)

dataSetName               | as_to_org_mapping_subsidiary_linkage-20101019
status                    | usc-web-and-predict
shortDesc                 | Links between ASes and subsidiaries
longDesc                  | This dataset provides a linking between ASes and company subsidiaries. It is
                          | derived from the WHOIS database and Form 10-K filings.
                          |
                          | The linking is useful to associate ASes that belong to different subsidiaries
                          | of the same organization. We determined the links with automatic record
                          | linkage algorithms, followed by manual verification and pruning. The general
                          | idea is to compare how similar the name of an AS is to the name of a
                          | subsidiary. Because automatic linkage is inaccurate, we then manually verify
                          | and prune the links for a selected set of the most important organizations.
datasetCategory           | InternetTopologyData
datasetSubCategory        | AS Organizational Data
requestReviewRequired     | true
productReviewRequired     | false
ongoingMeasurement        | false
collectionStartDate       | 2010-10-19
collectionStartTime       | 00:00:00
collectionEndDate         | 2010-12-31
collectionEndTime         | 00:00:00
availabilityStartDate     |
availabilityStartTime     |
availabilityEndDate       |
availabilityEndTime       |
anonymization             | false
archivingAllowed          |
keywords                  | internet, topology, AS, organization, subsidiary, linking
format                    | text
access                    | https
hostName                  | USC-LANDER
privateAccessInstructions | See http://www.isi.edu/ant/traces/index.html#getting_datasets for information
                          | on obtaining this dataset.
                          | See
                          | http://wiki.isi.edu/predict/index.php/LANDER:as_to_org_mapping_subsidiary_linkage-20101019
                          | for details on this dataset.

Dataset Contents

as_to_org_mapping_subsidiary_linkage-20101019.README.txt
                     copy of this README
orgs.fsdb            The IDs and names of organizations. Only US public
                     companies are included.
orgs_selected.fsdb   The selected most important organizations, a subset of
                     orgs.fsdb.
subsidiaries.fsdb    The subsidiaries of organizations in orgs.fsdb, including
                     names of both organizations and their subsidiaries.
ases.fsdb            The ASNs and names of ASes.
links_selected.fsdb  The manually verified and pruned links between ASes and
                     subsidiaries of the selected organizations.
form10k.tar.bz2      The original Form 10-K filings of organizations in
                     orgs.fsdb.
ex21.tar.bz2         The extracted Exhibit 21 contained in each Form 10-K
                     filing that provides information about organization
                     subsidiaries.
_sha1sum             SHA-1 checksum

The file "_sha1sum" contains SHA-1 checksums of the individual compressed files. The integrity of the distribution can thus be checked by independently calculating the SHA-1 sums of the files and comparing them with those listed in the file. If you have the sha1sum utility installed on your system, you can do that by executing:

    sha1sum --check _sha1sum

Data Format

Syntax

Each of the *.fsdb files is in FSDB file format, a simple, white-space-separated text database format in which each line is a database row and whitespace separates columns.

Schema

Each *.fsdb file is a simple database.

In orgs.fsdb and orgs_selected.fsdb, each row is an organization, and the 3 columns provide information about it.

/* the following fields are derived from the EDGAR database */
cik        | Central Index Key (CIK), the unique identifier of the
           | organization in the EDGAR database.
orgname    | the name of the organization.
accession  | the accession number that identifies the Form 10-K filing
           | of the organization. Use this number to find the original
           | filing in form10k.tar.bz2 and the extracted Exhibit 21 in
           | ex21.tar.bz2. For example, if the accession number is
           | "0000002178-10-000008", then decompress form10k.tar.bz2,
           | and the original Form 10-K filing will be
           | "form10k/0000002178-10-000008.txt". We also provide the
           | already extracted Exhibit 21 of each filing. Decompress
           | ex21.tar.bz2, and the Exhibit 21 of the organization will
           | be "ex21/0000002178-10-000008-ex21.htm".

In subsidiaries.fsdb, each row is a subsidiary, and 2 columns provide information about it.

/* the following fields are derived from the EDGAR database */
cik        | Central Index Key (CIK), the unique identifier of the
           | organization to which the subsidiary belongs.
subsidiary | the name of the subsidiary.

In ases.fsdb, each row is an AS, and 2 columns provide information about it.

/* the following fields are derived from the WHOIS database */
asn        | the unique identifier of the AS.
asname     | the name of the AS.

In links_selected.fsdb, each row is a link between a subsidiary and an AS, and 4 columns provide information about it.

/* the following fields are derived from the EDGAR database */
cik        | Central Index Key (CIK), the unique identifier of the
           | organization to which the subsidiary belongs.
subsidiary | the name of the subsidiary.
/* the following fields are derived from the WHOIS database */
asn        | the unique identifier of the AS.
asname     | the name of the AS.

A value of "-" in any column means the information is not available.

Included Data

We also include the raw SEC 10-K data obtained from the EDGAR database. This data is distributed freely by the SEC at [1] with the statement "Anyone can access and download this information for free".

form10k.tar.bz2 contains the original Form 10-K filings of organizations in orgs.fsdb. Each filing is in text format. Each filing contains an Exhibit 21 that lists the organization's subsidiaries. The Exhibit 21 is in HTML format. We extract these exhibits from all 10-K filings and store them in ex21.tar.bz2. Use the "accession" number stored in orgs.fsdb to find the corresponding Form 10-K and Exhibit 21 of an organization.

How organization vs. subsidiary files relate

The organization file (orgs.fsdb) and the subsidiary file (subsidiaries.fsdb) relate to each other. The organization file lists all US public companies, one organization per row. The subsidiary file lists the subsidiaries of these organizations that we extracted from their 10-K filings, one subsidiary per row. To see which organization a subsidiary belongs to, join the organization file with the subsidiary file on cik.

How subsidiary, AS and link files relate

The subsidiary file (subsidiaries.fsdb), AS file (ases.fsdb) and link file (links_selected.fsdb) relate to each other. The subsidiary file lists subsidiaries and their organization IDs, one subsidiary per row.
The AS file lists ASes, one AS per row. We link subsidiaries with ASes by their names, and these links are stored in the link file. Note that, because manual effort is limited, only links to subsidiaries of the selected most important organizations in orgs_selected.fsdb are included.

Linking Method

This dataset provides a linking between ASes and company subsidiaries. The linking is useful to associate ASes that belong to different subsidiaries of the same organization. We determined the links with automatic record linkage algorithms, followed by manual verification and pruning. The general idea is to compare how similar the name of an AS is to the name of a subsidiary. Because automatic linkage is inaccurate, we then manually verify and prune the links for a selected set of the most important organizations.

Details about our methodology are in the technical report:

  * Xue Cai, John Heidemann, Balachander Krishnamurthy, and Walter Willinger. An Organization-Level View of the Internet and its Implications (Extended). Technical Report ISI-TR-2012-679, USC/Information Sciences Institute, June 2012. ftp://ftp.isi.edu/isi-pubs/tr-679.pdf

Citation

If you use this trace to conduct additional research, please cite it as:

    PREDICT ID: USC-LANDER/as_to_org_mapping_subsidiary_linkage-20101019/rev3020.
    Traces generated on 2012-06-18.
    Provided by the USC/LANDER project (http://www.isi.edu/ant/lander).

Results Using This Dataset

This dataset has been used in the following previously published work:

  * Xue Cai, John Heidemann, Balachander Krishnamurthy, and Walter Willinger. An Organization-Level View of the Internet and its Implications (Extended). Technical Report ISI-TR-2012-679, USC/Information Sciences Institute, June 2012. ftp://ftp.isi.edu/isi-pubs/tr-679.pdf
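
Example: joining the organization and subsidiary files

The join on cik and the accession-to-path rule described in the Schema and file-relation sections can be sketched in Python as below. This is a minimal illustration, not a tool shipped with the dataset. It assumes each file starts with a "#fsdb" header line that names the columns, that an optional "-F t" in that header means tab-separated fields (otherwise fields are split on whitespace), and that other lines starting with "#" are comments. The function names and the sample rows are hypothetical placeholders, not real data from the dataset; only the accession number and the two path patterns come from this README.

```python
import io


def read_fsdb(stream):
    """Parse FSDB text into a list of dicts keyed by column name.

    Assumptions (not confirmed by this README): the header looks like
    "#fsdb [-F t] col1 col2 ...", and "-F t" selects tab-separated fields.
    """
    columns, sep, rows = [], None, []
    for raw in stream:
        line = raw.rstrip("\r\n")
        if line.startswith("#fsdb"):
            tokens = line.split()[1:]
            i = 0
            while i < len(tokens):
                if tokens[i].startswith("-"):
                    # Assumed option syntax: a flag followed by its value.
                    if tokens[i] == "-F" and i + 1 < len(tokens) and tokens[i + 1] == "t":
                        sep = "\t"
                    i += 2
                else:
                    columns.append(tokens[i])
                    i += 1
        elif line and not line.startswith("#"):
            # sep=None splits on any run of whitespace.
            rows.append(dict(zip(columns, line.split(sep))))
    return rows


def join_on_cik(orgs, subsidiaries):
    """Attach each subsidiary row to its parent organization row via cik."""
    by_cik = {org["cik"]: org for org in orgs}
    return [
        {**by_cik[sub["cik"]], "subsidiary": sub["subsidiary"]}
        for sub in subsidiaries
        if sub["cik"] in by_cik
    ]


def filing_paths(accession):
    """Paths inside the decompressed form10k.tar.bz2 and ex21.tar.bz2."""
    return f"form10k/{accession}.txt", f"ex21/{accession}-ex21.htm"


# Invented placeholder rows; real files would be opened with open(...).
ORGS = "#fsdb -F t cik orgname accession\n2178\tEXAMPLE HOLDINGS INC\t0000002178-10-000008\n"
SUBS = "#fsdb -F t cik subsidiary\n2178\tExample Networks LLC\n2178\tExample Telecom Ltd\n# comment\n"

joined = join_on_cik(read_fsdb(io.StringIO(ORGS)), read_fsdb(io.StringIO(SUBS)))
for row in joined:
    print(row["orgname"], "->", row["subsidiary"], filing_paths(row["accession"]))
```

If the real files use a different separator, only the splitting step in read_fsdb needs to change; the join and path logic depend only on the column names documented in the Schema section.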
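
Example: name-similarity matching

The Linking Method section says that links come from comparing how similar an AS name is to a subsidiary name. The sketch below illustrates that general idea only. It is not the record linkage algorithm used to build this dataset (see the technical report for that); it substitutes Python's difflib.SequenceMatcher ratio as a stand-in similarity measure. The suffix list, the 0.6 threshold, the function names, and the sample names are all assumptions made for illustration, and the AS numbers are taken from the range reserved for documentation.

```python
import re
from difflib import SequenceMatcher

# Corporate suffixes dropped before comparison (illustrative, incomplete).
SUFFIXES = {"inc", "corp", "corporation", "llc", "ltd", "co", "company"}


def normalize(name):
    """Lowercase, split on non-alphanumerics, and drop corporate suffixes."""
    tokens = re.split(r"[^a-z0-9]+", name.lower())
    return " ".join(t for t in tokens if t and t not in SUFFIXES)


def similarity(a, b):
    """Similarity in [0, 1] between two names after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def candidate_links(subsidiaries, ases, threshold=0.6):
    """Return (subsidiary, asn, asname, score) tuples at or above threshold.

    These are only candidates; the dataset's links were manually verified
    and pruned after the automatic step.
    """
    links = []
    for sub in subsidiaries:
        for asn, asname in ases:
            score = similarity(sub, asname)
            if score >= threshold:
                links.append((sub, asn, asname, round(score, 2)))
    return sorted(links, key=lambda link: -link[3])


# Invented sample names; not taken from the dataset.
subs = ["Example Networks LLC", "Sample Telecom Ltd"]
ases = [("64500", "EXAMPLE-NETWORKS"), ("64501", "UNRELATED-AS")]

for link in candidate_links(subs, ases):
    print(link)
```

A pairwise comparison like this is quadratic in the number of names, so it suits small selections such as the organizations in orgs_selected.fsdb better than the full files.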