Publications
DRAT: An Unobtrusive, Scalable Approach to Large Scale Software License Analysis
Abstract
The Apache Release Audit Tool (RAT) audits and checks open source software licenses; however, RAT fails to successfully audit today's large code bases. Being a natural language processing (NLP) tool and a crawler, RAT marches through a code base, but it uses rudimentary blacklists and whitelists to navigate source code repositories and often does a poor job of distinguishing source code from binary files. In addition, RAT produces no incremental output, and thus on code bases that are themselves "Big Data", RAT could run for, e.g., a month and still not provide any status report. We introduce Distributed "RAT", or the Distributed Release Audit Tool (DRAT). DRAT overcomes RAT's limitations by leveraging: (1) Apache Tika to automatically detect and classify files in source code repositories and determine what is a binary file, what is source code, what are notes that need skipping, etc. (2) Apache Solr …
- Date: 2015
- Authors: Chris A Mattmann, Ji-Hyun Oh, Tyler Palsulich, Lewis John McGibbney, Yolanda Gil, Varun Ratnakar
- Conference: 2015 30th IEEE/ACM International Conference on Automated Software Engineering Workshop (ASEW)
- Pages: 97-101
- Publisher: IEEE