ISI Researchers Recognized for Reproducibility

May 8, 2017

A 2013 journal paper co-authored by Information Sciences Institute Director of Knowledge Technologies Yolanda Gil and postdoctoral student Daniel Garijo is among a handful of articles chosen to represent the best of open-access publishing.

Gil is also a USC Viterbi research professor in computer science.

One of just 20 selected from 170,000 to appear in the 10-Year Anniversary Datasets Collection curated by the Public Library of Science (PLOS), the article demonstrates how difficult scientific research can be to reproduce even under the best of circumstances. PLOS is a nonprofit open-access publisher that seeks to promote sharing of data and research methodologies to speed scientific progress and ensure reproducible results, among other aims.

The collection features articles whose underlying datasets "have proven to be important or widely used or are particularly well reported," according to editors of PLOS ONE, the organization's flagship journal. PLOS ONE emphasizes rigorous science across the research spectrum. The 16 disciplines included range from information sciences, bioinformatics and computational biology to paleontology, applied psychology, economic geography, animal science and conservation science. Over the decade, PLOS has published more than 11,000 papers in computational biology, the field the Gil-Garijo article represents.

Their article, "Quantifying Reproducibility in Computational Biology: The Case of the Tuberculosis Drugome," sprang from a challenge posed by University of California San Diego professor Phil Bourne to test the reproducibility of his own research. Bourne provided the community with datasets, software, and his original proposal, and asked for ideas to improve the publication of scientific materials.

But the study proved difficult to reproduce even with that documentation. For example, work had been done with proprietary software that no longer existed, forcing the team to recreate its functionality with open-source tools. "Imagine how much more difficult reproducibility is without so much access to the original author's data and software," says Gil. "Unfortunately, that is the norm in science."

Once the study was reproduced, all the data, software, and workflows were made available in shared repositories with open licenses and documented in detail on an accompanying website. Among the outcomes: an approach for quantifying the effort necessary to reproduce previous research, and recommendations for publishing reproducible work.

Founded in 2001, PLOS calls itself an "open access publisher, innovator and advocacy organization." The 10-Year Anniversary Datasets Collection entries are discussed in a blog accompanying the announcement. Says Gil, "This kind of recognition by publishers is important because it encourages all of us in the scientific community to redouble our efforts to practice open and reproducible science."

Gil and Garijo share their insights through Scientific Papers of the Future, an initiative to train and encourage scientists to adopt best practices of reproducible research, open science, and digital scholarship.