Leveling Up Research and Publishing in Geoscience’s Open-Source Era
Ancient rocks from the Southern Ocean; satellite images of the Earth’s atmosphere; soil samples from trenches in the Mojave Desert. The 18 geoscientists who gathered for the first FROGS (Facilitating Reproducible Open GeoScience) workshop at USC Viterbi’s Information Sciences Institute (ISI) from June 3-6, 2024 may study a wide range of data sources, but they came with a common goal: to learn how to level up their research, data sharing, and publishing techniques.
Participants like Jhon Mojica, a senior researcher at the University of Miami working with NOAA (National Oceanic and Atmospheric Administration), were introduced to scientific Python and the R programming language. Mojica said, “I’m leading projects on the expansion of Port Everglades and water quality around South Florida. Learning to use Python to automate data processing will make our studies more robust and efficient.”
Participants were also taught methods such as spectral analysis, which is used to interpret environmental variability over different timescales. Pranaykumar Tirpude, a Ph.D. student at the University of Delaware whose research involves studying 1.4 million years of data from the Southern Ocean, said, “Implementing these techniques will help me better understand climate cycles and the stability of ice sheets over geological time.”
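For readers unfamiliar with spectral analysis, the idea can be illustrated with a small Python sketch. It is not taken from the workshop materials: the synthetic record, its 100-kyr and 40-kyr cycles, the 2-kyr sampling step, and the helper name `periodogram` are all assumptions chosen for illustration. The code computes a plain discrete Fourier transform of an evenly sampled series and reports which periods carry the most power.

```python
import math


def periodogram(values, dt):
    """Return (periods, power) for an evenly sampled series using a plain DFT.

    Hypothetical helper for illustration; dt is the sampling step.
    """
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]  # remove the mean before transforming
    periods, power = [], []
    for k in range(1, n // 2 + 1):  # positive frequencies up to Nyquist
        re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(centered))
        im = -sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(centered))
        periods.append(n * dt / k)  # period corresponding to frequency index k
        power.append((re * re + im * im) / n)
    return periods, power


# Synthetic record (assumed values): 400 samples, one every 2 kyr,
# containing a strong 100-kyr cycle and a weaker 40-kyr cycle.
dt = 2.0
n = 400
times = [i * dt for i in range(n)]
series = [
    1.0 * math.sin(2 * math.pi * t / 100.0) + 0.5 * math.sin(2 * math.pi * t / 40.0)
    for t in times
]

periods, power = periodogram(series, dt)

# Rank periods by spectral power and keep the two strongest.
ranked = sorted(zip(power, periods), reverse=True)
top_two = sorted(round(p) for _, p in ranked[:2])
print(top_two)  # prints [40, 100]
```

In this toy case the two strongest peaks recover the cycles that were built into the series. Real paleoclimate records are noisy and often unevenly sampled, so researchers typically rely on dedicated open-source libraries rather than a hand-written transform like this one.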
Setting sail with PyRATES
The workshop was hosted by LinkedEarth, an initiative that brings together AI and paleoclimate research to create a cohesive understanding of historical climate data by revolutionizing the way data is managed and analyzed.
This particular curriculum, dubbed PyRATES (Python and R Analysis of Time SerieS), catered to researchers with little to no experience in the Python and R programming languages. It was led by Deborah Khider, paleoclimatologist and Research Scientist at ISI; Julien Emile-Geay, Professor of Earth Sciences at USC Dornsife College of Letters, Arts & Sciences; and Associate Professor Nick McKay and Data Scientist David Edge, both of Northern Arizona University’s School of Earth and Sustainability.
Khider, Emile-Geay, McKay and Edge set out with a lofty goal – “to elevate participants’ research to the next level by equipping them with advanced techniques that were previously beyond their reach, and helping them apply these methods to a broader array of datasets than previously thought possible.” Khider explained, “With PyRATES, we wanted participants to come out with the skills to do their science using open source libraries while sharing their own project openly and in a reproducible manner.”
Embracing open science with FAIR publishing
In addition to the advanced research methods, FROGS participants were taught best practices for FAIR science publishing – the principles of making research Findable, Accessible, Interoperable and Reusable. Through hands-on sessions, they learned techniques for data versioning, managing metadata, using open data repositories, and applying appropriate licensing. These skills enhance reproducibility and collaboration: they ensure that data and findings can be easily shared, accessed, and used by others in the scientific community, and they help researchers meet the evolving standards of journals and funding agencies.
Kathryn Chen, a biological oceanographer at the Scripps Institution of Oceanography, highlighted the workshop’s emphasis on FAIR publishing: “I am currently drafting my first paper, so I particularly appreciated the FAIR publishing aspect. Learning about versioning datasets, code, and workflows has been instrumental for my research.”
In the scientists’ own words…
The workshop drew participants from various fields within geosciences, each bringing unique perspectives and gaining invaluable insights.
Among them was Dannielle Fougere, a fifth-year Ph.D. student in the Earth Sciences department at USC. Fougere, a paleoseismologist, is focused on understanding the behavior of the Garlock Fault in the Mojave Desert. Her research involves calculating slip rates to determine how fast the fault has been moving over time, which is crucial for understanding seismic activity in the region. Fougere explained, “Paleoseismology is not very quantitative. With techniques learned here, I’d like to add a component in my thesis that’s a bit more quantitative to solidify things for editors and reviewers.”
Victor Olawoyin, a Ph.D. student in earthquake seismology at Boston College, found the workshop particularly beneficial for learning time series analysis and improving the reproducibility of his research. “The time series analysis is crucial for interpreting seismograms. The publishing part was also really cool, as it helps in releasing data and software with better workflow,” Olawoyin said. He plans to apply these new skills directly to his Ph.D. thesis and future research projects.
Venkataramana Sridhar, a faculty member from Virginia Tech specializing in hydrology, climate change, and water resources, saw the workshop as a way to enhance both his research and teaching. “This program covered a wide range of topics from publishing to data analytics, all crucial for my research on how climate change impacts hydrology and water resources. The skills and insights gained here will be invaluable in both my classroom teachings and research endeavors,” he stated.
Sreedevi Puthiyamadam Vasu, a Ph.D. student in Atmospheric Science at the Florida Institute of Technology, found the training perfectly tailored to her research needs. “My work on seasonal and sub-seasonal prediction of precipitation relies heavily on time series analysis. The focus on Python and R was exactly what I needed to transition from proprietary software like MATLAB to open-source tools, enhancing my ability to contribute to open science.”
Building a collaborative future
The workshop not only equipped participants with new technical skills but also fostered a sense of community and collaboration. The interdisciplinary nature of the event brought fresh perspectives to geoscientific challenges, inspiring innovative approaches and potential collaborations.
As these researchers return to their respective sub-fields equipped with new capabilities in open-source programming and FAIR principles, the hope is that the effects of this workshop will influence the broader geoscience community. The commitment to open science and reproducibility promises to drive forward the quality and impact of future geoscientific research.
The LinkedEarth team plans to continue supporting the geoscience community with similar training opportunities, fostering an ecosystem where scientific discoveries are not only made but shared openly and efficiently.
Published on July 15th, 2024
Last updated on July 15th, 2024