A Large Language Model-based Approach for Analyzing Covariates of Health Equity in Registered Research Projects

Abstract

Large language models (LLMs) have made significant advancements in natural language processing, offering broad applications in multiple domains. This study explores the use of the GPT-3.5 LLM to conduct efficient and robust computational analysis of registered research projects on the All of Us platform. Specifically, we explore the association between projects pursuing health equity research and: the project’s use of demographic categories (which All of Us enables), the multi-institutional composition of the team leading the project, and the involvement of R2 institutions (compared to only R1 institutions). We demonstrate the utility of GPT-3.5 in automating tasks ranging from generating Python scripts for extracting attributes from free text (such as project description and goals) to identifying and classifying institutions as R1 and R2, and summarizing project details into Unified Medical Language System (UMLS)-coded medical keywords. These contributions significantly reduced manual workload, allowing researchers to focus on more in-depth analysis. Our results reveal health equity insights not readily available in the original All of Us research hub. Specifically, we find a strong positive association between the use of demographic data and projects focused on health equity, while the other associations, those between health equity projects and the composition of the institutions conducting them, were positive but weaker and more dependent on specific project topics.

Date: 2024
Authors: Navapat Nananukul, Mayank Kejriwal
Journal: medRxiv
Pages: 2024.09.24.24314327
Publisher: Cold Spring Harbor Laboratory Press

Information Sciences Institute
