Publications

Disaggregation via Gaussian regression for robust analysis of heterogeneous data

Abstract

Social data are often highly heterogeneous, coming from a population composed of diverse classes of individuals, each with their own characteristics and behaviors. As a result of heterogeneity, a model learned on population data may not make accurate predictions on held-out test data or offer analytic insights into the underlying behaviors that motivate interventions. To illustrate, consider Figure 16.1, which shows data collected for a hypothetical nutrition study measuring how the outcome, body mass index (BMI), changes as a function of daily pasta calorie intake. Multivariate linear regression (MLR) analysis finds a negative relationship in the population (dotted line) between these variables. The negative trend suggests that – paradoxically – increased pasta consumption is associated with lower BMI. However, unbeknownst to researchers, the hypothetical population is heterogeneous, composed of classes that …
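
The pooled negative trend described in the abstract can be reproduced on synthetic data. The sketch below is not the authors' method and does not use the study's data: every class, intake level, BMI value, and slope in it is a hypothetical assumption. In particular, the abstract is truncated before it describes the classes, so the positive within-class trend used here is an assumption made only to show how a pooled fit can be misleading. It fits ordinary least squares with `numpy.polyfit`, once on the pooled data and once per class, with class labels treated as known.

```python
import numpy as np

# Hypothetical classes, NOT taken from the study:
# name -> (mean daily pasta calories, baseline BMI at that mean).
CLASSES = {"A": (200.0, 30.0), "B": (400.0, 26.0), "C": (600.0, 22.0)}

# Assumed BMI change per pasta calorie inside every class (illustrative only).
WITHIN_SLOPE = 0.01


def make_data():
    """Build a noise-free synthetic population made of the classes above."""
    offsets = np.linspace(-50.0, 50.0, 11)  # intake spread around each class mean
    xs, ys, labels = [], [], []
    for name, (center, base) in CLASSES.items():
        xs.append(center + offsets)
        ys.append(base + WITHIN_SLOPE * offsets)
        labels.extend([name] * offsets.size)
    return np.concatenate(xs), np.concatenate(ys), np.array(labels)


def ols_slope(x, y):
    """Slope of a degree-1 least-squares fit of y on x."""
    return float(np.polyfit(x, y, 1)[0])


x, y, labels = make_data()

# One fit on the whole population, ignoring class membership.
pooled_slope = ols_slope(x, y)

# One fit per class, using the (here known) class labels.
class_slopes = {
    name: ols_slope(x[labels == name], y[labels == name]) for name in CLASSES
}

print(f"pooled slope: {pooled_slope:+.4f}")
for name, s in class_slopes.items():
    print(f"class {name} slope: {s:+.4f}")
```

Under these assumptions the pooled slope is negative while each per-class slope is positive, because classes with higher intake were given lower baseline BMI. In real data the class labels are typically not observed, which is the harder problem the chapter's disaggregation approach is concerned with.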

Date: November 10, 2021
Authors: Nazanin Alipourfard, Keith Burghardt, Kristina Lerman
Book: Handbook of Computational Social Science, Volume 2
Pages: 269-288
Publisher: Routledge