Publications
Disaggregation via Gaussian regression for robust analysis of heterogeneous data
Abstract
Social data are often highly heterogeneous, coming from a population composed of diverse classes of individuals, each with their own characteristics and behaviors. As a result of heterogeneity, a model learned on population data may not make accurate predictions on held-out test data or offer analytic insights into the underlying behaviors that motivate interventions. To illustrate, consider Figure 16.1, which shows data collected for a hypothetical nutrition study measuring how the outcome, body mass index (BMI), changes as a function of daily pasta calorie intake. Multivariate linear regression (MLR) analysis finds a negative relationship in the population (dotted line) between these variables. The negative trend suggests that – paradoxically – increased pasta consumption is associated with lower BMI. However, unbeknownst to researchers, the hypothetical population is heterogeneous, composed of classes that …
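The reversal described above, where a pooled trend points the opposite way from what holds within groups, is a form of Simpson's paradox. The minimal sketch below illustrates that general mechanism. It uses hypothetical, noise-free numbers invented for illustration, not data from the chapter. It also uses an ordinary single-predictor least-squares fit (`numpy.polyfit`), not the chapter's Gaussian-regression disaggregation method. In this toy setup, BMI rises with pasta intake inside each class, but the class that eats more pasta has a lower baseline BMI, so the pooled slope turns negative.

```python
import numpy as np

# Hypothetical, noise-free data for two classes of individuals.
# Within each class, BMI rises by 0.5 per unit of pasta intake,
# but the high-intake class has a lower baseline BMI.
pasta_a = np.arange(0, 11, dtype=float)   # class A: low intake (0..10)
bmi_a = 30.0 + 0.5 * pasta_a
pasta_b = np.arange(20, 31, dtype=float)  # class B: high intake (20..30)
bmi_b = 15.0 + 0.5 * pasta_b


def slope(x, y):
    """Least-squares slope of y on x, from a degree-1 polynomial fit."""
    return np.polyfit(x, y, 1)[0]


# Pool both classes, as a population-level regression would.
pasta_all = np.concatenate([pasta_a, pasta_b])
bmi_all = np.concatenate([bmi_a, bmi_b])

within_a = slope(pasta_a, bmi_a)
within_b = slope(pasta_b, bmi_b)
pooled = slope(pasta_all, bmi_all)

print(f"class A slope: {within_a:.3f}")
print(f"class B slope: {within_b:.3f}")
print(f"pooled slope:  {pooled:.3f}")
```

Both within-class slopes are 0.5. The pooled slope is about -0.18, because the gap between the class means dominates the within-class variation.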
- Date: November 10, 2021
- Authors: Nazanin Alipourfard, Keith Burghardt, Kristina Lerman
- Book: Handbook of Computational Social Science, Volume 2
- Pages: 269–288
- Publisher: Routledge