Greg Ver Steeg

Ph.D., Physics, Caltech

My research focuses on unsupervised learning based on latent factor discovery, with applications to understanding high-dimensional but under-sampled data from human biology and behavior. This page is updated only sporadically; for more up-to-date information, see my Google Scholar profile or the occasional updates on my blog or Twitter.

  • Viewing representation learning as compression led to new ways to control compression (with "echo noise", NeurIPS-19). These ideas support a variety of applications, such as information-theoretic invariant representation learning (NeurIPS-18), which in turn led to a new approach for harmonizing MRI scans across sites.
  • CorEx: an information-theoretic foundation for modularly and hierarchically decomposing information in complex systems, with prototype and sample applications (NIPS-14, code) and further theoretical developments (AISTATS-15). A linear formulation of CorEx exhibits a unique "blessing of dimensionality" for recovering latent factor structure and performs very well at estimating covariance matrices from high-dimensional, under-sampled data (NeurIPS-19, code). An incremental version, the information sieve, is introduced in ICML-16 and IJCAI-17. The relationship between CorEx and VAEs is shown in AISTATS-19, and an application to very sample-efficient temporal covariance estimation appears in arXiv:1905.13276.
  • We also study foundational issues in information theory and ML. (1) A preoccupation with (pairwise) mutual information leads to its frequent misuse; see (ICML) for one example. (2) Information measures are hard to estimate (UAI, AISTATS), especially when dependencies are strong, in which case we showed that an exponential number of samples may be needed (NIPS). (3) Appropriate measures of higher-order dependencies are hard to define (arXiv:1811.10839).
  • Applications: gene expression (see also an interesting podcast and article about this work), neuroscience (1, 2, 3, 4), text analysis (1, 2; code), psychometrics, finance, and clinical time series (Nature Scientific Data, 2019).
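The quantity at the heart of CorEx is total correlation, TC(X) = sum_i H(X_i) - H(X_1, ..., X_n): the total amount of dependence among a group of variables, which is zero exactly when they are mutually independent. CorEx searches for latent factors Y that explain as much of this dependence as possible. The sketch below only computes a naive plug-in estimate of TC for small discrete datasets; it is an illustration, not the CorEx optimization or the released code, and the function names are hypothetical. Plug-in estimates like this one are exactly the kind that become unreliable when data are under-sampled or dependencies are strong.

```python
from collections import Counter
import math


def entropy(samples):
    """Plug-in (empirical) Shannon entropy, in bits, of a list of hashable outcomes."""
    n = len(samples)
    counts = Counter(samples)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def total_correlation(rows):
    """Hypothetical helper: plug-in estimate of TC(X) = sum_i H(X_i) - H(X).

    rows: list of equal-length tuples, one tuple per sample, one entry per variable.
    """
    columns = list(zip(*rows))                       # one tuple of samples per variable
    marginal = sum(entropy(list(col)) for col in columns)
    joint = entropy([tuple(r) for r in rows])         # each full row is one joint outcome
    return marginal - joint


# Three perfectly redundant binary variables: each marginal has 1 bit, the joint has 1 bit.
redundant = [(0, 0, 0), (1, 1, 1)] * 50
print(total_correlation(redundant))    # 2.0 bits of shared dependence

# Two independent fair bits: marginals sum to 2 bits, joint is also 2 bits.
independent = [(a, b) for a in (0, 1) for b in (0, 1)]
print(total_correlation(independent))  # 0.0
```

In the redundant example a single latent bit Y would explain all 2 bits of total correlation, which is the kind of structure CorEx is designed to discover.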