Greg Ver Steeg



Ph.D., Physics, Caltech


My research focuses on unsupervised learning based on latent factor discovery. Its main application is understanding high-dimensional but under-sampled data from human biology and behavior.

  • CorEx: an information-theoretic foundation for modularly and hierarchically decomposing information in complex systems, with a prototype and sample applications (NIPS-14) and (code), and further theoretical developments (AISTATS-15)
  • The linear version of CorEx exhibits a unique "blessing of dimensionality" for recovering latent factor structure (paper) and excellent performance for estimating covariance matrices with high-dimensional, under-sampled data (code)
  • Information in complex systems can be extracted incrementally using the "information sieve" method (ICML-16) and (code). An implementation for continuous variables is more practical (code), and we show that it can be used to extract common information (IJCAI-17).
  • Historically, the impact of information theory on machine learning has been limited for two reasons. (1) A preoccupation with (pairwise) mutual information leads to its frequent misuse; see (ICML-14) for one example. (2) Information measures are hard to estimate; see (UAI-15), (AISTATS-15), and (NIPS-16).
  • Applications: gene expression (interesting podcast and article about this work); brain imaging (1, 2); text analysis (code) (1, 2); psychometrics; finance

See our group page for other ongoing efforts, or this page for more information about CorEx.