Greg Ver Steeg

About

Education

Ph.D., Physics, Caltech

Bio

My research focuses on unsupervised learning based on latent factor discovery. The main application of this work is understanding high-dimensional but under-sampled data from human biology and behavior.

  • An information-theoretic foundation for modularly and hierarchically decomposing information in complex systems, with prototype and sample applications (NIPS-14) and (code), and further theoretical developments (AISTATS-15).
  • The linear version of CorEx exhibits a unique "blessing of dimensionality" for recovering latent factor structure (paper) and achieves excellent performance when estimating covariance matrices from high-dimensional, under-sampled data (code).
  • Information in complex systems can be extracted incrementally using the "information sieve" method (ICML-16) and (code). An implementation for continuous variables is more practical (code), and we show that it can be used to extract common information (IJCAI-17).
  • Historically, the impact of information theory on machine learning has been limited for two reasons. (1) A preoccupation with (pairwise) mutual information leads to its frequent misuse; see (ICML-14) for one example. (2) Information measures are hard to estimate; see (UAI-15), (AISTATS-15), and (NIPS-16).
  • Applications: gene expression (interesting podcast and article about this work), brain imaging (1, 2), text analysis (code, 1, 2), psychometrics, and finance.

See our group page for other ongoing efforts, or this page for more information about CorEx.

CorEx

Groups: