Yesterday I attended a talk organized by the London Machine Learning meetup group, where Yoshua Bengio was the invited speaker. Not surprisingly, there were about 200 people attending.
Yoshua reinforced the idea that much of the success of learning algorithms for AI tasks comes from incorporating meaningful priors. These should be general enough to hold true in a wide range of applications, but also specific enough to vastly reduce the amount of training data needed to achieve good generalization. This reminded me of a post I wrote on this blog almost 5 years ago!
In the meantime, deep learning has become mainstream, and Yoshua's slides highlighted several pieces of theoretical progress, e.g.:
- Expressiveness of deep networks with piecewise linear activation functions: an exponential advantage for depth (Montufar et al., NIPS 2014)
- Theoretical and empirical evidence against bad local minima (Dauphin et al., NIPS 2014)
- Manifold and probabilistic interpretations of auto-encoders:
  - Estimating the gradient of the energy function (Alain and Bengio, ICLR 2013)
  - Sampling via a Markov chain (Bengio et al., NIPS 2013)
- A variational auto-encoder breakthrough (Gregor et al., arXiv 2015)
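To make the first bullet a bit more concrete, here is a minimal sketch (my own illustration, not from the talk; it assumes NumPy, and the widths, depths, and random seed are arbitrary) that counts the linear regions a randomly initialized deep ReLU network carves out of a 1-D input line. Each distinct on/off pattern of the ReLU units corresponds to one linear piece of the function the network computes, which is the quantity whose growth with depth Montufar et al. analyze.

```python
import numpy as np

def count_linear_regions(depth, width, n_samples=100_000, seed=0):
    """Count distinct ReLU activation patterns of a random deep net
    along a dense 1-D input grid; each pattern corresponds to one
    linear region of the piecewise-linear function the net computes."""
    rng = np.random.default_rng(seed)
    # Random Gaussian weights and biases; the input dimension is 1.
    Ws, bs, d_in = [], [], 1
    for _ in range(depth):
        Ws.append(rng.normal(size=(d_in, width)))
        bs.append(rng.normal(size=width))
        d_in = width
    # Dense grid of 1-D inputs on which to probe the network.
    h = np.linspace(-3.0, 3.0, n_samples).reshape(-1, 1)
    patterns = []
    for W, b in zip(Ws, bs):
        pre = h @ W + b
        patterns.append(pre > 0)          # which units are active here
        h = np.maximum(0.0, pre)          # ReLU activation
    codes = np.concatenate(patterns, axis=1)
    # Distinct rows = distinct activation patterns seen along the line.
    return len(np.unique(codes, axis=0))

for depth in (1, 2, 3):
    print(depth, count_linear_regions(depth, width=8))
```

With a single hidden layer of width 8, the 1-D input line can be split into at most 9 regions (one breakpoint per unit); stacking layers lets the region count grow much faster for the same number of units, which is the depth advantage the paper formalizes.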