Vapnik’s picture explained

This is an extremely geeky picture! :) Let’s try to explain it:

First of all, as many of you know, the gentleman in the picture is Prof. Vladimir Vapnik. He is famous for his fundamental contributions to the field of Statistical Learning Theory, such as the Empirical Risk Minimization (ERM) principle, VC-dimension and Support Vector Machines.

Then we notice the sentence in the board: it resembles the famous “All your base are belong to us“! This is a piece of geek culture that emerged after a “broken English” translation of a Japanese video game for Sega Mega Drive .

Wait, but they replaced the word “Base” by “Bayes”!?
Yes, that Bayes, the British mathematician known for the Bayes’ theorem.
Okay, seems fair enough, we are dealing with people from statistics…

By the moment we think things can not get more geeky, we realize there is scary inequality written on the top of the white board:

My goodness, what’s this?! Okay, that’s when things get really technical:
This is a probabilistic bound for the expected risk of a classifier under the ERM framework. In simple terms, it relates the classifier’s expected test error with the training error on a dataset of size l and in which the cardinality of the set of loss functions is N.
If I’m not mistaken, the bound holds with probability (1 – eta) and applies only to loss functions bounded above by 1.

Sweet! Now that we got the parts, what’s the big message?

Well, it’s basically a statement about the superiority of Vapnik’s learning theory over the Bayesian alternative. In a nutshell, the Bayesian perspective is that we start with some prior distribution over a set of hypothesis (our beliefs) and we update these according to the data that we see. We then look for an optimal decision rule based on the posterior distribution.
On the other hand, in Vapnik’s framework there are no explicit priors neither we try to estimate the probability distribution of the data. This is motivated by the fact that density estimation is a ill-posed problem, and therefore we want to avoid this intermediate step. The goal is to directly minimize the probability of making bad decision in the future. If implemented through Support Vector Machines, this boils down to finding the decision boundary with maximal margin to separate the classes.

And that’s it, folks! I hope you had fun decoding this image! :)