If I had to explain what computer vision is all about, in just one snapshot, I would show you this:
Computer graphics algorithms go from the parameter space to the image space (rendering); computer vision algorithms go the opposite way (inverse rendering). Because of this, computer vision is essentially a (very hard) problem of statistical inference.
The common approach nowadays is to build a classifier for each kind of object and then search explicitly over (part of) the parameter space, typically by scanning the image over all possible locations and scales. The remaining challenge is still huge: how can a classifier learn and generalize, from a finite set of examples, which characteristics of an object are fundamental (shape, color) and which are irrelevant (changes in illumination, rotations, translations, occlusions, etc.)?
This is what is keeping us busy!
PS – Note that changes in illumination induce apparent changes in the color of an object, and rotations induce apparent changes in its shape!
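As a rough illustration of the scan-over-locations-and-scales approach described above, here is a minimal sketch of a sliding-window detector. Everything here is my own illustration, not any particular system: `classifier` is a hypothetical function scoring a fixed-size patch, and `naive_resize` is a toy nearest-neighbor downscaler (a real detector would use an image pyramid and something like integral images for speed).

```python
import numpy as np

def naive_resize(img, shape):
    """Toy nearest-neighbor resize (illustration only)."""
    ys = (np.arange(shape[0]) * img.shape[0] / shape[0]).astype(int)
    xs = (np.arange(shape[1]) * img.shape[1] / shape[1]).astype(int)
    return img[np.ix_(ys, xs)]

def sliding_window_detect(image, classifier, window=(24, 24),
                          step=4, scale_factor=1.25):
    """Scan a fixed-size window over all positions and scales.

    `classifier(patch)` is assumed to return a score, positive for the
    object class. Returns (x, y, w, h, score) in original-image coordinates.
    """
    detections = []
    scale = 1.0
    img = image
    h, w = window
    while img.shape[0] >= h and img.shape[1] >= w:
        for y in range(0, img.shape[0] - h + 1, step):
            for x in range(0, img.shape[1] - w + 1, step):
                score = classifier(img[y:y + h, x:x + w])
                if score > 0:
                    # map the window back to original-image coordinates
                    detections.append((x * scale, y * scale,
                                       w * scale, h * scale, score))
        # downscale the image so the fixed window covers larger objects
        scale *= scale_factor
        new_h = int(image.shape[0] / scale)
        new_w = int(image.shape[1] / scale)
        if new_h < h or new_w < w:
            break
        img = naive_resize(image, (new_h, new_w))
    return detections
```

The nested loops make the cost of this explicit search obvious: positions × scales × classifier evaluations, which is exactly why so much engineering effort goes into making the per-window classifier cheap.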
François Fleuret, my PhD advisor, recently gave a talk about object detection at Google (Zurich offices).
You can now see it online:
If you are wondering how my research will try to extend the work done so far, just jump to minute 45:30!
Last week I attended the International Computer Vision Summer School in Sicily, Italy. The main topics were Reconstruction and Recognition. I think the quality of the lectures, the organization and the location were all quite good, so I would recommend it to other PhD students.
Here is a short summary of some of the things we heard about:
Andrew Zisserman (Oxford, UK) – gave an overview of object recognition and image classification, with a focus on methods that use “bag of visual words” models. Quite nice for newcomers like me!
Silvio Savarese (UIUC, USA) – talked about 3D representations for object recognition. There is actually a Special Issue of the “Computer Vision and Image Understanding” on the topic at
Luc Van Gool (ETH Zurich, Switzerland) – Lots of cool and fancy demos about 3D reconstruction. They are starting to use some recognition to help reconstruction (opposite direction of S. Savarese).
Stefano Soatto (UCLA, USA) – gave an “opinion talk” on the foundations of Computer Vision and how it can be distinguished from Machine Learning. I would have to read his papers to understand his argument better, but he seems to claim that the existence of non-invertible operations, such as occlusions, supports the need for image analysis instead of just “brute-force machine learning”.
We also had Bill Triggs (CNRS) talking about human detection, Jan Koenderink (Utrecht, Netherlands) on “shape from shading”, and a few tutorials covering topics as diverse as SIFT, object tracking, multi-view stereo, photometric methods for 3D reconstruction, and randomized decision forests.
To summarize, I think the message was:
- Traditionally, recognition uses lots of Machine Learning, but the models retain little 3D information about objects;
- Traditionally, reconstruction uses ideas from geometry, optics and optimization, but no learning;
- The future trend is to merge them: use 3D reconstruction to help in recognition tasks and use recognition to help in 3D reconstruction.
If everything goes as planned, I will attend two Summer Schools this year.
The first one, the International Computer Vision Summer School 2008, will be hosted in Sicily, Italy, on 14–19 July. The program seems quite good and will cover topics like object detection, tracking and 3D reconstruction, among others. There’s also a reading group on “how to conduct a literature review and discover the context of an idea“. The challenge is to see how far back in the past one can trace the origins of a scientific idea. For example, AdaBoost is a well-known machine learning meta-algorithm in which a sequence of classifiers is trained progressively, each one focusing on the instances misclassified by the previous ones. The classifiers are then combined by a weighted vote. It was introduced by Freund and Schapire in 1996. This is easy to trace; the question, however, is: can you find the same or a similar core idea, or intuition, somewhere further back in the past? Possibly from a different domain?
It’s gonna be fun!
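The re-weighting loop at the heart of AdaBoost is short enough to sketch. Below is a minimal, hypothetical implementation over decision stumps in plain NumPy; the function names and the brute-force stump search are my own illustration, not the original Freund–Schapire formulation:

```python
import numpy as np

def train_adaboost(X, y, n_rounds=10):
    """Minimal AdaBoost with decision stumps (one-feature thresholds).

    X: (n_samples, n_features) array; y: labels in {-1, +1}.
    Returns a list of (feature, threshold, polarity, alpha) weak learners.
    """
    n, d = X.shape
    w = np.full(n, 1.0 / n)               # instance weights, uniform at first
    ensemble = []
    for _ in range(n_rounds):
        best, best_err = None, np.inf
        # brute-force search for the stump with the lowest weighted error
        for j in range(d):
            for t in np.unique(X[:, j]):
                for pol in (1, -1):
                    pred = pol * np.where(X[:, j] <= t, 1, -1)
                    err = w[pred != y].sum()
                    if err < best_err:
                        best_err, best = err, (j, t, pol)
        err = np.clip(best_err, 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)   # vote weight of this learner
        j, t, pol = best
        pred = pol * np.where(X[:, j] <= t, 1, -1)
        # misclassified instances gain weight, so the next round focuses on them
        w *= np.exp(-alpha * y * pred)
        w /= w.sum()
        ensemble.append((j, t, pol, alpha))
    return ensemble

def predict_adaboost(ensemble, X):
    """Combine the weak learners by a weighted vote."""
    score = sum(alpha * pol * np.where(X[:, j] <= t, 1, -1)
                for j, t, pol, alpha in ensemble)
    return np.sign(score)
```

Seen this way, the "core idea" the reading group asks about is the re-weighting step: boost the importance of whatever the current hypothesis gets wrong, then train again.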
The second one is the 10th Machine Learning Summer School, 1–15 September, Île de Ré, France. The program is also quite nice, but I still don’t have confirmation that I can attend.
I would be especially interested in Rich Sutton‘s lecture on “Reinforcement Learning and Knowledge Representation”, although hearing about Active Learning, Bayesian Learning, Clustering, Kernel Methods, etc. also sounds quite appealing.
Looking forward to science in summer time!
Lately I have been wondering about the problem of Vision and how difficult it should be compared to the problem of Artificial General Intelligence.
It seems to me that, given the order in which things happened in Nature, processing visual input should be much simpler than using language or reasoning. I say this because there are quite simple animals with eyes, say a fish, a frog or a mouse… As I am neither a biologist nor a neurologist, I am not sure what kinds of visual tasks these animals can perform. For example, can a mouse tell whether or not there is a cat in a picture?
In any case, I guess that these neuronal systems, much simpler than the human brain, are able to solve tasks that we have not yet achieved with Computer Vision algorithms.
If that’s the case, I have two questions for my readers, who hopefully can help me clarify these issues:
- What is the “perfect” biological system for understanding vision? It should be powerful enough to solve the problems we are interested in, such as distinguishing between different objects, but it should also have a relatively simple brain. Any ideas?
- If animals without human-level intelligence use vision quite effectively, does this mean that Artificial Intelligence will follow the same order of achievements? Or, given the properties of computers, will it turn out to be easier to do reasoning, planning or even language processing?
Looking forward to reading your comments.