Computer Vision vs Computer Graphics

If I had to explain what computer vision is all about, in just one snapshot, I would show you this:

Computer Graphics algorithms go from the parameter space to the image space (rendering), while computer vision algorithms go the opposite way (inverse rendering). Because of this, computer vision is basically a (very hard) problem of statistical inference.
The common approach nowadays is to build a classifier for each kind of object and then search over (part of) the parameter space explicitly, normally by scanning the image over all possible locations and scales (see the sliding-window sketch below). The remaining challenge is still huge: how can a classifier learn and generalize, from a finite set of examples, which characteristics of an object are fundamental (shape, color) and which are irrelevant (changes in illumination, rotations, translations, occlusions, etc.)?
This is what is keeping us busy! ;)

PS – Note that changes in illumination induce apparent changes in the color of the object and rotations induce apparent changes in shape!
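To make that explicit search concrete, here is a minimal sliding-window sketch. The classifier function, window size, scales and threshold are all hypothetical placeholders chosen for illustration, not part of the original post.

```python
# Minimal sliding-window search over locations and scales (illustrative only).
import numpy as np

def sliding_window_detect(image, classifier, base_window=(64, 64), step=16,
                          scales=(1.0, 1.5, 2.0)):
    """Scan an image over locations and scales, returning windows the classifier accepts."""
    detections = []
    h, w = image.shape
    for scale in scales:
        win_h, win_w = int(base_window[0] * scale), int(base_window[1] * scale)
        for y in range(0, h - win_h + 1, step):
            for x in range(0, w - win_w + 1, step):
                patch = image[y:y + win_h, x:x + win_w]
                # A real detector would resample the patch to the classifier's fixed input size.
                score = classifier(patch)
                if score > 0.6:
                    detections.append((x, y, win_w, win_h, score))
    return detections

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.random((600, 800))                      # stand-in for a grayscale 800 x 600 image
    brightness = lambda patch: float(patch.mean())    # dummy "classifier": mean brightness
    print(len(sliding_window_detect(img, brightness)))
```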

Generating all possible pictures

Think of an image of 800 x 600 pixels with 24 bits of color (8 bits per RGB component). Its trivial binary representation is a sequence of 11,520,000 bits (800 x 600 x 24), so we can think of each picture as a natural number between 0 and 2^11520000 - 1.

Imagine now that we write a computer program that generates all these pictures one by one, incrementing the natural number by one in each round.
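A minimal sketch of that counting program, shrunk to a 2 x 2 binary image so the loop actually terminates; the 800 x 600 x 24-bit case described above is the same loop with an 11,520,000-bit counter.

```python
# Enumerate every possible image by counting through the natural numbers.
# Toy size: 2 x 2 pixels, 1 bit each, so only 2**4 = 16 images exist.
WIDTH, HEIGHT, BITS_PER_PIXEL = 2, 2, 1
TOTAL_BITS = WIDTH * HEIGHT * BITS_PER_PIXEL

def number_to_image(n):
    """Interpret the natural number n as a row-major bitmap."""
    bits = [(n >> i) & 1 for i in range(TOTAL_BITS)]
    return [bits[row * WIDTH:(row + 1) * WIDTH] for row in range(HEIGHT)]

for n in range(2 ** TOTAL_BITS):
    print(n, number_to_image(n))

# For the 800 x 600 x 24-bit case there are 2**11_520_000 images,
# which is why you should not run this loop at home.
```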

Running this algorithm for enough time you would eventually get:

- a picture of your face
- a picture of you in the Moon
- a picture of you with Marilyn Monroe and James Dean
- pictures of ancient Earth, with dinosaurs
- pictures of all the paintings of Leonardo da Vinci, Van Gogh or Picasso
- pictures of all the pages of Shakespeare’s writings
- pictures of proofs of all relevant mathematical theorems (already proved or not)
- pictures of all great music compositions (already written or not)
- pictures of Microsoft Office and Windows source code
- pictures/printscreens of all pages in the World Wide Web, including all the versions of Wikipedia

Warning: don’t do this at home unless you can wait a few billion years between each pair of interesting pictures you would get!

Still, it’s interesting to realize that you can compress all the world’s information into a short and trivial program; all you have to do is add enough useless data to it!

How difficult is Vision?

Lately I have been wondering about the problem of Vision and how difficult it should be compared to the problem of Artificial General Intelligence.
It seems to me that, given the order in which it happened in Nature, processing visual input should be much simpler than using language or reasoning. I say this because there are quite simple animals with eyes, say a fish, a frog or a mouse… As I am not a biologist or neurologist, I am not sure what kind of visual tasks these animals are able to perform. For example, can a mouse tell whether or not there is a cat in a picture?
In any case, I guess that these neural systems, much simpler than the human brain, are able to solve tasks that we have not yet managed to solve with Computer Vision algorithms.

If that’s the case, I have two questions to my readers, who hopefully can help me clarify these issues:

- What is the “perfect” biological system to understand vision? It should be powerful enough to solve problems that we are interested in, such as distinguishing between different objects, but it should also have a relatively simple brain. Any ideas?

- If animals without human-level intelligence use vision quite effectively, does this mean that Artificial Intelligence will follow the same order of achievements? Or, given the properties of computers, will it turn out to be easier to do reasoning, planning or even language processing?

Looking forward to reading your comments.

On Intelligence by Jeff Hawkins

I would like to recommend this book by Jeff Hawkins, in which the author tries to create a theory about the neocortex.

He claims that the neocortex is basically a hierarchical memory system able to detect temporal and spatial patterns. Jeff Hawkins and his company Numenta are now trying to move forward and implement this “neocortical algorithm” as software running on a computer.

I enjoyed reading it a lot and I am now trying to read the technical papers. So far it looks like a good model, especially for computer vision systems, but it’s not yet clear to me how it would solve problems from other cognitive areas such as language processing or planning.
More posts on that in the coming weeks!
Update [2013]: 5 years later, nothing very impressive has come out of Numenta. Even though the ideas in this book are appealing, in practice all solid Machine Learning results require: 1) a very clear loss function, 2) an efficient optimization algorithm and 3) preferably, lots of data.

Measuring Intelligence

In order to develop artificial intelligence further, it would be important to have a formal and quantitative way to measure the intelligence of an agent, be it a human or a machine.
The most famous test for artificial intelligence is the so-called Turing Test, in which “a human judge engages in a natural language conversation with one human and one machine, each of which try to appear human; if the judge cannot reliably tell which is which, then the machine is said to pass the test”. There is even a competition, the Loebner Prize, which actually evaluates different chatbots and chooses the one that most resembles a human.

However, this test is nowadays considered to be anthropomorphically biased, because an agent can be intelligent and still not respond exactly like a human.
Marcus Hutter has recently proposed a new way of measuring intelligence, based on the concepts of Kolmogorov Complexity and Minimum Description Length, in which compression = learning = intelligence. The Hutter Prize measures how much one can compress the first 100MB of Wikipedia. The idea is that intelligence is the ability to detect patterns and make predictions, which in turn allows one to compress data a lot.
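As a toy illustration of the compression = learning idea (not the actual Hutter Prize setup, which also counts the size of the decompressor), one can compare how many bits general-purpose compressors need for a repetitive text sample; a compressor that captures more of the structure needs fewer bits.

```python
# Crude illustration of "compression as a proxy for learning".
import lzma
import zlib

text = ("the quick brown fox jumps over the lazy dog " * 200).encode("utf-8")

raw_bits = len(text) * 8
for name, compress in (("zlib", zlib.compress), ("lzma", lzma.compress)):
    compressed_bits = len(compress(text)) * 8
    print(f"{name}: {compressed_bits} bits, "
          f"{compressed_bits / raw_bits:.3%} of the original "
          f"({raw_bits / compressed_bits:.1f}x compression)")
```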
In my opinion this is not yet a totally satisfactory way of measuring general intelligence, for at least two reasons:
- the fact that method A compressed the dataset more than method B does not necessarily mean that method A is more intelligent. It may simply mean that the developer of the method exploited some characteristic of the (previously known) data. Or it can mean that the method is good at finding regularities in this particular dataset, but not able to learn other structures in other environments.
- it cannot be applied to humans (or animals).
For these reasons, I guess measuring intelligence is still a fundamental open problem in AI.