I recently read an article by Y. Bengio and Y. LeCun named “Scaling Learning Algorithms towards AI”. You can also find it as a book chapter in “Large-Scale Kernel Machines”, L. Bottou, O. Chapelle, D. DeCoste, J. Weston (eds), MIT Press, 2007.
In some respects it is an “opinion paper” in which the authors advocate for deep learning architectures and their vision of Machine Learning. However, I think the main message is extremely relevant. I was actually surprised to see how much it agrees with my own opinions. Here is how I would summarize it:
- no learning algorithm can be completely universal, due to the “No free lunch theorem”
- that’s not such a big problem: we don’t care about the set of all possible functions
- we care about the “AI set”, which contains the functions useful for vision, language, reasoning, etc.
- we need to create learning algorithms with an inductive bias towards the AI set
- the models should “efficiently” represent the functions of interest, in terms of having low Kolmogorov complexity
- researchers have exploited the “smoothness” prior extensively with non-parametric methods. However, many manifolds of interest have strong local variations.
- we need to explore other types of priors, more appropriate to the AI set.
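To make the point about the “smoothness” prior a bit more concrete, here is a minimal sketch of my own (it is not taken from the paper). It assumes a Gaussian kernel smoother as the non-parametric method, and the two target functions, the number of training points and the bandwidth are arbitrary choices made only for illustration. With the same small budget of samples, the smoother does well on a slowly varying function and poorly on one with many local variations.

```python
import math

def kernel_smoother(train_x, train_y, x, bandwidth):
    """Nadaraya-Watson estimate: Gaussian-weighted average of the training targets."""
    weights = [math.exp(-((x - xi) ** 2) / (2 * bandwidth ** 2)) for xi in train_x]
    return sum(w * yi for w, yi in zip(weights, train_y)) / sum(weights)

def mean_abs_error(target, n_train=20, bandwidth=0.05):
    """Fit on n_train evenly spaced points in [0, 1), test halfway between them."""
    train_x = [i / n_train for i in range(n_train)]
    train_y = [target(x) for x in train_x]
    test_x = [(i + 0.5) / n_train for i in range(n_train)]
    errors = [
        abs(kernel_smoother(train_x, train_y, x, bandwidth) - target(x))
        for x in test_x
    ]
    return sum(errors) / len(errors)

def smooth_target(x):
    # One slow oscillation over the unit interval (illustrative choice).
    return math.sin(2 * math.pi * x)

def wiggly_target(x):
    # Fifteen oscillations: many more variations than 20 samples can cover.
    return math.sin(2 * math.pi * 15 * x)

err_smooth = mean_abs_error(smooth_target)
err_wiggly = mean_abs_error(wiggly_target)
print(f"mean abs error, smooth target: {err_smooth:.3f}")
print(f"mean abs error, wiggly target: {err_wiggly:.3f}")
```

A local averaging method can only fix this by getting more samples per “bump”, which is why a prior other than smoothness becomes attractive for highly varying functions.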
The authors then give two examples of “broad” priors: the sharing of weights in convolutional networks (inspired by translation invariance in vision) and the use of multi-layer architectures (which can be seen as levels of increasing abstraction).
Of course, this is where many alternatives are open! Many other useful inductive biases could be found. That’s where I think we should focus our research efforts!
In this 10-minute TED talk, Joshua Klein talks about crows and how they are incredibly good learners.
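As a rough illustration of the weight-sharing prior, here is a small sketch of my own (not code from the paper). It uses a plain 1D convolution; the kernel values, the toy signal and the helper name `dense_param_count` are made up for the example. The same three weights are reused at every position, so a pattern that moves in the input simply moves in the output, and the layer needs far fewer parameters than a fully connected one of the same input and output size.

```python
def conv1d(signal, kernel):
    """'Valid' 1D convolution (cross-correlation): one shared kernel slides over the signal."""
    k = len(kernel)
    return [
        sum(kernel[j] * signal[i + j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]

def dense_param_count(n_in, n_out):
    """Number of weights in a fully connected layer without weight sharing."""
    return n_in * n_out

kernel = [1.0, 0.0, -1.0]            # a simple edge-like detector (illustrative)
signal = [0, 0, 1, 2, 1, 0, 0, 0]
shifted = [0, 0, 0, 1, 2, 1, 0, 0]   # the same pattern moved one step to the right

out = conv1d(signal, kernel)
out_shifted = conv1d(shifted, kernel)

# Shifting the input shifts the response by the same amount.
print(out_shifted[1:] == out[:-1])

# Shared weights versus one weight per input/output pair.
print(len(kernel), "weights vs", dense_param_count(len(signal), len(out)))
```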
They seem to have a powerful memory, use vision effectively, have problem solving skills, use tools and even learn from examples of other crows. I guess AGI is more than achieved at “crow-level Artificial Intelligence”!!
Lately I have been wondering about the problem of Vision and how difficult it should be compared to the problem of Artificial General Intelligence. It seems to me that, given the order in which they appeared in Nature, processing visual input should be much simpler than using language or reasoning. I say this because there are quite simple animals with eyes, say a fish, a frog or a mouse… As I am not a biologist or neurologist, I am not sure what kind of visual tasks these animals are able to perform. For example, can a mouse tell if there is a cat in a picture or not? In any case, I guess that these neuronal systems, much simpler than the human brain, are able to solve tasks that we have not yet solved with Computer Vision algorithms.
If that’s the case, I have two questions to my readers, who hopefully can help me clarify these issues:
- What is the “perfect” biological system to understand vision? It should be powerful enough to solve problems that we are interested in, such as distinguishing between different objects, but it should also have a relatively simple brain. Any ideas?
- If animals without human-level intelligence use vision quite effectively, does this mean that Artificial Intelligence will follow the same order of achievements? Or, given the properties of computers, will it turn out to be easier to do reasoning, planning or even language processing?
I have recently found two tech reports written by Shane Legg and Marcus Hutter (IDSIA, Lugano, Switzerland) in which they make very interesting reviews on the definitions of machine intelligence and ways to measure it. Have a look at:
Pei Wang’s well-defined approach to Artificial General Intelligence takes as its basic premise that the agent has limited time and memory resources. He then develops a reasoning system that learns from experience and is able to deal with uncertainty and contradictory data.
I would like to recommend this book by Jeff Hawkins, in which the author tries to create a theory about the neocortex.
He claims that the neocortex is basically a hierarchical memory system able to detect temporal and spatial patterns. Jeff Hawkins and his company, Numenta, are now trying to move forward and implement this “neocortical algorithm” as software running on a computer.
I enjoyed reading it a lot and I am now trying to read the technical papers. So far it looks like a good model, especially for computer vision systems, but it’s not yet clear to me how it would solve problems from other cognitive areas such as language processing or planning.
More posts on that for the coming weeks!
Update: 5 years later, nothing very impressive has come out of Numenta. Even though the ideas in this book are appealing, in practice, all solid Machine Learning results require: 1) a very clear loss function, 2) an efficient optimization algorithm and 3) preferably, lots of data.
In order to develop artificial intelligence further, it would be important to have a formal and quantitative way to measure the intelligence of an agent, be it a human or a machine. The most famous test for artificial intelligence is the so-called Turing Test, in which “a human judge engages in a natural language conversation with one human and one machine, each of which try to appear human; if the judge cannot reliably tell which is which, then the machine is said to pass the test”. There is even a competition, the Loebner Prize, which evaluates different chatbots and chooses the one that most resembles a human.
However, this test is nowadays considered to be anthropomorphically biased, because an agent can be intelligent and still not be able to respond exactly like a human.
Marcus Hutter has recently proposed a new way of measuring intelligence, based on the concepts of Kolmogorov Complexity and Minimum Description Length, in which compression = learning = intelligence. The Hutter Prize measures how much one can compress the first 100MB of Wikipedia. The idea is that intelligence is the ability to detect patterns and make predictions, which in turn allows one to compress data a lot.
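To get a feeling for the “patterns allow compression” idea, here is a tiny sketch of my own. It assumes Python’s general-purpose `zlib` compressor as a stand-in, which is of course far weaker than the specialized compressors that compete for the Hutter Prize, and the two byte strings are toy data I made up. Data with an obvious regularity shrinks a lot, while pseudo-random bytes barely shrink at all.

```python
import random
import zlib

def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower means more regularity was found)."""
    return len(zlib.compress(data, 9)) / len(data)

# Highly regular data: one short sentence repeated many times.
structured = b"the cat sat on the mat. " * 400

# Pseudo-random bytes of the same length, with a fixed seed for repeatability.
rng = random.Random(0)
noise = bytes(rng.randrange(256) for _ in range(len(structured)))

ratio_structured = compression_ratio(structured)
ratio_noise = compression_ratio(noise)
print(f"structured: {ratio_structured:.3f}")
print(f"noise:      {ratio_noise:.3f}")
```

Note that `zlib` only finds repeated substrings; a compressor that actually “understood” the text could exploit much deeper regularities, which is the whole point of the prize.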
In my opinion this is not yet a totally satisfactory way of measuring general intelligence, for at least two reasons:
- the fact that method A compressed the dataset more than method B does not necessarily mean that method A is more intelligent. It may simply mean that the developer of the method exploited some characteristic of the (previously known) data. Or it can mean that the method is good at finding regularities in that particular dataset, but is not able to learn other structures in other environments.
- it cannot be applied to humans (or animals).
For these reasons, I guess measuring intelligence is still a fundamental open problem in AI.