In his 1950 paper “Computing Machinery and Intelligence,” one of his most noted, Alan Turing describes a test for machine intelligence: a human “judge” holds a conversation with two consoles, one operated by a human and the other by a machine, without knowing which is which. If, throughout the conversation, the judge cannot distinguish the human from the machine, then the machine can be considered intelligent. Although it seems simplistic and rudimentary, this test is quite useful because it circumvents any requirement to define or quantify “intelligence” or any of its aspects. It simply assumes that humans are intelligent, and that if a machine can simulate human responses, it must be equally intelligent.
The first formal implementation of the Turing Test, the Loebner Prize, was organized in 1991 by Dr. Hugh Loebner, a somewhat eccentric figure, and he has held the competition every year since. The home page of the Loebner Prize contains transcripts of the conversations held between the judges and the various finalist programs.
Maybe it’s because I’m reading the transcripts through the eyes of a software engineer, but I found the programs’ responses laughably crude and robotic. I fail to see how any human judge could attribute any “human” qualities to their output. It is trivial to catch the programs randomly regurgitating a block of the human’s own words or, when asked a question they weren’t programmed to answer, spouting off a random cliché to divert the judge’s attention from the program’s incompleteness.
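To make this concrete, here is a minimal sketch, in Python, of the general ELIZA-style technique at work in these transcripts. It is my own illustrative reconstruction, not any contestant’s actual code: match the input against a handful of patterns, echo the human’s own words back with the pronouns swapped, and fall back on a canned cliché when nothing matches.

```python
import random
import re

# Pronoun swaps so an echoed fragment reads as a reply
# ("I am sad" -> "you are sad").
REFLECTIONS = {
    "i": "you", "me": "you", "my": "your", "am": "are",
    "you": "I", "your": "my", "yours": "mine", "are": "am",
}

# (pattern, reply templates); {0} is the echoed fragment.
RULES = [
    (re.compile(r"\bi am (.*)", re.I),
     ["Why do you say you are {0}?", "How long have you been {0}?"]),
    (re.compile(r"\bi (?:feel|think) (.*)", re.I),
     ["What makes you feel {0}?", "Do you often feel {0}?"]),
    (re.compile(r"\b(mother|father|family)\b", re.I),
     ["Tell me more about your family."]),
]

# Canned diversions for anything the rules don't cover.
CLICHES = [
    "That is quite interesting.",
    "Please, go on.",
    "Why do you say that?",
    "Let's change the subject. What are your hobbies?",
]

def reflect(fragment: str) -> str:
    """Swap first- and second-person words in an echoed fragment."""
    return " ".join(REFLECTIONS.get(w.lower(), w) for w in fragment.split())

def respond(line: str) -> str:
    for pattern, templates in RULES:
        match = pattern.search(line)
        if match:
            return random.choice(templates).format(reflect(match.group(1)))
    return random.choice(CLICHES)  # nothing matched: divert attention

if __name__ == "__main__":
    while True:
        try:
            print(respond(input("> ")))
        except EOFError:
            break
```

A few exchanges with even this toy version expose the trick: the program keeps no memory of the conversation, so every line is answered in isolation, and the pronoun swapping mangles anything longer than a simple phrase. The Loebner finalists appear to differ mainly in the size of the rule table bolted onto this skeleton.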
Upon examining the transcripts from the earlier years of the competition (around 1994) and comparing them to the latest results (2004), something even more disturbing becomes clear: the sophistication of these programs has not changed one bit! Of course, some will say the programs have become more sophisticated internally, perhaps with a larger vocabulary. Conversationally, however, they are virtually no different from Joseph Weizenbaum’s original ELIZA of the mid-1960s.
It seems to me that this kind of competition has more to do with behavioral psychology than with computer science. It is, as some have called it, a beauty contest. In essence, the Loebner Prize is awarded to the program that can best fool a person into believing that it’s human, which, apparently, isn’t too difficult. This leads me to conclude that the Loebner competition, and perhaps even the Turing Test itself, is misguided at best. Since when does machine intelligence have to be expressed in the form of human conversation? If we expect a machine to sound remotely human, we would need to supply it with all of the life experiences of a human being, complete with sensory data (images, sounds, smells), childhood memories, and fundamental instincts like self-preservation, the desire to learn, and the need to socialize.
In short, for a machine to become intelligent in the human sense, it would need to lead a human life from its conception. An example of such a machine might be an android perfectly disguised as a human being and made to interact with humans; better still if the android itself were made to believe it is human. But to expect a computer console application, no matter how complex, with no sensory input beyond keystrokes, to ever respond like a human being is misguided indeed.