The Loebner Prize

The Loebner Prize was established in 1990 by Hugh Loebner. It offers a Grand Prize of $100,000 and a gold medal to the creators of the first bot that can pass an extended Turing Test involving textual, visual, and auditory components. A further $25,000 is reserved for the first bot that can pass a text-only Turing Test, and $2,000-$3,000 (the amount has varied over the years) goes to the most human-seeming of all contestants each year. Once the Grand Prize has been awarded, the Loebner Prize will be dissolved. (Wikipedia - Loebner Prize)

Stuart M. Shieber's paper "Lessons from a Restricted Turing Test" details the first Loebner competition, held on November 8, 1991. For this competition, several computer contestants were placed alongside two human "confederates", who communicated with the judges through computer terminals. The topic chosen for the competition was "whimsical conversation", while the two confederates "chose to converse on Shakespeare and women's clothing" in an attempt to fool the judges. The final scoring placed the two confederates ahead of all of the computer programs, although there were several misclassifications: "Five judges ranked the top contestants as human, and there were eight instances of such misclassifications of computers as humans overall... Miss Cynthia Clay, the Shakespeare aficionado, was thrice misclassified as a computer. At least one of the judges made her classification on the premise that '[no] human would have that amount of knowledge about Shakespeare.' Ms. Lizette Gozo was honored as the most human of the agents for her discussion on women's clothing, although one judge rated two computer programs above her." (Shieber)

Although the number of misclassifications, and the reasoning behind them, could suggest that computers were getting closer to being able to fool humans, analysis by experts in the field indicates that this was not the case. Shieber writes, "Perhaps the most conspicuous characteristic of the six computer programs was their poor performance. It was widely recognized that computer experts could readily distinguish the contestants from the confederates." (Shieber)

The disparity between the computers' mediocrity and their apparent success in fooling at least some of the judges can be explained by several factors. First, the inherent irrationality of some of the programs was easily misinterpreted, in the setting of the test, as "whimsical conversation". Second, the techniques that would have uncovered the programs' inadequacy, mainly typing in gibberish or repeating questions over and over, fell under the "trickery and guile" prohibition and thus could not be used. Third, and most damaging to the integrity of the test, is the realization that the Turing Test "relies solely on the ability to fool people", and thus is a "sorely inadequate test of intelligence." (Shieber) In this case, certain programs were built to give nonsense responses regardless of the input, but under the guise of "whimsical conversation" their speech patterns could be quite convincing.

That is the true flaw of this incarnation of the Turing Test, and the reason Shieber's article refers to a "restricted" Turing Test. With its judges stripped of the ability to actively probe the test subjects' identity, and set in an environment more conducive to showcasing than to intensive testing, the Loebner Prize has more in common with "newspaper horoscopes and roadside psychics" (Shieber); it seems more an attempt to entrance the public with apparently sophisticated talking computers than an effort to improve artificial intelligence. If it is indeed contributing to AI development, the improvement is most likely in the ability of computers to fool people, not in progress toward a holistic simulation of human intelligence. The Loebner Prize is the result of a subtle corruption and refocusing of the Turing Test, with its emphasis firmly on "fool" rather than "think". Ironically, the closest modern equivalent to the Turing Test is also the least true to its original intent.