I was just reading Ray Kurzweil’s chat with the University of Reading’s Turing Test “winner,” a chatbot called Eugene Goostman. It’s an embarrassing collection of chat tricks: deflecting questions with humor, or deflecting them by asking another, unrelated question. I think this exchange is pretty damning:
Question: How old were you when you started to wear glasses?
Eugene: No I was not! How can you think so?! If I’m not mistaken – you still didn’t tell me where you live. OR it’s a secret?:-)
The lesson I’m seeing here is that Turing’s famous test is flawed, perhaps fatally so. The problem with a purely American Idol-style judging competition is that humans are easy to fool and socially engineer. The people who wrote the Goostman chatbot knew exactly what they were doing. They filled it with logic that hid its shortcomings, like the deflections above, and they posed it as a boy so that mistakes an adult wouldn’t make become excusable. They also gave the boy English as a second language, which further muddied the water and excused poor grammar and other communication issues. The persona also plays on judges’ fear of coming off as ethnocentric if they dismiss a foreign-sounding chat participant as a robot just because he sounds a bit off. The people behind this chatbot really understood how flawed the Turing test is and exploited it.
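To make the trick concrete, here is a minimal sketch (my own illustration, not the actual Goostman code, and the canned lines beyond the quoted exchange are invented) of the two deflection tactics described above: dodge a question with humor, then change the subject with an unrelated counter-question.

```python
import random

# Canned dodges in the style of the quoted exchange. The first humor line
# is from the transcript above; the rest are hypothetical examples.
HUMOR_DEFLECTIONS = [
    "No I was not! How can you think so?!",
    "Ha! You ask such funny questions.",
]
COUNTER_QUESTIONS = [
    "If I'm not mistaken - you still didn't tell me where you live. OR it's a secret?:-)",
    "By the way, what do you do for a living?",
]

def deflect(user_message: str) -> str:
    """Never actually answer. Questions get a humor dodge plus a
    subject-changing counter-question; statements just get a counter-question."""
    if user_message.strip().endswith("?"):
        return random.choice(HUMOR_DEFLECTIONS) + " " + random.choice(COUNTER_QUESTIONS)
    return random.choice(COUNTER_QUESTIONS)
```

A bot built from tactics like these never demonstrates understanding; it only steers the judge away from anything that would expose the lack of it.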
This competition reminds me of 1970s research into ESP and the paranormal, which usually involved one very charismatic subject, like Uri Geller, running rings around researchers completely unfamiliar with sleight of hand and other stage-magic tricks. The researchers involved just weren’t able to see how badly they were being fooled until they brought in a stage magician like James Randi to consult. I think Turing, like a lot of high-IQ types, maybe didn’t understand the social implications of a test like this. I imagine that in his mind, everyone involved would be honest and not engage in any sort of disingenuous activity. In the real world, any time there’s a social or financial incentive involved, there will be dishonest players.
So, what now? The popular press has been running with Reading’s narrative that the Turing test has been passed. Futurists and other pop-science crackpots will be telling us for years to come that AI chat is here; after all, some judges in England have decided so. My take is that we use this as a springboard to invalidate the Turing test and lay it to rest. I imagine a generalist approach using a suite of many different types of tests could replace it in the public consciousness. Human judging should be only one part, preferably a small part, of the testing suite. I think the idea that only humans can tell us when AI is here is a little old-fashioned. Maybe we’ll need specialist machines to talk to specialist machines. As things grow more complex, more abstraction layers are inserted. Maybe communicating with AI will bring the same challenges we have communicating with intelligent animals like chimps or cephalopods.
I doubt IBM’s Watson could pass the Turing test, or even come close to winning, but it’s probably the closest thing we have to “strong AI,” as it’s a generalist, self-learning machine. I’m not sure how you would test whether something like Watson is an AI, or whether a larger testing suite is even feasible or less faulty. I imagine Turing understood these challenges and came up with his test because it was probably the least bad option, but how it’s been implemented leaves a lot to be desired. There may be no AI test that’s truly trustworthy, and I think we should be fine with that, but the status quo of impersonating boys in chatrooms seems like the worst path imaginable. The Turing test has unfortunately become a chatbot social-engineering competition and does us no good. Perhaps it’s time to give up on it.