The AI-Box Experiment
Several years ago I became aware of Eliezer Yudkowsky’s “AI-Box Experiment,” in which he plays the role of a transhuman “artificial intelligence” and attempts, via dialogue only, to convince a human “gatekeeper” to let him out of the box in which he is being contained (presumably so the AI doesn’t harm humanity). Yudkowsky ran this experiment twice, and both times he convinced the gatekeeper to let the AI out of the box, even though the gatekeeper had sworn up and down that there was no way to persuade him to do so.
I have to admit I think this is one of the most fascinating social experiments ever conceived, and I’m dying to play the game as gatekeeper. The problem, though, as I realized after reading Yudkowsky’s writeup, is that his policy sets (at least) two preconditions that I don’t meet:
Currently, my policy is that I only run the test with people who are actually advocating that an AI Box be used to contain transhuman AI as part of their take on Singularity strategy, and who say they cannot imagine how even a transhuman AI would be able to persuade them.
First, I believe the dichotomy between humans and transhuman intelligences is a false one, and thus no “strategy” is necessary for the so-called Singularity. Second, even supposing I believed such a strategy was necessary when I began the experiment, I suspect the only way I’d let the AI out of the box is if my belief changed during the course of the experiment. And if my belief changed and I didn’t change my actions to match, I wouldn’t feel good about myself. In other words, since I don’t currently accept the dichotomy, I can easily imagine that if I did, I could be talked out of it. Thus, I can imagine how even a normal human could persuade me; it doesn’t require a transhuman intelligence.
So I began to wonder whether there was some variant of the experiment in which I could play gatekeeper under a policy I could accede to, such as the following:
I only run the test with people who are actually advocating X, and who say they cannot imagine how even a transhuman AI would be able to persuade them of not X.
That, I take it, is as good a litmus test for undying faith as any. With this in mind, I’ll turn to the question of science in my next post.