Newcomb's Meta-Paradox

Tweeter Claus Metzner (@cmetzner) alerted me to this cool area of study with this paper.

Suppose you meet a Wise being (W) who tells you it has put $1,000 in box A, and either $1 million or nothing in box B. This being tells you to either take the contents of box B only, or to take the contents of both A and B. Suppose further that the being had put the $1 million in box B only if a prediction algorithm designed by the being had said that you would take only B. If the algorithm had predicted you would take both boxes, then the being put nothing in box B.  Presume that due to determinism, there exists a perfectly accurate prediction algorithm. Assuming W uses that algorithm, what choice should you make?

Ultimately one is led to understand that the paradox is a manifestation of different interpretations of the problem definition (aren’t all paradoxes, though?). If you interpret the setup one way, then you should choose just B and you will net $1M. If another way, then you should choose both and net either $1,000 or $1,001,000, depending on W’s unknowable prediction. As the authors conclude:

Newcomb’s paradox takes two incompatible interpretations of a question, with two different answers, and makes it seem as though they are the same interpretation. The lesson of Newcomb’s paradox is just the ancient verity that one must carefully define all one’s terms.

The authors suggest that combining Bayesian nets with game theory is what yields this resolution. At first I thought they had missed the obvious further conclusion from Bayes, which is that you should clearly choose just B. Here was my reasoning. The key clue is in this piece of information: “people seem to divide almost evenly on the problem”. I.e., your Bayesian priors should now be set to 50% on either interpretation. Now, we know that the expected value (EV) of the “just B” scenario is $1M, but we don’t really know what the EV is for the “both boxes” scenario in which “your choice occurs after W has already made its prediction”:

…if W predicted you would take A along with B, then taking both gives you $1,000 rather than nothing. If instead W predicted you would take only B, then taking both boxes yields $1,001,000….

Since in this scenario you are choosing after W’s prediction, is there any way you can “predict” what W’s choice might be? No, of course not; it’s a variant of the Liar’s Paradox where, if you predict one thing, the answer is the other. Thus, if we are using a probabilistic approach (as the authors have laid out for us), we must conclude there is no information to be gleaned on W’s prediction, and we are forced to assign a 50% likelihood to either choice. Hence, the EV of the “both boxes” interpretation is 0.5 × $1,000 + 0.5 × $1,001,000 = $501,000.
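For anyone who wants to check that arithmetic, here is a minimal Python sketch of the “both boxes” EV. The function name `ev_both_boxes` and the parameter `p_predicted_one_box` are my own labels, not anything from the paper, and the 50% figure is just the no-information assumption made above.

```python
# Payoffs from the setup: box A always holds $1,000; box B holds $1M or nothing.
BOX_A = 1_000
BOX_B_FULL = 1_000_000


def ev_both_boxes(p_predicted_one_box: float) -> float:
    """Expected payoff of taking both boxes when the contents are already fixed.

    p_predicted_one_box is the (assumed) probability that W predicted
    "B only" and therefore put the $1M in box B.
    """
    payoff_if_filled = BOX_A + BOX_B_FULL  # W predicted "B only"
    payoff_if_empty = BOX_A                # W predicted "both boxes"
    return (p_predicted_one_box * payoff_if_filled
            + (1 - p_predicted_one_box) * payoff_if_empty)


# The 50/50 assumption from the text.
print(ev_both_boxes(0.5))  # 501000.0
```

Any other probability you are willing to defend for W’s prediction can be plugged in the same way; the 50% input is the only thing that produces the $501,000 figure.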

Putting both meta-Bayesian analyses together, we can conclude that since the “just B” interpretation yields $1M and the “both boxes” interpretation yields an EV of a little over half that, it’s a no-brainer to choose just B. Which means your EV is exactly $500,000. But wait! We just concluded that the EV for “both boxes” is $501,000, which is clearly better!!!

Newcomb’s paradox will probably crack my list of Top 10 Paradoxes of All-Time (unless I figure out how to solve it after it does).

  • Tiltmom

    I will give the paper a read, but I had thought this was a question of whether or not you believe in Free Will.

    [Personal brag — I got to talk to Martin Gardner about this briefly when I interviewed him for the Randi bio.]

  • kevindick

    This is one of Eliezer Yudkowsky’s favorite topics:

    http://www.overcomingbias.com/2008/01/newcombs-proble.html

    His main point is that rationalists should, above all, win. On average, rational strategies should produce better results.

    So the key question here is not the prior of what other people choose. The question is what the prior is that W will be right. If you receive credible evidence that W has done this in the past and has been right, you should one-box. If you think he’s just a joker, you should two-box.

  • @Tiltmom, yes that’s one way of framing the dilemma, but there are other ways that are equivalent.

    @Kevin, I’m usually on side with whatever Eliezer says, but his argument rings pretty hollow to me on this one, wouldn’t you agree? Where are you supposed to receive this evidence about W’s past behavior since it wasn’t part of the original setup? Seems like post hoc “rationalization” [sic]

  • 1. I have read the paper, but I cannot tell whether it is a genuinely different point of view or simply a rehash of previous papers written in different mathematical notation.

    2. If, as some have argued, Newcomb’s problem is the 20th century version of Pascal’s wager, then it is by no means obvious that rationalists should win.

    Here is one setup of the problem that I like. Suppose that we are given a correlation table between one-box decisions, two-box decisions, and the two states of nature: $1M in the box, and nothing in the box.

    It turns out that, yes, one-box choices very often (though not perfectly) coincide with $1M in the box, and two-box choices very often coincide with $0 in the box.

    So, in essence, if we were playing a game there are two focal points – choose 1 and predictor places 1 million or choose 2 and predictor places 0.

    Why the predictor has preferences like this is a genuine mystery, but suppose that is all we know.

    Here is the revised mystery: how can we credibly signal to the predictor, who makes his choice first, that we are committed to being a one box chooser? And is this credible signal sufficient?

    Your answers are?

  • kevindick

    @Rafe and Michael

    Eliezer’s reasoning bears on both your points. His argument is that if you could commit to one-boxing in the case where you face a credible W, you would. E.g., if you were a self-modifying AI, you would modify your source to one-box.

    Obviously, humans can’t directly modify their source. But there are a number of known psychological dynamics you could employ. For instance, you could publicly state that you are a one-boxer. You could write a scholarly article that argues for the rationality of one-boxing. You could hold a Newcomb simulation where you in fact one-box. Etc.

    Which brings us to the question of credibility. Certain formulations of the problem do in fact specify a track record of successful predictions. Your formulation specifies that there exists a perfect prediction algorithm and W uses it. So it’s not really an issue in the toy version. Either it’s specified or it isn’t.

    In the “broad” version, there’s always some information. Is there any evidence that W has access to advanced predictive technology? Then it’s a matter of how advanced you think it needs to be. But we all have some belief in this regard, which we can encode and solve for.

  • @Kevin:

    1. My formulation, which is pretty standard, does not assume a perfect correlation. Only a very high correlation.

    2. None of your examples of how to commit to one box are credible. Why should stating I am a one boxer be believed? Why should writing an article not be taken as satire? Why should my play in a simulation not be interpreted as a fake-out?

    3. The metaphor about modifying one’s source code is not helpful here. It is not our internal decision procedure that is wanting. Rather, we need to be able to signal that a) it is reasonable for the predictor to believe that it is reasonable for us to play one box, and b) this belief is still reasonable for the predictor to hold, knowing that it would be reasonable for us to trick the predictor into holding this belief.

  • kevindick

    @Michael

    1) My approach doesn’t require a perfect correlation, just a very high correlation.

    2) This isn’t a commitment from a game theoretic view, it’s a modification of your internal thought processes via cognitive dissonance. I assume that W will be aware of how cognitive dissonance modifies thought processes and therefore make this modification on myself. As long as W observes that boundedly rational humans hold both one-box and two-box beliefs, publicly making modifications to your own psychology should affect W’s behavior.

    3) As I implied in (2), I think it’s a mistake to frame this problem as a classic single-shot competitive game. You must take advantage of the fact that W observes humans behaving in ways that aren’t perfectly characterized by competitive game theory.

  • “Presume that due to determinism, there exists a perfectly accurate prediction algorithm. Assuming W uses that algorithm, what choice should you make?”

    For me the problem lies right here. If determinism is true, then you are not making a choice. You will simply pick the box that you are destined to pick.

    Conversely, if you are indeed making a choice, then you have free will and no prediction algorithm could possibly predict your choice beforehand with 100% certainty.

    I suppose that were I in this situation, and I for some reason had complete confidence in this being’s prediction ability, I would have to take both boxes, because then the prediction machine, which cannot possibly be wrong, will have put $0 in box B. I get $1,000. But of course, this has all been predestined in advance and my “choice” was pure illusion.

  • Given recent advances in neuroscience, which enable a machine to detect your decision to (say) push a button with your left hand (as opposed to your right hand) a fraction of a second before you are aware of having made the decision, the intriguing possibility arises of actually carrying out Newcomb’s dilemma in the lab.

    That is, we have a clock that counts down to zero. At the moment the clock shows zero, you have t seconds to decide whether to push Button 1 or Button 2, corresponding to the two choices of Newcomb’s problem. At time zero, the machine puts money (or not) into the opaque box according to its prediction of your decision. (If you press both buttons, or neither button, by the time t seconds elapse, then you get nothing.) The value of t is chosen to be small enough so that the machine can reliably predict your choice, but large enough so that you have the subjective impression that you are making your decision after the clock reaches zero.

    This experiment ought to be feasible with current technology, but I haven’t heard of anyone actually performing it.
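Several of the comments above turn on how reliable the predictor is rather than on whether it is perfect. A rough Python sketch of that question follows. It assumes a single `accuracy` number that applies equally to one-boxers and two-boxers, and it conditions the contents of box B on your choice, which is exactly the step that someone holding the “contents are already fixed” interpretation would reject. The function names are mine, purely for illustration.

```python
BOX_A = 1_000
BOX_B_FULL = 1_000_000


def ev_one_box(accuracy: float) -> float:
    # Box B is full only if the predictor correctly foresaw one-boxing.
    return accuracy * BOX_B_FULL


def ev_two_box(accuracy: float) -> float:
    # Box A is always collected; box B is full only if the predictor
    # wrongly expected one-boxing.
    return BOX_A + (1 - accuracy) * BOX_B_FULL


def break_even_accuracy() -> float:
    # Solve accuracy * M = A + (1 - accuracy) * M for accuracy.
    return (BOX_A + BOX_B_FULL) / (2 * BOX_B_FULL)


for accuracy in (0.5, 0.6, 0.9, 0.99):
    print(accuracy, ev_one_box(accuracy), ev_two_box(accuracy))

print("break-even accuracy:", break_even_accuracy())
```

Under these assumptions the break-even point sits at an accuracy of about 0.5005, so a predictor only needs to be slightly better than a coin flip for one-boxing to come out ahead on this way of counting.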