Life and the Prisoner’s Dilemma

Sometimes another person does something that you’re not happy with. At times you might not even know exactly what they did, but something they did made you unhappy.

Maybe you’re thinking, “I would like to know how to deal with people when they do things that I’m not happy with. But how do I know whose advice to believe? If only there were a mathematically proven technique that I could use with complete confidence.”

The Prisoner’s Dilemma

The classic story of the Prisoner’s Dilemma involves two suspects. The police don’t quite have enough evidence to convict either of them. So they offer each guy a deal if he rats out the other guy.

Here’s how it works. If suspect A gives the police information that they can use to convict suspect B, they let suspect A go free and suspect B goes to jail for a year, as long as suspect B keeps quiet. Similarly, if only suspect B provides information about A to the police, A goes to jail for a year and B goes free. But if both A and B provide the police with information about each other, they both go to jail for three months. And if neither suspect gives the police any information, the police hold them both for only a month and then let them go.

How should suspect A decide what to do? Well, suppose B remains silent. If A gives the police information, he’ll go free! If A says nothing, he’ll be held for a month along with B. So if B is silent, A is better off giving information to the police.

But suppose now that B does give the police information about A. If A also gives the police information, he goes to jail for three months (along with B). And if A keeps quiet, he goes to jail for a year! So if B provides the police with information, A is again better off giving information to the police rather than keeping silent.

So regardless of whether B gives the police any information, A spends less jail time if he talks than if he remains silent. So logically, A should rat out B for his own benefit.

Here’s the strange part: the same logic applies from B’s perspective! So if both A and B do the logical thing, they both give information to the police about each other and end up spending three months in jail. But this is somehow not the best outcome for either of them. If only they had both kept quiet, they’d only have had to stay in custody for one month, not three!
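If you’d like to check that reasoning yourself, here is a minimal Python sketch. It writes the deal as a table of jail times in months (counting the year as 12 months and going free as 0), then asks what A’s best choice is for each thing B might do. The names JAIL_MONTHS and best_reply_for_a are made up for this illustration.

```python
# Jail time in months, keyed by (A's choice, B's choice).
# Each value is (A's months, B's months).
# "silent" = keep quiet, "talk" = give the police information.
# Assumption: "a year" is counted as 12 months, "goes free" as 0.
JAIL_MONTHS = {
    ("silent", "silent"): (1, 1),
    ("silent", "talk"): (12, 0),
    ("talk", "silent"): (0, 12),
    ("talk", "talk"): (3, 3),
}


def best_reply_for_a(b_choice):
    """Return the choice that minimizes A's jail time, given B's choice."""
    return min(("silent", "talk"), key=lambda a: JAIL_MONTHS[(a, b_choice)][0])


for b_choice in ("silent", "talk"):
    reply = best_reply_for_a(b_choice)
    print(f"If B chooses {b_choice}, A's best reply is {reply}")

# Both lines print "talk", yet ("talk", "talk") costs each suspect
# 3 months, while ("silent", "silent") would have cost only 1.
```

The table is symmetric, so swapping the roles of A and B gives the same answer for B.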

Repeating the process

Now suppose that A and B made some decision (we don’t care what, for the moment) and one or both of them spent some time in jail, and now they’re both free. You’d think they’d learn from their mistakes, but you’d be mistaken. They both get arrested again! And the police offer them both exactly the same deal as before!

Now what should they do? The logic isn’t so clear-cut, because each person knows what the other one chose to do last time. At any rate, they both make some choice, and one or both of them again spend some time in jail.

Well, now they’re done, right? Surely by now they’ve figured out that crime doesn’t pay. And yet, for a third time, they are both arrested and offered the same deal by the police.

So given that this situation could get repeated endlessly, what strategy should they use? Let’s call their two choices “cooperate” (that is, remain silent) and “defect” (that is, rat out the other guy to the police). If they both always cooperate, they both keep going to jail for only a month. But at any time, one guy could defect, and if his friend remains silent, the guy who defected goes free! Of course now his friend may not trust him anymore and may not be so willing to cooperate in the future.

A computer simulation

One could simulate this situation on a computer easily enough. This would make it easy to repeat the scenario many times.

You could write different programs or “apps” to use different strategies. One app could always cooperate. Another might cooperate most of the time but defect occasionally. Yet another app could add up the total number of times the other guy defected and base its decision on that. Another app might randomly cooperate or defect at its whim. You could then have these apps go against one another and see which app spends the least time in jail.
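As a rough sketch of what such a setup might look like in Python, each “app” below is just a function that looks at both players’ past moves and returns “C” (cooperate) or “D” (defect). Everything here is an assumption made for illustration, not the rules of any real contest: the function names, the 10% defection rate, the threshold of three defections, and the 200 rounds per matchup.

```python
import itertools
import random

# Months in jail for (my move, other's move); "C" = cooperate, "D" = defect.
MONTHS = {("C", "C"): 1, ("C", "D"): 12, ("D", "C"): 0, ("D", "D"): 3}


# Each strategy receives its own past moves and its opponent's past moves.
def always_cooperate(mine, theirs):
    return "C"


def mostly_cooperate(mine, theirs):
    # Assumption: "occasionally" means about 10% of the time.
    return "D" if random.random() < 0.1 else "C"


def grudge_counter(mine, theirs):
    # Assumption: start defecting once the opponent has defected
    # more than 3 times in total.
    return "D" if theirs.count("D") > 3 else "C"


def coin_flip(mine, theirs):
    return random.choice("CD")


def play_match(strat_a, strat_b, rounds=200):
    """Play repeated rounds and return total jail months for each side."""
    hist_a, hist_b = [], []
    months_a = months_b = 0
    for _ in range(rounds):
        move_a = strat_a(hist_a, hist_b)
        move_b = strat_b(hist_b, hist_a)
        months_a += MONTHS[(move_a, move_b)]
        months_b += MONTHS[(move_b, move_a)]
        hist_a.append(move_a)
        hist_b.append(move_b)
    return months_a, months_b


def round_robin(strategies, rounds=200):
    """Pit every strategy against every other one and total the jail time."""
    totals = {s.__name__: 0 for s in strategies}
    for a, b in itertools.combinations(strategies, 2):
        months_a, months_b = play_match(a, b, rounds)
        totals[a.__name__] += months_a
        totals[b.__name__] += months_b
    return totals


random.seed(0)  # fixed seed so repeated runs give the same result
totals = round_robin([always_cooperate, mostly_cooperate, grudge_counter, coin_flip])
for name, months in sorted(totals.items(), key=lambda item: item[1]):
    print(f"{name}: {months} months")
```

The app at the top of the printed list spent the least total time in jail. With only four apps and made-up parameters, the ranking says little on its own; the point is just to show the mechanics.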

Someone did just that! Around 1980, the political scientist Robert Axelrod held a contest: people submitted their apps, and they all competed against each other.

And was there a winner? Yes, there was! The most consistently winning strategy was called “tit for tat.” This particular app starts out by cooperating, and after that it does whatever its “opponent” did on the previous iteration. When the results were totaled across all its matchups, the “tit for tat” app came out ahead of many other apps with more complex strategies.

This app was aware of what its opponent did, but it had a short memory. A defection by its opponent would be “retaliated” against with one defection, and then “tit for tat” would go back to steadily cooperating as long as its opponent did that too.
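To see that short memory in action, here is a small Python sketch. The “tit for tat” rule itself is as described above; the opponent, defect_once, is a hypothetical one invented for this example that cooperates every time except in round 3.

```python
def tit_for_tat(mine, theirs):
    # Cooperate on the first round, then copy the opponent's last move.
    return theirs[-1] if theirs else "C"


def defect_once(mine, theirs):
    # Hypothetical opponent: defects in round 3 only, otherwise cooperates.
    return "D" if len(mine) == 2 else "C"


def play(strat_a, strat_b, rounds):
    """Return each side's moves as a string of "C" and "D"."""
    hist_a, hist_b = [], []
    for _ in range(rounds):
        move_a = strat_a(hist_a, hist_b)
        move_b = strat_b(hist_b, hist_a)
        hist_a.append(move_a)
        hist_b.append(move_b)
    return "".join(hist_a), "".join(hist_b)


tft_moves, opponent_moves = play(tit_for_tat, defect_once, rounds=6)
print(opponent_moves)  # CCDCCC
print(tft_moves)       # CCCDCC
```

The opponent’s single “D” in round 3 is answered by a single “D” in round 4, and from round 5 on both sides are back to cooperating.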

The final answer

So it’s now been mathematically proven what you should do when someone does something that makes you unhappy.

Well, partly, anyway. It doesn’t say what action you should take initially.

But the winning “tit for tat” app clearly shows this: if someone did something you didn’t like and you’ve already done something about it, the best thing you can do after that is to let it go.