Learning patterns and the game of Flip

A foolproof way of separating the men from the boys, the women from the girls, or perhaps just an old programmer from a not quite so old programmer, is to mention the books Basic Computer Games and More Basic Computer Games and see whether you get frowning never-heard-of-them faces or the strange dreamy looks of people recollecting an era long gone by. As their titles imply, these books were collections of game listings, all written in BASIC. (And originally published in the equally famous Creative Computing magazine, now defunct, R.I.P.) By today’s standards the games in these books range from sad to pathetic, featuring neither graphics nor sound, just simple ASCII graphics at best. Still, at the time these games had an almost magical appeal. Though the magic might now be gone, there are still one or two lessons to be learned from these ancient games, one of them buried deep in the bowels of the game of Flip (pages 61-62 of MBCG).

Flip is played against the computer for 50 turns. Each turn the computer first secretly decides on one of two alternatives and then you, the human opponent, also have to decide on one of the two alternatives. If you pick the same alternative as the computer you score a point, otherwise the point goes to the computer. So far it sounds quite boring. The interesting part is that as you play, the computer opponent of Flip detects patterns in your behavior and starts making its decisions to increase the chance of your next guess being wrong. It also incorporates a little randomness (not always maximizing its chances), so that even if you know how the program works you cannot quite use that to your advantage.

How Flip works

So, how does Flip detect the patterns, you ask? Let’s start by calling the two alternatives ‘0’ and ‘1’ thereby making it clear that what we have is a binary decision. Flip maintains the last two choices of both players, i.e. four binary values which together form an integer in the range 0-15 that we can call the history of the last two rounds; initially this number is set to a random value. Furthermore, for each possible history value we have a corresponding probability value, p[], between 0 and 1 (initially set to 0.5), estimating the probability of the person guessing ‘1’ in the situation described by the history value. From these values, the computer calculates its decision in the following manner (translated into Java code, yes Java, ick):

float R = 0.06f;
// Default to completely random decision
float probabilityOfOne = 0.5f;
// If player guesses...
if (p[history] > 0.501f) {
    // ...'1' with "high" probability, decrease chance of picking '1'
    probabilityOfOne = p[history] * R + 0 * (1.0f - R);
} else if (p[history] < 0.499f) {
    // ...'0' with "high" probability, increase chance of picking '1'
    probabilityOfOne = p[history] * R + 1 * (1.0f - R);
}
float URand01 = generator.nextFloat();
int computersValue = (probabilityOfOne >= URand01) ? 1 : 0;

Here history is the current history value, p[history] the corresponding probability value, generator is assumed to be a java.util.Random instance, URand01 is a uniformly distributed random number between 0 and 1, and R is a randomness factor influencing the likelihood of making the decision suggested by the probability value (the original BASIC program used 0.3; the code above uses 0.06). In short, what the code does (if it’s not clear from the code itself): if the player guesses ‘1’ with a probability close to 0.5, the computer selects a value totally at random; if the player guesses ‘1’ with a probability higher than 0.5, the computer favors ‘0’; and if the player guesses ‘1’ with a probability less than 0.5, the computer favors ‘1’. The latter two decisions are not absolute but are affected by the randomness value.
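To make the effect of R concrete, here is a minimal sketch that isolates the blend from the snippet above as a pure function and prints it for a few probability values. The class and method names (FlipDecisionSketch, probabilityOfOne) are made up for illustration and are not from the original listing.

```java
// Sketch only: the blend from the decision snippet, isolated as a pure function.
// FlipDecisionSketch and probabilityOfOne are hypothetical names.
class FlipDecisionSketch {
    // Returns the probability that the computer picks '1', given the
    // estimated probability p that the player guesses '1' and the
    // randomness factor r.
    static float probabilityOfOne(float p, float r) {
        if (p > 0.501f) {
            // Player leans toward '1': blend the estimate toward 0
            return p * r + 0 * (1.0f - r);
        } else if (p < 0.499f) {
            // Player leans toward '0': blend the estimate toward 1
            return p * r + 1 * (1.0f - r);
        }
        // No clear preference: completely random decision
        return 0.5f;
    }

    public static void main(String[] args) {
        float r = 0.06f;
        float[] samples = {0.1f, 0.5f, 0.9f};
        for (float p : samples) {
            System.out.println("p = " + p + " -> P(computer picks 1) = " + probabilityOfOne(p, r));
        }
    }
}
```

With R = 0.06, a history slot at p = 0.9 leaves the computer only about a 5.4% chance of picking '1', while a slot at p = 0.1 gives it about 94.6%. A larger R pushes both values toward the player's own estimate, which makes the computer less predictable but also less effective.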

To update the probability and history variables, we do:

// Update probability and history
float decayRate = 0.8f;
p[history] = decayRate * p[history] + (1.0f - decayRate) * playersLastGuess;
history = ((history << 2) | (computersValue << 1) | playersLastGuess) & 0x0F;

The decay rate determines how fast we forget earlier behavior and learn new behavior in the context given by history.
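As a rough illustration of how quickly the estimate moves, the sketch below applies the update repeatedly to a single slot. It assumes, for simplicity, that the same history value comes up on every update, which is not generally true in a real game since the history changes each round. The helper names (updateProbability, updateHistory) are hypothetical.

```java
// Sketch only: the update step as two small helpers with hypothetical names.
class FlipUpdateSketch {
    // Exponentially decaying average of the player's guesses for one slot
    static float updateProbability(float p, int playersLastGuess, float decayRate) {
        return decayRate * p + (1.0f - decayRate) * playersLastGuess;
    }

    // Shift out the oldest round and append the newest (computer bit, player bit)
    static int updateHistory(int history, int computersValue, int playersLastGuess) {
        return ((history << 2) | (computersValue << 1) | playersLastGuess) & 0x0F;
    }

    public static void main(String[] args) {
        float p = 0.5f;
        for (int i = 1; i <= 5; i++) {
            p = updateProbability(p, 1, 0.8f);
            System.out.println("after " + i + " guesses of '1': p = " + p);
        }
        // History 1011, computer picked 0, player guessed 1 gives 1101
        System.out.println(Integer.toBinaryString(updateHistory(0b1011, 0, 1)));
    }
}
```

Starting from 0.5, after n guesses of '1' the estimate is 1 - 0.5 * 0.8^n, so 0.6 after one guess and roughly 0.84 after five. Note that a single guess is already enough to cross the 0.501 threshold used in the decision code.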

A Flip applet

Below is a little Java applet I hacked together that will play Flip with you. (It looks pretty crappy, but hey, you're getting what you paid for.) Click the applet window to give it keyboard focus, then press '0' or '1' to guess, and your score will be tallied over 50 rounds. You've done well if you score 25 or more correct guesses. It turns out that it is difficult for a human player to make truly random decisions and so avoid patterns that Flip can pick up. (Unless you cheat and resort to a trick, like hammering on the keyboard, or knowing the first 50 decimals of pi and advancing through the decimals guessing 0 if the current decimal is 0-4 and 1 if it is 5-9.)

Flip Applet

Note especially that if you guess regular patterns like "0000...00", "1111...11" or "0101...01", Flip very quickly picks up on these and reverses its guesses accordingly.
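If you want to check this without the applet, the two snippets from earlier can be stitched into a small simulation that plays scripted "players" against Flip. This is only a sketch under my own assumptions: the class name, the play method, the seeds, and the use of IntUnaryOperator to script the player's guesses are all illustrative choices, not part of the original game.

```java
import java.util.Arrays;
import java.util.Random;
import java.util.function.IntUnaryOperator;

// Sketch only: Flip's decision and update steps run against a scripted player.
class FlipSimulationSketch {
    static final float R = 0.06f;
    static final float DECAY_RATE = 0.8f;

    // Plays the given number of rounds and returns how many the player won.
    // The player is a function from round index to a guess (0 or 1).
    static int play(IntUnaryOperator player, int rounds, long seed) {
        Random generator = new Random(seed);
        float[] p = new float[16];
        Arrays.fill(p, 0.5f);                  // no knowledge at the start
        int history = generator.nextInt(16);   // random initial history
        int playerScore = 0;
        for (int round = 0; round < rounds; round++) {
            // Computer decides first, based on what it has learned so far
            float probabilityOfOne = 0.5f;
            if (p[history] > 0.501f) {
                probabilityOfOne = p[history] * R;
            } else if (p[history] < 0.499f) {
                probabilityOfOne = p[history] * R + (1.0f - R);
            }
            int computersValue = (probabilityOfOne >= generator.nextFloat()) ? 1 : 0;
            // Player guesses; a match is a point for the player
            int guess = player.applyAsInt(round) & 1;
            if (guess == computersValue) {
                playerScore++;
            }
            // Learn from the guess, then advance the history
            p[history] = DECAY_RATE * p[history] + (1.0f - DECAY_RATE) * guess;
            history = ((history << 2) | (computersValue << 1) | guess) & 0x0F;
        }
        return playerScore;
    }

    public static void main(String[] args) {
        int rounds = 50;
        long seed = 42L;
        Random coin = new Random(7L);
        System.out.println("always 1:    " + play(round -> 1, rounds, seed) + "/" + rounds);
        System.out.println("alternating: " + play(round -> round % 2, rounds, seed) + "/" + rounds);
        System.out.println("coin flips:  " + play(round -> coin.nextInt(2), rounds, seed) + "/" + rounds);
    }
}
```

The fixed patterns should score far below 25, because every history slot they visit is learned after a single guess. The coin-flipping player should hover around half the points, since there is nothing for Flip to learn.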

I don't retain the p[] array between games, which you probably normally would, because I figure anyone reading this blog is enough of a programming bastard to be guessing '1' for 1000 trials and then complain when Flip is performing poorly on the game immediately after that. (Yeah, go on, blush, you know I have your number!)

Theory

Flip was possibly my first encounter with machine learning some 25 years ago. It took me until university a few years later to learn that what was really going on here was a simple statistical model where we study a stream of inputs and look at which elements follow which other elements, and with what probability. Models such as these are known and studied under the terms N-gram models and Markov chains. I won't attempt to go into them here, but it should be pretty clear that what Flip uses is just a very simple statistical model and that we could form much more sophisticated models along similar principles. If you haven't encountered these concepts before and are intrigued, I thoroughly recommend spending a few hours on the web. You'll find them used in everything from compression algorithms to statistical spelling checkers. It's cool stuff!
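For a taste of the general idea, here is a minimal sketch of an N-gram style model over characters: it counts which character follows each context of a fixed length and turns the counts into probability estimates. Unlike Flip it uses plain counts rather than a decaying average, and all names in it (NGramSketch, train, probability, mostLikely) are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: counts which character follows each context of `order` characters.
class NGramSketch {
    private final int order;
    private final Map<String, Map<Character, Integer>> counts = new HashMap<>();

    NGramSketch(int order) {
        this.order = order;
    }

    // Records, for every position in the text, the character that followed
    // the preceding `order` characters.
    void train(String text) {
        for (int i = order; i < text.length(); i++) {
            String context = text.substring(i - order, i);
            counts.computeIfAbsent(context, k -> new HashMap<>())
                  .merge(text.charAt(i), 1, Integer::sum);
        }
    }

    // Estimated probability that `next` follows `context`; 0 for unseen contexts.
    double probability(String context, char next) {
        Map<Character, Integer> followers = counts.get(context);
        if (followers == null) {
            return 0.0;
        }
        int total = 0;
        for (int count : followers.values()) {
            total += count;
        }
        return followers.getOrDefault(next, 0) / (double) total;
    }

    // Most frequent follower of `context`, or `fallback` for unseen contexts.
    char mostLikely(String context, char fallback) {
        Map<Character, Integer> followers = counts.get(context);
        if (followers == null) {
            return fallback;
        }
        char best = fallback;
        int bestCount = 0;
        for (Map.Entry<Character, Integer> entry : followers.entrySet()) {
            if (entry.getValue() > bestCount) {
                bestCount = entry.getValue();
                best = entry.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        NGramSketch model = new NGramSketch(1);
        model.train("abracadabra");
        System.out.println("P(b | a) = " + model.probability("a", 'b'));
        System.out.println("most likely after b: " + model.mostLikely("b", '?'));
    }
}
```

In "abracadabra" the letter 'a' is followed by 'b' twice and by 'c' and 'd' once each, so the model estimates P(b | a) as 0.5. Flip's p[] table is the same kind of thing with a two-symbol alphabet and a context made of the last two rounds.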

Applications

Hopefully the applications for this type of learning are pretty self-evident, but I'll give a few examples anyway. In an extended form we could use pattern learning in a beat-'em-up to predict the next punch/kick of the player, to allow the computer to block/attack accordingly (without cheating). It can be used as a profile of a player, to tell human players apart, or even to learn several human player styles that can then be adopted by a computer player. It could be used in user interfaces to identify the most common actions and reorder files, folders, or menus to streamline the user experience. It could be used as part of a RoShamBo player. And much more. All for very little coding effort.

As always, comments are appreciated! Perhaps you have used N-gram models in your code, or have cool ideas for where they could improve the user experience. If so, I'm curious to hear about it.

Oh, at some later time I might blog about a little-known but neat generalization to the above approach, well-suited for use in games. How's that for a tease? :)

Posted in: AI

2 thoughts on “Learning patterns and the game of Flip”

  1. Another important application is branch prediction in CPUs. Most PC CPUs since the PPro/Pentium MMX use a history table similar to p[] here, except that each entry stores the probability of the branch being taken as a two-bit saturating counter. The length of the history pattern used to index the table varies between 4 and 16 bits. The update doesn’t use a decay factor, it just increments or decrements the two-bit counter.
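For the curious, the scheme described in the comment above can be sketched in a few lines. This is a toy model under loud assumptions: a single global history register, counters starting at "weakly not taken", and hypothetical names throughout (TwoBitPredictorSketch, predictTaken, countCorrect). Real CPU predictors differ in many details.

```java
import java.util.Arrays;

// Sketch only: a table of two-bit saturating counters indexed by recent branch history.
class TwoBitPredictorSketch {
    private final int historyMask;
    private final int[] counters;   // each entry is 0..3; 2 and 3 mean "predict taken"
    private int history = 0;        // most recent outcomes, one bit per branch

    TwoBitPredictorSketch(int historyBits) {
        historyMask = (1 << historyBits) - 1;
        counters = new int[1 << historyBits];
        Arrays.fill(counters, 1);   // arbitrary starting point: weakly not taken
    }

    boolean predictTaken() {
        return counters[history] >= 2;
    }

    // Increment or decrement the counter (saturating at 0 and 3), then
    // shift the outcome into the history register.
    void update(boolean taken) {
        int counter = counters[history];
        counters[history] = taken ? Math.min(3, counter + 1) : Math.max(0, counter - 1);
        history = ((history << 1) | (taken ? 1 : 0)) & historyMask;
    }

    // Runs a fresh predictor over a sequence of outcomes and counts correct predictions.
    static int countCorrect(int historyBits, boolean[] outcomes) {
        TwoBitPredictorSketch predictor = new TwoBitPredictorSketch(historyBits);
        int correct = 0;
        for (boolean taken : outcomes) {
            if (predictor.predictTaken() == taken) {
                correct++;
            }
            predictor.update(taken);
        }
        return correct;
    }

    public static void main(String[] args) {
        boolean[] alternating = new boolean[200];
        for (int i = 0; i < alternating.length; i++) {
            alternating[i] = (i % 2 == 0);
        }
        System.out.println(countCorrect(4, alternating) + " of " + alternating.length + " predicted correctly");
    }
}
```

Just like Flip with a "0101...01" player, the predictor mispredicts a few times while the relevant table entries warm up and then gets the alternating pattern right from there on.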
