Wednesday, July 8, 2015

Candy Crush Codes

Candy Crush Saga is a game in which... no, I don't think I have to tell you! But I bet you didn't know yet that the game is actually NP-hard! The original proof can be found on arXiv (or, again, on arXiv; these authors claim to prove a little more general result), but there are also plenty of pop-sci articles around. (Now, that is quite interesting: Walsh posted his paper on March 8th, 2014, and the press wrote articles about it only five days later.) Anyways, that's not really what I wanted to tell you.

A typical game field of Candy Crush Saga (copyright with King Inc., taken from Wikipedia)
If you look at above picture, you will see no three candies in a row or in a column having the same flavor. Why? Because they would disappear immediately (that's more-less the only game rule). But this game rule more-less immediately leads to the following question:

If and $n\times m$ field, filled with candies of $k$ different flavors that are randomly, independently and uniformly chosen, what is the probability that there is at least one horizontal or vertical triple of the same flavor?

In other words, what is the probability that, starting a new level, something happens without you actually touching the screen. The question is obviously combinatorial and interesting in itself. But there is even more to it.

If we look at $n\times m$ fields filled with candies of $k$ different flavors, there are in total $k^{nm}$ possible arrangements. The rules of Candy Crush Saga, however, allow only a subset of these, namely,  $k^{nm}$ minus the answer to above question. For example, if we take just one row with three columns and two flavors, there are eight possible arrangements, of which two are excluded. Taking two rows allows 36 out of  64 arrangements, and three rows permit 102 out of 512. (At least that's what my first calculations show.) In other words, the set of all possible fields allows for a redundancy: If we want to transmit messages by sending pictures of Candy Crush fields, a $3\times 3$ field allows you transmit only up to 102 different messages, although there are 512 different possible arrangements. If now, during the transmission, an error occurs -- the flavor of a candy changes -- you might end up with an invalid field! Candy Crush Saga can be used as a nonlinear block code! Super-exciting, right?

At that, in turn, allows us to ask several questions:

  • How should we encode a message efficiently? Is there anything better than just keeping a table of , say,102 fields? (There is.)
  • How should we decode the message efficiently? That's a more difficult question if you don't want to store a table of all allowed arrangements. Maybe belief propagation can help here, as it does for Sudoku codes.
  • What are the basic properties of an $(n,m,k)$ Candy Crush Code? What is the maximally possible rate given the game rule, i.e., what is the ratio between the number of possible and allowed arrangements? What is the minimum Hamming distance, etc.? How can we improve our encoder in order to guarantee a positive Hamming distance?
  • Given that we change the flavor of every candy with a given probability $p \ll 1$ (and choose the new flavor uniformly among the remaining ones), what is the error probability? I.e., what is the probability that the changed arrangement is not detected as erroneous, because it also satisfies the game rules?
  • What about error propagation of this code? I.e., if a given candy changes its flavor, how many "information bits" are effected? Can we still reconstruct part of the information, or is the whole block destroyed? (I guess the answer is no.)
  • Connecting to the last question: What happens if we let one dimension of the field go to infinity? We would need sequential encoding and decoding techniques, such as they are usual for convolutional codes.
  • What happens to all of these topics if we span the field on a cylinder (such that the rules extend over one side of the field: two candies of a given flavor on one side rule out a single candy of the same flavor on the other side) or on a torus?
Most probably, the code thus constructed is crappy. But it is nevertheless a nice topic for a research internship, I guess. And it makes learning nonlinear codes for symmetric channels as fun as Sudoku codes do for erasure channels!