Poker has become amazingly popular recently. While some significant progress has been made towards algorithms which play well, very few of them can adapt to their opponents playing styles the way the best human players do.
In this problem, you will play a simplified version of the game, which we will call two-card draw. In this game, each player first antes 1 unit (puts 1 unit in the pot), and is then dealt two 'cards' where a card is simply a random integer from 0 to 4. After the players receive their cards, there is a round of betting. If no one folds (quits) in the first round of betting, each player may then exchange (or draw) 0, 1 or 2 of his cards for new cards. You will draw first, and then your opponent will draw, knowing how many cards you have exchanged. After this exchange, there is another round of betting, followed by a showdown (assuming no one has folded). In the showdown, the two hands are compared just like in regular poker. A hand with two cards that are the same always beats a hand with two different cards. If both hands have a pair (two cards of the same value) the higher pair wins. If neither hand has a pair, then the hand with the highest card wins. If each hand has the same highest card, then the other card is compared. If both hands are the same, then it is a tie. Whoever has the winning hand wins the pot, while the pot is split evenly if there is a tie.
In each round of betting you act first and have two options: check or bet. If you check, you put no more money into the pot. If you choose to bet, you place either bet1 or bet2 units in the pot, depending on whether it is the first or second round of betting. After you decide to check or bet, your opponent can either fold, call, or raise. If he folds, you win the pot. If he calls, he puts the same amount in the pot as you, and the round of betting is over. If he raises, he matches your bet, and puts in one more of his own (whose size is either bet1 or bet2) , and then the decision is put back to you: fold (opponent wins), call (draw or showdown next), or raise. The two players may raise a combined total of at most 3 times in each round of betting (where choosing to bet initially counts as a raise). Thus, the following are the only reasonable valid betting sequences:
YOU | OPP | YOU | OPP | YOU ------+-------+-------+-------+------ check | call | | | check | raise | fold | | check | raise | call | | check | raise | raise | fold | check | raise | raise | call | check | raise | raise | raise | fold check | raise | raise | raise | call bet | fold | | | bet | call | | | bet | raise | fold | | bet | raise | call | | bet | raise | raise | fold | bet | raise | raise | call |
In this problem, you will be playing against the TopCoder server, which will play a fixed strategy in each test case. Though the server's strategy will be fixed, it will be probabilistic. Thus, in each situation, the server will make a random choice based on a fixed probability distribution and perform the action dictated by this choice. More specifically, we will use the term situation to indicate the combination of the cards held by the server, and all the past actions that have led to the current decision being made by the server. For instance, one situation is: server's cards = (1,1), you raised, the server called, you then drew one card. In this situation, the server must make a decision about how many cards to draw. It will make this decision based on a probability distribution associated with the above situation.
To play against the TC server, you must implement a number of methods, as described below.
Notice that you will not always be explicitly told everything that happens. For example, if the TC server decides to fold, you will not be explicitly told, but you may deduce it from the fact that the new_hand method will be called instead of one of the other methods. Similarly, when you raise in the first round and the TC server calls, you won't be told explicitly, but can deduce what happened since the draw method will then be called.Here is a simple example of a hand:
First, new_hand is called, with c1 = 1 and c2 = 2, indicating that you have a 1 and a 2. Let's assume you want to bet in round 1 of the betting, so you would return 1 from the new_hand method. Assuming the server calls, you would now have to decide how many cards to draw, and so the draw method would be called with c1=1, c2=2, bets=1. To draw 1 cards, you return 1. Now, assume that the server draws 1 card. The round2 method would then be called, and let's assume that you draw a 4 so, the parameters would be card1=2, card2=4, bets1=1, bets2=0, drew=1 (since there are no bets yet in round 2, and the server drew 1 card). Lets say that you check (return 1 from the round2 method), and the server calls. We then go to the showdown, where you would learn what the server holds. It turns out that the server a 4 and a 3, and so you lose. You put a total of 1 (your ante) plus one small bet (of size bet1) in the pot, and so that is how much you would lose on this hand.
ScoringIn each test case, you will play 10,000 hands. Your score for a test case will be your total profit (or loss) plus 10,000. If this ends up being negative, it is increased to 0. Your overall score will simply be your average score for all the tests.
Further Server Play DetailsThis section describes the behavior of the TC Server in further detail, and is not strictly necessary for understanding the problem, but it may help you play better.
In the first round of betting a situation for the server is fully specified by the two cards held, along with the number of bets placed so far. Note that if the number of bets is even, it means that you checked as your initial action, while if it is odd, it means that you bet as your initial action. For these situations, the server has three probabilities, which form a probability distribution: the probabilities to fold, call, or raise. Another type of situation occurs when the first round of betting is over and the server must draw cards. In this case, a situation is defined by the cards held, the betting action of the first round (number of bets, and who bet first), along with the number of cards drawn by you. Finally, the situations in the second round of betting are defined by the cards held, the betting action of the first round, and the number of cards drawn by each player. Note that in the second round of betting, the decision is in no way dependent on the cards which were exchanged.
The exact probability distributions for the various actions in all the situations combine to form a strategy for the TC server. The value of a server strategy is defined to be the amount that it wins (or loses) when playing against an opposing strategy that is perfectly tuned to it. In other words, the value of the strategy is the worst that strategy could do (in expectation) against an opponent (like you) who played perfectly (against this strategy), assuming that that opponent fully understood the strategy (knew all the probabilities). In general, a random strategy is very bad, and has an extremely low value. However, a random strategy can serve as a starting point to find a good strategy, with a higher value. More specifically, we start with a random probability distribution chosen uniformly from the unit simplex. From this starting point, we consider the probability distributions for each situation, one distribution at a time. For each situation, we generate a new probability distribution, and see if using this distribution instead of the current one improves the value of the strategy. If it does, we install it, otherwise, we simply move on to the next situation. For each test case, we do this between 10,000 and 200,000 times (total, not per situation). Finally, in practice this algorithm performs better with a couple of slight tweaks. Instead of evaluating the value of the strategy against an opponent who plays perfectly, the value is evaluated against an opponent who plays almost perfectly, but does the wrong thing a very small fraction of the time (1 time in a thousand for each wrong possible wrong thing).
Another way that this strategy generation algorithm is improved is to choose the random probability distributions subject to the following constraints, which attempt to prevent the strategy from doing anything that seem to be clearly wrong. Additionally, in each iteration of the improvement algorithm, these constraints will be maintained:
|-||You may return any integer from the showdown and init methods -- it will be ignored.|
|-||The memory limit is 64MB and the time limit is 20 seconds (which only includes time spent in your code).|
|-||A sample solution is available in Java, C#, Python, or C++. The strategy is very simple: bet/raise with a pair of fours and check/call otherwise.|
|-||There are 4 example test cases and 16 full submission test cases.|
|-||If you return 0 on the first call to round2 (when you should return 1 to check, or 2 to bet), it will be treated as if you returned 1 and you will check.|
|-||BET1 will be between 1 and 5, inclusive.|
|-||BET2 will be between BET1 and 3*BET1, inclusive.|
|-||card1 will always be the smaller valued of the two cards.|