Training an AI for the card game Dominion¶
I'm fascinated by stories in the news of how the world's best AIs for chess and Go were trained by playing against themselves. It seems like a wonderful tool for learning more about the strategy of a game, as well as a useful tool for designing new games.
On a whim, I decided a fun quarantine project would be training a deep neural network to play the card game Dominion. Unfortunately, I started off knowing nothing about how to build AIs, reinforcement learning (RL), or deep neural networks. It turns out I didn't even know much about Dominion strategy.
Despite that, I've made some progress toward the goal!
- Part 1 (this) covers (simplified) game rules, genetic algorithms, and classical reinforcement learning.
- Part 2 covers neural networks and deep reinforcement learning approaches.
- Part 3 extends the AI to the full game of Dominion, and all the secondary decisions that entails.
This first part describes how I've built and trained genetic algorithms and conventional reinforcement learning algorithms on a stripped down version of Dominion. I'll start with an overview of Dominion and its strategy, then move on to the algorithms.
If you want to follow along, all the Python code is available on Github.
Overview of Dominion¶
Dominion is a deck-building card game for 2, 3, or 4 players. Each player starts with a deck of 10 cards. Players take turns drawing a hand of 5 cards, which they use to "buy" new cards; purchases go to the player's discard pile along with the played hand. Each time a player's deck runs out, the discard pile is reshuffled to form a new deck, thus incorporating the newly bought cards.
Most Dominion games feature 16 types of cards: 6 basic cards that appear in every game, and 10 other cards selected (often randomly) from a pool of hundreds that have been designed over the years. You can play for free online against other people or a pretty good AI.
This post focuses on the 6 basic cards that form the core game mechanic. Any winning strategy in a full game of Dominion will have to beat the optimal strategy in the stripped-down game, because those 6 cards are always present.
The six core cards form two groups. The first group is Copper, Silver, and Gold. These cost $0, $3, and $6 to buy, and provide $1, $2, and $3 of buying power respectively when they appear in your hand. Collecting these cards increases the average buying power of your hands, but does not directly help you win the game.
The second group is Estate, Duchy, and Province. These cost $2, $5, and $8 respectively, and give you 1, 3, or 6 Victory Points toward winning the game. When the limited supply of Provinces is bought up, the player with the most total points wins. Collecting these cards moves you toward victory, but otherwise they don't "do" anything as part of a hand.
This is the basic tension in Dominion: you must collect Victory Points to win, but doing so dilutes your deck with "useless" cards that make your future hands less able to afford those Victory Points!
The full game also includes a random selection of 10 Action cards. These have varying costs and varying effects on the game. Many of them also introduce secondary decisions that make AI development more difficult, so I'll ignore them for now.
The "Big Money" Strategy (or, Dominion is Boring)¶
The basic strategy that arises from these rules is pretty simple. First, buy a Province ($8) if you can, because they're worth a lot of Victory Points and (usually) control the end of the game. If you can't afford a Province, buy a Gold ($6). If you can't afford a Gold, buy a Silver ($3). If you can't afford a Silver, pass and wait for your next turn: additional Coppers, Estates, and even Duchies typically have a negative impact on your chance of winning.
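Expressed as code, the rule is just a threshold check on the hand's buying power. This is only a sketch of the rule as described above; the function name is mine, not something from the project code.

def big_money_buy(coins):
    # coins = total buying power of the current 5-card hand
    if coins >= 8:
        return "Province"
    if coins >= 6:
        return "Gold"
    if coins >= 3:
        return "Silver"
    return None  # pass: more Coppers, Estates, or Duchies usually hurt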
The Dominion community calls this strategy "Big Money". Even in the full game of Dominion, this strategy will typically beat novice players, who get distracted by all the interesting game mechanics of the action cards.
As a human, this is a fairly unsatisfying strategy to play, which makes Dominion feel boring. Fortunately, judicious application of a few actions can strengthen Big Money significantly, which we'll cover in a future post.
However, in the reduced game "Big Money" is the optimal strategy, regardless of what your opponent(s) do. This is a form of boring that is harder to fix, and anecdotally affects most full games of Dominion too: there is typically one strategy that dominates all others. At least for our purposes, though, it means there is a "right answer" that the algorithms can converge to.
Fixed-rank genetic algorithm¶
In our simplified game, there are at most 7 moves to choose from: 6 cards to buy, or not buying anything. Depending on how much money is in our hand, some of those moves may not be available to us. Thus, one useful way to formulate a strategy is as a ranked preference for the 7 actions. On any given turn, we choose the highest-ranked move that is available to us.
There are only 7! = 5040 possible ranking orders for those 7 actions. (Actually, there are even fewer, since the order after the END TURN action doesn't matter. I think it's 1957 possibilities.) I could have thrown all possible strategies into a big, random tournament, and whittled them down until I found the one that wins most often. But it's hard to scale that up to more complex strategies, or more complex versions of the game.
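That count can be double-checked by noting that a strategy is determined entirely by the ordered list of buy actions ranked above END TURN, since anything ranked below END TURN is never reached:

from math import factorial

# ordered prefixes of the 6 buy actions: sum over k of 6! / (6-k)!
distinct = sum(factorial(6) // factorial(6 - k) for k in range(7))
print(distinct)  # 1957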
Instead, I implemented a basic genetic algorithm (GA). Genetic algorithms model Darwinian "survival of the fittest" in a simplified way. You start with a population of random strategies, each defined by a set of numerical parameters. You pit them against each other repeatedly in random matches. You keep the most successful strategies, and replenish the population by "mating" those strategies (crossover).
The great thing about this is that although I've never implemented a genetic algorithm before, the idea is so simple that I didn't need to look anything up -- I just went off what I remembered. I gave each strategy 7 random numbers ("weights", uniform from 0 to 1), representing a preference for each action. Sorting the actions based on those weights gives a rank ordering of the actions. I arbitrarily chose to keep the top 20% of strategies after each tournament, and threw in a 5% chance of random "mutation" for each parameter.
import random

def evolve(strategies):
    """Keep the top 20% of (already fitness-sorted) strategies and refill
    the population with their offspring."""
    popsize = len(strategies)
    parents = strategies[:popsize//5]      # top 20% survive unchanged
    newstrat = list(parents)
    while len(newstrat) < popsize:
        p1 = random.choice(parents)
        p2 = random.choice(parents)
        w = cross(p1.weights, p2.weights)
        mutate(w, 0.05)                    # 5% chance of mutation per parameter
        newstrat.append(p1.__class__(w))
    return newstrat

def cross(w1, w2):
    # Each parameter is inherited independently: it comes from parent 1 with
    # probability `threshold`, so offspring can lean heavily toward either parent.
    threshold = random.random()
    return [e1 if random.random() < threshold else e2 for e1, e2 in zip(w1, w2)]

def mutate(w, rate):
    # With probability `rate`, replace a parameter with a fresh random value.
    for ii in range(len(w)):
        if random.random() < rate:
            w[ii] = random.random()
The crossover strategy is not an accurate reflection of biological crossover, either within a single chromosome or with many independent chromosomes. However, those more realistic schemes either cause groups of parameters to be inherited together frequently (which I didn't want), or cause offspring to tend toward a 50:50 mix of their parents (which I also didn't want). In this version, each parameter is inherited independently of the others, but it's relatively common to get (say) offspring that are 95% parent 1 and 5% parent 2, allowing for smaller, incremental changes.
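For completeness, here is a minimal sketch of how a strategy's 7 weights translate into a ranked buy preference. The class and method names here are my own, not necessarily what the project code uses.

ACTIONS = ["Province", "Gold", "Silver", "Duchy", "Estate", "Copper", "END"]

class FixedRankStrategy:
    def __init__(self, weights=None):
        # one weight per action, uniform on [0, 1)
        self.weights = weights or [random.random() for _ in ACTIONS]

    def ranked_actions(self):
        # sort actions by weight, highest (most preferred) first
        return [act for _, act in sorted(zip(self.weights, ACTIONS), reverse=True)]

    def choose_buy(self, affordable):
        # take the highest-ranked action that is currently legal; END is always legal
        for action in self.ranked_actions():
            if action == "END" or action in affordable:
                return action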
Starting with a population size of about 400 strategies, I ran 100 rounds of alternating evolution and evaluation. To evaluate a population, I let each strategy play 100 games, with randomly selected opponents from the pool. Empirically, I found 50 games was also sufficient to reliably evaluate the strength of a strategy, but 20 was not -- with only 20 games, the algorithm would not converge to a best strategy.
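The outer loop simply alternates evaluation and evolution. Roughly, and with play_game standing in as a placeholder for the actual game simulation (returning 1 for a win, 0.5 for a tie, and 0 for a loss), it looks like this:

def train(popsize=400, cycles=100, games_per_strategy=100):
    strategies = [FixedRankStrategy() for _ in range(popsize)]
    for cycle in range(cycles):
        score = {id(s): 0.0 for s in strategies}
        for s in strategies:
            for _ in range(games_per_strategy):
                opponent = random.choice(strategies)    # random opponent from the pool
                score[id(s)] += play_game(s, opponent)  # placeholder: 1 / 0.5 / 0
        # sort best-first so evolve() can keep the top 20% and breed replacements
        strategies.sort(key=lambda s: score[id(s)], reverse=True)
        strategies = evolve(strategies)
    return strategies[0]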
In this case, the process converged within just a few rounds. Each round took a few seconds with PyPy, or about 5 times longer with CPython. They got faster over time, as the population became more efficient and games finished more quickly. The whole 100-cycle training process took just 6 minutes on my elderly Mac laptop.
Here are the stats for the leading strategy at the end of the last tournament. The typical (modal) length of a game is 18 turns per player, followed by 17 or 16 -- this is a useful baseline number. The longest game took 31 turns, probably a combination of bad luck and a badly mutated opponent. (I cut games off after 50 turns so bad strategies can't get stuck in an infinite loop.) This strategy won 66 out of 100 games (with ties counting as ½), and accumulated a total of 3153 victory points (average 31.5 per game).
In the output below, the middle lines show, for each turn of the game, how often each action was taken, with actions listed from most preferred to least. The bottom line shows totals over all turns of all 100 games. The strategy has learned the Province - Gold - Silver order, in keeping with "Big Money". It averages 4.7 of the 8 possible Provinces, 6 Golds, and 8 Silvers per game. It also buys an Estate in about half the games -- I suspect this one extra point helps it win games that would otherwise be ties!
round 99 wins 66.0 fitness 3153 game len 18,17,16 2 players 2.98 sec
buys:
1: 88 Silver 12 Estate
2: 92 Silver 8 Estate
3: 14 Gold 79 Silver 7 Estate
4: 17 Gold 81 Silver 2 Estate
5: 6 Province 38 Gold 54 Silver 2 Estate
6: 5 Province 39 Gold 55 Silver 1 Estate
7: 12 Province 34 Gold 51 Silver 3 Estate
8: 24 Province 44 Gold 31 Silver 1 Estate
9: 24 Province 40 Gold 34 Silver 2 Estate
10: 38 Province 38 Gold 22 Silver 2 Estate
11: 25 Province 47 Gold 28 Silver
12: 36 Province 41 Gold 23 Silver
13: 39 Province 33 Gold 28 Silver
14: 46 Province 34 Gold 19 Silver 1 Estate
15: 28 Province 43 Gold 25 Silver 1 Estate
16: 37 Province 32 Gold 20 Silver 1 Copper
17: 39 Province 24 Gold 13 Silver 1 Estate
18: 31 Province 17 Gold 12 Silver
19: 17 Province 15 Gold 7 Silver 1 Estate
20: 15 Province 9 Gold 4 Silver
21: 7 Province 6 Gold 6 Silver 1 Estate 1 Copper
22: 4 Province 9 Gold 3 Silver 1 Copper
23: 10 Province 3 Gold 3 Silver
24: 7 Province 4 Gold 4 Silver
25: 5 Province 3 Gold 4 Silver
26: 3 Province 5 Gold 2 Silver
27: 2 Province 6 Gold 1 Silver
28: 4 Province 1 Gold 2 Silver
29: 2 Province 1 Gold 1 Silver
30: 1 Province 1 Silver
31: 1 Province
Sum 468 Province 597 Gold 793 Silver 45 Estate 3 Copper
Linear-rank genetic algorithm¶
One problem with this strategy is that it can't distinguish the early game from the late game. A human player will typically avoid buying Victory Points too early, because they clutter the deck; better to stock up on Gold and Silver early, while Duchies and even Estates are attractive near the end.
One way to accomplish this is to add a second parameter (weight) for each card, which represents how much its preference should change each turn.
In other words, the preference goes from being a fixed value (b) to a linear function of the turn number t (a*t + b). While b is drawn from a uniform random distribution on [0, 1), I made a come from a normal distribution with mean 0 and standard deviation 0.05. This means the value of a card could either increase or decrease as the game progressed, and a slope of 0.05 over 20 turns is enough to change a preference of 0 into 1 or vice versa.
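Here is a sketch of what changes, reusing the ACTIONS list from the fixed-rank sketch above. The names and the particular layout of the 14 weights are my guesses, not necessarily the project's:

class LinearRankStrategy:
    def __init__(self, weights=None):
        if weights is None:
            # first 7 entries: intercepts b ~ Uniform[0, 1); last 7: slopes a ~ Normal(0, 0.05)
            weights = ([random.random() for _ in ACTIONS] +
                       [random.gauss(0, 0.05) for _ in ACTIONS])
        self.weights = weights

    def ranked_actions(self, turn):
        # each action's preference is a*turn + b; higher is more preferred
        n = len(ACTIONS)
        prefs = [self.weights[i] + self.weights[n + i] * turn for i in range(n)]
        return [act for _, act in sorted(zip(prefs, ACTIONS), reverse=True)]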
A few additional lines of code were all it took, and here are the results. This strategy wins a little more often, a little faster, and scores a few more points on average. The fixed-rank strategy bought its first Provinces in round 5, but this one waits until round 7. Starting in round 11, it prioritizes Duchies over Silver, and by round 13 prioritizes them over Gold. In round 13 it also starts buying Estates if it gets a bad draw, but unlike the fixed-rank strategy it prefers Copper over Estates in the early game.
round 99 wins 73.0 fitness 4174 game len 16,18,20 2 players 3.93 sec
buys:
1: 92 Silver 8 Copper
2: 93 Silver 7 Copper
3: 25 Gold 75 Silver
4: 16 Gold 81 Silver 3 Copper
5: 42 Gold 57 Silver 1 Copper
6: 48 Gold 51 Silver 1 Copper
7: 13 Province 36 Gold 51 Silver
8: 35 Province 35 Gold 30 Silver
9: 23 Province 49 Gold 28 Silver
10: 28 Province 43 Gold 29 Silver
11: 35 Province 32 Gold 20 Duchy 13 Silver
12: 46 Province 31 Gold 12 Duchy 11 Silver
13: 37 Province 57 Duchy 6 Estate
14: 35 Province 47 Duchy 18 Estate
15: 38 Province 51 Duchy 9 Estate 1 Silver
16: 30 Province 52 Duchy 12 Estate
17: 23 Province 32 Duchy 3 Gold 22 Estate
18: 19 Province 21 Duchy 9 Gold 17 Estate 2 Silver
19: 7 Province 15 Duchy 33 Estate 2 Silver
20: 18 Province 5 Duchy 25 Estate 1 Gold 5 Silver
21: 6 Province 4 Duchy 23 Estate 4 Gold 4 Silver 1 Copper
22: 4 Province 1 Duchy 9 Estate 4 Gold 12 Silver 2 Copper
23: 5 Province 2 Duchy 4 Estate 8 Gold 10 Silver 4 Copper
24: 2 Province 2 Duchy 1 Estate 9 Gold 12 Silver 3 Copper
25: 4 Province 1 Estate 9 Gold 12 Silver 2 Copper
26: 5 Province 1 Estate 6 Gold 11 Silver 1 Copper
27: 3 Province 11 Gold 5 Silver 3 Copper
28: 4 Province 6 Gold 9 Silver
29: 3 Province 4 Gold 8 Silver
30: 3 Province 5 Gold 6 Silver
31: 3 Province 8 Gold 3 Silver
32: 3 Province 5 Gold 5 Silver
33: 7 Province 1 Gold 4 Silver
34: 1 Province 5 Silver 4 Copper
35: 1 Province 4 Gold 4 Silver 1 Copper
36: 4 Province 3 Gold 1 Silver 1 Copper
37: 4 Province 2 Silver
38: 3 Gold 2 Silver
39: 2 Province 1 Gold 2 Silver
40: 2 Province 1 Gold
41: 1 Gold
42: 1 Gold
43: 1 Gold
44: 1 Province
45: 1 Province
Sum 738 Silver (-0.004) 465 Gold (-0.023) 42 Copper (0.000) 181 Estate (-0.059) 455 Province (-0.132) 321 Duchy (-0.105)
Now, there is already a sophisticated GA for Dominion out there, which takes a somewhat different approach than I did. However, my primary purpose here is learning, and establishing a performance baseline against which to test the algorithms I implement later.
Reinforcement learning¶
The genetic algorithm still has a lot of potential, which I want to return to in a future post. But as we continue adding parameters to our strategy, evolutionary search becomes less efficient. As I tried to read up on state-of-the-art deep reinforcement learning approaches, it became clear I needed to understand classical RL first. Fortunately, I found the wonderful book Reinforcement Learning: An Introduction (Sutton & Barto, 2018) for free online. It looks like there's also an online course, and all the examples have been translated into Python.
I found the book surprisingly accessible. Although it's a mathy subject, the authors do a good job of explaining the underlying intuitions and providing straightforward pseudocode. It's well worth a read, but I'll try to recap a few key ideas here.
RL algorithms try to maximize the total, long-term "reward" they achieve in the game. They do this by learning estimates of the expected future reward for each game state, or each (state, action) pair. Apparently, a common reward structure is +1 at the end of the game for winning, 0.5 for a tie, and 0 for losing. Under this scheme, the estimates of future reward are also estimated probabilities of winning (counting a tie as half a win). There's nothing magic about this, but it makes the numbers easier for me to think about.
If you have such estimates, it's straightforward to derive a strategy: for each game state, you choose the action that maximizes your probability of winning, according to your current estimates.
Getting the estimates is not very hard, either. I found that Monte Carlo methods worked very well for this problem. In those methods, you keep a tally for each (state, action) pair of how many wins and losses it has led to, and you update that tally after each game. While playing, 90% of the time you choose the action you currently estimate to be best, and 10% of the time you choose an action at random (to ensure all possibilities are explored).
I also tried the closely related Temporal Difference (TD) methods, which update the estimates during the game, without waiting for a final outcome. For whatever reason, these did not work well for me, and often would not converge at all. I suspect it may be the very long-term nature of the rewards. They might have worked better if I awarded points as Victory cards were acquired, but I was concerned that could incentivize the algorithms to drive the score up over the course of a long game rather than winning quickly and efficiently.
In examples from the book, the same actions were typically available in all states, although in a few cases, certain state-action pairs had large negative rewards (walking off a cliff). To do that in Dominion, one has to make the available money part of the game state, and presumably an illegal choice of action just leads to forfeiting a turn. Instead, I kept the notion of ranking actions, in this case by their expected rewards, so the algorithm just goes down the list until it finds a legal one. Perhaps this departure from the formal framework also plays a role in breaking the TD algorithms, although it doesn't seem to hurt the Monte Carlo ones.
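Putting those pieces together, a minimal sketch of the Monte Carlo agent might look like the following. The class layout and names are my guesses rather than the project's code, and it reuses random and the ACTIONS list from the GA sketches; it combines the win/tie/loss reward, the 90/10 exploration split, and the rank-by-value-then-take-first-legal trick just described.

from collections import defaultdict

EPSILON = 0.1  # fraction of moves chosen at random for exploration

class MonteCarloStrategy:
    def __init__(self):
        self.value = defaultdict(float)   # running average reward per (state, action)
        self.count = defaultdict(int)
        self.episode = []                 # (state, action) pairs visited this game

    def choose_buy(self, state, legal_actions):
        # legal_actions should always include "END", so a choice always exists
        if random.random() < EPSILON:
            action = random.choice(legal_actions)
        else:
            # rank all actions by estimated value, then take the first legal one
            ranked = sorted(ACTIONS, key=lambda a: self.value[(state, a)], reverse=True)
            action = next(a for a in ranked if a in legal_actions)
        self.episode.append((state, action))
        return action

    def end_game(self, reward):
        # reward: 1.0 for a win, 0.5 for a tie, 0.0 for a loss
        for state, action in self.episode:
            self.count[(state, action)] += 1
            n = self.count[(state, action)]
            self.value[(state, action)] += (reward - self.value[(state, action)]) / n
        self.episode = []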
Representing a game "state" also seems to be a bit of an art form. With too many states, you don't get enough samples to learn a good strategy for any of them. With too few states, the AI doesn't make relevant distinctions. I decided on 3 key inputs: the game turn (1 to 20+), the current difference in score versus the leading opponent (-13 to +13), and the number of Provinces remaining (1 to 3+). To reduce the number of states, I capped game turn at 20 (all endgame turns share the same strategy) and capped Provinces remaining at 3 (behavior changes only when 1 or 2 are left).
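As a sketch (the function name is mine), the state is just a small tuple with those caps applied:

def encode_state(turn, my_score, best_opponent_score, provinces_left):
    turn = min(turn, 20)                                       # all endgame turns share a strategy
    diff = max(-13, min(13, my_score - best_opponent_score))   # clamp the score differential
    provinces = min(provinces_left, 3)                         # 3+ Provinces remaining all look the same
    return (turn, diff, provinces)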
With nothing but these states, a +1 reward for winning, and the basic Monte Carlo algorithm, I got this strategy:
round 499 wins 56.5 fitness 3953 game len 21,22,26 2 players 0.28 sec
buys:
1: 89 Silver 3 Estate 6 END 2 Copper
2: 80 Silver 6 Estate 9 END 5 Copper
3: 17 Gold 70 Silver 3 Estate 3 END 7 Copper
4: 14 Gold 71 Silver 10 END 5 Copper
5: 33 Gold 3 Duchy 53 Silver 2 Estate 5 END 4 Copper
6: 1 Province 37 Gold 1 Duchy 49 Silver 2 Estate 5 END 5 Copper
7: 7 Province 31 Gold 1 Duchy 51 Silver 3 Estate 3 END 4 Copper
8: 13 Province 41 Gold 41 Silver 2 Estate 1 END 2 Copper
9: 13 Province 43 Gold 8 Duchy 27 Silver 6 Estate 2 END 1 Copper
10: 28 Province 33 Gold 26 Silver 2 Estate 9 END 2 Copper
11: 25 Province 39 Gold 14 Duchy 16 Silver 1 Estate 2 END 3 Copper
12: 14 Province 51 Gold 10 Duchy 14 Silver 4 Estate 6 END 1 Copper
13: 19 Province 31 Gold 7 Duchy 20 Silver 9 Estate 5 END 9 Copper
14: 21 Province 21 Gold 17 Duchy 21 Silver 9 Estate 8 END 3 Copper
15: 24 Province 26 Gold 19 Duchy 18 Silver 7 Estate 3 END 3 Copper
16: 32 Province 22 Gold 16 Duchy 18 Silver 2 Estate 4 END 6 Copper
17: 23 Province 17 Gold 22 Duchy 14 Silver 7 Estate 9 END 8 Copper
18: 20 Province 17 Gold 31 Duchy 7 Silver 10 Estate 9 END 5 Copper
19: 20 Province 16 Gold 29 Duchy 8 Silver 13 Estate 8 END 3 Copper
20: 32 Province 4 Gold 32 Duchy 6 Silver 15 Estate 5 END 1 Copper
21: 26 Province 1 Gold 30 Duchy 10 Silver 18 Estate 2 END 2 Copper
22: 14 Province 3 Gold 27 Duchy 6 Silver 20 Estate 5 END 2 Copper
23: 14 Province 2 Gold 20 Duchy 6 Silver 10 Estate 5 END 9 Copper
24: 15 Province 4 Gold 10 Duchy 5 Silver 9 Estate 6 END 11 Copper
25: 9 Province 1 Gold 7 Duchy 5 Silver 18 Estate 6 END 9 Copper
26: 8 Province 3 Gold 2 Duchy 10 Silver 12 Estate 5 END 6 Copper
27: 13 Province 1 Gold 2 Duchy 6 Silver 9 Estate 3 END 3 Copper
28: 5 Province 5 Gold 2 Duchy 6 Silver 5 Estate 4 END 4 Copper
29: 4 Province 3 Gold 9 Silver 1 Estate 5 END 3 Copper
30: 2 Province 3 Gold 6 Silver 2 END 5 Copper
31: 4 Province 2 Gold 4 Silver 2 END 5 Copper
32: 3 Province 1 Gold 3 Silver 1 Estate 4 END 3 Copper
33: 6 Province 1 Silver 4 END 1 Copper
34: 2 Gold 1 Silver 2 END 1 Copper
35: 1 Silver 3 END 2 Copper
36: 1 Province 1 Gold 1 Silver 2 END
37: 1 Silver 1 END 1 Copper
38: 1 Silver 1 END 1 Copper
39: 1 Province 1 END 1 Copper
40: 1 Copper
41: 1 Copper
42: 1 Copper
43: 1 Copper
44: 1 Copper
45: 1 Copper
46: 1 Province
47: 1 Silver
48: 1 Province
Sum 419 Province 525 Gold 310 Duchy 782 Silver 209 Estate 175 END 154 Copper
Visited 763 states
Relative to the genetic algorithm, these results appear slightly worse: the win rate is lower (though that could just reflect stronger opponents) and the games run longer (suggesting the strategies are less efficient). But overall, the learned strategy is very similar.
In this formulation, there's no single preference order on each turn, because there are a variety of states. Instead, I list actions in order of decreasing buy cost. Also, keep in mind that about 10% of actions on any turn were randomly selected instead of optimal, which could account for the games running about 10% longer. Again, there's a strong uptick in Duchies starting in round 11, and Estates in round 13.
The algorithm converges even faster than the genetic algorithm, running 500 cycles in just 3 minutes. This is in part due to a much smaller population of just 4 strategies, instead of 400 for the GA. Although there are 20*27*3 = 1620 theoretically possible states, only 763 were ever visited, and perhaps only half of those were visited with any frequency. This is not terribly surprising: it's not possible to have a large score differential or few Provinces remaining during the first several turns. However, it illustrates a nice property of these algorithms: you only pay for the states that are actually relevant during the game.
Showdown: GA strategy versus RL strategy¶
Before moving on to a deep RL strategy, I wanted to see how these two strategies fare against each other. The reinforcement learning strategy has more flexibility, so I was disappointed to see that it lost rather badly to the linear-rank genetic algorithm: RL won only 15% of games, and lost 85%.
I speculated that perhaps the RL algorithm hadn't fully converged yet, after only 50,000 games played. So I trained 10 times longer. After playing 500,000 games over about 20 minutes, the RL algorithm was nearly a match for the genetic algorithm, winning 49% of games.
After training for 5 million games, the RL algorithm comes out ahead, winning 53% of games. This is encouraging, but it required 3 hours of training time, which does limit the number of iterations I can test. It might do even better with 50 million games, but I'm not willing to wait and find out!
One advantage of the RL strategy is that it can learn not to commit "suicide" -- ending the game when it's behind, converting a likely loss into a certain loss. By contrast, the genetic algorithm just races to acquire points as quickly as possible, without regard for the game state. As a result, the GA is perfectly willing to buy the last Province and end the game, even when it's behind by more than 6 points and guaranteed to lose. In the tournament below, the GA lost about 13% of its games this way, while the reinforcement learning strategy lost only about 2% of games by suicide. Those few losses could have come from rare states that hadn't been visited often enough to learn the optimal behavior, or from ties: when points are tied, the player who has taken fewer turns wins, and our RL algorithm doesn't have enough information in its state to know who would win a tie.
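To make the "suicide" idea concrete, here is the kind of check the RL strategy is effectively learning from its state values. This is purely illustrative; neither strategy contains such a rule explicitly, and the state tuple follows the encode_state sketch above.

def would_be_suicide(state, action):
    turn, score_diff, provinces_left = state
    # buying the last Province ends the game; a Province is only worth 6 points,
    # so doing it while more than 6 points behind converts a possible loss into a certain one
    return action == "Province" and provinces_left == 1 and score_diff < -6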
The table below depicts a 1000-game tournament between our two strategies:
MonteCarloStrategy wins 533.0 suicides 21 fitness 37392 game len 17,18,19 (13 - 40)
buys:
1: 917 Silver 83 Copper
2: 921 Silver 79 END
3: 176 Gold 804 Silver 20 Copper
4: 201 Gold 775 Silver 24 END
5: 363 Gold 621 Silver 16 Copper
6: 546 Gold 449 Silver 5 END
7: 557 Gold 435 Silver 8 Copper
8: 245 Province 468 Gold 283 Silver 2 Copper 2 END
9: 269 Province 442 Gold 282 Silver 1 Copper 6 END
10: 390 Province 393 Gold 51 Duchy 162 Silver 4 END
11: 387 Province 361 Gold 115 Duchy 131 Silver 5 Copper 1 END
12: 396 Province 365 Gold 131 Duchy 99 Silver 7 Estate 1 Copper 1 END
13: 370 Province 17 Gold 471 Duchy 73 Silver 64 Estate 2 Copper 3 END
14: 374 Province 2 Gold 491 Duchy 31 Silver 88 Estate 3 Copper 10 END
15: 403 Province 18 Gold 447 Duchy 40 Silver 71 Estate 4 Copper 4 END
16: 327 Province 52 Gold 432 Duchy 50 Silver 78 Estate 6 Copper
17: 213 Province 25 Gold 411 Duchy 12 Silver 155 Estate 17 Copper 9 END
18: 150 Province 39 Gold 211 Duchy 13 Silver 264 Estate 18 Copper 5 END
19: 135 Province 48 Gold 91 Duchy 29 Silver 228 Estate 29 Copper 13 END
20: 102 Province 27 Gold 26 Duchy 46 Silver 208 Estate 36 Copper 17 END
21: 77 Province 27 Gold 6 Duchy 23 Silver 187 Estate 31 Copper 15 END
22: 39 Province 22 Gold 25 Silver 139 Estate 36 Copper 20 END
23: 35 Province 15 Gold 29 Silver 79 Estate 46 Copper 26 END
24: 28 Province 16 Gold 30 Silver 36 Estate 46 Copper 31 END
25: 23 Province 16 Gold 22 Silver 14 Estate 38 Copper 37 END
26: 20 Province 9 Gold 19 Silver 8 Estate 36 Copper 29 END
27: 10 Province 10 Gold 14 Silver 3 Estate 28 Copper 21 END
28: 11 Province 7 Gold 11 Silver 21 Copper 21 END
29: 6 Province 10 Gold 7 Silver 13 Copper 12 END
30: 2 Province 5 Gold 9 Silver 9 Copper 9 END
31: 2 Province 3 Gold 2 Silver 7 Copper 10 END
32: 1 Province 4 Gold 1 Silver 5 Copper 9 END
33: 2 Province 2 Gold 1 Silver 5 Copper 7 END
34: 3 Gold 1 Silver 3 Copper 5 END
35: 2 Province 1 Gold 1 Copper 5 END
36: 1 Gold 1 Copper 3 END
37: 1 Gold 1 Copper 1 END
38: 1 Gold 1 Copper
39: 1 Copper 1 END
40: 1 Gold
Sum 4019 Province 4254 Gold 2883 Duchy 6367 Silver 1629 Estate 580 Copper 445 END
Visited 907 states
LinearRankStrategy wins 467.0 suicides 131 fitness 38952 game len 18,17,19 (14 - 41)
buys:
1: 923 Silver 77 Copper
2: 919 Silver 81 Copper
3: 207 Gold 769 Silver 24 Copper
4: 188 Gold 786 Silver 26 Copper
5: 392 Gold 584 Silver 24 Copper
6: 526 Gold 462 Silver 12 Copper
7: 129 Province 443 Gold 423 Silver 5 Copper
8: 274 Province 452 Gold 172 Duchy 95 Silver 7 Copper
9: 272 Province 419 Gold 198 Duchy 105 Silver 6 Copper
10: 320 Province 370 Gold 151 Duchy 143 Silver 14 Estate 2 Copper
11: 346 Province 390 Gold 134 Duchy 122 Silver 6 Estate 2 Copper
12: 352 Province 372 Gold 131 Duchy 135 Silver 9 Estate 1 Copper
13: 357 Province 482 Duchy 145 Silver 13 Estate 3 Copper
14: 340 Province 507 Duchy 151 Estate 2 Copper
15: 349 Province 477 Duchy 4 Gold 157 Estate 4 Copper
16: 322 Province 454 Duchy 5 Gold 153 Estate 3 Copper
17: 221 Province 352 Duchy 28 Gold 221 Estate 10 Copper
18: 153 Province 236 Duchy 65 Gold 236 Estate 15 Copper
19: 121 Province 96 Duchy 116 Gold 233 Estate 12 Copper
20: 98 Province 24 Duchy 140 Gold 189 Estate 12 Copper
21: 73 Province 2 Duchy 108 Gold 171 Estate 3 Silver 12 Copper
22: 58 Province 74 Gold 126 Estate 19 Silver 7 Copper
23: 40 Province 70 Gold 72 Estate 38 Silver 19 Copper
24: 25 Province 62 Gold 36 Estate 45 Silver 16 Copper
25: 27 Province 41 Gold 19 Estate 48 Silver 18 Copper
26: 23 Province 31 Gold 9 Estate 41 Silver 13 Copper
27: 17 Province 30 Gold 2 Estate 35 Silver 9 Copper
28: 19 Province 15 Gold 1 Estate 28 Silver 8 Copper
29: 10 Province 21 Gold 17 Silver 4 Copper
30: 14 Province 10 Gold 16 Silver
31: 4 Province 9 Gold 7 Silver 4 Copper
32: 3 Province 6 Gold 9 Silver 3 Copper
33: 2 Province 4 Gold 8 Silver 2 Copper
34: 2 Province 3 Gold 5 Silver 3 Copper
35: 2 Province 6 Gold 2 Silver
36: 3 Province 1 Gold 3 Silver 1 END
37: 3 Province 1 Gold 1 Silver
38: 2 Silver
39: 1 Province 1 Gold
40: 1 Silver
41: 1 Province
Sum 5939 Silver (-0.001) 4610 Gold (-0.024) 446 Copper (-0.000) 3416 Duchy (-0.047) 3981 Province (-0.082) 1818 Estate (-0.030) 1 END (-0.024)
Endgame¶
From here, I hope to start implementing reinforcement learning algorithms that use a neural network to estimate state and action values. I've found one example of deep RL for Dominion, which looks like a Stanford class project, but I'm not going to look at that in any detail for a while to avoid biasing my initial attempts.
There are also some other fairly obvious options to explore. For instance, with the existing code it's relatively trivial to see how 3- or 4-player strategy differs from the 2-player game.
It's also pretty straightforward to add in some of the action cards, to arrive at the full game of Dominion. Many of the action cards are simple, in the sense that the only decision is to play them or not. Others require secondary decisions, which means they require additional models or heuristics to implement. In the RL case, I think adding (simple) actions basically requires training a second model. Alternately, under a strategy that's close to Big Money, most hands will have 0 or 1 action cards in them, so one could just play actions at random for a simple baseline model. In the GA case, actions are particularly easy to handle, because you can use the same evolutionary mechanism to prioritize which actions to play.
In fact, it's already in the code, and I've had fun playing with it. But this write-up is too long already, so that will wait for a future post!