Training an AI for the card game Dominion¶
I'm fascinated by stories in the news of how the world's best AIs for chess and Go were trained by playing against themselves. It seems like a wonderful tool for learning more about the strategy of a game, as well as a useful tool for designing new games.
On a whim, I decided a fun quarantine project would be training a deep neural network to play the card game Dominion. Unfortunately, I started off knowing nothing about how to build AIs, reinforcement learning (RL), or deep neural networks. It turns out I didn't even know much about Dominion strategy.
Despite that, I've made some progress toward the goal!
- Part 1 (this) covers (simplified) game rules, genetic algorithms, and classical reinforcement learning.
- Part 2 covers neural networks and deep reinforcement learning approaches.
- Part 3 extends the AI to the full game of Dominion, and all the secondary decisions that entails.
This first part describes how I've built and trained genetic algorithms and conventional reinforcement learning algorithms on a stripped down version of Dominion. I'll start with an overview of Dominion and its strategy, then move on to the algorithms.
If you want to follow along, all the Python code is available on Github.
Overview of Dominion¶
Dominion is a deck-building card game for 2, 3, or 4 players. Each player starts with a deck of 10 cards. Players take turns drawing a hand of 5 cards, which they use to "buy" new cards; purchases go to the player's discard pile along with the played hand. Each time a player's deck runs out, the discard pile is reshuffled to form a new deck, thus incorporating the newly bought cards.
Most Dominion games feature 16 types of cards: 6 basic cards that appear in every game, and 10 other cards selected (often randomly) from a pool of hundreds that have been designed over the years. You can play for free online against other people or a pretty good AI.
This post focuses on the 6 basic cards that form the core game mechanic. Any winning strategy in a full game of Dominion will have to beat the optimal strategy in the stripped-down game, because those 6 cards are always present.
The six core cards form two groups. The first group is Copper, Silver, and Gold. These cost $0, $3, and $6 to buy, and provide $1, $2, and $3 of buying power respectively when they appear in your hand. Collecting these cards increases the average buying power of your hands, but does not directly help you win the game.
The second group is Estate, Duchy, and Province. These cost $2, $5, and $8 respectively, and give you 1, 3, or 6 Victory Points toward winning the game. When the limited supply of Provinces is bought up, the player with the most total points wins. Collecting these cards moves you toward victory, but otherwise they don't "do" anything as part of a hand.
This is the basic tension in Dominion: you must collect Victory Points to win, but doing so dilutes your deck with "useless" cards that make your future hands less able to afford those Victory Points!
The full game also includes a random selection of 10 Action cards. These have varying costs and varying effects on the game. Many of them also introduce secondary decisions that make AI development more difficult, so I'll ignore them for now.
The "Big Money" Strategy (or, Dominion is Boring)¶
The basic strategy that arises from these rules is pretty simple. First, buy a Province ($8) if you can, because they're worth a lot of Victory Points and (usually) control the end of the game. If you can't afford a Province, buy a Gold ($6). If you can't afford a Gold, buy a Silver ($3). If you can't afford a Silver, pass and wait for your next turn: additional Coppers, Estates, and even Duchies typically have a negative impact on your chance of winning.
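Expressed as code, the rule is just a threshold check on the hand's buying power. This is only a sketch of the rule as described above; the function name is mine, not something from the project code.

def big_money_buy(coins):
    # coins = total buying power of the current 5-card hand
    if coins >= 8:
        return "Province"
    if coins >= 6:
        return "Gold"
    if coins >= 3:
        return "Silver"
    return None  # pass: more Coppers, Estates, or Duchies usually hurt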
The Dominion community calls this strategy "Big Money". Even in the full game of Dominion, this strategy will typically beat novice players, who get distracted by all the interesting game mechanics of the action cards.
As a human, this is a fairly unsatisfying strategy to play, which makes Dominion feel boring. Fortunately, judicious application of a few actions can strengthen Big Money significantly, which we'll cover in a future post.
However, in the reduced game "Big Money" is the optimal strategy, regardless of what your opponent(s) do. This is a form of boring that is harder to fix, and anecdotally affects most full games of Dominion too: there is typically one strategy that dominates all others. At least for our purposes, though, it means there is a "right answer" that the algorithms can converge to.
Fixed-rank genetic algorithm¶
In our simplified game, there are at most 7 moves to choose from: 6 cards to buy, or not buying anything. Depending on how much money is in our hand, some of those moves may not be available to us. Thus, one useful way to formulate a strategy is as a ranked preference for the 7 actions. On any given turn, we choose the highest-ranked move that is available to us.
There are only 7! = 5040 possible ranking orders for those 7 actions. (Actually, there are even fewer, since the order after the END TURN action doesn't matter. I think it's 1957 possibilities.) I could have thrown all possible strategies into a big, random tournament, and whittled them down until I found the one that wins most often. But it's hard to scale that up to more complex strategies, or more complex versions of the game.
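That count can be double-checked by noting that a strategy is determined entirely by the ordered list of buy actions ranked above END TURN, since anything ranked below END TURN is never reached:

from math import factorial

# ordered prefixes of the 6 buy actions: sum over k of 6! / (6-k)!
distinct = sum(factorial(6) // factorial(6 - k) for k in range(7))
print(distinct)  # 1957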
Instead, I implemented a basic genetic algorithm (GA). Genetic algorithms model Darwinian "survival of the fittest" in a simplified way. You start with a population of random strategies, each defined by a set of numerical parameters. You pit them against each other repeatedly in random matches. You keep the most successful strategies, and replenish the population by "mating" those strategies (crossover).
The great thing about this is that although I've never implemented a genetic algorithm before, the idea is so simple that I didn't need to look anything up -- I just went off what I remembered. I gave each strategy 7 random numbers ("weights", uniform from 0 to 1), representing a preference for each action. Sorting the actions based on those weights gives a rank ordering of the actions. I arbitrarily chose to keep the top 20% of strategies after each tournament, and threw in a 5% chance of random "mutation" for each parameter.
import random

def evolve(strategies):
    """Keep the top 20% of (already fitness-sorted) strategies and refill
    the population with their offspring."""
    popsize = len(strategies)
    parents = strategies[:popsize//5]      # top 20% survive unchanged
    newstrat = list(parents)
    while len(newstrat) < popsize:
        p1 = random.choice(parents)
        p2 = random.choice(parents)
        w = cross(p1.weights, p2.weights)
        mutate(w, 0.05)                    # 5% chance of mutation per parameter
        newstrat.append(p1.__class__(w))
    return newstrat

def cross(w1, w2):
    # Each parameter is inherited independently: it comes from parent 1 with
    # probability `threshold`, so offspring can lean heavily toward either parent.
    threshold = random.random()
    return [e1 if random.random() < threshold else e2 for e1, e2 in zip(w1, w2)]

def mutate(w, rate):
    # With probability `rate`, replace a parameter with a fresh random value.
    for ii in range(len(w)):
        if random.random() < rate:
            w[ii] = random.random()
The crossover strategy is not an accurate reflection of biological crossover, either within a single chromosome or with many independent chromosomes. However, those more realistic schemes either cause groups of parameters to be inherited together frequently (which I didn't want), or cause offspring to tend toward a 50:50 mix of their parents (which I also didn't want). In this version, each parameter is inherited independently of the others, but it's relatively common to get (say) offspring that are 95% parent 1 and 5% parent 2, allowing for smaller, incremental changes.
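For completeness, here is a minimal sketch of how a strategy's 7 weights translate into a ranked buy preference. The class and method names here are my own, not necessarily what the project code uses.

ACTIONS = ["Province", "Gold", "Silver", "Duchy", "Estate", "Copper", "END"]

class FixedRankStrategy:
    def __init__(self, weights=None):
        # one weight per action, uniform on [0, 1)
        self.weights = weights or [random.random() for _ in ACTIONS]

    def ranked_actions(self):
        # sort actions by weight, highest (most preferred) first
        return [act for _, act in sorted(zip(self.weights, ACTIONS), reverse=True)]

    def choose_buy(self, affordable):
        # take the highest-ranked action that is currently legal; END is always legal
        for action in self.ranked_actions():
            if action == "END" or action in affordable:
                return action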
Starting with a population size of about 400 strategies, I ran 100 rounds of alternating evolution and evaluation. To evaluate a population, I let each strategy play 100 games, with randomly selected opponents from the pool. Empirically, I found 50 games was also sufficient to reliably evaluate the strength of a strategy, but 20 was not -- with only 20 games, the algorithm would not converge to a best strategy.
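The outer loop simply alternates evaluation and evolution. Roughly, and with play_game standing in as a placeholder for the actual game simulation (returning 1 for a win, 0.5 for a tie, and 0 for a loss), it looks like this:

def train(popsize=400, cycles=100, games_per_strategy=100):
    strategies = [FixedRankStrategy() for _ in range(popsize)]
    for cycle in range(cycles):
        score = {id(s): 0.0 for s in strategies}
        for s in strategies:
            for _ in range(games_per_strategy):
                opponent = random.choice(strategies)    # random opponent from the pool
                score[id(s)] += play_game(s, opponent)  # placeholder: 1 / 0.5 / 0
        # sort best-first so evolve() can keep the top 20% and breed replacements
        strategies.sort(key=lambda s: score[id(s)], reverse=True)
        strategies = evolve(strategies)
    return strategies[0]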
In this case, the process converged within just a few rounds. Each round took a few seconds with PyPy, or about 5 times longer with CPython. They got faster over time, as the population became more efficient and games finished more quickly. The whole 100-cycle training process took just 6 minutes on my elderly Mac laptop.
Here are the stats for the leading strategy at the end of the last tournament. The typical (modal) length of a game is 18 turns per player, followed by 17 or 16 -- this is a useful baseline number. The longest game took 31 turns, probably a combination of bad luck and a badly mutated opponent. (I cut games off after 50 turns so bad strategies can't get stuck in an infinite loop.) This strategy won 66 out of 100 games (with ties counting as ½), and accumulated a total of 3153 victory points (average 31.5 per game).
In the output below, the middle lines show, for each turn of the game, how often each action was taken, with actions listed from most preferred to least. The bottom line shows totals over all turns of all 100 games. The strategy has learned the Province - Gold - Silver order, in keeping with "Big Money". It averages 4.7 of the 8 possible Provinces, 6 Golds, and 8 Silvers per game. It also buys an Estate in about half the games -- I suspect this one extra point helps it win games that would otherwise be ties!
round 99 wins 66.0 fitness 3153 game len 18,17,16 2 players 2.98 sec
buys:
1: 88 Silver 12 Estate
2: 92 Silver 8 Estate
3: 14 Gold 79 Silver 7 Estate
4: 17 Gold 81 Silver 2 Estate
5: 6 Province 38 Gold 54 Silver 2 Estate
6: 5 Province 39 Gold 55 Silver 1 Estate
7: 12 Province 34 Gold 51 Silver 3 Estate
8: 24 Province 44 Gold 31 Silver 1 Estate
9: 24 Province 40 Gold 34 Silver 2 Estate
10: 38 Province 38 Gold 22 Silver 2 Estate
11: 25 Province 47 Gold 28 Silver
12: 36 Province 41 Gold 23 Silver
13: 39 Province 33 Gold 28 Silver
14: 46 Province 34 Gold 19 Silver 1 Estate
15: 28 Province 43 Gold 25 Silver 1 Estate
16: 37 Province 32 Gold 20 Silver 1 Copper
17: 39 Province 24 Gold 13 Silver 1 Estate
18: 31 Province 17 Gold 12 Silver
19: 17 Province 15 Gold 7 Silver 1 Estate
20: 15 Province 9 Gold 4 Silver
21: 7 Province 6 Gold 6 Silver 1 Estate 1 Copper
22: 4 Province 9 Gold 3 Silver 1 Copper
23: 10 Province 3 Gold 3 Silver
24: 7 Province 4 Gold 4 Silver
25: 5 Province 3 Gold 4 Silver
26: 3 Province 5 Gold 2 Silver
27: 2 Province 6 Gold 1 Silver
28: 4 Province 1 Gold 2 Silver
29: 2 Province 1 Gold 1 Silver
30: 1 Province 1 Silver
31: 1 Province
Sum 468 Province 597 Gold 793 Silver 45 Estate 3 Copper
Linear-rank genetic algorithm¶
One problem with this strategy is that it can't distinguish the early game from the late game. A human player will typically avoid buying Victory Points too early, because they clutter the deck; better to stock up on Gold and Silver early, while Duchies and even Estates are attractive near the end.
One way to accomplish this is to add a second parameter (weight) for each card, which represents how much its preference should change each turn.
In other words, the preference goes from being a fixed value (b) to a linear function of the turn number t (a*t + b). While b is drawn from a uniform random distribution on [0, 1), I made a come from a normal distribution with mean 0 and standard deviation 0.05. This means the value of a card could either increase or decrease as the game progressed, and a slope of 0.05 over 20 turns is enough to change a preference of 0 into 1 or vice versa.
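Here is a sketch of what changes, reusing the ACTIONS list from the fixed-rank sketch above. The names and the particular layout of the 14 weights are my guesses, not necessarily the project's:

class LinearRankStrategy:
    def __init__(self, weights=None):
        if weights is None:
            # first 7 entries: intercepts b ~ Uniform[0, 1); last 7: slopes a ~ Normal(0, 0.05)
            weights = ([random.random() for _ in ACTIONS] +
                       [random.gauss(0, 0.05) for _ in ACTIONS])
        self.weights = weights

    def ranked_actions(self, turn):
        # each action's preference is a*turn + b; higher is more preferred
        n = len(ACTIONS)
        prefs = [self.weights[i] + self.weights[n + i] * turn for i in range(n)]
        return [act for _, act in sorted(zip(prefs, ACTIONS), reverse=True)]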
A few additional lines of code were all it took, and here are the results. This strategy wins a little more often, a little faster, and scores a few more points on average. The fixed-rank strategy bought its first Provinces in round 5, but this one waits until round 7. Starting in round 11, it prioritizes Duchies over Silver, and by round 13 prioritizes them over Gold. In round 13 it also starts buying Estates if it gets a bad draw, but unlike the fixed-rank strategy it prefers Copper over Estates in the early game.
round 99 wins 73.0 fitness 4174 game len 16,18,20 2 players 3.93 sec
buys:
1: 92 Silver 8 Copper
2: 93 Silver 7 Copper
3: 25 Gold 75 Silver
4: 16 Gold 81 Silver 3 Copper
5: 42 Gold 57 Silver 1 Copper
6: 48 Gold 51 Silver 1 Copper
7: 13 Province 36 Gold 51 Silver
8: 35 Province 35 Gold 30 Silver
9: 23 Province 49 Gold 28 Silver
10: 28 Province 43 Gold 29 Silver
11: 35 Province 32 Gold 20 Duchy 13 Silver
12: 46 Province 31 Gold 12 Duchy 11 Silver
13: 37 Province 57 Duchy 6 Estate
14: 35 Province 47 Duchy 18 Estate
15: 38 Province 51 Duchy 9 Estate 1 Silver
16: 30 Province 52 Duchy 12 Estate
17: 23 Province 32 Duchy 3 Gold 22 Estate
18: 19 Province 21 Duchy 9 Gold 17 Estate 2 Silver
19: 7 Province 15 Duchy 33 Estate 2 Silver
20: 18 Province 5 Duchy 25 Estate 1 Gold 5 Silver
21: 6 Province 4 Duchy 23 Estate 4 Gold 4 Silver 1 Copper
22: 4 Province 1 Duchy 9 Estate 4 Gold 12 Silver 2 Copper
23: 5 Province 2 Duchy 4 Estate 8 Gold 10 Silver 4 Copper
24: 2 Province 2 Duchy 1 Estate 9 Gold 12 Silver 3 Copper
25: 4 Province 1 Estate 9 Gold 12 Silver 2 Copper
26: 5 Province 1 Estate 6 Gold 11 Silver 1 Copper
27: 3 Province 11 Gold 5 Silver 3 Copper
28: 4 Province 6 Gold 9 Silver
29: 3 Province 4 Gold 8 Silver
30: 3 Province 5 Gold 6 Silver
31: 3 Province 8 Gold 3 Silver
32: 3 Province 5 Gold 5 Silver
33: 7 Province 1 Gold 4 Silver
34: 1 Province 5 Silver 4 Copper
35: 1 Province 4 Gold 4 Silver 1 Copper
36: 4 Province 3 Gold 1 Silver 1 Copper
37: 4 Province 2 Silver
38: 3 Gold 2 Silver
39: 2 Province 1 Gold 2 Silver
40: 2 Province 1 Gold
41: 1 Gold
42: 1 Gold
43: 1 Gold
44: 1 Province
45: 1 Province
Sum 738 Silver (-0.004) 465 Gold (-0.023) 42 Copper (0.000) 181 Estate (-0.059) 455 Province (-0.132) 321 Duchy (-0.105)
Now, there is already a sophisticated GA for Dominion out there, which takes a somewhat different approach than I did. However, my primary purpose here is learning, and establishing a performance baseline against which to test the algorithms I implement later.
Reinforcement learning¶
The genetic algorithm still has a lot of potential, which I want to return to in a future post. But as we continue adding parameters to our strategy, evolutionary search becomes less efficient. As I tried to read up on state-of-the-art deep reinforcement learning approaches, it became clear I needed to understand classical RL first. Fortunately, I found the wonderful book Reinforcement Learning: An Introduction (Sutton & Barto, 2018) for free online. It looks like there's also an online course, and all the examples have been translated into Python.
I found the book surprisingly accessible. Although it's a mathy subject, the authors do a good job of explaining the underlying intuitions and providing straightforward pseudocode. It's well worth a read, but I'll try to recap a few key ideas here.
RL algorithms try to maximize the total, long-term "reward" they achieve in the game. They do this by learning estimates of the expected future reward for each game state, or each (state, action) pair. Apparently, a common reward structure is +1 at the end of the game for winning, 0.5 for a tie, and 0 for losing. Under this scheme, the estimates of future reward are also estimated probabilities of winning (counting a tie as half a win). There's nothing magic about this, but it makes the numbers easier for me to think about.
If you have such estimates, it's straightforward to derive a strategy: for each game state, you choose the action that maximizes your probability of winning, according to your current estimates.
Getting the estimates is not very hard, either. I found that Monte Carlo methods worked very well for this problem. In those methods, you keep a tally for each (state, action) pair of how many wins and losses it has led to, and you update that tally after each game. While playing, 90% of the time you choose the action you currently estimate to be best, and 10% of the time you choose an action at random (to ensure all possibilities are explored).
I also tried the closely related Temporal Difference (TD) methods, which update the estimates during the game, without waiting for a final outcome. For whatever reason, these did not work well for me, and often would not converge at all. I suspect it may be the very long-term nature of the rewards. They might have worked better if I awarded points as Victory cards were acquired, but I was concerned that could incentivize the algorithms to drive the score up over the course of a long game rather than winning quickly and efficiently.
In examples from the book, the same actions were typically available in all states, although in a few cases, certain state-action pairs had large negative rewards (walking off a cliff). To do that in Dominion, one has to make the available money part of the game state, and presumably an illegal choice of action just leads to forfeiting a turn. Instead, I kept the notion of ranking actions, in this case by their expected rewards, so the algorithm just goes down the list until it finds a legal one. Perhaps this departure from the formal framework also plays a role in breaking the TD algorithms, although it doesn't seem to hurt the Monte Carlo ones.
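Putting those pieces together, a minimal sketch of the Monte Carlo agent might look like the following. The class layout and names are my guesses rather than the project's code, and it reuses random and the ACTIONS list from the GA sketches; it combines the win/tie/loss reward, the 90/10 exploration split, and the rank-by-value-then-take-first-legal trick just described.

from collections import defaultdict

EPSILON = 0.1  # fraction of moves chosen at random for exploration

class MonteCarloStrategy:
    def __init__(self):
        self.value = defaultdict(float)   # running average reward per (state, action)
        self.count = defaultdict(int)
        self.episode = []                 # (state, action) pairs visited this game

    def choose_buy(self, state, legal_actions):
        # legal_actions should always include "END", so a choice always exists
        if random.random() < EPSILON:
            action = random.choice(legal_actions)
        else:
            # rank all actions by estimated value, then take the first legal one
            ranked = sorted(ACTIONS, key=lambda a: self.value[(state, a)], reverse=True)
            action = next(a for a in ranked if a in legal_actions)
        self.episode.append((state, action))
        return action

    def end_game(self, reward):
        # reward: 1.0 for a win, 0.5 for a tie, 0.0 for a loss
        for state, action in self.episode:
            self.count[(state, action)] += 1
            n = self.count[(state, action)]
            self.value[(state, action)] += (reward - self.value[(state, action)]) / n
        self.episode = []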
Representing a game "state" also seems to be a bit of an art form. With too many states, you don't get enough samples to learn a good strategy for any of them. With too few states, the AI doesn't make relevant distinctions. I decided on 3 key inputs: the game turn (1 to 20+), the current difference in score versus the leading opponent (-13 to +13), and the number of Provinces remaining (1 to 3+). To reduce the number of states, I capped game turn at 20 (all endgame turns share the same strategy) and capped Provinces remaining at 3 (behavior changes only when 1 or 2 are left).
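As a sketch (the function name is mine), the state is just a small tuple with those caps applied:

def encode_state(turn, my_score, best_opponent_score, provinces_left):
    turn = min(turn, 20)                                       # all endgame turns share a strategy
    diff = max(-13, min(13, my_score - best_opponent_score))   # clamp the score differential
    provinces = min(provinces_left, 3)                         # 3+ Provinces remaining all look the same
    return (turn, diff, provinces)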
With nothing but these states, a +1 reward for winning, and the basic Monte Carlo algorithm, I got this strategy:
round 499 wins 56.5 fitness 3953 game len 21,22,26 2 players 0.28 sec
buys:
1: 89 Silver 3 Estate 6 END 2 Copper
2: 80 Silver 6 Estate 9 END 5 Copper
3: 17 Gold 70 Silver 3 Estate 3 END 7 Copper
4: 14 Gold 71 Silver 10 END 5 Copper
5: 33 Gold 3 Duchy 53 Silver 2 Estate 5 END 4 Copper
6: 1 Province 37 Gold 1 Duchy 49 Silver 2 Estate 5 END 5 Copper
7: 7 Province 31 Gold 1 Duchy 51 Silver 3 Estate 3 END 4 Copper
8: 13 Province 41 Gold 41 Silver 2 Estate 1 END 2 Copper
9: 13 Province 43 Gold 8 Duchy 27 Silver 6 Estate 2 END 1 Copper
10: 28 Province 33 Gold 26 Silver 2 Estate 9 END 2 Copper
11: 25 Province 39 Gold 14 Duchy 16 Silver 1 Estate 2 END 3 Copper
12: 14 Province 51 Gold 10 Duchy 14 Silver 4 Estate 6 END 1 Copper
13: 19 Province 31 Gold 7 Duchy 20 Silver 9 Estate 5 END 9 Copper
14: 21 Province 21 Gold 17 Duchy 21 Silver 9 Estate 8 END 3 Copper
15: 24 Province 26 Gold 19 Duchy 18 Silver 7 Estate 3 END 3 Copper
16: 32 Province 22 Gold 16 Duchy 18 Silver 2 Estate 4 END 6 Copper
17: 23 Province 17 Gold 22 Duchy 14 Silver 7 Estate 9 END 8 Copper
18: 20 Province 17 Gold 31 Duchy 7 Silver 10 Estate 9 END 5 Copper
19: 20 Province 16 Gold 29 Duchy 8 Silver 13 Estate 8 END 3 Copper
20: 32 Province 4 Gold 32 Duchy 6 Silver 15 Estate 5 END 1 Copper
21: 26 Province 1 Gold 30 Duchy 10 Silver 18 Estate 2 END 2 Copper
22: 14 Province 3 Gold 27 Duchy 6 Silver 20 Estate 5 END 2 Copper
23: 14 Province 2 Gold 20 Duchy 6 Silver 10 Estate 5 END 9 Copper
24: 15 Province 4 Gold 10 Duchy 5 Silver 9 Estate 6 END 11 Copper
25: 9 Province 1 Gold 7 Duchy 5 Silver 18 Estate 6 END 9 Copper
26: 8 Province 3 Gold 2 Duchy 10 Silver 12 Estate 5 END 6 Copper
27: 13 Province 1 Gold 2 Duchy 6 Silver 9 Estate 3 END 3 Copper
28: 5 Province 5 Gold 2 Duchy 6 Silver 5 Estate 4 END 4 Copper
29: 4 Province 3 Gold 9 Silver 1 Estate 5 END 3 Copper
30: 2 Province 3 Gold 6 Silver 2 END 5 Copper
31: 4 Province 2 Gold 4 Silver 2 END 5 Copper
32: 3 Province 1 Gold 3 Silver 1 Estate 4 END 3 Copper
33: 6 Province 1 Silver 4 END 1 Copper
34: 2 Gold 1 Silver 2 END 1 Copper
35: 1 Silver 3 END 2 Copper
36: 1 Province 1 Gold 1 Silver 2 END
37: 1 Silver 1 END 1 Copper
38: 1 Silver 1 END 1 Copper
39: 1 Province 1 END 1 Copper
40: 1 Copper
41: 1 Copper
42: 1 Copper
43: 1 Copper
44: 1 Copper
45: 1 Copper
46: 1 Province
47: 1 Silver
48: 1 Province
Sum 419 Province 525 Gold 310 Duchy 782 Silver 209 Estate 175 END 154 Copper
Visited 763 states
Relative to the genetic algorithm, these results appear slightly worse: the win rate is lower (though that could just reflect stronger opponents) and the games run longer (suggesting the strategies are less efficient). But overall, the learned strategy is very similar.
In this formulation, there's no single preference order on each turn, because there are a variety of states. Instead, I list actions in order of decreasing buy cost. Also, keep in mind that about 10% of actions on any turn were randomly selected instead of optimal, which could account for the games running about 10% longer. Again, there's a strong uptick in Duchies starting in round 11, and Estates in round 13.
The algorithm converges even faster than the genetic algorithm, running 500 cycles in just 3 minutes. This is in part due to a much smaller population of just 4 strategies, instead of 400 for the GA. Although there are 20*27*3 = 1620 theoretically possible states, only 763 were ever visited, and perhaps only half of those were visited with any frequency. This is not terribly surprising: it's not possible to have a large score differential or few Provinces remaining during the first several turns. However, it illustrates a nice property of these algorithms: you only pay for the states that are actually relevant during the game.
Showdown: GA strategy versus RL strategy¶
Before moving on to a deep RL strategy, I wanted to see how these two strategies fare against each other. The reinforcement learning strategy has more flexibility, so I was disappointed to see that it lost rather badly to the linear-rank genetic algorithm: RL won only 15% of games, and lost 85%.
I speculated that perhaps the RL algorithm hadn't fully converged yet, after only 50,000 games played. So I trained 10 times longer. After playing 500,000 games over about 20 minutes, the RL algorithm was nearly a match for the genetic algorithm, winning 49% of games.
After training for 5 million games, the RL algorithm comes out ahead, winning 53% of games. This is encouraging, but it required 3 hours of training time, which does limit the number of iterations I can test. It might do even better with 50 million games, but I'm not willing to wait and find out!
One advantage of the RL strategy is that it can learn not to commit "suicide" -- ending the game when it's behind, converting a likely loss into a certain loss. By contrast, the genetic algorithm just races to acquire points as quickly as possible, without regard for the game state. As a result, the GA is perfectly willing to buy the last Province and end the game, even when it's behind by more than 6 points and guaranteed to lose. In the tournament below, the GA lost about 13% of its games this way, while the reinforcement learning strategy lost only about 2% of games by suicide. Those few losses could have come from rare states that hadn't been visited often enough to learn the optimal behavior, or from ties: when points are tied, the player who has taken fewer turns wins, and our RL algorithm doesn't have enough information in its state to know who would win a tie.
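To make the "suicide" idea concrete, here is the kind of check the RL strategy is effectively learning from its state values. This is purely illustrative; neither strategy contains such a rule explicitly, and the state tuple follows the encode_state sketch above.

def would_be_suicide(state, action):
    turn, score_diff, provinces_left = state
    # buying the last Province ends the game; a Province is only worth 6 points,
    # so doing it while more than 6 points behind converts a possible loss into a certain one
    return action == "Province" and provinces_left == 1 and score_diff < -6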
The table below depicts a 1000-game tournament between our two strategies:
MonteCarloStrategy wins 533.0 suicides 21 fitness 37392 game len 17,18,19 (13 - 40)
buys:
1: 917 Silver 83 Copper
2: 921 Silver 79 END
3: 176 Gold 804 Silver 20 Copper
4: 201 Gold 775 Silver 24 END
5: 363 Gold 621 Silver 16 Copper
6: 546 Gold 449 Silver 5 END
7: 557 Gold 435 Silver 8 Copper
8: 245 Province 468 Gold 283 Silver 2 Copper 2 END
9: 269 Province 442 Gold 282 Silver 1 Copper 6 END
10: 390 Province 393 Gold 51 Duchy 162 Silver 4 END
11: 387 Province 361 Gold 115 Duchy 131 Silver 5 Copper 1 END
12: 396 Province 365 Gold 131 Duchy 99 Silver 7 Estate 1 Copper 1 END
13: 370 Province 17 Gold 471 Duchy 73 Silver 64 Estate 2 Copper 3 END
14: 374 Province 2 Gold 491 Duchy 31 Silver 88 Estate 3 Copper 10 END
15: 403 Province 18 Gold 447 Duchy 40 Silver 71 Estate 4 Copper 4 END
16: 327 Province 52 Gold 432 Duchy 50 Silver 78 Estate 6 Copper
17: 213 Province 25 Gold 411 Duchy 12 Silver 155 Estate 17 Copper 9 END
18: 150 Province 39 Gold 211 Duchy 13 Silver 264 Estate 18 Copper 5 END
19: 135 Province 48 Gold 91 Duchy 29 Silver 228 Estate 29 Copper 13 END
20: 102 Province 27 Gold 26 Duchy 46 Silver 208 Estate 36 Copper 17 END
21: 77 Province 27 Gold 6 Duchy 23 Silver 187 Estate 31 Copper 15 END
22: 39 Province 22 Gold 25 Silver 139 Estate 36 Copper 20 END
23: 35 Province 15 Gold 29 Silver 79 Estate 46 Copper 26 END
24: 28 Province 16 Gold 30 Silver 36 Estate 46 Copper 31 END
25: 23 Province 16 Gold 22 Silver 14 Estate 38 Copper 37 END
26: 20 Province 9 Gold 19 Silver 8 Estate 36 Copper 29 END
27: 10 Province 10 Gold 14 Silver 3 Estate 28 Copper 21 END
28: 11 Province 7 Gold 11 Silver 21 Copper 21 END
29: 6 Province 10 Gold 7 Silver 13 Copper 12 END
30: 2 Province 5 Gold 9 Silver 9 Copper 9 END
31: 2 Province 3 Gold 2 Silver 7 Copper 10 END
32: 1 Province 4 Gold 1 Silver 5 Copper 9 END
33: 2 Province 2 Gold 1 Silver 5 Copper 7 END
34: 3 Gold 1 Silver 3 Copper 5 END
35: 2 Province 1 Gold 1 Copper 5 END
36: 1 Gold 1 Copper 3 END
37: 1 Gold 1 Copper 1 END
38: 1 Gold 1 Copper
39: 1 Copper 1 END
40: 1 Gold
Sum 4019 Province 4254 Gold 2883 Duchy 6367 Silver 1629 Estate 580 Copper 445 END
Visited 907 states
LinearRankStrategy wins 467.0 suicides 131 fitness 38952 game len 18,17,19 (14 - 41)
buys:
1: 923 Silver 77 Copper
2: 919 Silver 81 Copper
3: 207 Gold 769 Silver 24 Copper
4: 188 Gold 786 Silver 26 Copper
5: 392 Gold 584 Silver 24 Copper
6: 526 Gold 462 Silver 12 Copper
7: 129 Province 443 Gold 423 Silver 5 Copper
8: 274 Province 452 Gold 172 Duchy 95 Silver 7 Copper
9: 272 Province 419 Gold 198 Duchy 105 Silver 6 Copper
10: 320 Province 370 Gold 151 Duchy 143 Silver 14 Estate 2 Copper
11: 346 Province 390 Gold 134 Duchy 122 Silver 6 Estate 2 Copper
12: 352 Province 372 Gold 131 Duchy 135 Silver 9 Estate 1 Copper
13: 357 Province 482 Duchy 145 Silver 13 Estate 3 Copper
14: 340 Province 507 Duchy 151 Estate 2 Copper
15: 349 Province 477 Duchy 4 Gold 157 Estate 4 Copper
16: 322 Province 454 Duchy 5 Gold 153 Estate 3 Copper
17: 221 Province 352 Duchy 28 Gold 221 Estate 10 Copper
18: 153 Province 236 Duchy 65 Gold 236 Estate 15 Copper
19: 121 Province 96 Duchy 116 Gold 233 Estate 12 Copper
20: 98 Province 24 Duchy 140 Gold 189 Estate 12 Copper
21: 73 Province 2 Duchy 108 Gold 171 Estate 3 Silver 12 Copper
22: 58 Province 74 Gold 126 Estate 19 Silver 7 Copper
23: 40 Province 70 Gold 72 Estate 38 Silver 19 Copper
24: 25 Province 62 Gold 36 Estate 45 Silver 16 Copper
25: 27 Province 41 Gold 19 Estate 48 Silver 18 Copper
26: 23 Province 31 Gold 9 Estate 41 Silver 13 Copper
27: 17 Province 30 Gold 2 Estate 35 Silver 9 Copper
28: 19 Province 15 Gold 1 Estate 28 Silver 8 Copper
29: 10 Province 21 Gold 17 Silver 4 Copper
30: 14 Province 10 Gold 16 Silver
31: 4 Province 9 Gold 7 Silver 4 Copper
32: 3 Province 6 Gold 9 Silver 3 Copper
33: 2 Province 4 Gold 8 Silver 2 Copper
34: 2 Province 3 Gold 5 Silver 3 Copper
35: 2 Province 6 Gold 2 Silver
36: 3 Province 1 Gold 3 Silver 1 END
37: 3 Province 1 Gold 1 Silver
38: 2 Silver
39: 1 Province 1 Gold
40: 1 Silver
41: 1 Province
Sum 5939 Silver (-0.001) 4610 Gold (-0.024) 446 Copper (-0.000) 3416 Duchy (-0.047) 3981 Province (-0.082) 1818 Estate (-0.030) 1 END (-0.024)
Endgame¶
From here, I hope to start implementing reinforcement learning algorithms that use a neural network to estimate state and action values. I've found one example of deep RL for Dominion, which looks like a Stanford class project, but I'm not going to look at that in any detail for a while to avoid biasing my initial attempts.
There are also some other fairly obvious options to explore. For instance, with the existing code it's relatively trivial to see how 3- or 4-player strategy differs from the 2-player game.
It's also pretty straightforward to add in some of the action cards, to arrive at the full game of Dominion. Many of the action cards are simple, in the sense that the only decision is to play them or not. Others require secondary decisions, which means they require additional models or heuristics to implement. In the RL case, I think adding (simple) actions basically requires training a second model. Alternately, under a strategy that's close to Big Money, most hands will have 0 or 1 action cards in them, so one could just play actions at random for a simple baseline model. In the GA case, actions are particularly easy to handle, because you can use the same evolutionary mechanism to prioritize which actions to play.
In fact, it's already in the code, and I've had fun playing with it. But this write-up is too long already, so that will wait for a future post!