11 July 2024

Prize Card Probability in the Pokémon TCG

by lastlegume


Update: For a more general overview, see the companion video for this post, which glosses over a lot of the details in favor of a more general probability lesson.

When I first saw John Kettler prize both of his Oddish in the semifinals of NAIC in 2017, I remember thinking that Kettler was the unluckiest Pokémon TCG player alive. Only a few months earlier, he had prized three out of his four copies of Rowlet in the finals of a regional, so his luck on stream seemed really terrible.

But how unlucky was he really? Was this just some bad luck, or was it actually somewhat likely?

In other words, what is the probability that a card is prized?

In this post, I'm going to explain my process for calculating these probabilities (with the R script I used to calculate them), describe why this problem is so complex, and provide some tools to play around with.

Prerequisite Knowledge

You will need to understand the basic rules of probability (multiplication rule, addition rule, conditional probability, etc.) and the concept of a combination ($_nC_k$ or $\binom{n}{k}$). It would also help to know what a probability distribution is, but this is not required to understand the article.

Basic Pokémon TCG Rules

While most people reading this already know the rules of the Pokémon TCG, I'll take a moment to lay out all of the necessary rules for this article. In the Pokémon TCG, you play with a 60-card deck, with up to 4 copies of each card except for basic energy cards, which you can have up to 59 copies of. To start the game, you draw hands of 7 cards until your hand has at least 1 basic pokémon, after which you play down as many basics from your hand as you like, with one going to the active position and the rest going onto the bench. Each time a player draws a hand without a basic pokémon, it is known as a mulligan. After each player has drawn a valid starting hand of 7 cards, they set the top 6 cards of their deck to the side. These are known as the prize cards, and, in this article, we are trying to find the probability that some number of copies of a given card will end up in those 6 cards. For each mulligan, the opposing player may draw an extra card, but this occurs after the prize cards are set out and is therefore irrelevant for this problem.

The Simplest Case (1 Copy of the Card)

The simplest possible case is having only one copy of a card (referred to henceforth as the "target card") and seeing whether it is prized. To do this we can decompose the problem into two steps: is the target card in the starting hand, and, if not, is the target in the prizes? [1] If the target card is in the hand, it cannot be in the prize cards, so the probability that the card is prized is $P(\text{Not in hand}) \cdot P(\text{In prizes} \mid \text{Not in hand})$. Using the formula for conditional probability, we can see that this expression is equal to $P(\text{In prizes AND Not in hand})$, or $P(\text{In prizes} \cap \text{Not in hand})$. This detail isn't super important right now with only one copy of the target card, but when we calculate prizing probabilities with more than one copy, this becomes relevant. Now, having come up with this equation, we can solve for each component individually.

P(Not In Hand)

The probability of not being in hand is relatively simple to calculate when there is only one copy in deck. There are a lot of ways to do it, but one way is to imagine drawing the 7 cards in hand one at a time, with each draw depending on the cards already drawn. Drawing the target card as the first card has a probability of $\frac{1}{60}$, so the probability of not drawing it as the first card is $\frac{59}{60}$. Similarly, for the next card, the probability of not drawing the target card (given that the first card was not the target) is $\frac{58}{59}$. To not draw the target card in any of the 7 cards, each individual card must not be the target card, so we multiply all 7 probabilities that the drawn card is not the target, giving us $\frac{59}{60} \cdot \frac{58}{59} \cdot \frac{57}{58} \cdot \frac{56}{57} \cdot \frac{55}{56} \cdot \frac{54}{55} \cdot \frac{53}{54} = \frac{53}{60}$.

Alternatively, we could think about every possible hand. We can use combinations to find the total number of possibilities and the number that have the target card in them, which tells us that there are $\binom{60}{7}$ total combinations of 7-card hands to draw, and there are $\binom{59}{6}$ combinations of hands that have the target card in them. Therefore, the probability of having the target card in hand is $\frac{\binom{59}{6}}{\binom{60}{7}}$, and the probability of not having it in hand is therefore $1 - \frac{\binom{59}{6}}{\binom{60}{7}} = \frac{53}{60}$.

It's important to understand why this works because this idea will be used more later. Though (in my opinion) the first method is more intuitive, this method of using combinations is easily scalable to different cases. The denominator, $\binom{60}{7}$, is by definition the number of possible combinations of 7 cards you can draw from a 60-card deck, so this is the total number of possibilities. Probability can also be expressed as the number of successes divided by the total number of outcomes, so this number (the total number of possible hands) is vital to finding the probability. The numerator is slightly more complex. It is the number of hands that contain the target card. Since order does not matter in this case, each hand with the target card is composed of the target card and 6 other cards; therefore, the number of combinations with the card in hand is $1 \cdot \binom{59}{6}$ because any hand of the target card and 6 cards from the remaining 59 is a hand with the target card. Similarly, the number of hands without the target card is simply $\binom{59}{7}$, which is the number of combinations of 7 cards possible from the remaining 59 cards. Because these are the only two cases, they should add to the total number of possible cases, and they do: $\binom{59}{7} + \binom{59}{6} = \binom{60}{7} = 386206920$. While this might seem needlessly complex for this simple case of having only 1 copy of the card, this process is scalable and can be used for more complex cases than this.
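As a quick sanity check on the counting argument, the numbers above can be reproduced with base R's choose function. This is only an illustrative sketch: the variable names are made up for this example and are not taken from the original script.

```r
# Sequential method: multiply the chance that each of the 7 draws misses the target.
p_sequential <- prod((59:53) / (60:54))

# Counting method: hands without the target divided by all possible hands.
total_hands    <- choose(60, 7)  # every 7-card hand from 60 cards
with_target    <- choose(59, 6)  # the target plus 6 of the other 59 cards
without_target <- choose(59, 7)  # 7 cards taken only from the other 59
p_counting     <- without_target / total_hands

# The two cases cover every hand, and both methods agree on 53/60.
stopifnot(with_target + without_target == total_hands)
stopifnot(isTRUE(all.equal(p_sequential, 53 / 60)))
stopifnot(isTRUE(all.equal(p_counting, 53 / 60)))
```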

P(In Prizes|Not in Hand)

In the case that the target card is not in hand, the probability that it is in the prize cards is also something we need to determine. While we could use the first method of multiplying the individual probabilities of not having the card in the prizes, it's again probably easier to use combinations to find the number of cases where it is prized. The total number of cases is $\binom{53}{6}$ because there are 53 cards in deck after drawing the starting hand and we are choosing 6 cards to be prize cards [2]. The number of combinations where the card is in the prize cards is $1 \cdot \binom{52}{5}$ using similar reasoning to the previous part. Therefore, the probability that a card is prized when the card is not in hand is $\frac{\binom{52}{5}}{\binom{53}{6}} = \frac{6}{53}$.
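The same check works for the prize cards. The sketch below, with hypothetical variable names, computes the value both by counting and by the sequential method mentioned in the paragraph above.

```r
# After a 7-card hand without the target, 53 cards remain and 6 become prizes.
p_counting <- choose(52, 5) / choose(53, 6)  # target plus 5 of the other 52 cards

# Sequential check: 1 minus the chance that all 6 prize cards miss the target.
p_sequential <- 1 - prod((52:47) / (53:48))

stopifnot(isTRUE(all.equal(p_counting, 6 / 53)))
stopifnot(isTRUE(all.equal(p_sequential, 6 / 53)))
```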

The hypergeometric distribution

The hypergeometric distribution is a probability distribution that describes this process we have been doing of finding the probability of drawing x target cards from a deck of size m+n with m target cards and n other cards in a hand of size k [3]. The formula of the hypergeometric distribution is as follows:

$$\frac{\binom{m}{x} \cdot \binom{n}{k-x}}{\binom{m+n}{k}}$$

This is exactly what we have been doing with the combinations in the previous parts. To show this, let's run our previous calculations through this equation.

For the probability of not being in hand, $m + n = 60$, $m = 1$ (m is the number of target cards in deck), $n = 59$ (n is the number of cards in deck that are not the target card), $k = 7$ (k is the number of cards drawn; in this case, the hand size), and $x = 0$ (x is the number of copies of the target card in hand: we want 0 in hand). Plugging these numbers in gives us:

$$\frac{\binom{1}{0} \cdot \binom{59}{7-0}}{\binom{60}{7}} = \frac{1 \cdot \binom{59}{7}}{\binom{60}{7}} = \frac{53}{60}$$

This is the same probability as the one we calculated previously. If you apply this same reasoning to the probability of being prized, you'll find that it is also the same. From this point forward, we are going to only be using hypergeometric distributions to simplify the math and increase readability, but the hypergeometric distribution is simply a different name for the math that we were already doing in the previous section. Similarly, I'll use the notation from R (specifically dhyper(x, m, n, k)) to write any probabilities that are found using a hypergeometric distribution, so the computation from the previous paragraph would be represented as dhyper(0,1,59,7).
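For readers who want to try this in R, here is a short sketch comparing dhyper with the hand-written combination formula for the single-copy case. The variable names are illustrative only.

```r
# dhyper(x, m, n, k): x targets drawn, m targets in deck, n other cards, k cards drawn.
p_not_in_hand <- dhyper(0, 1, 59, 7)  # 0 copies in the 7-card hand
p_prized      <- dhyper(1, 1, 52, 6)  # the copy is among 6 prizes from the 53 cards left

# The same value written out with combinations, following the formula above.
p_manual <- choose(1, 0) * choose(59, 7 - 0) / choose(60, 7)

stopifnot(isTRUE(all.equal(p_not_in_hand, p_manual)))
stopifnot(isTRUE(all.equal(p_not_in_hand, 53 / 60)))
stopifnot(isTRUE(all.equal(p_prized, 6 / 53)))
```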

The Final Probability

To get the actual probability that a card with only a single copy is prized, we simply multiply the two values we just found, giving us

$$\frac{53}{60} \cdot \frac{6}{53} = \frac{1}{10}$$

Therefore, the probability of having the only copy of a card prized is exactly 10%. So, for example, you can expect your ACE SPEC card to be prized in 10% of your games. If you want to confirm this, you can use the simulation below to check, which will shuffle the deck and check how many of the target card end up in the prize cards in each trial, displaying its results in the histogram below. It does this by simply shuffling the deck and checking the number of copies of the target card between indexes 7 and 13 in the deck list since it assumes we never have to redraw our hand.
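A simulation of this kind can be sketched in a few lines of R. This is not the simulation embedded in the post, just a minimal stand-in under the same assumption that the hand is never redrawn; the function name simulate_prizes, the seed, and the trial count are all arbitrary choices for illustration.

```r
set.seed(1)  # arbitrary seed so the run is repeatable

# Shuffle a 60-card deck `trials` times and count target cards among the prizes.
simulate_prizes <- function(copies, trials = 20000) {
  deck <- c(rep(TRUE, copies), rep(FALSE, 60 - copies))  # TRUE marks a target card
  replicate(trials, {
    shuffled <- sample(deck)
    sum(shuffled[8:13])  # positions 1-7 are the hand, 8-13 are the prizes (R is 1-indexed)
  })
}

prized   <- simulate_prizes(copies = 1)
estimate <- mean(prized == 1)  # fraction of trials in which the single copy was prized
print(estimate)
```

With enough trials the estimate should settle near 0.1, up to random noise.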

Scaling this method

From here, we can repeat this method for every possible number of copies of the target card from 1 to 59 and we're done. Simply find the probabilities of 0 to 7 copies of the target card being in hand and then find the probability that the remaining copies end up in the prizes using the hypergeometric distribution. I did this with R code, but it's possible (though tedious) to do by hand as well.
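A rough sketch of what this scaling could look like in R follows. It is not the original script: the function name naive_prize_probs is made up here, and it implements only the method described so far, which ignores mulligans.

```r
# Probability of 0, 1, ... copies being prized, ignoring mulligans.
naive_prize_probs <- function(copies) {
  max_prized <- min(copies, 6)
  probs <- numeric(max_prized + 1)
  for (h in 0:min(copies, 7)) {                  # h = copies in the starting hand
    p_hand <- dhyper(h, copies, 60 - copies, 7)
    left   <- copies - h                         # copies still in the 53-card deck
    j      <- 0:min(left, 6)                     # possible numbers of prized copies
    probs[j + 1] <- probs[j + 1] + p_hand * dhyper(j, left, 53 - left, 6)
  }
  names(probs) <- 0:max_prized
  probs
}

print(naive_prize_probs(1))  # single copy: 0.9 not prized, 0.1 prized
print(naive_prize_probs(4))
```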

Example with Two Copies

Let's walk through the example of having 2 copies of the target card in deck to demonstrate how scaling this method would work. First, there are 3 possible states based on the number of copies of the target card in the starting hand: having 0, 1, or 2 copies of the card in hand. For each of these cases, we check the probabilities of 0, 1, or 2 copies of the target card being prized given that 0, 1, or 2 copies are in hand.

Probabilities for Two Copies

If we add together the probabilities of all possible ways to have 0, 1, or 2 copies prized, then we will arrive at the actual probability of having that number prized.
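One way to write these sums out with the dhyper notation, following the method above (each term is P(h in hand) · P(j in prizes | h in hand), and after a hand containing h copies the remaining 53 cards hold 2 - h copies):

$$
\begin{aligned}
P(0\text{ prized}) &= \mathrm{dhyper}(0,2,58,7)\cdot\mathrm{dhyper}(0,2,51,6) \\
&\quad + \mathrm{dhyper}(1,2,58,7)\cdot\mathrm{dhyper}(0,1,52,6) \\
&\quad + \mathrm{dhyper}(2,2,58,7)\cdot\mathrm{dhyper}(0,0,53,6) \\
P(1\text{ prized}) &= \mathrm{dhyper}(0,2,58,7)\cdot\mathrm{dhyper}(1,2,51,6) \\
&\quad + \mathrm{dhyper}(1,2,58,7)\cdot\mathrm{dhyper}(1,1,52,6) \\
P(2\text{ prized}) &= \mathrm{dhyper}(0,2,58,7)\cdot\mathrm{dhyper}(2,2,51,6)
\end{aligned}
$$

Note that dhyper(0,0,53,6) is simply 1: with both copies in hand, none can be prized.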

Though this does look daunting, keep in mind that we are doing the same thing as before, just with a few more cases. Take some time to read through the equations to see if you understand why this works. If you want some more practice, try to come up with the method to finding the probability of having all 3 copies of a target card prized.

Answer: P(0 in hand) · P(3 in prizes|0 in hand) = dhyper(0,3,57,7) · dhyper(3,3,50,6) = 0.0005844535 = 0.05844535%

Flaws with this Method

So, if you are comfortable with probability or just read the first footnote, you will have already realized that the method presented above has some flaws. Firstly, most of the math is unnecessary because we don't actually need to know what is in the hand. If we never have to redraw the hand (i.e. every possible starting hand is a valid starting hand) and our deck is perfectly shuffled and completely random, then the order of drawing the hand and prize cards doesn't matter. For a better explanation of why this is true, read this article on the topic, which I read after writing all of my code and which showed me that my method was overly complex. So, instead of painstakingly finding the probabilities for each hand arrangement, you can find the probability of drawing the target card in 6 random cards (for the example with 2 copies, the probabilities are dhyper(0,2,58,6), dhyper(1,2,58,6), and dhyper(2,2,58,6) for 0, 1, and 2 prized respectively).
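The equivalence is easy to confirm numerically. The sketch below, with illustrative variable names, computes the two-copy probabilities the long way (summing over every possible number of copies in hand) and the short way (6 random cards from the full deck) and checks that they match.

```r
copies <- 2

# Long way: sum over h copies in hand, then j prized among the remaining 53 cards.
long_way <- sapply(0:copies, function(j) {
  h <- 0:(copies - j)  # hand counts that still leave at least j copies in the deck
  sum(dhyper(h, copies, 60 - copies, 7) *
        dhyper(j, copies - h, 53 - (copies - h), 6))
})

# Short way: treat the 6 prize cards as 6 random cards from the whole deck.
short_way <- dhyper(0:copies, copies, 60 - copies, 6)

stopifnot(isTRUE(all.equal(long_way, short_way)))
print(short_way)
```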

However, astute readers may have caught that the conditions to do this aren't met: specifically, the condition about never having to redraw the starting hand. In the Pokémon TCG, if there are no basics in the starting hand, then the hand is not valid, and the hand must be redrawn. Therefore, everything we've done so far is actually slightly off because we have not been taking into account the probability that a hand is a mulligan (no basics in hand). If we go back to the method we used to calculate these probabilities, we found the total number of hands and the total number of hands that have the desired number of the target card in them. However, what we really needed was the total number of valid hands and number of valid hands with the desired number of the target card, which is different. In the first case, a hand of 7 energy might be included in the total number of hands, but since this has no basics and is therefore not a valid hand, we need to ignore it to find the actual probability that the target is prized.

My first idea was to find the expected number of basics in hand, and, for each possible number of basics in hand, find the probabilities of drawing copies of the target card in 7 - b cards, where b was the number of basics in hand. Then, using these probabilities of target cards being in hand, I used the same method as described above, with hypergeometric distributions, to find the probabilities that the remaining copies end up in the prize cards. This method doesn't work though, and (somewhat embarrassingly) I'm not exactly sure what rule of probability I broke. Though this didn't work, it was an important part of my process of solving this problem, and I felt that it warranted inclusion for that reason.

The Correct Method

Given this major flaw, we need to think about more conditional probability. What we need is the probability of having some number of the target card in the starting hand given that the hand is valid. The reason we need this is that hands that are not valid (like 7 energy cards) can never be the starting hand, so outcomes such as these must be ignored to get the actual probabilities. The condition that the hand is valid ensures that the hand could be the starting hand, which is important since prize cards are only set out after a valid starting hand is drawn. If we can find these, we can do the same method of finding the probabilities of having some number of copies of the target card in the starting hand and multiply them by the probability of having copies of the target card in the prizes given that number in hand. So, how can we find these probabilities?

The probability of getting a valid hand is the probability that there is at least 1 basic in the starting hand. Therefore, we can find this probability with 1-dhyper(0,b,60-b,7), where b is the number of basics. dhyper(0,b,60-b,7) is the probability that there are 0 basics in the starting hand, and subtracting it from 1 gives us the probability that a hand is valid. If we think back to the formula for conditional probability ($P(A \mid B) = \frac{P(A \cap B)}{P(B)}$), we can see that we need the probability of getting a hand that is both valid and has some number of copies of the target card. Alternatively, we could also find the probability of getting a specific valid hand arrangement (1 basic, 0 targets, 6 other; 1 basic, 1 target, 5 other;...; 7 basics, 0 targets, 0 others) and divide that probability by the probability of getting a valid hand to find the conditional probability of having that hand arrangement given that the hand is valid. So, the question now is how do we find the probabilities of getting these specific hand arrangements?
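As a small illustration of the validity probability, here it is as an R one-liner. The function name p_valid and the example basic counts are arbitrary choices for this sketch.

```r
# Probability that a 7-card hand from a 60-card deck contains at least 1 of b basics.
p_valid <- function(b) 1 - dhyper(0, b, 60 - b, 7)

# A deck with a single basic only gets a valid hand when that card is drawn.
stopifnot(isTRUE(all.equal(p_valid(1), 7 / 60)))

# More basics means fewer mulligans.
print(p_valid(c(1, 4, 8, 12)))
```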

The Multivariate Hypergeometric Distribution

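As the name suggests, this is a probability distribution for the probability of having some number of each of several different types of cards in a set of cards. The formula is as follows:

$$\frac{\prod_{i=1}^{m} \binom{n_i}{x_i}}{\binom{N}{k}}$$

Here, m is the number of card types, $n_i$ is the number of cards of type i in the deck, $x_i$ is the number of cards of type i drawn, $N = \sum_i n_i$ is the deck size, and $k = \sum_i x_i$ is the number of cards drawn.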

Though it looks daunting, it is fundamentally the same as what we have been doing this whole time. The capital Pi symbol (∏) simply means a product, and if we expand it into the form for two different types, we will find that it's the exact same as the hypergeometric distribution presented earlier. In this case, we can use it to find the probability of getting each arrangement of cards that has a basic. In our case, there are 3 different types of cards: basics, targets, and other cards. Therefore, we want to find the probability that the starting hand has at least one basic and some number of copies of the target card. Then, we divide this probability by the probability of drawing a valid hand to find the conditional probability of drawing that as your final starting hand. In R, we can do this with a single function (dmvhyper from extraDistr).
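To make the formula concrete, here is a base-R sketch of it, so that no extra package is needed. The helper name mv_hyper and the example deck (8 basics, 1 non-basic target, 51 other cards) are assumptions made for illustration; this is a hand-rolled stand-in rather than the dmvhyper function itself.

```r
# Multivariate hypergeometric probability:
# x[i] cards of type i drawn, n[i] cards of type i in the deck.
mv_hyper <- function(x, n) {
  prod(choose(n, x)) / choose(sum(n), sum(x))
}

# Hypothetical deck: 8 basics, 1 target (not a basic), 51 other cards.
n <- c(basics = 8, target = 1, other = 51)
p_hand <- mv_hyper(c(1, 1, 5), n)  # hand of 1 basic, 1 target, 5 others
print(p_hand)

# With only two card types it collapses to the ordinary hypergeometric distribution.
two_types <- mv_hyper(c(0, 7), c(1, 59))
stopifnot(isTRUE(all.equal(two_types, dhyper(0, 1, 59, 7))))
```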

Edge Cases

If we really want to find the exact answer, it's important to think about edge cases as well. What if (this is unlikely but technically possible) a deck has only one basic, and we want to know whether that card is prized? The probability that it is prized is 0 because it has to be in the starting hand; for this reason, when the target card is a basic, the calculations to find the probability are slightly different. Therefore, we will have to do all of the math twice: once for when the target card is a basic, and once when it is not.

To change it to work when the target card is a basic, all we need to do is also find the probabilities that the starting hand has 0 other basics with at least 1 copy of the target card since these are also valid starting hands. Additionally, the probability that a hand is valid is 1-dhyper(0,b+t,60-b-t,7), where t is the number of copies of the target card in deck and b is the number of other basics in deck.
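In R, the adjusted validity probability might look like the following. The function name is hypothetical, and the example uses the 4 Rowlet plus 8 other basics deck from the introduction.

```r
# Probability of a valid hand when the t target copies are themselves basics
# and the deck holds b other basics.
p_valid_basic_target <- function(t, b) 1 - dhyper(0, b + t, 60 - b - t, 7)

# 4 Rowlet plus 8 other basics: 12 basics in total.
print(p_valid_basic_target(4, 8))

# Edge case: the single target is the only basic in the deck.
stopifnot(isTRUE(all.equal(p_valid_basic_target(1, 0), 7 / 60)))
```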

I came up with this method myself, and I'm not very good at math, so it's possible that there is a more efficient way to find these probabilities. However, I am confident that these numbers are correct, and you can verify them yourself here.

The Correct Probabilities

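After this discussion, we finally understand how to calculate the final probabilities. So, to review, let's go over the steps to find the actual probabilities [4]:

1. Find the probability that a hand is valid, 1-dhyper(0,b,60-b,7), where b is the number of basics in deck.
2. For each valid hand arrangement (some number of basics, targets, and other cards, with at least 1 basic), find its probability with the multivariate hypergeometric distribution.
3. Divide each of those probabilities by the probability that a hand is valid to get the probability of that arrangement given that the hand is valid.
4. For each number of copies of the target card in hand, use the hypergeometric distribution to find the probabilities that the remaining copies end up in the 6 prize cards taken from the remaining 53 cards.
5. Multiply the probabilities from steps 3 and 4 and add the results over every hand arrangement to get the probability of each number of copies being prized.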

If we follow these steps, then we will arrive at the true probabilities for every possible number of basics and copies of the target card. The only change we need to make when the target card is a basic is also find the probabilities of having 0 basics and at least 1 copy of the target card since these are also valid hand combinations.
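The whole procedure from the preceding sections can be sketched as one R function. This is an independent illustration rather than the original script: the name prize_probs and its arguments are hypothetical, and it uses choose directly instead of dmvhyper so that it runs in base R. It assumes at least one valid hand is possible.

```r
# t: copies of the target card; b: basics in deck NOT counting the target.
prize_probs <- function(t, b, target_is_basic = FALSE) {
  o <- 60 - t - b                        # every other card in the deck
  probs <- numeric(min(t, 6) + 1)
  p_valid <- 0
  for (ht in 0:min(t, 7)) {              # target copies in hand
    for (hb in 0:min(b, 7 - ht)) {       # other basics in hand
      ho <- 7 - ht - hb                  # remaining hand slots
      if (ho > o) next                   # not enough other cards to fill the hand
      valid <- hb > 0 || (target_is_basic && ht > 0)
      if (!valid) next                   # mulligan hands are never the starting hand
      # Multivariate hypergeometric probability of this hand arrangement.
      p_hand <- choose(t, ht) * choose(b, hb) * choose(o, ho) / choose(60, 7)
      p_valid <- p_valid + p_hand
      left <- t - ht                     # copies left among the other 53 cards
      j <- 0:min(left, 6)
      probs[j + 1] <- probs[j + 1] + p_hand * dhyper(j, left, 53 - left, 6)
    }
  }
  names(probs) <- 0:min(t, 6)
  probs / p_valid                        # condition on the hand being valid
}

print(prize_probs(4, 8, target_is_basic = TRUE))
```

For 4 copies of a basic target with 8 other basics, the entry for 3 prized should come out at roughly 0.00198, in line with the Rowlet figure quoted in the conclusion. With a single basic in the deck and a non-basic target, prize_probs(1, 1) should agree with dhyper(0:1, 1, 58, 6), the shortcut mentioned in footnote 4.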

Now that we know the math, here are the probabilities, calculated by my R script. Simply select the scenario you want to see, and the probabilities will appear in a formatted table. On the way, I also generated a bit of extra data about the expected number of mulligans for every number of basics, the probabilities of having any number of basics in hand based on the number of basics, the expected number of copies prized, and the probabilities of having copies of the target card in hand.

Please note that the number of basics does not include the target card if the target card is a basic (so if you have 12 total basics and 4 Rowlet, like John Kettler did, look at the 8 other basics sheet to find the probability that the Rowlet are prized).

You can also test these with my simulation, which can be configured to the number of copies of the card that you have. If you set the number of basics to 0 and the target card to not a basic, then the simulation will not consider basics and mulligans like the naive method presented earlier. I have also made a tool to find these probabilities when given a decklist.






Conclusion

So, how unlucky was John Kettler? The probability that he prized both Oddish was 0.007784993, and the probability of prizing 3 Rowlet was 0.001980315 (P(prizing 3 or more) was almost exactly 1/500). Given these probabilities, he was quite unlucky but not absurdly so.

Nothing I've presented here is particularly revolutionary, but I hope that you at least found it interesting. Before starting on this project (which spanned over a month for a variety of reasons), I didn't even think about the number of basics, and even now, I still find it a bit surprising how much the number of basics affects the probability of being prized. Unfortunately, this means that the probability of prizing something like an ACE SPEC is not actually exactly 10%, but it's probably close enough.

More than just these specific cases, I hope that this serves as a lesson of how even basic probability skills can be used to solve relatively complex problems. Even though I cheated a bit using the hypergeometric distribution and R, nothing in this post is especially complex; instead, just by applying the principles of probability, we've been able to find some useful probabilities. I hope you enjoyed reading! An explanation of the R code will be released shortly (Update: The explanation can be found here.), but you can also view the raw R markdown here.

Verify

Check my math with the press of a button. Each time you press this button, the program will run a simulation for each row of each CSV file with the same parameters as that row and find the difference between its probabilities and the probabilities of the CSV file. Since these simulations are random, some results will usually be somewhat red, but as the number of trials increases, the error decreases, suggesting that these probabilities are correct. The errors for the second half (the probabilities given that the card is a basic) are more variable because there are fewer simulations being run given that the maximum number of copies of the target card is only 4. With the first half, the number of copies goes up to 59 because the target could be basic energy, so the errors are lower because of the greater number of simulations being run.
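A single such simulation, including mulligans, can be sketched in R as follows. This is not the code behind the button, only a minimal stand-in; the function name, card codes, seed, and trial count are all assumptions for illustration.

```r
set.seed(42)  # arbitrary seed so the run is repeatable

# Card codes: 1 = target, 2 = other basic, 3 = everything else.
simulate_with_mulligans <- function(t, b, target_is_basic = FALSE, trials = 20000) {
  deck <- c(rep(1, t), rep(2, b), rep(3, 60 - t - b))
  replicate(trials, {
    repeat {                             # reshuffle until the hand holds a basic
      shuffled <- sample(deck)
      hand <- shuffled[1:7]
      if (any(hand == 2) || (target_is_basic && any(hand == 1))) break
    }
    sum(shuffled[8:13] == 1)             # target copies among the 6 prize cards
  })
}

# 4 copies of a basic target with 8 other basics (the Rowlet scenario).
prized <- simulate_with_mulligans(4, 8, target_is_basic = TRUE)
print(table(prized) / length(prized))
```

Rare outcomes such as 3 copies prized need many trials before their simulated frequency stabilizes, which is the same effect the error colors above are showing.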



[1] If you've read this article or its errata, you'll know that this process is actually overly complex, but I am keeping it both because it is the way I approached the problem, and because this process is important for the method used later in the post. Though I think these articles are amazing, I believe that the result in this case is slightly off as explained later in the article.

[2] There is an argument to be made here about how the order of prize cards matters because many players start from the bottom and work their way up. Therefore, for many players, having a card in the top 2 prize cards is significantly different than having a card in the bottom 2 because of the ability to get the bottom or middle prizes back over the course of the game. In spite of this pattern, I've chosen to ignore the order of prizes both because it is way too inconsistent among players (for example, do they start placing cards from the top of their deck to the top of the prizes or the bottom) and because it complicates this already difficult problem even more. If order were to matter, we would need to be using permutations instead of combinations, so this math would be quite different.

[3] I am using the variable names in R's hypergeometric distribution functions (specifically the dhyper function) in my description of the function to assist in reading the R code and because I like the R variable names more than Wikipedia's variables (N, n, K, and k). Technically, x is a vector of quantiles for the number of target cards drawn and can therefore be a vector/list of numbers, but for the sake of simplicity, I'll always use only one number as x in this blog post; however, in my R code, I usually ended up using a vector like 0:7 as x to get all 8 probabilities at once, so keep that in mind when reading the code. Additionally, when I write equations using dhyper, you can plug those equations into R to solve them if you wish.

[4] Addendum 8/12/24: I didn't notice at the time, but there is actually a simpler method when there is only one basic in the deck. Since every valid hand contains this card, we can just treat the other 59 cards as being truly randomized and use dhyper(0:1, m, 59-m, 6) to find the probabilities that they are prized (m is the total number of copies of the target card). However, this doesn't work for higher values of m because the number of basics in the starting hand is not constant and cannot be assumed to be any value like this without further calculations. I include this just as another reminder that the method here might not be the most efficient method, but I do believe that it is completely accurate.
