## Hypergeometric Distribution: How and Why

There’s been so much discussion recently in the theorycraft community about how Chain Heal works and how it will affect High Tide (a discussion led by Dayani here, which I’ve been trying to contribute to). One of the problems the theorycraft has run into so far is expressing exactly how the random selection of targets affects Chain Heal’s average healing. I suggested previously that a specific kind of distribution might help, and later showed that this distribution gives a good estimate of real (simulated) Chain Heal and High Tide interaction. I’ve also been talking at length about why we should care about statistical ideas in Warlords of Draenor, especially because they will evidently inform our understanding of the value of Stats.

We know that this distribution – known as the “Hypergeometric” distribution – gives a good description of how High Tide interacts with Riptides and Chain Heal, in both imaginary and “realistic” situations, and for any number of applied Riptides upon the raid. It’s pretty damn useful, so it’s fairly clear that we should be using it.

However, so far I haven’t actually explained (successfully) what this distribution is and how to use it! This post is dedicated to explaining (in part) what the Hypergeometric distribution does and how to actually use the thing. I think that this is important to getting it more widely used in the community.

The key thing to realise with Chain Heal is that its interaction with the Riptide and High Tide mechanics is probabilistic, and in the terms of probability theory Chain Heal “samples without replacement”. That means that anyone who’s already been healed by a Chain Heal cannot be healed a second time by the same cast. The corollary is that the probability of hitting a Riptide target changes from one bounce to the next, because Chain Heal doesn’t go back on itself. This feeds directly into the number of High Tide targets you get out. So the facts are as follows. We have a situation where:

• There are two possible outcomes on each try (RT and no-RT).
• One outcome on one try changes the probability of getting each result on the next try.
• We want to know the probability of getting a specific number of results in a specific number of tries with a specific population of possible outcomes (e.g. 3 Riptide hits out of 3 Chain Heal jumps when 4 Riptides are placed on the raid).
• We want to know this regardless of the specific order of results.

That’s a tall order, but it just so happens that the Hypergeometric distribution matches all of these characteristics. It’s fantastic!
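To see “sampling without replacement” in action, here is a minimal Monte Carlo sketch in Python. It assumes a simplified model in which every eligible player is equally likely to be picked by each jump, which is not necessarily how the game actually chooses Chain Heal targets. The function name `simulate_riptide_hits` and its parameters are hypothetical. It draws n distinct players from a pool of N, M of whom have Riptide, and counts how often each number of Riptide hits r comes up.

```python
import random
from collections import Counter


def simulate_riptide_hits(N, M, n=3, trials=100_000, seed=1):
    """Estimate p(r | N, M, n) by simulating Chain Heal jumps.

    Toy model (an assumption, not the game's real targeting logic):
    - the N eligible jump targets are all equally likely to be picked,
    - M of them have Riptide active,
    - each cast picks n *distinct* targets (sampling without replacement).
    """
    rng = random.Random(seed)
    # Label each player in the pool: True = has Riptide, False = no Riptide.
    pool = [True] * M + [False] * (N - M)
    counts = Counter()
    for _ in range(trials):
        # random.sample never returns the same player twice in one draw.
        jumps = rng.sample(pool, n)
        counts[sum(jumps)] += 1  # number of Riptide targets hit this cast
    return {r: counts[r] / trials for r in range(n + 1)}


if __name__ == "__main__":
    for r, p in simulate_riptide_hits(N=19, M=4).items():
        print(f"r={r}: {p:.4f}")
```

Because each drawn player is removed from the pool for the rest of that cast, the chance of a Riptide hit on later jumps depends on what earlier jumps hit, which is exactly the behaviour listed above.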

So that’s why it’s so powerful to us. To show you how to use it, let’s first lay some groundwork:

• We want to know a probability, which I’ll denote p.
• We want to know the probability p of a certain number of Riptide targets r being chained to; p(r).
• We want to know that given that we know the number of Chain Heal jumps in total, n=3; p(r|n).
• We want to know all that given that we know the number of players with Riptide active, M; p(r|M,n).
• We want to know all that given the number of players which Chain Heal can chain to, N; p(r|N,M,n).

Now, Chain Heal always makes three jumps, which we cannot control, so n=3 and we want p(r|N,M,n=3). The resulting equation has to be a function of all of these variables. If that’s all you want or need to know, I made a table (actually a set of tables) from which you can look up the probabilities for a relevant range of parameters, which you can download here (or here if you want a form without any notes or labels for easier parsing).
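If you would rather generate a table yourself, the sketch below builds one from the standard Hypergeometric formula, p(r|N,M,n) = C(M,r)·C(N−M,n−r)/C(N,n), where C(a,b) is the binomial coefficient. It is not the downloadable table itself and won’t match its layout; the function names are hypothetical, and it relies on Python’s `math.comb` (Python 3.8 or later).

```python
from math import comb


def hypergeom_pmf(r, N, M, n):
    """p(r | N, M, n): chance that exactly r of n distinct jumps land on
    Riptide targets, when M of the N eligible players have Riptide."""
    # Impossible outcomes: more hits than jumps or Riptides, or more
    # non-Riptide jumps than there are non-Riptide players.
    if r < 0 or r > n or r > M or n - r > N - M:
        return 0.0
    return comb(M, r) * comb(N - M, n - r) / comb(N, n)


def riptide_table(N, n=3):
    """Rows: M = 0..N Riptides in the pool; columns: r = 0..n Riptide hits."""
    return {M: [hypergeom_pmf(r, N, M, n) for r in range(n + 1)]
            for M in range(N + 1)}


if __name__ == "__main__":
    N = 19  # e.g. a 20-player raid minus the initial target
    print("M", " ".join(f"r={r}" for r in range(4)))
    for M, row in riptide_table(N).items():
        print(M, " ".join(f"{p:.4f}" for p in row))
```

Each row of the table sums to 1, since some number of Riptide hits between 0 and n must happen on every cast.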

Some important things to note for practical modelling of raids:

• If you want to model a real cast on 20 players, you want to calculate for N=19 because the initial hit will be on a person and remove them from the pool.
• If you’re modelling a real cast on a Riptide target and have 5 Riptides up, you want to calculate for M = 4 because the initial heal will remove one Riptide target from the Tide Pool.
• Never calculate for M>N or n>N, because you won’t get a sensible answer.
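The bookkeeping in these notes can be wrapped in a small helper. This is a sketch: `params_for_cast`, `ChainHealParams` and their arguments are hypothetical names, and it assumes exactly the two adjustments above (the initial target leaves the pool, and takes its Riptide with it if it had one), plus the two sanity checks.

```python
from dataclasses import dataclass


@dataclass
class ChainHealParams:
    N: int  # players Chain Heal can still jump to
    M: int  # how many of those have Riptide active
    n: int  # number of jumps


def params_for_cast(raid_size, riptides_up, primary_has_riptide, jumps=3):
    """Turn a raid situation into (N, M, n) for the Hypergeometric formula.

    All names here are hypothetical, chosen for this sketch.
    """
    # The initial target is healed first and leaves the jump pool.
    N = raid_size - 1
    # If the initial target had Riptide, one Riptide leaves the pool too.
    M = riptides_up - 1 if primary_has_riptide else riptides_up
    # Reject the cases that don't give a sensible answer.
    if not 0 <= M <= N:
        raise ValueError(f"need 0 <= M <= N, got M={M}, N={N}")
    if jumps > N:
        raise ValueError(f"need n <= N, got n={jumps}, N={N}")
    return ChainHealParams(N=N, M=M, n=jumps)


if __name__ == "__main__":
    # A cast on a Riptide target in a 20-player raid with 5 Riptides up.
    print(params_for_cast(raid_size=20, riptides_up=5, primary_has_riptide=True))
```

For that example the helper gives N=19, M=4, n=3, matching the two notes above.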

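Another thing that we have to define is something called a Binomial Coefficient, which tells you how many different ways there are to choose b items out of a, ignoring order (for example, how many different groups of b Riptide targets could be picked from a pool of a players). It’s written as:

$$\binom{a}{b} = \frac{a!}{b!\,(a-b)!}$$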

Above, a and b are just whole numbers, with b no bigger than a. The ! symbol isn’t there for emphasis: it means “factorial”, i.e. “multiply together all the whole numbers from 1 up to this number”. So 3! = 1x2x3 = 6, 6! = 1x2x3x4x5x6 = 720, and so on. By convention 0! = 1, which matters whenever b = 0 or b = a.
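For anyone implementing this by hand, here is a minimal Python sketch of the factorial and the binomial coefficient exactly as defined above. The function names are our own; Python’s standard library already provides `math.factorial` and `math.comb`, and `math.comb` is used in the usage lines as a cross-check.

```python
from math import comb


def factorial(k):
    """k! = 1 x 2 x ... x k, with 0! = 1 by convention."""
    result = 1
    for i in range(2, k + 1):
        result *= i
    return result


def binomial(a, b):
    """Number of ways to choose b items out of a: a! / (b! (a - b)!)."""
    if not 0 <= b <= a:
        return 0  # you can't choose more items than there are
    # Integer division is exact here: the result is always a whole number.
    return factorial(a) // (factorial(b) * factorial(a - b))


if __name__ == "__main__":
    print(factorial(3), factorial(6))   # 6 720
    print(binomial(4, 2), comb(4, 2))   # 6 6
```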

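Now, the Hypergeometric distribution counts the number of ways of picking exactly r Riptide targets out of the M available, multiplies that by the number of ways of picking the remaining n−r jump targets from the N−M players without Riptide, and divides by the total number of ways of picking n jump targets out of N. It’s a difficult thing to picture, really, but it works out like this:

$$p(r \mid N, M, n) = \frac{\binom{M}{r}\binom{N-M}{n-r}}{\binom{N}{n}}$$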

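So that’s telling you to take the binomial coefficient for three sets of values. Written out in full, it makes a long, complicated equation like this:

$$p(r \mid N, M, n) = \frac{M!}{r!\,(M-r)!} \cdot \frac{(N-M)!}{(n-r)!\,(N-M-n+r)!} \cdot \frac{n!\,(N-n)!}{N!}$$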
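Evaluating that long form directly is a useful sanity check. The sketch below (function name hypothetical) computes each of the three factorial fractions separately, and uses the practical-modelling example from earlier: a cast on a Riptide target in a 20-player raid with 5 Riptides up, so N=19, M=4, n=3.

```python
from math import factorial


def hypergeom_pmf_factorials(r, N, M, n):
    """Evaluate the long factorial form of p(r | N, M, n) term by term."""
    if r < 0 or r > n or r > M or n - r > N - M:
        return 0.0  # impossible outcome
    # Ways to pick r Riptide targets out of M.
    ways_riptide = factorial(M) // (factorial(r) * factorial(M - r))
    # Ways to pick the other n - r targets out of the N - M without Riptide.
    ways_other = factorial(N - M) // (factorial(n - r) * factorial(N - M - n + r))
    # Ways to pick any n targets out of N.
    ways_total = factorial(N) // (factorial(n) * factorial(N - n))
    return ways_riptide * ways_other / ways_total


if __name__ == "__main__":
    for r in range(4):
        print(r, hypergeom_pmf_factorials(r, N=19, M=4, n=3))
```

For those values the four probabilities come out as 455/969, 420/969, 90/969 and 4/969 for r = 0, 1, 2, 3 (roughly 0.470, 0.433, 0.093 and 0.004), which sum to exactly 1 as they should.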

OK, so you don’t want to do ALL THOSE MULTIPLICATIONS. There’s a faster way, whose details I’ll skip, which you can find in a pdf file here. If you want to implement your own calculation, I advise that you either use that method, or Excel’s HYPGEOM.DIST() function, or Matlab’s hygepdf() function.
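The shortcut in the linked pdf isn’t reproduced here. As one common alternative, sketched in Python: work out the first possible probability once, then get each next one from the ratio between neighbouring terms, p(r+1)/p(r) = (M−r)(n−r) / ((r+1)(N−M−n+r+1)), which follows from cancelling the factorials in the long equation. The function name `hypergeom_distribution` is hypothetical.

```python
from math import comb


def hypergeom_distribution(N, M, n):
    """Return [p(0), p(1), ..., p(n)] for r Riptide hits in n jumps.

    Computes the first possible value once, then steps upward with
        p(r+1) / p(r) = (M - r)(n - r) / ((r + 1)(N - M - n + r + 1)),
    so no further factorials or binomial coefficients are needed.
    """
    if not (0 <= M <= N and 0 <= n <= N):
        raise ValueError("need 0 <= M <= N and 0 <= n <= N")
    probs = [0.0] * (n + 1)
    r_min = max(0, n - (N - M))  # fewest Riptide hits that can happen
    r_max = min(n, M)            # most Riptide hits that can happen
    probs[r_min] = comb(M, r_min) * comb(N - M, n - r_min) / comb(N, n)
    for r in range(r_min, r_max):
        # The denominator is >= 1 here because r >= r_min.
        ratio = (M - r) * (n - r) / ((r + 1) * (N - M - n + r + 1))
        probs[r + 1] = probs[r] * ratio
    return probs


if __name__ == "__main__":
    print([round(p, 4) for p in hypergeom_distribution(19, 4, 3)])
```

For the spreadsheet and Matlab routes, the same quantity p(r|N,M,n) corresponds to `HYPGEOM.DIST(r, n, M, N, FALSE)` in Excel and `hygepdf(r, N, M, n)` in Matlab, going by their documented argument orders (sample successes, sample size, population successes, population size for Excel; value, population size, successes in the population, number drawn for Matlab). The letters mean different things in each tool, so it is easy to swap arguments by accident.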

I hope to see people finding interesting uses for this distribution in future. Good luck!