The other day I found myself daydreaming about the Poisson binomial distribution. As data scientists, you should be especially interested in this distribution as it gives the distribution of successes in N Bernoulli trials where each trial has a (potentially) different probability of success.

That is, the Poisson binomial random variable, N, is the sum of n independent and non-identically distributed random indicators:

where

Uses

There are a number of different instances when this distribution comes in handy. For instance, imagine a naïve advertising engine that displays ads based on maximizing click-through rate. An implementation of such an engine might work by estimating, for a given viewer, the probabilities of each ad in the inventory and selecting the argmax ad. Let’s say that once the engine has shown a bunch of ads to a bunch of people, we want to estimate the distribution of the total number of click-throughs. This is a straightforward application of the Poisson binomial distribution. We just create a vector of the estimated conditional click-through probabilities associated with each displayed ad and pass the vector as the parameter to Poisson binomial distribution.

Another use might be in modeling the converstation dynamics of online dating. Imagine a user on a dating site receives N dating suggestions, and we want to model the distribution of the number of ensuing conversations. This is exactly the PMF of the Poisson binomial distribution.

An Example of the Naïve Exact Method

Let’s look at the exact computation on three events A, B, and C, with corresponding probabilities ${ p }_{ a }$, ${ p }_{ b }$, and ${ p }_{ c }$.

We can define the Poisson binomial as $PoissonBinomial\left( \left\{ { p }_{ a },{ p }_{ b },{ p }_{ c } \right\} \right)$ and write down equations for the four possible values of the PMF, $f\left(x\right)$, for $x\in \left\{ 1,2,3,4 \right\}$:

It should be readily apparent that the number of additive terms in the formula for $f\left(x\right)$ is $\left( \begin{matrix} n \\ x \end{matrix} \right)$. For more information, look at the binomial theorem. The number of multiplicative terms per additive terms is n, and the number of subtractions per additive term is $n - x$. So, the total number of operations is:

To verify this, we can write some quick Scala code:

And look at the Scala REPL session:

Confirmed! But so what? We did {5, 14, 11, 2} arithmetic operations. My processor can do that in nanoseconds.
But what if we try to calculate the distribution of clicks in 100 ad views. How many arithmetic operations would that take?

That would take 190,147,590,034,234,410,224,505,480,806,299 (1.90 × 1032) arithmetic operations! See here for proof. Let’s say that we had a laptop that could do a petaflop (That’s faster than any laptop today). On that laptop, it would take 6 billion years to compute. There’s got to be a better way! Otherwise, this distribution would be strictly academic.

A Better Way!

As you may have noticed from above, there is some repeated work that can be memoized. Additionally, factoring can help a lot. If we transform the probabilities to probability ratios, we can omit multiplications for the probabilities of negative events. Then we just have to normalize, and we can compute the normalizing constant onc e for the entire PMF. Arthur Dempster, of EM fame, divised an algorithm in Weighted finite population sampling to maximize entropy (1994) to efficiently compute the Poisson binomial distribution. Here’s my implementation.

Code

Use this one:

Update: I now recommend a non-recursive, $O\left( n \right)$ space variant found inPoisson Binomial Revisited.

I wrote a java variant of the “Method 1” algorithm a few years ago but only used it on a small number of events. When testing the code for this article, I noticed a few things, namely the output of the algorithm, on a large number of events, violates the first two probability axioms, and not by just a little! The numerical instability of this algorithm has been written about (cf. §2.5 Hong, 2011). The reasons for this are two-fold. The first is that the algorithm employs an alternating series and the second is that the algorithm raises the probability ratios to a large exponent, ${ \left( { p }/{ \left( 1-p \right) } \right) }^{ n }$. This can potentially cause the result to be very large, or very close to zero, depending on whether the numerator or denominator dominates. Each of these issues lead to numerical instability.

Noticing this, I implemented the Method 2 in (Chen, et al. 1997). This is much more stable but has to do some additional arithmetic operations and uses $O\left( { n }^{ 2 } \right)$ auxiliary space for memoization rather than the $O\left( n \right)$ used in Method 1. But if we desire an exact method, correctness should trump both speed and memory usage. So here’s the code for the Method 2.

Results

For Method 1 Calculating PoissonBinomial(50; [100 0.25 probabilities]) takes 13ms on my Mid-2010 MacBook Pro. That’s better. We can work with that. One of the problems is that this algorithm is recursive, but not tail recursive so it has a potential stack overflow when calculating for a large number of events.

Calculating the entire PMF using Method 2 takes about 7ms on my Mid-2010 MacBook Pro. Since Method 2 is also not tail recursive, it also has the potential to stack overflow.

Other Algorithms

There are plenty of other algorithms out there that calculate the exact Poisson binomial distribution. For more information, see papers sited in the references section.

Approximations

When n is rather large, there are a number of ways to approximate the distributions including:

• The Poisson approximation method $\left( \frac { { \mu }^{ k }{ e }^{ -\mu } }{ k! } \right)$. This only works when μ (or λ in the wikipedia article) is small.
• The normal approximation with continuity correction $\left( CDF=\Phi \left( \frac { k + 0.5 - \mu }{ \sigma } \right) \right)$, where $\Phi$ is the CDF of the standard normal distribution and the first two moments are:
• $\mu =\sum _{ i }^{ n }{ { p }_{ i } }$
• $\sigma ={ \left( \sum _{ i=1 }^{ n }{ { p }_{ i }\left( 1-{ p }_{ i } \right) } \right) }^{ \frac { 1 }{ 2 } }$
• A refined normal approximation from A. Yu. Volkova (1996).

Refined Normal Approximation

As with the normal approximation, listed above, the first three moments are:

• $\mu =\sum _{ i }^{ n }{ { p }_{ i } }$
• $\sigma ={ \left( \sum _{ i=1 }^{ n }{ { p }_{ i }\left( 1-{ p }_{ i } \right) } \right) }^{ \frac { 1 }{ 2 } }$
• $\gamma ={ \sigma }^{ -3 }\sum _{ i=1 }^{ n }{ { p }_{ i }\left( 1-{ p }_{ i } \right) \left( 1-2{ p }_{ i } \right) }$

and the PDF and CDF of the standard normal distribution are $\phi$ and $\Phi$, respectively.

Given these definitions, we compute the refined normal approximation as

where $G\left( x \right)$ is defined as

I was thinking, if we’re going to make this fast, let’s add a normal approximation for the CDF (Not that there’s really a choice since erf does not have an analytical solution). So I looked around and found High Accurate Simple Approximation of Normal Distribution Integral (2012) with an approximation of the error function as:

The CDF of the standard normal distribution is:

Since the PDF of the standard normal distribution has an analytical solution, we can easily compute it:

RNA Code

So we are now equipped with all the tools necessary to create an implementation (which is probably still a little buggy).

Remarks

The Poisson binomial distribution has a lot of applications. I’ve given a couple of implementations to calculate the distribution exactly (up to numerical precision errors). Make sure to use Method 2 if you use the (MIT Licensed) code. I also gave the equations and some untested code for the refined normal approximation. If you find any errors, let me know and I’ll work on it.