# Poisson Binomial Distribution

The other day I found myself daydreaming about the
Poisson binomial distribution. As
data scientists, you should be especially interested in
this distribution, as it gives the distribution of the number of successes in *n*
Bernoulli trials where each trial has a (*potentially*) different
probability of success.

That is, the Poisson binomial random variable, *X*, is the sum of *n* independent and non-identically distributed
random indicators:

$$X = \sum_{i=1}^{n} X_i$$

where

$$X_i \sim \text{Bernoulli}(p_i), \qquad \Pr(X_i = 1) = p_i.$$

The probability mass function is:

$$\Pr(X = k) = \sum_{A \in F_k} \prod_{i \in A} p_i \prod_{j \in A^c} (1 - p_j)$$

where $F_k$ is the set of all subsets of $\{1, \ldots, n\}$ with exactly $k$ elements.

## Uses

There are a number of different instances when this distribution comes in handy. For instance, imagine a naïve advertising engine that displays ads based on maximizing click-through rate. An implementation of such an engine might work by estimating, for a given viewer, the click-through probability of each ad in the inventory and selecting the argmax ad. Let’s say that once the engine has shown a bunch of ads to a bunch of people, we want to estimate the distribution of the total number of click-throughs. This is a straightforward application of the Poisson binomial distribution. We just create a vector of the estimated conditional click-through probabilities associated with each displayed ad and pass the vector as the parameter to the Poisson binomial distribution.

Another use might be in modeling the conversation dynamics of online dating. Imagine a user on a dating site
receives *N* dating suggestions, and we want to model the distribution of the number of ensuing conversations. This
is exactly a Poisson binomial random variable.
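For a tiny example, one can compute such a PMF by brute force, enumerating all 2^*n* outcomes directly. This is a minimal sketch (the object name is mine), not the efficient algorithms discussed below:

```scala
// Brute-force Poisson binomial PMF: each subset of "successes" contributes
// the product of its probabilities and the complements of the rest.
// O(2^n) time, so this is only workable for very small n.
object NaivePoissonBinomial {
  def pmf(pr: Seq[Double]): IndexedSeq[Double] = {
    val n = pr.size
    val counts = Array.fill(n + 1)(0.0)
    for (mask <- 0 until (1 << n)) {
      val k = Integer.bitCount(mask)          // number of successes in this outcome
      val prob = pr.indices.foldLeft(1.0) { (acc, i) =>
        acc * (if ((mask & (1 << i)) != 0) pr(i) else 1 - pr(i))
      }
      counts(k) += prob
    }
    counts.toIndexedSeq
  }
}
```

For two fair coins, `NaivePoissonBinomial.pmf(Seq(0.5, 0.5))` yields the PMF (0.25, 0.5, 0.25), as expected.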

## An Example of the Naïve Exact Method

Let’s look at the exact computation on three events *A*, *B*, and *C*, with corresponding probabilities
$p_A$, $p_B$, and $p_C$.

We can define the Poisson binomial random variable as $X = \mathbf{1}_A + \mathbf{1}_B + \mathbf{1}_C$ and write down equations for the four possible values of the PMF, $\Pr(X = k)$, for $k \in \{0, 1, 2, 3\}$:

$$\begin{aligned}
\Pr(X = 0) &= (1 - p_A)(1 - p_B)(1 - p_C) \\
\Pr(X = 1) &= p_A (1 - p_B)(1 - p_C) + (1 - p_A)\, p_B (1 - p_C) + (1 - p_A)(1 - p_B)\, p_C \\
\Pr(X = 2) &= p_A\, p_B (1 - p_C) + p_A (1 - p_B)\, p_C + (1 - p_A)\, p_B\, p_C \\
\Pr(X = 3) &= p_A\, p_B\, p_C
\end{aligned}$$

It should be readily apparent that the number of additive terms in the formula for $\Pr(X = k)$ is $\binom{n}{k}$.
For more information, look at the binomial theorem. Each additive term is a product of *n* factors, so it requires
$n - 1$ multiplications, and the number of subtractions per additive term is $n - k$. Counting the $\binom{n}{k} - 1$
additions that join the terms, the total number of operations is:

$$\binom{n}{k}\left[(n - 1) + (n - k)\right] + \binom{n}{k} - 1 = (2n - k)\binom{n}{k} - 1$$

To verify this, we can write some quick Scala code:


```scala
// Scala Code
import scala.language.postfixOps

implicit class Factorial(val n: Long) extends AnyVal {
  def ! = {
    def h(i: Long, x: Long): Long =
      if (i < 2) x else h(i - 1, x * i)
    h(n, 1)
  }
  def choose(k: Long) = (n!) / (((n - k)!) * (k!))
}
```

And look at the Scala REPL session:


```scala
scala> val n = 3
n: Int = 3

scala> 0 to 3 map(k => n choose k) toList
res0: List[Long] = List(1, 3, 3, 1)

scala> 0 to 3 map(k => n - k) toList
res1: List[Int] = List(3, 2, 1, 0)

scala> 0 to 3 map { k => (2*n - k)*(n choose k) - 1 } toList
res2: List[Long] = List(5, 14, 11, 2)
```

*Confirmed!* But so what? That’s {5, 14, 11, 2} — 32 arithmetic operations in total. My processor can do that in nanoseconds.

But what if we try to calculate the distribution of clicks in 100 ad views? How many arithmetic operations would
that take?

Summing the per-*k* operation counts gives

$$\sum_{k=0}^{100}\left[(200 - k)\binom{100}{k} - 1\right] = 150 \cdot 2^{100} - 101$$

That would take **190,147,590,034,234,410,224,505,480,806,299** (*1.90 × 10³²*) arithmetic operations! Let’s say
that we had a laptop that could do a petaflop: 10¹⁵ floating-point operations per second
(*that’s faster than any laptop today*).

*On that laptop*, it would take 6 billion years to compute.
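We can double-check that astronomical count with exact `BigInt` arithmetic (a quick sketch; the object and function names are mine):

```scala
// Exact operation count for the naive method, using BigInt so nothing overflows.
object OpCount {
  // (n choose k) computed as a ratio of falling-factorial products.
  def choose(n: Int, k: Int): BigInt =
    (BigInt(n - k + 1) to n).product / (BigInt(1) to k).product

  // Total ops: sum over k of (2n - k) * C(n, k) - 1, derived above.
  def totalOps(n: Int): BigInt =
    (0 to n).map(k => BigInt(2 * n - k) * choose(n, k) - 1).sum
}
```

`OpCount.totalOps(3)` gives 32 (matching {5, 14, 11, 2} from the REPL session), and `OpCount.totalOps(100)` reproduces the 33-digit number above.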

*There’s got to be a better way!* Otherwise, this distribution would be strictly academic.

## A Better Way!

As you may have noticed from above, there is some repeated work that can be
memoized. Additionally, factoring can help a lot. If we transform the
probabilities to probability ratios, $w_i = p_i / (1 - p_i)$, we can omit multiplications for the probabilities of
negative events. Then we just have to normalize, and we can compute the normalizing constant,
$Z = \prod_{i=1}^{n} (1 - p_i)$, once for the entire PMF.
Arthur Dempster, of
EM fame, devised an algorithm in
*Weighted finite population sampling to maximize entropy*
(1994) to efficiently compute the Poisson binomial distribution. Here’s my implementation.

### Code

#### Method 1


```scala
// Scala Code
/**
 * Based on Dempster's algorithm (Method 1), recapitulated in:
 *
 *   Chen, Sean X.; Liu, Jun S.
 *   Statistical applications of the Poisson-binomial and conditional
 *   Bernoulli distributions.
 *   Statist. Sinica 7 (1997), no. 4, 875–892.
 *
 * @author R.M.Deak
 */
object PoissonBinomial {
  def prob(pr: Seq[Double], k: Int): Double = {
    val rM = new Array[Double](k + 1)
    rM(0) = 1d
    prob(pr, k, rM, new Array[Double](k + 1))
  }

  def pmf(pr: Seq[Double]): IndexedSeq[Double] = {
    val n = pr.size
    val rM = new Array[Double](n + 1)
    rM(0) = 1d
    val tM = new Array[Double](n + 1)
    val _w = w(pr)
    val z = _w.foldLeft(1d)((p, w) => p / (1d + w))
    0 to n map (k => prob(_w, k, rM, tM, z))
  }

  def cdf(pr: Seq[Double]): IndexedSeq[Double] =
    pmf(pr).scanLeft(0d)(_ + _).tail

  def prob(pr: Seq[Double],
           k: Int,
           rM: Array[Double],
           tM: Array[Double]): Double = {
    val _w = w(pr)
    val z = _w.foldLeft(1d)((p, w) => p / (1d + w))
    prob(_w, k, rM, tM, z)
  }

  def prob(w: IndexedSeq[Double],
           k: Int,
           rM: Array[Double],
           tM: Array[Double],
           z: Double): Double =
    z * r(w, k, rM, tM)

  // Probability ratios: w_i = p_i / (1 - p_i).
  def w(pr: Seq[Double]): IndexedSeq[Double] =
    pr.map(p => p / (1d - p)).toIndexedSeq

  // Power sums T(i) = sum_j w_j^i.
  def t(w: IndexedSeq[Double], i: Int): Double =
    w.foldLeft(0d)((s, v) => s + math.pow(v, i))

  // R(k) via the alternating-series recurrence, memoized in rM and tM.
  def r(w: IndexedSeq[Double],
        k: Int,
        rM: Array[Double],
        tM: Array[Double]): Double = {
    if (0 != rM(k)) rM(k)
    else {
      val rValue = 1d / k * (1 to k).foldLeft(0d)((s, i) => {
        val neg = if (0 == i % 2) -1d else 1d
        val tVal = if (0 != tM(i)) tM(i)
                   else { val tmp = t(w, i); tM(i) = tmp; tmp }
        val rVal = r(w, k - i, rM, tM)
        s + neg * tVal * rVal
      })
      rM(k) = rValue
      rValue
    }
  }
}
```

#### Method 2

Use this one:

**Update**: I now recommend a non-recursive variant with reduced space requirements, found in *Poisson Binomial Revisited*.

I wrote a Java variant of the “*Method 1*” algorithm a few years ago but only used it on a small number of events.
When testing the code for this article, I noticed a few things; namely, the output of the algorithm, on a large number
of events, violates the first two probability axioms, *and not
by just a little!* The numerical instability of this algorithm has been written about (cf. §2.5 of Hong, 2013).
The reasons for this are two-fold. The first is that the algorithm employs an
alternating series, and the second is that the algorithm raises
the probability ratios to a large exponent (up to *n*). This
can potentially cause the result to be very large, or very close to zero, depending on whether the numerator or
denominator dominates. Each of these issues leads to numerical instability.
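To see the second issue concretely: with *p* = 0.75, the probability ratio is *w* = *p*/(1 − *p*) = 3, and the algorithm computes power sums with terms like 3¹⁰⁰ ≈ 5.15 × 10⁴⁷. A quick sketch of why alternating sums of such terms destroy precision in `Double` arithmetic:

```scala
// A Double carries roughly 16 significant digits, so adding 1 to a number
// of magnitude ~1e47 is a no-op. The alternating series in Method 1 adds
// and subtracts terms of exactly this kind, so the small result that should
// survive the cancellation is simply lost.
object CancellationDemo extends App {
  val big = math.pow(3, 100)   // ≈ 5.15e47
  println(big + 1 - big)       // prints 0.0, not 1.0
}
```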

Noticing this, I implemented Method 2 from (Chen, et al. 1997). This is much more stable, but it has to do some additional arithmetic operations and uses *O*(*nk*) auxiliary space for memoization rather than the *O*(*k*) used in Method 1. But if we desire an exact method, correctness should trump both speed and memory usage. So here’s the code for Method 2.


```scala
// Scala Code
/**
 * Based on Method 2, recapitulated in:
 *
 *   Chen, Sean X.; Liu, Jun S.
 *   Statistical applications of the Poisson-binomial and conditional
 *   Bernoulli distributions.
 *   Statist. Sinica 7 (1997), no. 4, 875–892.
 *
 * @author R.M.Deak
 */
object PoissonBinomialMethod2BigDecimal {
  val negOne = BigDecimal(-1)
  val zero = BigDecimal(0)
  val one = BigDecimal(1)

  def pmf(pr: Seq[Double]) = {
    val _w = w(pr)
    val z = _w.foldLeft(one)((p, w) => p / (one + w))
    val _m = m(pr.size, pr.size)
    pr.indices foreach { k => R(_w, _m, k, _m(k).length - 1) }
    z +: _m.map(row => z * row.last)
  }

  def cdf(pr: Seq[Double]) = pmf(pr).scanLeft(zero)(_ + _).tail

  def prob(pr: Seq[Double], k: Int) = {
    val _w = w(pr)
    val z = _w.foldLeft(one)((p, w) => p / (one + w))
    val _m = m(pr.size, k)
    if (k == 0) z
    else z * R(_w, _m, k - 1, _m(k - 1).length - 1)
  }

  // Memo table: row i has n - i columns, all initialized to zero.
  def m(n: Int, k: Int) = 0 until k map (i => Array.fill(n - i)(zero))

  def R(w: IndexedSeq[BigDecimal],
        m: IndexedSeq[Array[BigDecimal]],
        r: Int,
        c: Int): BigDecimal =
    if (r == -1) one
    else if (c == -1) zero
    else if (m(r)(c) != zero) m(r)(c)
    else {
      m(r)(c) = R(w, m, r, c - 1) + w(c + r) * R(w, m, r - 1, c)
      m(r)(c)
    }

  def w(pr: Seq[Double]) =
    pr.map(v => { val p = BigDecimal(v); p / (one - p) }).toIndexedSeq
}
```

### Results

For Method 1, calculating *PoissonBinomial*(*50*; *[100 probabilities of 0.25]*) takes 13 ms on my
Mid-2010 MacBook Pro. That’s
better. We can work with that. One of the problems is that this algorithm is
recursive, but not tail recursive,
so it has the potential to overflow the stack when calculating for a large
number of events.

Calculating the entire PMF using Method 2 takes about 7 ms on my Mid-2010 MacBook Pro. Since Method 2 is also not tail recursive, it too has the potential to overflow the stack.

### Other Algorithms

There are plenty of other algorithms out there that calculate the exact Poisson binomial distribution. For more information, see the papers cited in the references section.

## Approximations

When *n* is rather large, there are a number of ways to approximate the distribution, including:

- The Poisson approximation method. This only works when *μ* (or *λ* in the Wikipedia article) is small.
- The normal approximation with continuity correction, $\Pr(X \le k) \approx \Phi\left(\frac{k + 0.5 - \mu}{\sigma}\right)$, where $\Phi$ is the
  CDF of the standard normal distribution and the first two moments are $\mu = \sum_{i=1}^{n} p_i$ and
  $\sigma = \sqrt{\sum_{i=1}^{n} p_i (1 - p_i)}$.
- A refined normal approximation from A. Yu. Volkova (1996).

### Refined Normal Approximation

As with the normal approximation listed above, the first three moments are:

$$\mu = \sum_{i=1}^{n} p_i, \qquad
\sigma = \sqrt{\sum_{i=1}^{n} p_i (1 - p_i)}, \qquad
\gamma = \sigma^{-3} \sum_{i=1}^{n} p_i (1 - p_i)(1 - 2 p_i)$$

and the PDF and CDF of the standard normal distribution are $\phi$ and $\Phi$, respectively.

Given these definitions, we compute the refined normal approximation as

$$\Pr(X \le k) \approx G\left(\frac{k + 0.5 - \mu}{\sigma}\right)$$

where $G$ is defined as

$$G(x) = \Phi(x) + \frac{\gamma\,(1 - x^2)\,\phi(x)}{6}$$
I was thinking, if we’re going to make this fast, let’s also add an approximation for the standard normal
CDF (*not that there’s
really a choice, since* erf *does not have a
closed-form expression*). So I looked around and found
*High Accurate Simple Approximation of Normal Distribution Integral* (2012)
with an approximation of the error function as:

$$\operatorname{erf}(x) \approx \tanh\left(\frac{77x}{75} + \frac{116}{25}\tanh\left(\frac{147x}{73} - \frac{76}{7}\tanh\left(\frac{51x}{278}\right)\right)\right)$$

The CDF of the standard normal distribution is then:

$$\Phi(x) = \frac{1}{2}\left(1 + \operatorname{erf}\!\left(\frac{x}{\sqrt{2}}\right)\right)$$

Since the PDF of the standard normal distribution has a closed form, we can compute it directly:

$$\phi(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}$$
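As a quick standalone sanity check of the tanh-based erf approximation (the object and function names are mine):

```scala
object ErfCheck {
  import scala.math.tanh

  // The tanh-based erf approximation. Note 116d/25: plain 116/25 would be
  // Scala integer division (== 4) and silently skew the result.
  def erfApprox(x: Double): Double =
    tanh(77 * x / 75 + 116d / 25 * tanh(147 * x / 73 - 76d / 7 * tanh(51 * x / 278)))
}
```

`ErfCheck.erfApprox(1.0)` is about 0.843, versus the true erf(1) ≈ 0.8427 — within roughly 10⁻³ of the exact value.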

### RNA Code

So we are now equipped with all the tools necessary to create an implementation (which is probably still a little buggy).


```scala
// Scala Code
object PoissonBinomialApprox {
  import scala.math._

  private[this] val sqrt2 = sqrt(2)
  private[this] val sqrt2pi = sqrt(2 * Pi)

  // The RNA approximates the CDF, so a single PMF value is the difference
  // of adjacent CDF values.
  def prob(p: Seq[Double], k: Int): Double = {
    val m = mu(p)
    val s = sigma(p)
    val ga = gamma(p, s)
    rna(m, s, ga, k) - (if (k == 0) 0d else rna(m, s, ga, k - 1))
  }

  def cdf(p: Seq[Double]): IndexedSeq[Double] = {
    val m = mu(p)
    val s = sigma(p)
    val ga = gamma(p, s)
    0 to p.size map (k => rna(m, s, ga, k))
  }

  def pmf(p: Seq[Double]): IndexedSeq[Double] = {
    val c = cdf(p)
    c.head +: (1 until c.size).map(k => c(k) - c(k - 1)).toIndexedSeq
  }

  def mu(p: Seq[Double]) = p.sum
  def sigma(p: Seq[Double]) =
    sqrt(p.foldLeft(0d)((s, x) => s + x * (1 - x)))
  def gamma(p: Seq[Double], s: Double) =
    pow(s, -3) * p.foldLeft(0d)((acc, x) => acc + x * (1 - x) * (1 - 2 * x))

  // tanh-based erf approximation; 116d/25 forces floating-point division.
  def erf(x: Double) =
    tanh(77 * x / 75 + 116d / 25 * tanh(147 * x / 73 - 76d / 7 * tanh(51 * x / 278)))

  def normalCdf(x: Double) = 0.5 * (1 + erf(x / sqrt2))
  def normalPdf(x: Double) = exp(-x * x / 2) / sqrt2pi

  // G(x) = Phi(x) + gamma * (1 - x^2) * phi(x) / 6
  def g(x: Double, ga: Double) =
    normalCdf(x) + ga * (1 - x * x) * normalPdf(x) / 6d

  // CDF approximation with continuity correction.
  def rna(m: Double, s: Double, ga: Double, k: Int) =
    g((k + 0.5 - m) / s, ga)
}
```

## Remarks

The Poisson binomial distribution has a lot of applications. I’ve given a couple of implementations to calculate the
distribution exactly (up to numerical precision errors). Make sure to use Method 2 if you use the
(*MIT Licensed*) code. I also gave the equations and some untested code for the
refined normal approximation. If you find any errors, let me know and I’ll work on it.

## License

The above code is released under the MIT License, Copyright (c) 2015 Ryan Deak.

## References

- Chen, Sean X., and Jun S. Liu. “Statistical applications of the Poisson-binomial and conditional Bernoulli distributions.” Statistica Sinica 7.4 (1997): 875-892.
- Hong, Yili. “On computing the distribution function for the Poisson binomial distribution.” Computational Statistics & Data Analysis 59 (2013): 41-51.
- Vazquez-Leal, Hector, et al. “High accurate simple approximation of normal distribution integral.” Mathematical Problems in Engineering 2012 (2012).