A few years ago, just before going on vacation, I came across some code in a codebase similar to the following:
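(What follows is a minimal Python sketch rather than the original code; the names \(something\), \(h\), \(n_i\), \(k_i\), and \(p\) are taken from the discussion below, and the real code differed in its details.)

```python
def something(n, k):
    # something merely calls h with the initial parameters: n, k, and 0.
    def h(n_i, k_i, acc):
        if k_i == 0:
            return acc
        # The calculation of p is the only place n_i is used.
        p = k_i / (k * n_i)
        # n_i and k_i are both updated by the same "decrement by 1" function.
        return h(n_i - 1, k_i - 1, acc + p)
    return h(n, k, 0.0)
```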

I stared at it for a second or two and had two thoughts:

  1. This is an \(O(K)\) time algorithm.
  2. This can be computed (at least approximately) in just \(O(1)\) time.

To battle boredom on the plane, I confirmed this with the help of a few free tools: Wikipedia and Wolfram Alpha. With these tools and some basic algebra I found a pretty decent constant-time approximation capable of replacing our \(O(K)\) code.

The purpose of this post isn’t to show some revolutionary new idea or result, but to share the process of tracking down and optimizing certain types of algorithmic inefficiencies in a codebase.

Steps

  1. Formalize the Inefficiency
  2. Use a C.A.S.
  3. Research
  4. Update and Simplify
  5. Repeat (if Necessary)

Formalize the Inefficiency

Inside the \(something\) function, the \(h\) function does all of the computation (recursively). \(something\) merely calls the \(h\) function with the initial parameters: \(n\), \(k\) and \(0\).

Notice that inside the function \(h\), the \(n_i\) and \(k_i\) variables are updated via the same function, the “decrement by 1” function. This is an indication that \(n_i\) and \(k_i\) can be written in terms of each other, given the initial conditions, \(n\) and \(k\). So, the \(something\) function can be rewritten as:
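(Again, a sketch in the same spirit; \(denom\) is the helper function discussed next.)

```python
def something(n, k):
    # k_i has become i; n_i is gone, replaced by a call to denom.
    def h(i, acc):
        if i == 0:
            return acc
        p = i / (k * denom(n, k, i))
        return h(i - 1, acc + p)
    return h(k, 0.0)

def denom(n, k, i):
    # Stands in for n_i; its body is derived just below.
    return n - k + i
```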

In this variant, the \(n_i\) variable is removed and the \(k_i\) variable becomes the \(i\) variable. Notice that the calculation of \(p\) is the only place in the old version where \(n_i\) is used. Therefore, the \(denom\) function in the new version needs to be written in terms of \(i\) and the initial conditions \(n\) and \(k\), and it needs to behave like \(n_i\) in the old version. On the first iteration of the old version, the initial values of \(n_i\) and \(k_i\) are \(n\) and \(k\), respectively, and since both are decremented in lockstep, their difference is invariant: \(n_i - k_i = n - k\). Rearranging, and renaming \(k_i\) to \(i\), it is necessarily the case that \(denom(n, k, i) = n - k + i\).

Finally, when the indices are inverted so that \(i\) runs from \(1\) up to \(k\) instead of from \(k\) down to \(1\), it's easy to see the formalism take shape. The code looks like:
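(In Python, one possible rendering of this final \(O(K)\) form:)

```python
def something(n, k):
    total = 0.0
    for i in range(1, k + 1):   # indices inverted: i runs from 1 up to k
        total += i / (n - k + i)
    return total / k
```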

which can be specified mathematically as:

\[something\left( n,k \right) = \frac{1}{k} \sum_{i=1}^{k} \frac{i}{n-k+i}\]

Use a C.A.S.

Given the formal definition, a computer algebra system can be used to simplify the equation. Wolfram Alpha will do the job just fine. The above definition can be transformed to the Wolfram Alpha input: 1/k * Sum[i/(n - k + i), i, 1, k].

On the page, Wolfram Alpha provides an exact result:

\[\frac{\left( k-n \right) \psi^{(0)}\left( n+1 \right) + \left( n-k \right) \psi^{(0)}\left( n-k+1 \right) + k}{k}\]

Without touching \(\psi^{(0)}(x)\) at all, the expression can be simplified with some basic algebra, leading to:

\[1 - \frac{n-k}{k}\left( \psi^{(0)}\left( n+1 \right) - \psi^{(0)}\left( n-k+1 \right) \right)\]
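As a quick sanity check, this exact closed form can be evaluated directly. Here is a sketch assuming SciPy is available; `something_closed` is a name chosen here, not one from the original code:

```python
from scipy.special import digamma  # digamma(x) computes psi(x)

def something_closed(n, k):
    # 1 - ((n - k)/k) * (psi(n + 1) - psi(n - k + 1))
    return 1.0 - (n - k) / k * (digamma(n + 1) - digamma(n - k + 1))
```

Its output should agree with the \(O(K)\) loop above to within floating-point error.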

Research

With an alternate form of \(something\) in hand, it's time to do a little research. Wolfram Alpha provides a note stating that \(\psi^{(n)}(x)\) is the \(n\)th derivative of the digamma function, and \(\psi^{(0)}(x) = \psi(x)\), since the zeroth derivative of a function is the function itself. So it makes sense to search Wikipedia for the digamma function. The article states that for any \(x > 1/2\), the value of the digamma function lies in the interval:

\[\psi\left( x \right) \in \left( \ln\left( x-\tfrac{1}{2} \right),\ \ln x \right)\]

and that “the exponential \(e^{\psi(x)}\) is approximately \(x − 1/2\) for large \(x\).” This means that, for large \(x\), \(\psi(x) \approx \ln\left(x − \frac{1}{2}\right)\).
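That claim is easy to spot-check numerically (again assuming SciPy is available):

```python
import math
from scipy.special import digamma

x = 1000.0
# For large x, psi(x) and ln(x - 1/2) agree closely
# (to roughly seven digits at x = 1000).
print(digamma(x), math.log(x - 0.5))
```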

Update and Simplify

So, plugging the approximation from Wikipedia into the output from Wolfram Alpha and simplifying, we get:

\[\begin{align*} something\left( n,k \right) &= 1-\frac{n-k}{k}\left( \psi^{(0)}\left( n+1 \right) -\psi^{(0)}\left( n-k+1 \right) \right) \\ &\approx 1-\frac{n-k}{k}\left( \ln\left( n+1-\tfrac{1}{2} \right) -\ln\left( n-k+1-\tfrac{1}{2} \right) \right) \\ &= 1-\frac{n-k}{k}\left( \ln\left( n+\tfrac{1}{2} \right) -\ln\left( n-k+\tfrac{1}{2} \right) \right) \\ &= 1-\frac{n-k}{k}\ln\left( \frac{n+1/2}{n-k+1/2} \right) \end{align*}\]
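The last line translates directly into a constant-time function. A sketch (`something_approx` is a name chosen here):

```python
import math

def something_approx(n, k):
    # O(1): 1 - ((n - k)/k) * ln((n + 1/2) / (n - k + 1/2))
    return 1.0 - (n - k) / k * math.log((n + 0.5) / (n - k + 0.5))
```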

Result

The result of all of this is a high-quality constant-time approximation that replaces the linear-time operation. No special mathematical ability was necessary, just some simple algebraic manipulation; the real magic was provided by the tools (Wolfram Alpha and Wikipedia). The only part that required a little intuition was noticing that \(n_i\) and \(k_i\) were updated by the same function and could therefore be rewritten in terms of each other and the initial conditions, \(n\) and \(k\). These types of series pop up a lot. Sometimes the series is a simple arithmetic one whose replacement is obvious; when it's not, try to simplify the loops and lean on a CAS to simplify the equations your code models. Finding these constant-time algorithmic speedups is rather rewarding and may yield huge performance improvements in your codebase.

Finally, how good is the approximation? Try it out for yourself. All of the visualizations are interactive.

Exact vs. Approximate Surface

[Interactive surface plot, with “Exact” and “Approximation” views.]

Notice that the surface created by the approximation is virtually indistinguishable from the exact surface by visual inspection. To see just how good the approximation is, a difference plot is in order. The \(z\)-values are created by the function:

\[z\left( n, k \right) = -\operatorname{sign}\left( A\left( n, k \right) - E\left( n, k \right) \right) \log_{10}\left( 10^{-300} + \left| A\left( n, k \right) - E\left( n, k \right) \right| \right)\]

where \(A\left( n, k \right)\) is the constant-time approximation function and \(E\left( n, k \right)\) is the exact function. \(\epsilon = 10^{-300}\) is added to the absolute value of the difference in order to avoid taking the log of zero. The magnitude of a \(z\)-value represents the number of digits of precision of the approximation, and the sign indicates whether the approximation overestimates (\(+\)) or underestimates (\(-\)).
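In code, the plotted quantity can be computed as below, where `something` is the exact \(O(K)\) loop from earlier and `something_approx` is the constant-time sketch:

```python
import math

def z(n, k):
    d = something_approx(n, k) - something(n, k)  # A - E
    # Positive z: overestimate; negative z: underestimate.
    # |z| is roughly the number of digits of precision.
    return -math.copysign(1.0, d) * math.log10(1e-300 + abs(d))
```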

Runtime

The ratio of the runtimes of the exact and approximate algorithms is shown on a log scale. The plot shows nearly a \(10^9\times\) speedup of the approximate algorithm over the exact one for \(n = 10^7\) and \(k = 10^7\).
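A rough way to reproduce that kind of measurement, using the loop and approximation sketches above (actual numbers will vary by machine and implementation):

```python
import timeit

n = k = 10**7
t_exact = timeit.timeit(lambda: something(n, k), number=1)
t_approx = timeit.timeit(lambda: something_approx(n, k), number=1000) / 1000
print(t_exact / t_approx)  # runtime ratio, exact over approximate
```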

Data

All benchmarking source data used to create these visualizations is available here: