A few years ago, just before going on a vacation, I came across some code in a codebase similar to this:
I stared at it for a second or two and had two thoughts:
- This is an time algorithm.
- This can be computed (at least approximately) in just time.
To battle boredom on the plane, I confirmed this with the help of a few free tools: Wikipedia and Wolfram Alpha. With these tools and some basic algebra I found a pretty decent constant-time approximation capable of replacing our code.
The purpose of this post isn’t to show some revolutionary new idea or result, but to share the process of tracking down and optimizing certain types of algorithmic inefficiencies in a codebase.
- Formalize the Inefficiency
- Use a C.A.S.
- Update and Simplify
- Repeat (if Necessary)
Formalize the Inefficiency
Inside the function, the function does all of the computation (recursively). merely calls the function with the initial parameters: , and .
Notice that inside the function , the and variables are updated via the same function, the “decrement by 1” function. This is an indication that and can be written in terms of each other, given the initial conditions, and . So, the function can be rewritten as:
In this variant, the variable is removed and the variable becomes the variable. Notice how the calculation for is the only place in the old version that is used. Therefore, the function in the new version needs to be rewritten in terms of and the initial conditions and , and it needs to work like in the old version. In the first loop of the old version the initial values of and are and , respectively. Therefore, . By rearranging, and changing to , it is necessarily the case that .
Finally, when the indices are inverted, it’s easy to see the formalism take shape. The code looks like:
which can be specified mathematically as:
Use a C.A.S.
Given the formal definition, a computer algebra system can be used to simplify the equation. Wolfram Alpha will do the job just fine. The above definition can be transformed to the Wolfram Alpha input: 1/k * Sum[i/(n - k + i), i, 1, k].
On the page, Wolfram Alpha provides an exact result:
Without manipulating at all, some simplifications can be made via simple algebraic manipulations, leading to:
With an alternate form of in hand, it’s time to do a little research. Wolfram Alpha provides a note stating that is the nth derivative of the digamma function. , since the zeroth derivative of a function is the function itself. So, it makes sense to search Wikipedia for the digamma function. The article states that for any , the result of the digamma function lies in the interval:
and that “the exponential is approximately for large .” This means that, for large , .
Update and Simplify
So, plugging in the information from Wikipedia into the output from Wolfram Alpha and simplifying, we get:
The result of all of this is a high-quality constant-time approximation that replaces the linear time operation. No special mathematical ability was necessary, just some simple algebraic manipulations. The real magic was provided by the tools (Wolfram Alpha and Wikipedia). The only part that required a little intuition was noticing that and were updated by the same function and could thus be rewritten in terms of each other and the initial conditions, and . These types of series pop up a lot; sometimes it is a simple arithmetic series whose replacement is obvious. When it’s not as obvious, try to simplify the loops, and lean on tools like CAS tools to simplify the equations modeled by your code. Finding these constant time algorithmic speed ups is rather rewarding and may yield huge performance improvements in your codebase.
Finally, how good is the approximation? Try it out for yourself. All of the visualizations are interactive.
Exact vs Approximate Surface
Notice that the surface created by the approximation is virtually indistinguishable by visual inspection. To see just how good the function is, a difference plot is in order. The -values are created by the function:
where is the constant time approximation function and the is the exact function. is added to the absolute value of the difference in order to avoid taking the log of zero. The magnitude of the -values represent the number of digits of precision of the approximation and the sign represents whether the approximation overestimates ( + ) or underestimates ( - ).
The ratio of runtime of the exact and the approximate algorithms is shown on a log scale. This plot shows that there is nearly a speed up of the approximate over the exact algorithm’s runtime for and .
All benchmarking source data used to create these visualizations is available here: