# K-Length Subsequences

I was working on fixing a bug in
Aloha when I came across a usage of Scala’s
`combinations`

function in the SeqLike trait.
I’d never used it before and was a little disappointed when I found that:

If there is more than one way to generate the same subsequence, only one will be returned.

What I thought (*and hoped*) it did was create the list of *k*-subsequences.
Nope. So I figured I’d write something that actually computes what I wanted. I wanted four things in the
implementation:

- Accuracy
- Speed
- Low Space Overhead
- Thread-safety

Accuracy will be discussed below the code. Click here for that discussion. Speed is
also an obvious concern. To ensure iteration is fast, I used an adaptation of
Nils Pipenbrinck’s response to the stack overflow post
*Creating multiple numbers with certain number of bits set*.
The algorithm employs bit hacks and takes 5 64-bit long operations to find a the next value and *k*
masking operations to build the
*k*-subsequence. So the total runtime is . The
total auxiliary space is 2 64-bit integers and one
AtomicLong for a total of
auxiliary space.

The last concern is thread-safety. While the algorithm produces iterators that are thread-safe, I still want this to be fast. So I use AtomicLong and CAS operations to avoid blocking.

## Code

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70

import java.util.concurrent.atomic.AtomicLong
import java.util.concurrent.atomic.AtomicBoolean
object SubSeqIterator {
private val Masks = 0 to 62 map { i => 1L << i }
def apply[A](values: IndexedSeq[A], k: Int): Iterator[Seq[A]] =
if (0 == k) OneIterator[A]()
else if (k > values.size) Iterator.empty
else new SubSeqIterator(values, k)
case class OneIterator[A]() extends Iterator[Seq[A]] {
val more = new AtomicBoolean(true)
def hasNext: Boolean = more.get
def next(): Seq[A] =
if (more.getAndSet(false)) Seq[A]()
else throw new NoSuchElementException
}
case class SubSeqIterator[A](values: IndexedSeq[A], k: Int)
extends Iterator[Seq[A]] {
require(values.size <= 63)
require(k <= values.size)
private[this] val current = new AtomicLong(smallest(k))
private[this] val last = largest(values.size, k)
private[this] val numValues = values.size
def hasNext: Boolean = current.get <= last
def next(): IndexedSeq[A] = {
var c = current.get
var n = 0L
if (c <= last) {
n = next(c)
while (c <= last && !current.compareAndSet(c, n)) {
c = current.get
n = next(c)
}
}
if (c <= last) subset(c)
else throw new NoSuchElementException
}
protected[this] def subset(n: Long) = {
var i = 0
var els = values.companion.empty[A]
while(i < numValues) {
if (0 != (Masks(i) & n))
els = els :+ values(i)
i += 1
}
els
}
protected[this] def largest(n: Int, k: Int): Long =
smallest(k) << (n - k)
protected[this] def smallest(n: Int): Long = (1L << n) - 1
protected[this] def next(x: Long): Long = {
if (x == 0) 0
else {
val smallest = x & -x
val ripple = x + smallest
val newSmallest = ripple & -ripple
val ones = ((newSmallest / smallest) >> 1) - 1
ripple | ones
}
}
}
}

## Accuracy

To test accuracy, we want to make sure that the number of *k*-subsequences
is , which is equivalent to the number *k*-element subsets of
of *n*-element set. See the wikipedia article on the
binomial coefficient for why this
is true. The following code calculates the binomial coefficients:

1
2
3
4
5
6
7
8
9
10
11

import scala.language.postfixOps
implicit class Ops(val n: Long) {
def ! = {
def h(x: Long, s: Long): Long =
if (x <= 1) s
else h(x-1, s * x)
h(n, 1)
}
def choose(k: Long) = (n!) / ((k!)*((n-k)!))
}

Then we can test that the proper number of *k*-subsequences are outputted.

1
2
3
4
5
6
7
8
9
10
11

val tests = for {
n <- 1 to 20
k <- 0 to n
numSubseqs = SubSeqIterator(1 to n, k).size
} yield (n, k, numSubseqs, n choose k)
// Find places where # of k-subsequences aren't n choose k.
val wrong = tests.filter(x => x._3 != x._4)
// Make sure there's nothing wrong.
assert(wrong.isEmpty, wrong.mkString("\n"))

## Conclusion

*“Well, there you have it.”* A fast, thread-safe,
memory efficient *k*-subsequence iterator. *Enjoy!*