I was working on fixing a bug in Aloha when I came across a usage of Scala’s combinations function in the SeqLike trait. I’d never used it before and was a little disappointed when I found that:

If there is more than one way to generate the same subsequence, only one will be returned.

What I thought (and hoped) it did was create the list of k-subsequences. Nope. So I figured I’d write something that actually computes what I wanted. I wanted four things in the implementation:

  1. Accuracy
  2. Speed
  3. Low Space Overhead
  4. Thread-safety

Accuracy will be discussed below the code. Click here for that discussion. Speed is also an obvious concern. To ensure iteration is fast, I used an adaptation of Nils Pipenbrinck’s response to the stack overflow post Creating multiple numbers with certain number of bits set. The algorithm employs bit hacks and takes 5 64-bit long operations to find a the next value and k masking operations to build the k-subsequence. So the total runtime is \(O\left( k \right)\). The total auxiliary space is 2 64-bit integers and one AtomicLong for a total of \(O\left( 1 \right)\) auxiliary space.

The last concern is thread-safety. While the algorithm produces iterators that are thread-safe, I still want this to be fast. So I use AtomicLong and CAS operations to avoid blocking.

Code

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
import java.util.concurrent.atomic.AtomicLong
import java.util.concurrent.atomic.AtomicBoolean

object SubSeqIterator {
  private val Masks = 0 to 62 map { i => 1L << i }

  def apply[A](values: IndexedSeq[A], k: Int): Iterator[Seq[A]] =
    if (0 == k) OneIterator[A]()
    else if (k > values.size) Iterator.empty
    else new SubSeqIterator(values, k)

  case class OneIterator[A]() extends Iterator[Seq[A]] {
    val more = new AtomicBoolean(true)
    def hasNext: Boolean = more.get
    def next(): Seq[A] = 
      if (more.getAndSet(false)) Seq[A]()
      else throw new NoSuchElementException
  }

  case class SubSeqIterator[A](values: IndexedSeq[A], k: Int) 
  extends Iterator[Seq[A]] {
    require(values.size <= 63)
    require(k <= values.size)

    private[this] val current = new AtomicLong(smallest(k))
    private[this] val last = largest(values.size, k)
    private[this] val numValues = values.size

    def hasNext: Boolean = current.get <= last

    def next(): IndexedSeq[A] = {
      var c = current.get
      var n = 0L
      if (c <= last) {
        n = next(c)
        while (c <= last && !current.compareAndSet(c, n)) {
          c = current.get
          n = next(c)
        }
      }
      if (c <= last) subset(c)
      else throw new NoSuchElementException
    }

    protected[this] def subset(n: Long) = {
      var i = 0
      var els = values.companion.empty[A]
      while(i < numValues) {
        if (0 != (Masks(i) & n))
          els = els :+ values(i)
        i += 1
      }
      els
    }

    protected[this] def largest(n: Int, k: Int): Long = 
      smallest(k) << (n - k)
    protected[this] def smallest(n: Int): Long = (1L << n) - 1
    protected[this] def next(x: Long): Long = {
      if (x == 0) 0
      else {
        val smallest = x & -x
        val ripple = x + smallest
        val newSmallest = ripple & -ripple
        val ones = ((newSmallest / smallest) >> 1) - 1
        ripple | ones
      }
    }
  }
}

Accuracy

To test accuracy, we want to make sure that the number of k-subsequences is \(\left( \begin{matrix} n \\ k \end{matrix} \right)\), which is equivalent to the number k-element subsets of of n-element set. See the wikipedia article on the binomial coefficient for why this is true. The following code calculates the binomial coefficients:

1
2
3
4
5
6
7
8
9
10
11
import scala.language.postfixOps

implicit class Ops(val n: Long) { 
  def ! = { 
    def h(x: Long, s: Long): Long = 
      if (x <= 1) s 
      else h(x-1, s * x)
    h(n, 1)
  }
  def choose(k: Long) = (n!) / ((k!)*((n-k)!))
}

Then we can test that the proper number of k-subsequences are outputted.

1
2
3
4
5
6
7
8
9
10
11
val tests = for {
              n <- 1 to 20
              k <- 0 to n
              numSubseqs = SubSeqIterator(1 to n, k).size 
            } yield (n, k, numSubseqs, n choose k)

// Find places where # of k-subsequences aren't n choose k. 
val wrong = tests.filter(x => x._3 != x._4)

// Make sure there's nothing wrong.
assert(wrong.isEmpty, wrong.mkString("\n"))

Conclusion

“Well, there you have it.” A fast, thread-safe, memory efficient k-subsequence iterator. Enjoy!