Type-Parametric Segment Trees In Scala

stree_small

In my post Functional Solution to Range Minimum Query (RMQ) using Segment Trees in Scala I explained how to efficiently solve the problem of finding the minimum element of an arbitrary sub-vector of integers. In that article I used a Segment Tree in which each node and leaf has an integer value, which is the minimum of the sub-vector defined by the corresponding range. In this article, I will explain how to use Segment Trees to solve a more general class of problems. Before introducing the solution, we need to answer the question, what are the functions, other the min:Int+ -> Int,  which can be computed efficiently using Segment Trees? Intuitively, some other functions are max(), sum(), avg(), probability(), etc. What do these functions have in common? It is the fact that they may be computed over each element in a given range or, more efficiently, using intermediate results, like their value computed over defined sub-ranges. In my article about RMQ we have seen that an efficient solution to the range minimum problem is based on a Segment Tree which stores the minimum of certain sub-ranges of the given integer vector. Using this technique, instead of iterating over the whole given sub-vector of n elements it is possible to execute only log2(n) computation steps. The same applies to max(), sum(), avg(), probability() computed over a range, and other similar functions, when their computation is optimised using Segment Trees.

In general, the functions we are targeting with Type-Parametric Segment Trees can be defined like this.

Definition
Let v be a Vector of n elements of type X, say, val v: Vector[X], and f a function f:X+ -> V. Function f can be solved efficiently using a Type-Parametric Segment Tree if and only if:
g: (V x V) -> V, such that
∀n >= 2, ∀i ∈ {0, n-2}    f(x0, x1, …, xn-1) = g(f(x0, x1, …, xi), f(xi+1, …, xn-1))                 (1)

Example: min()
Function min() clearly satisfies the definition above because the minimum of a set of elements can also be determined as the minimum of the minimum values computed over any two partitions of the same set.

Example: avg()
Let:
1) n ∈ Integer, such that n > 0;
2) val a = Vector[Integer], with a.length == n;
3) i, j ∈ Integer such that i, j >= 1 and i+j == n
4) avgn() the average computed over all n elements of vector a
5) avgi() the average computed over the first i elements of vector a
6) avgj() the average computed over the last j elements of vector a

We define our function g as follows:
7) g(avgi(), avgj()) = (i/n)(avgi()) + (j/n)(avgj())

Optional proof (only for the interested reader)
Proposition 1: avgn() = (i/n)avgi()+(j/n)avgj()
By definition of avg we have:
8) avgn() = (1/n)∑xn
Decomposing the sum:
9) (1/n)∑xn = (1/n)(∑xi+∑xj)
Using the distributive property:
10) (1/n)(∑xi+∑xj) = (1/n)∑xi + (1/n) ∑xj
But 1 = (i/i), therefore:
11) (1/n)∑xi = (i/n) (1/i) ∑xi
By definition of avg we have:
12) (1/i) ∑xi = avgi()
Combining 11) and 12) we now have:
13) (1/n)∑xi = (i/n)avgi()
In the same exact way it can easily be proved that:
14) (1/n)∑xj = (j/n)avgj()
Combining 10), 13) and 14) we have:
15) avgn() = (i/n)avgi()+(j/n)avgj()
This ends the proof.

We see from definition 7) that avg of n elements can be expressed in terms of its value on any two splits of the vector. To make the concept clearer, we will now see an example in Scala. First, we will see how the SegmentTree data structure can be redesigned in order to become type-parametric. For convenience’s sake, I will use images in this article, but the interested reader may find source code in github:
Type-parametric Segment Tree
Example with function AVG

The first thing to notice is the creation of dedicated class Range. A range class is already available in Scala, but its intended use is different from what we need for our SegmentTree. The below is a very light-weight implementation which allows to perform useful operations on ranges like, for example, intersection.

classrange
This is the implementation of the Segment Tree. The value of each node or leaf of a segment tree is parametric (class V).

If is worth noticing that even the function f:(V,V)=>V, which computes value V based on the partial results contained in the left and right sub-trees is also parametric. As we will see, this is required to achieve the desired degree of flexibility.
classsegmenttree
The build method of singleton object SegmentTree now contains two functions. Function xToV:(X)=>V takes a value from the original vector and builds the value of its corresponding leaf in the associated SegmentTree. Function fun:(V,V)=>V allows to compute value of a node, based on the partial results contained in its left and right sub-trees.
objectsegmenttree
In order to utilise the parametric segment tree to efficiently compute the average AVG over any given sub-vector of a given vector of integers, the following definitions are required:

  1. class Value which contains the average over a given sub-vector, and the length of such a sub-vector. We have seen above that this length is used by function fun:(V,V)=>V
  2. Method intToValue which, given an integer, builds a value having as avg the conversion to Float of the integer, and length equal to one.
  3. Method fun:(V,V)=>V, see proposition above.
    staverage

If we now use the test application ATAvg with the following input:

10 5
10 20 30 40 11 22 33 44 15 5
0 5
1 2
8 9
0 9
4 6

We obtain the following output:

22.17
25.00
10.00
23.00
22.00

It is easy to verify that it is correct.

Remark
The first row of the input contains the size of the vector, and the number of queries. The second row is the integer vector. Each subsequent row is a query (start and end index of the sub-vector). The output contains the avg value computed over each sub-vector in the query list.

This ends the explanation of the parametric segment tree. The interested reader may find the mathematics of segment trees in my previous article:
Functional Solution to Range Minimum Query (RMQ) using Segment Trees in Scala

If you find other interesting uses of this data structure to solve computation problems, I will be happy to receive your feedback. Thank you for the read, and Up Scala!!!

Functional Solution to Range Minimum Query (RMQ) using Segment Trees in Scala

Segment TreeRMQ

Given an array of N integers, computing the minimum over the whole array is an easy operation requiring N steps. For example, one can take the first value as a tentative minimum, and iterate through the array comparing the tentative minimum with the current value. Any time the current value is smaller than the tentative minimum, it becomes the new tentative minimum, until the end of the array.
Let us now consider the case where it’s not the minimum of the whole array which matters, but the minimum of an arbitrary sub-array of the original one. With the term sub-array I mean an array formed by M consecutive elements taken from the original array, with 1 <= M <= N
The brute-force approach would be to compute the minimum element for any given range. The Range Minimum Problem (RMQ) is about finding a more efficient solution to this problem. How can one do better than O(N)? One obvious way would be to pre-compute the minimum of sub-arrays and sacrifice space for performance. But, wait a moment, how many possible sub-arrays can be defined given an array of size N?
Given an array of size N there are N sub-arrays of size 1, N-1 sub-arrays of size 2, N-2 sub-arrays of size 3 (…), 2 sub-arrays of size N-1 and 1 sub-array of size N.
The total number of sub-arrays is therefore:
Number of sub-arrays with consecutive elements
which is bad, because this is space complexity O(N2). Is there another smarter way to tackle the problem? The answer is, of course, positive. Actually, there are a number of ways documented in the literature. This article will illustrate the technique known as “Segment Trees”. This is the idea that it is not necessary to pre-compute all the possible sub-arrays, but only some of them, and the other ones can be more quickly computed starting with the available partial results. Let us consider an example to clarify the idea. Before I go on, I need to do some explaining regarding the choice of Functional Programming (FP) data structures. Arrays are an imperative data structure which is not suitable for FP proper. The idea is that FP is about immutable values allowing for side-effect free computer programs. Arrays, instead, mimic the way a computer memory is structured, i.e. as a sequence of modifiable cells. The functional equivalent of arrays, in Scala, is Vectors. From now on, I will abide by the FP rules and use Vectors. I will now get back to our example. Given the vector:

val data = Vector(2,5,3,0,8,9,5,3,7,5,9,3)

if we knew already:

val min1 = min(Vector(2,5,3,0,8,9)) // min[0:5]
val min2 = min(Vector(5,3,7,5,9,3)) // min[6:11]

we would not need to scan the sub-arrays, because we could simply compute:

val min = math.min(min1, min2)

Segment Trees are balanced binary trees which are composed by nodes consisting of a range and a value, which is the pre-computed minimum of the sub-array corresponding to such a range. In order to see how this tree looks like, let us consider the following example:

val data = Vector(4,5,2,8,9,0,1,2,5,1,8,6,3) // N=13

Our Segment Tree would look like this (click on image to see it full size):

Example Segment Tree

The first thing to notice is that there are 25 ranges (nodes or leaves) in this tree, which is a lot less than 13(13+1)/2= 91
With higher values for N the difference would be even more noticeable. This number can be approximated in excess with this formula:

Approximation in excess of the number of ranges in a Segment Tree (nodes and leaves)

In our case it would give:

20 + 21 + 22 + 23 + 24 = 1 + 2 + 4 + 8 + 16 = 31

The approximation is in excess because the formula assumes that each node forks in two nodes, which is not always the case. To see how best is this idea than the brute force approach, let us consider a Vector with N=1000 elements. We have seen above that all possible sub-ranges are:

1000*(1000+1)/2 = 500'500

However, our Segment Tree will contain no more than:

1 + 2 + 4 + 8 + 16 + 32 + 64 + 128 + 256 + 512 + 1028 = 2’061

Remarkable achievement, isn’t it? We can now see how to query a Segment Tree in order to get the minimum value of any sub-range of the original vector. The query algorithm is very simple, if expressed recursively, which is the perfect way to write in a Functional Language like Scala. The idea is this: if the query range is the same as the root range, get the minimum value from there. Otherwise, query the sub-trees, each with the intersection of the query range and the respective left and right sub-ranges, recursively. The Scala code will speak for itself. Try it out and have fun! I will copy it here for convenience’s sake, but it is also available at this address:

https://github.com/maumorelli/alaraph/blob/master/hackerrank/src/com/alaraph/hackerrank/rmq/Solution.scala

RMQ-002
RMQ-003

The explanation of how to use the program, the input format and expected output can be found at:
https://www.hackerrank.com/challenges/range-minimum-query
Here is a brief excerpt:

Sample Input

10 5
10 20 30 40 11 22 33 44 15 5
0 5
1 2
8 9
0 9
4 6

Sample Output

10
20
5
5
11

We are now approaching the end of this article. I hope you had a good read. To conclude, I would like to draw your attention to the fact that the depth of the Segment Tree is only Log2(N)+1. This clearly implies that a Segment Tree can be visited very efficiently with a moderate number of recursive calls.

References

Hackerrank, (2014), https://www.hackerrank.com/challenges/range-minimum-query, accessed 01.08.2014

Topcoder, (2014), http://community.topcoder.com/tc?module=Static&d1=tutorials&d2=lowestCommonAncestor, accessed 01.08.2014

Wikipedia, (2014), https://en.wikipedia.org/wiki/Segment_tree, accessed 01.08.2014