In my post Functional Solution to Range Minimum Query (RMQ) using Segment Trees in Scala I explained how to efficiently solve the problem of finding the minimum element of an arbitrary sub-vector of integers. In that article I used a Segment Tree in which each node and leaf has an integer value, which is the minimum of the sub-vector defined by the corresponding range. In this article, I will explain how to use Segment Trees to solve a more general class of problems. Before introducing the solution, we need to answer the question, what are the functions, other the min:Int+ -> Int, which can be computed efficiently using Segment Trees? Intuitively, some other functions are max(), sum(), avg(), probability(), etc. What do these functions have in common? It is the fact that they may be computed over each element in a given range or, more efficiently, using intermediate results, like their value computed over defined sub-ranges. In my article about RMQ we have seen that an efficient solution to the range minimum problem is based on a Segment Tree which stores the minimum of certain sub-ranges of the given integer vector. Using this technique, instead of iterating over the whole given sub-vector of n elements it is possible to execute only log2(n) computation steps. The same applies to max(), sum(), avg(), probability() computed over a range, and other similar functions, when their computation is optimised using Segment Trees.
In general, the functions we are targeting with Type-Parametric Segment Trees can be defined like this.
Definition
Let v be a Vector of n elements of type X, say, val v: Vector[X], and f a function f:X+ -> V. Function f can be solved efficiently using a Type-Parametric Segment Tree if and only if:
∃ g: (V x V) -> V, such that
∀n >= 2, ∀i ∈ {0, n-2} f(x0, x1, …, xn-1) = g(f(x0, x1, …, xi), f(xi+1, …, xn-1)) (1)
♦
Example: min()
Function min() clearly satisfies the definition above because the minimum of a set of elements can also be determined as the minimum of the minimum values computed over any two partitions of the same set.
Example: avg()
Let:
1) n ∈ Integer, such that n > 0;
2) val a = Vector[Integer], with a.length == n;
3) i, j ∈ Integer such that i, j >= 1 and i+j == n
4) avgn() the average computed over all n elements of vector a
5) avgi() the average computed over the first i elements of vector a
6) avgj() the average computed over the last j elements of vector a
We define our function g as follows:
7) g(avgi(), avgj()) = (i/n)(avgi()) + (j/n)(avgj())
Optional proof (only for the interested reader)
Proposition 1: avgn() = (i/n)avgi()+(j/n)avgj()
By definition of avg we have:
8) avgn() = (1/n)∑xn
Decomposing the sum:
9) (1/n)∑xn = (1/n)(∑xi+∑xj)
Using the distributive property:
10) (1/n)(∑xi+∑xj) = (1/n)∑xi + (1/n) ∑xj
But 1 = (i/i), therefore:
11) (1/n)∑xi = (i/n) (1/i) ∑xi
By definition of avg we have:
12) (1/i) ∑xi = avgi()
Combining 11) and 12) we now have:
13) (1/n)∑xi = (i/n)avgi()
In the same exact way it can easily be proved that:
14) (1/n)∑xj = (j/n)avgj()
Combining 10), 13) and 14) we have:
15) avgn() = (i/n)avgi()+(j/n)avgj()
This ends the proof.
♦
We see from definition 7) that avg of n elements can be expressed in terms of its value on any two splits of the vector. To make the concept clearer, we will now see an example in Scala. First, we will see how the SegmentTree data structure can be redesigned in order to become type-parametric. For convenience’s sake, I will use images in this article, but the interested reader may find source code in github:
Type-parametric Segment Tree
Example with function AVG
The first thing to notice is the creation of dedicated class Range. A range class is already available in Scala, but its intended use is different from what we need for our SegmentTree. The below is a very light-weight implementation which allows to perform useful operations on ranges like, for example, intersection.
This is the implementation of the Segment Tree. The value of each node or leaf of a segment tree is parametric (class V).
If is worth noticing that even the function f:(V,V)=>V, which computes value V based on the partial results contained in the left and right sub-trees is also parametric. As we will see, this is required to achieve the desired degree of flexibility.
The build method of singleton object SegmentTree now contains two functions. Function xToV:(X)=>V takes a value from the original vector and builds the value of its corresponding leaf in the associated SegmentTree. Function fun:(V,V)=>V allows to compute value of a node, based on the partial results contained in its left and right sub-trees.
In order to utilise the parametric segment tree to efficiently compute the average AVG over any given sub-vector of a given vector of integers, the following definitions are required:
- class Value which contains the average over a given sub-vector, and the length of such a sub-vector. We have seen above that this length is used by function fun:(V,V)=>V
- Method intToValue which, given an integer, builds a value having as avg the conversion to Float of the integer, and length equal to one.
- Method fun:(V,V)=>V, see proposition above.
If we now use the test application ATAvg with the following input:
10 5 10 20 30 40 11 22 33 44 15 5 0 5 1 2 8 9 0 9 4 6
We obtain the following output:
22.17 25.00 10.00 23.00 22.00
It is easy to verify that it is correct.
Remark
The first row of the input contains the size of the vector, and the number of queries. The second row is the integer vector. Each subsequent row is a query (start and end index of the sub-vector). The output contains the avg value computed over each sub-vector in the query list.
This ends the explanation of the parametric segment tree. The interested reader may find the mathematics of segment trees in my previous article:
Functional Solution to Range Minimum Query (RMQ) using Segment Trees in Scala
If you find other interesting uses of this data structure to solve computation problems, I will be happy to receive your feedback. Thank you for the read, and Up Scala!!!
♦