block by daniarleagk 278709dedf09451b794ff72c4c05cda1

Z-Order curve range query

Full Screen

This example is inspired by the post Z-Order Brush D3 Example. The Z-Order is a space filling curve (SFC) and a commonly used technique in databases to store and query multidimensional point data (Tropf et al., Orenstein et al., Geo-Wave). The basic query pattern is a multidimensional rectangular query. Since SFC generates a linear order for a data sets, the query box should be split into minimal number of segments (ranges) for efficient processing of rectangular query.

For a given query rectangle, we generate a minimal set of ranges that cover all points inside. The goal is to omit range segments that lie outside the query box, therefore avoiding unnecessary database scans (Tropf et al., Orenstein et al.). Data points can be stored and indexed according to their Z-order keys in any database that supports efficient range scans (e.g. relational Database, Apache HBase or Apache Accumulo, both are open source Google Bigtable implementation and support efficient range and parallel scans).

The algorithm follows “divide & conquer” pattern. We assume bit strings as an input. This algorithm is called with a map function as parameter and a query box represented by bit strings of lower left corner and upper right corner. For the sake of simplicity, we use a bit array (bit string) as a map function. E.g. [0,1,0,1,0,1,0] is a “classical” pattern to interleave 4 bits numbers.

The main idea of this algorithm is to check if the query box consists solely of one continuous curve segment. If this true, we stop and return a segment (in worst case this can be a single cell), otherwise we “cut” the query box along the hyperplane, resulting two smaller query boxes. Then we conduct the our algorithm on both of them. After processing both sides, we check if the segments can be combined in one continuous segment. In worst case we return a set (list) of single cell addresses, that covered by the query box (e.g. consider a horizontal line query).

Splitting algorithm can be described as recursive routine with the following steps (function getRanges(hyperPlanePosition, queryRec, mapFunction, highstBit)) :

  1. Check if a query box is a point: check if query box is a point. If yes, return this point as a segment.
  2. Check if a query box consists of a one continuous segment:We use the following fact to implement this check. If the Z-Bit-String of lower left point has the same prefix (that can be empty) as an upper right point and lower left corner has a suffix 00000…0000 and right upper corner has a suffix 11111…11111. If yes, return this segment.
  3. CutTheBox :( function cutBox(hyperplaneDimension, index, queryRec, mapFunction)) : “cuts” the box in two parts. Before we run this procedure, we compute the index of a first position where Z-Bit-String of lower left corner and Z-Bit-String of an upper right corner are different (and a suffix does not follow the pattern above). Then we define the hyperplane (dimension) for this index position and compute two new query boxes (“left” from hyperplane and “right” from hyperplane)
  4. Recursive Call: we call recursively for our procedure for a two new small query boxes (“left” call and “right” call).
  5. Merge: In last step we merge ranges from a both recursive calls in one list. Additionally, we check the last segment of the “left” call with a last segment of a “right” call whether they can be combined in one continuous segment if it could not. We append “right” list to the “left” one.