Self-Improving Algorithms for Delaunay Triangulations

Nir Ailon, Bernard Chazelle, Ken Clarkson*, Ding Liu, C. Seshadhri, Wolfgang Mulzer

*IBM Almaden
(all others: Princeton Univ., or both)

Outline

Self-improving algorithms
...for sorting

Sketch of analysis

...for Delaunay triangulation

Sketch of algorithm

Sequences of Computational Problems

A computational problem: given dataset `I`, compute `f(I)`

`I=` a set of `n` values,
`f(I)=` their sorted order, the rank of each input value
`I=` a set of `n` points in the plane,
`f(I)=` their Delaunay triangulation

Problem sequence:

Given datasets `I_1`, `I_2`, `I_3, ...` in turn,
Compute `f()` of each in turn

Sometimes the `I_k` are like each other in some way

Self-Improvement

We show:

When computing `f(I_k)`,
experience of computing `f(I_1)`, `f(I_2), ..., f(I_{k-1})` can sometimes help
More precisely: what helps is the previous datasets `I_1...I_{k-1}`, plus some additional work on them

We found self-improving algorithms for sorting and Delaunay triangulations

Self-improvement here means using past experience to run faster

An instance of learning
Often "learning" = "using past experience to classify better"
But generally, "learning" = "using past experience to perform better"

Random Data and Comparisons

We assume that the datasets `I_k` are random, with the same distribution

The random variables are entire sets,
not individual values or points

Thus each `f(I_k)` is also a random variable

Our algorithms are comparison-based

They ask a series of yes/no questions

How many comparisons are needed?

Identifying the Output using Comparisons

Enough questions must be asked about instance `I` to tell the different `f(I)` apart

If there are eight possible outputs, two yes/no questions about the input cannot be enough: two answers distinguish at most four outcomes

The set of comparisons done by the algorithm determines `f(I)`
To use as few comparisons as possible,
use more for `f(I)` that are less likely

Entropy Lower Bounds

These ideas suggest that the entropy of `f(I)` determines the number of comparisons needed
Suppose you want to send `f(I_1), f(I_2),...` over a communication channel
You could encode each `f(I)` by the bit sequence of the comparison results
The best encoding takes at least the entropy `H(f(I)) := sum_y Pr(y) log(1// Pr(y))` bits
So: the optimal expected number of comparisons is at least `H(f(I))`
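
As a small numeric illustration of the entropy formula above, the following sketch computes `H` in bits for two distributions over eight possible outputs. The function name `entropy_bits` and the two example distributions are assumptions made for illustration, not part of the talk.

```python
import math

def entropy_bits(probs):
    """Shannon entropy in bits: sum over outcomes of Pr(y) * log2(1 / Pr(y))."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

# Eight equally likely outputs: three yes/no answers are needed on average.
uniform = [1 / 8] * 8

# A skewed distribution over eight outputs: likely outputs can be identified
# with few questions, so the average number of questions drops below three.
skewed = [1 / 2, 1 / 4, 1 / 8, 1 / 16, 1 / 32, 1 / 64, 1 / 128, 1 / 128]

if __name__ == "__main__":
    print(entropy_bits(uniform))
    print(entropy_bits(skewed))
```

The skewed case is the reason to spend more comparisons on unlikely outputs and fewer on likely ones.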

Meeting the Entropy Lower Bounds

We give algorithms that use `O(n + H(Y))` expected comparisons, where `Y := f(I)` is the output

...and `text{Work} = O(text{Comparisons})`
That is, optimal
With a lot of storage: `Theta(n^2)`

A tradeoff: for given `epsilon in (0,1]`,

`1//epsilon` times the `O(n + H(Y))` comparisons
`n^{1+epsilon}` training instances : the `I_1...I_{k-1}`
`n^{1+epsilon} log n` space

These Results Are Not About

Using any structure within a given instance

Such as, the data is nearly sorted

Assuming that the distribution of the instances is known
Assuming that the distribution has any special properties

(That is, beyond the independence condition described next)

Using training instances as random samples

(Well, it kind of is, but not quite)

An Additional Condition: Independence

For input set `I = {x_1, x_2,... , x_n}`,
we also require each `x_i` to be an independent random variable

This does not mean the `x_i` are identically distributed, or that the distributions are known

That is, `I` has a product distribution `D := prod_i bb D_i`

While other restrictions might also help,
some additional condition on `D` is required for good bounds

We show that for general distributions `D`, exponential space is needed for target running time `O(n + H(Y))`

Sorting : The Typical Set `V`

The sorting algorithm uses a set `V` of "typical" values, and a collection of search trees
`V` is built as follows:

Take `lambda` training instances `I_1...I_{lambda}`

`lambda := c log n`, for value `c` to be determined

Merge all values `I_1 cup I_2 cup ... cup I_{lambda}` to make a sorted list `J` with `lambda n` values
We put every `lambda`-th value of `J` into the list `V`, which therefore has `n` values

`V` represents the overall distribution of `I`

We expect one value of `I` in each interval `[v_j, v_{j+1})`
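
The construction of `V` can be sketched as follows. This is a minimal illustration, assuming training instances are plain lists of floats; the name `build_typical_set`, the constant `c = 3`, and the uniform random data are hypothetical choices, not taken from the talk.

```python
import math
import random

def build_typical_set(training_instances):
    """Merge the training instances into a sorted list J, keep every lambda-th value."""
    lam = len(training_instances)
    merged = sorted(x for instance in training_instances for x in instance)
    # Positions lam-1, 2*lam-1, ... are the lam-th, 2*lam-th, ... values of J.
    return merged[lam - 1 :: lam]

rng = random.Random(0)
n = 16
lam = 3 * math.ceil(math.log2(n))  # lambda = c log n, with c = 3 chosen arbitrarily
training = [[rng.random() for _ in range(n)] for _ in range(lam)]
V = build_typical_set(training)

if __name__ == "__main__":
    print(len(V), V[:4])
```

Since `J` has `lambda n` values, keeping every `lambda`-th one leaves exactly `n` values in `V`.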

Sorting : Search Trees `T_i`

To use `V`,
we build a binary search tree `T_i` on `V`
for each input distribution `D_i`
`T_i` is built so that its search cost for `x_i` is the optimal `H(D_i)`

More precisely:

The random variable associated with the search is the bucket `b_i:= [v_j, v_{j+1})` containing `x_i`
The search cost is `H(b_i) le H(D_i)`
Additional training instances are used to estimate `D_i` and build `T_i`
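
To make the quantity `H(b_i)` concrete, the sketch below estimates it from samples of a single `x_i`. It only computes the entropy that `T_i` is meant to match; it does not build the tree itself. The helper names `bucket_index` and `empirical_bucket_entropy` are assumptions for illustration.

```python
import bisect
import math
from collections import Counter

def bucket_index(V, x):
    """Index j of the bucket [v_j, v_{j+1}) containing x (0 if x is below all of V)."""
    return bisect.bisect_right(V, x)

def empirical_bucket_entropy(V, samples):
    """Entropy, in bits, of the observed bucket frequencies of the samples."""
    counts = Counter(bucket_index(V, x) for x in samples)
    total = len(samples)
    return sum((c / total) * math.log2(total / c) for c in counts.values())

if __name__ == "__main__":
    V = [1, 2, 3, 4]
    # x_i always lands in the same bucket: a search tree can find it at O(1) cost.
    print(empirical_bucket_entropy(V, [1.5, 1.6, 1.7, 1.8]))
    # x_i spread evenly over four buckets: about two comparisons are needed.
    print(empirical_bucket_entropy(V, [0.5, 1.5, 2.5, 3.5]))
```

A distribution `D_i` concentrated on few buckets gives a small `H(b_i)`, which is what lets `T_i` beat a plain `log n` binary search.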

Sorting : The Algorithm

The algorithm is:

For each `i=1..n`, locate `x_i` in the buckets using `T_i`

Sort the set of values falling in each bucket

`O(1)` values/bucket implies `O(1)` work/bucket

Total work for sorts in all buckets is `O(n)`, searches are entropy-optimal

So we're done, right?
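
The steps of the algorithm above can be sketched as below. One loud assumption: the search in `T_i` is replaced here by ordinary binary search on `V`, which costs about `log n` comparisons per value rather than `H(b_i)`, so this sketch shows the structure of the algorithm and not its running time. The function name is hypothetical.

```python
import bisect

def sort_with_typical_set(V, instance):
    """Bucket each value by its position in V, sort each bucket, concatenate."""
    buckets = [[] for _ in range(len(V) + 1)]
    for x in instance:
        # Stand-in for locating x_i with T_i: plain binary search on V.
        buckets[bisect.bisect_right(V, x)].append(x)
    result = []
    for bucket in buckets:
        # Buckets are expected to hold O(1) values, so each sort is cheap.
        bucket.sort()
        result.extend(bucket)
    return result

if __name__ == "__main__":
    V = [0.25, 0.5, 0.75]
    print(sort_with_typical_set(V, [0.9, 0.1, 0.6, 0.3, 0.55]))
```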

Analysis

We're not quite done
Although the `T_i` are individually optimal,
it hasn't been shown that their total cost is small
It remains to show that `sum_i H(b_i) = O(n+H(Y))`

Independence implies that `H(b_1, b_2,...b_n) = sum_i H(b_i)`

That is, there is at least as much information in the output ranking `Y=f(I)` as in the bucket assignments, up to additive `O(n)`

Analysis via Encoding

Suppose `b := (b_1,...,b_n)` can be computed from the output ranking `Y`,
using `O(n)` additional comparisons
Then the total number of bits needed to encode `b` is at most `O(n + H(Y))`

Encoding of `b` is: a good encoding of `Y`, plus bits representing comparison outcomes

`b` can be computed from such an encoding as desired:

Sort the values `x_i` using `Y`;
Merge that sorted list with `V`
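
The decoding step above, recovering `b` from `Y` by a merge, can be sketched as follows. Here the ranking is represented as the list of input indices in sorted order, which is an assumption about the encoding of `Y`; the function name and the comparison counter are also illustrative.

```python
def buckets_from_ranking(values, ranking, V):
    """Recover each value's bucket index by merging the sorted input with sorted V.

    ranking[k] is the index in `values` of the k-th smallest value.
    Returns the bucket list and the number of comparisons against V.
    """
    buckets = [0] * len(values)
    j = 0  # number of elements of V known to be <= the current value
    comparisons = 0
    for idx in ranking:
        x = values[idx]
        while j < len(V):
            comparisons += 1
            if V[j] <= x:
                j += 1
            else:
                break
        buckets[idx] = j
    return buckets, comparisons

if __name__ == "__main__":
    values = [0.9, 0.1, 0.6, 0.3]
    ranking = sorted(range(len(values)), key=values.__getitem__)
    print(buckets_from_ranking(values, ranking, [0.25, 0.5, 0.75]))
```

Each comparison either advances the pointer into `V` or finishes one input value, so at most `|V| + n = O(n)` comparisons are made.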

Delaunay Triangulations

Given a set `I` of points,
its Delaunay triangulation is a planar subdivision whose vertices are the points in `I`
If a triangle `t` has:

Vertices from `I`, and
No points of `I` in its circumscribed circle

Then `t` is a Delaunay triangle
A Delaunay triangulation comprises all such Delaunay triangles

(Ignoring the unbounded parts)
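
The definition above can be checked directly by brute force. The sketch below tests every triple of points with the standard in-circle determinant; it is a roughly `O(n^4)` illustration of the definition, not the algorithm of this talk. All function names are hypothetical, and points are assumed to be tuples in general position with exactly representable coordinates.

```python
from itertools import combinations

def orientation(a, b, c):
    """Positive if a, b, c are counterclockwise, negative if clockwise, 0 if collinear."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def in_circumcircle(a, b, c, d):
    """True if d lies strictly inside the circle through non-collinear a, b, c."""
    rows = []
    for p in (a, b, c):
        dx, dy = p[0] - d[0], p[1] - d[1]
        rows.append((dx, dy, dx * dx + dy * dy))
    (ax, ay, aw), (bx, by, bw), (cx, cy, cw) = rows
    det = (ax * (by * cw - bw * cy)
           - ay * (bx * cw - bw * cx)
           + aw * (bx * cy - by * cx))
    # The sign convention of det assumes counterclockwise a, b, c; correct for clockwise.
    return det * orientation(a, b, c) > 0

def delaunay_triangles_brute_force(points):
    """All triangles on the points whose circumscribed circle contains no other point."""
    triangles = []
    for a, b, c in combinations(points, 3):
        if orientation(a, b, c) == 0:
            continue  # collinear triples define no circle
        others = (d for d in points if d not in (a, b, c))
        if not any(in_circumcircle(a, b, c, d) for d in others):
            triangles.append((a, b, c))
    return triangles

if __name__ == "__main__":
    pts = [(0, 0), (4, 0), (2, 3), (2, 1)]
    for t in delaunay_triangles_brute_force(pts):
        print(t)
```

In the example, the point `(2, 1)` lies inside the circle through the other three, so the outer triangle is rejected and the three triangles that use `(2, 1)` remain.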

Sorting vs. Triangulation

Delaunay triangulation is like sorting, only more complicated

Actually, sorting can be reduced to Delaunay triangulation

That is:

We can view sorting as:
find all open intervals `(x_i, x_{i'})` that contain no values of `I`
We can view finding the Delaunay triangulation as:
find all disks circumscribing triples `{p_i, p_{i'}, p_{i''}}` that contain no points of `I`

The Delaunay disks

Our algorithm and analysis for triangulation generalize those for sorting

Triangulation: the Typical Set `V`

As for sorting, our algorithm for triangulation also builds and uses a "typical" set `V`
`V` is a subset of `J := I_1 cup I_2 cup ... cup I_lambda`, `lambda = O(log n)`
`V` is a range space `epsilon`-net of `J`, with `epsilon := 1//n`

Such a net has the following property, for any disk `d`:
If disk `d` contains no points of `V`
Then `d` contains fewer than `epsilon lambda n = O(lambda)` points of `J`
Such sets, of size `O(1//epsilon) = O(n)` exist [MRW90][CV07]
Slightly larger random subsets are also `epsilon`-nets [HW97][C97]

More About `V`

Any disk containing no points of `V` will contain an expected `O(1)` points of `I`
We use `T(V)`, the Delaunay triangulation of `V`
By construction, each Delaunay disk of `V` will contain expected `O(1)` points of `I`

Triangulation: Search Trees `T_i`

As for sorting, our algorithm for triangulation also builds and uses optimal search data structures `T_i`
For sorting, `T_i` was a binary search tree
For triangulation, `T_i` is a data structure for planar point location
`T_i` allows fast search for the location of `p_i` in the triangulation of `V`
The triangle `b_i` containing `p_i` is a random variable, since `p_i` is
The fastest possible expected time to determine `b_i` is `H(b_i)`
`T_i` is a data structure with such search time [AMMW07]

Triangulation: the Algorithm

For sorting, `T_i` is used to bucket `x_i`, and subsorts are done in each bucket
For triangulation, `T_i` is used to find `b_i`, and that information is used to allocate `p_i` to `O(1)` subtriangulation subproblems

Each subproblem is built from points in three Delaunay disks of `V`

Each subtriangulation is on `O(1)` expected points
The subtriangulations can be put together, to get a triangulation of `V cup I`
For sorting, it is trivial to get the sorted version of `I` from the sorted list `V cup I`
For triangulation, we apply a linear-time randomized algorithm to get `T(I)` from `T(V cup I)` [ChDHMST02]

Analysis: Encoding

As for sorting, even though the various steps are optimal in some respects, we're not done
To show optimality, we need to show that the entropy of `b` is `O(n + H(T(I)))`
As before: from `T(I)` we can obtain the `b_i` using an algorithm that needs `O(n)` comparisons, and this implies the result
For sorting, a key step was merging the sorted lists `V` and `I`
For triangulation, the analog is merging the triangulations `T(V)` and `T(I)`

Linear time using a polytope intersection algorithm [Ch92]

From `T(V cup I)`, we can obtain the `b_i` without too much pain

All the Analogies

Sorting	Delaunay Triangulation
Intervals `(x_i, x_{i'})` containing no values of `I`	Delaunay disks
Typical set `V`	Range space `epsilon`-net `V` [MRW90, CV07], Ranges are disks, `epsilon = 1//n`
`log n` training instance points in each bucket	`log n` training instance points in each disk
Expect `O(1)` values of `I` in each bucket	Expect `O(1)` points of `I` in each Delaunay disk of `V`
Optimal weighted binary trees `T_i`	Entropy-optimal planar point location data structures `T_i` [AMMW07]
Sorting within buckets `->` sorted list of `V cup I`	Triangulation within small regions `-> T(V cup I)`
Removal of `V` from sorted `V cup I` (trivial)	Construction of `T(I)` from `T(V cup I)` [ChDHMST02]
In analysis: merge of sorted `V` and `I`	In analysis: merge of `T(V)` and `T(I)` [Ch92]
In analysis: recovery of buckets `b_i` from sorted `V cup I` (trivial)	In analysis: recovery of triangles `b_i` in `T(V)` from `T(V cup I)`

Concluding Remarks

The results are pleasingly tight, but maybe a little too expensive
Novelty for me: the coding arguments
Are there stronger conditions that imply cheaper algorithms?
Are there broader conditions that allow interesting results?

Without full independence of each `x_i`, for example

Thank you for your attention
