Robustly computing the centroid for a point set

We ran into an interesting problem at work the other day, where we had some geometry partitioning code that gave vastly different results on the same tessellated plane, depending on the scale factor that was applied to the plane. That would be fine, of course, if the code was taking size into consideration when partitioning, but this code wasn’t! After looking around it turned out that there was a problem with the code that incrementally updated the centroid for the current partition. (Said centroid being used in the partitioning code to find triangles spatially close to each other.)

Recall that the centroid C for N points P[i] is given by

C = (P[1] + P[2] + … + P[N]) / N.

When we don’t know the number of points beforehand, we can iteratively compute the (K+1)th centroid from the Kth centroid by

C[K+1] = (C[K] * K + P[K+1]) / (K + 1).

If that’s unclear, what this does is take the previous centroid, undoes the division by K from the previous iteration, adds the new point in, and redivides. This is the approach that was used in the code in question.

Now, this is bad for two reasons. The first problem is that the undo-redo multiplication and divide on each iteration will accumulate rounding errors. That is really of minor significance here, however. Worse is the magnitudal difference between C[K] * K and P[K+1]. As the iterations increase, C[K] * K will be (much) larger than P[K+1] and as such we will drop digits of P[K+1] during normalization, before the addition takes place. Worst case, P[K+1] could completely disappear, giving C[K+1] ≈ C[K] as a result!

A solution, then, is to rewrite the iterative expression as

C[K+1] = C[K] * (K / (K + 1)) + P[K+1] / (K + 1).

This results in the addition of two points of roughly equal magnitude, and behaves well regardless of the magnitude or position of the input points.

6 thoughts on “Robustly computing the centroid for a point set”

  1. You could do Kahan summation and store the total (K * C[k]) instead of the centroid (C[k]) – then you don’t have to do the division and multiplication to reconstitute the total. It just requires a little more storage.

  2. Hi Ian. I assume what you meant was to store K and the sum of C[K] separately, maintaining the sum using Kahan summation and dividing at the end, right? That would indeed improve things some over the naive iterative version, but still leaves the problem that most of the digits of P[K+1] can be dropped when added to the running sum, for large input meshes. It’s better not to allow the numbers to vary greatly in magnitude in the first place, as in the suggested rewrite.

Leave a Reply