next up previous print clean
Next: The weighted mean Up: MEANS, MEDIANS, PERCENTILES AND Previous: MEANS, MEDIANS, PERCENTILES AND

Percentiles and Hoare's algorithm

The median is the 50-th percentile. After residuals are ordered from smallest to largest, the 90-th percentile is the value with 10% of the values above and 90% below. At SEP the default value for clipping plots of field data is at the 98th percentile. In other words, magnitudes above the 98-th percentile are plotted at the 98-th percentile. Any percentile is most easily defined if the population of values ai, for i=1,2,...,n has been sorted into order so that $a_i \le a_{i+1}$ for all i. Then the 90-th percentile is ak where k=(90n)/100.

We can save much work by using Hoare's algorithm which does not fully order the whole list, only enough of it to find the desired quantile. Hoare's algorithm is an outstanding example of the power of a recursive function, a function that calls itself. The main idea is this: We start by selecting a random value taken from the list of numbers. Then we split the list into two piles, one pile all values greater than the selected, the other pile all less. The quantile is in one of these piles, and by looking at their sizes, we know which one. So we repeat the process on that pile and ignore the other other one. Eventually the pile size reduces to one, and we have the answer.

In Fortran 77 or C it would be natural to split the list into two piles as follows:

We divide the list of numbers into two groups, a group below ak and another group above ak. This reordering can be done ``in place.'' Start one pointer at the top of the list and another at the bottom. Grab an arbitrary value from the list (such as the current value of ak). March the two pointers towards each other until you have an upper value out of order with ak and a lower value out of order with ak. Swap the upper and lower value. Continue until the pointers merge somewhere midlist. Now you have divided the list into two sublists, one above your (random) value ak and the other below.
Fortran 90 has some higher level intrinsic vector functions that simplify matters. When a is a vector and ak is a value, a>ak is a vector of logical values that are true for each component that is larger than ak. The integer count of how many components of a are larger than ak is given by the Fortran intrinsic function count(a>ak). A vector containing only values less than ak is given by the Fortran intrinsic function pack(a,a<ak).

Theoretically about 2n comparisons are expected to find the median of a list of n values. The code below (from Sergey Fomel) for this task is quantile. quantilepercentile


next up previous print clean
Next: The weighted mean Up: MEANS, MEDIANS, PERCENTILES AND Previous: MEANS, MEDIANS, PERCENTILES AND
Stanford Exploration Project
4/27/2004