I'm writing an awk quicksort implementation to take the place of gawn asort() when I run with POSIX awk such as mawk and nawk. I'm posting the code in case it can help other people who have a similar need. The code runs well, as far as I can tell. Suggestions welcome . Best, Joel
# num-quicksort.awk
# Author: Joel Parker Henderson (
jo...@joelparkerhenderson.com)
# License: Apache, BSD, CC, GPL, MIT, any open source license.
# Website:
http://numcommand.com
# Intialize.
#
# Experts suggest switching from quicksort to insertion sort
# for smaller lists, and using a threshold in the range of 8 to 20.
# We choose 12 for our kinds of data. Change this as you like.
function num_quicksort_init() {
NUM_QUICKSORT_INSERTION_SORT_THRESHOLD = 12
}
# Quicksort.
#
# Quicksort selects a pivot and divides the data into values above and
# below the pivot. Sorting then recurses on the left and right sub-lists.
# Thanks to code posted on
http://awk.info/?doc/quicksort.html
# TODO: optimizations such as Tukeys ninther, Shell sort, dual pivot.
function num_quicksort(A) {
num_quicksort_slice(A, 1, num_arr_length(A))
}
function num_quicksort_slice(A, lo, hi, pivot_index) {
if (lo >= hi) return
if (hi - lo < NUM_QUICKSORT_INSERTION_SORT_THRESHOLD) {
num_insertion_sort_slice(A, lo, hi)
return
}
pivot_index = num_quicksort_pivot_index_via_median_of_three_sort_slice(A, lo, hi)
pivot_index = num_partition_slice(A, lo, hi, pivot_index)
num_quicksort_slice(A, lo, pivot_index - 1)
num_quicksort_slice(A, pivot_index + 1, hi)
}
# Partition an array slice using a given pivot index.
# Return the new pivot index.
function num_partition_slice(A, lo, hi, pivot_index, left, right, pivot_value, t) {
if (lo >= hi) return lo
left = lo
right = hi
pivot_value = A[pivot_index]
num_arr_swap(A, left, pivot_index)
while (left < right) {
while (left <= hi && A[left] <= pivot_value) left++
while (right >= lo && A[right] > pivot_value) right--
if (left < right) num_arr_swap(A, left, right)
}
pivot_index = right
num_arr_swap(A, lo, pivot_index)
return pivot_index
}
# Pivot.
#
# Choose a quicksort pivot index by using the "median of three" heuristic
# with a swap sort of the three items for efficiency on the next pivot.
#
# Compared to picking the pivot randomly, the median of three heuristic:
#
# * Ensures that a common case of fully sorted data remains optimal.
# * Is more difficult for an attacker to manipulate into the worst case.
# * Is often faster than a PRNG, which is often relatively slow.
#
# The median of three looks at the first, middle and last elements of
# the array, and choose the median of those as the pivot.
#
# To get the "full effect" of the median of three, it is also important
# to sort those three items, not just use the median as the pivot --
# this does not affect what is chosen as the pivot in the current
# iteration, but can/will affect what is used as the pivot in the next
# recursive call, which helps to limit the bad behavior for a few
# initial orderings (one that turns out to be particularly bad in many
# cases is an array that is sorted, except for having the smallest item
# at the high end, or having the largest item at the low end).
#
# Thanks to
http://stackoverflow.com/questions/7559608/median-of-three-values-strategy
#
# To calculate the midpoint, we prefer (lo+(hi-lo)/2) instead of the naive
# simpler ((hi+lo)/2) because the former does not risk integer overflow.
function num_quicksort_pivot_index_via_median_of_three_sort(A) {
num_quicksort_pivot_index_via_median_of_three_sort_slice(A, 1, num_arr_length(A))
}
function num_quicksort_pivot_index_via_median_of_three_sort_slice(A, lo, hi, mid) {
if (lo == hi) return lo
mid = lo + int((hi - lo) / 2)
if (A[hi] < A[lo]) num_arr_swap(A, lo, hi)
if (A[mid] < A[lo]) num_arr_swap(A, mid, lo)
if (A[hi] < A[mid]) num_arr_swap(A, hi, mid)
return mid
}
# Insertion sort.
#
# This implementation is a slightly faster version that moves A[i]
# to its position in one go and only performs one assignment in the
# inner loop body. See
https://en.wikipedia.org/wiki/Insertion_sort
function num_insertion_sort(A) {
num_insertion_sort_slice(A, 1, num_arr_length(A))
}
function num_insertion_sort_slice(A, lo, hi, i, j, x) {
if (lo >= hi) return
for (i = lo + 1; i <= hi; i++) {
x = A[i]
j = i
while (j > lo && A[j-1] > x) {
A[j] = A[j-1]
j--
}
A[j] = x
}
}
# Swap array items.
function num_arr_swap(A, i, j, t) {
t = A[i]; A[i] = A[j]; A[j] = t
}
# Length of an array.
function num_arr_length(arr, i, len) {
for (i in arr) len++
return len
}