On Tuesday, March 28, 2017 at 6:17:15 PM UTC-7, Ron Shepard wrote:
> On 3/28/17 3:31 PM,
sgihal..@gmail.com wrote:
> > Can someone suggestion an algorithm to sort a two dimensional array of data
> > in selected sort order by column one, then by column two, then by column three . . . then by column n?
> The way I usually do this is to perform an indexed sort, where you
> rearrange integer index values rather than the underlying data. This
> eliminates potentially expensive swaps when the number of columns is
> large. With an indexed sort, the only data the sort routine actually
> sees and rearranges is the integer array. You also pass in a function
> argument that does the comparisons between just two elements, given
> their row indexes. You write this function any way you want. In the case
> you have here, you would compare the first elements, then the second
> elements, and so on. The function typically returns -1, 0, or +1 as a
> result of the comparison to indicate how the two values are ordered.
Yes, many do that with C's qsort. Often enough, I find the problem
I have is small enough, and the indirection enough extra work, to
just sort the array.
> The underlying data does not change during this operation. Upon return,
> the integer index array determines the sorted order. You can go
> backwards or forwards through this array as appropriate. If necessary,
> you can use the index array to reorder the actual data. This deferred
> operation only requires O(m) memory accesses, rather than the
> O(m*log(m)) or O(m^2) memory accesses if the data were sorted directly
> rather than through the index array.
Yes it is O(m), and fast if random access is fast.
As Knuth notes in "Sorting and Searching" that isn't true
for disk files. Once you have the sorted index, moving the
data records around is not easy.
> This approach works fine all the way back to f77 and before. With modern
> fortran, the same approach could be done with, for example, pointers
> rather than integers.
As with integration and minimization routines, the old favorite
was to have the routine call a comparison routine, and pass
the data through COMMON.