new new sorting API

163 views
Skip to first unread message

Stefan Karpinski

unread,
Jul 11, 2013, 7:01:14 PM7/11/13
to Julia Dev
I just merged a major revamp of the sorting API that uses keywords to modify the ordering of elements and the algorithm used for sorting (where that applies). The sorting algorithm is controlled by the `alg` keyword (choices provided by Base are QuickSort, InsertionSort, MergeSort and TimSort; a simple BubbleSort implementation lives in examples and show how easy it is to plug a new algorithm into the framework). The ordering of elements is controlled by:
  • `lt::Function=isless` to specify a custom less-than function (two arguments, returns true of the first is "less than" the second),
  • `by::Function=identity` to specify a function to apply to elements before comparing them (one argument)
  • `rev::Bool=false` to indicate that the normal sort ordering should be reversed.
  • `order` to specify an Ordering object – this is considered advanced and may change.
Function that only take ordering keywords are:
  • issorted,
  • select,
  • select!,
  • searchsorted,
  • searchsortedfirst,
  • searchsortedlast.
I'd really like to get rid of searchsortedfirst and searchsortedlast – it's kind of ridiculous that so much of the API namespace is occupied by these functions. Functions that take both algorithm and ordering keywords are:
  • sort,
  • sort!,
  • sortperm,
  • sortrows,
  • sortcols.
Here's an example transcript giving a flavor of the new API:

julia> words = map(chomp,readlines(open("/usr/share/dict/words")))
235886-element Any Array:
 "A"
 "a"
 "aa"
 "aal"
 ⋮
 "zythum"
 "Zyzomys"
 "Zyzzogeton"

julia> sort(words)
235886-element Any Array:
 "A"
 "Aani"
 "Aaron"
 "Aaronic"
 ⋮
 "zymurgy"
 "zythem"
 "zythum"

julia> sort(words,rev=true)
235886-element Any Array:
 "zythum"
 "zythem"
 "zymurgy"
 "zymotoxic"
 ⋮
 "Aaron"
 "Aani"
 "A"

julia> sort(words,by=length)
235886-element Any Array:
 "A"
 "a"
 "B"
 "b"
 ⋮
 "scientificophilosophical"
 "tetraiodophenolphthalein"
 "thyroparathyroidectomize"

julia> sort(words,by=length,rev=true)
235886-element Any Array:
 "formaldehydesulphoxylate"
 "pathologicopsychological"
 "scientificophilosophical"
 "tetraiodophenolphthalein"
 ⋮
 "y"
 "Z"
 "z"

julia> sort(words,by=length,alg=QuickSort) # unstable, words with equal length will be arbitrarily ordered
235886-element Any Array:
 "C"
 "c"
 "f"
 "F"
 ⋮
 "formaldehydesulphoxylate"
 "pathologicopsychological"
 "scientificophilosophical"

Kevin Squire

unread,
Jul 11, 2013, 10:02:55 PM7/11/13
to juli...@googlegroups.com
I like it!  Did you test for any performance regressions?


Function that only take ordering keywords are:
  • issorted,
  • select,
  • select!,
  • searchsorted,
  • searchsortedfirst,
  • searchsortedlast.
I'd really like to get rid of searchsortedfirst and searchsortedlast – it's kind of ridiculous that so much of the API namespace is occupied by these functions.

One option is to simply not export them, but all versions of `searchsorted` currently depend on both searchsortedfirst and searchsortedlast in some way.

Kevin

Stefan Karpinski

unread,
Jul 11, 2013, 11:32:15 PM7/11/13
to Julia Dev
On Thu, Jul 11, 2013 at 10:02 PM, Kevin Squire <kevin....@gmail.com> wrote:
I like it!  Did you test for any performance regressions?

Thanks. I think it's much better. I always had the feeling that my previous refactor was only halfway there, but this feels really done. I didn't do any performance testing, but the core sorting machinery didn't change at all – only the external interface changed. However, we really should add various sorting tests to the performance suite.

I'd really like to get rid of searchsortedfirst and searchsortedlast – it's kind of ridiculous that so much of the API namespace is occupied by these functions. 

One option is to simply not export them, but all versions of `searchsorted` currently depend on both searchsortedfirst and searchsortedlast in some way.

Yeah, this is the last part of the interface that really needs work. Making searchsorted{first,last} purely internal would be good. The interface of searchsorted needs some work too.

Laurent S

unread,
Jul 12, 2013, 6:51:28 AM7/12/13
to juli...@googlegroups.com
Great work. Please keep searchsortedfirst/last and expand its functionality to multidimensional arrays. Actually, I think findfirst and findlast should behave analogously, i.e., operating along a supplied dimension instead of on the arrays linear indices. I have often needed to find the first or last elements along a given dimension of an array satisfying some condition in MATLAB, which does not have such a feature.
Reply all
Reply to author
Forward
0 new messages