Sort array of tuples

943 views
Skip to first unread message

TR NS

unread,
Jun 15, 2014, 8:22:33 AM6/15/14
to julia...@googlegroups.com
I am having trouble sorting an array of tuples. Each tuple has 2 elements consisting of a string and a integer, and I need to sort them by the integer. I tried:

     sort(stats, by=x->x[2])

But that doesn't seem to work.

 

Stefan Karpinski

unread,
Jun 15, 2014, 10:48:16 AM6/15/14
to Julia Users
Works for me. What version of Julia are you using? Note that if you have the integer before the string in each tuple, you can just sort the array.

TR NS

unread,
Jun 15, 2014, 3:03:05 PM6/15/14
to julia...@googlegroups.com


On Sunday, June 15, 2014 10:48:16 AM UTC-4, Stefan Karpinski wrote:
Works for me. What version of Julia are you using? Note that if you have the integer before the string in each tuple, you can just sort the array.
 

Turns out I was trying to sort a Dict. Once I wrapped it in collect() it worked. Seems a little ugly though to first have to convert it to an array of tuples and then sort using `x->x[2]`. Might there be a way to sort Dicts directly? (Though obviously the result would not be Dict).

Good to know about the integer being before the string, though it won't help in this particular case. The Dict stores a word key and frequency value. I am counting the frequency of words in a corpus.  

Thanks.

Stefan Karpinski

unread,
Jun 15, 2014, 3:23:18 PM6/15/14
to julia...@googlegroups.com
You can use a comprehension over the Dict to get an array of count, string pairs and then sort that in place, which avoids some copying.

Kevin Squire

unread,
Jun 15, 2014, 5:11:31 PM6/15/14
to julia...@googlegroups.com
At one point while I was developing the OrderedDict class in DataStructures.jl, I made it possible to sort it (and sorting regular dictionaries returned an OrderedDict.  It should be pretty easy to add that functionality back.  You would need to load the DataStructures.jl package, of course.

Cheers,
   Kevin

Stefan Karpinski

unread,
Jun 15, 2014, 5:33:46 PM6/15/14
to Julia Users
Isn't an OrderedDict in the order you inserted key-value pairs though? This is more like a priority queue than that since you presumably want to count terms and then take the N most common – or sort them all in order of occurrence.

Kevin Squire

unread,
Jun 15, 2014, 6:10:21 PM6/15/14
to julia...@googlegroups.com
By default, OrderedDicts are in the order you insert the pairs, but I don't see anything wrong about allowing the user to change that order if she chooses.  And sorting by value isn't difficult either.  

Of course, using priority queue might fit the OP's use case better.

Cheers,
   Kevin

Stefan Karpinski

unread,
Jun 15, 2014, 6:23:00 PM6/15/14
to Julia Users
Right, but for getting all the pairs in order, I don't think that ordering them during insertion is helpful – it would be better to just sort them all at once afterwards.

TR NS

unread,
Jun 16, 2014, 11:37:52 AM6/16/14
to julia...@googlegroups.com


On Sunday, June 15, 2014 5:11:31 PM UTC-4, Kevin Squire wrote:
At one point while I was developing the OrderedDict class in DataStructures.jl, I made it possible to sort it (and sorting regular dictionaries returned an OrderedDict.  It should be pretty easy to add that functionality back.  You would need to load the DataStructures.jl package, of course.

That makes a lot of sense to me. Is this something that works now, or are you suggesting this as a possible future functionality?

Also, I assume the time it takes to sort an OrderedDict would be on the same order as sorting a Array of Tuple pairs. Is that right?


TR NS

unread,
Jun 16, 2014, 11:42:21 AM6/16/14
to julia...@googlegroups.com


On Sunday, June 15, 2014 6:23:00 PM UTC-4, Stefan Karpinski wrote:
Right, but for getting all the pairs in order, I don't think that ordering them during insertion is helpful – it would be better to just sort them all at once afterwards.


I think so too. Which is why I don't think a priority queue is the right choice either. (Priority queues proactively keep order at insertion time, correct?)

Stefan Karpinski

unread,
Jun 16, 2014, 11:49:02 AM6/16/14
to Julia Users
Yep, I suspect that this approach is probably best:

d = Dict{String,Int}()
# count the words
a = [ (c,w) for (w,c) in d ]
sort!(a)

For very large data, this may take up too much memory, however. I'm not sure if that's your situation. If your data is large enough to take up more than half of your available memory, then you need to get more clever.

Kevin Squire

unread,
Jun 18, 2014, 2:55:16 AM6/18/14
to julia...@googlegroups.com


On Monday, June 16, 2014 8:37:52 AM UTC-7, TR NS wrote:


On Sunday, June 15, 2014 5:11:31 PM UTC-4, Kevin Squire wrote:
At one point while I was developing the OrderedDict class in DataStructures.jl, I made it possible to sort it (and sorting regular dictionaries returned an OrderedDict.  It should be pretty easy to add that functionality back.  You would need to load the DataStructures.jl package, of course.

That makes a lot of sense to me. Is this something that works now, or are you suggesting this as a possible future functionality?

I worked up something this evening: https://github.com/JuliaLang/DataStructures.jl/pull/43

It needs some tests (and testing), but it seems to work. 

Also, I assume the time it takes to sort an OrderedDict would be on the same order as sorting a Array of Tuple pairs. Is that right?

 That's hard to judge without testing.  The complexity of the sort will be the same order, but the actual wall time depends on a large number of factors--how many temporaries are created, the cost of the comparisons, how much data, whether or not the data is contiguous, etc.

Cheers,
   Kevin
Reply all
Reply to author
Forward
0 new messages