sort on multiple keys

Punit Naik

Aug 15, 2016, 12:55:31 PM
to PigPen Support
Hi Everyone

I have a file like this:

{:a 10 :b 2 :c 1 :d 3}
{:a 1 :b 1 :c 7 :d 4}

I want the sort to look at :a first. If the values of :a are equal, it should move on to the next key, :b, and so on. :a should be sorted in ascending order; all the remaining keys should be sorted in descending order.

Currently I am doing it this way:

(->> data
     (pig/sort-by :d :desc)
     (pig/sort-by :c :desc)
     (pig/sort-by :b :desc)
     (pig/sort-by :a :asc))

Is this the correct way to do it? If not, can anyone please suggest a workaround, such as a way to define my own custom comparator?

Matt Bossenbroek

Aug 15, 2016, 1:23:32 PM
to Punit Naik, PigPen Support
The method you show is just going to re-sort the data 4 times, so that’s not going to work as expected.

To my knowledge there’s nothing in Clojure that has `then-by` semantics, so the idiomatic approach is to put what you want to sort by into a vector. However, Pig doesn’t support custom comparators, so that won’t work in PigPen. Distributed sorts aren’t amenable to traditional comparators anyway: it’s insufficient to just compare two values; what you really need is a way to partition them into equal buckets.
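
For reference, here is a minimal sketch of the vector approach in plain Clojure, outside of PigPen. It assumes all values are numeric, so that negating a value with `-` flips its order to descending; the names `data`, `multi-key`, and `sorted` are illustrative only. As noted above, this does not carry over to a distributed Pig sort.

```clojure
;; Sample records in the shape shown in the question (third row added
;; to exercise the tie-break on :b).
(def data
  [{:a 10 :b 2 :c 1 :d 3}
   {:a 1  :b 1 :c 7 :d 4}
   {:a 1  :b 5 :c 7 :d 4}])

;; Build a vector key: :a as-is (ascending), the rest negated (descending).
;; Assumption: every value is a number, otherwise `-` will throw.
(def multi-key
  (juxt :a (comp - :b) (comp - :c) (comp - :d)))

;; Equal-length vectors compare element by element, left to right.
(def sorted (sort-by multi-key data))
```

Here the two records with `:a 1` come first, with `:b 5` ahead of `:b 1`, followed by the record with `:a 10`.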

Ideally we would want to have some syntax that looks for something like this: (pig/sort-by (juxt :a :b :c :d)) and expand that into the pig syntax for sorting by multiple columns. But that’s kind of hacky & precludes using any other function that takes an input record and returns a vector of values to sort by. Another possibility would be to add an option that specifies the number of values to sort by and takes any arbitrary key-fn to return a vector of that many values. Those could then be exploded into the pig syntax & it should work.

So, right now there’s not a great way to do what you want. I could potentially add the aforementioned features, but it’ll take me a couple weeks as I’m heads down on another project at the moment. As a workaround, you could create a string sort key that would satisfy your sorting requirements using string ordering. This is super hacky and likely to be very error prone though. I know it’s cliche, but pull requests are welcome if you want to take a stab at doing it the right way. :)
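
A rough sketch of what such a string sort key might look like is below. It is only valid under strong assumptions: every value is a non-negative integer no larger than a known bound (the hypothetical `max-val` here), so that zero-padding makes string order match numeric order and subtracting from the bound reverses it. The names `max-val`, `string-sort-key`, and `rows` are made up for illustration, and the example sorts with plain Clojure `sort-by`; whether the same key function behaves identically under `pig/sort-by` has not been verified here.

```clojure
;; Assumed upper bound on every value; pick one that fits your data.
(def max-val 999999999)

(defn string-sort-key
  "Builds a fixed-width string whose lexicographic order matches
   :a ascending, then :b, :c, :d descending. Only valid for
   integers in the range 0..max-val."
  [{:keys [a b c d]}]
  (format "%09d|%09d|%09d|%09d"
          a
          (- max-val b)   ; larger b => smaller padded number => sorts first
          (- max-val c)
          (- max-val d)))

(def rows
  [{:a 10 :b 2 :c 1 :d 3}
   {:a 1  :b 1 :c 7 :d 4}
   {:a 1  :b 5 :c 7 :d 4}])

;; Local check of the ordering with clojure.core/sort-by.
(def rows-sorted (sort-by string-sort-key rows))
```

Negative numbers, values above the bound, or non-integer values would silently break the ordering, which is part of why this approach is error prone.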

Sorry for the unsatisfying answer. Let me know if you have any more questions.

-Matt

Punit Naik

Aug 15, 2016, 1:44:03 PM
to PigPen Support, naik.p...@gmail.com
Hi Matt

Not a problem at all. Thanks for the reply and for putting it so nicely.