How to sort comma separated keys in Reducer ouput?

4 views
Skip to first unread message

Punit Naik

unread,
Feb 17, 2016, 5:11:48 PM2/17/16
to Utah Hadoop Users Group - Big Data Utah

I am running an RFM Analysis program using MapReduce. The OutputKeyClass is Text.class and I am emitting comma separated R (Recency), F (Frequency), M (Monetory) as the key from Reducer where R=BigInteger, F=Binteger, M=BigDecimal and the value is also a Text representing Customer_ID. I know that Hadoop sorts output based on keys but my final result is a bit wierd. I want the output keys to be sorted by R first, then F and then M. But I am getting the following output sort order for unknown reasons:

545,1,7652    100000
545,23,390159.402343750    100001
452,13,132586    100002
452,4,32202    100004
452,1,9310    100007
452,1,4057    100018
452,3,18970    100021

But I want the following output:

545,23,390159.402343750    100001
545,1,7652    100000
452,13,132586    100002
452,4,32202    100004
452,3,18970    100021
452,1,9310    100007
452,1,4057    100018

NOTE: The customer_ID was the key in Map phase and all the RFM values belonging to a particular Customer_ID are brought together at the Reducer for aggregation.

Reply all
Reply to author
Forward
0 new messages