You can do something like this by specifying a custom partitioner
class (--partitioner), but we haven't yet been able to make it do what
we want.
For more information on our attempts, see:
https://github.com/Yelp/mrjob/issues/240
-Dave
On Thu, May 10, 2012 at 7:43 AM, eckamm wrote:
> I'd like to use mrjob to do what I think is called a "secondary sort" in my
> job.
>
> That is, the key used in the partitioning (say (field1)) is a part of the
> key used in the sorting (say (field1, field2)).
>
> This allows the reducer to work from a sorted iterator on (field1, field2)
> rather than bringing all the rows grouped with (field1) into RAM for sorting
> in my reducer which is the bad practice I employ right now because my
> mapper's key is (field1).
>
> Has anyone done this with mrjob?
--
Yelp is looking to hire great engineers! See http://www.yelp.com/careers.
Would like to hear about other approaches to doing this with mrjob.
--Eric
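
For readers following the thread, the "secondary sort" described in the original question can be sketched in plain Python, outside of mrjob and Hadoop. This is only an illustration of the idea, not mrjob's API and not the approach tried in the linked issue: records are partitioned on field1 alone, sorted on the composite key (field1, field2), and the reducer then walks each field1 group in field2 order without sorting in RAM itself. The names partition, shuffle, reduce_partition and run are hypothetical helpers made up for this sketch.

```python
from itertools import groupby
from operator import itemgetter


def partition(field1, num_partitions):
    # Hypothetical partitioner: it looks at field1 only, so every record
    # sharing field1 lands in the same partition regardless of field2.
    return sum(field1.encode("utf-8")) % num_partitions


def shuffle(records, num_partitions):
    # Simulate the shuffle phase: route by field1, sort by (field1, field2).
    partitions = [[] for _ in range(num_partitions)]
    for field1, field2, payload in records:
        index = partition(field1, num_partitions)
        partitions[index].append(((field1, field2), payload))
    for part in partitions:
        part.sort(key=itemgetter(0))  # sort on the composite key
    return partitions


def reduce_partition(part):
    # Group on field1 only; rows inside each group are already ordered by
    # field2, so the reducer can stream them instead of sorting in memory.
    for field1, group in groupby(part, key=lambda kv: kv[0][0]):
        yield field1, ((key[1], payload) for key, payload in group)


def run(records, num_partitions=2):
    # Drive the simulation and materialize each group for inspection.
    out = {}
    for part in shuffle(records, num_partitions):
        for field1, rows in reduce_partition(part):
            out[field1] = list(rows)
    return out


if __name__ == "__main__":
    sample = [("b", 2, "x"), ("a", 3, "y"), ("b", 1, "z"), ("a", 1, "w")]
    for field1, rows in sorted(run(sample).items()):
        print(field1, rows)
```

In a real Hadoop job the sort is done by the framework during the shuffle, which is why the partitioner and the sort key have to be configured separately; the sketch only shows what the reducer would see if that configuration worked.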