Clarification for the documentation

Chris Curtin

Aug 13, 2009, 9:50:59 AM
to cascading-user
Hi Chris,

Can I suggest updating the documentation on merging two pipes? It
wasn't clear that the two pipes must have the same field names AND
the same data types for the Fields in the tuple.

For example doing this:

Pipe[] pipes = Pipe.pipes( lhs, rhs );
Pipe merge = new GroupBy( pipes, new Fields( "group1", "group2" ) );

If the 'group1' field is an integer in the lhs pipe and a string in
the rhs pipe, you get a runtime exception about not being able to cast
integers to strings. The error is thrown deep in Hadoop code, so the
stack trace doesn't show where in your code the issue was created.
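
To illustrate the failure mode in isolation, here is a minimal plain-Java sketch. It does not use Cascading or Hadoop, and it is not their actual code path; it only assumes that grouping keys end up being ordered through their natural Comparable ordering. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class MixedTypeSortDemo {
    // Returns true if sorting the keys by natural ordering throws
    // ClassCastException, i.e. the keys are not mutually comparable.
    @SuppressWarnings({"unchecked", "rawtypes"})
    static boolean sortFails(List<Object> keys) {
        try {
            // Sort a copy so the caller's list is left untouched.
            Collections.sort((List) new ArrayList<>(keys));
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<Object> keys = new ArrayList<>();
        keys.add(42);   // 'group1' as it might arrive from the lhs pipe (Integer)
        keys.add("42"); // 'group1' as it might arrive from the rhs pipe (String)
        System.out.println(sortFails(keys)); // prints "true"
    }
}
```

The exception comes from the comparison itself, not from the code that built the values, which is consistent with the stack trace pointing away from user code.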

I encountered this when combining the output of a previous execution
with a current execution (for example, comparing last month's sales
figures with this month's), where the previous execution was stored in
a file (read via tap/each) and the current execution was the result of
a non-trivial set of operations in Cascading. The previous execution's
values were all read as strings, but the current execution's values
were typed according to what was passed to the Tuple add() method.

Having the Each do more than apply a Regex, so that each Field is
typed, solved the problem.
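
The idea of typing each field as it is read back can be sketched in plain Java as follows. This is not the Cascading operation used in the fix above; the three-column layout (integer group1, string group2, decimal amount), the sample values, and the names are all assumptions made for illustration.

```java
class TypedRowDemo {
    // Coerces one raw row read from a text file into typed values.
    // Hypothetical layout: [0] integer id, [1] name, [2] decimal amount.
    static Object[] coerce(String[] raw) {
        return new Object[] {
            Integer.valueOf(raw[0].trim()),
            raw[1].trim(),
            Double.valueOf(raw[2].trim())
        };
    }

    public static void main(String[] args) {
        // Row from the previous execution, stored as tab-separated text.
        Object[] previous = coerce("17\teast\t1250.50".split("\t"));
        // Row from the current execution, already typed.
        Object[] current = { 17, "east", 1399.00 };

        // Both group1 values are Integers now, so they compare cleanly.
        int cmp = ((Integer) previous[0]).compareTo((Integer) current[0]);
        System.out.println(cmp); // prints "0"
    }
}
```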

Thanks,

Chris

Chris K Wensel

Aug 13, 2009, 10:54:35 AM
to cascadi...@googlegroups.com
Chris

Good point. I'll add a note that when grouping or sorting, the values
must be the same type or null.

Of note, in 1.1 we are experimenting with setting custom comparators
on the Fields object, attached to the field each comparator should be
used for. This will provide fine-grained control of comparison and
sorting.

We have also considered allowing Fields to hold a type. This would be
handy in this case, where you want to guarantee the values are coerced
before the comparison. An alternative would be to have a Comparator
take Object, though, and do the coercion. Still pondering...
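
A Comparator that takes Object and coerces before comparing might look roughly like the sketch below. It uses only java.util.Comparator and is not a Cascading API; the class name, the choice to coerce everything to long, and the rule that nulls sort first are assumptions for illustration.

```java
import java.util.Comparator;

class CoercingComparator implements Comparator<Object> {
    @Override
    public int compare(Object a, Object b) {
        // Assumption: null is allowed and sorts before any value.
        if (a == null) return b == null ? 0 : -1;
        if (b == null) return 1;
        return Long.compare(toLong(a), toLong(b));
    }

    // Coerces a Number or a numeric String to long.
    // Throws NumberFormatException for a non-numeric String.
    static long toLong(Object value) {
        if (value instanceof Number) {
            return ((Number) value).longValue();
        }
        return Long.parseLong(value.toString().trim());
    }

    public static void main(String[] args) {
        CoercingComparator cmp = new CoercingComparator();
        System.out.println(cmp.compare(42, "42")); // prints "0"
    }
}
```

The trade-off is that the coercion cost is paid on every comparison, whereas typing the values up front pays it once per value.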

ckw
--
Chris K Wensel
ch...@concurrentinc.com
http://www.concurrentinc.com

Chris K Wensel

Aug 13, 2009, 12:23:33 PM
to cascadi...@googlegroups.com
FYI, I've pushed up a new user guide with these changes and others
brought up over the past week or so.

ckw