CoGroup problem


mm

Oct 23, 2009, 4:46:38 PM
to cascading-user
Hi

Cascading looks like an awesome tool! However, I am having problems.
I'm trying to join two streams using CoGroup:

Fields artistFields = new Fields("artistSource", "artistId", "artistName", "artistUrlName");
Fields releaseFields = new Fields("releaseSource", "releaseId", "artistSource", "artistId",
    "title", "releaseUrlName", "releaseType", "releaseFormat", "releaseDate", "releaseArtistName");

...
Pipe setup code omitted
...
...

Fields common = new Fields("artistSource", "artistId");
Pipe merged = new CoGroup(artistPipe, common, releasePipe, common, new InnerJoin());

The above code gives me:

cascading.tuple.TupleException: unable to select from: [UNKNOWN], using selector: ['artistSource', 'artistId']
    at cascading.tuple.Tuples.extractTuple(Tuples.java:139)
    at cascading.pipe.Group.collectReduceGrouping(Group.java:760)
    at cascading.flow.stack.GroupMapperStackElement.operateGroup(GroupMapperStackElement.java:68)
    at cascading.flow.stack.GroupMapperStackElement.collect(GroupMapperStackElement.java:61)
    at cascading.flow.stack.FlowMapperStack.map(FlowMapperStack.java:170)
    at cascading.flow.FlowMapper.map(FlowMapper.java:75)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:138)
Caused by: cascading.tuple.TupleException: field not found: 'artistSource', available fields: [UNKNOWN]
    at cascading.tuple.Fields.indexOf(Fields.java:705)


I've tried quite a few of the different CoGroup constructors, but
without any success. What am I missing? This is my first attempt at
using Cascading, so I apologize if I'm missing the obvious.

Thanks

- Magnus

Chris K Wensel

Oct 23, 2009, 5:36:22 PM
to cascadi...@googlegroups.com
Magnus,

I think the bits you left out are significant.

You can write a DOT file (Flow#writeDOT) and open it in OmniGraffle
or Graphviz.

This will show you where the field names were lost from the tuples
before they reached the grouping operation.

ckw
--
Chris K Wensel
ch...@concurrentinc.com
http://www.concurrentinc.com

Dexin Wang

Oct 23, 2009, 6:41:11 PM
to cascading-user
I agree. Once you post how you construct the pipes, the cause should
be obvious. One thing you might have missed: the two pipes you are
CoGrouping need unique field names, since you don't declare output
fields.

mm

Oct 23, 2009, 6:53:19 PM
to cascading-user


On Oct 23, 11:36 pm, Chris K Wensel <ch...@wensel.net> wrote:
> Magnus,
>
> I think the bits you left out are significant.
>
> you can write a DOT file (Flow#writeDOT) and open it via omnigraffle  
> or graphviz.
>
> this will show you where you lost the field names in the tuples before  
> they reached the grouping operation.
>

Chris,

Thanks for your reply. I realize now that I was looking in the wrong
place. I have narrowed it down to a problem with a custom aggregator.

In a debugger, the TupleEntries look OK when leaving the complete
method of the aggregator (fields: ['artistSource', 'artistId',
'artistName', 'artistUrlName'] tuple: ['xyz', '0733', '0733',
'0733']). If I comment out the aggregator, the CoGrouping works fine.

I might as well give you the complete code:
http://cascadingtest.s3.amazonaws.com/CascadingTest.java

Any help greatly appreciated :-)

- Magnus


Chris K Wensel

Oct 23, 2009, 8:03:51 PM
to cascadi...@googlegroups.com
Your aggregator must declare its result Fields; otherwise
Fields.UNKNOWN is used.

ckw
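
To illustrate why an undeclared result leads to the "field not found ... available fields: [UNKNOWN]" error above, here is a minimal toy model in plain Java. It is only a sketch: FieldDeclarationSketch, DeclaredFields, ToyAggregator and describeGrouping are hypothetical names invented for this illustration and are not Cascading classes. In Cascading itself, the declaration is normally passed to the operation's superclass constructor, e.g. super(new Fields(...)) in a class extending BaseOperation.

```java
import java.util.Arrays;
import java.util.List;

// Toy model only: these classes are hypothetical and are NOT Cascading's API.
final class FieldDeclarationSketch {

    /** Stand-in for a field declaration; a null name list models Fields.UNKNOWN. */
    static final class DeclaredFields {
        static final DeclaredFields UNKNOWN = new DeclaredFields(null);
        private final List<String> names;

        private DeclaredFields(List<String> names) { this.names = names; }

        static DeclaredFields of(String... names) {
            return new DeclaredFields(Arrays.asList(names));
        }

        /** Resolves a field name to its position; fails if the name was never declared. */
        int indexOf(String name) {
            int pos = names == null ? -1 : names.indexOf(name);
            if (pos < 0) {
                throw new IllegalStateException(
                    "field not found: '" + name + "', available fields: " + this);
            }
            return pos;
        }

        @Override public String toString() {
            return names == null ? "[UNKNOWN]" : names.toString();
        }
    }

    /** Stand-in for an aggregator: downstream steps only see what it declares. */
    static class ToyAggregator {
        final DeclaredFields resultFields;
        ToyAggregator() { this(DeclaredFields.UNKNOWN); }  // nothing declared
        ToyAggregator(DeclaredFields declared) { this.resultFields = declared; }
    }

    /** Mimics a grouping step resolving its key fields against the upstream declaration. */
    static String describeGrouping(ToyAggregator upstream, String... keys) {
        try {
            StringBuilder positions = new StringBuilder();
            for (String key : keys) {
                positions.append(upstream.resultFields.indexOf(key)).append(' ');
            }
            return "ok, key positions: " + positions.toString().trim();
        } catch (IllegalStateException e) {
            return "failed: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        ToyAggregator undeclared = new ToyAggregator();
        ToyAggregator declared = new ToyAggregator(DeclaredFields.of(
            "artistSource", "artistId", "artistName", "artistUrlName"));
        System.out.println(describeGrouping(undeclared, "artistSource", "artistId"));
        System.out.println(describeGrouping(declared, "artistSource", "artistId"));
    }
}
```

The point of the model is that the tuple values can be perfectly fine at runtime (as seen in the debugger) while the grouping step still fails, because name-based selection works against the declared fields, not against the values.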

mm

Oct 23, 2009, 8:13:59 PM
to cascading-user
Thank you. It works now. Thanks for the excellent support.

- Magnus

Chris K Wensel

Oct 23, 2009, 8:15:12 PM
to cascadi...@googlegroups.com
Anytime.

FYI, support is sometimes faster on the #cascading channel on IRC (freenode).