Checkpoints local/hadoop mode


tom kern

Jul 12, 2012, 6:12:25 AM
to cascadi...@googlegroups.com
hi,

i am trying to integrate checkpoints into my app. i just can't figure out how checkpoints should be used.

locally i get no output whatsoever. is this supposed to be the case?

in hadoop (LFS file tap) mode it seems to depend on where exactly i put the checkpoint. for example, this fails:
r = new Rename(r);
pipe = new Merge(r, otherPipe);
cp = new Checkpoint(pipe);
pipe = new CoGroup(cp, id, thirdPipe, .....);
cascading.tuple.TupleException: unable to select from: ['offset', 'line'], using selector: ['id']

and does not:
r = new Rename(r);
cp = new Checkpoint(r);
pipe = new Merge(cp, otherPipe);
pipe = new CoGroup(pipe, id, thirdPipe, .....);

what am i missing here?

Thank you,
Thomas
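
A note on reading that exception: the message says the selector ['id'] was applied to a stream whose only fields are 'offset' and 'line'. Those are the default field names of a TextLine scheme, so it reads as if the CoGroup saw raw text lines coming back from the checkpoint rather than the declared upstream fields. That interpretation is a guess from the message alone. The sketch below is plain Java, not Cascading code; `SelectorSketch`, `select`, and the `value` field are hypothetical names used only to show why a selector fails when the named field is missing.

```java
import java.util.Arrays;
import java.util.List;

public class SelectorSketch {
    // Hypothetical stand-in for field selection; this is NOT Cascading's implementation.
    // Returns the selector if every selected name is available, otherwise throws.
    static List<String> select(List<String> available, List<String> selector) {
        for (String name : selector) {
            if (!available.contains(name)) {
                throw new IllegalArgumentException(
                    "unable to select from: " + available + ", using selector: " + selector);
            }
        }
        return selector;
    }

    public static void main(String[] args) {
        List<String> declared = Arrays.asList("id", "value");    // assumed upstream fields
        List<String> textLine = Arrays.asList("offset", "line"); // TextLine default fields
        List<String> id = Arrays.asList("id");

        // Works: 'id' is among the declared fields.
        System.out.println(select(declared, id));

        // Fails: 'id' is not among 'offset' and 'line'.
        try {
            select(textLine, id);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```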

Chris K Wensel

Jul 12, 2012, 11:45:38 AM
to cascadi...@googlegroups.com
In part this may be a bug. Lfs means local disk, and if you are running on a cluster, local disk is some disk inside the cluster (not hdfs).

typically if you use Lfs (file://...) on a tap, the MR job in the Flow touching that tap will run in hadoop local mode.

the bug is that the HadoopPlanner does not recognize the checkpoint tap as Lfs, and so does not force that MR job to run in hadoop local mode, which is probably exactly what you do not want.
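
The rule described above can be sketched roughly as follows. This is a simplified model in plain Java, not the HadoopPlanner's actual code; `LocalModeSketch`, `isLocalFile`, `runsInLocalMode`, and the example URIs are hypothetical. It assumes a job is forced into hadoop local mode as soon as any tap it touches uses a file:// URI, and shows how leaving the checkpoint tap out of that check changes the answer.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.List;

public class LocalModeSketch {
    // True when the tap URI uses the file:// scheme (what Lfs stands for here).
    static boolean isLocalFile(String uri) {
        return "file".equals(URI.create(uri).getScheme());
    }

    // Assumed rule: one local-file tap is enough to force the job into local mode.
    static boolean runsInLocalMode(List<String> tapUris) {
        for (String uri : tapUris) {
            if (isLocalFile(uri)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String source = "hdfs://namenode/data/in";       // hypothetical Hfs tap
        String checkpoint = "file:///tmp/checkpoint";    // hypothetical Lfs checkpoint tap

        // Checkpoint tap included in the check: the job is forced local.
        System.out.println(runsInLocalMode(Arrays.asList(source, checkpoint)));

        // Checkpoint tap overlooked, as in the suspected bug: the job is not forced local.
        System.out.println(runsInLocalMode(Arrays.asList(source)));
    }
}
```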

try using Hfs.

and I'll see if I can fail the planner or force the intermediate jobs to run in hadoop local mode in a maint release.

ckw

--
You received this message because you are subscribed to the Google Groups "cascading-user" group.
To view this discussion on the web visit https://groups.google.com/d/msg/cascading-user/-/E2CTbydD7DoJ.
To post to this group, send email to cascadi...@googlegroups.com.
To unsubscribe from this group, send email to cascading-use...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/cascading-user?hl=en.


tom kern

Jul 12, 2012, 12:30:04 PM
to cascadi...@googlegroups.com
> In part this may be a bug. Lfs means local disk, and if you are running on a cluster, local disk is some disk inside the cluster (not hdfs).

i haven't yet tried to run it on the cluster and locally i use hadoop in standalone mode.
trap works flawlessly in standalone and Lfs. will report regarding Hfs.

Thomas

tom kern

Jul 12, 2012, 2:03:36 PM
to cascadi...@googlegroups.com
problem also occurs for hdfs on the cluster. job does not finish and throws an exception. weirdly enough, i end up with two part-* files in the checkpoint directory, 0000 containing the actual data, 0001 being empty

Chris K Wensel

Jul 12, 2012, 3:29:58 PM
to cascadi...@googlegroups.com
ok, that sucks. will try and reproduce today and get a fix out.

thanks for reporting this!

chris


Chris K Wensel

Jul 12, 2012, 9:01:10 PM
to cascadi...@googlegroups.com
I can't seem to reproduce this with the info given.

what version of Cascading are you running? make sure you are on 2.0.2.

ckw

Chris K Wensel

Jul 12, 2012, 9:06:24 PM
to cascadi...@googlegroups.com
should have added: if you are on 2.0.2, please see if you can bend one of the tests in MergePipesPlatformTest to look like your example, and/or maybe send me a dot file of the flow as well.

ckw

tom kern

Jul 13, 2012, 5:12:43 AM
to cascadi...@googlegroups.com
i am on 2.0.2 according to maven.

i sent the dot file of the flow to the email address in your signature, as i don't wanna make it public. if that doesn't work out for you i am gonna look into MergePipesPlatformTest.

thomas