Cascading 2.7.1 checkpoints and restartable flow


Moran Pardes

Jun 20, 2017, 10:01:03 AM
to cascading-user
Hi,

I'm trying to get a flow working with checkpoints, and to restart the flow when an error occurs somewhere in it.

When I run the flow the first time (setting the runId), it seems to work, creating temporary data for the checkpoint. I then simulate an error by throwing an exception at the end of the flow, which I assume would require the preceding steps (and checkpoints) to complete first.

Once the job fails, I re-run it with the same runId. However, I get the error:

Caused by: org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://vbox.localdomain:8020/tmp/hadoop-moranparbi1/cea-flow/restart/checkpoint already exists

This seems to indicate that the flow isn't reusing the checkpoint data, but is trying to overwrite it on the new run.
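To make the behavior I expect concrete, here is a minimal sketch in plain Java. It does not use Cascading or Hadoop at all, and it is not how Cascading implements restarts; the class name `RestartSketch`, the `run` method, and the directory layout are all hypothetical. It only models the idea that a re-run with the same runId should find the existing checkpoint and reuse it rather than try to write it again.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a restartable run; NOT Cascading's implementation.
class RestartSketch {

    // Runs a two-step "flow" and returns the names of the steps that executed.
    // The checkpoint location is keyed by runId (an assumption of this sketch).
    static List<String> run(Path workDir, String runId, boolean failAtEnd) throws IOException {
        List<String> executed = new ArrayList<>();
        Path checkpoint = workDir.resolve(runId).resolve("checkpoint");

        if (Files.exists(checkpoint)) {
            // Restart case: the checkpoint from the earlier attempt is reused.
            executed.add("reuse-checkpoint");
        } else {
            // First attempt: compute and persist the intermediate data.
            Files.createDirectories(checkpoint.getParent());
            Files.write(checkpoint, "intermediate data".getBytes(StandardCharsets.UTF_8));
            executed.add("write-checkpoint");
        }

        if (failAtEnd) {
            // Simulated failure after the checkpoint has been written.
            throw new IllegalStateException("simulated failure after checkpoint");
        }
        executed.add("write-sink");
        return executed;
    }

    public static void main(String[] args) throws IOException {
        Path workDir = Files.createTempDirectory("restart-sketch");
        try {
            run(workDir, "run-1", true);
        } catch (IllegalStateException e) {
            System.out.println("first attempt failed: " + e.getMessage());
        }
        // Second attempt with the same runId skips the checkpoint write.
        System.out.println(run(workDir, "run-1", false));
    }
}
```

In this model the second attempt reports `[reuse-checkpoint, write-sink]`. What I see with the real flow looks instead like the first branch being skipped, so the checkpoint write runs again and collides with the existing output directory.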

Am I missing something?

Thanks

Chris K Wensel

Jun 20, 2017, 12:11:33 PM
to cascadi...@googlegroups.com
This could be a bug with Cascading on modern versions of Hadoop.

If you can submit a test case using the version of Hadoop you are running, we can take a better shot at resolving it.


ckw



Moran Pardes

Jun 21, 2017, 3:11:36 AM
to cascading-user
Hi,

Thanks for your response.

You already have a test case for it, called testRestartCheckpoint.
You can find it under

I did the same thing, unfortunately without success.

Anyway, I am using Hadoop 2.7.1.

On Tuesday, June 20, 2017 at 19:11:33 UTC+3, Chris K Wensel wrote: