checkpoint question


Pushpender Garg

Apr 17, 2015, 12:38:19 PM
to cascadi...@googlegroups.com
I have had issues using checkpoints in every tool I have used, and I have learned to avoid checkpoints as much as possible. I think they can create even more issues on Hadoop. My question: let's say I do a GroupBy with 4 reducers and then a checkpoint. That means all the sorted data goes to disk and is then read again by the next task, which could be an Every operation. When Cascading reads the data back from disk and applies the groups for the Every operation, isn't there a possibility of messing up the groups, because it will now be running in a map operation?
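One way to picture the concern is a toy model in plain Java (not Cascading or Hadoop code; `SplitGroupDemo` and `countSortedRun` are hypothetical names). Inside a reducer each group arrives whole, but if that sorted output were written to disk and re-read by map tasks, an input split boundary could fall in the middle of a group, and each mapper would aggregate only part of it.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitGroupDemo {
    // Count records per key over a run assumed to be sorted by key,
    // the way an aggregator sees each group after a GroupBy.
    static List<String> countSortedRun(List<String> sortedKeys) {
        List<String> out = new ArrayList<>();
        String current = null;
        int count = 0;
        for (String key : sortedKeys) {
            if (!key.equals(current)) {
                // A new key starts: emit the finished group, if any.
                if (current != null) out.add(current + "=" + count);
                current = key;
                count = 0;
            }
            count++;
        }
        if (current != null) out.add(current + "=" + count);
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical sorted output of one reducer, as it would be written to a checkpoint file.
        List<String> reducerOutput = List.of("a", "a", "b", "b", "b", "c");

        // Reduce side: every group is seen whole.
        System.out.println("reduce side: " + countSortedRun(reducerOutput));

        // Map side (hypothetical): the split boundary falls after the 4th record,
        // inside group "b", so each mapper sees only part of that group.
        List<String> mapSide = new ArrayList<>();
        mapSide.addAll(countSortedRun(reducerOutput.subList(0, 4)));
        mapSide.addAll(countSortedRun(reducerOutput.subList(4, reducerOutput.size())));
        System.out.println("map side:    " + mapSide);
    }
}
```

Running `main` prints the reduce-side counts `[a=2, b=3, c=1]` and the map-side counts `[a=2, b=2, b=1, c=1]`, where group `b` has been split in two.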

Ken Krugler

Apr 17, 2015, 1:42:05 PM
to cascadi...@googlegroups.com


From: Pushpender Garg
Sent: April 17, 2015 9:38:19am PDT
To: cascadi...@googlegroups.com
Subject: checkpoint question


> I have had issues using checkpoints in every tool I have used, and I have learned to avoid checkpoints as much as possible.

Please provide details on these issues - it could be a problem in Cascading, or a problem in how you're trying to use checkpoints.

> I think they can create even more issues on Hadoop. My question: let's say I do a GroupBy with 4 reducers and then a checkpoint. That means all the sorted data goes to disk and is then read again by the next task, which could be an Every operation. When Cascading reads the data back from disk and applies the groups for the Every operation, isn't there a possibility of messing up the groups, because it will now be running in a map operation?

Why would you do a checkpoint between the GroupBy and the Every?

-- Ken

--------------------------
Ken Krugler
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr





Pushpender Garg

Apr 18, 2015, 6:32:47 AM
to cascadi...@googlegroups.com
I wasn't referring to Cascading. In the past I have had issues with SSIS, ABI, DataStage, etc.

Since GroupBy is a heavy operation, I think it's a good candidate for a checkpoint, so that the flow could be restarted from there in case of failure. It would also help if there are filter operations after the GroupBy and I want to avoid executing multiple MR jobs (multiple GroupBys). Moreover, I don't think anything stops users from applying a checkpoint after a GroupBy and before an Every. I will check soon whether I get any errors.

Pushpender Garg

Apr 22, 2015, 10:05:32 AM
to cascadi...@googlegroups.com
It doesn't allow an Every after a Checkpoint, so this isn't possible anyway. I get the following error:

Every may only be preceded by another Every or a Group pipe
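The error points to the ordering that presumably does work: keep the GroupBy and its Every pipes adjacent, and put the checkpoint after the last Every, so that what goes to disk is the aggregated output rather than a group waiting to be aggregated. Below is a toy check of that rule in plain Java. `AssemblyRuleDemo` and its `Kind` enum are hypothetical stand-ins, not Cascading classes; Cascading enforces the real rule itself.

```java
import java.util.List;

public class AssemblyRuleDemo {
    // Hypothetical stand-ins for pipe types in a linear assembly (not Cascading classes).
    enum Kind { EACH, GROUP_BY, EVERY, CHECKPOINT }

    // Returns null if every EVERY is preceded by EVERY or GROUP_BY, otherwise an error message.
    static String check(List<Kind> assembly) {
        for (int i = 0; i < assembly.size(); i++) {
            if (assembly.get(i) != Kind.EVERY) continue;
            Kind previous = i == 0 ? null : assembly.get(i - 1);
            if (previous != Kind.EVERY && previous != Kind.GROUP_BY) {
                return "Every at position " + i
                    + " may only be preceded by another Every or a Group pipe, found " + previous;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // The ordering from the question: the checkpoint sits between the group and its aggregation.
        System.out.println(check(List.of(Kind.EACH, Kind.GROUP_BY, Kind.CHECKPOINT, Kind.EVERY)));

        // Checkpoint after the last Every: the group and its aggregators stay together.
        System.out.println(check(List.of(Kind.EACH, Kind.GROUP_BY, Kind.EVERY, Kind.EVERY,
                                         Kind.CHECKPOINT, Kind.EACH)));
    }
}
```

The first call reports the violation at position 3; the second returns `null`. A later filter (an Each) can then read from the checkpointed, already-aggregated data without repeating the GroupBy.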