Cascading throws FileAlreadyExistsException while merging two input file


PUSHPAK GOHEY

Nov 20, 2015, 10:05:04 AM
to cascading-user
Hi,

I have two pipe-delimited input files, each with fields f1, f2 and f3. The expected output is every tuple from input file 1 merged with the first tuple of each group from input file 2.

Let's say input file 1 looks like this:
AAA|ASF|DFA
BBB|SDF|QWE
BBB|SDF|QWE
CCC|ASD|DFG
AAA|ASF|DFA
BBB|SDF|QWE

Input file 2 looks like this:
ZZZ|ERT|TERT
MMM|SDF|DFGF
ZZZ|ERT|TERT
MMM|SDF|DFGF
ZZZ|ERT|TERT
NNN|IUY|OWER

And the expected output is:
AAA|ASF|DFA
BBB|SDF|QWE
BBB|SDF|QWE
CCC|ASD|DFG
AAA|ASF|DFA
BBB|SDF|QWE
ZZZ|ERT|TERT
MMM|SDF|DFGF
NNN|IUY|OWER

To do this, I apply GroupBy on field f1 to input file 2, followed by the First() aggregator to keep the first tuple of each group. Then I merge input file 1 with that result.
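To make the intended result concrete, here is a minimal plain-Java sketch of the semantics described above: keep every line of file 1, then append the first line seen for each f1 value in file 2. It uses only the JDK and is not Cascading code; the class and method names (`MergeFirstPerGroup`, `firstPerGroup`, `merge`) are hypothetical, and it assumes f1 is the text before the first `|`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only: plain JDK collections, not the Cascading API.
public class MergeFirstPerGroup {

    // Keeps the first line seen for each distinct f1 value,
    // where f1 is assumed to be the text before the first '|'.
    static List<String> firstPerGroup(List<String> lines) {
        Map<String, String> firstByKey = new LinkedHashMap<>();
        for (String line : lines) {
            String f1 = line.split("\\|", -1)[0];
            firstByKey.putIfAbsent(f1, line); // later duplicates of a key are ignored
        }
        return new ArrayList<>(firstByKey.values());
    }

    // All lines of file 1, followed by one line per f1 group of file 2.
    static List<String> merge(List<String> file1, List<String> file2) {
        List<String> out = new ArrayList<>(file1);
        out.addAll(firstPerGroup(file2));
        return out;
    }

    public static void main(String[] args) {
        List<String> file1 = Arrays.asList(
                "AAA|ASF|DFA", "BBB|SDF|QWE", "BBB|SDF|QWE",
                "CCC|ASD|DFG", "AAA|ASF|DFA", "BBB|SDF|QWE");
        List<String> file2 = Arrays.asList(
                "ZZZ|ERT|TERT", "MMM|SDF|DFGF", "ZZZ|ERT|TERT",
                "MMM|SDF|DFGF", "ZZZ|ERT|TERT", "NNN|IUY|OWER");
        // Prints the six lines of file 1, then the ZZZ, MMM and NNN lines.
        merge(file1, file2).forEach(System.out::println);
    }
}
```

The sketch preserves input order because it runs in a single thread; a distributed flow would not necessarily emit the tuples in this order.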

But on execution, Cascading throws the exception below, even though the SinkMode of the sink tap is REPLACE:
cascading.flow.FlowException: unhandled exception
at cascading.flow.BaseFlow.complete(BaseFlow.java:954)
at cascading.platform.hadoop2.MergeInputPipes.testMergePipeAssembly(MergeInputPipes.java:72)
....
Caused by: org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory <output file> already exists
at org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:132)
at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:564)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:432)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1296)


I get this exception on both Cascading 3.0.1 and Cascading-wip-3.0 (3.0.3 [unreleased]).

Please help me resolve this issue.

Thanks,
Pushpak D Gohey

Chris K Wensel

Nov 20, 2015, 12:20:32 PM
to cascadi...@googlegroups.com
It very well could be a corrupt plan, since we've seen Merge go sour. If you write the DOT files by enabling planner tracing, a bad plan may be obvious. If you can provide a test case, I'll look into it too when I tackle your Merge test.

ckw

--
You received this message because you are subscribed to the Google Groups "cascading-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cascading-use...@googlegroups.com.
To post to this group, send email to cascadi...@googlegroups.com.
Visit this group at http://groups.google.com/group/cascading-user.
To view this discussion on the web visit https://groups.google.com/d/msgid/cascading-user/0f5c0c57-2896-41b2-9fbb-55521a180543%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Ken Krugler

Nov 20, 2015, 1:06:38 PM
to cascadi...@googlegroups.com
If the output directory you specified is an existing *file*, then I think you could get this message.
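One quick way to rule this out is to check what currently sits at the output path before running the flow. The sketch below uses only `java.nio.file` and therefore assumes the output lives on the local filesystem (as in a local test run); it does not cover HDFS paths. The class name `OutputPathCheck`, the method `describe`, and the default path `output` are hypothetical.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Illustration only: local-filesystem check, assumes the sink path is not on HDFS.
public class OutputPathCheck {

    // Reports what currently exists at the given path.
    static String describe(Path path) {
        if (Files.isDirectory(path)) {
            return "directory";
        }
        if (Files.isRegularFile(path)) {
            return "file";
        }
        if (Files.exists(path)) {
            return "other"; // e.g. a special file
        }
        return "missing";
    }

    public static void main(String[] args) {
        // "output" is a placeholder; pass the real sink path as the first argument.
        Path path = Paths.get(args.length > 0 ? args[0] : "output");
        System.out.println(path + " -> " + describe(path));
    }
}
```

If this reports `file` for the sink path, the case described above applies.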

-- Ken



--------------------------
Ken Krugler
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr












PUSHPAK GOHEY

Nov 23, 2015, 6:37:01 AM
to cascading-user
Hi,

Thanks Chris and Ken for looking into this issue.

Ken, I checked and no file exists at the specified output path. Cascading still throws the above exception.

Chris, as per your suggestion, I have raised a pull request for this issue.

Thanks,
Pushpak D Gohey.

Chris K Wensel

Nov 23, 2015, 12:15:00 PM
to cascadi...@googlegroups.com
Sorry, I missed the pull request. Can you open it against cwensel/cascading? I think you have it on your fork.



PUSHPAK GOHEY

Nov 24, 2015, 2:55:46 AM
to cascading-user
Hi Chris,

Yes, I accidentally had it on my fork. Sorry! I have raised a new pull request against cwensel/cascading.

Thanks,
Pushpak D Gohey

arun....@gmail.com

Mar 23, 2016, 3:04:50 AM
to cascading-user
I am facing a similar issue. Was this addressed?

Please let me know.

Thanks,
Arun

