Loading extremely long lines with TextLine in Cascading

Ivan Nikolaev

Aug 15, 2014, 8:33:33 AM
to cascadi...@googlegroups.com
Hello everyone,

I've been struggling with this for a few days now. I've posted the same question on Stack Overflow, with no luck. Any help is appreciated.

I'm using TextLine in Cascading to load files with very long lines: around 30 MB per line on average, some much longer. When I run the job locally to test it, it runs fine, but when I run it on the cluster, it fails after a period of intensive crunching. It gives errors like:

cascading.tuple.TupleException: unable to read from input identifier: maprfs:/xxx/xxx/xxx/part-00001
    at cascading.tuple.TupleEntrySchemeIterator.hasNext(TupleEntrySchemeIterator.java:127)
    at cascading.flow.stream.SourceStage.map(SourceStage.java:76)
    at cascading.flow.stream.SourceStage.run(SourceStage.java:58)
    at cascading.flow.hadoop.FlowMapper.run(FlowMapper.java:127)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:443)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:353)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:282)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1122)
    at org.apache.hadoop.mapred.Child.main(Child.java:271)

It also sometimes complains about stale file handles. The file it's trying to read is definitely there. Can somebody help me, please?

Here is a link to a more complete stack trace: http://pastebin.com/9JCbsmcr . I've run this job on two different clusters with the same results. I really need to solve this problem because it's blocking me significantly.


Best regards,

Ivan

Andre Kelpe

Aug 15, 2014, 8:46:59 AM
to cascadi...@googlegroups.com
Hi Ivan,

From the stack trace, it looks like the error is happening in MapR-FS and not in Cascading. Since that is a commercial file system, you will have to contact MapR support about it.

- André

--
André Kelpe
an...@concurrentinc.com
http://concurrentinc.com

Ivan Nikolaev

Aug 15, 2014, 10:29:30 AM
to cascadi...@googlegroups.com
Hello Andre,

Thank you for your reply. I've opened a case with MapR support.

Best regards,
Ivan