I'm using TextLine in Cascading to load files with very long lines: around 30 MB on average, some much longer. When I run the job locally to test it, it runs fine, but when I run it on the cluster it fails after a period of intensive crunching, with errors like:
cascading.tuple.TupleException: unable to read from input identifier: maprfs:/xxx/xxx/xxx/part-00001
at cascading.tuple.TupleEntrySchemeIterator.hasNext(TupleEntrySchemeIterator.java:127)
at cascading.flow.stream.SourceStage.map(SourceStage.java:76)
at cascading.flow.stream.SourceStage.run(SourceStage.java:58)
at cascading.flow.hadoop.FlowMapper.run(FlowMapper.java:127)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:443)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:353)
at org.apache.hadoop.mapred.Child$4.run(Child.java:282)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1122)
at org.apache.hadoop.mapred.Child.main(Child.java:271)
It also sometimes complains about stale file handles. The file it's trying to read is definitely there. Can somebody help me, please?
Here is a link to a more complete stack trace: http://pastebin.com/9JCbsmcr . I've run this job on two different clusters with the same results. I really need to solve this problem because it's blocking me significantly.
Best regards,
Ivan