Hi guys, great job with Scalding.
I've installed Scala and Scalding, and I can run the tutorials fine in "local" mode. I'm now trying to run them on a Hadoop cluster.
The command I extracted from the scald.rb script is:
HADOOP_CLASSPATH=/usr/share/java/hadoop-lzo-0.4.15.jar:/usr/local/scalding/target/scalding-assembly-0.8.2-SNAPSHOT.jar:/tmp/Tutorial0.jar hadoop jar /usr/local/scalding/target/scalding-assembly-0.8.2-SNAPSHOT.jar -libjars /tmp/Tutorial0.jar -Dmapred.reduce.tasks=20 -Dmapred.min.split.size=2000000000 Tutorial0 --hdfs
I don't have hadoop-lzo-0.4.15.jar, but everything else is in order. When I execute this on my Hadoop cluster (2.0.0-mr1-cdh4.1.2), I get the following after several minutes of waiting:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/scalding/target/scalding-assembly-0.8.2-SNAPSHOT.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
Exception in thread "main" java.lang.UnsupportedOperationException: Not implemented by the DistributedFileSystem FileSystem implementation
at org.apache.hadoop.fs.FileSystem.getScheme(FileSystem.java:200)
at org.apache.hadoop.fs.FileSystem.loadFileSystems(FileSystem.java:2186)
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2196)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2213)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:80)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2252)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2234)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:300)
at org.apache.hadoop.fs.FileSystem.getLocal(FileSystem.java:271)
at org.apache.hadoop.util.GenericOptionsParser.validateFiles(GenericOptionsParser.java:383)
at org.apache.hadoop.util.GenericOptionsParser.processGeneralOptions(GenericOptionsParser.java:281)
at org.apache.hadoop.util.GenericOptionsParser.parseGeneralOptions(GenericOptionsParser.java:422)
at org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:168)
at org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:151)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:64)
at com.twitter.scalding.Tool$.main(Tool.scala:128)
at com.twitter.scalding.Tool.main(Tool.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
What's the problem, and what is it doing for minutes before it throws this exception?
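In case it helps with diagnosis, here is a minimal sketch of a helper for inspecting the classpath. It is my own script, not part of Scalding or Hadoop, and the function names `jars_containing` and `missing_entries` are made up for illustration. It lists classpath entries that do not exist on disk (such as the hadoop-lzo jar I mentioned) and reports which jars contain a given class file, which is the kind of duplication the SLF4J warning above describes. It assumes plain jar paths and skips directories and wildcard entries; I am not claiming that either condition is the cause of the exception.

```python
import os
import zipfile


def missing_entries(classpath):
    """Return classpath entries that do not exist on disk."""
    return [p for p in classpath.split(os.pathsep)
            if p and not os.path.exists(p)]


def jars_containing(classpath, entry):
    """Return the jars on the classpath that contain the given zip entry.

    classpath: a path-separator-delimited string such as HADOOP_CLASSPATH.
    entry: a path inside the jar, for example
           "org/slf4j/impl/StaticLoggerBinder.class".
    Entries that are missing, are directories, or are not valid
    archives are skipped.
    """
    hits = []
    for path in classpath.split(os.pathsep):
        if not path or not os.path.isfile(path):
            continue  # missing jar, directory, or wildcard entry
        try:
            with zipfile.ZipFile(path) as jar:
                if entry in jar.namelist():
                    hits.append(path)
        except zipfile.BadZipFile:
            continue  # file exists but is not a jar/zip archive
    return hits


if __name__ == "__main__":
    cp = os.environ.get("HADOOP_CLASSPATH", "")
    print("missing entries:", missing_entries(cp))
    print("jars with an SLF4J binding:",
          jars_containing(cp, "org/slf4j/impl/StaticLoggerBinder.class"))
```

Running it with the same HADOOP_CLASSPATH value as in the command above should print the entries that are absent and any jars in that variable that bundle their own SLF4J binding. Jars that Hadoop adds by itself (like the ZooKeeper one in the warning) would need to be appended to the string by hand.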