OK, here's the problem in a nutshell. I have my cluster set up and can successfully run teragen. However, when I run terasort on the same data, the job fails because the Alluxio FileSystem class isn't found.
1. Installed Cloudera 5.7 on 20 DataNodes, each with 11x 7.2K HDDs and 384 GB RAM (YARN limited to 100 GB)
2. Installed Alluxio 1.0.1 (the CDH5 binary, but would this work with 5.7?) and gave it 128 GB on each node
3. Added the Alluxio settings to core-site.xml as described in the install instructions
4. Distributed the client dependencies JAR to each node (same path on each) and added -libjars to point to it
5. Added a CLASSPATH override in Cloudera Manager
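A minimal sketch of how the JAR from steps 4 and 5 could be checked on a node, to rule out a missing or incomplete copy. The helper names class_to_entry and listing_has_class are hypothetical (they are not part of Hadoop or Alluxio), and the usage line assumes the JDK's jar tool is on the PATH.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only (not part of Hadoop or Alluxio).

# Turn a fully qualified class name into its entry path inside a JAR,
# e.g. alluxio.hadoop.FileSystem -> alluxio/hadoop/FileSystem.class
class_to_entry() {
  printf '%s.class\n' "$(printf '%s' "$1" | tr '.' '/')"
}

# Read a JAR listing (one entry per line) on stdin and succeed only if
# the entry for the given class appears as an exact line.
listing_has_class() {
  grep -qxF "$(class_to_entry "$1")"
}

# Example, to be run on each node (assumes the JDK `jar` tool is on PATH):
#   jar tf /opt/cloudera/parcels/CDH/lib/alluxio-core-client-1.0.1-jar-with-dependencies.jar \
#     | listing_has_class alluxio.hadoop.FileSystem && echo "class present"
```

This only confirms that the JAR on disk contains the class. It says nothing about whether a given YARN daemon or container actually has that JAR on its classpath.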
Everything looks OK, though I think I may have some redundant steps (both -libjars and copying the JAR). Here's the kicker: I can run teragen:
time hadoop jar /opt/cloudera/parcels/CDH/jars/hadoop-examples.jar teragen -libjars /opt/cloudera/parcels/CDH/lib/alluxio-core-client-1.0.1-jar-with-dependencies.jar -D mapred.map.tasks=500 10000000000 alluxio://name1.poc.local:19998/input_1TB
But terasort on the same data fails:
[root@data5 ~]# time hadoop jar /opt/cloudera/parcels/CDH/jars/hadoop-examples.jar terasort -libjars /opt/cloudera/parcels/CDH/lib/alluxio-core-client-1.0.1-jar-with-dependencies.jar -D mapred.map.tasks=500 alluxio://name1.poc.local:19998/input_1TB alluxio://name1.poc.local:19998/output_1TB
16/06/02 20:10:03 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
16/06/02 20:10:03 INFO terasort.TeraSort: starting
16/06/02 20:10:03 INFO logger.type: initialize(alluxio://name1.poc.local:19998/input_1TB, Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml). Connecting to Alluxio: alluxio://name1.poc.local:19998/input_1TB
16/06/02 20:10:03 INFO logger.type: Loading Alluxio properties from Hadoop configuration: {}
16/06/02 20:10:03 INFO logger.type: Alluxio client (version 1.0.1) is trying to connect with FileSystemMasterClient master @ name1.poc.local/172.16.100.156:19998
16/06/02 20:10:03 INFO logger.type: Client registered with FileSystemMasterClient master @ name1.poc.local/172.16.100.156:19998
16/06/02 20:10:04 INFO logger.type: alluxio://name1.poc.local:19998 alluxio://name1.poc.local:19998 hdfs://name1.poc.local:8020/alluxio
16/06/02 20:10:04 INFO logger.type: getWorkingDirectory: /
16/06/02 20:10:04 INFO logger.type: getWorkingDirectory: /
16/06/02 20:10:04 INFO logger.type: getFileStatus(alluxio://name1.poc.local:19998/input_1TB): HDFS Path: hdfs://name1.poc.local:8020/alluxio/input_1TB Alluxio Path: alluxio://name1.poc.local:19998/input_1TB
16/06/02 20:10:04 INFO logger.type: listStatus(alluxio://name1.poc.local:19998/input_1TB): HDFS Path: hdfs://name1.poc.local:8020/alluxio/input_1TB
16/06/02 20:10:04 INFO input.FileInputFormat: Total input paths to process : 500
Spent 821ms computing base-splits.
Spent 22ms computing TeraScheduler splits.
Computing input splits took 845ms
Sampling 10 splits of 2000
16/06/02 20:10:04 INFO logger.type: create(alluxio://name1.poc.local:19998/output_1TB/_partition.lst, rw-r--r--, true, 65536, 10, 536870912, null)
16/06/02 20:10:04 INFO logger.type: open(alluxio://name1.poc.local:19998/input_1TB/part-m-00078, 65536)
16/06/02 20:10:04 INFO logger.type: open(alluxio://name1.poc.local:19998/input_1TB/part-m-00483, 65536)
16/06/02 20:10:04 INFO logger.type: open(alluxio://name1.poc.local:19998/input_1TB/part-m-00420, 65536)
16/06/02 20:10:04 INFO logger.type: open(alluxio://name1.poc.local:19998/input_1TB/part-m-00334, 65536)
16/06/02 20:10:04 INFO logger.type: open(alluxio://name1.poc.local:19998/input_1TB/part-m-00200, 65536)
16/06/02 20:10:04 INFO logger.type: open(alluxio://name1.poc.local:19998/input_1TB/part-m-00329, 65536)
16/06/02 20:10:04 INFO logger.type: open(alluxio://name1.poc.local:19998/input_1TB/part-m-00389, 65536)
16/06/02 20:10:04 INFO logger.type: open(alluxio://name1.poc.local:19998/input_1TB/part-m-00324, 65536)
16/06/02 20:10:04 INFO logger.type: open(alluxio://name1.poc.local:19998/input_1TB/part-m-00048, 65536)
16/06/02 20:10:04 INFO logger.type: open(alluxio://name1.poc.local:19998/input_1TB/part-m-00479, 65536)
16/06/02 20:10:04 INFO logger.type: Alluxio client (version 1.0.1) is trying to connect with FileSystemMasterClient master @ name1.poc.local/172.16.100.156:19998
16/06/02 20:10:04 INFO logger.type: Alluxio client (version 1.0.1) is trying to connect with FileSystemMasterClient master @ name1.poc.local/172.16.100.156:19998
16/06/02 20:10:04 INFO logger.type: Alluxio client (version 1.0.1) is trying to connect with FileSystemMasterClient master @ name1.poc.local/172.16.100.156:19998
16/06/02 20:10:04 INFO logger.type: Alluxio client (version 1.0.1) is trying to connect with FileSystemMasterClient master @ name1.poc.local/172.16.100.156:19998
16/06/02 20:10:04 INFO logger.type: Client registered with FileSystemMasterClient master @ name1.poc.local/172.16.100.156:19998
16/06/02 20:10:04 INFO logger.type: Alluxio client (version 1.0.1) is trying to connect with FileSystemMasterClient master @ name1.poc.local/172.16.100.156:19998
16/06/02 20:10:04 INFO logger.type: Alluxio client (version 1.0.1) is trying to connect with FileSystemMasterClient master @ name1.poc.local/172.16.100.156:19998
16/06/02 20:10:04 INFO logger.type: Client registered with FileSystemMasterClient master @ name1.poc.local/172.16.100.156:19998
16/06/02 20:10:04 INFO logger.type: Alluxio client (version 1.0.1) is trying to connect with FileSystemMasterClient master @ name1.poc.local/172.16.100.156:19998
16/06/02 20:10:04 INFO logger.type: Alluxio client (version 1.0.1) is trying to connect with FileSystemMasterClient master @ name1.poc.local/172.16.100.156:19998
16/06/02 20:10:04 INFO logger.type: Alluxio client (version 1.0.1) is trying to connect with FileSystemMasterClient master @ name1.poc.local/172.16.100.156:19998
16/06/02 20:10:04 INFO logger.type: Client registered with FileSystemMasterClient master @ name1.poc.local/172.16.100.156:19998
16/06/02 20:10:04 INFO logger.type: Client registered with FileSystemMasterClient master @ name1.poc.local/172.16.100.156:19998
16/06/02 20:10:04 INFO logger.type: Client registered with FileSystemMasterClient master @ name1.poc.local/172.16.100.156:19998
16/06/02 20:10:04 INFO logger.type: Client registered with FileSystemMasterClient master @ name1.poc.local/172.16.100.156:19998
16/06/02 20:10:04 INFO logger.type: Client registered with FileSystemMasterClient master @ name1.poc.local/172.16.100.156:19998
16/06/02 20:10:04 INFO logger.type: Client registered with FileSystemMasterClient master @ name1.poc.local/172.16.100.156:19998
16/06/02 20:10:04 INFO logger.type: Client registered with FileSystemMasterClient master @ name1.poc.local/172.16.100.156:19998
16/06/02 20:10:04 INFO logger.type: Alluxio client (version 1.0.1) is trying to connect with BlockMasterClient master @ name1.poc.local/172.16.100.156:19998
16/06/02 20:10:04 INFO logger.type: Alluxio client (version 1.0.1) is trying to connect with BlockMasterClient master @ name1.poc.local/172.16.100.156:19998
16/06/02 20:10:04 INFO logger.type: Alluxio client (version 1.0.1) is trying to connect with BlockMasterClient master @ name1.poc.local/172.16.100.156:19998
16/06/02 20:10:04 INFO logger.type: Alluxio client (version 1.0.1) is trying to connect with BlockMasterClient master @ name1.poc.local/172.16.100.156:19998
16/06/02 20:10:04 INFO logger.type: Alluxio client (version 1.0.1) is trying to connect with BlockMasterClient master @ name1.poc.local/172.16.100.156:19998
16/06/02 20:10:04 INFO logger.type: Alluxio client (version 1.0.1) is trying to connect with BlockMasterClient master @ name1.poc.local/172.16.100.156:19998
16/06/02 20:10:04 INFO logger.type: Alluxio client (version 1.0.1) is trying to connect with BlockMasterClient master @ name1.poc.local/172.16.100.156:19998
16/06/02 20:10:04 INFO logger.type: Alluxio client (version 1.0.1) is trying to connect with BlockMasterClient master @ name1.poc.local/172.16.100.156:19998
16/06/02 20:10:04 INFO logger.type: Client registered with BlockMasterClient master @ name1.poc.local/172.16.100.156:19998
16/06/02 20:10:04 INFO logger.type: Client registered with BlockMasterClient master @ name1.poc.local/172.16.100.156:19998
16/06/02 20:10:04 INFO logger.type: Client registered with BlockMasterClient master @ name1.poc.local/172.16.100.156:19998
16/06/02 20:10:04 INFO logger.type: Client registered with BlockMasterClient master @ name1.poc.local/172.16.100.156:19998
16/06/02 20:10:04 INFO logger.type: Alluxio client (version 1.0.1) is trying to connect with BlockMasterClient master @ name1.poc.local/172.16.100.156:19998
16/06/02 20:10:04 INFO logger.type: Alluxio client (version 1.0.1) is trying to connect with BlockMasterClient master @ name1.poc.local/172.16.100.156:19998
16/06/02 20:10:04 INFO logger.type: Client registered with BlockMasterClient master @ name1.poc.local/172.16.100.156:19998
16/06/02 20:10:04 INFO logger.type: Client registered with BlockMasterClient master @ name1.poc.local/172.16.100.156:19998
16/06/02 20:10:04 INFO logger.type: Client registered with BlockMasterClient master @ name1.poc.local/172.16.100.156:19998
16/06/02 20:10:04 INFO logger.type: Client registered with BlockMasterClient master @ name1.poc.local/172.16.100.156:19998
16/06/02 20:10:04 INFO logger.type: Client registered with BlockMasterClient master @ name1.poc.local/172.16.100.156:19998
16/06/02 20:10:04 INFO logger.type: Client registered with BlockMasterClient master @ name1.poc.local/172.16.100.156:19998
16/06/02 20:10:05 INFO logger.type: Connecting to remote worker @ data18.poc.local/172.16.100.111:29998
16/06/02 20:10:05 INFO logger.type: Connecting to local worker @ data5.poc.local/172.16.100.110:29998
16/06/02 20:10:05 INFO logger.type: Connecting to remote worker @ data17.poc.local/172.16.100.107:29998
16/06/02 20:10:05 INFO logger.type: Connecting to remote worker @ data23.poc.local/172.16.100.114:29998
16/06/02 20:10:05 INFO logger.type: Connecting to remote worker @ data19.poc.local/172.16.100.113:29998
16/06/02 20:10:05 INFO logger.type: Connecting to remote worker @ data23.poc.local/172.16.100.114:29998
16/06/02 20:10:05 INFO logger.type: Connecting to remote worker @ data14.poc.local/172.16.100.119:29998
16/06/02 20:10:05 INFO logger.type: Connecting to remote worker @ data7.poc.local/172.16.100.159:29998
16/06/02 20:10:05 INFO logger.type: Connecting to remote worker @ data18.poc.local/172.16.100.111:29998
16/06/02 20:10:05 INFO logger.type: Connecting to remote worker @ data13.poc.local/172.16.100.121:29998
16/06/02 20:10:05 INFO logger.type: Connecting to local worker @ data5.poc.local/172.16.100.110:29998
16/06/02 20:10:05 INFO logger.type: Connected to remote machine data19.poc.local/172.16.100.113:29999
16/06/02 20:10:05 INFO logger.type: Connected to remote machine data13.poc.local/172.16.100.121:29999
16/06/02 20:10:05 INFO logger.type: Connected to remote machine data23.poc.local/172.16.100.114:29999
16/06/02 20:10:05 INFO logger.type: Connected to remote machine data7.poc.local/172.16.100.159:29999
16/06/02 20:10:05 INFO logger.type: Connected to remote machine data18.poc.local/172.16.100.111:29999
16/06/02 20:10:05 INFO logger.type: Connected to remote machine data18.poc.local/172.16.100.111:29999
16/06/02 20:10:05 INFO logger.type: Connected to remote machine data14.poc.local/172.16.100.119:29999
16/06/02 20:10:05 INFO logger.type: Connected to remote machine data23.poc.local/172.16.100.114:29999
16/06/02 20:10:05 INFO logger.type: Connected to remote machine data17.poc.local/172.16.100.107:29999
16/06/02 20:10:05 INFO logger.type: Data 6660554752 from remote machine data13.poc.local/172.16.100.121:29999 received
16/06/02 20:10:05 INFO logger.type: Data 6375342083 from remote machine data17.poc.local/172.16.100.107:29999 received
16/06/02 20:10:05 INFO logger.type: Data 1845493763 from remote machine data23.poc.local/172.16.100.114:29999 received
16/06/02 20:10:05 INFO logger.type: Data 7365197827 from remote machine data19.poc.local/172.16.100.113:29999 received
16/06/02 20:10:05 INFO logger.type: Data 419430403 from remote machine data14.poc.local/172.16.100.119:29999 received
16/06/02 20:10:05 INFO logger.type: Data 3758096387 from remote machine data18.poc.local/172.16.100.111:29999 received
Making 400 from 100000 sampled records
16/06/02 20:10:06 INFO logger.type: Connecting to local worker @ data5.poc.local/172.16.100.110:29998
Computing parititions took 1290ms
Spent 2140ms computing partitions.
16/06/02 20:10:08 INFO logger.type: getFileStatus(alluxio://name1.poc.local:19998/output_1TB/_partition.lst#_partition.lst): HDFS Path: hdfs://name1.poc.local:8020/alluxio/output_1TB/_partition.lst Alluxio Path: alluxio://name1.poc.local:19998/output_1TB/_partition.lst
16/06/02 20:10:08 INFO logger.type: getFileStatus(/output_1TB/_partition.lst): HDFS Path: hdfs://name1.poc.local:8020/alluxio/output_1TB/_partition.lst Alluxio Path: alluxio://name1.poc.local:19998/output_1TB/_partition.lst
16/06/02 20:10:08 INFO mapreduce.JobSubmitter: number of splits:2000
16/06/02 20:10:08 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1464897966543_0002
16/06/02 20:10:09 INFO logger.type: getWorkingDirectory: /
16/06/02 20:10:09 INFO logger.type: getFileStatus(alluxio://name1.poc.local:19998/output_1TB/_partition.lst#_partition.lst): HDFS Path: hdfs://name1.poc.local:8020/alluxio/output_1TB/_partition.lst Alluxio Path: alluxio://name1.poc.local:19998/output_1TB/_partition.lst
16/06/02 20:10:09 INFO logger.type: getWorkingDirectory: /
16/06/02 20:10:09 INFO logger.type: getFileStatus(alluxio://name1.poc.local:19998/output_1TB/_partition.lst#_partition.lst): HDFS Path: hdfs://name1.poc.local:8020/alluxio/output_1TB/_partition.lst Alluxio Path: alluxio://name1.poc.local:19998/output_1TB/_partition.lst
16/06/02 20:10:09 INFO impl.YarnClientImpl: Submitted application application_1464897966543_0002
16/06/02 20:10:09 INFO mapreduce.Job: Running job: job_1464897966543_0002
16/06/02 20:10:12 INFO mapreduce.Job: Job job_1464897966543_0002 running in uber mode : false
16/06/02 20:10:12 INFO mapreduce.Job: map 0% reduce 0%
16/06/02 20:10:12 INFO mapreduce.Job: Job job_1464897966543_0002 failed with state FAILED due to: Application application_1464897966543_0002 failed 2 times due to AM Container for appattempt_1464897966543_0002_000002 exited with exitCode: -1000
Diagnostics: java.lang.ClassNotFoundException: Class alluxio.hadoop.FileSystem not found
java.lang.RuntimeException: java.lang.ClassNotFoundException: Class alluxio.hadoop.FileSystem not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2199)
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2670)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2690)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2733)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2715)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:382)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:249)
at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:61)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:357)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:356)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:60)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: Class alluxio.hadoop.FileSystem not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2105)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2197)
... 22 more
Caused by: Class alluxio.hadoop.FileSystem not found
java.lang.ClassNotFoundException: Class alluxio.hadoop.FileSystem not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2105)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2197)
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2670)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2690)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2733)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2715)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:382)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:249)
at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:61)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:357)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:356)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:60)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Failing this attempt. Failing the application.
16/06/02 20:10:12 INFO mapreduce.Job: Counters: 0
16/06/02 20:10:12 INFO terasort.TeraSort: done
real 0m11.090s
user 0m15.378s
sys 0m1.241s
Any ideas or paths I should investigate? Any help is greatly appreciated. Thanks.