Problem Running Terasort from Cloudera Hadoop


Kris Applegate

Jun 2, 2016, 4:34:02 PM
to Alluxio Users
OK, here's the problem in a nutshell. I have my cluster set up and can successfully run a teragen. However, when I run a terasort on the same data, it says that the Alluxio FS isn't found.

Steps:
1. Install Cloudera 5.7: 20x datanodes, each with 11x 7.2K HDDs and 384 GB RAM (YARN limited to 100 GB)
2. Install Alluxio 1.0.1 (the CDH5 binary; would this work with 5.7?), giving it 128 GB on each node
3. Add the following to core-site.xml, per the install docs:
<property>
  <name>fs.alluxio.impl</name>
  <value>alluxio.hadoop.FileSystem</value>
</property>
<property>
  <name>fs.alluxio-ft.impl</name>
  <value>alluxio.hadoop.FaultTolerantFileSystem</value>
</property>
4. I distribute the client dependencies JAR to each node (same spot on each) and pass -libjars to point to it.
5. I added classpath overrides in Cloudera Manager for both of the following (rough sketch below):
mapreduce.application.classpath
yarn.application.classpath
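
For what it's worth, the JAR copy in step 4 is just a loop, and the overrides in step 5 append the client JAR to whatever classpath entries CDH already sets. A rough sketch only (datanodes.txt is a placeholder node list; the real hosts follow the dataN.poc.local pattern in the logs below):

# Step 4 sketch: push the client JAR to the same path on every node.
for host in $(cat datanodes.txt); do
  scp /opt/cloudera/parcels/CDH/lib/alluxio-core-client-1.0.1-jar-with-dependencies.jar \
      root@${host}:/opt/cloudera/parcels/CDH/lib/
done

# Step 5 sketch: in Cloudera Manager, append the JAR to both properties
# ({{existing entries}} stands in for whatever CDH already sets):
#   mapreduce.application.classpath = {{existing entries}},/opt/cloudera/parcels/CDH/lib/alluxio-core-client-1.0.1-jar-with-dependencies.jar
#   yarn.application.classpath      = {{existing entries}},/opt/cloudera/parcels/CDH/lib/alluxio-core-client-1.0.1-jar-with-dependencies.jar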

Everything LOOKS OK, though I think I may have some redundant steps (both -libjars and copying the JAR). Here's the kicker: I can run teragen:
time hadoop jar /opt/cloudera/parcels/CDH/jars/hadoop-examples.jar teragen -libjars /opt/cloudera/parcels/CDH/lib/alluxio-core-client-1.0.1-jar-with-dependencies.jar -D mapred.map.tasks=500 10000000000 alluxio://name1.poc.local:19998/input_1TB
SUCCESS!

But when I run Terasort:
[root@data5 ~]# time hadoop jar /opt/cloudera/parcels/CDH/jars/hadoop-examples.jar terasort -libjars /opt/cloudera/parcels/CDH/lib/alluxio-core-client-1.0.1-jar-with-dependencies.jar -D mapred.map.tasks=500 alluxio://name1.poc.local:19998/input_1TB alluxio://name1.poc.local:19998/output_1TB
16/06/02 20:10:03 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
16/06/02 20:10:03 INFO terasort.TeraSort: starting
16/06/02 20:10:03 INFO logger.type: initialize(alluxio://name1.poc.local:19998/input_1TB, Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml). Connecting to Alluxio: alluxio://name1.poc.local:19998/input_1TB
16/06/02 20:10:03 INFO logger.type: Loading Alluxio properties from Hadoop configuration: {}
16/06/02 20:10:03 INFO logger.type: Alluxio client (version 1.0.1) is trying to connect with FileSystemMasterClient master @ name1.poc.local/172.16.100.156:19998
16/06/02 20:10:03 INFO logger.type: Client registered with FileSystemMasterClient master @ name1.poc.local/172.16.100.156:19998
16/06/02 20:10:04 INFO logger.type: alluxio://name1.poc.local:19998 alluxio://name1.poc.local:19998 hdfs://name1.poc.local:8020/alluxio
16/06/02 20:10:04 INFO logger.type: getWorkingDirectory: /
16/06/02 20:10:04 INFO logger.type: getWorkingDirectory: /
16/06/02 20:10:04 INFO logger.type: getFileStatus(alluxio://name1.poc.local:19998/input_1TB): HDFS Path: hdfs://name1.poc.local:8020/alluxio/input_1TB Alluxio Path: alluxio://name1.poc.local:19998/input_1TB
16/06/02 20:10:04 INFO logger.type: listStatus(alluxio://name1.poc.local:19998/input_1TB): HDFS Path: hdfs://name1.poc.local:8020/alluxio/input_1TB
16/06/02 20:10:04 INFO input.FileInputFormat: Total input paths to process : 500
Spent 821ms computing base-splits.
Spent 22ms computing TeraScheduler splits.
Computing input splits took 845ms
Sampling 10 splits of 2000
16/06/02 20:10:04 INFO logger.type: create(alluxio://name1.poc.local:19998/output_1TB/_partition.lst, rw-r--r--, true, 65536, 10, 536870912, null)
16/06/02 20:10:04 INFO logger.type: open(alluxio://name1.poc.local:19998/input_1TB/part-m-00078, 65536)
16/06/02 20:10:04 INFO logger.type: open(alluxio://name1.poc.local:19998/input_1TB/part-m-00483, 65536)
16/06/02 20:10:04 INFO logger.type: open(alluxio://name1.poc.local:19998/input_1TB/part-m-00420, 65536)
16/06/02 20:10:04 INFO logger.type: open(alluxio://name1.poc.local:19998/input_1TB/part-m-00334, 65536)
16/06/02 20:10:04 INFO logger.type: open(alluxio://name1.poc.local:19998/input_1TB/part-m-00200, 65536)
16/06/02 20:10:04 INFO logger.type: open(alluxio://name1.poc.local:19998/input_1TB/part-m-00329, 65536)
16/06/02 20:10:04 INFO logger.type: open(alluxio://name1.poc.local:19998/input_1TB/part-m-00389, 65536)
16/06/02 20:10:04 INFO logger.type: open(alluxio://name1.poc.local:19998/input_1TB/part-m-00324, 65536)
16/06/02 20:10:04 INFO logger.type: open(alluxio://name1.poc.local:19998/input_1TB/part-m-00048, 65536)
16/06/02 20:10:04 INFO logger.type: open(alluxio://name1.poc.local:19998/input_1TB/part-m-00479, 65536)
16/06/02 20:10:04 INFO logger.type: Alluxio client (version 1.0.1) is trying to connect with FileSystemMasterClient master @ name1.poc.local/172.16.100.156:19998
16/06/02 20:10:04 INFO logger.type: Alluxio client (version 1.0.1) is trying to connect with FileSystemMasterClient master @ name1.poc.local/172.16.100.156:19998
16/06/02 20:10:04 INFO logger.type: Alluxio client (version 1.0.1) is trying to connect with FileSystemMasterClient master @ name1.poc.local/172.16.100.156:19998
16/06/02 20:10:04 INFO logger.type: Alluxio client (version 1.0.1) is trying to connect with FileSystemMasterClient master @ name1.poc.local/172.16.100.156:19998
16/06/02 20:10:04 INFO logger.type: Client registered with FileSystemMasterClient master @ name1.poc.local/172.16.100.156:19998
16/06/02 20:10:04 INFO logger.type: Alluxio client (version 1.0.1) is trying to connect with FileSystemMasterClient master @ name1.poc.local/172.16.100.156:19998
16/06/02 20:10:04 INFO logger.type: Alluxio client (version 1.0.1) is trying to connect with FileSystemMasterClient master @ name1.poc.local/172.16.100.156:19998
16/06/02 20:10:04 INFO logger.type: Client registered with FileSystemMasterClient master @ name1.poc.local/172.16.100.156:19998
16/06/02 20:10:04 INFO logger.type: Alluxio client (version 1.0.1) is trying to connect with FileSystemMasterClient master @ name1.poc.local/172.16.100.156:19998
16/06/02 20:10:04 INFO logger.type: Alluxio client (version 1.0.1) is trying to connect with FileSystemMasterClient master @ name1.poc.local/172.16.100.156:19998
16/06/02 20:10:04 INFO logger.type: Alluxio client (version 1.0.1) is trying to connect with FileSystemMasterClient master @ name1.poc.local/172.16.100.156:19998
16/06/02 20:10:04 INFO logger.type: Client registered with FileSystemMasterClient master @ name1.poc.local/172.16.100.156:19998
16/06/02 20:10:04 INFO logger.type: Client registered with FileSystemMasterClient master @ name1.poc.local/172.16.100.156:19998
16/06/02 20:10:04 INFO logger.type: Client registered with FileSystemMasterClient master @ name1.poc.local/172.16.100.156:19998
16/06/02 20:10:04 INFO logger.type: Client registered with FileSystemMasterClient master @ name1.poc.local/172.16.100.156:19998
16/06/02 20:10:04 INFO logger.type: Client registered with FileSystemMasterClient master @ name1.poc.local/172.16.100.156:19998
16/06/02 20:10:04 INFO logger.type: Client registered with FileSystemMasterClient master @ name1.poc.local/172.16.100.156:19998
16/06/02 20:10:04 INFO logger.type: Client registered with FileSystemMasterClient master @ name1.poc.local/172.16.100.156:19998
16/06/02 20:10:04 INFO logger.type: Alluxio client (version 1.0.1) is trying to connect with BlockMasterClient master @ name1.poc.local/172.16.100.156:19998
16/06/02 20:10:04 INFO logger.type: Alluxio client (version 1.0.1) is trying to connect with BlockMasterClient master @ name1.poc.local/172.16.100.156:19998
16/06/02 20:10:04 INFO logger.type: Alluxio client (version 1.0.1) is trying to connect with BlockMasterClient master @ name1.poc.local/172.16.100.156:19998
16/06/02 20:10:04 INFO logger.type: Alluxio client (version 1.0.1) is trying to connect with BlockMasterClient master @ name1.poc.local/172.16.100.156:19998
16/06/02 20:10:04 INFO logger.type: Alluxio client (version 1.0.1) is trying to connect with BlockMasterClient master @ name1.poc.local/172.16.100.156:19998
16/06/02 20:10:04 INFO logger.type: Alluxio client (version 1.0.1) is trying to connect with BlockMasterClient master @ name1.poc.local/172.16.100.156:19998
16/06/02 20:10:04 INFO logger.type: Alluxio client (version 1.0.1) is trying to connect with BlockMasterClient master @ name1.poc.local/172.16.100.156:19998
16/06/02 20:10:04 INFO logger.type: Alluxio client (version 1.0.1) is trying to connect with BlockMasterClient master @ name1.poc.local/172.16.100.156:19998
16/06/02 20:10:04 INFO logger.type: Client registered with BlockMasterClient master @ name1.poc.local/172.16.100.156:19998
16/06/02 20:10:04 INFO logger.type: Client registered with BlockMasterClient master @ name1.poc.local/172.16.100.156:19998
16/06/02 20:10:04 INFO logger.type: Client registered with BlockMasterClient master @ name1.poc.local/172.16.100.156:19998
16/06/02 20:10:04 INFO logger.type: Client registered with BlockMasterClient master @ name1.poc.local/172.16.100.156:19998
16/06/02 20:10:04 INFO logger.type: Alluxio client (version 1.0.1) is trying to connect with BlockMasterClient master @ name1.poc.local/172.16.100.156:19998
16/06/02 20:10:04 INFO logger.type: Alluxio client (version 1.0.1) is trying to connect with BlockMasterClient master @ name1.poc.local/172.16.100.156:19998
16/06/02 20:10:04 INFO logger.type: Client registered with BlockMasterClient master @ name1.poc.local/172.16.100.156:19998
16/06/02 20:10:04 INFO logger.type: Client registered with BlockMasterClient master @ name1.poc.local/172.16.100.156:19998
16/06/02 20:10:04 INFO logger.type: Client registered with BlockMasterClient master @ name1.poc.local/172.16.100.156:19998
16/06/02 20:10:04 INFO logger.type: Client registered with BlockMasterClient master @ name1.poc.local/172.16.100.156:19998
16/06/02 20:10:04 INFO logger.type: Client registered with BlockMasterClient master @ name1.poc.local/172.16.100.156:19998
16/06/02 20:10:04 INFO logger.type: Client registered with BlockMasterClient master @ name1.poc.local/172.16.100.156:19998
16/06/02 20:10:05 INFO logger.type: Connecting to remote worker @ data18.poc.local/172.16.100.111:29998
16/06/02 20:10:05 INFO logger.type: Connecting to local worker @ data5.poc.local/172.16.100.110:29998
16/06/02 20:10:05 INFO logger.type: Connecting to remote worker @ data17.poc.local/172.16.100.107:29998
16/06/02 20:10:05 INFO logger.type: Connecting to remote worker @ data23.poc.local/172.16.100.114:29998
16/06/02 20:10:05 INFO logger.type: Connecting to remote worker @ data19.poc.local/172.16.100.113:29998
16/06/02 20:10:05 INFO logger.type: Connecting to remote worker @ data23.poc.local/172.16.100.114:29998
16/06/02 20:10:05 INFO logger.type: Connecting to remote worker @ data14.poc.local/172.16.100.119:29998
16/06/02 20:10:05 INFO logger.type: Connecting to remote worker @ data7.poc.local/172.16.100.159:29998
16/06/02 20:10:05 INFO logger.type: Connecting to remote worker @ data18.poc.local/172.16.100.111:29998
16/06/02 20:10:05 INFO logger.type: Connecting to remote worker @ data13.poc.local/172.16.100.121:29998
16/06/02 20:10:05 INFO logger.type: Connecting to local worker @ data5.poc.local/172.16.100.110:29998
16/06/02 20:10:05 INFO logger.type: Connected to remote machine data19.poc.local/172.16.100.113:29999
16/06/02 20:10:05 INFO logger.type: Connected to remote machine data13.poc.local/172.16.100.121:29999
16/06/02 20:10:05 INFO logger.type: Connected to remote machine data23.poc.local/172.16.100.114:29999
16/06/02 20:10:05 INFO logger.type: Connected to remote machine data7.poc.local/172.16.100.159:29999
16/06/02 20:10:05 INFO logger.type: Connected to remote machine data18.poc.local/172.16.100.111:29999
16/06/02 20:10:05 INFO logger.type: Connected to remote machine data18.poc.local/172.16.100.111:29999
16/06/02 20:10:05 INFO logger.type: Connected to remote machine data14.poc.local/172.16.100.119:29999
16/06/02 20:10:05 INFO logger.type: Connected to remote machine data23.poc.local/172.16.100.114:29999
16/06/02 20:10:05 INFO logger.type: Connected to remote machine data17.poc.local/172.16.100.107:29999
16/06/02 20:10:05 INFO logger.type: Data 2768240643 from remote machine data7.poc.local/172.16.100.159:29999 received
16/06/02 20:10:05 INFO logger.type: Data 6660554752 from remote machine data13.poc.local/172.16.100.121:29999 received
16/06/02 20:10:05 INFO logger.type: Data 2046820355 from remote machine data23.poc.local/172.16.100.114:29999 received
16/06/02 20:10:05 INFO logger.type: Data 6375342083 from remote machine data17.poc.local/172.16.100.107:29999 received
16/06/02 20:10:05 INFO logger.type: Data 1845493763 from remote machine data23.poc.local/172.16.100.114:29999 received
16/06/02 20:10:05 INFO logger.type: Data 7365197827 from remote machine data19.poc.local/172.16.100.113:29999 received
16/06/02 20:10:05 INFO logger.type: Data 3523215363 from remote machine data18.poc.local/172.16.100.111:29999 received
16/06/02 20:10:05 INFO logger.type: Data 419430403 from remote machine data14.poc.local/172.16.100.119:29999 received
16/06/02 20:10:05 INFO logger.type: Data 3758096387 from remote machine data18.poc.local/172.16.100.111:29999 received
Making 400 from 100000 sampled records
16/06/02 20:10:06 INFO logger.type: Connecting to local worker @ data5.poc.local/172.16.100.110:29998
Computing parititions took 1290ms
Spent 2140ms computing partitions.
16/06/02 20:10:08 INFO logger.type: getFileStatus(alluxio://name1.poc.local:19998/output_1TB/_partition.lst#_partition.lst): HDFS Path: hdfs://name1.poc.local:8020/alluxio/output_1TB/_partition.lst Alluxio Path: alluxio://name1.poc.local:19998/output_1TB/_partition.lst
16/06/02 20:10:08 INFO logger.type: getFileStatus(/output_1TB/_partition.lst): HDFS Path: hdfs://name1.poc.local:8020/alluxio/output_1TB/_partition.lst Alluxio Path: alluxio://name1.poc.local:19998/output_1TB/_partition.lst
16/06/02 20:10:08 INFO mapreduce.JobSubmitter: number of splits:2000
16/06/02 20:10:08 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1464897966543_0002
16/06/02 20:10:09 INFO logger.type: getWorkingDirectory: /
16/06/02 20:10:09 INFO logger.type: getFileStatus(alluxio://name1.poc.local:19998/output_1TB/_partition.lst#_partition.lst): HDFS Path: hdfs://name1.poc.local:8020/alluxio/output_1TB/_partition.lst Alluxio Path: alluxio://name1.poc.local:19998/output_1TB/_partition.lst
16/06/02 20:10:09 INFO logger.type: getWorkingDirectory: /
16/06/02 20:10:09 INFO logger.type: getFileStatus(alluxio://name1.poc.local:19998/output_1TB/_partition.lst#_partition.lst): HDFS Path: hdfs://name1.poc.local:8020/alluxio/output_1TB/_partition.lst Alluxio Path: alluxio://name1.poc.local:19998/output_1TB/_partition.lst
16/06/02 20:10:09 INFO impl.YarnClientImpl: Submitted application application_1464897966543_0002
16/06/02 20:10:09 INFO mapreduce.Job: The url to track the job: http://name3.poc.local:8088/proxy/application_1464897966543_0002/
16/06/02 20:10:09 INFO mapreduce.Job: Running job: job_1464897966543_0002
16/06/02 20:10:12 INFO mapreduce.Job: Job job_1464897966543_0002 running in uber mode : false
16/06/02 20:10:12 INFO mapreduce.Job:  map 0% reduce 0%
16/06/02 20:10:12 INFO mapreduce.Job: Job job_1464897966543_0002 failed with state FAILED due to: Application application_1464897966543_0002 failed 2 times due to AM Container for appattempt_1464897966543_0002_000002 exited with exitCode: -1000
For more detailed output, check application tracking page: http://name3.poc.local:8088/proxy/application_1464897966543_0002/ Then, click on links to logs of each attempt.
Diagnostics: java.lang.ClassNotFoundException: Class alluxio.hadoop.FileSystem not found
java.lang.RuntimeException: java.lang.ClassNotFoundException: Class alluxio.hadoop.FileSystem not found
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2199)
        at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2670)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2690)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2733)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2715)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:382)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
        at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:249)
        at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:61)
        at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
        at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:357)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
        at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:356)
        at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:60)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: Class alluxio.hadoop.FileSystem not found
        at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2105)
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2197)
        ... 22 more
Caused by: Class alluxio.hadoop.FileSystem not found
java.lang.ClassNotFoundException: Class alluxio.hadoop.FileSystem not found
        at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2105)
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2197)
        at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2670)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2690)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2733)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2715)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:382)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
        at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:249)
        at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:61)
        at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
        at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:357)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
        at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:356)
        at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:60)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

Failing this attempt. Failing the application.
16/06/02 20:10:12 INFO mapreduce.Job: Counters: 0
16/06/02 20:10:12 INFO terasort.TeraSort: done

real    0m11.090s
user    0m15.378s
sys     0m1.241s


Any ideas or paths I should investigate? Any help is greatly appreciated. Thanks.

Gene Pang

Jun 3, 2016, 11:30:17 AM
to Alluxio Users
Hi Kris,

I don't know if this is the same issue I have seen before, but I had a problem with the default terasort partitioner and had to switch to the simple partitioner to let terasort complete: "-Dmapreduce.terasort.simplepartitioner=true"

I think the issue with the default partitioner is that it writes a file with the partitioning information, and that file was not playing nicely with the classpath. I never figured out how to get it to work; if you discover the solution, I would love to hear about it! A sketch of what the default path does is below.
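
For context, this is roughly what the non-simple path in TeraSort does (paraphrased from the Hadoop examples source; a sketch, not a verbatim quote):

// Paraphrased from org.apache.hadoop.examples.terasort.TeraSort#run:
Path partitionFile = new Path(outputDir, TeraInputFormat.PARTITION_FILENAME); // "_partition.lst"
URI partitionUri = new URI(partitionFile.toString()
    + "#" + TeraInputFormat.PARTITION_FILENAME);
TeraInputFormat.writePartitionFile(job, partitionFile); // written to the job's output FS (alluxio:// here)
job.addCacheFile(partitionUri);                         // fetched later by YARN's FSDownload during localization
job.setPartitionerClass(TotalOrderPartitioner.class);

Because the cached file is an alluxio:// URI, the NodeManager itself has to load alluxio.hadoop.FileSystem while localizing it (those are the FSDownload frames in your stack trace), and -libjars only affects the job's classpath, not the NodeManager's. That would explain the ClassNotFoundException.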

I hope that helps,
Gene

Kris Applegate

Jun 3, 2016, 2:37:13 PM
to Alluxio Users
I could kiss you on the lips. That worked perfectly! Thanks a ton.

Kris Applegate

Jun 3, 2016, 4:04:58 PM
to Alluxio Users
Spoke too soon:

time hadoop jar /opt/cloudera/parcels/CDH/jars/hadoop-examples.jar terasort -libjars /opt/cloudera/parcels/CDH/lib/alluxio-core-client-1.0.1-jar-with-dependencies.jar -D mapred.map.tasks=500 -Dmapreduce.terasort.simplepartitioner=true alluxio://name1.poc.local:19998/input_1TBa alluxio://name1.poc.local:19998/output_1TBa
16/06/03 20:03:36 INFO mapreduce.Job:  map 100% reduce 13%
16/06/03 20:03:37 INFO mapreduce.Job:  map 100% reduce 14%
16/06/03 20:03:38 INFO mapreduce.Job:  map 100% reduce 15%
16/06/03 20:03:39 INFO mapreduce.Job:  map 100% reduce 93%
16/06/03 20:03:40 INFO mapreduce.Job:  map 100% reduce 100%
16/06/03 20:03:40 INFO mapreduce.Job: Job job_1464897966543_0010 failed with state FAILED due to: Task failed task_1464897966543_0010_m_001872
Job failed as tasks failed. failedMaps:1 failedReduces:0

16/06/03 20:03:40 INFO mapreduce.Job: Counters: 46
        File System Counters
                ALLUXIO: Number of bytes read=996495515200
                ALLUXIO: Number of bytes written=0
                ALLUXIO: Number of read operations=3982
                ALLUXIO: Number of large read operations=0
                ALLUXIO: Number of write operations=0
                FILE: Number of bytes read=541436344811
                FILE: Number of bytes written=883016116324
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=238920
                HDFS: Number of bytes written=0
                HDFS: Number of read operations=1991
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=0
        Job Counters
                Failed map tasks=28
                Killed map tasks=8
                Killed reduce tasks=400
                Launched map tasks=2027
                Launched reduce tasks=400
                Other local map tasks=27
                Data-local map tasks=1989
                Rack-local map tasks=11
                Total time spent by all maps in occupied slots (ms)=95094081
                Total time spent by all reduces in occupied slots (ms)=13703045
                Total time spent by all map tasks (ms)=95094081
                Total time spent by all reduce tasks (ms)=13703045
                Total vcore-seconds taken by all map tasks=95094081
                Total vcore-seconds taken by all reduce tasks=13703045
                Total megabyte-seconds taken by all map tasks=97376338944
                Total megabyte-seconds taken by all reduce tasks=14031918080
        Map-Reduce Framework
                Map input records=9964955152
                Map output records=9964955152
                Map output bytes=1016425425504
                Map output materialized bytes=442644956926
                Input split bytes=238920
                Combine input records=0
                Spilled Records=19929910304
                Failed Shuffles=0
                Merged Map outputs=0
                GC time elapsed (ms)=978292
                CPU time spent (ms)=76633470
                Physical memory (bytes) snapshot=1274137550848
                Virtual memory (bytes) snapshot=3358548094976
                Total committed heap usage (bytes)=1640766636032
        File Input Format Counters
                Bytes Read=996495515200
16/06/03 20:03:40 INFO terasort.TeraSort: done

Gene Pang

Jun 6, 2016, 6:48:39 PM
to Alluxio Users
Hi Kris,

Could you check what happened with the output? It seems like both map and reduce reached 100%. It is possible that the failed task was just restarted elsewhere and completed successfully.
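
For example, a listing along these lines should show whether all the output partitions landed (path taken from your command above):

hadoop fs -ls alluxio://name1.poc.local:19998/output_1TBa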

Thanks,
Gene