Alluxio - Hive - Tez - HDP

Arivoli Murugan

May 28, 2017, 11:15:48 PM
to Alluxio Users
Hi Team,

Currently I am testing Alluxio on HDP 2.3 and 2.4; both clusters have 3 nodes each.

MapReduce test jobs run without issues, but when I launch the Hive CLI from the console I hit the errors below.

I configured Hive as per the above instructions.

I installed Alluxio as the root user. When launching hive as the root user:

Scenario 1:

root@ip-192-168-0-155[~] # hive
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/mnt/hdp/2.3.4.0-3485/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/mnt/hdp/2.3.4.0-3485/hive/lib/alluxio-community-1.4.0-hadoop-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/mnt/hdp/2.3.4.0-3485/tez/lib/alluxio-community-1.4.0-hadoop-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
WARNING: Use "yarn jar" to launch YARN applications.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/mnt/hdp/2.3.4.0-3485/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/mnt/hdp/2.3.4.0-3485/hive/lib/alluxio-community-1.4.0-hadoop-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/mnt/hdp/2.3.4.0-3485/tez/lib/alluxio-community-1.4.0-hadoop-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

Logging initialized using configuration in file:/etc/hive/2.3.4.0-3485/0/hive-log4j.properties
Exception in thread "main" java.lang.RuntimeException: org.apache.tez.dag.api.SessionNotRunning: TezSession has already shutdown. Application application_1495997160588_0034 failed 2 times due to AM Container for appattempt_1495997160588_0034_000002 exited with  exitCode: -1000
For more detailed output, check application tracking page:http://ip-192-168-0-155.eu-west-1.compute.internal:8088/cluster/app/application_1495997160588_0034Then, click on links to logs of each attempt.
Diagnostics: Block 50365202432 is not available in Alluxio
java.io.IOException: Block 50365202432 is not available in Alluxio
        at alluxio.client.block.AlluxioBlockStore.getInStream(AlluxioBlockStore.java:139)
        at alluxio.client.file.FileInStream.getBlockInStream(FileInStream.java:594)
        at alluxio.client.file.FileInStream.updateBlockInStream(FileInStream.java:574)
        at alluxio.client.file.FileInStream.updateStreams(FileInStream.java:491)
        at alluxio.client.file.FileInStream.close(FileInStream.java:159)
        at alluxio.hadoop.HdfsFileInputStream.read(HdfsFileInputStream.java:199)
        at java.io.DataInputStream.read(DataInputStream.java:100)
        at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:85)
        at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:59)
        at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:119)
        at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:366)
        at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:267)
        at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:63)
        at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:361)
        at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
        at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:358)
        at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

Failing this attempt. Failing the application.
        at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:507)
        at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:680)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:624)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: org.apache.tez.dag.api.SessionNotRunning: TezSession has already shutdown. Application application_1495997160588_0034 failed 2 times due to AM Container for appattempt_1495997160588_0034_000002 exited with  exitCode: -1000
For more detailed output, check application tracking page:http://ip-192-168-0-155.eu-west-1.compute.internal:8088/cluster/app/application_1495997160588_0034Then, click on links to logs of each attempt.
Diagnostics: Block 50365202432 is not available in Alluxio
java.io.IOException: Block 50365202432 is not available in Alluxio
        at alluxio.client.block.AlluxioBlockStore.getInStream(AlluxioBlockStore.java:139)
        at alluxio.client.file.FileInStream.getBlockInStream(FileInStream.java:594)
        at alluxio.client.file.FileInStream.updateBlockInStream(FileInStream.java:574)
        at alluxio.client.file.FileInStream.updateStreams(FileInStream.java:491)
        at alluxio.client.file.FileInStream.close(FileInStream.java:159)
        at alluxio.hadoop.HdfsFileInputStream.read(HdfsFileInputStream.java:199)
        at java.io.DataInputStream.read(DataInputStream.java:100)
        at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:85)
        at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:59)
        at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:119)
        at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:366)
        at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:267)
        at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:63)
        at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:361)
        at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
        at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:358)
        at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

Failing this attempt. Failing the application.
        at org.apache.tez.client.TezClient.waitTillReady(TezClient.java:726)
        at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:217)
        at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:117)
        at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:504)
        ... 8 more
root@ip-192-168-0-155[~] #

Scenario 2:

When launching hive as the hdfs user:
hdfs@ip-192-168-0-155[~] $ hive
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/mnt/hdp/2.3.4.0-3485/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/mnt/hdp/2.3.4.0-3485/hive/lib/alluxio-community-1.4.0-hadoop-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/mnt/hdp/2.3.4.0-3485/tez/lib/alluxio-community-1.4.0-hadoop-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
WARNING: Use "yarn jar" to launch YARN applications.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/mnt/hdp/2.3.4.0-3485/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/mnt/hdp/2.3.4.0-3485/hive/lib/alluxio-community-1.4.0-hadoop-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/mnt/hdp/2.3.4.0-3485/tez/lib/alluxio-community-1.4.0-hadoop-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

Logging initialized using configuration in file:/etc/hive/2.3.4.0-3485/0/hive-log4j.properties
Exception in thread "main" java.lang.RuntimeException: org.apache.tez.dag.api.SessionNotRunning: TezSession has already shutdown. Application application_1495997160588_0033 failed 2 times due to AM Container for appattempt_1495997160588_0033_000002 exited with  exitCode: 1
For more detailed output, check application tracking page:http://ip-192-168-0-155.eu-west-1.compute.internal:8088/cluster/app/application_1495997160588_0033Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_e14_1495997160588_0033_02_000001
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:576)
        at org.apache.hadoop.util.Shell.run(Shell.java:487)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:753)
        at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)


Container exited with a non-zero exit code 1
Failing this attempt. Failing the application.
        at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:507)
        at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:680)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:624)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: org.apache.tez.dag.api.SessionNotRunning: TezSession has already shutdown. Application application_1495997160588_0033 failed 2 times due to AM Container for appattempt_1495997160588_0033_000002 exited with  exitCode: 1
For more detailed output, check application tracking page:http://ip-192-168-0-155.eu-west-1.compute.internal:8088/cluster/app/application_1495997160588_0033Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_e14_1495997160588_0033_02_000001
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:576)
        at org.apache.hadoop.util.Shell.run(Shell.java:487)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:753)
        at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)


Container exited with a non-zero exit code 1
Failing this attempt. Failing the application.
        at org.apache.tez.client.TezClient.waitTillReady(TezClient.java:726)
        at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:217)
        at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:117)
        at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:504)
        ... 8 more

Container logs:

Log Type: syslog

Log Upload Time: Mon May 29 03:00:47 +0000 2017

Log Length: 5942

2017-05-29 03:00:38,859 [INFO] [main] |app.DAGAppMaster|: Creating DAGAppMaster for applicationId=application_1495997160588_0033, attemptNum=1, AMContainerId=container_e14_1495997160588_0033_01_000001, jvmPid=9610, userFromEnv=hdfs, cliSessionOption=true, pwd=/hadoop/yarn/local/usercache/hdfs/appcache/application_1495997160588_0033/container_e14_1495997160588_0033_01_000001, localDirs=/hadoop/yarn/local/usercache/hdfs/appcache/application_1495997160588_0033, logDirs=/hadoop/yarn/log/application_1495997160588_0033/container_e14_1495997160588_0033_01_000001
2017-05-29 03:00:39,286 [INFO] [main] |app.DAGAppMaster|: Created DAGAppMaster for application appattempt_1495997160588_0033_000001, versionInfo=[ component=tez-dag, version=0.7.0.2.3.4.0-3485, revision=1054117556c7c24127c7d0c768323a59a537f6da, SCM-URL=scm:git:https://git-wip-us.apache.org/repos/asf/tez.git, buildTime=20151216-0351 ]
2017-05-29 03:00:39,297 [INFO] [main] |app.DAGAppMaster|: Comparing client version with AM version, clientVersion=0.7.0.2.3.4.0-3485, AMVersion=0.7.0.2.3.4.0-3485
2017-05-29 03:00:39,412 [INFO] [main] |service.AbstractService|: Service org.apache.tez.dag.app.DAGAppMaster failed in state INITED; cause: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class alluxio.hadoop.FileSystem not found
java.lang.RuntimeException: java.lang.ClassNotFoundException: Class alluxio.hadoop.FileSystem not found
	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2195)
	at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2638)
	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2651)
	at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92)
	at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687)
	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2669)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
	at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
	at org.apache.tez.common.TezCommonUtils.getTezBaseStagingPath(TezCommonUtils.java:86)
	at org.apache.tez.common.TezCommonUtils.getTezSystemStagingPath(TezCommonUtils.java:145)
	at org.apache.tez.dag.app.DAGAppMaster.serviceInit(DAGAppMaster.java:407)
	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
	at org.apache.tez.dag.app.DAGAppMaster$6.run(DAGAppMaster.java:2274)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
	at org.apache.tez.dag.app.DAGAppMaster.initAndStartAppMaster(DAGAppMaster.java:2271)
	at org.apache.tez.dag.app.DAGAppMaster.main(DAGAppMaster.java:2086)
Caused by: java.lang.ClassNotFoundException: Class alluxio.hadoop.FileSystem not found
	at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2101)
	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2193)
	... 17 more
2017-05-29 03:00:39,418 [WARN] [main] |service.AbstractService|: When stopping the service org.apache.tez.dag.app.DAGAppMaster : java.lang.NullPointerException
java.lang.NullPointerException
	at org.apache.tez.dag.app.DAGAppMaster.initiateStop(DAGAppMaster.java:1842)
	at org.apache.tez.dag.app.DAGAppMaster.serviceStop(DAGAppMaster.java:1855)
	at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
	at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
	at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:171)
	at org.apache.tez.dag.app.DAGAppMaster$6.run(DAGAppMaster.java:2274)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
	at org.apache.tez.dag.app.DAGAppMaster.initAndStartAppMaster(DAGAppMaster.java:2271)
	at org.apache.tez.dag.app.DAGAppMaster.main(DAGAppMaster.java:2086)
2017-05-29 03:00:39,419 [ERROR] [main] |app.DAGAppMaster|: Error starting DAGAppMaster
java.lang.RuntimeException: java.lang.ClassNotFoundException: Class alluxio.hadoop.FileSystem not found
	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2195)
	at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2638)
	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2651)
	at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92)
	at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687)
	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2669)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
	at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
	at org.apache.tez.common.TezCommonUtils.getTezBaseStagingPath(TezCommonUtils.java:86)
	at org.apache.tez.common.TezCommonUtils.getTezSystemStagingPath(TezCommonUtils.java:145)
	at org.apache.tez.dag.app.DAGAppMaster.serviceInit(DAGAppMaster.java:407)
	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
	at org.apache.tez.dag.app.DAGAppMaster$6.run(DAGAppMaster.java:2274)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
	at org.apache.tez.dag.app.DAGAppMaster.initAndStartAppMaster(DAGAppMaster.java:2271)
	at org.apache.tez.dag.app.DAGAppMaster.main(DAGAppMaster.java:2086)
Caused by: java.lang.ClassNotFoundException: Class alluxio.hadoop.FileSystem not found
	at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2101)
	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2193)
	... 17 more
2017-05-29 03:00:39,421 [INFO] [Thread-3] |app.DAGAppMaster|: DAGAppMasterShutdownHook invoked


Any help would be appreciated.

Regards,
Arivoli M

Arivoli Murugan

May 29, 2017, 12:20:42 AM
to Alluxio Users
Hi All,

In addition to this, I have tried the below:

1) Reverted all Tez and Hive configurations to their defaults.
2) Launched the Hive CLI and created a database and an external table in Alluxio; this succeeded without issue, but we still hit problems with the Tez engine.

CREATE DATABASE tpcds_orc_3k;

SET fs.alluxio.impl=alluxio.hadoop.FileSystem;
SET fs.alluxio-ft.impl=alluxio.hadoop.FaultTolerantFileSystem;
SET fs.AbstractFileSystem.alluxio.impl=alluxio.hadoop.AlluxioFileSystem;
SET alluxio.user.file.writetype.default=CACHE_THROUGH;
ADD JAR file:///root/alluxio/client/hadoop/alluxio-core-client-1.4.0-jar-with-dependencies.jar;
ADD JAR file:///root/alluxio/client/hadoop/alluxio-community-1.4.0-hadoop-client.jar;


CREATE EXTERNAL TABLE `tpcds_orc_3k.tmp`( 
`s_store_sk` int, 
`s_store_id` string, 
`s_rec_start_date` timestamp, 
`s_rec_end_date` timestamp, 
`s_closed_date_sk` int, 
`s_store_name` string, 
`s_number_employees` int, 
`s_floor_space` int, 
`s_hours` string, 
`s_manager` string, 
`s_market_id` int, 
`s_geography_class` string, 
`s_market_desc` string, 
`s_market_manager` string, 
`s_division_id` int, 
`s_division_name` string, 
`s_company_id` int, 
`s_company_name` string, 
`s_street_number` string, 
`s_street_name` string, 
`s_street_type` string, 
`s_suite_number` string, 
`s_city` string, 
`s_county` string, 
`s_state` string, 
`s_zip` string, 
`s_country` string, 
`s_gmt_offset` float, 
`s_tax_precentage` float)

ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde' 
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' 
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat' 
LOCATION 'alluxio://192.168.0.155:19998/tmp';

hive> describe formatted tmp;
OK
# col_name              data_type               comment

s_store_sk              int
s_store_id              string
s_rec_start_date        timestamp
s_rec_end_date          timestamp
s_closed_date_sk        int
s_store_name            string
s_number_employees      int
s_floor_space           int
s_hours                 string
s_manager               string
s_market_id             int
s_geography_class       string
s_market_desc           string
s_market_manager        string
s_division_id           int
s_division_name         string
s_company_id            int
s_company_name          string
s_street_number         string
s_street_name           string
s_street_type           string
s_suite_number          string
s_city                  string
s_county                string
s_state                 string
s_zip                   string
s_country               string
s_gmt_offset            float
s_tax_precentage        float

# Detailed Table Information
Database:               tpcds_orc_3k
Owner:                  root
CreateTime:             Mon May 29 03:56:57 GMT+00:00 2017
LastAccessTime:         UNKNOWN
Protect Mode:           None
Retention:              0
Location:               alluxio://192.168.0.155:19998/tmp
Table Type:             EXTERNAL_TABLE
Table Parameters:
        COLUMN_STATS_ACCURATE   false
        EXTERNAL                TRUE
        numFiles                0
        numRows                 -1
        rawDataSize             -1
        totalSize               0
        transient_lastDdlTime   1496030217

# Storage Information
SerDe Library:          org.apache.hadoop.hive.ql.io.orc.OrcSerde
InputFormat:            org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
OutputFormat:           org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
Compressed:             No
Num Buckets:            -1
Bucket Columns:         []
Sort Columns:           []
Storage Desc Params:
        serialization.format    1
Time taken: 0.06 seconds, Fetched: 60 row(s)





MR Engine:

hive> set hive.execution.engine=mr;
hive> select count(*) from tmp;
Query ID = root_20170529040706_377fc2b6-946b-4c7d-bcdc-9343c7d460c1
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1495997160588_0039, Tracking URL = http://ip-192-168-0-155.eu                                                             _0039/
Kill Command = /mnt/hdp/2.3.4.0-3485/hadoop/bin/hadoop job  -kill job_1495997160
Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 1
2017-05-29 04:07:12,054 Stage-1 map = 0%,  reduce = 0%
2017-05-29 04:07:17,286 Stage-1 map = 0%,  reduce = 100%, Cumulative CPU 1.91 se
MapReduce Total cumulative CPU time: 1 seconds 910 msec
Ended Job = job_1495997160588_0039
MapReduce Jobs Launched:
Stage-Stage-1: Reduce: 1   Cumulative CPU: 1.91 sec   HDFS Read: 0 HDFS Write: 2
Total MapReduce CPU Time Spent: 1 seconds 910 msec
OK
0
Time taken: 12.338 seconds, Fetched: 1 row(s)
hive> select count(*) from tmp;
Query ID = root_20170529040843_433179fb-9eb2-42fb-bc3e-22bebde847ee
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Kill Command = /mnt/hdp/2.3.4.0-3485/hadoop/bin/hadoop job  -kill job_1495997160588_0040
Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 1
2017-05-29 04:08:48,897 Stage-1 map = 0%,  reduce = 0%
2017-05-29 04:08:54,027 Stage-1 map = 0%,  reduce = 100%, Cumulative CPU 1.95 sec
MapReduce Total cumulative CPU time: 1 seconds 950 msec
Ended Job = job_1495997160588_0040
MapReduce Jobs Launched:
Stage-Stage-1: Reduce: 1   Cumulative CPU: 1.95 sec   HDFS Read: 0 HDFS Write: 2 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 950 msec
OK
0
Time taken: 11.948 seconds, Fetched: 1 row(s)



Tez engine:


hive> set hive.execution.engine=tez;
hive> select count(*) from tmp;
Query ID = root_20170529040925_ae3aee4a-20c5-4861-94d9-b3c518675eb0
Total jobs = 1
Launching Job 1 out of 1


--------------------------------------------------------------------------------
        VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------
Map 1                 FAILED     -1          0        0       -1       0       0
Reducer 2             KILLED      1          0        0        1       0       0
--------------------------------------------------------------------------------
VERTICES: 00/02  [>>--------------------------] 0%    ELAPSED TIME: 1496030848.00 s
--------------------------------------------------------------------------------
Status: Failed
Vertex failed, vertexName=Map 1, vertexId=vertex_1495997160588_0038_3_00, diagnostics=[Vertex vertex_1495997160588_0038_3_00 [Map 1] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: tmp initializer failed, vertex=vertex_1495997160588_0038_3_00 [Map 1], java.lang.RuntimeException: java.lang.ClassNotFoundException: Class alluxio.hadoop.FileSystem not found
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2195)
        at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2638)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2651)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2669)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
        at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1028)
        at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1086)
        at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:305)
        at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:407)
        at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:155)
        at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:255)
        at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:248)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
        at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:248)
        at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:235)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: Class alluxio.hadoop.FileSystem not found
        at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2101)
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2193)
        ... 23 more
]
Vertex killed, vertexName=Reducer 2, vertexId=vertex_1495997160588_0038_3_01, diagnostics=[Vertex received Kill in INITED state., Vertex vertex_1495997160588_0038_3_01 [Reducer 2] killed/failed due to:OTHER_VERTEX_FAILURE]
DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:1
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1495997160588_0038_3_00, diagnostics=[Vertex vertex_1495997160588_0038_3_00 [Map 1] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: tmp initializer failed, vertex=vertex_1495997160588_0038_3_00 [Map 1], java.lang.RuntimeException: java.lang.ClassNotFoundException: Class alluxio.hadoop.FileSystem not found
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2195)
        at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2638)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2651)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2669)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
        at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1028)
        at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1086)
        at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:305)
        at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:407)
        at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:155)
        at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:255)
        at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:248)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
        at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:248)
        at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:235)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: Class alluxio.hadoop.FileSystem not found
        at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2101)
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2193)
        ... 23 more
]Vertex killed, vertexName=Reducer 2, vertexId=vertex_1495997160588_0038_3_01, diagnostics=[Vertex received Kill in INITED state., Vertex vertex_1495997160588_0038_3_01 [Reducer 2] killed/failed due to:OTHER_VERTEX_FAILURE]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:1
hive>

Arivoli Murugan

unread,
Jun 4, 2017, 1:24:50 AM6/4/17
to Alluxio Users
Hi All,

The mentioned issue has been resolved by following the below steps.

1) Untar tez.tar.gz under the Tez lib directory.
2) Copy the Alluxio client jars into the extracted Tez jar folder, re-tar the folder, and upload the new tez.tar.gz to the Alluxio file system. Issue resolved!
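The two steps above can be sketched end to end. This is a minimal sketch using placeholder files so it can be followed anywhere; on a real HDP cluster the artifacts would be the actual tez.tar.gz (e.g. under /usr/hdp/&lt;version&gt;/tez/lib/) and the real Alluxio client jar, and the final upload would go through `hadoop fs -put` (the file names below are assumptions for illustration):

```shell
set -e
WORK=$(mktemp -d)
cd "$WORK"

# Stand-ins for the real artifacts (assumptions, not the real cluster files):
mkdir tez-lib && touch tez-lib/tez-dag.jar
tar -czf tez.tar.gz -C tez-lib .
touch alluxio-client.jar   # e.g. alluxio-community-1.4.0-hadoop-client.jar

# 1) Untar tez.tar.gz
mkdir tez-unpacked
tar -xzf tez.tar.gz -C tez-unpacked

# 2) Copy the Alluxio client jar in and re-tar
cp alluxio-client.jar tez-unpacked/
tar -czf tez.tar.gz -C tez-unpacked .

# On a real cluster, upload the rebuilt archive back, e.g.:
# hadoop fs -put -f tez.tar.gz /hdp/apps/<version>/tez/tez.tar.gz

# Confirm the Alluxio jar is now inside the archive the Tez AM will unpack:
tar -tzf tez.tar.gz
```

The point of rebuilding the archive is that Tez localizes tez.tar.gz into every YARN container, so the Alluxio client class ends up on the Application Master and task classpaths without any per-job flags.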


Haoyuan Li

unread,
Jun 4, 2017, 1:30:25 PM6/4/17
to Arivoli Murugan, Alluxio Users
Thanks Arivoli for the updates!

Best regards,

Haoyuan


--
You received this message because you are subscribed to the Google Groups "Alluxio Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to alluxio-users+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Bin Fan

unread,
Jun 5, 2017, 1:15:09 PM6/5/17
to Alluxio Users
Thanks Arivoli for providing your solution for the "alluxio.hadoop.FileSystem not found" issue.

Here is more explanation of why errors like "java.lang.RuntimeException: java.lang.ClassNotFoundException: Class alluxio.hadoop.FileSystem not found" happen:


Meanwhile, do you still have the Hive issue in your very first email?

- Bin

Arivoli Murugan

unread,
Jun 19, 2017, 11:18:21 PM6/19/17
to Alluxio Users
Hi Bin,

I don't have the Hive issue anymore.
Thanks for the support.

Regards,
Ariv

Bin Fan

unread,
Jun 22, 2017, 6:29:19 PM6/22/17
to Alluxio Users
Good to know.

Thanks

- Bin

Bin Fan

unread,
Jun 22, 2017, 6:40:44 PM6/22/17
to Alluxio Users
Sorry, but just to clarify: was the Hive issue fixed, or did you not need to solve it?

thanks

- Bin 

Pranjul Ahuja

unread,
Sep 21, 2018, 2:33:31 AM9/21/18
to Alluxio Users
Hi,
I am also facing a similar issue. Which tar.gz are you talking about here?

Bin Fan

unread,
Sep 22, 2018, 3:40:29 AM9/22/18
to Alluxio Users
Which issue do you face?
Something like "Block 50365202432 is not available in Alluxio",
or "class alluxio.hadoop.FileSystem not found"?

The former can occur because a file's data was written to Alluxio only, without being persisted to the under store, and some data block was then lost; the latter is due to the Alluxio client jar not being set correctly on the classpath.
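A quick way to check for the second cause is to look for the client jar in the Hadoop classpath. This is a minimal sketch; on a real node the string below would come from running `hadoop classpath` rather than the sample value, which is an assumption for illustration:

```shell
# Sample of what `hadoop classpath` might print on an HDP node (assumed):
CP_SAMPLE="/usr/hdp/2.3.4.0-3485/hadoop/lib/*:/usr/hdp/2.3.4.0-3485/hive/lib/alluxio-community-1.4.0-hadoop-client.jar"

# Split on ':' and look for any Alluxio entry:
if printf '%s\n' "$CP_SAMPLE" | tr ':' '\n' | grep -qi 'alluxio'; then
  echo "Alluxio client jar present on the classpath"
else
  echo "Alluxio client jar MISSING from the classpath"
fi
```

Note that the client classpath and the YARN container classpath are separate: this check passing on the gateway node does not guarantee the jar reaches Tez containers, which is why the tez.tar.gz fix in this thread is needed.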

Pranjul Ahuja

unread,
Sep 22, 2018, 6:40:56 AM9/22/18
to Alluxio Users
Actually, it was "class alluxio.hadoop.FileSystem not found". I have placed the jar in HADOOP_CLASSPATH as well as in HIVE_AUX_JARS_PATH and in the Tez library paths. The Hadoop APIs are working fine with an Alluxio URI, and I am able to create an external Hive table on top of a URI pointing to Alluxio. However, whenever I run any query which requires MR or Tez in Hive, it fails with this issue. What am I missing here? I am able to run the MapReduce word-count example using the -libjars option as well. Only MR/Tez in Hive gives this error.

Arivoli Murugan

unread,
Sep 22, 2018, 8:26:09 AM9/22/18
to ahuj...@gmail.com, alluxi...@googlegroups.com
Hi Pranjul,

Have you tried:

1) Untar tez.tar.gz under the Tez lib directory.
2) Copy the Alluxio client jars into the extracted Tez jar folder, re-tar the folder, and upload the new tez.tar.gz to the Alluxio file system.


Pranjul Ahuja

unread,
Sep 22, 2018, 8:29:20 AM9/22/18
to Alluxio Users
Hi Arivoli,

Can you please explain which tez.tar.gz you are talking about here?

And in the second step, where are you uploading that tar.gz?

For now, I have just copied the Alluxio jars into the Tez jar folder on the master node.

Arivoli Murugan

unread,
Sep 22, 2018, 8:32:32 AM9/22/18
to ahuj...@gmail.com, alluxi...@googlegroups.com
Hi,

It will be under your current HDP install.

/usr/hdp/2.x.x/tez/tez.tar.gz

Also, please make sure the prerequisite classpath configurations (HDFS & Hive) are already in place.
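For the HDFS side, the usual prerequisite is registering the Alluxio scheme in core-site.xml so that Hadoop can resolve alluxio:// URIs to the client class. A sketch of the standard property (the value matches the class name from the error in this thread):

```xml
<!-- core-site.xml: map the alluxio:// scheme to the Alluxio client -->
<property>
  <name>fs.alluxio.impl</name>
  <value>alluxio.hadoop.FileSystem</value>
</property>
```

With this property set and the client jar on the classpath, `hadoop fs -ls alluxio://<master>:19998/` should work from any node.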


Pranjul Ahuja

unread,
Sep 22, 2018, 8:48:16 AM9/22/18
to Alluxio Users
The problem is that I am able to run MapReduce programs using the -libjars option; only in the case of Hive does MR or Tez fail. How do you make the jars available to all the workers when Hive fires MR under the hood? My Hadoop classpath already contains the jar. Where should I put the jars on the workers, and which services (on the master and the workers) do I need to restart if I want to run MapReduce without the -libjars option?

Bin Fan

unread,
Sep 24, 2018, 1:07:12 PM9/24/18
to ahuj...@gmail.com, Alluxio Users
(Adding back alluxio-users in case the discussion is helpful for other folks.)

- Are you able to run an MR job that reads/writes through Alluxio, without Hive in the picture?
- Did you distribute the Alluxio client jar to all nodes that may run Hadoop?





--
- Bin Fan

Software Engineer
Alluxio
www.alluxio.com


Pranjul Ahuja

unread,
Sep 25, 2018, 4:22:29 AM9/25/18
to Alluxio Users
Hi Bin,

Yes, I am able to run the Hadoop MapReduce jar, and that's because I am supplying the Alluxio client jar using the -libjars option. Can you tell me what the alternative is if I want to run MR without the -libjars option? I have distributed the jars to all nodes in /usr/hadoop/lib. What else am I missing?

Bin Fan

unread,
Oct 3, 2018, 1:26:27 PM10/3/18
to Alluxio Users
Hi Pranjul,

Yes, distributing the Alluxio client jar is an alternative.
Make sure the client jar is in a lib directory your Hadoop runtime honors.
In your case, please double-check whether "/usr/hadoop/lib" is sufficient.

- Bin
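One way to make the distributed jar take effect without -libjars is to prepend it to HADOOP_CLASSPATH (typically in hadoop-env.sh on each node). A minimal sketch, assuming the jar was copied to /usr/hadoop/lib as described above; the jar name and path are assumptions to be matched to your install:

```shell
# Assumed location of the distributed Alluxio client jar on each node:
ALLUXIO_CLIENT_JAR=/usr/hadoop/lib/alluxio-community-1.4.0-hadoop-client.jar

# Prepend it so it wins over any stale copies later in the path:
export HADOOP_CLASSPATH="${ALLUXIO_CLIENT_JAR}:${HADOOP_CLASSPATH}"

# Verify the jar now leads the classpath:
echo "${HADOOP_CLASSPATH%%:*}"
```

Keep in mind this covers the client/JVM side only; as discussed earlier in this thread, Tez containers additionally need the jar inside tez.tar.gz (or in the configured Tez library URIs) to avoid the ClassNotFoundException at the Application Master.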