Alluxio Secure (Kerberos) HDFS issues


an...@phdata.io

Oct 6, 2016, 2:57:30 PM
to Alluxio Users
We are trying to configure Alluxio to use secure HDFS with Kerberos and are running into issues. We are currently using Alluxio 1.1.0 and CDH 5.7.1 because it was mentioned in this thread https://groups.google.com/forum/#!topic/alluxio-users/hVTnRfS-sEc that someone had that combination working with Kerberized HDFS. We are using Active Directory as our KDC with the rc4-hmac algorithm. We have also tried the following versions: 0.8.2, 1.2.0, 1.3.0RC-1, and 1.3.0RC-2.

It appears everything starts OK, but when we try to run the tests we start seeing errors in master.log and all tests fail; the stack trace is attached below. I don't think there is any issue with the keytab file: if we do a kinit with the keytab file before starting Alluxio, everything works fine, runTests passes, and we are able to read and write files to HDFS. Please let me know if you need me to attach any other logs or configuration files.


alluxio-env.sh

ALLUXIO_JAVA_OPTS+="
 -Dalluxio.master.principal=hdfs/_HO...@PHDATA.IO
 -Dalluxio.master.keytab.file=/tmp/alluxio-1.1.0/conf/hdfs.keytab
 -Dalluxio.worker.principal=hdfs/_HO...@PHDATA.IO
 -Dalluxio.worker.keytab.file=/tmp/alluxio-1.1.0/conf/hdfs.keytab
 -Djava.security.krb5.realm=PHDATA.IO
 -Djava.security.krb5.kdc=172.31.11.190:88
"

alluxio-site.properties (default except for underfs)

alluxio.underfs.address=hdfs://ip-172-31-37-123.us-west-2.compute.internal:8020



Commands to reproduce
 
[hdfs@ip-172-31-40-120 alluxio-1.1.0]$ klist
klist: Credentials cache file '/tmp/krb5cc_992' not found

[hdfs@ip-172-31-40-120 alluxio-1.1.0]$ ./bin/alluxio-start.sh master
[hdfs@ip-172-31-40-120 alluxio-1.1.0]$ ./bin/alluxio-start.sh worker SudoMount

[hdfs@ip-172-31-40-120 alluxio-1.1.0]$ kinit an...@PHDATA.IO
Password for an...@PHDATA.IO:

[hdfs@ip-172-31-40-120 alluxio-1.1.0]$ klist
Ticket cache: FILE:/tmp/krb5cc_992
Default principal: an...@PHDATA.IO

Valid starting       Expires              Service principal
10/06/2016 18:24:10  10/07/2016 18:24:04  krbtgt/PHDA...@PHDATA.IO
renew until 10/13/2016 18:24:04

[hdfs@ip-172-31-40-120 alluxio-1.1.0]$ ./bin/alluxio runTests
.............
Failed the test!
Number of failed tests: 24

[hdfs@ip-172-31-40-120 alluxio-1.1.0]$ ps aux | grep alluxio
hdfs     11853  8.4  1.4 7330828 212440 pts/1  Sl   18:41   0:18 /bin/../bin/java -cp /tmp/alluxio-1.1.0/conf/::/tmp/alluxio-1.1.0/assembly/target/alluxio-assemblies-1.1.0-jar-with-dependencies.jar -Dalluxio.master.principal=hdfs/_HO...@PHDATA.IO -Dalluxio.master.keytab.file=/tmp/alluxio-1.1.0/conf/hdfs.keytab -Dalluxio.worker.principal=hdfs/_HO...@PHDATA.IO -Dalluxio.worker.keytab.file=/tmp/alluxio-1.1.0/conf/hdfs.keytab -Djava.security.krb5.realm=PHDATA.IO -Djava.security.krb5.kdc=172.31.11.190:88 -Dalluxio.home=/tmp/alluxio-1.1.0 -Dalluxio.logs.dir=/tmp/alluxio-1.1.0/logs -Dalluxio.master.hostname=ip-172-31-40-120.us-west-2.compute.internal -Dlog4j.configuration=file:/tmp/alluxio-1.1.0/conf/log4j.properties -Dorg.apache.jasper.compiler.disablejsr199=true -Djava.net.preferIPv4Stack=true -Dalluxio.logger.type=MASTER_LOGGER alluxio.master.AlluxioMaster
hdfs     12020  6.4  1.6 7193604 245800 pts/1  Sl   18:42   0:12 /bin/../bin/java -cp /tmp/alluxio-1.1.0/conf/::/tmp/alluxio-1.1.0/assembly/target/alluxio-assemblies-1.1.0-jar-with-dependencies.jar -Dalluxio.master.principal=hdfs/_HO...@PHDATA.IO -Dalluxio.master.keytab.file=/tmp/alluxio-1.1.0/conf/hdfs.keytab -Dalluxio.worker.principal=hdfs/_HO...@PHDATA.IO -Dalluxio.worker.keytab.file=/tmp/alluxio-1.1.0/conf/hdfs.keytab -Djava.security.krb5.realm=PHDATA.IO -Djava.security.krb5.kdc=172.31.11.190:88 -Dalluxio.home=/tmp/alluxio-1.1.0 -Dalluxio.logs.dir=/tmp/alluxio-1.1.0/logs -Dalluxio.master.hostname=ip-172-31-40-120.us-west-2.compute.internal -Dlog4j.configuration=file:/tmp/alluxio-1.1.0/conf/log4j.properties -Dorg.apache.jasper.compiler.disablejsr199=true -Djava.net.preferIPv4Stack=true -Dalluxio.logger.type=WORKER_LOGGER alluxio.worker.AlluxioWorker

2016-10-06 18:42:00,006 ERROR logger.type (MetricsConfig.java:loadConfigFile) - Error loading metrics configuration file.
2016-10-06 18:42:01,074 WARN  util.NativeCodeLoader (NativeCodeLoader.java:<clinit>) - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2016-10-06 18:42:01,297 INFO  security.UserGroupInformation (UserGroupInformation.java:loginUserFromKeytab) - Login successful for user hdfs/ip-172-31-40-120.us-w...@PHDATA.IO using keytab file /tmp/alluxio-1.1.0/conf/hdfs.keytab
2016-10-06 18:42:01,300 INFO  logger.type (AbstractMaster.java:start) - BlockMaster: Starting leader master.
2016-10-06 18:42:01,302 INFO  logger.type (JournalWriter.java:completeAllLogs) - Marking all logs as complete.
2016-10-06 18:42:01,303 INFO  logger.type (JournalWriter.java:completeCurrentLog) - Completed current log: /tmp/alluxio-1.1.0/journal/BlockMaster/log.out to completed log: /tmp/alluxio-1.1.0/journal/BlockMaster/completed/log.00000000000000000001
2016-10-06 18:42:01,304 INFO  logger.type (AbstractMaster.java:start) - BlockMaster: process entire journal before becoming leader master.
2016-10-06 18:42:01,304 INFO  logger.type (JournalTailer.java:processJournalCheckpoint) - BlockMaster: Loading checkpoint file: /tmp/alluxio-1.1.0/journal/BlockMaster/checkpoint.data
2016-10-06 18:42:01,305 INFO  logger.type (JournalReader.java:getCheckpointInputStream) - Opening journal checkpoint file: /tmp/alluxio-1.1.0/journal/BlockMaster/checkpoint.data
2016-10-06 18:42:01,318 INFO  logger.type (JournalReader.java:getNextInputStream) - Opening journal log file: /tmp/alluxio-1.1.0/journal/BlockMaster/completed/log.00000000000000000001
2016-10-06 18:42:01,318 INFO  logger.type (JournalTailer.java:processNextJournalLogFiles) - BlockMaster: Processing a completed log file.
2016-10-06 18:42:01,318 INFO  logger.type (JournalTailer.java:processNextJournalLogFiles) - BlockMaster: Finished processing the log file.
2016-10-06 18:42:01,318 INFO  logger.type (JournalWriter.java:getCheckpointOutputStream) - Creating tmp checkpoint file: /tmp/alluxio-1.1.0/journal/BlockMaster/checkpoint.data.tmp
2016-10-06 18:42:01,318 INFO  logger.type (JournalWriter.java:getCheckpointOutputStream) - Latest journal sequence number: 22 Next journal sequence number: 23
2016-10-06 18:42:01,342 INFO  logger.type (JournalWriter.java:close) - Successfully created tmp checkpoint file: /tmp/alluxio-1.1.0/journal/BlockMaster/checkpoint.data.tmp
2016-10-06 18:42:01,343 INFO  logger.type (JournalWriter.java:close) - Renamed checkpoint file /tmp/alluxio-1.1.0/journal/BlockMaster/checkpoint.data.tmp to /tmp/alluxio-1.1.0/journal/BlockMaster/checkpoint.data
2016-10-06 18:42:01,344 INFO  logger.type (JournalWriter.java:deleteCompletedLogs) - Deleting all completed log files...
2016-10-06 18:42:01,344 INFO  logger.type (JournalWriter.java:deleteCompletedLogs) - Deleting completed log: /tmp/alluxio-1.1.0/journal/BlockMaster/completed/log.00000000000000000001
2016-10-06 18:42:01,345 INFO  logger.type (JournalWriter.java:deleteCompletedLogs) - Finished deleting all completed log files.
2016-10-06 18:42:01,353 INFO  logger.type (JournalWriter.java:openCurrentLog) - Opened current log file: /tmp/alluxio-1.1.0/journal/BlockMaster/log.out
2016-10-06 18:42:01,356 INFO  logger.type (MountTable.java:add) - Mounting hdfs://ip-172-31-37-123.us-west-2.compute.internal:8020 at /
2016-10-06 18:42:01,357 INFO  logger.type (AbstractMaster.java:start) - FileSystemMaster: Starting leader master.
2016-10-06 18:42:01,357 INFO  logger.type (JournalWriter.java:completeAllLogs) - Marking all logs as complete.
2016-10-06 18:42:01,357 INFO  logger.type (JournalWriter.java:completeCurrentLog) - Completed current log: /tmp/alluxio-1.1.0/journal/FileSystemMaster/log.out to completed log: /tmp/alluxio-1.1.0/journal/FileSystemMaster/completed/log.00000000000000000001
2016-10-06 18:42:01,358 INFO  logger.type (AbstractMaster.java:start) - FileSystemMaster: process entire journal before becoming leader master.
2016-10-06 18:42:01,358 INFO  logger.type (JournalTailer.java:processJournalCheckpoint) - FileSystemMaster: Loading checkpoint file: /tmp/alluxio-1.1.0/journal/FileSystemMaster/checkpoint.data
2016-10-06 18:42:01,358 INFO  logger.type (JournalReader.java:getCheckpointInputStream) - Opening journal checkpoint file: /tmp/alluxio-1.1.0/journal/FileSystemMaster/checkpoint.data
2016-10-06 18:42:01,364 INFO  logger.type (JournalReader.java:getNextInputStream) - Opening journal log file: /tmp/alluxio-1.1.0/journal/FileSystemMaster/completed/log.00000000000000000001
2016-10-06 18:42:01,364 INFO  logger.type (JournalTailer.java:processNextJournalLogFiles) - FileSystemMaster: Processing a completed log file.
2016-10-06 18:42:01,374 INFO  logger.type (JournalTailer.java:processNextJournalLogFiles) - FileSystemMaster: Finished processing the log file.
2016-10-06 18:42:01,374 INFO  logger.type (JournalWriter.java:getCheckpointOutputStream) - Creating tmp checkpoint file: /tmp/alluxio-1.1.0/journal/FileSystemMaster/checkpoint.data.tmp
2016-10-06 18:42:01,374 INFO  logger.type (JournalWriter.java:getCheckpointOutputStream) - Latest journal sequence number: 421 Next journal sequence number: 422
2016-10-06 18:42:01,380 INFO  logger.type (JournalWriter.java:close) - Successfully created tmp checkpoint file: /tmp/alluxio-1.1.0/journal/FileSystemMaster/checkpoint.data.tmp
2016-10-06 18:42:01,380 INFO  logger.type (JournalWriter.java:close) - Renamed checkpoint file /tmp/alluxio-1.1.0/journal/FileSystemMaster/checkpoint.data.tmp to /tmp/alluxio-1.1.0/journal/FileSystemMaster/checkpoint.data
2016-10-06 18:42:01,380 INFO  logger.type (JournalWriter.java:deleteCompletedLogs) - Deleting all completed log files...
2016-10-06 18:42:01,380 INFO  logger.type (JournalWriter.java:deleteCompletedLogs) - Deleting completed log: /tmp/alluxio-1.1.0/journal/FileSystemMaster/completed/log.00000000000000000001
2016-10-06 18:42:01,381 INFO  logger.type (JournalWriter.java:deleteCompletedLogs) - Finished deleting all completed log files.
2016-10-06 18:42:01,575 INFO  server.Server (Server.java:doStart) - jetty-7.x.y-SNAPSHOT
2016-10-06 18:42:01,596 INFO  handler.ContextHandler (ContextHandler.java:startContext) - started o.e.j.s.ServletContextHandler{/metrics/json,null}
2016-10-06 18:42:01,608 WARN  webapp.WebInfConfiguration (WebInfConfiguration.java:makeTempDirectory) - Can't reuse /tmp/jetty-0.0.0.0-19999-webapp-_-any-, using /tmp/jetty-0.0.0.0-19999-webapp-_-any-_5220974578236382142
2016-10-06 18:42:01,679 INFO  handler.ContextHandler (ContextHandler.java:startContext) - started o.e.j.w.WebAppContext{/,file:/tmp/alluxio-1.1.0/core/server/src/main/webapp/},/tmp/alluxio-1.1.0/core/server/src/main/webapp
2016-10-06 18:42:10,819 INFO  server.AbstractConnector (AbstractConnector.java:doStart) - Started SelectChann...@0.0.0.0:19999
2016-10-06 18:42:10,820 INFO  logger.type (UIWebServer.java:startWebServer) - Alluxio Master Web service started @ 0.0.0.0/0.0.0.0:19999
2016-10-06 18:42:10,822 INFO  logger.type (AlluxioMaster.java:startServing) - Alluxio master version 1.1.0 started @ ip-172-31-40-120.us-west-2.compute.internal/172.31.40.120:19998
2016-10-06 18:42:39,308 INFO  logger.type (BlockMaster.java:getWorkerId) - getWorkerId(): WorkerNetAddress: WorkerNetAddress{host=ip-172-31-40-120.us-west-2.compute.internal, rpcPort=29998, dataPort=29999, webPort=30000} id: 1
2016-10-06 18:42:39,321 INFO  logger.type (BlockMaster.java:workerRegister) - registerWorker(): MasterWorkerInfo{id=1, workerAddress=WorkerNetAddress{host=ip-172-31-40-120.us-west-2.compute.internal, rpcPort=29998, dataPort=29999, webPort=30000}, capacityBytes=134217728, usedBytes=0, lastUpdatedTimeMs=1475779359320, blocks=[]}
2016-10-06 18:42:53,103 INFO  logger.type (JournalWriter.java:openCurrentLog) - Opened current log file: /tmp/alluxio-1.1.0/journal/FileSystemMaster/log.out
2016-10-06 18:42:53,215 WARN  security.UserGroupInformation (UserGroupInformation.java:doAs) - PriviledgedActionException as:hdfs (auth:KERBEROS) cause:javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
2016-10-06 18:42:53,216 WARN  ipc.Client (Client.java:run) - Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
2016-10-06 18:42:53,216 WARN  security.UserGroupInformation (UserGroupInformation.java:doAs) - PriviledgedActionException as:hdfs (auth:KERBEROS) cause:java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
2016-10-06 18:42:53,221 ERROR logger.type (HdfsUnderFileSystem.java:exists) - 1 try to check if hdfs://ip-172-31-37-123.us-west-2.compute.internal:8020/default_tests_files/BasicNonByteBuffer_CACHE_PROMOTE_MUST_CACHE exists : Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: "ip-172-31-40-120.us-west-2.compute.internal/172.31.40.120"; destination host is: "ip-172-31-37-123.us-west-2.compute.internal":8020;
java.io.IOException: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: "ip-172-31-40-120.us-west-2.compute.internal/172.31.40.120"; destination host is: "ip-172-31-37-123.us-west-2.compute.internal":8020;
        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772)
        at org.apache.hadoop.ipc.Client.call(Client.java:1472)
        at org.apache.hadoop.ipc.Client.call(Client.java:1399)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
        at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:752)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
        at com.sun.proxy.$Proxy10.getFileInfo(Unknown Source)
        at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1982)
        at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1128)
        at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1124)
        at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
        at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1124)
        at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1400)
        at alluxio.underfs.hdfs.HdfsUnderFileSystem.exists(HdfsUnderFileSystem.java:207)
        at alluxio.master.file.FileSystemMaster.loadMetadataAndJournal(FileSystemMaster.java:1792)
        at alluxio.master.file.FileSystemMaster.loadMetadataIfNotExistAndJournal(FileSystemMaster.java:1932)
        at alluxio.master.file.FileSystemMaster.getFileInfo(FileSystemMaster.java:447)
        at alluxio.master.file.FileSystemMasterClientServiceHandler$7.call(FileSystemMasterClientServiceHandler.java:155)
        at alluxio.master.file.FileSystemMasterClientServiceHandler$7.call(FileSystemMasterClientServiceHandler.java:152)
        at alluxio.RpcUtils.call(RpcUtils.java:40)
        at alluxio.master.file.FileSystemMasterClientServiceHandler.getStatus(FileSystemMasterClientServiceHandler.java:152)
        at alluxio.thrift.FileSystemMasterClientService$Processor$getStatus.getResult(FileSystemMasterClientService.java:1460)
        at alluxio.thrift.FileSystemMasterClientService$Processor$getStatus.getResult(FileSystemMasterClientService.java:1444)
        at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
        at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
        at org.apache.thrift.TMultiplexedProcessor.process(TMultiplexedProcessor.java:123)
        at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
        at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:680)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
        at org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:643)
        at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:730)
        at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:368)
        at org.apache.hadoop.ipc.Client.getConnection(Client.java:1521)
        at org.apache.hadoop.ipc.Client.call(Client.java:1438)
        ... 34 more
Caused by: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
        at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
        at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:413)
        at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:553)
        at org.apache.hadoop.ipc.Client$Connection.access$1800(Client.java:368)
        at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:722)
        at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:718)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
        at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:717)
        ... 37 more
Caused by: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
        at sun.security.jgss.krb5.Krb5InitCredential.getInstance(Krb5InitCredential.java:147)
        at sun.security.jgss.krb5.Krb5MechFactory.getCredentialElement(Krb5MechFactory.java:122)
        at sun.security.jgss.krb5.Krb5MechFactory.getMechanismContext(Krb5MechFactory.java:187)
        at sun.security.jgss.GSSManagerImpl.getMechanismContext(GSSManagerImpl.java:224)
        at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:212)
        at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:179)
        at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:192)
        ... 46 more
2016-10-06 18:42:53,229 WARN  security.UserGroupInformation (UserGroupInformation.java:doAs) - PriviledgedActionException as:hdfs (auth:KERBEROS) cause:javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
 
 

Chaomin Yu

Oct 6, 2016, 4:36:39 PM
to Andrew Stadtler, Alluxio Users
Great, glad to hear that resolves the issue!

On Thu, Oct 6, 2016 at 1:31 PM, Andrew Stadtler <an...@phdata.io> wrote:
Hi Chaomin,

Yes, everything works fine if I kinit with the keytab first; runTests succeeds and I am able to read/write files through Alluxio to HDFS.

[hdfs@ip-172-31-40-120 alluxio-1.1.0]$ kinit -k -t hdfs.keytab hdfs/ip-172-31-40-120.us-west-2.compute...@PHDATA.IO

[hdfs@ip-172-31-40-120 alluxio-1.1.0]$ klist
Ticket cache: FILE:/tmp/krb5cc_992


Valid starting       Expires              Service principal
10/06/2016 19:51:07  10/06/2016 20:01:07  krbtgt/PHDA...@PHDATA.IO
renew until 10/06/2016 20:21:07

[hdfs@ip-172-31-40-120 alluxio-1.1.0]$ ./bin/alluxio-start.sh master
Starting master @ ip-172-31-40-120.us-west-2.compute.internal. Logging to /tmp/alluxio-1.1.0/logs

[hdfs@ip-172-31-40-120 alluxio-1.1.0]$ ./bin/alluxio-start.sh worker SudoMount
ALLUXIO_WORKER_MEMORY_SIZE was not set. Using the default one: 128MB
ALLUXIO_RAM_FOLDER was not set. Using the default one: /mnt/ramdisk
Formatting RamFS: /mnt/ramdisk (128mb)

[hdfs@ip-172-31-40-120 alluxio-1.1.0]$ ./bin/alluxio runTests
2016-10-06 19:56:18,866 INFO  type (AbstractClient.java:connect) - Alluxio client (version 1.1.0) is trying to connect with FileSystemMasterClient master @ ip-172-31-40-120.us-west-2.compute.internal/172.31.40.120:19998
2016-10-06 19:56:18,876 INFO  type (AbstractClient.java:connect) - Client registered with FileSystemMasterClient master @ ip-172-31-40-120.us-west-2.compute.internal/172.31.40.120:19998
runTest Basic CACHE_PROMOTE MUST_CACHE
2016-10-06 19:56:18,979 INFO  type (AbstractClient.java:connect) - Alluxio client (version 1.1.0) is trying to connect with BlockMasterClient master @ ip-172-31-40-120.us-west-2.compute.internal/172.31.40.120:19998
2016-10-06 19:56:18,983 INFO  type (AbstractClient.java:connect) - Client registered with BlockMasterClient master @ ip-172-31-40-120.us-west-2.compute.internal/172.31.40.120:19998
2016-10-06 19:56:19,009 INFO  type (BlockWorkerClient.java:connectOperation) - Connecting to local worker @ ip-172-31-40-120.us-west-2.compute.internal/172.31.40.120:29998
2016-10-06 19:56:19,040 INFO  type (BasicOperations.java:writeFile) - writeFile to file /default_tests_files/Basic_CACHE_PROMOTE_MUST_CACHE took 76 ms.
2016-10-06 19:56:19,151 INFO  type (BasicOperations.java:readFile) - readFile file /default_tests_files/Basic_CACHE_PROMOTE_MUST_CACHE took 111 ms.
Passed the test!
runTest BasicNonByteBuffer CACHE_PROMOTE MUST_CACHE
Passed the test!
runTest Basic CACHE_PROMOTE CACHE_THROUGH
2016-10-06 19:56:19,250 INFO  type (AbstractClient.java:connect) - Alluxio client (version 1.1.0) is trying to connect with FileSystemWorkerClient FileSystemWorker @ ip-172-31-40-120.us-west-2.compute.internal/172.31.40.120:29998
2016-10-06 19:56:19,250 INFO  type (AbstractClient.java:connect) - Client registered with FileSystemWorkerClient FileSystemWorker @ ip-172-31-40-120.us-west-2.compute.internal/172.31.40.120:29998
2016-10-06 19:56:19,505 INFO  type (BasicOperations.java:writeFile) - writeFile to file /default_tests_files/Basic_CACHE_PROMOTE_CACHE_THROUGH took 266 ms.
2016-10-06 19:56:19,583 INFO  type (BasicOperations.java:readFile) - readFile file /default_tests_files/Basic_CACHE_PROMOTE_CACHE_THROUGH took 78 ms.
Passed the test!
runTest BasicNonByteBuffer CACHE_PROMOTE CACHE_THROUGH
2016-10-06 19:56:19,588 INFO  type (AbstractClient.java:connect) - Alluxio client (version 1.1.0) is trying to connect with FileSystemWorkerClient FileSystemWorker @ ip-172-31-40-120.us-west-2.compute.internal/172.31.40.120:29998
2016-10-06 19:56:19,588 INFO  type (AbstractClient.java:connect) - Client registered with FileSystemWorkerClient FileSystemWorker @ ip-172-31-40-120.us-west-2.compute.internal/172.31.40.120:29998
Passed the test!
runTest Basic CACHE_PROMOTE THROUGH
2016-10-06 19:56:19,745 INFO  type (AbstractClient.java:connect) - Alluxio client (version 1.1.0) is trying to connect with FileSystemWorkerClient FileSystemWorker @ ip-172-31-40-120.us-west-2.compute.internal/172.31.40.120:29998
2016-10-06 19:56:19,746 INFO  type (AbstractClient.java:connect) - Client registered with FileSystemWorkerClient FileSystemWorker @ ip-172-31-40-120.us-west-2.compute.internal/172.31.40.120:29998
2016-10-06 19:56:19,829 INFO  type (BasicOperations.java:writeFile) - writeFile to file /default_tests_files/Basic_CACHE_PROMOTE_THROUGH took 85 ms.
2016-10-06 19:56:19,904 INFO  type (AbstractClient.java:connect) - Alluxio client (version 1.1.0) is trying to connect with FileSystemWorkerClient FileSystemWorker @ ip-172-31-40-120.us-west-2.compute.internal/172.31.40.120:29998
2016-10-06 19:56:19,904 INFO  type (AbstractClient.java:connect) - Client registered with FileSystemWorkerClient FileSystemWorker @ ip-172-31-40-120.us-west-2.compute.internal/172.31.40.120:29998
2016-10-06 19:56:19,989 INFO  type (BasicOperations.java:readFile) - readFile file /default_tests_files/Basic_CACHE_PROMOTE_THROUGH took 159 ms.
Passed the test!
runTest BasicNonByteBuffer CACHE_PROMOTE THROUGH
2016-10-06 19:56:19,993 INFO  type (AbstractClient.java:connect) - Alluxio client (version 1.1.0) is trying to connect with FileSystemWorkerClient FileSystemWorker @ ip-172-31-40-120.us-west-2.compute.internal/172.31.40.120:29998
2016-10-06 19:56:19,994 INFO  type (AbstractClient.java:connect) - Client registered with FileSystemWorkerClient FileSystemWorker @ ip-172-31-40-120.us-west-2.compute.internal/172.31.40.120:29998
2016-10-06 19:56:20,147 INFO  type (AbstractClient.java:connect) - Alluxio client (version 1.1.0) is trying to connect with FileSystemWorkerClient FileSystemWorker @ ip-172-31-40-120.us-west-2.compute.internal/172.31.40.120:29998
2016-10-06 19:56:20,148 INFO  type (AbstractClient.java:connect) - Client registered with FileSystemWorkerClient FileSystemWorker @ ip-172-31-40-120.us-west-2.compute.internal/172.31.40.120:29998
Passed the test!
runTest Basic CACHE_PROMOTE ASYNC_THROUGH
2016-10-06 19:56:20,225 INFO  type (BasicOperations.java:writeFile) - writeFile to file /default_tests_files/Basic_CACHE_PROMOTE_ASYNC_THROUGH took 10 ms.
2016-10-06 19:56:20,306 INFO  type (BasicOperations.java:readFile) - readFile file /default_tests_files/Basic_CACHE_PROMOTE_ASYNC_THROUGH took 81 ms.
Passed the test!
runTest BasicNonByteBuffer CACHE_PROMOTE ASYNC_THROUGH
Passed the test!
runTest Basic CACHE MUST_CACHE
2016-10-06 19:56:20,387 INFO  type (BasicOperations.java:writeFile) - writeFile to file /default_tests_files/Basic_CACHE_MUST_CACHE took 5 ms.
2016-10-06 19:56:20,454 INFO  type (BasicOperations.java:readFile) - readFile file /default_tests_files/Basic_CACHE_MUST_CACHE took 67 ms.
Passed the test!
runTest BasicNonByteBuffer CACHE MUST_CACHE
Passed the test!
runTest Basic CACHE CACHE_THROUGH
2016-10-06 19:56:20,529 INFO  type (AbstractClient.java:connect) - Alluxio client (version 1.1.0) is trying to connect with FileSystemWorkerClient FileSystemWorker @ ip-172-31-40-120.us-west-2.compute.internal/172.31.40.120:29998
2016-10-06 19:56:20,530 INFO  type (AbstractClient.java:connect) - Client registered with FileSystemWorkerClient FileSystemWorker @ ip-172-31-40-120.us-west-2.compute.internal/172.31.40.120:29998
2016-10-06 19:56:20,625 INFO  type (BasicOperations.java:writeFile) - writeFile to file /default_tests_files/Basic_CACHE_CACHE_THROUGH took 97 ms.
2016-10-06 19:56:20,691 INFO  type (BasicOperations.java:readFile) - readFile file /default_tests_files/Basic_CACHE_CACHE_THROUGH took 66 ms.
Passed the test!
runTest BasicNonByteBuffer CACHE CACHE_THROUGH
2016-10-06 19:56:20,695 INFO  type (AbstractClient.java:connect) - Alluxio client (version 1.1.0) is trying to connect with FileSystemWorkerClient FileSystemWorker @ ip-172-31-40-120.us-west-2.compute.internal/172.31.40.120:29998
2016-10-06 19:56:20,695 INFO  type (AbstractClient.java:connect) - Client registered with FileSystemWorkerClient FileSystemWorker @ ip-172-31-40-120.us-west-2.compute.internal/172.31.40.120:29998
Passed the test!
runTest Basic CACHE THROUGH
2016-10-06 19:56:20,864 INFO  type (AbstractClient.java:connect) - Alluxio client (version 1.1.0) is trying to connect with FileSystemWorkerClient FileSystemWorker @ ip-172-31-40-120.us-west-2.compute.internal/172.31.40.120:29998
2016-10-06 19:56:20,865 INFO  type (AbstractClient.java:connect) - Client registered with FileSystemWorkerClient FileSystemWorker @ ip-172-31-40-120.us-west-2.compute.internal/172.31.40.120:29998
2016-10-06 19:56:20,948 INFO  type (BasicOperations.java:writeFile) - writeFile to file /default_tests_files/Basic_CACHE_THROUGH took 86 ms.
2016-10-06 19:56:21,015 INFO  type (AbstractClient.java:connect) - Alluxio client (version 1.1.0) is trying to connect with FileSystemWorkerClient FileSystemWorker @ ip-172-31-40-120.us-west-2.compute.internal/172.31.40.120:29998
2016-10-06 19:56:21,015 INFO  type (AbstractClient.java:connect) - Client registered with FileSystemWorkerClient FileSystemWorker @ ip-172-31-40-120.us-west-2.compute.internal/172.31.40.120:29998
2016-10-06 19:56:21,090 INFO  type (BasicOperations.java:readFile) - readFile file /default_tests_files/Basic_CACHE_THROUGH took 142 ms.
Passed the test!
runTest BasicNonByteBuffer CACHE THROUGH
2016-10-06 19:56:21,093 INFO  type (AbstractClient.java:connect) - Alluxio client (version 1.1.0) is trying to connect with FileSystemWorkerClient FileSystemWorker @ ip-172-31-40-120.us-west-2.compute.internal/172.31.40.120:29998
2016-10-06 19:56:21,096 INFO  type (AbstractClient.java:connect) - Client registered with FileSystemWorkerClient FileSystemWorker @ ip-172-31-40-120.us-west-2.compute.internal/172.31.40.120:29998
2016-10-06 19:56:21,247 INFO  type (AbstractClient.java:connect) - Alluxio client (version 1.1.0) is trying to connect with FileSystemWorkerClient FileSystemWorker @ ip-172-31-40-120.us-west-2.compute.internal/172.31.40.120:29998
2016-10-06 19:56:21,248 INFO  type (AbstractClient.java:connect) - Client registered with FileSystemWorkerClient FileSystemWorker @ ip-172-31-40-120.us-west-2.compute.internal/172.31.40.120:29998
Passed the test!
runTest Basic CACHE ASYNC_THROUGH
2016-10-06 19:56:21,336 INFO  type (BasicOperations.java:writeFile) - writeFile to file /default_tests_files/Basic_CACHE_ASYNC_THROUGH took 6 ms.
2016-10-06 19:56:21,402 INFO  type (BasicOperations.java:readFile) - readFile file /default_tests_files/Basic_CACHE_ASYNC_THROUGH took 66 ms.
Passed the test!
runTest BasicNonByteBuffer CACHE ASYNC_THROUGH
Passed the test!
runTest Basic NO_CACHE MUST_CACHE
2016-10-06 19:56:21,481 INFO  type (BasicOperations.java:writeFile) - writeFile to file /default_tests_files/Basic_NO_CACHE_MUST_CACHE took 5 ms.
2016-10-06 19:56:21,573 INFO  type (BasicOperations.java:readFile) - readFile file /default_tests_files/Basic_NO_CACHE_MUST_CACHE took 92 ms.
Passed the test!
runTest BasicNonByteBuffer NO_CACHE MUST_CACHE
Passed the test!
runTest Basic NO_CACHE CACHE_THROUGH
2016-10-06 19:56:21,651 INFO  type (AbstractClient.java:connect) - Alluxio client (version 1.1.0) is trying to connect with FileSystemWorkerClient FileSystemWorker @ ip-172-31-40-120.us-west-2.compute.internal/172.31.40.120:29998
2016-10-06 19:56:21,651 INFO  type (AbstractClient.java:connect) - Client registered with FileSystemWorkerClient FileSystemWorker @ ip-172-31-40-120.us-west-2.compute.internal/172.31.40.120:29998
2016-10-06 19:56:21,725 INFO  type (BasicOperations.java:writeFile) - writeFile to file /default_tests_files/Basic_NO_CACHE_CACHE_THROUGH took 75 ms.
2016-10-06 19:56:21,789 INFO  type (BasicOperations.java:readFile) - readFile file /default_tests_files/Basic_NO_CACHE_CACHE_THROUGH took 63 ms.
Passed the test!
runTest BasicNonByteBuffer NO_CACHE CACHE_THROUGH
2016-10-06 19:56:21,793 INFO  type (AbstractClient.java:connect) - Alluxio client (version 1.1.0) is trying to connect with FileSystemWorkerClient FileSystemWorker @ ip-172-31-40-120.us-west-2.compute.internal/172.31.40.120:29998
2016-10-06 19:56:21,793 INFO  type (AbstractClient.java:connect) - Client registered with FileSystemWorkerClient FileSystemWorker @ ip-172-31-40-120.us-west-2.compute.internal/172.31.40.120:29998
Passed the test!
runTest Basic NO_CACHE THROUGH
2016-10-06 19:56:22,167 INFO  type (AbstractClient.java:connect) - Alluxio client (version 1.1.0) is trying to connect with FileSystemWorkerClient FileSystemWorker @ ip-172-31-40-120.us-west-2.compute.internal/172.31.40.120:29998
2016-10-06 19:56:22,167 INFO  type (AbstractClient.java:connect) - Client registered with FileSystemWorkerClient FileSystemWorker @ ip-172-31-40-120.us-west-2.compute.internal/172.31.40.120:29998
2016-10-06 19:56:22,253 INFO  type (BasicOperations.java:writeFile) - writeFile to file /default_tests_files/Basic_NO_CACHE_THROUGH took 87 ms.
2016-10-06 19:56:22,322 INFO  type (AbstractClient.java:connect) - Alluxio client (version 1.1.0) is trying to connect with FileSystemWorkerClient FileSystemWorker @ ip-172-31-40-120.us-west-2.compute.internal/172.31.40.120:29998
2016-10-06 19:56:22,322 INFO  type (AbstractClient.java:connect) - Client registered with FileSystemWorkerClient FileSystemWorker @ ip-172-31-40-120.us-west-2.compute.internal/172.31.40.120:29998
2016-10-06 19:56:22,380 INFO  type (BasicOperations.java:readFile) - readFile file /default_tests_files/Basic_NO_CACHE_THROUGH took 127 ms.
Passed the test!
runTest BasicNonByteBuffer NO_CACHE THROUGH
2016-10-06 19:56:22,384 INFO  type (AbstractClient.java:connect) - Alluxio client (version 1.1.0) is trying to connect with FileSystemWorkerClient FileSystemWorker @ ip-172-31-40-120.us-west-2.compute.internal/172.31.40.120:29998
2016-10-06 19:56:22,384 INFO  type (AbstractClient.java:connect) - Client registered with FileSystemWorkerClient FileSystemWorker @ ip-172-31-40-120.us-west-2.compute.internal/172.31.40.120:29998
2016-10-06 19:56:22,522 INFO  type (AbstractClient.java:connect) - Alluxio client (version 1.1.0) is trying to connect with FileSystemWorkerClient FileSystemWorker @ ip-172-31-40-120.us-west-2.compute.internal/172.31.40.120:29998
2016-10-06 19:56:22,522 INFO  type (AbstractClient.java:connect) - Client registered with FileSystemWorkerClient FileSystemWorker @ ip-172-31-40-120.us-west-2.compute.internal/172.31.40.120:29998
Passed the test!
runTest Basic NO_CACHE ASYNC_THROUGH
2016-10-06 19:56:22,587 INFO  type (BasicOperations.java:writeFile) - writeFile to file /default_tests_files/Basic_NO_CACHE_ASYNC_THROUGH took 6 ms.
2016-10-06 19:56:22,653 INFO  type (BasicOperations.java:readFile) - readFile file /default_tests_files/Basic_NO_CACHE_ASYNC_THROUGH took 66 ms.
Passed the test!
runTest BasicNonByteBuffer NO_CACHE ASYNC_THROUGH
Passed the test!

On Oct 6, 2016, at 2:40 PM, Chaomin Yu <cha...@alluxio.com> wrote:

Hi Andy,

Thank you for trying out Alluxio!

Can you please try "kinit -k -t /tmp/alluxio-1.1.0/conf/hdfs.keytab hdfs/_HOST@PHDATA.IO" before starting the Alluxio servers, to see whether it helps mitigate this error?

Hope this helps,
Chaomin

2016-10-06 18:42:01,297 INFO  security.UserGroupInformation (UserGroupInformation.java:loginUserFromKeytab) - Login successful for user hdfs/ip-172-31-40-120.us-west-2.compute...@PHDATA.IO using keytab file /tmp/alluxio-1.1.0/conf/hdfs.keytab
2016-10-06 18:42:10,819 INFO  server.AbstractConnector (AbstractConnector.java:doStart) - Started SelectChannelConnector@0.0.0.0:19999




-- 
Cheers,
Chaomin





--
Cheers,
Chaomin

Andrew Stadtler

Oct 6, 2016, 6:52:11 PM
to Chaomin Yu, Alluxio Users
That only works until my kerberos ticket hits its expiration.

2016-10-06 21:00:46,967 ERROR logger.type (MetricsConfig.java:loadConfigFile) - Error loading metrics configuration file.
2016-10-06 21:00:48,257 WARN  util.NativeCodeLoader (NativeCodeLoader.java:<clinit>) - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2016-10-06 21:00:48,471 INFO  security.UserGroupInformation (UserGroupInformation.java:loginUserFromKeytab) - Login successful for user hdfs/ip-172-31-40-120.us-w...@PHDATA.IO using keytab file /tmp/alluxio-1.1.0/conf/hdfs.keytab
2016-10-06 21:00:48,473 INFO  logger.type (AbstractMaster.java:start) - BlockMaster: Starting leader master.
2016-10-06 21:00:48,474 INFO  logger.type (JournalWriter.java:completeAllLogs) - Marking all logs as complete.
2016-10-06 21:00:48,475 INFO  logger.type (JournalWriter.java:completeCurrentLog) - Completed current log: /tmp/alluxio-1.1.0/journal/BlockMaster/log.out to completed log: /tmp/alluxio-1.1.0/journal/BlockMaster/completed/log.00000000000000000001
2016-10-06 21:00:48,477 INFO  logger.type (AbstractMaster.java:start) - BlockMaster: process entire journal before becoming leader master.
2016-10-06 21:00:48,477 INFO  logger.type (JournalTailer.java:processJournalCheckpoint) - BlockMaster: Loading checkpoint file: /tmp/alluxio-1.1.0/journal/BlockMaster/checkpoint.data
2016-10-06 21:00:48,477 INFO  logger.type (JournalReader.java:getCheckpointInputStream) - Opening journal checkpoint file: /tmp/alluxio-1.1.0/journal/BlockMaster/checkpoint.data
2016-10-06 21:00:48,497 INFO  logger.type (JournalReader.java:getNextInputStream) - Opening journal log file: /tmp/alluxio-1.1.0/journal/BlockMaster/completed/log.00000000000000000001
2016-10-06 21:00:48,497 INFO  logger.type (JournalTailer.java:processNextJournalLogFiles) - BlockMaster: Processing a completed log file.
2016-10-06 21:00:48,497 INFO  logger.type (JournalTailer.java:processNextJournalLogFiles) - BlockMaster: Finished processing the log file.
2016-10-06 21:00:48,498 INFO  logger.type (JournalWriter.java:getCheckpointOutputStream) - Creating tmp checkpoint file: /tmp/alluxio-1.1.0/journal/BlockMaster/checkpoint.data.tmp
2016-10-06 21:00:48,498 INFO  logger.type (JournalWriter.java:getCheckpointOutputStream) - Latest journal sequence number: 494 Next journal sequence number: 495
2016-10-06 21:00:48,528 INFO  logger.type (JournalWriter.java:close) - Successfully created tmp checkpoint file: /tmp/alluxio-1.1.0/journal/BlockMaster/checkpoint.data.tmp
2016-10-06 21:00:48,529 INFO  logger.type (JournalWriter.java:close) - Renamed checkpoint file /tmp/alluxio-1.1.0/journal/BlockMaster/checkpoint.data.tmp to /tmp/alluxio-1.1.0/journal/BlockMaster/checkpoint.data
2016-10-06 21:00:48,531 INFO  logger.type (JournalWriter.java:deleteCompletedLogs) - Deleting all completed log files...
2016-10-06 21:00:48,531 INFO  logger.type (JournalWriter.java:deleteCompletedLogs) - Deleting completed log: /tmp/alluxio-1.1.0/journal/BlockMaster/completed/log.00000000000000000001
2016-10-06 21:00:48,531 INFO  logger.type (JournalWriter.java:deleteCompletedLogs) - Finished deleting all completed log files.
2016-10-06 21:00:48,540 INFO  logger.type (JournalWriter.java:openCurrentLog) - Opened current log file: /tmp/alluxio-1.1.0/journal/BlockMaster/log.out
2016-10-06 21:00:48,543 INFO  logger.type (MountTable.java:add) - Mounting hdfs://ip-172-31-37-123.us-west-2.compute.internal:8020 at /
2016-10-06 21:00:48,544 INFO  logger.type (AbstractMaster.java:start) - FileSystemMaster: Starting leader master.
2016-10-06 21:00:48,544 INFO  logger.type (JournalWriter.java:completeAllLogs) - Marking all logs as complete.
2016-10-06 21:00:48,544 INFO  logger.type (AbstractMaster.java:start) - FileSystemMaster: process entire journal before becoming leader master.
2016-10-06 21:00:48,545 INFO  logger.type (JournalTailer.java:processJournalCheckpoint) - FileSystemMaster: Loading checkpoint file: /tmp/alluxio-1.1.0/journal/FileSystemMaster/checkpoint.data
2016-10-06 21:00:48,545 INFO  logger.type (JournalReader.java:getCheckpointInputStream) - Opening journal checkpoint file: /tmp/alluxio-1.1.0/journal/FileSystemMaster/checkpoint.data
2016-10-06 21:00:48,556 INFO  logger.type (JournalWriter.java:getCheckpointOutputStream) - Creating tmp checkpoint file: /tmp/alluxio-1.1.0/journal/FileSystemMaster/checkpoint.data.tmp
2016-10-06 21:00:48,556 INFO  logger.type (JournalWriter.java:getCheckpointOutputStream) - Latest journal sequence number: 1396 Next journal sequence number: 1397
2016-10-06 21:00:48,568 INFO  logger.type (JournalWriter.java:close) - Successfully created tmp checkpoint file: /tmp/alluxio-1.1.0/journal/FileSystemMaster/checkpoint.data.tmp
2016-10-06 21:00:48,568 INFO  logger.type (JournalWriter.java:close) - Renamed checkpoint file /tmp/alluxio-1.1.0/journal/FileSystemMaster/checkpoint.data.tmp to /tmp/alluxio-1.1.0/journal/FileSystemMaster/checkpoint.data
2016-10-06 21:00:48,568 INFO  logger.type (JournalWriter.java:deleteCompletedLogs) - Deleting all completed log files...
2016-10-06 21:00:48,568 INFO  logger.type (JournalWriter.java:deleteCompletedLogs) - Finished deleting all completed log files.
2016-10-06 21:00:48,770 INFO  server.Server (Server.java:doStart) - jetty-7.x.y-SNAPSHOT
2016-10-06 21:00:48,791 INFO  handler.ContextHandler (ContextHandler.java:startContext) - started o.e.j.s.ServletContextHandler{/metrics/json,null}
2016-10-06 21:00:48,803 WARN  webapp.WebInfConfiguration (WebInfConfiguration.java:makeTempDirectory) - Can't reuse /tmp/jetty-0.0.0.0-19999-webapp-_-any-, using /tmp/jetty-0.0.0.0-19999-webapp-_-any-_2467324835338255604
2016-10-06 21:00:48,868 INFO  handler.ContextHandler (ContextHandler.java:startContext) - started o.e.j.w.WebAppContext{/,file:/tmp/alluxio-1.1.0/core/server/src/main/webapp/},/tmp/alluxio-1.1.0/core/server/src/main/webapp
2016-10-06 21:00:59,001 INFO  server.AbstractConnector (AbstractConnector.java:doStart) - Started SelectChann...@0.0.0.0:19999
2016-10-06 21:00:59,002 INFO  logger.type (UIWebServer.java:startWebServer) - Alluxio Master Web service started @ 0.0.0.0/0.0.0.0:19999
2016-10-06 21:00:59,004 INFO  logger.type (AlluxioMaster.java:startServing) - Alluxio master version 1.1.0 started @ ip-172-31-40-120.us-west-2.compute.internal/172.31.40.120:19998
2016-10-06 21:01:03,465 INFO  logger.type (BlockMaster.java:getWorkerId) - getWorkerId(): WorkerNetAddress: WorkerNetAddress{host=ip-172-31-40-120.us-west-2.compute.internal, rpcPort=29998, dataPort=29999, webPort=30000} id: 1
2016-10-06 21:01:03,493 INFO  logger.type (BlockMaster.java:workerRegister) - registerWorker(): MasterWorkerInfo{id=1, workerAddress=WorkerNetAddress{host=ip-172-31-40-120.us-west-2.compute.internal, rpcPort=29998, dataPort=29999, webPort=30000}, capacityBytes=2147483648, usedBytes=0, lastUpdatedTimeMs=1475787663492, blocks=[]}
2016-10-06 21:01:13,315 INFO  logger.type (JournalWriter.java:openCurrentLog) - Opened current log file: /tmp/alluxio-1.1.0/journal/FileSystemMaster/log.out
2016-10-06 21:05:57,019 WARN  security.UserGroupInformation (UserGroupInformation.java:run) - Exception encountered while running the renewal command. Aborting renew thread. ExitCodeException exitCode=1: kinit: Internal credentials cache error when initializing cache

2016-10-06 22:23:40,332 WARN  security.UserGroupInformation (UserGroupInformation.java:doAs) - PriviledgedActionException as:hdfs/ip-172-31-40-120.us-w...@PHDATA.IO (auth:KERBEROS) cause:javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
2016-10-06 22:23:40,333 WARN  ipc.Client (Client.java:run) - Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
2016-10-06 22:23:40,334 WARN  security.UserGroupInformation (UserGroupInformation.java:doAs) - PriviledgedActionException as:hdfs/ip-172-31-40-120.us-w...@PHDATA.IO (auth:KERBEROS) cause:java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
2016-10-06 22:23:42,678 WARN  security.UserGroupInformation (UserGroupInformation.java:doAs) - PriviledgedActionException as:hdfs/ip-172-31-40-120.us-w...@PHDATA.IO (auth:KERBEROS) cause:javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
2016-10-06 22:23:42,679 WARN  ipc.Client (Client.java:run) - Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
2016-10-06 22:23:42,679 WARN  security.UserGroupInformation (UserGroupInformation.java:doAs) - PriviledgedActionException as:hdfs/ip-172-31-40-120.us-w...@PHDATA.IO (auth:KERBEROS) cause:java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
2016-10-06 22:29:31,340 WARN  security.UserGroupInformation (UserGroupInformation.java:doAs) - PriviledgedActionException as:hdfs/ip-172-31-40-120.us-w...@PHDATA.IO (auth:KERBEROS) cause:javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
2016-10-06 22:29:31,341 WARN  ipc.Client (Client.java:run) - Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
2016-10-06 22:29:31,341 WARN  security.UserGroupInformation (UserGroupInformation.java:doAs) - PriviledgedActionException as:hdfs/ip-172-31-40-120.us-w...@PHDATA.IO (auth:KERBEROS) cause:java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
(END)

Chaomin Yu

Oct 6, 2016, 7:14:32 PM
to Andrew Stadtler, Alluxio Users
Can you please check whether your server-side Kerberos ticket for "hdfs/_HO...@PHDATA.IO" is renewable?
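For reference, one way to check this from the Alluxio host (a sketch only, reusing the keytab path and host principal that appear elsewhere in this thread):

# Obtain a TGT from the keytab, then list it with flags:
# an 'R' flag together with a "renew until" line means the ticket is renewable.
kinit -k -t /tmp/alluxio-1.1.0/conf/hdfs.keytab hdfs/ip-172-31-40-120.us-west-2.compute.internal@PHDATA.IO
klist -f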



On Thu, Oct 6, 2016 at 3:52 PM, Andrew Stadtler <an...@phdata.io> wrote:
That only works until my kerberos ticket hits its expiration.

2016-10-06 21:00:46,967 ERROR logger.type (MetricsConfig.java:loadConfigFile) - Error loading metrics configuration file.
2016-10-06 21:00:48,257 WARN  util.NativeCodeLoader (NativeCodeLoader.java:<clinit>) - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2016-10-06 21:00:48,471 INFO  security.UserGroupInformation (UserGroupInformation.java:loginUserFromKeytab) - Login successful for user hdfs/ip-172-31-40-120.us-west-2.compute...@PHDATA.IO using keytab file /tmp/alluxio-1.1.0/conf/hdfs.keytab
2016-10-06 22:23:40,332 WARN  security.UserGroupInformation (UserGroupInformation.java:doAs) - PriviledgedActionException as:hdfs/ip-172-31-40-120.us-west-2.compute.internal@PHDATA.IO (auth:KERBEROS) cause:javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
2016-10-06 22:23:40,333 WARN  ipc.Client (Client.java:run) - Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
2016-10-06 22:23:40,334 WARN  security.UserGroupInformation (UserGroupInformation.java:doAs) - PriviledgedActionException as:hdfs/ip-172-31-40-120.us-west-2.compute.internal@PHDATA.IO (auth:KERBEROS) cause:java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
2016-10-06 22:23:42,678 WARN  security.UserGroupInformation (UserGroupInformation.java:doAs) - PriviledgedActionException as:hdfs/ip-172-31-40-120.us-west-2.compute.internal@PHDATA.IO (auth:KERBEROS) cause:javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
2016-10-06 22:23:42,679 WARN  ipc.Client (Client.java:run) - Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
2016-10-06 22:23:42,679 WARN  security.UserGroupInformation (UserGroupInformation.java:doAs) - PriviledgedActionException as:hdfs/ip-172-31-40-120.us-west-2.compute.internal@PHDATA.IO (auth:KERBEROS) cause:java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
2016-10-06 22:29:31,340 WARN  security.UserGroupInformation (UserGroupInformation.java:doAs) - PriviledgedActionException as:hdfs/ip-172-31-40-120.us-west-2.compute.internal@PHDATA.IO (auth:KERBEROS) cause:javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
2016-10-06 22:29:31,341 WARN  ipc.Client (Client.java:run) - Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
2016-10-06 22:29:31,341 WARN  security.UserGroupInformation (UserGroupInformation.java:doAs) - PriviledgedActionException as:hdfs/ip-172-31-40-120.us-west-2.compute.internal@PHDATA.IO (auth:KERBEROS) cause:java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
(END)

Hi Andy,

Can you please try "kinit -k -t /tmp/alluxio-1.1.0/conf/hdfs.keytab hdfs/_HO...@PHDATA.IO" before starting the Alluxio servers, to see whether it helps mitigate this error?

Hope this helps,
Chaomin

Chaomin Yu

Oct 6, 2016, 7:17:57 PM
to Andrew Stadtler, Alluxio Users
According to the log you provided, it seems like your ticket is non-renewable after 10/06/2016 20:21:07?

Valid starting       Expires              Service principal
10/06/2016 19:51:07  10/06/2016 20:01:07  krbtgt/PHDA...@PHDATA.IO
renew until 10/06/2016 20:21:07

Andrew Stadtler

Oct 6, 2016, 7:38:19 PM
to Chaomin Yu, Alluxio Users
I made it shorter (1 hour) for testing, but the Kerberos default is 7 days, so currently I would have to restart the Alluxio processes every 7 days.

Chaomin Yu

Oct 6, 2016, 8:30:47 PM
to Andrew Stadtler, Alluxio Users
Thanks for the confirmation.

Can you please try "kinit -R" to update the renew until time periodically?
Otherwise, this is a general limitation of Kerberos ticket renewal.
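One way to automate that (a sketch only, assuming the hdfs user's ticket cache /tmp/krb5cc_992 shown in the klist output above and an hourly schedule):

# crontab entry for the user running the Alluxio processes: renew the TGT hourly.
# Renewal only succeeds while the ticket is still inside its renewable lifetime
# (the "renew until" time); it cannot extend that limit by itself.
0 * * * * /usr/bin/kinit -R -c /tmp/krb5cc_992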


On Thu, Oct 6, 2016 at 4:38 PM, Andrew Stadtler <an...@phdata.io> wrote:
I made it shorter (1 hour) for testing, but the Kerberos default is 7 days, so currently I would have to restart the Alluxio processes every 7 days.

On Oct 6, 2016, at 6:17 PM, Chaomin Yu <cha...@alluxio.com> wrote:

According to the log you provided, it seems like your ticket is non-renewable after 10/06/2016 20:21:07?

Valid starting       Expires              Service principal
10/06/2016 19:51:07  10/06/2016 20:01:07  krbtgt/PHDATA.IO@PHDATA.IO

Andrew Stadtler

Oct 7, 2016, 9:04:10 AM
to Chaomin Yu, Alluxio Users
Below: the standard renewable lifetime of a Ticket Granting Ticket (TGT) is 7 days, and the standard lifetime of a service ticket is 1 day. If I do a renewal (kinit -R), you can see it moves the "Valid starting" and "Expires" dates forward, but the "renew until" time for the TGT is still capped at 7 days from the original kinit. Essentially, the TGT is used to obtain the service ticket from the Ticket Granting Server (TGS), and the service ticket is what is actually used to communicate with the HDFS service. So before the 7 days end we essentially need to do a new kinit to get a new TGT; there is utility code provided as part of Hadoop to do this, which I think is our issue, and it is better explained here: http://stackoverflow.com/questions/34616676/should-i-call-ugi-checktgtandreloginfromkeytab-before-every-action-on-hadoop .

[hdfs@ip-172-31-40-120 alluxio-1.1.0]$ kinit -k -t hdfs.keytab hdfs/ip-172-31-40-120.us-w...@PHDATA.IO

[hdfs@ip-172-31-40-120 alluxio-1.1.0]$ klist
Ticket cache: FILE:/tmp/krb5cc_992

Valid starting       Expires              Service principal
10/07/2016 00:30:42  10/08/2016 00:30:42  krbtgt/PHDA...@PHDATA.IO
renew until 10/14/2016 00:30:42

[hdfs@ip-172-31-40-120 alluxio-1.1.0]$ kinit -R

[hdfs@ip-172-31-40-120 alluxio-1.1.0]$ klist
Ticket cache: FILE:/tmp/krb5cc_992

Valid starting       Expires              Service principal
10/07/2016 00:31:17  10/08/2016 00:31:17  krbtgt/PHDA...@PHDATA.IO
renew until 10/14/2016 00:30:42
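Until that re-login happens inside the Alluxio/Hadoop client itself (the UserGroupInformation.checkTGTAndReloginFromKeytab utility named in the Stack Overflow link above), one external workaround is to take a brand-new TGT from the keytab on a schedule shorter than the renewable lifetime. A sketch only, reusing the keytab, cache, and principal from the commands in this thread; whether a long-running master picks up the refreshed cache without the in-JVM re-login is exactly the question that link discusses:

# crontab entry for the hdfs user: obtain a fresh TGT once a day, well inside
# the 7-day renewable lifetime, so the ticket cache never holds an expired TGT.
0 3 * * * /usr/bin/kinit -k -t /tmp/alluxio-1.1.0/conf/hdfs.keytab -c /tmp/krb5cc_992 hdfs/ip-172-31-40-120.us-west-2.compute.internal@PHDATA.IO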

Chaomin Yu

Nov 18, 2016, 2:09:48 PM
to Andrew Stadtler, Alluxio Users
Thank you Andrew for pointing this out.

Yes, this is a known limitation in the current Alluxio release. We've added it to our documentation here. Please let us know if you encounter any other issues with your workaround.

Best,
Chaomin

On Fri, Oct 7, 2016 at 6:03 AM, Andrew Stadtler <an...@phdata.io> wrote:
Below: the standard renewable lifetime of a Ticket Granting Ticket (TGT) is 7 days, and the standard lifetime of a service ticket is 1 day. If I do a renewal (kinit -R), you can see it moves the "Valid starting" and "Expires" dates forward, but the "renew until" time for the TGT is still capped at 7 days from the original kinit. Essentially, the TGT is used to obtain the service ticket from the Ticket Granting Server (TGS), and the service ticket is what is actually used to communicate with the HDFS service. So before the 7 days end we essentially need to do a new kinit to get a new TGT; there is utility code provided as part of Hadoop to do this, which I think is our issue, and it is better explained here: http://stackoverflow.com/questions/34616676/should-i-call-ugi-checktgtandreloginfromkeytab-before-every-action-on-hadoop .
[hdfs@ip-172-31-40-120 alluxio-1.1.0]$ kinit -k -t hdfs.keytab hdfs/ip-172-31-40-120.us-west-2.compute...@PHDATA.IO

[hdfs@ip-172-31-40-120 alluxio-1.1.0]$ klist
Ticket cache: FILE:/tmp/krb5cc_992

Valid starting       Expires              Service principal
10/07/2016 00:30:42  10/08/2016 00:30:42  krbtgt/PHDA...@PHDATA.IO
renew until 10/14/2016 00:30:42

[hdfs@ip-172-31-40-120 alluxio-1.1.0]$ kinit -R

[hdfs@ip-172-31-40-120 alluxio-1.1.0]$ klist
Ticket cache: FILE:/tmp/krb5cc_992