I deployed a Spark cluster in standalone mode (not using Mesos) and started the cluster as the default user, spark. Then I ran Shark as a different user, shark. Please pay attention to which user is used in each step.
Now when I execute a SQL statement in Shark, for example "select * from table", it throws this exception:
org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.security.AccessControlException: Permission denied: user=shark, access=ALL, inode="/tmp/hive_2013-07-16_849_1224166558688754033/_task_tmp.-ext-10001" spark:supergroup rwxr-xr-x
        at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:205)
        at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkSubAccess(FSPermissionChecker.java:174)
        at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:144)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:4547)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:2679)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInt(FSNamesystem.java:2643)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:2626)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:612)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:406)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44094)
Ā Ā Ā Ā
Then I looked at the directory in HDFS (hive.exec.scratchdir is currently set to /tmp):
drwxrwxrwx   shark supergroup   2013-07-16 15:45 /tmp/hive_2013-07-16_849_1224166558688754033
Obviously the owner of this directory is shark (the user I started Shark as), but the directory contains:
drwxr-xr-x   - spark supergroup          0 2013-07-16 15:54 /tmp/hive_2013-07-16_849_1224166558688754033/-ext-10001
drwxr-xr-x   - spark supergroup          0 2013-07-16 15:54 /tmp/hive_2013-07-16_849_1224166558688754033/_task_tmp.-ext-10001
The owner of these subdirectories is spark (the user I started the Spark cluster as)!
In my understanding, when I execute "select xxxx" it generates some temporary files, such as the /tmp/hive_xxxxxx directory above. That directory being owned by shark is reasonable, because I started Shark as the user shark. But why are the subdirectories of that directory (e.g. _task_tmp.-ext-10001) owned by spark (presumably because I started the Spark cluster as the user spark)?
Now the problem: according to the exception, shark must have ALL permission on the directory _task_tmp.-ext-10001, but the access control on _task_tmp.-ext-10001 is spark:supergroup rwxr-xr-x.
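A minimal sketch (the test path below is made up) of what I think is going on: without Kerberos, HDFS treats the OS user of the JVM that makes the call as the client identity, and that user becomes the owner of whatever the call creates. Run from the Shark driver this would print shark; run inside an executor launched by a standalone worker it would print spark, which is why the task-side subdirectories end up owned by spark.

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}
    import org.apache.hadoop.security.UserGroupInformation

    object WhoDoesHdfsThinkIAm {
      def main(args: Array[String]): Unit = {
        // Without Kerberos, the identity falls back to the OS user of this JVM.
        val fs = FileSystem.get(new Configuration())
        println(UserGroupInformation.getCurrentUser.getShortUserName)
        // Anything created here is owned by the user printed above.
        fs.mkdirs(new Path("/tmp/ownership-test"))
      }
    }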
My question is: do I have to start Shark and Spark as the same user? If so, that seems strange. If several users want to use Shark, what should I do?
Yeah, currently I have been working on the YARN mode, so it is passing the delegation token from the master to the workers and running things as the appropriate user. But right now it only runs one job and then exits. Once we have that working we do want to start looking at making the YARN mode longer lived, similar to standalone, and allowing multi-tenancy. Unfortunately I haven't looked at that yet.
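Roughly, that delegation-token flow looks like the sketch below; the renewer name and the token-shipping step are assumptions for illustration, not the actual implementation. The master, already authenticated as the submitting user, fetches HDFS delegation tokens and ships the serialized Credentials to the workers, which add them to their own UGI before touching HDFS.

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.FileSystem
    import org.apache.hadoop.security.{Credentials, UserGroupInformation}

    // On the master, running as the submitting user:
    val creds = new Credentials()
    val fs = FileSystem.get(new Configuration())
    fs.addDelegationTokens("yarn", creds)  // renewer principal is an assumption

    // ...serialize `creds` (Credentials is Writable) and send it to each worker.
    // On the worker, before any HDFS access:
    UserGroupInformation.getCurrentUser.addCredentials(creds)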
I'm also working on securing the internal Spark connections; the first cut is a simple shared-secret exchange for authentication.
For secure HDFS you need to either ship the UGI (with credentials) around or use a keytab file, something that lets the individual user authenticate.
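For the keytab option, the login itself is the standard UGI call; a sketch, assuming a Kerberized cluster and a keytab already distributed to the node (the principal and keytab path below are placeholders):

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.security.UserGroupInformation

    val conf = new Configuration()
    conf.set("hadoop.security.authentication", "kerberos")
    UserGroupInformation.setConfiguration(conf)
    // Placeholder principal and keytab path, for illustration only.
    UserGroupInformation.loginUserFromKeytab("shark@EXAMPLE.COM",
      "/etc/security/keytabs/shark.keytab")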
I don't think you need to run the actual process as the user if it is just the HDFS security that is the issue, but for a true multi-tenant secure environment you would want to. It looks like the ApplicationDescription is already pulling the user from the environment via the user.name property. Are you using that user for the doAs?
I'm not sure I completely understand how you are using the proxy user. Are you having the spark user be the proxy user that then accesses HDFS on behalf of the end user?
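For reference, the proxy-user pattern I have in mind looks roughly like this sketch (the user names follow the thread, the target path is invented, and the NameNode would also need hadoop.proxyuser.spark.hosts/groups configured):

    import java.security.PrivilegedExceptionAction
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}
    import org.apache.hadoop.security.UserGroupInformation

    // The long-running service is authenticated as "spark" and impersonates the
    // end user "shark" for HDFS access, so files it creates are owned by shark.
    val realUser = UserGroupInformation.getLoginUser
    val proxyUgi = UserGroupInformation.createProxyUser("shark", realUser)
    proxyUgi.doAs(new PrivilegedExceptionAction[Unit] {
      override def run(): Unit = {
        val fs = FileSystem.get(new Configuration())
        fs.mkdirs(new Path("/tmp/created-as-shark"))
      }
    })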
Tom
--