I deployed a Spark cluster in standalone mode (not using Mesos), and started the cluster as the default user, spark. Then I ran Shark as a different user, shark. Please pay attention to which user is used in each case.
Now when I execute a SQL statement in Shark, for example "select * from table", it throws this exception:
org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.security.AccessControlException: Permission denied: user=shark, access=ALL, inode="/tmp/hive_2013-07-16_849_1224166558688754033/_task_tmp.-ext-10001" spark:supergroup rwxr-xr-x
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:205)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkSubAccess(FSPermissionChecker.java:174)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:144)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:4547)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:2679)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInt(FSNamesystem.java:2643)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:2626)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:612)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:406)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44094)
Then I looked at the directory in HDFS (the current hive.exec.scratchdir is set to /tmp):
drwxrwxrwx shark supergroup 2013-07-16 15:45 /tmp/hive_2013-07-16_849_1224166558688754033
Obviously the owner of this directory is shark (the user I started Shark as), but this directory contains:
drwxr-xr-x - spark supergroup 0 2013-07-16 15:54 /tmp/hive_2013-07-16_849_1224166558688754033/-ext-10001
drwxr-xr-x - spark supergroup 0 2013-07-16 15:54 /tmp/hive_2013-07-16_849_1224166558688754033/_task_tmp.-ext-10001
and the owner of these is spark (the user I started Spark as)!
In my opinion, when I execute "select xxxx" it generates some temp files, such as /tmp/hive-xxxxxx above. It is reasonable that the owner of this directory is shark, because I started Shark as user shark. But why is the owner of the subdirectories of this directory (e.g. _task_tmp.-ext-10001) spark? Presumably because I started Spark as user spark.
Now the problem: looking at the exception, it says that shark must have ALL permission on directory _task_tmp.-ext-10001, but the access control on _task_tmp.-ext-10001 is spark:supergroup rwxr-xr-x.
My question is: do I have to start Shark and Spark as the same user? If so, that seems strange. If several users want to use Shark, what should I do?
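One partial mitigation that is sometimes used (a sketch, and you would need to confirm the exact property name against your Hive/Shark version) is to give each user a separate scratch directory instead of a shared path under /tmp, via hive-site.xml:

```xml
<!-- hive-site.xml fragment (hypothetical example): per-user scratch
     directory so different users' temp files don't collide under a
     single shared path. -->
<property>
  <name>hive.exec.scratchdir</name>
  <value>/tmp/hive-${user.name}</value>
</property>
```

Note that this alone may not fix the situation described above, because the offending subdirectories are created by the executor processes, which run as the user the cluster was started with (spark), not the user submitting the query.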
Yeah, currently I have been working on the YARN mode, so it's passing the delegation token from the master to the workers and running things as the appropriate user. But right now it just runs one job and then exits. Once we have that working we do want to start looking at making the YARN mode longer-lived, similar to standalone, and allowing for multi-tenancy. But unfortunately I haven't looked at it yet.
I'm also working on securing the internal Spark connections; the first cut is a simple shared-secret exchange for authentication.
For secure HDFS you need to either ship the UGI (with credentials) around, or a keytab file: something that lets the individual user authenticate.
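The keytab route looks roughly like this with Hadoop's UserGroupInformation API (a pseudocode-level sketch, untested; it needs the Hadoop client jars and a Kerberized cluster, and the principal, keytab path, and directory name below are hypothetical placeholders):

```java
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

// Log in as the individual user from a keytab file.
UserGroupInformation ugi =
    UserGroupInformation.loginUserFromKeytabAndReturnUGI(
        "shark@EXAMPLE.COM", "/etc/security/keytabs/shark.keytab");

// HDFS access inside doAs() is performed as that user, so any
// temp directories created get the right owner.
ugi.doAs(new PrivilegedExceptionAction<Void>() {
    public Void run() throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        fs.mkdirs(new Path("/tmp/hive-shark-scratch"));
        return null;
    }
});
```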
I don't think you need to run the actual process as the user if it's just the HDFS security that is the issue, but for a true multi-tenant secure environment you would want to. It looks like the ApplicationDescription is already pulling the user from the environment via the user.name property. Are you using that user for the doAs?
I'm not sure I completely understand how you are using the proxy user. Are you having the spark user be the proxy user that then accesses HDFS on behalf of the actual user?
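The proxy-user pattern being asked about looks roughly like this (again a pseudocode-level sketch, untested; principal and keytab path are made up, and the NameNode has to be configured to trust spark as a proxy via the hadoop.proxyuser.spark.* settings):

```java
// The spark service account logs in once with its own credentials...
UserGroupInformation realUser =
    UserGroupInformation.loginUserFromKeytabAndReturnUGI(
        "spark@EXAMPLE.COM", "/etc/security/keytabs/spark.keytab");

// ...then impersonates the end user for HDFS access.
UserGroupInformation proxy =
    UserGroupInformation.createProxyUser("shark", realUser);

proxy.doAs(new PrivilegedExceptionAction<Void>() {
    public Void run() throws Exception {
        // Files created here are owned by "shark", not "spark".
        FileSystem fs = FileSystem.get(new Configuration());
        fs.mkdirs(new Path("/tmp/hive-shark-scratch"));
        return null;
    }
});
```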
Tom