AccessControlException when running a Scalding job consisting of more than one step


Vitaly Gordon

Jan 31, 2013, 8:48:22 PM
to cascadi...@googlegroups.com
I am running a job that compiles into two steps.
The job setup process fails with the exception below. Looking at the job.xml, it seems that the output path of the mappers is in a strange directory, which might be what is causing it.

Has anyone else had this problem? Does anyone know how to fix it, or how to control the Cascading temp working directories?

Thanks,
Vitaly

org.apache.hadoop.security.AccessControlException: org.apache.hadoop.security.AccessControlException: Permission denied: user=vgordon, access=WRITE, inode="":hdfs:hdfs:rwxr-xr-x
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
	at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:95)
	at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:57)
	at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:1216)
	at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirs(DistributedFileSystem.java:321)
	at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1126)
	at org.apache.hadoop.mapred.FileOutputCommitter.setupJob(FileOutputCommitter.java:52)
	at org.apache.hadoop.mapred.OutputCommitter.setupJob(OutputCommitter.java:146)
	at org.apache.hadoop.mapred.Task.runJobSetupTask(Task.java:1101)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:361)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
	at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.security.AccessControlException: Permission denied: user=vgordon, access=WRITE, inode="":hdfs:hdfs:rwxr-xr-x
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:199)
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:180)
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:128)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:5214)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAncestorAccess(FSNamesystem.java:5188)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:2060)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:2029)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.mkdirs(NameNode.java:794)
	at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382)

	at org.apache.hadoop.ipc.Client.call(Client.java:1070)
	at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
	at $Proxy7.mkdirs(Unknown Source)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
	at $Proxy7.mkdirs(Unknown Source)
	at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:1214)
	... 11 more

Hugo Gävert

Feb 1, 2013, 6:54:57 AM
to cascadi...@googlegroups.com
Hi!

I'm not sure anymore whether this is the same issue I had when hitting strange permission problems while running jobs with more than one step, but I now always need to have this at the beginning of the job class:
  override def config(implicit mode : Mode) : Map[AnyRef, AnyRef] = {
    super.config ++ Map("mapreduce.job.complete.cancel.delegation.tokens" -> "false")
  }
Alternatively, you can pass -Dmapreduce.job.complete.cancel.delegation.tokens=false as a parameter when running the job. Assuming it's the same problem, of course :-)
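
If you are building the flow directly with Cascading rather than through Scalding, the same property can presumably be passed via the flow connector's Properties; this is only a sketch I haven't verified:

    Properties properties = new Properties();
    // same setting as the Scalding override above; properties given to the
    // connector are copied into the job configuration for each step
    properties.setProperty( "mapreduce.job.complete.cancel.delegation.tokens", "false" );
    HadoopFlowConnector flowConnector = new HadoopFlowConnector( properties );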

-- 
HG.

Vitaly Gordon

Feb 1, 2013, 12:18:14 PM
to cascadi...@googlegroups.com
Thanks Hugo.
I tried this approach before posting, and it didn't work for me.


Vitaly Gordon

Feb 1, 2013, 1:00:33 PM
to cascadi...@googlegroups.com
Also, from what I can see in the job.xml, the output of the first job is a local directory instead of an HDFS directory, and I think that is what is causing this. Any ideas on how I can change that?

Oscar Boykin

Feb 1, 2013, 3:03:30 PM
to cascadi...@googlegroups.com
What versions of Hadoop are in play here?



Vitaly Gordon

Feb 1, 2013, 3:14:00 PM
to cascadi...@googlegroups.com
Hadoop 1.0.4-p3

Oscar Boykin

Feb 1, 2013, 6:31:38 PM
to cascadi...@googlegroups.com
We still use CDH3 internally. I doubt this is a Scalding issue; it's probably more of a Cascading question.

Vitaly Gordon

Feb 7, 2013, 6:50:54 PM
to cascadi...@googlegroups.com
Yeah, it's definitely a Cascading issue.
Listed below is all the info. I have no idea how to read this, but as I've mentioned, the output.dir in the job.xml doesn't look like an HDFS dir, but more like a local filesystem dir.

Any help would be greatly appreciated.

I've created the following Cascading example that reproduces the problem for me (it is based on the Impatient code with just two added lines):
----------------------------------------------------------------------------------------------

import java.util.Properties;

import cascading.flow.Flow;
import cascading.flow.FlowDef;
import cascading.flow.hadoop.HadoopFlowConnector;
import cascading.operation.aggregator.Count;
import cascading.operation.aggregator.Sum;
import cascading.operation.regex.RegexSplitGenerator;
import cascading.pipe.Each;
import cascading.pipe.Every;
import cascading.pipe.GroupBy;
import cascading.pipe.Pipe;
import cascading.property.AppProps;
import cascading.scheme.hadoop.TextDelimited;
import cascading.tap.Tap;
import cascading.tap.hadoop.Hfs;
import cascading.tuple.Fields;


public class CascadingMain {
  public static void main( String[] args ) {
    String docPath = args[ 0 ];
    String wcPath = args[ 1 ];

    Properties properties = new Properties();
    AppProps.setApplicationJarClass( properties, CascadingMain.class );
    HadoopFlowConnector flowConnector = new HadoopFlowConnector( properties );

    // create source and sink taps
    Tap docTap = new Hfs( new TextDelimited( true, "\t" ), docPath );
    Tap wcTap = new Hfs( new TextDelimited( true, "\t" ), wcPath );

    // specify a regex operation to split the "document" text lines into a token stream
    Fields token = new Fields( "token" );
    Fields text = new Fields( "text" );
    RegexSplitGenerator splitter = new RegexSplitGenerator( token, "[ \\[\\]\\(\\),.]" );
    // only returns "token"
    Pipe docPipe = new Each( "token", text, splitter, Fields.RESULTS );

    // determine the word counts
    Pipe wcPipe = new Pipe( "wc", docPipe );
    wcPipe = new GroupBy( wcPipe, token );
    wcPipe = new Every( wcPipe, Fields.ALL, new Count(), Fields.ALL );
    wcPipe = new GroupBy(wcPipe, new Fields("count")); //added line
    wcPipe = new Every(wcPipe, Fields.ALL, new Sum(), Fields.ALL); //added line

    // connect the taps, pipes, etc., into a flow
    FlowDef flowDef = FlowDef.flowDef()
        .setName( "wc" )
        .addSource( docPipe, docTap )
        .addTailSink( wcPipe, wcTap );

    // write a DOT file and run the flow
    Flow wcFlow = flowConnector.connect( flowDef );
    wcFlow.writeDOT( "dot/wc.dot" );
    wcFlow.complete();
  }
}

-----------------------------------------------------------------------------
The runtime exception
Exception in thread "main" cascading.flow.FlowException: step failed: (1/2), with job id: job_201302051840_31683, please see cluster logs for failure messages
at cascading.flow.planner.FlowStepJob.blockOnJob(FlowStepJob.java:206)
at cascading.flow.planner.FlowStepJob.start(FlowStepJob.java:145)
at cascading.flow.planner.FlowStepJob.call(FlowStepJob.java:120)
at cascading.flow.planner.FlowStepJob.call(FlowStepJob.java:42)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
-------------------------------------------------------------------------------------------
The job setup exception

org.apache.hadoop.security.AccessControlException: org.apache.hadoop.security.AccessControlException: Permission denied: user=vgordon, access=WRITE, inode="":hdfs:hdfs:rwxr-xr-x
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
	at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:95)
	at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:57)
	at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:1216)
	at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirs(DistributedFileSystem.java:321)
	at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1126)
	at org.apache.hadoop.mapred.FileOutputCommitter.setupJob(FileOutputCommitter.java:52)
	at org.apache.hadoop.mapred.OutputCommitter.setupJob(OutputCommitter.java:146)
	at org.apache.hadoop.mapred.Task.runJobSetupTask(Task.java:1101)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:361)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
	at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.security.AccessControlException: Permission denied: user=vgordon, access=WRITE, inode="":hdfs:hdfs:rwxr-xr-x
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:199)
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:180)
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:128)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:5214)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAncestorAccess(FSNamesystem.java:5188)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:2060)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:2029)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.mkdirs(NameNode.java:794)
	at sun.reflect.GeneratedMethodAccessor16.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382)

	at org.apache.hadoop.ipc.Client.call(Client.java:1070)
	at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
	at $Proxy7.mkdirs(Unknown Source)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
	at $Proxy7.mkdirs(Unknown Source)
	at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:1214)
	... 11 more

Chris K Wensel

Feb 7, 2013, 11:26:07 PM
to cascadi...@googlegroups.com
I don't follow how this is a Cascading issue.

Granting yourself proper permissions might be a start.

Or it may just be a bug in Hadoop. You might review the compatibility list.


ckw

Vitaly Gordon

Feb 7, 2013, 11:29:33 PM
to cascadi...@googlegroups.com
Chris, as I mentioned in my post, the problem doesn't seem to be the permissions but the directory Cascading tries to write to; it just isn't a proper HDFS path.
All the other frameworks I've used, such as Scoobi and Pig, had no issues with multi-step M/R jobs, and looking at their output directories (in job.xml), they write to proper directories.

I am using Hadoop 1.0.3

Chris K Wensel

Feb 8, 2013, 12:17:45 AM
to cascadi...@googlegroups.com
You're using CDH, not Apache, per prior emails. There is a difference.

Cascading never writes to your default (or any) FS except between MR jobs or via Taps specified in your code, and between jobs it always writes to your Hadoop temp path, "hadoop.tmp.dir", unless "cascading.tmp.dir" is set.

So setting permissions on that path, or making sure your default configuration is correct, would be a place to start troubleshooting.
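
For example, in the Java repro earlier in the thread, a minimal sketch of pointing that intermediate output at a path the submitting user can write to (the HDFS path below is only an illustration):

    Properties properties = new Properties();
    AppProps.setApplicationJarClass( properties, CascadingMain.class );
    // redirect Cascading's between-step temp output to a user-writable HDFS path,
    // instead of falling back to "hadoop.tmp.dir"
    properties.setProperty( "cascading.tmp.dir", "/user/vgordon/tmp/cascading" );
    HadoopFlowConnector flowConnector = new HadoopFlowConnector( properties );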

ckw