Kafka consumption to secure HDFS cluster through standalone.sh failing with Kerberos issue


vinay k

Nov 18, 2016, 5:26:59 AM
to gobblin-users
Hello all,

I have been trying to consume Kafka messages into a secure Hortonworks HDFS cluster through the standalone.sh script using launcher.type=MAPREDUCE. Since it is a secure cluster, it asks for keytab authentication, but the Gobblin job configuration files provide no obvious way to pass these details for a standalone run.

I tried the properties below in gobblin-standalone.properties, but they don't appear to be recognized by the script. Could anyone please suggest how to pass the keytab info for a standalone script run with the MapReduce launch type?

gobblin.yarn.keytab.file.path="/home_dir/xxxx/XXXXX.keytab"
gobblin.yarn.app.queue=default

2016-11-18 00:03:06 CST INFO  [main] gobblin.runtime.app.ServiceBasedAppLauncher  158 - Starting the Gobblin application and all its associated Services
2016-11-18 00:03:06 CST INFO  [JobScheduler STARTING] gobblin.scheduler.JobScheduler  164 - Starting the job scheduler
2016-11-18 00:03:06 CST INFO  [SchedulerService STARTING] org.quartz.impl.StdSchedulerFactory  1172 - Using default implementation for ThreadExecutor
2016-11-18 00:03:06 CST INFO  [SchedulerService STARTING] org.quartz.core.SchedulerSignalerImpl  61 - Initialized Scheduler Signaller of type: class org.quartz.core.SchedulerSignalerImpl
2016-11-18 00:03:06 CST INFO  [SchedulerService STARTING] org.quartz.core.QuartzScheduler  240 - Quartz Scheduler v.2.2.3 created.
2016-11-18 00:03:06 CST INFO  [SchedulerService STARTING] org.quartz.simpl.RAMJobStore  155 - RAMJobStore initialized.
2016-11-18 00:03:06 CST INFO  [SchedulerService STARTING] org.quartz.core.QuartzScheduler  305 - Scheduler meta-data: Quartz Scheduler (v2.2.3) 'LocalJobScheduler' with instanceId 'NON_CLUSTERED'
  Scheduler class: 'org.quartz.core.QuartzScheduler' - running locally.
  NOT STARTED.
  Currently in standby mode.
  Number of jobs executed: 0
  Using thread pool 'org.quartz.simpl.SimpleThreadPool' - with 3 threads.
  Using job-store 'org.quartz.simpl.RAMJobStore' - which does not support persistence. and is not clustered.

2016-11-18 00:03:06 CST INFO  [SchedulerService STARTING] org.quartz.impl.StdSchedulerFactory  1327 - Quartz scheduler 'LocalJobScheduler' initialized from specified file: '/xxxx/xxxx/xxx/gobblin-dist/conf/quartz.properties'
2016-11-18 00:03:06 CST INFO  [SchedulerService STARTING] org.quartz.impl.StdSchedulerFactory  1331 - Quartz scheduler version: 2.2.3
2016-11-18 00:03:06 CST INFO  [SchedulerService STARTING] org.quartz.core.QuartzScheduler  575 - Scheduler LocalJobScheduler_$_NON_CLUSTERED started.
2016-11-18 00:03:06 CST WARN  [JobScheduler STARTING] org.apache.hadoop.util.NativeCodeLoader  62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2016-11-18 00:03:06 CST INFO  [JobScheduler STARTING] gobblin.scheduler.JobScheduler  401 - Scheduling configured jobs
2016-11-18 00:03:06 CST INFO  [JobScheduler STARTING] gobblin.scheduler.JobScheduler  415 - Loaded 1 job configurations
2016-11-18 00:03:07 CST ERROR [JobScheduler-0] gobblin.scheduler.JobScheduler$NonScheduledJobRunner  501 - Failed to run job GobblinKafkaQuickStart
gobblin.runtime.JobException: Failed to run job GobblinKafkaQuickStart
at gobblin.scheduler.JobScheduler.runJob(JobScheduler.java:337)
at gobblin.scheduler.JobScheduler$NonScheduledJobRunner.run(JobScheduler.java:499)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: Failed to create job launcher: org.apache.hadoop.security.AccessControlException: SIMPLE authentication is not enabled.  Available:[TOKEN, KERBEROS]
at gobblin.runtime.JobLauncherFactory.newJobLauncher(JobLauncherFactory.java:94)
at gobblin.runtime.JobLauncherFactory.newJobLauncher(JobLauncherFactory.java:59)
at gobblin.scheduler.JobScheduler.runJob(JobScheduler.java:335)
... 4 more
Caused by: org.apache.hadoop.security.AccessControlException: SIMPLE authentication is not enabled.  Available:[TOKEN, KERBEROS]
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1748)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1112)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1108)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1108)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1399)
at gobblin.runtime.FsDatasetStateStore.getLatestDatasetStatesByUrns(FsDatasetStateStore.java:156)
at gobblin.runtime.JobContext.<init>(JobContext.java:136)
at gobblin.runtime.AbstractJobLauncher.<init>(AbstractJobLauncher.java:131)
at gobblin.runtime.local.LocalJobLauncher.<init>(LocalJobLauncher.java:62)
at gobblin.runtime.JobLauncherFactory.newJobLauncher(JobLauncherFactory.java:80)
... 6 more
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): SIMPLE authentication is not enabled.  Available:[TOKEN, KERBEROS]
at org.apache.hadoop.ipc.Client.call(Client.java:1406)
at org.apache.hadoop.ipc.Client.call(Client.java:1359)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at com.sun.proxy.$Proxy8.getFileInfo(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy8.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:671)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1746)
... 16 more

om singh

Mar 1, 2017, 8:31:04 PM
to gobblin-users

Hi Vinay,

Were you able to resolve it?

Issac Buenrostro

Mar 1, 2017, 8:50:37 PM
to om singh, gobblin-users
Are you also having this issue? Depending on how you're running Gobblin, there are different ways of solving it.

A simple way is to do a "kinit" before running the Gobblin job (in the same shell). If you are running using "bin/gobblin", you can also provide Kerberos authentication credentials using the option "-kerberosAuthentication".

Let us know if this doesn't work.
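A minimal sketch of the kinit-before-launch approach Issac describes. The keytab path is the placeholder from the original post, and the principal (xxxx@EXAMPLE.COM) is a hypothetical value — substitute your own. The commands are built as strings and echoed so the sketch can be inspected without a KDC; drop the echos to run them for real.

```shell
# Placeholders: keytab path comes from the thread; the principal is
# hypothetical -- replace both with your cluster's values.
KEYTAB="/home_dir/xxxx/XXXXX.keytab"
PRINCIPAL="xxxx@EXAMPLE.COM"

# Obtain a Kerberos TGT in this shell, then launch Gobblin from the SAME
# shell so it inherits the ticket cache. Echoed here rather than executed.
KINIT_CMD="kinit -kt $KEYTAB $PRINCIPAL"
LAUNCH_CMD="bin/gobblin-standalone.sh start"

echo "$KINIT_CMD"
echo "$LAUNCH_CMD"
```

The key point is that the ticket cache is per-session: the kinit and the Gobblin launch must happen in the same shell (or share the same KRB5CCNAME).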

--
You received this message because you are subscribed to the Google Groups "gobblin-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gobblin-user...@googlegroups.com.
To post to this group, send email to gobbli...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/gobblin-users/7aa26ec9-1187-4788-a268-3e7d2a36c501%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

om singh

Mar 2, 2017, 7:34:02 AM
to Issac Buenrostro, gobblin-users
Hi,

Yes, I am facing the same issue. I am running the code in MapReduce mode.

Here is my conf file:

job.name=PullFromKafkaToHDFS
job.group=Wikipedia
job.description=Pull from kafka and write int HDFS
job.lock.enabled=false

kafka.brokers=10.66.107.225:9092
topic.name=mmt-contacts

source.class=gobblin.source.extractor.extract.kafka.KafkaSimpleSource
extract.namespace=gobblin.extract.kafka

writer.builder.class=gobblin.writer.SimpleDataWriterBuilder
writer.file.path.type=tablename
writer.destination.type=HDFS
writer.output.format=txt

data.publisher.type=gobblin.publisher.BaseDataPublisher

mr.job.max.mappers=1

metrics.reporting.file.enabled=true
metrics.log.dir=${env:GOBBLIN_WORK_DIR}/metrics
metrics.reporting.file.suffix=txt

bootstrap.with.offset=earliest

fs.uri=hdfs://10.66.53.27:8020/app/hydra-analytics/gobblin
writer.fs.uri=hdfs://10.66.53.27:8020/app/hydra-analytics/gobblin/
state.store.fs.uri=hdfs://10.66.53.27:8020/app/hydra-analytics/gobblin/

mr.job.root.dir=/gobblin-kafka/working
state.store.dir=/gobblin-kafka/state-store
task.data.root.dir=/jobs/kafkaetl/gobblin/gobblin-kafka/task-data
data.publisher.final.dir=/gobblintest/job-output

After kinit, I am facing a different error message. The application is trying to write into "/":

Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): Permission denied: user=hydra-analytics, access=WRITE, inode="/":hdfs:supergroup:drwxr-xr-x

I don't know why the application is trying to write into "/".

Kindly let me know if I made any mistake in the configuration file or whether I need to make some other changes as well.

Regards,
Om



Issac Buenrostro

Mar 2, 2017, 11:25:43 AM
to om singh, Issac Buenrostro, gobblin-users
Hi Om,
Can you include the full stack trace?

om singh

Mar 2, 2017, 12:00:32 PM
to Issac Buenrostro, Issac Buenrostro, gobblin-users
Sure, kindly find the attached log file.

Regards,
Om

Issac Buenrostro

Mar 2, 2017, 12:34:21 PM
to om singh, Issac Buenrostro, gobblin-users
Hi Om,
The problem is that your configuration is referring to directories that don't exist at the "/" level (for example, /gobblin-kafka, /gobblintest, possibly others). Your user is not allowed to create those directories in the HDFS cluster. Please make sure to use existing directories (for example, you could put everything under /user/<your-user>).
Best
Issac
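One way to act on this advice is to pre-create the job directories under the user's HDFS home before pointing the config at them. A rough sketch — the username is taken from the permission error above, and the directory names from Om's config; both are illustrative. The commands are echoed rather than executed so the sketch works without a cluster.

```shell
# Illustrative: create the Gobblin directories under the user's HDFS home,
# where the user has write permission. Username is from the thread's error
# message ("user=hydra-analytics"); substitute your own.
HDFS_USER="hydra-analytics"
BASE="/user/$HDFS_USER"

# Print the mkdir commands; drop the echo to run them against the cluster,
# then update mr.job.root.dir, state.store.dir, etc. to these paths.
for d in gobblin-kafka/working gobblin-kafka/state-store job-output; do
  echo "hdfs dfs -mkdir -p $BASE/$d"
done
```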


om singh

Mar 6, 2017, 1:53:54 AM
to Issac Buenrostro, Issac Buenrostro, gobblin-users
Thanks Issac,

The application was trying to write at the "/" directory because I passed --workdir ~/workdir as a parameter:

[nohup ~/gobblin-dist/bin/gobblin-mapreduce.sh --workdir ~/workdir --conf ~/gobblin-dist/job-conf/mmt-kafka.pull -fs hdfs://10.66.53.27:8020 --jt hdfs://cmmt-53-28.mmt.com:8032]

I changed the workdir path and am now facing a different issue. It looks like the error is coming while launching the MR job.

Error log:
2017-03-06 12:08:37 IST INFO  [main] gobblin.runtime.mapreduce.GobblinWorkUnitsInputFormat  92 - Found 1 input files at hdfs://10.66.53.27:8020/app/hydra-analytics/gobblin-kafka/working/PullFromKafkaToHDFS/job_PullFromKafkaToHDFS_1488782304917/input: [FileStatus{path=hdfs://10.66.53.27:8020/app/hydra-analytics/gobblin-kafka/working/PullFromKafkaToHDFS/job_PullFromKafkaToHDFS_1488782304917/input/multitask_PullFromKafkaToHDFS_1488782304917_0.mwu; isDirectory=false; length=382598; replication=3; blocksize=67108864; modification_time=1488782316084; access_time=1488782315866; owner=hydra-analytics; group=supergroup; permission=rw-r--r--; isSymlink=false}]
2017-03-06 12:08:37 IST INFO  [main] org.apache.hadoop.mapreduce.JobSubmitter  396 - number of splits:1
2017-03-06 12:08:37 IST INFO  [main] org.apache.hadoop.mapreduce.JobSubmitter  479 - Submitting tokens for job: job_1488349530373_110905
2017-03-06 12:08:37 IST INFO  [main] org.apache.hadoop.mapreduce.JobSubmitter  481 - Kind: HDFS_DELEGATION_TOKEN, Service: 10.66.53.27:8020, Ident: (HDFS_DELEGATION_TOKEN token 5264650 for hydra-analytics)
2017-03-06 12:08:37 IST INFO  [main] org.apache.hadoop.mapreduce.JobSubmitter  441 - Cleaning up the staging area /user/hydra-analytics/.staging/job_1488349530373_110905
2017-03-06 12:08:37 IST INFO  [TaskStateCollectorService STOPPING] gobblin.runtime.TaskStateCollectorService  103 - Stopping the TaskStateCollectorService
2017-03-06 12:08:37 IST WARN  [TaskStateCollectorService STOPPING] gobblin.runtime.TaskStateCollectorService  131 - No output task state files found in hdfs://10.66.53.27:8020/app/hydra-analytics/gobblin-kafka/working/PullFromKafkaToHDFS/job_PullFromKafkaToHDFS_1488782304917/output/job_PullFromKafkaToHDFS_1488782304917
2017-03-06 12:08:37 IST INFO  [main] gobblin.runtime.mapreduce.MRJobLauncher  498 - Deleted working directory hdfs://10.66.53.27:8020/app/hydra-analytics/gobblin-kafka/working/PullFromKafkaToHDFS/job_PullFromKafkaToHDFS_1488782304917
2017-03-06 12:08:37 IST ERROR [main] gobblin.runtime.AbstractJobLauncher  420 - Failed to launch and run job job_PullFromKafkaToHDFS_1488782304917: java.lang.NoSuchMethodError: org.apache.hadoop.yarn.client.ClientRMProxy.getRMDelegationTokenService(Lorg/apache/hadoop/conf/Configuration;)Lorg/apache/hadoop/io/Text;
java.lang.NoSuchMethodError: org.apache.hadoop.yarn.client.ClientRMProxy.getRMDelegationTokenService(Lorg/apache/hadoop/conf/Configuration;)Lorg/apache/hadoop/io/Text;
    at org.apache.hadoop.mapred.ResourceMgrDelegate.getRMDelegationTokenService(ResourceMgrDelegate.java:165)
    at org.apache.hadoop.mapred.YARNRunner.addHistoryToken(YARNRunner.java:192)
    at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:282)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:432)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
    at gobblin.runtime.mapreduce.MRJobLauncher.runWorkUnits(MRJobLauncher.java:227)
    at gobblin.runtime.AbstractJobLauncher.launchJob(AbstractJobLauncher.java:395)
    at gobblin.runtime.mapreduce.CliMRJobLauncher.launchJob(CliMRJobLauncher.java:89)
    at gobblin.runtime.mapreduce.CliMRJobLauncher.run(CliMRJobLauncher.java:66)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
    at gobblin.runtime.mapreduce.CliMRJobLauncher.main(CliMRJobLauncher.java:111)

    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)


Regards,
Om


om singh

Mar 10, 2017, 6:28:58 AM
to Issac Buenrostro, Issac Buenrostro, gobblin-users
Thanks Issac,

Finally I was able to resolve the above-mentioned issue with the following steps:

kinit with the keytab (e.g. kinit -kt <keytab file> <principal>)

Build command: ./gradlew clean assemble -PuseHadoop2 -PhadoopVersion=2.6.0-cdh5.8.3 --stacktrace

Adding the following jars to the MapReduce command (gobblin-mapreduce.sh):

nohup ~/gobblin-dist/bin/gobblin-mapreduce.sh --workdir hdfs://*****:8020/app/hydra-analytics/workdir --conf ~/gobblin-dist/job-conf/**-kafka.pull --jars  /hydra-analytics/gobblin-dist/lib/reflections-0.9.10.jar,/hydra-analytics/gobblin-dist/lib/guava-retrying-2.0.0.jar,/hydra-analytics/gobblin-dist/lib/javassist-3.18.2-GA.jar,/hydra-analytics/gobblin-dist/lib/kafka-avro-serializer-2.0.1.jar,/hydra-analytics/gobblin-dist/lib/kafka-json-serializer-2.0.1.jar,/hydra-analytics/gobblin-dist/lib/hadoop-common-2.6.0-cdh5.8.3.jar,/hydra-analytics/gobblin-dist/lib/gobblin-metrics-base-0.9.0-365-gcfea157.jar,/hydra-analytics/gobblin-dist/lib/gobblin-metrics-0.9.0-365-gcfea157.jar,/hydra-analytics/gobblin-dist/lib/gobblin-core-base-0.9.0-365-gcfea157.jar,/hydra-analytics/gobblin-dist/lib/gobblin-kafka-08-0.9.0-365-gcfea157.jar,/hydra-analytics/gobblin-dist/lib/gobblin-kafka-common-0.9.0-365-gcfea157.jar,/hydra-analytics/gobblin-dist/lib/guava-15.0.jar,/hydra-analytics/gobblin-dist/lib/opencsv-3.8.jar &
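The long comma-separated --jars list above can be assembled more readably in a small wrapper. A sketch, assuming the same gobblin-dist layout as Om's command (the lib directory and jar names are copied from it and may differ on your cluster; the list here is abbreviated to three jars for illustration):

```shell
# Sketch: build the comma-separated --jars argument from a list instead of
# one long line. Paths follow Om's layout; extend the list and adjust the
# LIB directory for your own deployment.
LIB=/hydra-analytics/gobblin-dist/lib
JARS=""
for j in reflections-0.9.10.jar guava-retrying-2.0.0.jar javassist-3.18.2-GA.jar; do
  # ${JARS:+$JARS,} prepends a comma only when JARS is already non-empty.
  JARS="${JARS:+$JARS,}$LIB/$j"
done
echo "$JARS"
```

The resulting string can then be passed as the value of --jars to gobblin-mapreduce.sh.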

Regards,
Om

vinay k

Mar 21, 2017, 1:55:50 AM
to gobblin-users
Hi Om,

I did not proceed with Gobblin after we encountered these issues. I used Camus instead, since the setup was easier and it met my project needs.

Thanks,
Vinay 