Kafka consumption to a secure HDFS cluster through standalone.sh failing with a Kerberos issue

Viewed 239 times

vinay k

Unread,
Nov 18, 2016, 05:26:59
to gobblin-users
Hello all,

I have been trying to consume Kafka messages into a secure Hortonworks HDFS cluster through the standalone.sh script using launcher.type=MAPREDUCE. Since it is a secure cluster, it asks for keytab authentication, but in the Gobblin job config files there is no way to provide those details for a standalone run.

I tried the properties below in gobblin-standalone.properties, but they do not appear to be recognized by the script. Could anyone please suggest how to pass the keytab info for a standalone script run with the MapReduce launch type?

gobblin.yarn.keytab.file.path="/home_dir/xxxx/XXXXX.keytab"
gobblin.yarn.app.queue=default

2016-11-18 00:03:06 CST INFO  [main] gobblin.runtime.app.ServiceBasedAppLauncher  158 - Starting the Gobblin application and all its associated Services
2016-11-18 00:03:06 CST INFO  [JobScheduler STARTING] gobblin.scheduler.JobScheduler  164 - Starting the job scheduler
2016-11-18 00:03:06 CST INFO  [SchedulerService STARTING] org.quartz.impl.StdSchedulerFactory  1172 - Using default implementation for ThreadExecutor
2016-11-18 00:03:06 CST INFO  [SchedulerService STARTING] org.quartz.core.SchedulerSignalerImpl  61 - Initialized Scheduler Signaller of type: class org.quartz.core.SchedulerSignalerImpl
2016-11-18 00:03:06 CST INFO  [SchedulerService STARTING] org.quartz.core.QuartzScheduler  240 - Quartz Scheduler v.2.2.3 created.
2016-11-18 00:03:06 CST INFO  [SchedulerService STARTING] org.quartz.simpl.RAMJobStore  155 - RAMJobStore initialized.
2016-11-18 00:03:06 CST INFO  [SchedulerService STARTING] org.quartz.core.QuartzScheduler  305 - Scheduler meta-data: Quartz Scheduler (v2.2.3) 'LocalJobScheduler' with instanceId 'NON_CLUSTERED'
  Scheduler class: 'org.quartz.core.QuartzScheduler' - running locally.
  NOT STARTED.
  Currently in standby mode.
  Number of jobs executed: 0
  Using thread pool 'org.quartz.simpl.SimpleThreadPool' - with 3 threads.
  Using job-store 'org.quartz.simpl.RAMJobStore' - which does not support persistence. and is not clustered.

2016-11-18 00:03:06 CST INFO  [SchedulerService STARTING] org.quartz.impl.StdSchedulerFactory  1327 - Quartz scheduler 'LocalJobScheduler' initialized from specified file: '/xxxx/xxxx/xxx/gobblin-dist/conf/quartz.properties'
2016-11-18 00:03:06 CST INFO  [SchedulerService STARTING] org.quartz.impl.StdSchedulerFactory  1331 - Quartz scheduler version: 2.2.3
2016-11-18 00:03:06 CST INFO  [SchedulerService STARTING] org.quartz.core.QuartzScheduler  575 - Scheduler LocalJobScheduler_$_NON_CLUSTERED started.
2016-11-18 00:03:06 CST WARN  [JobScheduler STARTING] org.apache.hadoop.util.NativeCodeLoader  62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2016-11-18 00:03:06 CST INFO  [JobScheduler STARTING] gobblin.scheduler.JobScheduler  401 - Scheduling configured jobs
2016-11-18 00:03:06 CST INFO  [JobScheduler STARTING] gobblin.scheduler.JobScheduler  415 - Loaded 1 job configurations
2016-11-18 00:03:07 CST ERROR [JobScheduler-0] gobblin.scheduler.JobScheduler$NonScheduledJobRunner  501 - Failed to run job GobblinKafkaQuickStart
gobblin.runtime.JobException: Failed to run job GobblinKafkaQuickStart
at gobblin.scheduler.JobScheduler.runJob(JobScheduler.java:337)
at gobblin.scheduler.JobScheduler$NonScheduledJobRunner.run(JobScheduler.java:499)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: Failed to create job launcher: org.apache.hadoop.security.AccessControlException: SIMPLE authentication is not enabled.  Available:[TOKEN, KERBEROS]
at gobblin.runtime.JobLauncherFactory.newJobLauncher(JobLauncherFactory.java:94)
at gobblin.runtime.JobLauncherFactory.newJobLauncher(JobLauncherFactory.java:59)
at gobblin.scheduler.JobScheduler.runJob(JobScheduler.java:335)
... 4 more
Caused by: org.apache.hadoop.security.AccessControlException: SIMPLE authentication is not enabled.  Available:[TOKEN, KERBEROS]
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1748)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1112)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1108)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1108)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1399)
at gobblin.runtime.FsDatasetStateStore.getLatestDatasetStatesByUrns(FsDatasetStateStore.java:156)
at gobblin.runtime.JobContext.<init>(JobContext.java:136)
at gobblin.runtime.AbstractJobLauncher.<init>(AbstractJobLauncher.java:131)
at gobblin.runtime.local.LocalJobLauncher.<init>(LocalJobLauncher.java:62)
at gobblin.runtime.JobLauncherFactory.newJobLauncher(JobLauncherFactory.java:80)
... 6 more
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): SIMPLE authentication is not enabled.  Available:[TOKEN, KERBEROS]
at org.apache.hadoop.ipc.Client.call(Client.java:1406)
at org.apache.hadoop.ipc.Client.call(Client.java:1359)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at com.sun.proxy.$Proxy8.getFileInfo(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy8.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:671)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1746)
... 16 more

om singh

Unread,
Mar 1, 2017, 20:31:04
to gobblin-users

Hi Vinay,

Were you able to resolve it?

Issac Buenrostro

Unread,
Mar 1, 2017, 20:50:37
to om singh, gobblin-users
Are you also having this issue? Depending on how you're running Gobblin there are different ways of solving it.

A simple way is to do a "kinit" before running the Gobblin job (in the same shell). If you are running using "bin/gobblin", you can also provide Kerberos authentication credentials using the option "-kerberosAuthentication".

Let us know if this doesn't work.
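The kinit route suggested above can be sketched as follows. The keytab path and principal are hypothetical placeholders, and a real run needs a Kerberized cluster, so this sketch only prints each command:

```shell
#!/usr/bin/env bash
# Sketch of "kinit before running Gobblin in the same shell".
# KEYTAB and PRINCIPAL are hypothetical placeholders, not from the thread.
KEYTAB="/home/etl/etl.keytab"
PRINCIPAL="etl@EXAMPLE.COM"

# Print each command instead of executing it; swap `echo "+ $*"` for
# `"$@"` to actually run against a Kerberized cluster.
run() { echo "+ $*"; }

run kinit -kt "$KEYTAB" "$PRINCIPAL"   # obtain a Kerberos TGT
run klist                              # verify the ticket cache
run bin/gobblin-standalone.sh start    # launch in the same shell session
```

The key point is that the TGT lives in the shell session's ticket cache, so kinit and the Gobblin launch must happen in the same shell.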

--
You received this message because you are subscribed to the Google Groups "gobblin-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gobblin-user...@googlegroups.com.
To post to this group, send email to gobbli...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/gobblin-users/7aa26ec9-1187-4788-a268-3e7d2a36c501%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

om singh

Unread,
Mar 2, 2017, 07:34:02
to Issac Buenrostro, gobblin-users
Hi,

Yes, I am facing the same issue. I am running the job in MapReduce mode.

Here is my conf file:

job.name=PullFromKafkaToHDFS
job.group=Wikipedia
job.description=Pull from kafka and write int HDFS
job.lock.enabled=false

kafka.brokers=10.66.107.225:9092
topic.name=mmt-contacts

source.class=gobblin.source.extractor.extract.kafka.KafkaSimpleSource
extract.namespace=gobblin.extract.kafka

writer.builder.class=gobblin.writer.SimpleDataWriterBuilder
writer.file.path.type=tablename
writer.destination.type=HDFS
writer.output.format=txt

data.publisher.type=gobblin.publisher.BaseDataPublisher

mr.job.max.mappers=1

metrics.reporting.file.enabled=true
metrics.log.dir=${env:GOBBLIN_WORK_DIR}/metrics
metrics.reporting.file.suffix=txt

bootstrap.with.offset=earliest

fs.uri=hdfs://10.66.53.27:8020/app/hydra-analytics/gobblin
writer.fs.uri=hdfs://10.66.53.27:8020/app/hydra-analytics/gobblin/
state.store.fs.uri=hdfs://10.66.53.27:8020/app/hydra-analytics/gobblin/

mr.job.root.dir=/gobblin-kafka/working
state.store.dir=/gobblin-kafka/state-store
task.data.root.dir=/jobs/kafkaetl/gobblin/gobblin-kafka/task-data
data.publisher.final.dir=/gobblintest/job-output

After kinit, I am facing a different error message. The application is trying to write into:

Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): Permission denied: user=hydra-analytics, access=WRITE, inode="/":hdfs:supergroup:drwxr-xr-x

I don't know why the application is trying to write into "/".

Kindly let me know if I made any mistake in the configuration file or need to make some other changes as well.

Regards,
Om



Issac Buenrostro

Unread,
Mar 2, 2017, 11:25:43
to om singh, Issac Buenrostro, gobblin-users
Hi Om,
Can you include the full stack trace?

om singh

Unread,
Mar 2, 2017, 12:00:32
to Issac Buenrostro, Issac Buenrostro, gobblin-users
Sure, kindly find the attached log file.

Regards,
Om
[attachment: log]

Issac Buenrostro

Unread,
Mar 2, 2017, 12:34:21
to om singh, Issac Buenrostro, gobblin-users
Hi Om,
The problem is that your configuration is referring to directories that don't exist at the "/" level (for example, /gobblin-kafka, /gobblintest, possibly others). Your user is not allowed to create those directories in the HDFS cluster. Please make sure to use existing directories (for example, you could put everything under /user/<your-user>).
Best
Issac
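Following that advice, the root-level paths in the job file could be rebased under the user's HDFS home directory. The layout below is illustrative only; the directory names are assumptions, not taken from the thread:

```properties
# Illustrative rebasing under /user/<your-user> (here hydra-analytics).
# These directory names are assumptions for the sketch.
mr.job.root.dir=/user/hydra-analytics/gobblin-kafka/working
state.store.dir=/user/hydra-analytics/gobblin-kafka/state-store
task.data.root.dir=/user/hydra-analytics/gobblin-kafka/task-data
data.publisher.final.dir=/user/hydra-analytics/gobblintest/job-output
```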


om singh

Unread,
Mar 6, 2017, 01:53:54
to Issac Buenrostro, Issac Buenrostro, gobblin-users
Thanks Issac,

The application was trying to write at the "/" directory because I passed --workdir ~/workdir as a parameter:

[nohup ~/gobblin-dist/bin/gobblin-mapreduce.sh --workdir ~/workdir --conf ~/gobblin-dist/job-conf/mmt-kafka.pull -fs hdfs://10.66.53.27:8020 --jt hdfs://cmmt-53-28.mmt.com:8032]
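One plausible reason (an assumption, not confirmed in the thread) is that ~ is expanded by the local shell before Hadoop sees it, so the client receives a local-style absolute path rather than a path under the HDFS home directory. The namenode host and port below are hypothetical:

```shell
#!/usr/bin/env bash
# ~ is expanded by the local shell, so --workdir ~/workdir becomes an
# absolute local path; an HDFS client then treats that path as absolute
# on HDFS too, not as relative to /user/<user>.
WORKDIR=~/workdir
echo "$WORKDIR"   # e.g. /home/<local-user>/workdir

# A fully qualified URI removes the ambiguity (host and port hypothetical):
echo "hdfs://namenode.example.com:8020/user/$(whoami)/workdir"
```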

I changed the workdir path and am now facing a different issue. It looks like the error comes while launching the MR job.

Error log:
2017-03-06 12:08:37 IST INFO  [main] gobblin.runtime.mapreduce.GobblinWorkUnitsInputFormat  92 - Found 1 input files at hdfs://10.66.53.27:8020/app/hydra-analytics/gobblin-kafka/working/PullFromKafkaToHDFS/job_PullFromKafkaToHDFS_1488782304917/input: [FileStatus{path=hdfs://10.66.53.27:8020/app/hydra-analytics/gobblin-kafka/working/PullFromKafkaToHDFS/job_PullFromKafkaToHDFS_1488782304917/input/multitask_PullFromKafkaToHDFS_1488782304917_0.mwu; isDirectory=false; length=382598; replication=3; blocksize=67108864; modification_time=1488782316084; access_time=1488782315866; owner=hydra-analytics; group=supergroup; permission=rw-r--r--; isSymlink=false}]
2017-03-06 12:08:37 IST INFO  [main] org.apache.hadoop.mapreduce.JobSubmitter  396 - number of splits:1
2017-03-06 12:08:37 IST INFO  [main] org.apache.hadoop.mapreduce.JobSubmitter  479 - Submitting tokens for job: job_1488349530373_110905
2017-03-06 12:08:37 IST INFO  [main] org.apache.hadoop.mapreduce.JobSubmitter  481 - Kind: HDFS_DELEGATION_TOKEN, Service: 10.66.53.27:8020, Ident: (HDFS_DELEGATION_TOKEN token 5264650 for hydra-analytics)
2017-03-06 12:08:37 IST INFO  [main] org.apache.hadoop.mapreduce.JobSubmitter  441 - Cleaning up the staging area /user/hydra-analytics/.staging/job_1488349530373_110905
2017-03-06 12:08:37 IST INFO  [TaskStateCollectorService STOPPING] gobblin.runtime.TaskStateCollectorService  103 - Stopping the TaskStateCollectorService
2017-03-06 12:08:37 IST WARN  [TaskStateCollectorService STOPPING] gobblin.runtime.TaskStateCollectorService  131 - No output task state files found in hdfs://10.66.53.27:8020/app/hydra-analytics/gobblin-kafka/working/PullFromKafkaToHDFS/job_PullFromKafkaToHDFS_1488782304917/output/job_PullFromKafkaToHDFS_1488782304917
2017-03-06 12:08:37 IST INFO  [main] gobblin.runtime.mapreduce.MRJobLauncher  498 - Deleted working directory hdfs://10.66.53.27:8020/app/hydra-analytics/gobblin-kafka/working/PullFromKafkaToHDFS/job_PullFromKafkaToHDFS_1488782304917
2017-03-06 12:08:37 IST ERROR [main] gobblin.runtime.AbstractJobLauncher  420 - Failed to launch and run job job_PullFromKafkaToHDFS_1488782304917: java.lang.NoSuchMethodError: org.apache.hadoop.yarn.client.ClientRMProxy.getRMDelegationTokenService(Lorg/apache/hadoop/conf/Configuration;)Lorg/apache/hadoop/io/Text;
java.lang.NoSuchMethodError: org.apache.hadoop.yarn.client.ClientRMProxy.getRMDelegationTokenService(Lorg/apache/hadoop/conf/Configuration;)Lorg/apache/hadoop/io/Text;
    at org.apache.hadoop.mapred.ResourceMgrDelegate.getRMDelegationTokenService(ResourceMgrDelegate.java:165)
    at org.apache.hadoop.mapred.YARNRunner.addHistoryToken(YARNRunner.java:192)
    at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:282)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:432)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
    at gobblin.runtime.mapreduce.MRJobLauncher.runWorkUnits(MRJobLauncher.java:227)
    at gobblin.runtime.AbstractJobLauncher.launchJob(AbstractJobLauncher.java:395)
    at gobblin.runtime.mapreduce.CliMRJobLauncher.launchJob(CliMRJobLauncher.java:89)
    at gobblin.runtime.mapreduce.CliMRJobLauncher.run(CliMRJobLauncher.java:66)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
    at gobblin.runtime.mapreduce.CliMRJobLauncher.main(CliMRJobLauncher.java:111)

    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)


Regards,
Om


om singh

Unread,
Mar 10, 2017, 06:28:58
to Issac Buenrostro, Issac Buenrostro, gobblin-users
Thanks Issac,

Finally I was able to resolve the above-mentioned issue with the following steps:

1. kinit with the keytab (e.g. kinit -kt <keytab file> <keytab user>)

2. Build command: ./gradlew clean assemble -PuseHadoop2 -PhadoopVersion=2.6.0-cdh5.8.3 --stacktrace

3. Add the following jars to the MapReduce command (gobblin-mapreduce.sh):

nohup ~/gobblin-dist/bin/gobblin-mapreduce.sh \
  --workdir hdfs://*****:8020/app/hydra-analytics/workdir \
  --conf ~/gobblin-dist/job-conf/**-kafka.pull \
  --jars \
/hydra-analytics/gobblin-dist/lib/reflections-0.9.10.jar,\
/hydra-analytics/gobblin-dist/lib/guava-retrying-2.0.0.jar,\
/hydra-analytics/gobblin-dist/lib/javassist-3.18.2-GA.jar,\
/hydra-analytics/gobblin-dist/lib/kafka-avro-serializer-2.0.1.jar,\
/hydra-analytics/gobblin-dist/lib/kafka-json-serializer-2.0.1.jar,\
/hydra-analytics/gobblin-dist/lib/hadoop-common-2.6.0-cdh5.8.3.jar,\
/hydra-analytics/gobblin-dist/lib/gobblin-metrics-base-0.9.0-365-gcfea157.jar,\
/hydra-analytics/gobblin-dist/lib/gobblin-metrics-0.9.0-365-gcfea157.jar,\
/hydra-analytics/gobblin-dist/lib/gobblin-core-base-0.9.0-365-gcfea157.jar,\
/hydra-analytics/gobblin-dist/lib/gobblin-kafka-08-0.9.0-365-gcfea157.jar,\
/hydra-analytics/gobblin-dist/lib/gobblin-kafka-common-0.9.0-365-gcfea157.jar,\
/hydra-analytics/gobblin-dist/lib/guava-15.0.jar,\
/hydra-analytics/gobblin-dist/lib/opencsv-3.8.jar &

Regards,
Om

vinay k

Unread,
Mar 21, 2017, 01:55:50
to gobblin-users
Hi Om,

I did not proceed with Gobblin after we encountered these issues. I used Camus instead, since its setup was easier and it met my project needs.

Thanks,
Vinay 