Kafka consumption to a secure HDFS cluster through standalone.sh failing with a Kerberos issue

Viewed 239 times

vinay k

Unread,
Nov 18, 2016, 05:26:59
to gobblin-users
Hello all,

I have been trying to consume Kafka messages into a secure Hortonworks HDFS cluster through the standalone.sh script using launcher.type=MAPREDUCE. Since it is a secure cluster, it asks for keytab authentication, but in the Gobblin job config files there is no way to provide those details for a standalone run.

I tried the properties below in gobblin-standalone.properties, but they do not appear to be recognized by the script. Could anyone please suggest how to pass the keytab info for a standalone script run with the MapReduce launch type?

gobblin.yarn.keytab.file.path="/home_dir/xxxx/XXXXX.keytab"
gobblin.yarn.app.queue=default

2016-11-18 00:03:06 CST INFO  [main] gobblin.runtime.app.ServiceBasedAppLauncher  158 - Starting the Gobblin application and all its associated Services
2016-11-18 00:03:06 CST INFO  [JobScheduler STARTING] gobblin.scheduler.JobScheduler  164 - Starting the job scheduler
2016-11-18 00:03:06 CST INFO  [SchedulerService STARTING] org.quartz.impl.StdSchedulerFactory  1172 - Using default implementation for ThreadExecutor
2016-11-18 00:03:06 CST INFO  [SchedulerService STARTING] org.quartz.core.SchedulerSignalerImpl  61 - Initialized Scheduler Signaller of type: class org.quartz.core.SchedulerSignalerImpl
2016-11-18 00:03:06 CST INFO  [SchedulerService STARTING] org.quartz.core.QuartzScheduler  240 - Quartz Scheduler v.2.2.3 created.
2016-11-18 00:03:06 CST INFO  [SchedulerService STARTING] org.quartz.simpl.RAMJobStore  155 - RAMJobStore initialized.
2016-11-18 00:03:06 CST INFO  [SchedulerService STARTING] org.quartz.core.QuartzScheduler  305 - Scheduler meta-data: Quartz Scheduler (v2.2.3) 'LocalJobScheduler' with instanceId 'NON_CLUSTERED'
  Scheduler class: 'org.quartz.core.QuartzScheduler' - running locally.
  NOT STARTED.
  Currently in standby mode.
  Number of jobs executed: 0
  Using thread pool 'org.quartz.simpl.SimpleThreadPool' - with 3 threads.
  Using job-store 'org.quartz.simpl.RAMJobStore' - which does not support persistence. and is not clustered.

2016-11-18 00:03:06 CST INFO  [SchedulerService STARTING] org.quartz.impl.StdSchedulerFactory  1327 - Quartz scheduler 'LocalJobScheduler' initialized from specified file: '/xxxx/xxxx/xxx/gobblin-dist/conf/quartz.properties'
2016-11-18 00:03:06 CST INFO  [SchedulerService STARTING] org.quartz.impl.StdSchedulerFactory  1331 - Quartz scheduler version: 2.2.3
2016-11-18 00:03:06 CST INFO  [SchedulerService STARTING] org.quartz.core.QuartzScheduler  575 - Scheduler LocalJobScheduler_$_NON_CLUSTERED started.
2016-11-18 00:03:06 CST WARN  [JobScheduler STARTING] org.apache.hadoop.util.NativeCodeLoader  62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2016-11-18 00:03:06 CST INFO  [JobScheduler STARTING] gobblin.scheduler.JobScheduler  401 - Scheduling configured jobs
2016-11-18 00:03:06 CST INFO  [JobScheduler STARTING] gobblin.scheduler.JobScheduler  415 - Loaded 1 job configurations
2016-11-18 00:03:07 CST ERROR [JobScheduler-0] gobblin.scheduler.JobScheduler$NonScheduledJobRunner  501 - Failed to run job GobblinKafkaQuickStart
gobblin.runtime.JobException: Failed to run job GobblinKafkaQuickStart
at gobblin.scheduler.JobScheduler.runJob(JobScheduler.java:337)
at gobblin.scheduler.JobScheduler$NonScheduledJobRunner.run(JobScheduler.java:499)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: Failed to create job launcher: org.apache.hadoop.security.AccessControlException: SIMPLE authentication is not enabled.  Available:[TOKEN, KERBEROS]
at gobblin.runtime.JobLauncherFactory.newJobLauncher(JobLauncherFactory.java:94)
at gobblin.runtime.JobLauncherFactory.newJobLauncher(JobLauncherFactory.java:59)
at gobblin.scheduler.JobScheduler.runJob(JobScheduler.java:335)
... 4 more
Caused by: org.apache.hadoop.security.AccessControlException: SIMPLE authentication is not enabled.  Available:[TOKEN, KERBEROS]
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1748)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1112)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1108)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1108)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1399)
at gobblin.runtime.FsDatasetStateStore.getLatestDatasetStatesByUrns(FsDatasetStateStore.java:156)
at gobblin.runtime.JobContext.<init>(JobContext.java:136)
at gobblin.runtime.AbstractJobLauncher.<init>(AbstractJobLauncher.java:131)
at gobblin.runtime.local.LocalJobLauncher.<init>(LocalJobLauncher.java:62)
at gobblin.runtime.JobLauncherFactory.newJobLauncher(JobLauncherFactory.java:80)
... 6 more
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): SIMPLE authentication is not enabled.  Available:[TOKEN, KERBEROS]
at org.apache.hadoop.ipc.Client.call(Client.java:1406)
at org.apache.hadoop.ipc.Client.call(Client.java:1359)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at com.sun.proxy.$Proxy8.getFileInfo(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy8.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:671)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1746)
... 16 more

om singh

Unread,
Mar 1, 2017, 20:31:04
to gobblin-users

Hi Vinay,

Were you able to resolve it?

Issac Buenrostro

Unread,
Mar 1, 2017, 20:50:37
to om singh, gobblin-users
Are you also having this issue? Depending on how you're running Gobblin there are different ways of solving it.

A simple way is to do a "kinit" before running the Gobblin job (in the same shell). If you are running using "bin/gobblin", you can also provide Kerberos authentication credentials using the option "-kerberosAuthentication".

Let us know if this doesn't work.
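The kinit route suggested above can be sketched as follows. The keytab path and principal are hypothetical placeholders, and a real run needs a Kerberized cluster, so this sketch only prints each command:

```shell
#!/usr/bin/env bash
# Sketch of "kinit before running Gobblin in the same shell".
# KEYTAB and PRINCIPAL are hypothetical placeholders, not from the thread.
KEYTAB="/home/etl/etl.keytab"
PRINCIPAL="etl@EXAMPLE.COM"

# Print each command instead of executing it; swap `echo "+ $*"` for
# `"$@"` to actually run against a Kerberized cluster.
run() { echo "+ $*"; }

run kinit -kt "$KEYTAB" "$PRINCIPAL"   # obtain a Kerberos TGT
run klist                              # verify the ticket cache
run bin/gobblin-standalone.sh start    # launch in the same shell session
```

The key point is that the TGT lives in the shell session's ticket cache, so kinit and the Gobblin launch must happen in the same shell.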

--
You received this message because you are subscribed to the Google Groups "gobblin-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gobblin-user...@googlegroups.com.
To post to this group, send email to gobbli...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/gobblin-users/7aa26ec9-1187-4788-a268-3e7d2a36c501%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

om singh

Unread,
Mar 2, 2017, 07:34:02
to Issac Buenrostro, gobblin-users
Hi,

Yes, I am facing the same issue. I am running the job in MapReduce mode.

Here is my conf file:

job.name=PullFromKafkaToHDFS
job.group=Wikipedia
job.description=Pull from kafka and write int HDFS
job.lock.enabled=false

kafka.brokers=10.66.107.225:9092
topic.name=mmt-contacts

source.class=gobblin.source.extractor.extract.kafka.KafkaSimpleSource
extract.namespace=gobblin.extract.kafka

writer.builder.class=gobblin.writer.SimpleDataWriterBuilder
writer.file.path.type=tablename
writer.destination.type=HDFS
writer.output.format=txt

data.publisher.type=gobblin.publisher.BaseDataPublisher

mr.job.max.mappers=1

metrics.reporting.file.enabled=true
metrics.log.dir=${env:GOBBLIN_WORK_DIR}/metrics
metrics.reporting.file.suffix=txt

bootstrap.with.offset=earliest

fs.uri=hdfs://10.66.53.27:8020/app/hydra-analytics/gobblin
writer.fs.uri=hdfs://10.66.53.27:8020/app/hydra-analytics/gobblin/
state.store.fs.uri=hdfs://10.66.53.27:8020/app/hydra-analytics/gobblin/

mr.job.root.dir=/gobblin-kafka/working
state.store.dir=/gobblin-kafka/state-store
task.data.root.dir=/jobs/kafkaetl/gobblin/gobblin-kafka/task-data
data.publisher.final.dir=/gobblintest/job-output

After kinit, I am facing a different error message. The application is trying to write into:

Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): Permission denied: user=hydra-analytics, access=WRITE, inode="/":hdfs:supergroup:drwxr-xr-x

I don't know why the application is trying to write into "/".

Kindly let me know if I made any mistake in the configuration file or need to make some other changes as well.

Regards,
Om



Issac Buenrostro

Unread,
Mar 2, 2017, 11:25:43
to om singh, Issac Buenrostro, gobblin-users
Hi Om,
Can you include the full stack trace?

om singh

Unread,
Mar 2, 2017, 12:00:32
to Issac Buenrostro, Issac Buenrostro, gobblin-users
Sure, kindly find the attached log file.

Regards,
Om
[attachment: log]

Issac Buenrostro

Unread,
Mar 2, 2017, 12:34:21
to om singh, Issac Buenrostro, gobblin-users
Hi Om,
The problem is that your configuration is referring to directories that don't exist at the "/" level (for example, /gobblin-kafka, /gobblintest, possibly others). Your user is not allowed to create those directories in the HDFS cluster. Please make sure to use existing directories (for example, you could put everything under /user/<your-user>).
Best
Issac
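Following that advice, the root-level paths in the job file could be rebased under the user's HDFS home directory. The layout below is illustrative only; the directory names are assumptions, not taken from the thread:

```properties
# Illustrative rebasing under /user/<your-user> (here hydra-analytics).
# These directory names are assumptions for the sketch.
mr.job.root.dir=/user/hydra-analytics/gobblin-kafka/working
state.store.dir=/user/hydra-analytics/gobblin-kafka/state-store
task.data.root.dir=/user/hydra-analytics/gobblin-kafka/task-data
data.publisher.final.dir=/user/hydra-analytics/gobblintest/job-output
```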


om singh

Unread,
Mar 6, 2017, 01:53:54
to Issac Buenrostro, Issac Buenrostro, gobblin-users
Thanks Issac,

The application was trying to write at the "/" directory because I passed --workdir ~/workdir as a parameter:

[nohup ~/gobblin-dist/bin/gobblin-mapreduce.sh --workdir ~/workdir --conf ~/gobblin-dist/job-conf/mmt-kafka.pull -fs hdfs://10.66.53.27:8020 --jt hdfs://cmmt-53-28.mmt.com:8032]
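One plausible reason (an assumption, not confirmed in the thread) is that ~ is expanded by the local shell before Hadoop sees it, so the client receives a local-style absolute path rather than a path under the HDFS home directory. The namenode host and port below are hypothetical:

```shell
#!/usr/bin/env bash
# ~ is expanded by the local shell, so --workdir ~/workdir becomes an
# absolute local path; an HDFS client then treats that path as absolute
# on HDFS too, not as relative to /user/<user>.
WORKDIR=~/workdir
echo "$WORKDIR"   # e.g. /home/<local-user>/workdir

# A fully qualified URI removes the ambiguity (host and port hypothetical):
echo "hdfs://namenode.example.com:8020/user/$(whoami)/workdir"
```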

I changed the workdir path and am now facing a different issue. It looks like the error comes while launching the MR job.

Error log:
2017-03-06 12:08:37 IST INFO  [main] gobblin.runtime.mapreduce.GobblinWorkUnitsInputFormat  92 - Found 1 input files at hdfs://10.66.53.27:8020/app/hydra-analytics/gobblin-kafka/working/PullFromKafkaToHDFS/job_PullFromKafkaToHDFS_1488782304917/input: [FileStatus{path=hdfs://10.66.53.27:8020/app/hydra-analytics/gobblin-kafka/working/PullFromKafkaToHDFS/job_PullFromKafkaToHDFS_1488782304917/input/multitask_PullFromKafkaToHDFS_1488782304917_0.mwu; isDirectory=false; length=382598; replication=3; blocksize=67108864; modification_time=1488782316084; access_time=1488782315866; owner=hydra-analytics; group=supergroup; permission=rw-r--r--; isSymlink=false}]
2017-03-06 12:08:37 IST INFO  [main] org.apache.hadoop.mapreduce.JobSubmitter  396 - number of splits:1
2017-03-06 12:08:37 IST INFO  [main] org.apache.hadoop.mapreduce.JobSubmitter  479 - Submitting tokens for job: job_1488349530373_110905
2017-03-06 12:08:37 IST INFO  [main] org.apache.hadoop.mapreduce.JobSubmitter  481 - Kind: HDFS_DELEGATION_TOKEN, Service: 10.66.53.27:8020, Ident: (HDFS_DELEGATION_TOKEN token 5264650 for hydra-analytics)
2017-03-06 12:08:37 IST INFO  [main] org.apache.hadoop.mapreduce.JobSubmitter  441 - Cleaning up the staging area /user/hydra-analytics/.staging/job_1488349530373_110905
2017-03-06 12:08:37 IST INFO  [TaskStateCollectorService STOPPING] gobblin.runtime.TaskStateCollectorService  103 - Stopping the TaskStateCollectorService
2017-03-06 12:08:37 IST WARN  [TaskStateCollectorService STOPPING] gobblin.runtime.TaskStateCollectorService  131 - No output task state files found in hdfs://10.66.53.27:8020/app/hydra-analytics/gobblin-kafka/working/PullFromKafkaToHDFS/job_PullFromKafkaToHDFS_1488782304917/output/job_PullFromKafkaToHDFS_1488782304917
2017-03-06 12:08:37 IST INFO  [main] gobblin.runtime.mapreduce.MRJobLauncher  498 - Deleted working directory hdfs://10.66.53.27:8020/app/hydra-analytics/gobblin-kafka/working/PullFromKafkaToHDFS/job_PullFromKafkaToHDFS_1488782304917
2017-03-06 12:08:37 IST ERROR [main] gobblin.runtime.AbstractJobLauncher  420 - Failed to launch and run job job_PullFromKafkaToHDFS_1488782304917: java.lang.NoSuchMethodError: org.apache.hadoop.yarn.client.ClientRMProxy.getRMDelegationTokenService(Lorg/apache/hadoop/conf/Configuration;)Lorg/apache/hadoop/io/Text;
java.lang.NoSuchMethodError: org.apache.hadoop.yarn.client.ClientRMProxy.getRMDelegationTokenService(Lorg/apache/hadoop/conf/Configuration;)Lorg/apache/hadoop/io/Text;
    at org.apache.hadoop.mapred.ResourceMgrDelegate.getRMDelegationTokenService(ResourceMgrDelegate.java:165)
    at org.apache.hadoop.mapred.YARNRunner.addHistoryToken(YARNRunner.java:192)
    at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:282)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:432)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
    at gobblin.runtime.mapreduce.MRJobLauncher.runWorkUnits(MRJobLauncher.java:227)
    at gobblin.runtime.AbstractJobLauncher.launchJob(AbstractJobLauncher.java:395)
    at gobblin.runtime.mapreduce.CliMRJobLauncher.launchJob(CliMRJobLauncher.java:89)
    at gobblin.runtime.mapreduce.CliMRJobLauncher.run(CliMRJobLauncher.java:66)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
    at gobblin.runtime.mapreduce.CliMRJobLauncher.main(CliMRJobLauncher.java:111)

    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)


Regards,
Om


om singh

Unread,
Mar 10, 2017, 06:28:58
to Issac Buenrostro, Issac Buenrostro, gobblin-users
Thanks Issac,

Finally I was able to resolve the above-mentioned issue with the following steps:

1. kinit with the keytab (e.g. kinit -kt <keytab file> <keytab user>)

2. Build command: ./gradlew clean assemble -PuseHadoop2 -PhadoopVersion=2.6.0-cdh5.8.3 --stacktrace

3. Add the following jars to the MapReduce command (gobblin-mapreduce.sh):

nohup ~/gobblin-dist/bin/gobblin-mapreduce.sh \
  --workdir hdfs://*****:8020/app/hydra-analytics/workdir \
  --conf ~/gobblin-dist/job-conf/**-kafka.pull \
  --jars \
/hydra-analytics/gobblin-dist/lib/reflections-0.9.10.jar,\
/hydra-analytics/gobblin-dist/lib/guava-retrying-2.0.0.jar,\
/hydra-analytics/gobblin-dist/lib/javassist-3.18.2-GA.jar,\
/hydra-analytics/gobblin-dist/lib/kafka-avro-serializer-2.0.1.jar,\
/hydra-analytics/gobblin-dist/lib/kafka-json-serializer-2.0.1.jar,\
/hydra-analytics/gobblin-dist/lib/hadoop-common-2.6.0-cdh5.8.3.jar,\
/hydra-analytics/gobblin-dist/lib/gobblin-metrics-base-0.9.0-365-gcfea157.jar,\
/hydra-analytics/gobblin-dist/lib/gobblin-metrics-0.9.0-365-gcfea157.jar,\
/hydra-analytics/gobblin-dist/lib/gobblin-core-base-0.9.0-365-gcfea157.jar,\
/hydra-analytics/gobblin-dist/lib/gobblin-kafka-08-0.9.0-365-gcfea157.jar,\
/hydra-analytics/gobblin-dist/lib/gobblin-kafka-common-0.9.0-365-gcfea157.jar,\
/hydra-analytics/gobblin-dist/lib/guava-15.0.jar,\
/hydra-analytics/gobblin-dist/lib/opencsv-3.8.jar &

Regards,
Om

vinay k

Unread,
Mar 21, 2017, 01:55:50
to gobblin-users
Hi Om,

I did not proceed with Gobblin after we encountered these issues. I used Camus instead, since its setup was easier and it met my project needs.

Thanks,
Vinay 