SFTP all files into HDFS

Bala Kasaram

May 27, 2016, 8:48:15 AM
to gobblin-users
Hi Team,

Gobblin is the best tool I have found so far for ingesting any data (SQL plus any *.* files) into HDFS. Good work, Gobblin team.

I would like to load all files (images, txt, videos, etc.) from my SFTP dump into HDFS.

I have the following questions:
1. What would a sample conf file for this look like?
2. Do I need to write any classes of my own (for example, writer classes)? If so, is there a guide for this?

Thanks and Regards
Kasaram Bala
Digital Tech Corp Hadoop expert


Sahil Takiar

May 27, 2016, 12:37:24 PM
to Bala Kasaram, gobblin-users
The config file you have here should work: https://groups.google.com/d/msg/gobblin-users/z1yJmOaGhaw/4b7B1lG7CQAJ

You just need to set source.filebased.fs.uri to "sftp://[connection-uri]"

Bala Kasaram

May 30, 2016, 6:52:16 AM
to gobblin-users, kasar...@gmail.com
Hi Sahil,

I did the same and adjusted my pull file as shown below, but I am getting an error while copying data from the source:

job.name=ftpalltest
job.group=ftptohdfs
job.description=Job to copy data from sftp to hdfs

# Source properties
source.class=gobblin.data.management.copy.CloseableFsCopySource
source.conn.username=gobblin
source.conn.password=Password7897
source.conn.host=gobblindemo-ssh.azurehdinsight.net
source.conn.port=22

# The SftpSource class will look for data on the SFTP server under this directory
source.filebased.fs.uri=sftp://gobblindemo-ssh.azurehdinsight.net:22
source.filebased.data.directory=/home/gobblin

# Publisher properties - hdfs path
data.publisher.type=gobblin.data.management.copy.publisher.CopyDataPublisher
data.publisher.final.dir=wasb://gobbl...@gobblindemo.blob.core.windows.net/home/gobblin/
#/hadoop/hdfs/namenode

# Writer properties
writer.builder.class=gobblin.data.management.copy.writer.FileAwareInputStreamDataWriterBuilder

# run local
launcher.type=LOCAL
job.lock.enabled=false

----------------------------------------

commands:
------

export GOBBLIN_JOB_CONFIG_DIR=/hadoop/gb/conf/
export GOBBLIN_WORK_DIR=/hadoop/gb/conf/
bin/gobblin-standalone.sh start


Error log:
-----------------

2016-05-30 10:46:43 UTC INFO [main] org.quartz.impl.StdSchedulerFactory 1172 - Using default implementation for ThreadExecutor


2016-05-30 10:46:43 UTC INFO [main] org.quartz.core.SchedulerSignalerImpl 61 - Initialized Scheduler Signaller of type: class org.quartz.core.SchedulerSignalerImpl


2016-05-30 10:46:43 UTC INFO [main] org.quartz.core.QuartzScheduler 240 - Quartz Scheduler v.2.2.3 created.


2016-05-30 10:46:43 UTC INFO [main] org.quartz.simpl.RAMJobStore 155 - RAMJobStore initialized.


2016-05-30 10:46:43 UTC INFO [main] org.quartz.core.QuartzScheduler 305 - Scheduler meta-data: Quartz Scheduler (v2.2.3) 'LocalJobScheduler' with instanceId 'NON_CLUSTERED'
  Scheduler class: 'org.quartz.core.QuartzScheduler' - running locally.
  NOT STARTED.
  Currently in standby mode.
  Number of jobs executed: 0
  Using thread pool 'org.quartz.simpl.SimpleThreadPool' - with 3 threads.
  Using job-store 'org.quartz.simpl.RAMJobStore' - which does not support persistence. and is not clustered.


2016-05-30 10:46:43 UTC INFO [main] org.quartz.impl.StdSchedulerFactory 1327 - Quartz scheduler 'LocalJobScheduler' initialized from specified file: '/hadoop/gb/gobblin/gobblin-dist/conf/quartz.properties'


2016-05-30 10:46:43 UTC INFO [main] org.quartz.impl.StdSchedulerFactory 1331 - Quartz scheduler version: 2.2.3


2016-05-30 10:46:43 UTC INFO [main] gobblin.runtime.app.ServiceBasedAppLauncher 144 - Starting the Gobblin application and all its associated Services


2016-05-30 10:46:43 UTC INFO [JobScheduler STARTING] gobblin.scheduler.JobScheduler 144 - Starting the job scheduler


2016-05-30 10:46:43 UTC INFO [JobScheduler STARTING] org.quartz.core.QuartzScheduler 575 - Scheduler LocalJobScheduler_$_NON_CLUSTERED started.


2016-05-30 10:46:43 UTC INFO [JobScheduler STARTING] gobblin.scheduler.JobScheduler 351 - Scheduling locally configured jobs


2016-05-30 10:46:43 UTC WARN [JobScheduler STARTING] gobblin.util.SchedulerUtils 214 - Skipped file /hadoop/gb/conf/.gobblin-pid that has an unsupported extension


2016-05-30 10:46:43 UTC INFO [JobScheduler STARTING] gobblin.scheduler.JobScheduler 363 - Loaded 1 job configuration


2016-05-30 10:46:43 UTC WARN [JobScheduler-0] org.apache.hadoop.util.NativeCodeLoader 62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable


2016-05-30 10:46:44 UTC WARN [JobScheduler-0] gobblin.runtime.JobContext 248 - Property task.data.root.dir is missing.


2016-05-30 10:46:44 UTC INFO [JobScheduler-0] gobblin.util.ClustersNames 69 - no default cluster mapping found


2016-05-30 10:46:44 UTC INFO [TaskExecutor STARTING] gobblin.runtime.TaskExecutor 119 - Starting the task executor


2016-05-30 10:46:44 UTC INFO [LocalTaskStateTracker STARTING] gobblin.runtime.AbstractTaskStateTracker 64 - Starting the task state tracker


2016-05-30 10:46:44 UTC INFO [JobScheduler-0] gobblin.util.ExecutorsUtils 125 - Attempting to shutdown ExecutorService: java.util.concurrent.ThreadPoolExecutor@565dc839[Shutting down, pool size = 2, active threads = 0, queued tasks = 0, completed tasks = 2]


2016-05-30 10:46:44 UTC INFO [JobScheduler-0] gobblin.util.ExecutorsUtils 144 - Successfully shutdown ExecutorService: java.util.concurrent.ThreadPoolExecutor@565dc839[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 2]


2016-05-30 10:46:44 UTC WARN [JobScheduler-0] gobblin.password.PasswordManager 190 - Property encrypt.key.loc not set. Cannot decrypt any encrypted password.


2016-05-30 10:46:44 UTC WARN [JobScheduler-0] gobblin.password.PasswordManager 190 - Property encrypt.key.loc not set. Cannot decrypt any encrypted password.


2016-05-30 10:46:44 UTC INFO [JobScheduler-0] gobblin.source.extractor.extract.sftp.SftpFsHelper 146 - Attempting to connect to source via SFTP with privateKey: null knownHosts: null userName: gobblin hostName: gobblindemo-ssh.azurehdinsight.net port: 22 proxyHost: null proxyPort: -1


2016-05-30 10:46:44 UTC INFO [JobScheduler-0] gobblin.source.extractor.extract.sftp.SftpFsHelper 167 - Known hosts path is not set, StrictHostKeyChecking will be turned off


2016-05-30 10:46:44 UTC INFO [JobScheduler-0] gobblin.source.extractor.extract.sftp.SftpFsHelper$JSchLogger 333 - Connecting to gobblindemo-ssh.azurehdinsight.net port 22


2016-05-30 10:46:44 UTC INFO [JobScheduler-0] gobblin.source.extractor.extract.sftp.SftpFsHelper$JSchLogger 333 - Connection established


2016-05-30 10:46:44 UTC INFO [JobScheduler-0] gobblin.source.extractor.extract.sftp.SftpFsHelper$JSchLogger 333 - Remote version string: SSH-2.0-OpenSSH_5.9p1 Debian-5ubuntu1.8


2016-05-30 10:46:44 UTC INFO [JobScheduler-0] gobblin.source.extractor.extract.sftp.SftpFsHelper$JSchLogger 333 - Local version string: SSH-2.0-JSCH-0.1.53


2016-05-30 10:46:44 UTC INFO [JobScheduler-0] gobblin.source.extractor.extract.sftp.SftpFsHelper$JSchLogger 333 - CheckCiphers: aes256-ctr,aes192-ctr,aes128-ctr,aes256-cbc,aes192-cbc,aes128-cbc,3des-ctr,arcfour,arcfour128,arcfour256


2016-05-30 10:46:44 UTC INFO [JobScheduler-0] gobblin.source.extractor.extract.sftp.SftpFsHelper$JSchLogger 333 - CheckKexes: diffie-hellman-group14-sha1,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521


2016-05-30 10:46:44 UTC INFO [JobScheduler-0] gobblin.source.extractor.extract.sftp.SftpFsHelper$JSchLogger 333 - CheckSignatures: ecdsa-sha2-nistp256,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521


2016-05-30 10:46:44 UTC INFO [JobScheduler-0] gobblin.source.extractor.extract.sftp.SftpFsHelper$JSchLogger 333 - SSH_MSG_KEXINIT sent


2016-05-30 10:46:44 UTC INFO [JobScheduler-0] gobblin.source.extractor.extract.sftp.SftpFsHelper$JSchLogger 333 - SSH_MSG_KEXINIT received


2016-05-30 10:46:44 UTC INFO [JobScheduler-0] gobblin.source.extractor.extract.sftp.SftpFsHelper$JSchLogger 333 - kex: server: ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256,diffie-hellman-group-exchange-sha1,diffie-hellman-group14-sha1,diffie-hellman-group1-sha1


2016-05-30 10:46:44 UTC INFO [JobScheduler-0] gobblin.source.extractor.extract.sftp.SftpFsHelper$JSchLogger 333 - kex: server: ssh-rsa,ssh-dss,ecdsa-sha2-nistp256


2016-05-30 10:46:44 UTC INFO [JobScheduler-0] gobblin.source.extractor.extract.sftp.SftpFsHelper$JSchLogger 333 - kex: server: aes128-ctr,aes192-ctr,aes256-ctr


2016-05-30 10:46:44 UTC INFO [JobScheduler-0] gobblin.source.extractor.extract.sftp.SftpFsHelper$JSchLogger 333 - kex: server: aes128-ctr,aes192-ctr,aes256-ctr


2016-05-30 10:46:44 UTC INFO [JobScheduler-0] gobblin.source.extractor.extract.sftp.SftpFsHelper$JSchLogger 333 - kex: server: hmac-md5,hmac-sha1,umac-64@openssh.com,hmac-sha2-256,hmac-sha2-256-96,hmac-sha2-512,hmac-sha2-512-96,hmac-ripemd160,hmac-ripemd160@openssh.com,hmac-sha1-96,hmac-md5-96


2016-05-30 10:46:44 UTC INFO [JobScheduler-0] gobblin.source.extractor.extract.sftp.SftpFsHelper$JSchLogger 333 - kex: server: hmac-md5,hmac-sha1,umac-64@openssh.com,hmac-sha2-256,hmac-sha2-256-96,hmac-sha2-512,hmac-sha2-512-96,hmac-ripemd160,hmac-ripemd160@openssh.com,hmac-sha1-96,hmac-md5-96


2016-05-30 10:46:44 UTC INFO [JobScheduler-0] gobblin.source.extractor.extract.sftp.SftpFsHelper$JSchLogger 333 - kex: server: none,zl...@openssh.com


2016-05-30 10:46:44 UTC INFO [JobScheduler-0] gobblin.source.extractor.extract.sftp.SftpFsHelper$JSchLogger 333 - kex: server: none,zl...@openssh.com


2016-05-30 10:46:44 UTC INFO [JobScheduler-0] gobblin.source.extractor.extract.sftp.SftpFsHelper$JSchLogger 333 - kex: server:


2016-05-30 10:46:44 UTC INFO [JobScheduler-0] gobblin.source.extractor.extract.sftp.SftpFsHelper$JSchLogger 333 - kex: server:


2016-05-30 10:46:44 UTC INFO [JobScheduler-0] gobblin.source.extractor.extract.sftp.SftpFsHelper$JSchLogger 333 - kex: client: ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group14-sha1,diffie-hellman-group-exchange-sha256,diffie-hellman-group-exchange-sha1,diffie-hellman-group1-sha1


2016-05-30 10:46:44 UTC INFO [JobScheduler-0] gobblin.source.extractor.extract.sftp.SftpFsHelper$JSchLogger 333 - kex: client: ssh-rsa,ssh-dss,ecdsa-sha2-nistp256,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521


2016-05-30 10:46:44 UTC INFO [JobScheduler-0] gobblin.source.extractor.extract.sftp.SftpFsHelper$JSchLogger 333 - kex: client: aes128-ctr,aes128-cbc,3des-ctr,3des-cbc,blowfish-cbc,aes192-ctr,aes192-cbc,aes256-ctr,aes256-cbc


2016-05-30 10:46:44 UTC INFO [JobScheduler-0] gobblin.source.extractor.extract.sftp.SftpFsHelper$JSchLogger 333 - kex: client: aes128-ctr,aes128-cbc,3des-ctr,3des-cbc,blowfish-cbc,aes192-ctr,aes192-cbc,aes256-ctr,aes256-cbc


2016-05-30 10:46:44 UTC INFO [JobScheduler-0] gobblin.source.extractor.extract.sftp.SftpFsHelper$JSchLogger 333 - kex: client: hmac-md5,hmac-sha1,hmac-sha2-256,hmac-sha1-96,hmac-md5-96


2016-05-30 10:46:44 UTC INFO [JobScheduler-0] gobblin.source.extractor.extract.sftp.SftpFsHelper$JSchLogger 333 - kex: client: hmac-md5,hmac-sha1,hmac-sha2-256,hmac-sha1-96,hmac-md5-96


2016-05-30 10:46:44 UTC INFO [JobScheduler-0] gobblin.source.extractor.extract.sftp.SftpFsHelper$JSchLogger 333 - kex: client: none


2016-05-30 10:46:44 UTC INFO [JobScheduler-0] gobblin.source.extractor.extract.sftp.SftpFsHelper$JSchLogger 333 - kex: client: none


2016-05-30 10:46:44 UTC INFO [JobScheduler-0] gobblin.source.extractor.extract.sftp.SftpFsHelper$JSchLogger 333 - kex: client:


2016-05-30 10:46:44 UTC INFO [JobScheduler-0] gobblin.source.extractor.extract.sftp.SftpFsHelper$JSchLogger 333 - kex: client:


2016-05-30 10:46:44 UTC INFO [JobScheduler-0] gobblin.source.extractor.extract.sftp.SftpFsHelper$JSchLogger 333 - kex: server->client aes128-ctr hmac-md5 none


2016-05-30 10:46:44 UTC INFO [JobScheduler-0] gobblin.source.extractor.extract.sftp.SftpFsHelper$JSchLogger 333 - kex: client->server aes128-ctr hmac-md5 none


2016-05-30 10:46:44 UTC INFO [JobScheduler-0] gobblin.source.extractor.extract.sftp.SftpFsHelper$JSchLogger 333 - SSH_MSG_KEX_ECDH_INIT sent


2016-05-30 10:46:44 UTC INFO [JobScheduler-0] gobblin.source.extractor.extract.sftp.SftpFsHelper$JSchLogger 333 - expecting SSH_MSG_KEX_ECDH_REPLY


2016-05-30 10:46:44 UTC INFO [JobScheduler-0] gobblin.source.extractor.extract.sftp.SftpFsHelper$JSchLogger 333 - ssh_rsa_verify: signature true


2016-05-30 10:46:44 UTC WARN [JobScheduler-0] gobblin.source.extractor.extract.sftp.SftpFsHelper$JSchLogger 336 - Permanently added 'gobblindemo-ssh.azurehdinsight.net' (RSA) to the list of known hosts.


2016-05-30 10:46:44 UTC INFO [JobScheduler-0] gobblin.source.extractor.extract.sftp.SftpFsHelper$JSchLogger 333 - SSH_MSG_NEWKEYS sent


2016-05-30 10:46:44 UTC INFO [JobScheduler-0] gobblin.source.extractor.extract.sftp.SftpFsHelper$JSchLogger 333 - SSH_MSG_NEWKEYS received


2016-05-30 10:46:44 UTC INFO [JobScheduler-0] gobblin.source.extractor.extract.sftp.SftpFsHelper$JSchLogger 333 - SSH_MSG_SERVICE_REQUEST sent


2016-05-30 10:46:44 UTC INFO [JobScheduler-0] gobblin.source.extractor.extract.sftp.SftpFsHelper$JSchLogger 333 - SSH_MSG_SERVICE_ACCEPT received


2016-05-30 10:46:44 UTC INFO [JobScheduler-0] gobblin.source.extractor.extract.sftp.SftpFsHelper$MyUserInfo 386 - Authorized uses only. All activity may be monitored and reported.


2016-05-30 10:46:44 UTC INFO [JobScheduler-0] gobblin.source.extractor.extract.sftp.SftpFsHelper$JSchLogger 333 - Authentications that can continue: publickey,password


2016-05-30 10:46:44 UTC INFO [JobScheduler-0] gobblin.source.extractor.extract.sftp.SftpFsHelper$JSchLogger 333 - Next authentication method: publickey


2016-05-30 10:46:44 UTC INFO [JobScheduler-0] gobblin.source.extractor.extract.sftp.SftpFsHelper$JSchLogger 333 - Authentications that can continue: password


2016-05-30 10:46:44 UTC INFO [JobScheduler-0] gobblin.source.extractor.extract.sftp.SftpFsHelper$JSchLogger 333 - Next authentication method: password


2016-05-30 10:46:44 UTC INFO [JobScheduler-0] gobblin.source.extractor.extract.sftp.SftpFsHelper$JSchLogger 333 - Authentication succeeded (password).


2016-05-30 10:46:44 UTC INFO [JobScheduler-0] gobblin.source.extractor.extract.sftp.SftpFsHelper 186 - Finished connecting to source


2016-05-30 10:46:45 UTC INFO [JobScheduler-0] gobblin.util.reflection.GobblinConstructorUtils 84 - Found accessible constructor for class class gobblin.data.management.copy.CopyableGlobDatasetFinder with parameter types [class gobblin.source.extractor.extract.sftp.SftpLightWeightFileSystem, class java.util.Properties].


2016-05-30 10:46:45 UTC ERROR [JobScheduler-0] gobblin.runtime.SourceDecorator 58 - Failed to get work units for job job_ftpalltest_1464605203511
java.lang.RuntimeException: java.io.IOException: java.lang.reflect.InvocationTargetException
    at gobblin.data.management.copy.CopySource.getWorkunits(CopySource.java:217)
    at gobblin.runtime.SourceDecorator.getWorkunits(SourceDecorator.java:51)
    at gobblin.runtime.AbstractJobLauncher.launchJob(AbstractJobLauncher.java:241)
    at gobblin.scheduler.JobScheduler.runJob(JobScheduler.java:328)
    at gobblin.scheduler.JobScheduler.runJob(JobScheduler.java:287)
    at gobblin.scheduler.JobScheduler$NonScheduledJobRunner.run(JobScheduler.java:521)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: java.lang.reflect.InvocationTargetException
    at gobblin.data.management.dataset.DatasetUtils.instantiateDatasetFinder(DatasetUtils.java:83)
    at gobblin.data.management.copy.CopySource.getWorkunits(CopySource.java:135)
    ... 8 more
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at org.apache.commons.lang3.reflect.ConstructorUtils.invokeConstructor(ConstructorUtils.java:118)
    at org.apache.commons.lang3.reflect.ConstructorUtils.invokeConstructor(ConstructorUtils.java:85)
    at gobblin.util.reflection.GobblinConstructorUtils.invokeLongestConstructor(GobblinConstructorUtils.java:87)
    at gobblin.data.management.dataset.DatasetUtils.instantiateDatasetFinder(DatasetUtils.java:81)
    ... 9 more
Caused by: java.lang.IllegalArgumentException: Missing required property gobblin.dataset.pattern
    at com.google.common.base.Preconditions.checkArgument(Preconditions.java:93)
    at gobblin.data.management.retention.profile.ConfigurableGlobDatasetFinder.<init>(ConfigurableGlobDatasetFinder.java:72)
    at gobblin.data.management.retention.profile.ConfigurableGlobDatasetFinder.<init>(ConfigurableGlobDatasetFinder.java:100)
    at gobblin.data.management.copy.CopyableGlobDatasetFinder.<init>(CopyableGlobDatasetFinder.java:31)
    ... 17 more
2016-05-30 10:46:45 UTC ERROR [JobScheduler-0] gobblin.runtime.AbstractJobLauncher 321 - Failed to launch and run job job_ftpalltest_1464605203511: gobblin.runtime.JobException: Failed to get work units for job job_ftpalltest_1464605203511
gobblin.runtime.JobException: Failed to get work units for job job_ftpalltest_1464605203511
    at gobblin.runtime.AbstractJobLauncher.launchJob(AbstractJobLauncher.java:249)
    at gobblin.scheduler.JobScheduler.runJob(JobScheduler.java:328)
    at gobblin.scheduler.JobScheduler.runJob(JobScheduler.java:287)
    at gobblin.scheduler.JobScheduler$NonScheduledJobRunner.run(JobScheduler.java:521)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)


2016-05-30 10:46:45 UTC INFO [JobScheduler-0] gobblin.util.ExecutorsUtils 125 - Attempting to shutdown ExecutorService: java.util.concurrent.ThreadPoolExecutor@b364520[Shutting down, pool size = 2, active threads = 0, queued tasks = 0, completed tasks = 2]


2016-05-30 10:46:45 UTC INFO [JobScheduler-0] gobblin.util.ExecutorsUtils 144 - Successfully shutdown ExecutorService: java.util.concurrent.ThreadPoolExecutor@b364520[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 2]


2016-05-30 10:46:45 UTC INFO [JobScheduler-0] gobblin.util.ExecutorsUtils 125 - Attempting to shutdown ExecutorService: java.util.concurrent.ThreadPoolExecutor@7650cc27[Shutting down, pool size = 2, active threads = 0, queued tasks = 0, completed tasks = 2]


2016-05-30 10:46:45 UTC INFO [JobScheduler-0] gobblin.util.ExecutorsUtils 144 - Successfully shutdown ExecutorService: java.util.concurrent.ThreadPoolExecutor@7650cc27[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 2]


2016-05-30 10:46:45 UTC INFO [TaskExecutor STOPPING] gobblin.runtime.TaskExecutor 134 - Stopping the task executor


2016-05-30 10:46:45 UTC INFO [TaskExecutor STOPPING] gobblin.util.ExecutorsUtils 125 - Attempting to shutdown ExecutorService: java.util.concurrent.ThreadPoolExecutor@79bd7026[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0]


2016-05-30 10:46:45 UTC INFO [LocalTaskStateTracker STOPPING] gobblin.runtime.AbstractTaskStateTracker 69 - Stopping the task state tracker


2016-05-30 10:46:45 UTC INFO [TaskExecutor STOPPING] gobblin.util.ExecutorsUtils 144 - Successfully shutdown ExecutorService: java.util.concurrent.ThreadPoolExecutor@79bd7026[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0]


2016-05-30 10:46:45 UTC INFO [LocalTaskStateTracker STOPPING] gobblin.util.ExecutorsUtils 125 - Attempting to shutdown ExecutorService: java.util.concurrent.ScheduledThreadPoolExecutor@40e0d3b[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0]


2016-05-30 10:46:45 UTC INFO [TaskExecutor STOPPING] gobblin.util.ExecutorsUtils 125 - Attempting to shutdown ExecutorService: java.util.concurrent.ScheduledThreadPoolExecutor@36869e91[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0]


2016-05-30 10:46:45 UTC INFO [LocalTaskStateTracker STOPPING] gobblin.util.ExecutorsUtils 144 - Successfully shutdown ExecutorService: java.util.concurrent.ScheduledThreadPoolExecutor@40e0d3b[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0]


2016-05-30 10:46:45 UTC INFO [TaskExecutor STOPPING] gobblin.util.ExecutorsUtils 144 - Successfully shutdown ExecutorService: java.util.concurrent.ScheduledThreadPoolExecutor@36869e91[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0]


2016-05-30 10:46:45 UTC INFO [TaskExecutor STOPPING] gobblin.util.ExecutorsUtils 125 - Attempting to shutdown ExecutorService: java.util.concurrent.ThreadPoolExecutor@40145d8e[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0]


2016-05-30 10:46:45 UTC INFO [TaskExecutor STOPPING] gobblin.util.ExecutorsUtils 144 - Successfully shutdown ExecutorService: java.util.concurrent.ThreadPoolExecutor@40145d8e[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0]


2016-05-30 10:46:45 UTC INFO [JobScheduler-0] gobblin.source.extractor.extract.sftp.SftpFsHelper$JSchLogger 333 - Disconnecting from gobblindemo-ssh.azurehdinsight.net port 22


2016-05-30 10:46:45 UTC ERROR [JobScheduler-0] gobblin.scheduler.JobScheduler$NonScheduledJobRunner 523 - Failed to run job ftpalltest
gobblin.runtime.JobException: Failed to run job ftpalltest
    at gobblin.scheduler.JobScheduler.runJob(JobScheduler.java:289)
    at gobblin.scheduler.JobScheduler$NonScheduledJobRunner.run(JobScheduler.java:521)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: gobblin.runtime.JobException: Failed to launch and run job ftpalltest
    at gobblin.scheduler.JobScheduler.runJob(JobScheduler.java:334)
    at gobblin.scheduler.JobScheduler.runJob(JobScheduler.java:287)
    ... 4 more
Caused by: gobblin.runtime.JobException: Job job_ftpalltest_1464605203511 failed
    at gobblin.runtime.AbstractJobLauncher.launchJob(AbstractJobLauncher.java:363)
    at gobblin.scheduler.JobScheduler.runJob(JobScheduler.java:328)
    ... 5 more


2016-05-30 10:46:45 UTC INFO [Connect thread gobblindemo-ssh.azurehdinsight.net session] gobblin.source.extractor.extract.sftp.SftpFsHelper$JSchLogger 333 - Caught an exception, leaving main loop due to Socket closed


Bala Kasaram

May 31, 2016, 8:30:23 PM
to gobblin-users, kasar...@gmail.com
Any update on how to fix this?


Sahil Takiar

Jun 9, 2016, 4:31:08 PM
to gobblin-users, kasar...@gmail.com
As the stack trace says, you are missing the property gobblin.dataset.pattern. It should be set to a pattern that identifies the location of the files that need to be copied from the source system.
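
For illustration, here is a minimal sketch of the missing property as it might be added to the pull file posted earlier in the thread. The stack trace shows the property being read by ConfigurableGlobDatasetFinder (via CopyableGlobDatasetFinder), so the value below is written as a glob. That reading, and the path itself, are assumptions: the path is a hypothetical choice derived from the source.filebased.data.directory value in the original job, not something confirmed in this thread.

```properties
# Sketch only: adds the property the IllegalArgumentException reports as missing.
# Assumption: the value is interpreted as a glob over paths on the SFTP source.
# /home/gobblin/* is a hypothetical pattern; adjust it to match the actual source layout.
gobblin.dataset.pattern=/home/gobblin/*
```

The rest of the pull file would stay as posted. Whether each match is treated as a single file or as a directory copied recursively depends on the Gobblin version in use, so it is worth testing against a small directory first.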

Bala Kasaram

Jun 10, 2016, 1:54:33 AM
to gobblin-users, kasar...@gmail.com
Thanks, Sahil. Let me give this a try now.