Load XML files from SFTP server into HDFS

Sadok Ben Yahia

Feb 25, 2016, 6:31:30 AM
to gobblin-users
Gobblin now runs in MapReduce mode, but unfortunately I cannot retrieve my XML files.
Here is the content of gobblin-current.log:

2016-02-25 12:09:42 CET INFO  [main] org.apache.hadoop.conf.Configuration  996 - mapreduce.user.classpath.first is deprecated. Instead, use mapreduce.job.user.classpath.first
2016-02-25 12:09:42 CET WARN  [main] gobblin.runtime.JobContext  242 - Property task.data.root.dir is missing.
2016-02-25 12:09:42 CET INFO  [main] gobblin.util.ClustersNames  73 - no default cluster mapping found
2016-02-25 12:09:42 CET INFO  [main] org.apache.hadoop.conf.Configuration  996 - mapred.max.map.failures.percent is deprecated. Instead, use mapreduce.map.failures.maxpercent
2016-02-25 12:09:43 CET INFO  [main] gobblin.metrics.GobblinMetrics  481 - Not reporting metrics to JMX
2016-02-25 12:09:43 CET INFO  [main] gobblin.metrics.GobblinMetrics  430 - Not reporting metrics to log files
2016-02-25 12:09:43 CET INFO  [main] gobblin.metrics.GobblinMetrics  492 - Not reporting metrics to Kafka
2016-02-25 12:09:43 CET INFO  [main] gobblin.util.ExecutorsUtils  125 - Attempting to shutdown ExecutorService: java.util.concurrent.ThreadPoolExecutor@4ef27d66[Shutting down, pool size = 1, active threads = 0, queued tasks = 0, completed tasks = 1]
2016-02-25 12:09:43 CET INFO  [main] gobblin.util.ExecutorsUtils  144 - Successfully shutdown ExecutorService: java.util.concurrent.ThreadPoolExecutor@4ef27d66[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 1]
2016-02-25 12:09:43 CET WARN  [main] gobblin.password.PasswordManager  189 - Property encrypt.key.loc not set. Cannot decrypt any encrypted password.
2016-02-25 12:09:43 CET WARN  [main] gobblin.password.PasswordManager  189 - Property encrypt.key.loc not set. Cannot decrypt any encrypted password.
2016-02-25 12:09:43 CET INFO  [main] gobblin.source.extractor.extract.sftp.SftpFsHelper  147 - Attempting to connect to source via SFTP with privateKey: ~/.ssh/id_rsa knownHosts: null userName: ftpuser hostName: 134.106.13.145 port: 21 proxyHost: null proxyPort: -1
2016-02-25 12:09:43 CET INFO  [main] gobblin.source.extractor.extract.sftp.SftpFsHelper$LocalFileIdentityStrategy  433 - Successfully set identity using local file ~/.ssh/id_rsa
2016-02-25 12:09:43 CET INFO  [main] gobblin.source.extractor.extract.sftp.SftpFsHelper  171 - Known hosts path is not set, StrictHostKeyChecking will be turned off
2016-02-25 12:09:43 CET INFO  [main] gobblin.source.extractor.extract.sftp.SftpFsHelper$JSchLogger  335 - Connecting to 134.106.13.145 port 21
2016-02-25 12:09:43 CET INFO  [main] gobblin.source.extractor.extract.sftp.SftpFsHelper$JSchLogger  335 - Connection established
2016-02-25 12:14:43 CET INFO  [main] gobblin.source.extractor.extract.sftp.SftpFsHelper$JSchLogger  335 - Disconnecting from 134.106.13.145 port 21
2016-02-25 12:14:43 CET ERROR [main] gobblin.source.extractor.extract.sftp.SftpFsHelper  195 - connection is closed by foreign host
com.jcraft.jsch.JSchException: connection is closed by foreign host
    at com.jcraft.jsch.Session.connect(Session.java:269)
    at com.jcraft.jsch.Session.connect(Session.java:183)
    at gobblin.source.extractor.extract.sftp.SftpFsHelper.connect(SftpFsHelper.java:188)
    at gobblin.source.extractor.extract.sftp.SftpLightWeightFileSystem.initialize(SftpLightWeightFileSystem.java:89)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2316)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:366)
    at gobblin.data.management.copy.CopySource.getSourceFileSystem(CopySource.java:245)
    at gobblin.data.management.copy.CloseableFsCopySource.getSourceFileSystem(CloseableFsCopySource.java:55)
    at gobblin.data.management.copy.CopySource.getWorkunits(CopySource.java:108)
    at gobblin.runtime.SourceDecorator.getWorkunits(SourceDecorator.java:52)
    at gobblin.runtime.AbstractJobLauncher.launchJob(AbstractJobLauncher.java:239)
    at gobblin.runtime.mapreduce.CliMRJobLauncher.run(CliMRJobLauncher.java:50)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at gobblin.runtime.mapreduce.CliMRJobLauncher.main(CliMRJobLauncher.java:77)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
2016-02-25 12:14:43 CET ERROR [main] gobblin.runtime.SourceDecorator  59 - Failed to get work units for job job_SftpDistcp_1456398582443
java.lang.RuntimeException: java.io.IOException: gobblin.source.extractor.filebased.FileBasedHelperException: Cannot connect to SFTP source
    at gobblin.data.management.copy.CopySource.getWorkunits(CopySource.java:148)
    at gobblin.runtime.SourceDecorator.getWorkunits(SourceDecorator.java:52)
    at gobblin.runtime.AbstractJobLauncher.launchJob(AbstractJobLauncher.java:239)
    at gobblin.runtime.mapreduce.CliMRJobLauncher.run(CliMRJobLauncher.java:50)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at gobblin.runtime.mapreduce.CliMRJobLauncher.main(CliMRJobLauncher.java:77)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Caused by: java.io.IOException: gobblin.source.extractor.filebased.FileBasedHelperException: Cannot connect to SFTP source
    at gobblin.source.extractor.extract.sftp.SftpLightWeightFileSystem.initialize(SftpLightWeightFileSystem.java:91)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2316)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:366)
    at gobblin.data.management.copy.CopySource.getSourceFileSystem(CopySource.java:245)
    at gobblin.data.management.copy.CloseableFsCopySource.getSourceFileSystem(CloseableFsCopySource.java:55)
    at gobblin.data.management.copy.CopySource.getWorkunits(CopySource.java:108)
    ... 10 more
Caused by: gobblin.source.extractor.filebased.FileBasedHelperException: Cannot connect to SFTP source
    at gobblin.source.extractor.extract.sftp.SftpFsHelper.connect(SftpFsHelper.java:196)
    at gobblin.source.extractor.extract.sftp.SftpLightWeightFileSystem.initialize(SftpLightWeightFileSystem.java:89)
    ... 15 more
Caused by: com.jcraft.jsch.JSchException: connection is closed by foreign host
    at com.jcraft.jsch.Session.connect(Session.java:269)
    at com.jcraft.jsch.Session.connect(Session.java:183)
    at gobblin.source.extractor.extract.sftp.SftpFsHelper.connect(SftpFsHelper.java:188)
    ... 16 more
2016-02-25 12:14:43 CET ERROR [main] gobblin.runtime.AbstractJobLauncher  307 - Failed to launch and run job job_SftpDistcp_1456398582443: gobblin.runtime.JobException: Failed to get work units for job job_SftpDistcp_1456398582443
gobblin.runtime.JobException: Failed to get work units for job job_SftpDistcp_1456398582443
    at gobblin.runtime.AbstractJobLauncher.launchJob(AbstractJobLauncher.java:246)
    at gobblin.runtime.mapreduce.CliMRJobLauncher.run(CliMRJobLauncher.java:50)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at gobblin.runtime.mapreduce.CliMRJobLauncher.main(CliMRJobLauncher.java:77)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
2016-02-25 12:14:43 CET INFO  [main] gobblin.util.ExecutorsUtils  125 - Attempting to shutdown ExecutorService: java.util.concurrent.ThreadPoolExecutor@24f360b2[Shutting down, pool size = 1, active threads = 0, queued tasks = 0, completed tasks = 1]
2016-02-25 12:14:43 CET INFO  [main] gobblin.util.ExecutorsUtils  144 - Successfully shutdown ExecutorService: java.util.concurrent.ThreadPoolExecutor@24f360b2[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 1]
2016-02-25 12:14:43 CET INFO  [main] gobblin.util.ExecutorsUtils  125 - Attempting to shutdown ExecutorService: java.util.concurrent.ThreadPoolExecutor@4b21844c[Shutting down, pool size = 1, active threads = 0, queued tasks = 0, completed tasks = 1]
2016-02-25 12:14:43 CET INFO  [main] gobblin.util.ExecutorsUtils  144 - Successfully shutdown ExecutorService: java.util.concurrent.ThreadPoolExecutor@4b21844c[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 1]
2016-02-25 12:14:43 CET INFO  [main] gobblin.runtime.mapreduce.MRJobLauncher  443 - Deleted working directory /usr/local/gobblin/SFTP_HDFS/work_dir/working/SftpDistcp

Could someone please help?
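One detail in the log stands out: JSch reports "Connection established" at 12:09:43, and the server drops the connection at 12:14:43, exactly five minutes later, with no authentication error in between. Port 21 is the default FTP control port, whereas SFTP is a subsystem of SSH and normally listens on port 22. A plausible (unconfirmed) explanation is that an FTP server answered on port 21 and closed the session after an idle timeout. The minimal sketch below checks this using only the JDK. `BannerProbe` is a hypothetical helper, not part of Gobblin. It opens a plain TCP socket and reads the first line the server sends. SSH servers usually greet with a line starting with `SSH-`, and FTP servers with a `220` reply. The default host and ports come from the log above; the 5-second timeout is an arbitrary choice.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

/** Hypothetical diagnostic helper: reads a server's greeting line and guesses the protocol. */
public final class BannerProbe {

    /** SSH (and therefore SFTP) servers usually greet with "SSH-", FTP servers with reply code 220. */
    static String classify(String banner) {
        if (banner == null) {
            return "NO_BANNER"; // server closed the connection without sending anything
        }
        if (banner.startsWith("SSH-")) {
            return "SSH";
        }
        if (banner.startsWith("220")) {
            return "FTP";
        }
        return "UNKNOWN";
    }

    /** Connects over TCP and returns the first line sent by the server, or null if the stream ends first. */
    static String readBanner(String host, int port, int timeoutMs) throws IOException {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            socket.setSoTimeout(timeoutMs); // fail with SocketTimeoutException instead of waiting forever
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(socket.getInputStream(), StandardCharsets.US_ASCII));
            return in.readLine();
        }
    }

    public static void main(String[] args) {
        // Host taken from the log above unless one is given on the command line.
        String host = args.length > 0 ? args[0] : "134.106.13.145";
        for (int port : new int[] {21, 22}) {
            try {
                String banner = readBanner(host, port, 5000);
                System.out.println("port " + port + ": " + classify(banner) + " (" + banner + ")");
            } catch (IOException e) {
                System.out.println("port " + port + ": " + e);
            }
        }
    }
}
```

With Java 11 or later it can be run directly as `java BannerProbe.java 134.106.13.145`. If port 21 reports `FTP` and port 22 reports `SSH`, pointing `source.conn.port` at 22 would be the first thing to try. If no port reports `SSH`, the server may not offer SFTP at all, and an SFTP-based source would not be able to read from it.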

And here is the content of my .pull file:

job.name=SftpDistcp
job.group=Distcp
job.description=Job to copy data from sftp to hdfs

# Source properties
source.filebased.fs.uri=sftp:///134.106.13.145:21
source.class=gobblin.data.management.copy.CloseableFsCopySource
source.conn.private.key=~/.ssh/id_rsa   
source.conn.username=ftpuser
source.conn.host=134.106.13.145
source.conn.port=21

# Dataset properties
gobblin.dataset.pattern=/tmp/Gobblin-Test

# Publisher properties
#data.publisher.type=gobblin.data.management.copy.publisher.CopyDataPublisher
data.publisher.final.dir=/usr/local/hadoop_store/hdfs/datanode

# Writer properties
writer.builder.class=gobblin.data.management.copy.writer.FileAwareInputStreamDataWriterBuilder
-------
PS: connecting to the server requires a password, which is why I also tried adding source.conn.password to my .pull file, but nothing changed.
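In the .pull file, `source.filebased.fs.uri` is written with three slashes (`sftp:///`). This leaves the URI's authority empty, so the address and port end up in the path instead. The log shows the connection going to 134.106.13.145 port 21, which matches `source.conn.host` and `source.conn.port`. This typo may therefore not be what breaks the job, but it is still worth correcting. The sketch below uses a hypothetical class, `FsUriCheck`, and only the JDK's `java.net.URI` to show how the two spellings are parsed. Port 22 in the second URI is the usual SFTP port; it is an assumption, not something confirmed for this server.

```java
import java.net.URI;

/** Hypothetical check: shows how java.net.URI splits an fs.uri value into host, port and path. */
public final class FsUriCheck {

    /** Returns "host=..., port=..., path=..." for the given URI string. */
    static String describe(String uri) {
        URI parsed = URI.create(uri); // throws IllegalArgumentException on malformed input
        return "host=" + parsed.getHost() + ", port=" + parsed.getPort() + ", path=" + parsed.getPath();
    }

    public static void main(String[] args) {
        // Value as written in the .pull file: the empty authority means no host and no port.
        System.out.println(describe("sftp:///134.106.13.145:21"));
        // Two slashes put the address in the authority; 22 is the default SSH/SFTP port.
        System.out.println(describe("sftp://134.106.13.145:22"));
    }
}
```

Running `main` prints `host=null, port=-1, path=/134.106.13.145:21` for the first value and `host=134.106.13.145, port=22, path=` for the second.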

Sahil Takiar

Feb 25, 2016, 1:42:09 PM
to gobblin-users