FTP to HDFS Help


Bala Kasaram

May 18, 2016, 7:10:09 AM
to gobblin-users
I am trying to run the following job configuration:

job.name=test
job.group=test
job.description=Job to copy data from sftp to hdfs

# Source properties
source.class=gobblin.data.management.copy.CloseableFsCopySource
source.conn.username=username
source.conn.password=password9
source.conn.host=localhost
source.conn.port=22

# The SftpSource class will look for data on the SFTP server under this directory
#source.filebased.data.directory=/tmp/demofiles

# Publisher properties - hdfs path
data.publisher.type=gobblin.data.management.copy.publisher.CopyDataPublisher
data.publisher.final.dir=hdfs://nn:port
#/hadoop/hdfs/namenode

# Writer properties
writer.builder.class=gobblin.data.management.copy.writer.FileAwareInputStreamDataWriterBuilder

# run local
launcher.type=LOCAL
job.lock.enabled=false

@hn0-bdssde:/hadoop/gobblin$ ./bin/gobblin-standalone.sh --workdir /hadoop/gobblin/p1/work --conf /hadoop/gobblin/p1 start

ls: cannot access /hadoop/gobblin/lib/*: No such file or directory

Starting Gobblin standalone daemon

Running command:

/usr/lib/jvm/java-7-openjdk-amd64/bin/java -Xmx2g -Xms1g -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintTenuringDistribution -XX:+UseCompressedOops -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/hadoop/gobblin/logs/ -Xloggc:/hadoop/gobblin/logs/gobblin-gc.log -Dgobblin.logs.dir=/hadoop/gobblin/logs -Dlog4j.configuration=file:///hadoop/gobblin/conf/log4j-standalone.xml -cp :/hadoop/gobblin/conf -Dorg.quartz.properties=/hadoop/gobblin/conf/quartz.properties  gobblin.scheduler.SchedulerDaemon /hadoop/gobblin/conf/gobblin-standalone.properties

@hn0-bdssde:/hadoop/gobblin$ nohup: appending output to `nohup.out'



Any help with loading a file from an FTP or local server into HDFS?
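Before launching, it can help to read back which keys a job file like the one above actually sets. The sketch below is a hypothetical helper, not part of Gobblin: `get_prop` and `check_required` are illustrative names, and it assumes simple one-line `key=value` entries (it does not handle the `:` separators or backslash line continuations that Java properties files also allow). Lines starting with `#`, such as the commented-out `source.filebased.data.directory`, are treated as unset.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not shipped with Gobblin) for inspecting a job file.
# Assumes one "key=value" entry per line; lines starting with '#' are comments.

# Print the value of KEY from FILE (last occurrence wins, leading spaces
# of the value trimmed). Returns 1 if the key is not set.
get_prop() {
  local file="$1" key="$2"
  awk -v k="$key" '
    /^[ \t]*#/ { next }                      # skip comment lines
    {
      i = index($0, "=")
      if (i == 0) next                       # skip lines without "="
      name = substr($0, 1, i - 1)
      gsub(/^[ \t]+|[ \t]+$/, "", name)      # trim spaces around the key
      if (name == k) {
        val = substr($0, i + 1)
        sub(/^[ \t]+/, "", val)              # trim leading spaces of the value
        found = 1
      }
    }
    END { if (found) print val; else exit 1 }
  ' "$file"
}

# Print "missing: KEY" for each listed key that FILE does not set.
# Returns 1 if any key is missing, 0 otherwise.
check_required() {
  local file="$1" key missing=0
  shift
  for key in "$@"; do
    if ! get_prop "$file" "$key" > /dev/null; then
      echo "missing: $key"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_required /hadoop/gobblin/p1/job.pull source.class writer.builder.class data.publisher.type data.publisher.final.dir` (where `job.pull` is a placeholder file name) prints each of those keys that the file leaves unset. It only checks presence; it cannot tell whether a value is one Gobblin accepts.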

Sahil Takiar

May 18, 2016, 1:45:36 PM
to Bala Kasaram, gobblin-users
The launch script requires access to the Gobblin lib/ directory. The output shows that the script cannot find that directory, which is why it is not working. Have you tried running a basic Gobblin job using the Getting Started Guide? http://gobblin.readthedocs.io/en/latest/Getting-Started/
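The launch command printed above also shows `-cp :/hadoop/gobblin/conf`, meaning no jars ended up on the classpath, which is consistent with the missing lib/ directory. A minimal pre-flight check might look like the sketch below. It assumes the distribution was unpacked into a directory such as /hadoop/gobblin with its jars directly under lib/. `check_gobblin_lib` is a hypothetical helper, not part of Gobblin.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of Gobblin): verify that DIR/lib
# exists and contains at least one .jar before running bin/gobblin-standalone.sh.
check_gobblin_lib() {
  local dir="$1" count
  if [ ! -d "$dir/lib" ]; then
    echo "no lib/ directory under $dir" >&2
    return 1
  fi
  # Count .jar entries directly inside lib/ (tr strips padding some wc versions add)
  count=$(find "$dir/lib" -maxdepth 1 -name '*.jar' | wc -l | tr -d ' ')
  if [ "$count" -eq 0 ]; then
    echo "$dir/lib contains no .jar files" >&2
    return 1
  fi
  echo "found $count jar(s) in $dir/lib"
}
```

Running `check_gobblin_lib /hadoop/gobblin` before `./bin/gobblin-standalone.sh ... start` would report the missing directory up front, instead of launching the JVM with only conf/ on the classpath.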


Bala Kasaram

May 20, 2016, 2:56:19 AM
to gobblin-users, kasar...@gmail.com
That is fine now; I have finished setting the global variables.

I have a doubt about source.conn.password=password9.

Is this line correct, or does (S)FTP only accept a private key and a known hosts file?

Sahil Takiar

May 20, 2016, 12:47:28 PM
to Bala Kasaram, gobblin-users
Yes, that line is correct; "source.conn.password" should work. If it is not working as expected, can you send over a copy of the logs?

--Sahil

Bala Kasaram

May 23, 2016, 9:45:41 AM
to gobblin-users, kasar...@gmail.com
Thanks. It worked, and I changed the CSV file to Avro format.