Trouble writing to S3

Chris Kellogg

Jun 26, 2017, 7:26:35 PM
to gobblin-users
I am fairly new to Gobblin and am experimenting with a new data source and extractor. I can run the job locally and write files to my laptop, but when I change it to write to S3 I keep getting the following exception. I am running in standalone mode.

2017-06-26 16:07:42 PDT INFO  [Commit-thread-0] gobblin.publisher.BaseDataPublisher  384 - Moving /Users/chris/fs/pulsar/writer-output/job_PulsarSample_1498518458122/PULSAR/sample/standalone/ns1/my-topic/20170626230739_append to /pulsar-demo/PULSAR/sample/standalone/ns1/my-topic/20170626230739_append
2017-06-26 16:07:42 PDT INFO  [ParallelRunner] org.apache.hadoop.fs.s3a.S3AFileSystem  684 - Getting path status for /pulsar-demo/PULSAR/sample/standalone/ns1/my-topic/20170626230739_append (pulsar-demo/PULSAR/sample/standalone/ns1/my-topic/20170626230739_append)
2017-06-26 16:07:42 PDT INFO  [ParallelRunner] org.apache.hadoop.fs.s3a.S3AFileSystem  684 - Getting path status for /pulsar-demo/PULSAR/sample/standalone/ns1/my-topic/20170626230739_append (pulsar-demo/PULSAR/sample/standalone/ns1/my-topic/20170626230739_append)
2017-06-26 16:07:43 PDT INFO  [ParallelRunner] org.apache.hadoop.fs.s3a.S3AFileSystem  810 - Copying local file from /Users/chris/fs/pulsar/writer-output/job_PulsarSample_1498518458122/PULSAR/sample/standalone/ns1/my-topic/20170626230739_append to /pulsar-demo/PULSAR/sample/standalone/ns1/my-topic/20170626230739_append
2017-06-26 16:07:43 PDT INFO  [ParallelRunner] org.apache.hadoop.fs.s3a.S3AFileSystem  684 - Getting path status for /pulsar-demo/PULSAR/sample/standalone/ns1/my-topic/20170626230739_append (pulsar-demo/PULSAR/sample/standalone/ns1/my-topic/20170626230739_append)
2017-06-26 16:07:43 PDT WARN  [Commit-thread-0] gobblin.util.ParallelRunner  364 - Task failed: Move /Users/chris/fs/pulsar/writer-output/job_PulsarSample_1498518458122/PULSAR/sample/standalone/ns1/my-topic/20170626230739_append to /pulsar-demo/PULSAR/sample/standalone/ns1/my-topic/20170626230739_append
com.amazonaws.AmazonClientException: Unable to calculate MD5 hash: /Users/chris/fs/pulsar/writer-output/job_PulsarSample_1498518458122/PULSAR/sample/standalone/ns1/my-topic/20170626230739_append (Is a directory)
    at com.amazonaws.services.s3.AmazonS3Client.putObject(AmazonS3Client.java:1298)
    at com.amazonaws.services.s3.transfer.internal.UploadCallable.uploadInOneChunk(UploadCallable.java:108)
    at com.amazonaws.services.s3.transfer.internal.UploadCallable.call(UploadCallable.java:100)
    at com.amazonaws.services.s3.transfer.internal.UploadMonitor.upload(UploadMonitor.java:192)
    at com.amazonaws.services.s3.transfer.internal.UploadMonitor.call(UploadMonitor.java:150)
    at com.amazonaws.services.s3.transfer.internal.UploadMonitor.call(UploadMonitor.java:50)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.FileNotFoundException: /Users/chris/fs/pulsar/writer-output/job_PulsarSample_1498518458122/PULSAR/sample/standalone/ns1/my-topic/20170626230739_append (Is a directory)
    at java.io.FileInputStream.open0(Native Method)
    at java.io.FileInputStream.open(FileInputStream.java:195)
    at java.io.FileInputStream.<init>(FileInputStream.java:138)
    at com.amazonaws.services.s3.AmazonS3Client.putObject(AmazonS3Client.java:1294)
    ... 9 more
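
From the trace, it looks like the ParallelRunner move hands S3AFileSystem the local task output directory (20170626230739_append) itself, and AmazonS3Client.putObject then fails because it is given a directory rather than a file when it tries to compute the MD5 hash.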




Here is my job file:
=========================================
#job.name=PulsarDemo
#job.group=Pulsar
#job.description=A getting started example for Gobblin

# A sample pull file that reads from Pulsar in a streaming manner
job.name=PulsarSample
job.group=PulsarRunForever-test
job.description=this is a job that runs forever and consumes from pulsar
job.lock.enabled=false

fs.uri=file:///
writer.fs.uri=file:///
state.store.fs.uri=file:///
data.publisher.fs.uri=s3a://pulsar-gobblin-demo
fs.s3a.access.key=S3_KEY
fs.s3a.secret.key=S3_ACCESS
fs.s3a.buffer.dir=/Users/chris/fs/tmp

data.publisher.final.dir=/pulsar-demo/

data.publisher.replace.final.dir=false

source.class=gobblin.source.extractor.extract.pulsar.PulsarStringStreamingSource


writer.destination.type=HDFS
writer.output.format=txt
writer.builder.class=gobblin.writer.SimpleDataWriterBuilder
simple.writer.delimiter="\n"
simple.writer.prepend.size=false


data.publisher.type=gobblin.publisher.BaseDataPublisher

# Work paths
state.store.enabled=false
writer.staging.dir=/Users/chris/fs/pulsar/writer-staging
writer.output.dir=/Users/chris/fs/pulsar/writer-output
========================================
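
In case it helps, here is a minimal sketch (outside of Gobblin) of what I think the failing copy boils down to. The bucket, credential placeholders, and paths are just the ones from my job file, and the class name S3aDirCopyRepro is made up; I'm assuming hadoop-aws and the AWS SDK are on the classpath the same way the standalone launcher sets them up.

=========================================
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical repro class, not part of the job.
public class S3aDirCopyRepro {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("fs.s3a.access.key", "S3_KEY");      // placeholder, same as the job file
    conf.set("fs.s3a.secret.key", "S3_ACCESS");   // placeholder, same as the job file
    conf.set("fs.s3a.buffer.dir", "/Users/chris/fs/tmp");

    FileSystem s3 = FileSystem.get(URI.create("s3a://pulsar-gobblin-demo"), conf);

    // The publisher "move" appears to turn into copyFromLocalFile on the s3a
    // filesystem; passing the task output *directory* here should reproduce the
    // "Unable to calculate MD5 hash ... (Is a directory)" failure from putObject.
    s3.copyFromLocalFile(
        new Path("/Users/chris/fs/pulsar/writer-output/job_PulsarSample_1498518458122/PULSAR/sample/standalone/ns1/my-topic/20170626230739_append"),
        new Path("/pulsar-demo/PULSAR/sample/standalone/ns1/my-topic/20170626230739_append"));
  }
}
=========================================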

Any help is appreciated.

Thanks.



