I am trying to pull a file from HDFS and write it back to HDFS.
Below is my job configuration:
job.group=demo
job.description=A Gobblin job for demo purpose
source.class=gobblin.example.hdfs.SimpleHdfsTextSource
#converter.classes=gobblin.example.simplejson.SimpleJsonConverter
extract.namespace=gobblin.example.simplejson
# source configuration properties
# comma-separated list of file URIs (supporting different schemes, e.g., file://, ftp://, sftp://, http://, etc)
# whether to use authentication or not (default is false)
source.conn.use.authentication=
# credential for authentication purpose (optional)
source.conn.domain=
source.conn.username=
source.conn.password=
# source data schema
source.schema={"namespace":"example.avro", "type":"record", "name":"User", "fields":[{"name":"name", "type":"string"}, {"name":"favorite_number", "type":"int"}, {"name":"favorite_color", "type":"string"}]}
# quality checker configuration properties
#qualitychecker.task.policies=gobblin.policies.count.RowCountPolicy,gobblin.policies.schema.SchemaCompatibilityPolicy
#qualitychecker.task.policy.types=OPTIONAL,OPTIONAL
#qualitychecker.row.policies=gobblin.policies.schema.SchemaRowCheckPolicy
#qualitychecker.row.policy.types=OPTIONAL
#qualitychecker.row.err.file=test/jobOutput
data.publisher.type=gobblin.publisher.BaseDataPublisher
# Data publisher related configuration properties
data.publisher.final.dir=/data/gobblin-yarn/job-output
data.publisher.replace.final.dir=false
# Directory where job/task state files are stored
state.store.dir=/data/gobblin-yarn/state-store
# writer configuration properties
writer.destination.type=HDFS
writer.output.format=AVRO
writer.staging.dir=/data/gobblin-yarn/task-staging
writer.output.dir=/data/gobblin-yarn/task-output
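(One thing worth double-checking, purely as a guess on my side: the staging, output, and state-store directories above are scheme-less paths, and the "Wrong FS" error further down suggests they are being resolved against a FileSystem other than hdfs://10.45.41.172:9000. A sketch of the same config with the file system pinned explicitly, assuming the standard Gobblin keys fs.uri and writer.fs.uri:)

```properties
# Hypothetical sketch, not a confirmed fix: pin the job to one HDFS
# namenode so every path resolves against the same FileSystem.
fs.uri=hdfs://10.45.41.172:9000
writer.fs.uri=hdfs://10.45.41.172:9000
# The same directories, fully qualified with the namenode URI:
writer.staging.dir=hdfs://10.45.41.172:9000/data/gobblin-yarn/task-staging
writer.output.dir=hdfs://10.45.41.172:9000/data/gobblin-yarn/task-output
data.publisher.final.dir=hdfs://10.45.41.172:9000/data/gobblin-yarn/job-output
state.store.dir=hdfs://10.45.41.172:9000/data/gobblin-yarn/state-store
```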
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
But I hit a problem; the stack traces are below.
On the app master:
2016-04-20 23:32:15 CST ERROR [JobScheduler-0] gobblin.runtime.AbstractJobLauncher - Failed to clean leftover staging data
java.lang.IllegalArgumentException: Wrong FS: hdfs://_append, expected: hdfs://10.45.41.172:9000
at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:647)
at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:194)
at org.apache.hadoop.hdfs.DistributedFileSystem.access$000(DistributedFileSystem.java:106)
at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1305)
at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1317)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1424)
at gobblin.util.JobLauncherUtils.cleanTaskStagingData(JobLauncherUtils.java:212)
at gobblin.runtime.AbstractJobLauncher.cleanLeftoverStagingData(AbstractJobLauncher.java:704)
at gobblin.runtime.AbstractJobLauncher.launchJob(AbstractJobLauncher.java:259)
at gobblin.scheduler.JobScheduler.runJob(JobScheduler.java:335)
at gobblin.yarn.GobblinHelixJobScheduler.runJob(GobblinHelixJobScheduler.java:102)
at gobblin.yarn.GobblinHelixJobScheduler$NonScheduledJobRunner.run(GobblinHelixJobScheduler.java:148)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
On the work units:
2016-04-21 00:33:06 CST ERROR [JobScheduler-0] gobblin.yarn.GobblinHelixJobScheduler$NonScheduledJobRunner - Failed to run job GobblinDemo
gobblin.runtime.JobException: Failed to run job GobblinDemo
at gobblin.yarn.GobblinHelixJobScheduler.runJob(GobblinHelixJobScheduler.java:104)
at gobblin.yarn.GobblinHelixJobScheduler$NonScheduledJobRunner.run(GobblinHelixJobScheduler.java:148)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: gobblin.runtime.JobException: Failed to launch and run job GobblinDemo
at gobblin.scheduler.JobScheduler.runJob(JobScheduler.java:341)
at gobblin.yarn.GobblinHelixJobScheduler.runJob(GobblinHelixJobScheduler.java:102)
... 4 more
Caused by: java.lang.IllegalArgumentException: Wrong FS: hdfs://_append, expected: hdfs://10.45.41.172:9000
at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:647)
at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:194)
at org.apache.hadoop.hdfs.DistributedFileSystem.access$000(DistributedFileSystem.java:106)
at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1305)
at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1317)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1424)
at gobblin.util.JobLauncherUtils.cleanTaskStagingData(JobLauncherUtils.java:212)
at gobblin.runtime.AbstractJobLauncher.cleanupStagingDataPerTask(AbstractJobLauncher.java:766)
at gobblin.runtime.AbstractJobLauncher.cleanupStagingData(AbstractJobLauncher.java:743)
at gobblin.runtime.AbstractJobLauncher.launchJob(AbstractJobLauncher.java:318)
at gobblin.scheduler.JobScheduler.runJob(JobScheduler.java:335)
Has anyone else run into this problem, or can someone help?
I built Gobblin from the master branch.