No AbstractFileSystem for scheme: null


Gabriele Tiberti

Sep 10, 2015, 10:33:10 AM
to gobblin-users
Hello everybody!
I'm quite new to the Kafka/Gobblin environment. I'm in a situation where millions of messages arrive at Kafka, and I'd like to copy them to my HDFS in order to analyse them via a Spark batch job.
The messages in Kafka are JSON strings, and I want to store them as Avro files in the Hadoop cluster.
I'm trying to set up the system using the default classes KafkaSimpleSource and AvroHdfsDataWriter; my needs are really simple, so I think I can avoid writing custom classes.
I set up the job and the properties file, but I continuously receive this error:

WARN [KafkaSource] Previous offset for partition impression_2015-09-08:0 does not exist. This partition will start from the earliest offset: 0
WARN [KafkaSource] Avg event size for partition impression_2015-09-08:0 not available, using default size 1024
WARN [UserGroupInformation] PriviledgedActionException as:gabriele (auth:SIMPLE) cause:org.apache.hadoop.fs.UnsupportedFileSystemException: No AbstractFileSystem for scheme: null
WARN [UserGroupInformation] PriviledgedActionException as:gabriele (auth:SIMPLE) cause:org.apache.hadoop.fs.UnsupportedFileSystemException: No AbstractFileSystem for scheme: null
ERROR [AbstractJobLauncher] Failed to launch and run job job_KafkaHdfsTest_1441895484485: org.apache.hadoop.fs.UnsupportedFileSystemException: No AbstractFileSystem for scheme: null
org.apache.hadoop.fs.UnsupportedFileSystemException: No AbstractFileSystem for scheme: null

and after some rows in the stack trace: 

Exception in thread "main" java.lang.IllegalArgumentException: Missing required property writer.staging.dir

even though writer.staging.dir is set in the properties file.

Does anybody have a suggestion?
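For context: Hadoop resolves an AbstractFileSystem implementation from the scheme of the filesystem URI (hdfs, file, ...), so "scheme: null" usually means the URI it ended up with was empty or scheme-less. A minimal sketch of fully-qualified URIs, with hypothetical host and port that would need to match the actual NameNode:

```
# Hypothetical values; host and port are placeholders for your cluster's NameNode
fs.uri=hdfs://namenode.example.com:8020
writer.fs.uri=hdfs://namenode.example.com:8020
state.store.fs.uri=hdfs://namenode.example.com:8020
```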

Ziyang Liu

Sep 10, 2015, 11:46:19 AM
to gobblin-users
Hi Gabriele, what's the value of fs.uri in your job config?

-Ziyang

Seong Hwan Cho

Sep 10, 2015, 11:48:33 AM
to gobblin-users
The exception looks very similar to the one I'm hitting right now (please refer to the very next post).
I assume you are running in MR mode.
Is the property writer.staging.dir set in the job configuration file or in the gobblin-mapreduce.properties file?
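If it is the framework file that is being picked up in MR mode, a minimal sketch of the entries it would need (the paths below are placeholders, not values from this thread):

```
# gobblin-mapreduce.properties (hypothetical paths)
writer.staging.dir=/gobblin/work-dir/task-staging
writer.output.dir=/gobblin/work-dir/task-output
```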

Gabriele Tiberti

Sep 10, 2015, 11:51:00 AM
to gobblin-users
Hi Ziyang,
This is my job.pull file:

job.name=KafkaHdfsTest
job.group=Kafka
job.description=Kafka Extractor for Gobblin
job.lock.enabled=false

source.class=gobblin.source.extractor.extract.kafka.KafkaSimpleSource
converter.classes=gobblin.converter.IdentityConverter
extract.namespace=gobblin.extract.kafka

fs.uri=hdfs://xxx.xxx.xxx.com

writer.destination.type=HDFS
writer.output.format=AVRO
writer.fs.uri=hdfs://xxx.xxx.xxx.com
writer.staging.dir=/user/gabriele/gobblinStaging
writer.output.dir=/user/gabriele/gobblinTest

data.publisher.type=gobblin.publisher.BaseDataPublisher

topic.whitelist=impression_2015-09-08
topic.name=impression_2015-09-08
bootstrap.with.offset=earliest

kafka.brokers=xxx.xxx.xxx.com:9092

writer.builder.class=gobblin.writer.SimpleDataWriterBuilder

mr.job.max.mappers=20


and my properties for this job are:


# Thread pool settings for the task executor
taskexecutor.threadpool.size=2
taskretry.threadpool.coresize=1
taskretry.threadpool.maxsize=2

# File system URIs
fs.uri=hdfs://xxx.xxx.xxx.com
writer.fs.uri=${fs.uri}
state.store.fs.uri=${fs.uri}

# Writer related configuration properties
writer.destination.type=HDFS
writer.output.format=AVRO
writer.staging.dir=$GOBBLIN_WORK_DIR/task-staging
writer.output.dir=$GOBBLIN_WORK_DIR/task-output

# Data publisher related configuration properties
data.publisher.type=gobblin.publisher.BaseDataPublisher
data.publisher.final.dir=$GOBBLIN_WORK_DIR/job-output
data.publisher.replace.final.dir=false

# Directory where job/task state files are stored
state.store.dir=$GOBBLIN_WORK_DIR/state-store

# Directory where error files from the quality checkers are stored
qualitychecker.row.err.file=$GOBBLIN_WORK_DIR/err

# Directory where job locks are stored
job.lock.dir=$GOBBLIN_WORK_DIR/locks

# Directory where metrics log files are stored
metrics.log.dir=$GOBBLIN_WORK_DIR/metrics

# Interval of task state reporting in milliseconds
task.status.reportintervalinms=5000

# MapReduce properties
mr.job.root.dir=$GOBBLIN_WORK_DIR/working 

and I set GOBBLIN_WORK_DIR to a folder on hdfs://xxx..etc
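One likely pitfall with the properties file above: java.util.Properties performs no shell-style variable expansion, so unless Gobblin's launch script substitutes $GOBBLIN_WORK_DIR before the file is loaded, the literal string reaches the job. A minimal, self-contained sketch of that behaviour (plain JDK code, not Gobblin's):

```java
import java.io.StringReader;
import java.util.Properties;

public class PropsExpansionDemo {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // java.util.Properties treats "$GOBBLIN_WORK_DIR" as ordinary text:
        props.load(new StringReader("writer.staging.dir=$GOBBLIN_WORK_DIR/task-staging"));
        // Prints the literal, unexpanded value:
        System.out.println(props.getProperty("writer.staging.dir"));
    }
}
```

Running this prints `$GOBBLIN_WORK_DIR/task-staging`, a path with no URI scheme, which would explain both the "scheme: null" warnings and the "Missing required property" error if the raw value is what the job sees.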

Vamsikrushna L

Dec 29, 2015, 1:25:20 AM
to gobblin-users
Hello,

I am getting the same error.
Please let me know if you have found the solution.

Thanks in advance!

Sahil Takiar

Jan 5, 2016, 11:58:48 PM
to Vamsikrushna L, gobblin-users


Vamsikrushna L

Jan 6, 2016, 1:22:26 AM
to gobblin-users, vamsi....@gmail.com
Hi Sahil,

Thanks a lot for your reply.
I was able to fix that problem, but now I am getting the exception below.
Please check this.

        at gobblin.runtime.AbstractJobLauncher.runWorkUnits(AbstractJobLauncher.java:579)
        at gobblin.runtime.mapreduce.MRJobLauncher$TaskRunner.run(MRJobLauncher.java:546)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)

Error: java.io.IOException: Not all tasks running in container attempt_1451645045308_0046_m_000000_1 completed successfully
        at gobblin.runtime.AbstractJobLauncher.runWorkUnits(AbstractJobLauncher.java:579)
        at gobblin.runtime.mapreduce.MRJobLauncher$TaskRunner.run(MRJobLauncher.java:546)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)

Error: java.io.IOException: Not all tasks running in container attempt_1451645045308_0046_m_000000_2 completed successfully
        at gobblin.runtime.AbstractJobLauncher.runWorkUnits(AbstractJobLauncher.java:579)
        at gobblin.runtime.mapreduce.MRJobLauncher$TaskRunner.run(MRJobLauncher.java:546)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)

WARN [AbstractJobLauncher] Not committing dataset  of job job_GobblinKafkaQuickStartMR_1452060688329 with commit policy COMMIT_ON_FULL_SUCCESS and state FAILED
ERROR [AbstractJobLauncher] Failed to launch and run job job_GobblinKafkaQuickStartMR_1452060688329: java.io.IOException: Failed to commit dataset state for some dataset(s) of job job_GobblinKafkaQuickStartMR_1452060688329
java.io.IOException: Failed to commit dataset state for some dataset(s) of job job_GobblinKafkaQuickStartMR_1452060688329
        at gobblin.runtime.JobContext.commit(JobContext.java:346)
        at gobblin.runtime.AbstractJobLauncher.launchJob(AbstractJobLauncher.java:258)
        at gobblin.runtime.mapreduce.CliMRJobLauncher.run(CliMRJobLauncher.java:60)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at gobblin.runtime.mapreduce.CliMRJobLauncher.main(CliMRJobLauncher.java:133)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Failed to launch the job due to the following exception:
gobblin.runtime.JobException: Job job_GobblinKafkaQuickStartMR_1452060688329 failed

Prashant Bhardwaj

Jan 6, 2016, 4:50:35 AM
to gobblin-users, vamsi....@gmail.com
This is not the complete log. Please post the YARN logs for a better understanding. To access the YARN logs, use "yarn logs -applicationId <application ID>".

Vamsikrushna L

Jan 6, 2016, 6:29:43 AM
to gobblin-users, vamsi....@gmail.com
I was able to fix the problem.
It is working fine now.

On Wednesday, January 6, 2016 at 3:44:23 PM UTC+5:30, Vamsikrushna L wrote:
Hi Prashant,

Thanks a lot for your response.

PFA log.

Thanks and regards,
Vamsi.

Bala Kasaram

May 27, 2016, 3:19:57 AM
to gobblin-users, vamsi....@gmail.com
What was the problem? I am facing the same kind of issue. Can you tell me how you fixed it?

Sahil Takiar

May 27, 2016, 12:29:52 PM
to Bala Kasaram, gobblin-users, Vamsikrushna L