I am running Gobblin in standalone mode. Getting: java.net.UnknownHostException error

subashin...@gmail.com

Sep 15, 2016, 8:18:00 AM
to gobblin-users
I am trying to run a simple example from the Getting Started page of Gobblin, http://gobblin.readthedocs.io/en/latest/Getting-Started/. I have not updated the PATH variable to include Gobblin, so I ran it directly from the bin folder with the command sh gobblin-standalone.sh start. However, I see the following error in the current log in the log folder. I have not set the Hadoop conf variable on my system; only the Hadoop variable is set.


2016-09-15 17:32:38 IST ERROR [TaskRetryExecutor-0] gobblin.runtime.Task  266 - Task task_PullFromWikipedia_1473939091517_1 failed
java.lang.RuntimeException: java.net.UnknownHostException: en.wikipedia.org
    at gobblin.runtime.TaskContext.getExtractor(TaskContext.java:121)
    at gobblin.runtime.Task.run(Task.java:127)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.UnknownHostException: en.wikipedia.org
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:178)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:579)
    at sun.security.ssl.SSLSocketImpl.connect(SSLSocketImpl.java:625)
    at sun.security.ssl.BaseSSLSocketImpl.connect(BaseSSLSocketImpl.java:160)
    at sun.net.NetworkClient.doConnect(NetworkClient.java:180)
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
    at sun.net.www.protocol.https.HttpsClient.<init>(HttpsClient.java:275)
    at sun.net.www.protocol.https.HttpsClient.New(HttpsClient.java:371)
    at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.getNewHttpClient(AbstractDelegateHttpsURLConnection.java:191)
    at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:976)
    at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:177)
    at sun.net.www.protocol.https.HttpsURLConnectionImpl.connect(HttpsURLConnectionImpl.java:153)
    at gobblin.example.wikipedia.WikipediaExtractor.performHttpQuery(WikipediaExtractor.java:271)
    at gobblin.example.wikipedia.WikipediaExtractor.retrievePageRevisions(WikipediaExtractor.java:306)
    at gobblin.example.wikipedia.WikipediaExtractor.<init>(WikipediaExtractor.java:185)
    at gobblin.example.wikipedia.WikipediaSource.getExtractor(WikipediaSource.java:103)
    at gobblin.runtime.TaskContext.getExtractor(TaskContext.java:119)
    ... 8 more
2016-09-15 17:32:38 IST INFO  [TaskRetryExecutor-0] gobblin.runtime.Task  289 - publish.data.at.job.level is true. Will publish data at the job level.
2016-09-15 17:32:38 IST INFO  [TaskRetryExecutor-0]

I am not able to resolve this error. Could you please help?

subashin...@gmail.com

Sep 15, 2016, 8:20:43 AM
to gobblin-users
This error repeats again and again. I am running in local mode.
gobblin.runtime.local.LocalJobLauncher  125 - 2 out of 2 tasks of job job_PullFromWikipedia_1473939091517 are running
2016-09-15 17:23:32 IST INFO  [JobScheduler-0] gobblin.runtime.local.LocalJobLauncher  125 - 2 out of 2 tasks of job job_PullFromWikipedia_1473939091517 are running
2016-09-15 17:24:32 IST INFO  [JobScheduler-0] gobblin.runtime.local.LocalJobLauncher  125 - 2 out of 2 tasks of job job_PullFromWikipedia_1473939091517 are running
2016-09-15 17:25:32 IST INFO  [JobScheduler-0] gobblin.runtime.local.LocalJobLauncher  125 - 2 out of 2 tasks of job job_PullFromWikipedia_1473939091517 are running
2016-09-15 17:26:32 IST INFO  [JobScheduler-0] gobblin.runtime.local.LocalJobLauncher  125 - 2 out of 2 tasks of job job_PullFromWikipedia_1473939091517 are running
2016-09-15 17:27:32 IST INFO  [JobScheduler-0] gobblin.runtime.local.LocalJobLauncher  125 - 2 out of 2 tasks of job job_PullFromWikipedia_1473939091517 are running
2016-09-15 17:28:32 IST INFO  [JobScheduler-0] gobblin.runtime.local.LocalJobLauncher  125 - 2 out of 2 tasks of job job_PullFromWikipedia_1473939091517 are running
2016-09-15 17:29:32 IST INFO  [JobScheduler-0] gobblin.runtime.local.LocalJobLauncher  125 - 2 out of 2 tasks of job job_PullFromWikipedia_1473939091517 are running
2016-09-15 17:30:32 IST INFO  [JobScheduler-0] gobblin.runtime.local.LocalJobLauncher  125 - 2 out of 2 tasks of job job_PullFromWikipedia_1473939091517 are running
2016-09-15 17:31:32 IST INFO  [JobScheduler-0] gobblin.runtime.local.LocalJobLauncher  125 - 2 out of 2 tasks of job job_PullFromWikipedia_1473939091517 are running
2016-09-15 17:32:32 IST INFO  [JobScheduler-0] gobblin.runtime.local.LocalJobLauncher  125 - 2 out of 2 tasks of job job_PullFromWikipedia_1473939091517 are running
2016-09-15 17:32:38 IST ERROR [TaskRetryExecutor-0] gobblin.runtime.Task  266 - Task task_PullFromWikipedia_1473939091517_0 failed
2016-09-15 17:32:38 IST INFO  [TaskRetryExecutor-0] gobblin.runtime.TaskExecutor  205 - Scheduled retry of failed task task_PullFromWikipedia_1473939091517_0 to run in 1200 seconds
2016-09-15 17:32:38 IST ERROR [TaskRetryExecutor-0] gobblin.runtime.Task  266 - Task task_PullFromWikipedia_1473939091517_1 failed
2016-09-15 17:32:38 IST INFO  [TaskRetryExecutor-0] gobblin.runtime.TaskExecutor  205 - Scheduled retry of failed task task_PullFromWikipedia_1473939091517_1 to run in 1200 seconds
2016-09-15 17:33:32 IST INFO  [JobScheduler-0] gobblin.runtime.local.LocalJobLauncher  125 - 2 out of 2 tasks of job job_PullFromWikipedia_1473939091517 are running
2016-09-15 17:34:32 IST INFO  [JobScheduler-0] gobblin.runtime.local.LocalJobLauncher  125 - 2 out of 2 tasks of job job_PullFromWikipedia_1473939091517 are running
2016-09-15 17:35:32 IST INFO  [JobScheduler-0] gobblin.runtime.local.LocalJobLauncher  125 - 2 out of 2 tasks of job job_PullFromWikipedia_1473939091517 are running
2016-09-15 17:36:32 IST INFO  [JobScheduler-0] gobblin.runtime.local.LocalJobLauncher  125 - 2 out of 2 tasks of job job_PullFromWikipedia_1473939091517 are running
2016-09-15 17:37:32 IST INFO  [JobScheduler-0] gobblin.runtime.local.LocalJobLauncher  125 - 2 out of 2 tasks of job job_PullFromWikipedia_1473939091517 are running
2016-09-15 17:38:32 IST INFO  [JobScheduler-0] gobblin.runtime.local.LocalJobLauncher  125 - 2 out of 2 tasks of job job_PullFromWikipedia_1473939091517 are running
2016-09-15 17:39:32 IST INFO  [JobScheduler-0] gobblin.runtime.local.LocalJobLauncher  125 - 2 out of 2 tasks of job job_PullFromWikipedia_1473939091517 are running
2016-09-15 17:40:32 IST INFO  [JobScheduler-0] gobblin.runtime.local.LocalJobLauncher  125 - 2 out of 2 tasks of job job_PullFromWikipedia_1473939091517 are running
2016-09-15 17:41:32 IST INFO  [JobScheduler-0] gobblin.runtime.local.LocalJobLauncher  125 - 2 out of 2 tasks of job job_PullFromWikipedia_1473939091517 are running
2016-09-15 17:42:32 IST INFO  [JobScheduler-0] gobblin.runtime.local.LocalJobLauncher  125 - 2 out of 2 tasks of job job_PullFromWikipedia_1473939091517 are running
2016-09-15 17:43:32 IST INFO  [JobScheduler-0] gobblin.runtime.local.LocalJobLauncher  125 - 2 out of 2 tasks of job job_PullFromWikipedia_1473939091517 are running

Vicky Kak

Sep 16, 2016, 1:15:28 AM
to gobblin-users
This looks like a network issue to me. Can you clean the working-dir, restart Gobblin, and see if it works?
I have also run into this network issue many times.

Thanks,
Vicky



Krithika Rajendran

Sep 16, 2016, 3:48:00 AM
to gobblin-users
The same error persists even after making the changes you mentioned.

Vicky Kak

Sep 16, 2016, 4:05:21 AM
to gobblin-users
Can you share the job configuration file?
Also, make sure that you are able to connect to en.wikipedia.org. Can you ping en.wikipedia.org and see if that works on your machine? I know this is basic, but the UnknownHostException prompts me to check these basic steps before we look into the code.
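
Besides ping, the same name lookup can be tried from Java itself, since an UnknownHostException means the host name could not be resolved to an IP address. The following is only a minimal sketch: the class HostCheck and the method canResolve are hypothetical names for illustration and are not part of Gobblin.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper, not part of Gobblin: checks whether the JVM can resolve a host name.
class HostCheck {

    // Returns true if the name resolves to an IP address, false if the lookup fails.
    static boolean canResolve(String host) {
        try {
            InetAddress.getByName(host);
            return true;
        } catch (UnknownHostException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults to the host from the stack trace when no argument is given.
        String host = args.length > 0 ? args[0] : "en.wikipedia.org";
        System.out.println(host + (canResolve(host) ? " resolves" : " does NOT resolve"));
    }
}
```

If this reports that en.wikipedia.org does not resolve, the problem is most likely in the DNS or network setup of the machine rather than in the Gobblin job.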




Krithika Rajendran

Sep 20, 2016, 1:58:30 AM
to gobblin-users


Yes, I cannot ping en.wikpedia.org or even google.com. So do you mean that this is due to a problem in the machine configuration?
I am sharing the job configuration file wikipedia.pull below:

job.name=PullFromWikipedia
job.group=Wikipedia
job.description=A getting started example for Gobblin

source.class=gobblin.example.wikipedia.WikipediaSource
source.page.titles=LinkedIn,Wikipedia:Sandbox
source.revisions.cnt=5

wikipedia.api.rooturl=https://en.wikipedia.org/w/api.php
wikipedia.avro.schema={"namespace": "example.wikipedia.avro","type": "record","name": "WikipediaArticle","fields": [{"name": "revid", "type": ["double", "null"]},{"name": "pageid", "type": ["double", "null"]},{"name": "title", "type": ["string", "null"]},{"name": "user", "type": ["string", "null"]},{"name": "anon", "type": ["string", "null"]},{"name": "userid",  "type": ["double", "null"]},{"name": "timestamp", "type": ["string", "null"]},{"name": "size",  "type": ["double", "null"]},{"name": "contentformat",  "type": ["string", "null"]},{"name": "contentmodel",  "type": ["string", "null"]},{"name": "content", "type": ["string", "null"]}]}
gobblin.wikipediaSource.maxRevisionsPerPage=10

converter.classes=gobblin.example.wikipedia.WikipediaConverter

extract.namespace=gobblin.example.wikipedia

writer.destination.type=HDFS
writer.output.format=AVRO
writer.partitioner.class=gobblin.example.wikipedia.WikipediaPartitioner

data.publisher.type=gobblin.publisher.BaseDataPublisher
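
To double-check which host the job will try to reach, the host part of wikipedia.api.rooturl can be extracted from the configuration. This is a rough sketch that assumes the .pull file can be read as plain Java properties (key=value lines); the class PullFileHost and the method apiHost are hypothetical names, not Gobblin APIs.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.Properties;

// Hypothetical helper, not part of Gobblin.
class PullFileHost {

    // Loads key=value text and returns the host part of wikipedia.api.rooturl,
    // or null if the key is absent. Assumes the file is in Java properties format.
    static String apiHost(Reader pullFile) {
        Properties props = new Properties();
        try {
            props.load(pullFile);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String rootUrl = props.getProperty("wikipedia.api.rooturl");
        return rootUrl == null ? null : URI.create(rootUrl.trim()).getHost();
    }

    public static void main(String[] args) {
        // Two lines taken from the configuration above.
        String sample = "job.name=PullFromWikipedia\n"
                + "wikipedia.api.rooturl=https://en.wikipedia.org/w/api.php\n";
        System.out.println(apiHost(new StringReader(sample))); // prints en.wikipedia.org
    }
}
```

The host in the configuration matches the one in the stack trace, which suggests the configuration is fine and the failure comes from the machine not being able to resolve that name.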

Vicky Kak

Sep 20, 2016, 3:13:03 AM
to gobblin-users

Yes, the job is failing to connect to the URL.
ping should work; this is what I get on my machine:
vickey@vickey:~$ ping en.wikpedia.org
PING en.wikpedia.org (91.198.174.192) 56(84) bytes of data.
64 bytes from en.wikpedia.org (91.198.174.192): icmp_seq=1 ttl=51 time=183 ms
64 bytes from en.wikpedia.org (91.198.174.192): icmp_seq=2 ttl=51 time=183 ms
64 bytes from en.wikpedia.org (91.198.174.192): icmp_seq=3 ttl=51 time=183 ms