Trouble loading batch data

Subbu Subramaniam

Jul 10, 2015, 1:42:11 PM
to druid...@googlegroups.com
Hi,

I tried to set up a local Druid cluster and load batch data into the historical nodes.

According to http://druid.io/docs/latest/misc/tasks.html, the index task is a variation that does not require an external Hadoop setup.

When I post the indexing task, it fails with the error shown below in the logs.

I tried four different ways of starting the indexer [the instructions say to include "<hadoop_config_path>" at the end of the classpath]:
  1. Omitting the hadoop-config path altogether when starting the indexer
  2. Pointing to the top-level directory of a local Hadoop install (not running)
  3. Pointing to the local install's config directory (with nothing running)
  4. Specifically adding hadoop-common-2.3.0.jar to the classpath
All of these result in the same error. What am I doing wrong?
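
For reference, the launch command I am using looks roughly like this (a sketch from memory; the config directories, JVM flags, and paths are specific to my local setup):

java -Xmx2g -Duser.timezone=UTC -Dfile.encoding=UTF-8 \
  -classpath "config/_common:config/overlord:lib/*:<hadoop_config_path>" \
  io.druid.cli.Main server overlord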

Thanks

-Subbu 


java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream
	at io.druid.indexing.common.task.IndexTask.determinePartitions(IndexTask.java:268) ~[druid-indexing-service-0.8.0-rc2-SNAPSHOT.jar:0.8.0-rc2-SNAPSHOT]
	at io.druid.indexing.common.task.IndexTask.run(IndexTask.java:192) ~[druid-indexing-service-0.8.0-rc2-SNAPSHOT.jar:0.8.0-rc2-SNAPSHOT]
	at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:235) [druid-indexing-service-0.8.0-rc2-SNAPSHOT.jar:0.8.0-rc2-SNAPSHOT]
	at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:214) [druid-indexing-service-0.8.0-rc2-SNAPSHOT.jar:0.8.0-rc2-SNAPSHOT]
	at java.util.concurrent.FutureTask.run(FutureTask.java:262) [?:1.7.0_51]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [?:1.7.0_51]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [?:1.7.0_51]
	at java.lang.Thread.run(Thread.java:744) [?:1.7.0_51]
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.fs.FSDataInputStream
	at java.net.URLClassLoader$1.run(URLClassLoader.java:366) ~[?:1.7.0_51]
	at java.net.URLClassLoader$1.run(URLClassLoader.java:355) ~[?:1.7.0_51]
	at java.security.AccessController.doPrivileged(Native Method) ~[?:1.7.0_51]
	at java.net.URLClassLoader.findClass(URLClassLoader.java:354) ~[?:1.7.0_51]
	at java.lang.ClassLoader.loadClass(ClassLoader.java:425) ~[?:1.7.0_51]
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) ~[?:1.7.0_51]
	at java.lang.ClassLoader.loadClass(ClassLoader.java:358) ~[?:1.7.0_51]
	... 8 more
2015-07-10T17:19:50,352 ERROR [main] io.druid.cli.CliPeon - Error when starting up.  Failing.
java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream
	at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[guava-16.0.1.jar:?]
	at io.druid.indexing.worker.executor.ExecutorLifecycle.join(ExecutorLifecycle.java:162) ~[druid-indexing-service-0.8.0-rc2-SNAPSHOT.jar:0.8.0-rc2-SNAPSHOT]
	at io.druid.cli.CliPeon.run(CliPeon.java:212) [druid-services-0.8.0-rc2-SNAPSHOT.jar:0.8.0-rc2-SNAPSHOT]
	at io.druid.cli.Main.main(Main.java:91) [druid-services-0.8.0-rc2-SNAPSHOT.jar:0.8.0-rc2-SNAPSHOT]
Caused by: java.util.concurrent.ExecutionException: java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream
	at com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299) ~[guava-16.0.1.jar:?]
	at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286) ~[guava-16.0.1.jar:?]
	at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116) ~[guava-16.0.1.jar:?]
	at io.druid.indexing.worker.executor.ExecutorLifecycle.join(ExecutorLifecycle.java:159) ~[druid-indexing-service-0.8.0-rc2-SNAPSHOT.jar:0.8.0-rc2-SNAPSHOT]
	... 2 more
Caused by: java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream
	at io.druid.indexing.common.task.IndexTask.determinePartitions(IndexTask.java:268) ~[druid-indexing-service-0.8.0-rc2-SNAPSHOT.jar:0.8.0-rc2-SNAPSHOT]
	at io.druid.indexing.common.task.IndexTask.run(IndexTask.java:192) ~[druid-indexing-service-0.8.0-rc2-SNAPSHOT.jar:0.8.0-rc2-SNAPSHOT]
	at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:235) ~[druid-indexing-service-0.8.0-rc2-SNAPSHOT.jar:0.8.0-rc2-SNAPSHOT]
	at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:214) ~[druid-indexing-service-0.8.0-rc2-SNAPSHOT.jar:0.8.0-rc2-SNAPSHOT]
	at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[?:1.7.0_51]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[?:1.7.0_51]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) ~[?:1.7.0_51]
	at java.lang.Thread.run(Thread.java:744) ~[?:1.7.0_51]
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.fs.FSDataInputStream
	at java.net.URLClassLoader$1.run(URLClassLoader.java:366) ~[?:1.7.0_51]
	at java.net.URLClassLoader$1.run(URLClassLoader.java:355) ~[?:1.7.0_51]
	at java.security.AccessController.doPrivileged(Native Method) ~[?:1.7.0_51]
	at java.net.URLClassLoader.findClass(URLClassLoader.java:354) ~[?:1.7.0_51]
	at java.lang.ClassLoader.loadClass(ClassLoader.java:425) ~[?:1.7.0_51]
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) ~[?:1.7.0_51]
	at java.lang.ClassLoader.loadClass(ClassLoader.java:358) ~[?:1.7.0_51]
	at io.druid.indexing.common.task.IndexTask.determinePartitions(IndexTask.java:268) ~[druid-indexing-service-0.8.0-rc2-SNAPSHOT.jar:0.8.0-rc2-SNAPSHOT]
	at io.druid.indexing.common.task.IndexTask.run(IndexTask.java:192) ~[druid-indexing-service-0.8.0-rc2-SNAPSHOT.jar:0.8.0-rc2-SNAPSHOT]
	at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:235) ~[druid-indexing-service-0.8.0-rc2-SNAPSHOT.jar:0.8.0-rc2-SNAPSHOT]
	at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:214) ~[druid-indexing-service-0.8.0-rc2-SNAPSHOT.jar:0.8.0-rc2-SNAPSHOT]
	at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[?:1.7.0_51]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[?:1.7.0_51]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) ~[?:1.7.0_51]
	at java.lang.Thread.run(Thread.java:744) ~[?:1.7.0_51]

Subbu Subramaniam

Jul 10, 2015, 2:32:12 PM
to druid...@googlegroups.com
OK, I got it to work now.

The hadoop-common jar that I had on my classpath had an issue.

I don't remember any of the tutorial pages mentioning that hadoop jars must be included in the classpath (and the page I mentioned only asks to include the hadoop config directory), so maybe it is a good idea to clarify that?
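
For anyone else who hits the NoClassDefFoundError above, a quick sanity check is to verify that the hadoop-common jar on the classpath actually contains the missing class (the jar path here is just an example):

unzip -l /path/to/hadoop-common-2.3.0.jar | grep 'org/apache/hadoop/fs/FSDataInputStream'

If nothing is printed, the jar is bad or truncated and should be replaced.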

thanks

-Subbu

Fangjin Yang

Jul 10, 2015, 5:23:06 PM
to druid...@googlegroups.com, mcvs...@gmail.com
Hi Subbu,

It would be great if you could contribute your findings to our documentation. All Druid docs are located in the GitHub repo. This is the one we have so far for working with other versions of Hadoop:

In the future, we hope to rework the dependency system to make it easier to work with different versions of Hadoop. You may be interested in reading this proposal:

Subbu Subramaniam

Jul 10, 2015, 6:29:53 PM
to druid...@googlegroups.com
Fangjin,

I have another question regarding loading batch data. 

The wiki says to expect a "Received SUCCESS status for task:" message in the indexer logs. I do not get one. Also, the indexer console shows the task as still running, but clicking on the logs shows that the task has completed.
I do see the "Announcing segment" log in the historical node logs, and the timeBoundary query works as well.

Can you help me here?

thanks

Also, to note:

I started the co-ordinator, and updated the number of replicants for _default to 1, stopped it, and restarted it, and then did the batch operation.

Clicking on 'update' in the co-ordinator console did not update the mysql row. Instead, it created a new rule with a different version, and the co-ordinator continued to pick up the first version it found, because it shows up first in the mysql query results. So I had to delete the older mysql rule row.

-Subbu


Fangjin

Jul 10, 2015, 7:53:24 PM
to druid...@googlegroups.com
Hi Subbu, inline.

On Fri, Jul 10, 2015 at 3:29 PM, Subbu Subramaniam <mcvs...@gmail.com> wrote:
Fangjin,

I have another question regarding loading batch data. 

The wiki says to expect a "Received SUCCESS status for task:" message in the indexer logs. I do not get one. Also, the indexer console shows the task as still running, but clicking on the logs shows that the task has completed.
I do see the "Announcing segment" log in the historical node logs, and the timeBoundary query works as well.

Can you post the task logs?
 
Can you help me here?

thanks

Also, to note:

I started the co-ordinator, and updated the number of replicants for _default to 1, stopped it, and restarted it, and then did the batch operation.

Clicking on 'update' in the co-ordinator console did not update the mysql row. Instead, it created a new rule with a different version, and the co-ordinator continued to pick up the first version it found, because it shows up first in the mysql query results. So I had to delete the older mysql rule row.

The rule updates do not happen instantly, but the console will update the next time the coordinator runs. We should update the UI to address this in the near future.
 


Subbu Subramaniam

Jul 13, 2015, 11:33:18 AM
to druid...@googlegroups.com
Fangjin,

I understand that the rule update in the co-ordinator does not happen immediately. What I wanted to point out is that the update never seems to happen.

When I click on 'update' in the UI, the old row in mysql is not removed. Instead, a new row is added with the new rule that allows one replicant for the segment.
The co-ordinator always matches the first rule that it comes across when reading the mysql database, so the new rule is never applied, even if the co-ordinator is restarted.

I had to manually remove the old row in mysql using the mysql 'delete' command.

Here is the mysql result when I clicked on 'update' in the UI

mysql> select * from druid_rules;
+-----------------------------------+------------+--------------------------+-----------------------------------------------------------------+
| id                                | dataSource | version                  | payload                                                         |
+-----------------------------------+------------+--------------------------+-----------------------------------------------------------------+
| _default_2015-07-10T22:05:44.110Z | _default   | 2015-07-10T22:05:44.110Z | [{"tieredReplicants":{"_default_tier":2},"type":"loadForever"}] |
| _default_2015-07-10T22:06:58.879Z | _default   | 2015-07-10T22:06:58.879Z | [{"tieredReplicants":{"_default_tier":1},"type":"loadForever"}] |
+-----------------------------------+------------+--------------------------+-----------------------------------------------------------------+
2 rows in set (0.00 sec)

And then I had to do this in order for the new rule to take effect

mysql> delete from druid_rules where version = '2015-07-10T22:05:44.110Z';
Query OK, 1 row affected (0.00 sec)
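
Another way to double-check the rules, instead of reading mysql directly, is to ask the co-ordinator itself (assuming the rules endpoint is available in this version; 8081 is the default co-ordinator port and may differ in your setup):

curl http://localhost:8081/druid/coordinator/v1/rules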

I will post the logs for the indexer and co-ordinator in another email.

thanks,

-Subbu

Subbu Subramaniam

Jul 13, 2015, 12:03:06 PM
to druid...@googlegroups.com
Hi Fangjin,

Attached are the task logs and the logs from the servers (after I issued the curl command). It looks like it is unable to execute the "stop" command? Let me know if you would like more logs.

-Subbu
taskLogs.txt
historicalLogs.txt
overlord.txt
coordinator.txt

Fangjin Yang

Jul 13, 2015, 11:45:42 PM
to druid...@googlegroups.com, mcvs...@gmail.com
Hi Subbu, please see inline.


On Monday, July 13, 2015 at 8:33:18 AM UTC-7, Subbu Subramaniam wrote:
Fangjin,

I understand that the rule update in the co-ordinator does not happen immediately. What I wanted to point out is that the update never seems to happen.

When I click on 'update' in the UI, the old row in mysql is not removed. Instead, a new row is added with the new rule that allows one replicant for the segment.

The old row is never removed; we keep it around for audit information. Druid always uses the latest rule it finds for a datasource.
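
In other words, rule resolution behaves as if the coordinator ran something like this against the metadata store (a sketch of the semantics, not the literal query Druid issues):

mysql> select * from druid_rules where dataSource = '_default' order by version desc limit 1;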
 
The co-ordinator always matches the first rule that it comes across when reading the mysql database, so the new rule is never applied, even if the co-ordinator is restarted.

I had to manually remove the old row in mysql using the mysql 'delete' command.

Here is the mysql result when I clicked on 'update' in the UI

mysql> select * from druid_rules;
+-----------------------------------+------------+--------------------------+-----------------------------------------------------------------+
| id                                | dataSource | version                  | payload                                                         |
+-----------------------------------+------------+--------------------------+-----------------------------------------------------------------+
| _default_2015-07-10T22:05:44.110Z | _default   | 2015-07-10T22:05:44.110Z | [{"tieredReplicants":{"_default_tier":2},"type":"loadForever"}] |
| _default_2015-07-10T22:06:58.879Z | _default   | 2015-07-10T22:06:58.879Z | [{"tieredReplicants":{"_default_tier":1},"type":"loadForever"}] |
+-----------------------------------+------------+--------------------------+-----------------------------------------------------------------+
2 rows in set (0.00 sec)

And then I had to do this in order for the new rule to take effect

How did you verify the new rule was not taking effect? The reason I ask is that the rule logic has been around for some time and your use case is one we've verified quite a few times. There should also be several tests for this use case.

Fangjin Yang

Jul 13, 2015, 11:47:26 PM
to druid...@googlegroups.com, mcvs...@gmail.com
Hi Subbu,

Looking at the task logs, I see:
2015-07-13T15:50:04,447 INFO [task-runner-0] io.druid.indexing.worker.executor.ExecutorLifecycle - Task completed with status: {
  "id" : "index_wikipedia_2015-07-13T15:49:53.261Z",
  "status" : "SUCCESS",
  "duration" : 648
}

If you refresh the indexer console after the task completes, does the status update?
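
You can also ask the overlord for the task status directly, bypassing the console (8090 is the default overlord port; adjust the host and port for your setup):

curl http://localhost:8090/druid/indexer/v1/task/index_wikipedia_2015-07-13T15:49:53.261Z/status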

Subbu Subramaniam

Jul 14, 2015, 12:17:25 PM
to druid...@googlegroups.com
Hi Fangjin,

I verified by stopping the co-ordinator and restarting it. The old rule seemed to take effect, and it was complaining that it could not find replicants. I forget the details now that I am past the problem.

-Subbu

Subbu Subramaniam

Jul 14, 2015, 12:21:06 PM
to druid...@googlegroups.com
Hi Fangjin,

No, refreshing the console does not update the status of the task either.
If I stop the indexer and restart it, then all tasks disappear and the console is empty.

thanks

-Subbu

Fangjin Yang

Jul 14, 2015, 1:01:16 PM
to druid...@googlegroups.com, mcvs...@gmail.com
Hmmm, there are some ZK connection problems that occur at the end of the task, possibly causing the finished status to be missed. We will investigate and update this thread.

2015-07-13T15:50:52,697 WARN [main-SendThread(ssubrama-ld1.linkedin.biz:2181)] org.apache.zookeeper.ClientCnxn - Session 0x14e79f3d5e1000f for server ssubrama-ld1.linkedin.biz/127.0.0.1:2181, unexpected error, closing socket connection and attempting reconnect
java.io.IOException: Xid out of order. Got Xid 56 with err 0 expected Xid 55 for a packet with details: clientPath:null serverPath:null finished:false header:: 55,14  replyHeader:: 0,0,-4  request:: org.apache.zookeeper.MultiTransactionRecord@731bdbc5 response:: org.apache.zookeeper.MultiResponse@0
	at org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:798) ~[zookeeper-3.4.6.jar:3.4.6-1569965]
	at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:94) ~[zookeeper-3.4.6.jar:3.4.6-1569965]
	at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366) ~[zookeeper-3.4.6.jar:3.4.6-1569965]
	at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081) [zookeeper-3.4.6.jar:3.4.6-1569965]
2015-07-13T15:50:52,798 INFO [main-EventThread] org.apache.curator.framework.state.ConnectionStateManager - State change: SUSPENDED
2015-07-13T15:50:53,643 INFO [main-SendThread(ssubrama-ld1.linkedin.biz:2181)] org.apache.zookeeper.ClientCnxn - Opening socket connection to server ssubrama-ld1.linkedin.biz/0:0:0:0:0:0:0:1:2181. Will not attempt to authenticate using SASL (unknown error)
2015-07-13T15:50:53,643 INFO [main-SendThread(ssubrama-ld1.linkedin.biz:2181)] org.apache.zookeeper.ClientCnxn - Socket connection established to ssubrama-ld1.linkedin.biz/0:0:0:0:0:0:0:1:2181, initiating session
2015-07-13T15:50:53,645 WARN [main-SendThread(ssubrama-ld1.linkedin.biz:2181)] org.apache.zookeeper.ClientCnxnSocket - Connected to an old server; r-o mode will be unavailable
2015-07-13T

Fangjin Yang

Jul 14, 2015, 1:02:11 PM
to druid...@googlegroups.com, fangj...@gmail.com, mcvs...@gmail.com
Hi Subbu, do you by any chance have the coordinator logs after you changed the rules? I'd like to make sure there's nothing fishy going on there.

Gian Merlino

Jul 14, 2015, 1:11:51 PM
to druid...@googlegroups.com
Hi Subbu, what version is your ZK server? Googling around a bit, it looks like "java.io.IOException: Xid out of order." can happen with some older ZK servers.
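
One quick way to check the server version, assuming the ZK four-letter-word commands are enabled and the server is listening on the default port:

echo stat | nc localhost 2181 | head -1

The first line of the output includes the ZooKeeper version string.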

Paul Otto

Aug 17, 2015, 6:12:50 PM
to Druid User
This just started happening to me, after things worked during a couple of initial POC tests of a data load. The cluster this is happening on is currently using ZooKeeper 3.3.4 and coordinator version 0.7.0.

- Paul

Gian Merlino

Aug 17, 2015, 6:48:41 PM
to Druid User
ZooKeeper 3.3.4 is quite old and has a number of known bugs; do things work for you with a newer one, like 3.4.6?

Paul Otto

Aug 17, 2015, 7:47:34 PM
to druid...@googlegroups.com

All versions of ZK are pretty old... ;) In all seriousness though, I know there are a number of bug fixes in ZK in the 3.4.x series, including fixes for earlier issues with ephemeral nodes and other node issues. I've been hesitant to bump because of a nasty little problem with reverse DNS introduced in 3.4.x, which impacts one of my clusters and remains unresolved even in 3.5-alpha.

I will bump ZK tonight and verify that it resolves this issue for me. I do want you to be aware, though, in case others have been holding back their ZooKeeper version: right or wrong, the Kafka docs still recommend 3.3.4.

Thanks,
Paul

Paul Otto

Aug 18, 2015, 3:12:34 AM
to Druid User
Hi Gian,

I can confirm that upgrading the ZooKeeper ensemble to 3.4.6 on the cluster in question resolved this issue. Thanks for your help!

Cheers,
Paul