Indexing task never finishes


Carlos Nunez

Apr 17, 2016, 10:08:27 PM
to Druid User
Hello, 

My indexing tasks never finish. The indexing task points to an S3 file, and the ingestion spec contains the following job properties to be able to access the file:

"jobProperties" : {
  "fs.s3n.awsAccessKeyId" : "YOUR_ACCESS_KEY",
  "fs.s3n.awsSecretAccessKey" : "YOUR_SECRET_KEY",
  "fs.s3n.impl" : "org.apache.hadoop.fs.s3native.NativeS3FileSystem",
  "io.compression.codecs" : "org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.BZip2Codec,org.apache.hadoop.io.compress.SnappyCodec"
}


1) On the overlord console, my task stays in the running section. 
2) When I click "log (all)" or "log (last 8kb)", I am redirected to a blank page.
3) I SSH into the Middle Manager and tail the log at

var/druid/task/<task_id>/log

I see the following exceptions:

2016-04-18T01:08:14,092 INFO [main] io.druid.indexing.common.actions.RemoteTaskActionClient - Submitting action for task[index_hadoop_wikiticker_2016-04-18T01
:07:19.194Z] to overlord[http://localhost:8090/druid/indexer/v1/action]: LockTryAcquireAction{interval=2015-09-12T00:00:00.000Z/2015-09-13T00:00:00.000Z}
2016-04-18T01:08:14,092 INFO [main] com.metamx.http.client.pool.ChannelResourceFactory - Generating: http://localhost:8090
2016-04-18T01:08:14,094 WARN [HttpClient-Netty-Boss-0] org.jboss.netty.channel.SimpleChannelUpstreamHandler - EXCEPTION, please implement org.jboss.netty.hand
ler.codec.http.HttpContentDecompressor.exceptionCaught() for proper handling.
java.net.ConnectException: Connection refused: localhost/127.0.0.1:8090
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:1.8.0_72-internal]
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717) ~[?:1.8.0_72-internal]
        at org.jboss.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:152) ~[netty-3.10.4.Final.jar:?]
        at org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:105) [netty-3.10.4.Final.jar:?]
        at org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:79) [netty-3.10.4.Final.jar:?]
        at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337) [netty-3.10.4.Final.jar:?]
        at org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42) [netty-3.10.4.Final.jar:?]
        at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) [netty-3.10.4.Final.jar:?]
        at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42) [netty-3.10.4.Final.jar:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_72-internal]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_72-internal]
        at java.lang.Thread.run(Thread.java:745) [?:1.8.0_72-internal]
2016-04-18T01:08:14,094 WARN [main] io.druid.indexing.common.actions.RemoteTaskActionClient - Exception submitting action for task[index_hadoop_wikiticker_201
6-04-18T01:07:19.194Z]
org.jboss.netty.channel.ChannelException: Faulty channel in resource pool
        at com.metamx.http.client.NettyHttpClient.go(NettyHttpClient.java:137) ~[http-client-1.0.4.jar:?]
        at com.metamx.http.client.AbstractHttpClient.go(AbstractHttpClient.java:14) ~[http-client-1.0.4.jar:?]
        at io.druid.indexing.common.actions.RemoteTaskActionClient.submit(RemoteTaskActionClient.java:101) [druid-indexing-service-0.9.0.jar:0.9.0]
        at io.druid.indexing.common.task.HadoopIndexTask.isReady(HadoopIndexTask.java:137) [druid-indexing-service-0.9.0.jar:0.9.0]
        at io.druid.indexing.worker.executor.ExecutorLifecycle.start(ExecutorLifecycle.java:168) [druid-indexing-service-0.9.0.jar:0.9.0]
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_72-internal]
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_72-internal]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_72-internal]
        at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_72-internal]
        at com.metamx.common.lifecycle.Lifecycle$AnnotationBasedHandler.start(Lifecycle.java:350) [java-util-0.27.7.jar:?]
        at com.metamx.common.lifecycle.Lifecycle.start(Lifecycle.java:259) [java-util-0.27.7.jar:?]
        at io.druid.guice.LifecycleModule$2.start(LifecycleModule.java:155) [druid-api-0.3.16.jar:0.9.0]
        at io.druid.cli.GuiceRunnable.initLifecycle(GuiceRunnable.java:91) [druid-services-0.9.0.jar:0.9.0]
        at io.druid.cli.CliPeon.run(CliPeon.java:237) [druid-services-0.9.0.jar:0.9.0]
        at io.druid.cli.Main.main(Main.java:105) [druid-services-0.9.0.jar:0.9.0]
Caused by: java.net.ConnectException: Connection refused: localhost/127.0.0.1:8090
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:1.8.0_72-internal]
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717) ~[?:1.8.0_72-internal]
        at org.jboss.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:152) ~[netty-3.10.4.Final.jar:?]
        at org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:105) ~[netty-3.10.4.Final.jar:?]
        at org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:79) ~[netty-3.10.4.Final.jar:?]
        at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337) ~[netty-3.10.4.Final.jar:?]
        at org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42) ~[netty-3.10.4.Final.jar:?]
        at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) ~[netty-3.10.4.Final.jar:?]
        at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42) ~[netty-3.10.4.Final.jar:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[?:1.8.0_72-internal]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[?:1.8.0_72-internal]
        at java.lang.Thread.run(Thread.java:745) ~[?:1.8.0_72-internal]


I am running 0.9.0 stable.
I am using S3 for deep storage and indexing logs.

My topology is the following:
2xHistorical m4.xlarge
1xCoordinator t2.large
1xOverlord m4.xlarge
1xMiddle Manager m4.xlarge
1xZookeeper t2.large
1xBroker m4.xlarge
Postgres for Metadata

The same indexing task pointed at the cluster running on my local machine worked. The only differences are the JVM configs and the fact that every node now has its own box.

Thanks in advance for any help. 


ago...@redborder.com

Apr 18, 2016, 2:58:48 AM
to Druid User
Are your overlord and middle manager running on the same machine?

Do your historicals have free space? What are your Druid rules?

Regards,
Andrés

Carlos Nunez

Apr 18, 2016, 3:18:04 AM
to Druid User
Hello Andres, 

The Middle Manager and Overlord are running in different boxes. 

The historical nodes should have enough space, since I am using S3 for deep storage.

Here is the configuration:

druid.service=druid/historical
druid.port=8083

# HTTP server threads
druid.server.http.numThreads=25

# Processing threads and buffers
druid.processing.buffer.sizeBytes=536870912
druid.processing.numThreads=7

# Segment storage
druid.segmentCache.locations=[{"path":"var/druid/segment-cache","maxSize"\:130000000000}]
druid.server.maxSize=130000000000


I haven't set any rules. I just booted up this cluster.
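(For reference, my understanding is that the coordinator falls back to a default loadForever rule when no rules are set, so segments should still be loaded. As a sketch, an explicit default rule set via the coordinator API would look something like this; the tier name and replicant count here are just the stock defaults, not my actual config:)

```json
[
  {
    "type": "loadForever",
    "tieredReplicants": {
      "_default_tier": 2
    }
  }
]
```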

Nishant Bangarwa

Apr 18, 2016, 7:31:19 AM
to Druid User
Hi Carlos, 
Looks like you might have set druid.host in your overlord's runtime.properties to localhost, which prevents the task from talking to it.
Can you try removing druid.host so that it picks up the public IP, or setting it manually to the public IP of the machine the overlord is running on?
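For example, the overlord's runtime.properties should advertise an address the peons can actually reach, not localhost (a sketch; 10.0.0.5 is a placeholder for your overlord machine's real IP):

```
# conf/druid/overlord/runtime.properties
druid.service=druid/overlord
druid.port=8090

# Either remove druid.host entirely so Druid auto-detects the
# machine's address, or set it explicitly to a reachable IP:
druid.host=10.0.0.5
```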


Carlos Nunez

Apr 18, 2016, 2:48:41 PM
to Druid User
Thank you Nishant. 

You were right: I was missing the "druid.host" property on my broker and middle manager. Once I set those to the IP addresses of their host machines, my indexing task was successful.
