Connection issues in indexing service?


Amy Troschinetz

Oct 15, 2014, 10:40:56 AM
to druid-de...@googlegroups.com
Overnight I attempted to ingest some data using the static S3 firehose. Most of the S3 files were ingested properly, but two of them failed; see below for the stack traces.

From my cursory googling, it seems both of these failures were likely network hiccups. I looked through the available documentation for something like an HTTP retry setting but didn't find one. Is there some way to enable retry logic for index tasks?

Failure 1:

2014-10-14 21:59:44,084 ERROR [task-runner-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Exception while running task[IndexTask{id=index_click_conversion_2014-10-14T21:55:19.854Z, type=index, dataSource=click_conversion}]
java.lang.IllegalStateException: org.apache.http.ConnectionClosedException: Premature end of Content-Length delimited message body (expected: 497929513; received: 458410378
	at org.apache.commons.io.LineIterator.hasNext(LineIterator.java:107)
	at io.druid.data.input.impl.FileIteratingFirehose.hasMore(FileIteratingFirehose.java:34)
	at io.druid.indexing.common.task.IndexTask.getDataIntervals(IndexTask.java:205)
	at io.druid.indexing.common.task.IndexTask.run(IndexTask.java:171)
	at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:219)
	at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:198)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.http.ConnectionClosedException: Premature end of Content-Length delimited message body (expected: 497929513; received: 458410378
	at org.apache.http.impl.io.ContentLengthInputStream.read(ContentLengthInputStream.java:184)
	at org.apache.http.conn.EofSensorInputStream.read(EofSensorInputStream.java:138)
	at org.jets3t.service.io.InterruptableInputStream.read(InterruptableInputStream.java:78)
	at org.jets3t.service.impl.rest.httpclient.HttpMethodReleaseInputStream.read(HttpMethodReleaseInputStream.java:146)
	at java.util.zip.InflaterInputStream.fill(InflaterInputStream.java:238)
	at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:158)
	at java.util.zip.GZIPInputStream.read(GZIPInputStream.java:116)
	at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:283)
	at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:325)
	at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:177)
	at java.io.InputStreamReader.read(InputStreamReader.java:184)
	at java.io.BufferedReader.fill(BufferedReader.java:154)
	at java.io.BufferedReader.readLine(BufferedReader.java:317)
	at java.io.BufferedReader.readLine(BufferedReader.java:382)
	at org.apache.commons.io.LineIterator.hasNext(LineIterator.java:96)
	... 9 more
2014-10-14 21:59:44,095 INFO [task-runner-0] io.druid.indexing.worker.executor.ExecutorLifecycle - Task completed with status: {
  "id" : "index_click_conversion_2014-10-14T21:55:19.854Z",
  "status" : "FAILED",
  "duration" : 254888
}

Failure 2:

2014-10-15 01:38:54,047 ERROR [task-runner-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Exception while running task[IndexTask{id=index_click_conversion_2014-10-14T23:59:37.671Z, type=index, dataSource=click_conversion}]
java.lang.IllegalStateException: javax.net.ssl.SSLException: SSL peer shut down incorrectly
	at org.apache.commons.io.LineIterator.hasNext(LineIterator.java:107)
	at io.druid.data.input.impl.FileIteratingFirehose.hasMore(FileIteratingFirehose.java:34)
	at io.druid.indexing.common.task.IndexTask.generateSegment(IndexTask.java:399)
	at io.druid.indexing.common.task.IndexTask.run(IndexTask.java:184)
	at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:219)
	at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:198)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
Caused by: javax.net.ssl.SSLException: SSL peer shut down incorrectly
	at sun.security.ssl.InputRecord.readV3Record(InputRecord.java:557)
	at sun.security.ssl.InputRecord.read(InputRecord.java:509)
	at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:927)
	at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:884)
	at sun.security.ssl.AppInputStream.read(AppInputStream.java:102)
	at org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:166)
	at org.apache.http.impl.io.SocketInputBuffer.fillBuffer(SocketInputBuffer.java:90)
	at org.apache.http.impl.io.AbstractSessionInputBuffer.read(AbstractSessionInputBuffer.java:212)
	at org.apache.http.impl.io.ContentLengthInputStream.read(ContentLengthInputStream.java:182)
	at org.apache.http.conn.EofSensorInputStream.read(EofSensorInputStream.java:138)
	at org.jets3t.service.io.InterruptableInputStream.read(InterruptableInputStream.java:78)
	at org.jets3t.service.impl.rest.httpclient.HttpMethodReleaseInputStream.read(HttpMethodReleaseInputStream.java:146)
	at java.util.zip.InflaterInputStream.fill(InflaterInputStream.java:238)
	at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:158)
	at java.util.zip.GZIPInputStream.read(GZIPInputStream.java:116)
	at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:283)
	at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:325)
	at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:177)
	at java.io.InputStreamReader.read(InputStreamReader.java:184)
	at java.io.BufferedReader.fill(BufferedReader.java:154)
	at java.io.BufferedReader.readLine(BufferedReader.java:317)
	at java.io.BufferedReader.readLine(BufferedReader.java:382)
	at org.apache.commons.io.LineIterator.hasNext(LineIterator.java:96)
	... 9 more
2014-10-15 01:38:54,052 INFO [task-runner-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Removing task directory: /tmp/persistent/task/index_click_conversion_2014-10-14T23:59:37.671Z/work
2014-10-15 01:38:54,280 INFO [task-runner-0] io.druid.indexing.worker.executor.ExecutorLifecycle - Task completed with status: {
  "id" : "index_click_conversion_2014-10-14T23:59:37.671Z",
  "status" : "FAILED",
  "duration" : 4513488
}


Data Software Engineer





Nishant Bangarwa

Oct 15, 2014, 11:20:19 AM
to druid-de...@googlegroups.com
Hi Amy,

I also looked around for the above exceptions, and both of them seem to point to network issues. The retry logic present in the peons only covers communication between the peon and the overlord.

To support retry logic while reading from S3, you can write a decorating iterator over the existing LineIterator created in StaticS3FirehoseFactory (around line 130) that reconnects, opens a new input stream, and skips the lines that have already been read.

Feel free to submit a PR for the changes or create a GitHub issue.
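The decorating-iterator idea might look roughly like the sketch below. This is a hypothetical illustration, not Druid's actual API: `RetryingLineIterator` and its members are invented names, and it assumes the caller can supply a way to reopen the S3 stream from the beginning. On an IOException it reconnects and fast-forwards past lines already delivered.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Supplier;

// Hypothetical sketch: a line iterator that reopens the underlying stream
// when a read fails, then skips the lines that were already handed out, so
// a transient network error does not fail the whole index task.
class RetryingLineIterator implements Iterator<String> {
    private final Supplier<BufferedReader> openStream; // reopens from offset 0
    private final int maxRetries;
    private BufferedReader reader;
    private long linesDelivered = 0;
    private String pending;

    RetryingLineIterator(Supplier<BufferedReader> openStream, int maxRetries) {
        this.openStream = openStream;
        this.maxRetries = maxRetries;
        this.reader = openStream.get();
    }

    private String readLineWithRetry() {
        for (int attempt = 0; ; attempt++) {
            try {
                return reader.readLine();
            } catch (IOException e) {
                if (attempt >= maxRetries) {
                    throw new UncheckedIOException(e);
                }
                try {
                    // Reconnect, then fast-forward past already-read lines.
                    reader = openStream.get();
                    for (long i = 0; i < linesDelivered; i++) {
                        reader.readLine();
                    }
                } catch (IOException skipFailure) {
                    throw new UncheckedIOException(skipFailure);
                }
            }
        }
    }

    @Override
    public boolean hasNext() {
        if (pending == null) {
            pending = readLineWithRetry();
        }
        return pending != null;
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        String line = pending;
        pending = null;
        linesDelivered++;
        return line;
    }

    public static void main(String[] args) {
        // Demo with an in-memory "stream"; a real firehose would reopen the
        // S3 object here instead.
        Supplier<BufferedReader> open =
                () -> new BufferedReader(new StringReader("a\nb\nc"));
        Iterator<String> it = new RetryingLineIterator(open, 3);
        while (it.hasNext()) {
            System.out.println(it.next());
        }
    }
}
```

Note the trade-off: re-reading from the start is wasteful for large objects, so a production version would likely use an HTTP Range request to resume at a byte offset instead of skipping lines.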


Nishant Bangarwa

Oct 15, 2014, 11:41:17 AM
to druid-de...@googlegroups.com
In theory, the best way to handle this would be a setting in the jets3t library for retry logic. It turns out jets3t does support retry configuration: https://jets3t.s3.amazonaws.com/toolkit/configuration.html

I guess you can try setting httpclient.retry-max to around 20, and increase httpclient.connection-timeout-ms in case the connection is being closed due to a timeout.
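Assuming the indexing-service JVMs pick up a `jets3t.properties` file on their classpath (where jets3t looks for its configuration), the suggested settings might look like this; the timeout values below are illustrative, not recommendations:

```properties
# jets3t.properties -- on the classpath of the overlord/peon JVMs
# Retry transient HTTP failures more aggressively than the default.
httpclient.retry-max=20
# Allow more time before a connection attempt is abandoned (milliseconds).
httpclient.connection-timeout-ms=60000
```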

Fangjin Yang

Oct 15, 2014, 11:56:27 AM
to druid-de...@googlegroups.com
I think having retries in some of the S3 logic would be pretty neat and should be implemented. We have S3 retries for segment-download interactions. The indexing service is a low-level service; right now it simply fails tasks in the event of exceptions or network outages and doesn't manage its own retries.


Nishant
Software Engineer | METAMARKETS
m +91-9729200044