Hello,
Our batch ingestion tasks hit read timeouts when communicating with S3. This is pretty much the only reason we see ingestion tasks fail.
My understanding is that some timeouts are to be expected when communicating with S3. Maybe the peons could do a better job handling them, but we can tolerate some low rate of failure - we have a daemon that watches for failed tasks and restarts them.
However, we're trying to scale up our indexer capacity to deal with backfill situations, and it appears that as we increase the number of peons running on a single VM, the rate of failures due to S3 read timeouts also increases. I can't say this for sure or give you hard numbers; at this point it's just "anecdata". As an example, I've been experimenting with running 20 peons on a 40-CPU m4.10xlarge, and 14 of 27 tasks have failed (roughly 52%).
Is there anything that can be done to reduce these failures? If not, does anyone have a sense of what a "healthy" number of workers to run in parallel on one machine would be?
java.lang.IllegalStateException: java.net.SocketTimeoutException: Read timed out
    at org.apache.commons.io.LineIterator.hasNext(LineIterator.java:106) ~[commons-io-2.4.jar:2.4]
    at io.druid.data.input.impl.FileIteratingFirehose.hasMore(FileIteratingFirehose.java:52) ~[druid-api-0.9.1.1.jar:0.9.1.1]
    at io.druid.indexing.common.task.IndexTask.generateSegment(IndexTask.java:389) ~[druid-indexing-service-0.9.1.1.jar:0.9.1.1]
    at io.druid.indexing.common.task.IndexTask.run(IndexTask.java:221) ~[druid-indexing-service-0.9.1.1.jar:0.9.1.1]
    at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:436) [druid-indexing-service-0.9.1.1.jar:0.9.1.1]
    at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:408) [druid-indexing-service-0.9.1.1.jar:0.9.1.1]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_60]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_60]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_60]
    at java.lang.Thread.run(Thread.java:745) [?:1.8.0_60]
Caused by: java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method) ~[?:1.8.0_60]
    at java.net.SocketInputStream.socketRead(SocketInputStream.java:116) ~[?:1.8.0_60]
    at java.net.SocketInputStream.read(SocketInputStream.java:170) ~[?:1.8.0_60]
    at java.net.SocketInputStream.read(SocketInputStream.java:141) ~[?:1.8.0_60]
    at sun.security.ssl.InputRecord.readFully(InputRecord.java:465) ~[?:1.8.0_60]
    at sun.security.ssl.InputRecord.readV3Record(InputRecord.java:593) ~[?:1.8.0_60]
    at sun.security.ssl.InputRecord.read(InputRecord.java:532) ~[?:1.8.0_60]
    at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:973) ~[?:1.8.0_60]
    at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:930) ~[?:1.8.0_60]
    at sun.security.ssl.AppInputStream.read(AppInputStream.java:105) ~[?:1.8.0_60]
    at org.apache.http.impl.io.AbstractSessionInputBuffer.read(AbstractSessionInputBuffer.java:198) ~[httpcore-4.4.3.jar:4.4.3]
    at org.apache.http.impl.io.ContentLengthInputStream.read(ContentLengthInputStream.java:178) ~[httpcore-4.4.3.jar:4.4.3]
    at org.apache.http.conn.EofSensorInputStream.read(EofSensorInputStream.java:137) ~[httpclient-4.5.1.jar:4.5.1]
    at org.jets3t.service.io.InterruptableInputStream.read(InterruptableInputStream.java:78) ~[jets3t-0.9.4.jar:0.9.4]
    at org.jets3t.service.impl.rest.httpclient.HttpMethodReleaseInputStream.read(HttpMethodReleaseInputStream.java:146) ~[jets3t-0.9.4.jar:0.9.4]
    at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284) ~[?:1.8.0_60]
    at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326) ~[?:1.8.0_60]
    at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178) ~[?:1.8.0_60]
    at java.io.InputStreamReader.read(InputStreamReader.java:184) ~[?:1.8.0_60]
    at java.io.BufferedReader.fill(BufferedReader.java:161) ~[?:1.8.0_60]
    at java.io.BufferedReader.readLine(BufferedReader.java:324) ~[?:1.8.0_60]
    at java.io.BufferedReader.readLine(BufferedReader.java:389) ~[?:1.8.0_60]
    at org.apache.commons.io.LineIterator.hasNext(LineIterator.java:95) ~[commons-io-2.4.jar:2.4]
    ... 9 more
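
To make concrete what I mean by the peons "doing a better job" of handling these timeouts: something like a retry with exponential backoff around the read would probably cover most of our failures. Below is a rough sketch of the idea only. It is not Druid or jets3t code; the class RetryOnReadTimeout and its methods are hypothetical names I made up. It walks the cause chain because, as in the trace above, the SocketTimeoutException arrives wrapped in an IllegalStateException. It also assumes the whole read can simply be re-run; resuming a partially consumed S3 stream would additionally need to reopen the object and skip the lines already processed.

```java
import java.net.SocketTimeoutException;
import java.util.concurrent.Callable;

// Hypothetical helper for illustration; not part of Druid or jets3t.
final class RetryOnReadTimeout {
    private RetryOnReadTimeout() {}

    // True if the throwable or any of its causes is a SocketTimeoutException.
    // The trace above shows the timeout wrapped in an IllegalStateException,
    // so checking only the top-level exception type would miss it.
    public static boolean isReadTimeout(Throwable t) {
        for (Throwable cur = t; cur != null; cur = cur.getCause()) {
            if (cur instanceof SocketTimeoutException) {
                return true;
            }
        }
        return false;
    }

    // Runs the action up to maxAttempts times. Only read timeouts are retried;
    // any other failure, or a timeout on the last attempt, is rethrown.
    // The wait doubles after each failed attempt (exponential backoff).
    public static <T> T call(Callable<T> action, int maxAttempts, long baseDelayMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delay = baseDelayMillis;
        for (int attempt = 1; ; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                if (!isReadTimeout(e) || attempt == maxAttempts) {
                    throw e;
                }
                Thread.sleep(delay);
                delay *= 2;
            }
        }
    }
}
```

A caller would wrap the whole "open the S3 object and read it" step, e.g. RetryOnReadTimeout.call(() -> readAllLines(uri), 5, 1000L), where readAllLines is again a placeholder. If many peons on one VM retry at the same moment, adding some random jitter to the delay would likely be a good idea as well.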