io.druid.java.util.common.ISE: Transaction failure publishing segments, aborting

Arpan Khagram

Jul 14, 2017, 1:42:13 AM
to Druid User
Hi Team,

We recently had 2 Kafka indexing service tasks fail, both with the exception "io.druid.java.util.common.ISE: Transaction failure publishing segments, aborting". We checked the overlord logs but found no exception there that explains the failure.

We have not set a completion timeout (completionTimeout) or a task duration (taskDuration), so the defaults apply: a task duration of 60 minutes and a completion timeout of 30 minutes. By that calculation, both tasks failed before the 90-minute limit (60+30).

index_kafka_DSLAM_6572efbbc2747ad_hpjglfkm (duration of the task : 79 mins)
start time : 2017-07-14T01:21:50,698
end time : 2017-07-14T02:40:38,157
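
The 60+30 arithmetic can be checked against the timestamps above. This is a minimal sketch: the timestamps and the two default values are the ones quoted in this post, and the variable names are only illustrative.

```python
from datetime import datetime, timedelta

# Timestamps copied from the task above (Druid log format, comma before millis).
FMT = "%Y-%m-%dT%H:%M:%S,%f"
start = datetime.strptime("2017-07-14T01:21:50,698", FMT)
end = datetime.strptime("2017-07-14T02:40:38,157", FMT)

# Defaults as stated in this post: taskDuration 60 min, completionTimeout 30 min.
task_duration = timedelta(minutes=60)
completion_timeout = timedelta(minutes=30)

elapsed = end - start
deadline = task_duration + completion_timeout

print("elapsed:", elapsed)
print("failed before taskDuration + completionTimeout:", elapsed < deadline)
print("failed after taskDuration alone:", elapsed > task_duration)
```

The task ran past the 60-minute task duration but stopped short of the combined 90 minutes, so the failure fell inside the publishing window rather than at the completion timeout.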

2017-07-14T00:38:49,704 ERROR [task-runner-0-priority-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Exception while running task[KafkaIndexTask{id=index_kafka_DSLAM_fe0ce3ca206fc86_gpmgblmn, type=index_kafka, dataSource=DSLAM}]
io.druid.java.util.common.ISE: Transaction failure publishing segments, aborting
        at io.druid.indexing.kafka.KafkaIndexTask.run(KafkaIndexTask.java:517) ~[?:?]
        at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:436) [druid-indexing-service-0.10.0.jar:0.10.0]
        at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:408) [druid-indexing-service-0.10.0.jar:0.10.0]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_91]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_91]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_91]
        at java.lang.Thread.run(Thread.java:745) [?:1.8.0_91]

Regards,
Arpan Khagram
+91 8308993200

Arpan Khagram

Jul 25, 2017, 5:32:17 AM
to Druid User
Hi Druid Team, can you please advise? This error occurs every day for a few of the tasks, and I cannot find a reason anywhere (I checked the overlord logs and the middle manager task logs).

The Kafka indexing tasks do recover and resume reading from the earlier offsets, but it is irritating that tasks keep failing for no known reason.

Regards,
Arpan Khagram

pja...@yahoo-inc.com

Jul 27, 2017, 2:20:01 PM
to Druid User
The information given here is too limited to figure out what might be wrong. Do you see any WARN logs in the overlord or task logs? Do you see an INFO log at the overlord saying "Not updating metadata, existing state is not the expected start state."?
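
One way to run that check is a small log scan. This is a minimal sketch: the message text is the one quoted above, while the function name and the log path in the usage comment are hypothetical.

```python
import re

# Message quoted in the post above; WARN matches any WARN-level log line.
MARKER = "Not updating metadata, existing state is not the expected start state."
WARN = re.compile(r"\bWARN\b")

def find_suspect_lines(lines):
    """Return (line_number, text) pairs that are WARN-level or contain MARKER."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if MARKER in line or WARN.search(line):
            hits.append((number, line.rstrip("\n")))
    return hits

# Usage (the path is an assumption; point it at your own overlord or task log):
# with open("/path/to/overlord.log") as fh:
#     for number, text in find_suspect_lines(fh):
#         print(number, text)
```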

hellobab...@gmail.com

Jul 28, 2017, 8:49:10 AM
to Druid User
The partition offsets in the task payload do not equal the values in the MySQL DB.

On Friday, July 14, 2017 at 1:42:13 PM UTC+8, Arpan Khagram wrote:
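
The comparison described in this post can be sketched as a per-partition diff. This is only an illustration: the function name is hypothetical, the two dictionaries are made-up values, and you would fill them in by hand from the task payload and from the metadata store.

```python
def offset_mismatches(task_start_offsets, stored_offsets):
    """Return {partition: (task_offset, stored_offset)} for partitions that differ.

    A partition present on only one side is reported with None for the other.
    """
    partitions = set(task_start_offsets) | set(stored_offsets)
    return {
        p: (task_start_offsets.get(p), stored_offsets.get(p))
        for p in sorted(partitions)
        if task_start_offsets.get(p) != stored_offsets.get(p)
    }

# Hypothetical values, for illustration only.
task_start = {"0": 1200, "1": 3400}
stored = {"0": 1200, "1": 3550}
print(offset_mismatches(task_start, stored))
```

An empty result means the two sides agree; any entry names a partition whose offsets are out of sync.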

pja...@yahoo-inc.com

Jul 28, 2017, 7:46:17 PM
to Druid User

Yes, that is the most probable reason this happens: the state at the overlord gets out of sync with the datasource metadata in the DB. This should only happen, though, when someone manually edits the payload in the metadata store or there is a bug. The way to resolve it is to reset the supervisor.
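
A supervisor reset goes through the overlord's HTTP API as a POST to /druid/indexer/v1/supervisor/<supervisorId>/reset. The sketch below only builds the request. The overlord address and the supervisor id are assumptions (the id is usually the datasource name), and the helper name is hypothetical. Note that a reset clears the stored offsets, so data may be skipped or re-read depending on the supervisor's offset settings.

```python
import urllib.parse
import urllib.request

def build_reset_request(overlord_url, supervisor_id):
    """Build (but do not send) the POST request that resets a supervisor."""
    path = "/druid/indexer/v1/supervisor/{}/reset".format(
        urllib.parse.quote(supervisor_id, safe="")
    )
    return urllib.request.Request(overlord_url.rstrip("/") + path, method="POST")

# Assumed overlord address and supervisor id; replace with your own.
request = build_reset_request("http://overlord-host:8090", "DSLAM")
print(request.get_method(), request.full_url)

# To actually send it (this changes cluster state):
# with urllib.request.urlopen(request) as response:
#     print(response.status, response.read().decode())
```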

Arpan Khagram

Jul 30, 2017, 2:58:57 PM
to Druid User
Hi all, to me it looks related to https://github.com/druid-io/druid/issues/3600, and the overlord logs suggest the same.

We already tried resetting the supervisor, but it did not help. The overlord logs do not point to any issue; they just log that the task has FAILED.

Regards,
Arpan Khagram

Prithvi S

Aug 8, 2017, 5:39:33 AM
to Druid User
I also see a similar issue. Any idea how to resolve this?

2017-08-08T09:15:23,071 ERROR [task-runner-0-priority-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Exception while running task[KafkaIndexTask{id=index_kafka_Cube-msg-ABC_7bcc8190970f320_empolehl, type=index_kafka, dataSource=TEST-msg-fei1}]
io.druid.java.util.common.ISE: Transaction failure publishing segments, aborting
at io.druid.indexing.kafka.KafkaIndexTask.run(KafkaIndexTask.java:517) ~[?:?]
at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:436) [druid-indexing-service-0.10.0.jar:0.10.0]
at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:408) [druid-indexing-service-0.10.0.jar:0.10.0]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_92]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_92]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_92]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_92]
2017-08-08T09:15:23,076 INFO [task-runner-0-priority-0] io.druid.indexing.overlord.TaskRunnerUtils - Task [index_kafka_TEST-msg-ABC_7bcc8190970f320_empolehl] status changed to [FAILED].

Jan Kogut

Sep 10, 2017, 8:18:10 PM
to Druid User
Hello,

Reporting the same problem (Druid 0.10.1).

7 tasks failed with this ERROR within a 15-second period:

2017-09-10T17:09:25,886 ERROR [task-runner-0-priority-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Exception while running task[KafkaIndexTask{id=index_kafka_prod_dataSource_1_events_f04b3a4a7aee511_lieonnmb, type=index_kafka, dataSource=prod_dataSource_1_events}]
io.druid.java.util.common.ISE: Transaction failure publishing segments, aborting

2017-09-10T17:09:27,302 ERROR [task-runner-0-priority-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Exception while running task[KafkaIndexTask{id=index_kafka_prod_dataSource_1_events_dc8185cd1637933_dadcakie, type=index_kafka, dataSource=prod_dataSource_1_events}]
io.druid.java.util.common.ISE: Transaction failure publishing segments, aborting

2017-09-10T17:09:31,947 ERROR [task-runner-0-priority-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Exception while running task[KafkaIndexTask{id=index_kafka_prod_dataSource_1_events_43db9a6973f3fbd_mhfhgfpm, type=index_kafka, dataSource=prod_dataSource_1_events}]
io.druid.java.util.common.ISE: Transaction failure publishing segments, aborting

2017-09-10T17:09:34,506 ERROR [task-runner-0-priority-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Exception while running task[KafkaIndexTask{id=index_kafka_prod_dataSource_1_events_1bc3508aa30c8a5_mbjollmg, type=index_kafka, dataSource=prod_dataSource_1_events}]
io.druid.java.util.common.ISE: Transaction failure publishing segments, aborting

2017-09-10T17:09:36,081 ERROR [task-runner-0-priority-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Exception while running task[KafkaIndexTask{id=index_kafka_prod_dataSource_1_events_35186bcfb37a63a_dhcemlbo, type=index_kafka, dataSource=prod_dataSource_1_events}]
io.druid.java.util.common.ISE: Transaction failure publishing segments, aborting

2017-09-10T17:09:40,234 ERROR [task-runner-0-priority-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Exception while running task[KafkaIndexTask{id=index_kafka_prod_dataSource_1_events_ab94c121d583257_ikjbibgf, type=index_kafka, dataSource=prod_dataSource_1_events}]
io.druid.java.util.common.ISE: Transaction failure publishing segments, aborting

2017-09-10T17:09:40,619 ERROR [task-runner-0-priority-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Exception while running task[KafkaIndexTask{id=index_kafka_prod_dataSource_1_events_6fdbd1b7dd0da35_nikgmkph, type=index_kafka, dataSource=prod_dataSource_1_events}]
io.druid.java.util.common.ISE: Transaction failure publishing segments, aborting



dataSource_1 task success ratio: 94.77%; overall Overlord task success ratio: 97.4%



Regards,
Jan

pja...@oath.com

Sep 11, 2017, 4:42:09 PM
to Druid User
Can you see any pattern? Does it happen after the overlord process is restarted, after middleManagers are restarted, or after anything else? Would it be possible for you or anyone else in this thread to share the full task logs of a failed task, and also the relevant overlord log from around the time the task failed? (Remember that the task log contains datasource, dimension, and metric information, plus runtime properties, which may include any passwords written in the runtime properties file.)