Could not allocate segment for row with timestamp for streaming supervisor job


Laxmikant Pandhare

Jul 2, 2025, 5:56:38 PM
to Druid User
Hi All, the job below has been failing since 1st July 2025.

I have a streaming supervisor job with DAY segment granularity, but the stream also receives old data, sometimes 3-4 months older than today. The tasks are failing with the error below:

[TaskStatus{id=index_kafka_xyz_6407adce8aa704b_knmhjjlh, status=FAILED, duration=28829, errorMsg=org.apache.druid.java.util.common.ISE: Could not allocate segment for row with timestamp[2025-06-07T...}].

I cannot use YEAR as the segment granularity because the data is fairly large, around 800 GB per year.

I also saw a couple of old rows get added, like the one below:
Created new segment [xyz_2025-04-09T00:00:00.000Z_2025-04-10T00:00:00.000Z_2025-05-19T13:44:23.613Z_1535]

I am not sure why it is failing only for the 7th of June data. Can someone help here?

Thank you

John Kowtko

Jul 2, 2025, 6:14:16 PM
to Druid User
Some thoughts on what could be relevant --

Do you have any other segments in the system that overlap with that day?  E.g. HOUR or MONTH segments?   

Do you have timezone information (other than UTC) specified for the ingestion supervisor?

Do you have any batch jobs (compaction, reindex) that are inserting segments for that day?

Can you ingest data on June 6 or 8?
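
One way to check the first question above is a query against the `sys.segments` table — a sketch only; the datasource name 'xyz' and the dates are placeholders:

```sql
-- Find committed segments whose interval overlaps 2025-06-07 (UTC).
-- Overlap test: segment starts before the day ends AND ends after it starts.
SELECT "segment_id", "start", "end", "version", "num_rows"
FROM "sys"."segments"
WHERE "datasource" = 'xyz'
  AND "start" < '2025-06-08T00:00:00.000Z'
  AND "end"   > '2025-06-07T00:00:00.000Z'
```

Any rows with an interval wider or narrower than one day would indicate mixed granularities on that day.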

Thanks. John

Laxmikant Pandhare

Jul 8, 2025, 1:17:43 PM
to Druid User
Hi John - thank you for your reply.

So, the segment for that specific time was already created in the past; it creates hourly segments.
No, only the UTC timezone across all data.
Yes, a compaction job is running, but it is not running for that specific time period.
Now I have started getting the issue for other data as well:
 "errorMsg": "org.apache.druid.java.util.common.ISE: Failed to add a row with timestamp[2025-07-01T20:35:32.000Z]\n..

It is happening frequently now. Recently I increased the compaction priority, as we have too many segments for a single datasource, almost 160k segments.

Laxmikant Pandhare

Jul 8, 2025, 2:02:23 PM
to Druid User
Below are the segments it created. It created too many small segments.

SELECT count(*)
FROM "sys"."segments"
where "datasource" = 'xyz' and num_rows <= 1000

Count is 155,282

SELECT count(*)
FROM "sys"."segments"
where "datasource" = 'xyz' and num_rows > 1000

Count is 3,467

This is the reason I increased the compaction priority and am running it with 16 subtasks.
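
For reference, the priority and subtask changes were along these lines in the datasource's auto-compaction config — a rough sketch only, with illustrative values:

```json
{
  "dataSource": "xyz",
  "taskPriority": 50,
  "skipOffsetFromLatest": "P1D",
  "granularitySpec": { "segmentGranularity": "DAY" },
  "tuningConfig": {
    "maxNumConcurrentSubTasks": 16,
    "partitionsSpec": { "type": "dynamic", "maxRowsPerSegment": 5000000 }
  }
}
```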

John Kowtko

Jul 8, 2025, 3:54:49 PM
to Druid User
If you have existing HOUR segments then this could be part of the problem.  In the version of the product I have worked with, the Supervisor will conform to existing segment granularity until it reaches an open untouched time interval, then it will start using the target granularity that you specified in the supervisor spec.  But it is possible that it is erroring out in this case instead of just conforming.

You cannot mix granularity with appended segments, only with "overwrite" segments.   

Can you check your ingestion task log to see what the time interval is that was announced?  Or did it not get to that stage yet?

Also, can you try suspending the streaming ingestion and then recompact the time interval in question to DAY granularity, and see if the problem clears up?
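
Something like this manual compaction task could rewrite the interval at DAY granularity — a sketch; adjust the datasource and interval to your case:

```json
{
  "type": "compact",
  "dataSource": "xyz",
  "ioConfig": {
    "type": "compact",
    "inputSpec": { "type": "interval", "interval": "2025-06-07/2025-06-08" }
  },
  "granularitySpec": { "segmentGranularity": "DAY" }
}
```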

Thanks.  John

Laxmikant Pandhare

Jul 8, 2025, 4:15:08 PM
to Druid User
Sorry, the segment granularity is DAY only.

I attached the supervisor spec. Please check the task logs below.

2025-07-08T19:55:00,716 ERROR [task-runner-0-priority-0] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner - Encountered exception in run() before persisting.
org.apache.druid.java.util.common.ISE: Could not allocate segment for row with timestamp[2025-06-07T07:12:56.000Z]
	at org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner.runInternal(SeekableStreamIndexTaskRunner.java:680) ~[druid-indexing-service-27.0.0.jar:27.0.0]
	at org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner.run(SeekableStreamIndexTaskRunner.java:266) ~[druid-indexing-service-27.0.0.jar:27.0.0]
	at org.apache.druid.indexing.seekablestream.SeekableStreamIndexTask.runTask(SeekableStreamIndexTask.java:151) ~[druid-indexing-service-27.0.0.jar:27.0.0]
	at org.apache.druid.indexing.common.task.AbstractTask.run(AbstractTask.java:173) ~[druid-indexing-service-27.0.0.jar:27.0.0]
	at org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:477) ~[druid-indexing-service-27.0.0.jar:27.0.0]
	at org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:449) ~[druid-indexing-service-27.0.0.jar:27.0.0]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_372]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_372]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_372]
	at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_372]

And a few more logs that show the timestamp:

{"ingestionStatsAndErrors":{"type":"ingestionStatsAndErrors","taskId":"index_kafka_xyz_bc4524ba6271ca1_jclhlbab","payload":{"ingestionState":"BUILD_SEGMENTS","unparseableEvents":{},"rowStats":{"buildSegments":{"processed":610,"processedBytes":1712634,"processedWithError":0,"thrownAway":0,"unparseable":0}},"errorMsg":"org.apache.druid.java.util.common.ISE: Could not allocate segment for row with timestamp[2025-06-07T07:12:56.000Z]\n\tat org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner.runInternal(SeekableStreamIndexTaskRunner.java:680)\n\tat org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner.run(SeekableStreamIndexTaskRunner.java:266)\n\tat org.apache.druid.indexing.seekablestream.SeekableStreamIndexTask.runTask(SeekableStreamIndexTask.java:151)\n\tat org.apache.druid.indexing.common.task.AbstractTask.run(AbstractTask.java:173)\n\tat org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:477)\n\tat org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:449)\n\tat java.util.concurrent.FutureTask.run(FutureTask.java:266)\n\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)\n\tat java.lang.Thread.run(Thread.java:750)\n","segmentAvailabilityConfirmed":false,"segmentAvailabilityWaitTimeMs":0}}}

Please let me know if I can check anything else here.
supervisor xyz.txt

John Kowtko

Jul 8, 2025, 6:13:20 PM
to Druid User
All of the similar issues that I could find point to conflicting segments, either in the druid_segments or druid_pendingsegments tables.    

Check in the Metadata DB for anything with start/end dates that span across that day.
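
For example, something like this against the metadata DB — a sketch, assuming the standard MySQL metadata schema for druid_pendingsegments (`end` is a reserved word in MySQL, hence the backticks):

```sql
-- Pending segment records whose interval overlaps 2025-06-07 (UTC).
-- start/end are stored as ISO-8601 strings, so string comparison works here.
SELECT id, dataSource, created_date, start, `end`, sequence_name
FROM druid_pendingsegments
WHERE dataSource = 'xyz'
  AND start < '2025-06-08T00:00:00.000Z'
  AND `end` > '2025-06-07T00:00:00.000Z';
```

The same overlap check can be run against druid_segments.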

Also please confirm if you can ingest into the day before or the day after.

Thanks.  John

Laxmikant Pandhare

Jul 8, 2025, 6:25:22 PM
to Druid User
Yes, I can see conflicting segments in druid_pendingsegments for the above xyz datasource. I can see 670 rows in druid_pendingsegments. A sample row is below for reference.

| xyz_2025-03-30T00:00:00.000Z_2025-03-31T00:00:00.000Z_2025-05-20T14:12:30.034Z_658  | xyz       | 2025-07-08T05:31:06.107Z | 2025-03-30T00:00:00.000Z | 2025-03-31T00:00:00.000Z | index_kafka_xyz_45ac74460d6d6e7_0                      | xyz_2025-06-06T00:00:00.000Z_2025-06-07T00:00:00.000Z_2025-07-07T13:21:39.376Z_10   | 02004B08CDA09223496E70400342B2F82807015B |


I tried a batch load for 2025-06-08 and was able to load it into the datasource.

Can I delete the above rows manually from the MySQL druid_pendingsegments table, or will it create chaos in the dataset?

John Kowtko

Jul 8, 2025, 6:41:06 PM
to Druid User
If the Supervisor is turned off and there are no ingestion tasks on that datasource, then there should be no druid_pendingsegments records.   So first turn everything off and then recheck.  

John Kowtko

Jul 8, 2025, 6:43:08 PM
to Druid User
If you want to be absolutely safe, you could always create a temp table and copy those records over to it first, before deleting them from the main table.
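
e.g. something along these lines — a sketch, assuming the standard MySQL metadata table names:

```sql
-- Keep a copy before touching the live table.
CREATE TABLE druid_pendingsegments_backup AS
SELECT * FROM druid_pendingsegments WHERE dataSource = 'xyz';

-- Then, with the supervisor suspended and no tasks running:
DELETE FROM druid_pendingsegments WHERE dataSource = 'xyz';
```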

Laxmikant Pandhare

Jul 8, 2025, 6:47:09 PM
to Druid User
Yes, I suspended the supervisor, there are no running tasks, and I stopped compaction as well, so no task is running for the xyz datasource. Still, there are 600+ segments in druid_pendingsegments.

Laxmikant Pandhare

Jul 8, 2025, 6:51:06 PM
to Druid User
Is it okay to clean all rows from druid_pendingsegments?

Laxmikant Pandhare

Jul 9, 2025, 1:20:42 PM
to Druid User
Even after cleaning and a hard reset of the supervisor, it is failing with the same issue and adding those segments back into druid_pendingsegments.
What can be the resolution for such an error? I even removed all segments with date 2025-06-07, but the error still persists.

Laxmikant Pandhare

Jul 9, 2025, 2:30:13 PM
to Druid User
I updated the supervisor with "lateMessageRejectionPeriod": "P30D". It is working now, but I am unsure about the overall impact of this supervisor change on the Druid cluster.
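
For reference, the change sits in the supervisor's ioConfig, roughly like this (a sketch of the relevant fields only; the topic name is a placeholder). With this setting, rows with timestamps earlier than the task creation time minus 30 days are dropped instead of triggering segment allocation:

```json
{
  "ioConfig": {
    "type": "kafka",
    "topic": "xyz-topic",
    "lateMessageRejectionPeriod": "P30D"
  }
}
```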

John Kowtko

Jul 10, 2025, 6:06:25 PM
to Druid User
If that is the case, then it appears that you are opening up too many time intervals for ingestion. Each time a new time interval is opened, the ingestion task asks for a new pending segment record. Afaik these requests all go through the Overlord, so if there are a number of tasks and each is ingesting data over many intervals, the number of pending-segment requests could be large.

Is your metadata DB sized well? I.e., will the Druid tables all fit within the MySQL buffer cache?
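
A quick way to eyeball this from the MySQL side (a sketch):

```sql
-- InnoDB buffer pool size, in bytes
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';

-- Approximate on-disk size of the Druid metadata tables
SELECT table_name,
       ROUND((data_length + index_length) / 1024 / 1024, 1) AS size_mb
FROM information_schema.tables
WHERE table_name LIKE 'druid\_%'
ORDER BY size_mb DESC;
```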

Laxmikant Pandhare

Jul 23, 2025, 3:23:51 PM
to Druid User
Yes, the metadata MySQL is sized well, but the job started failing again as a new row came in within the 30-day window.

John Kowtko

Jul 24, 2025, 9:45:42 AM
to Druid User
Can you check the task/action/run/time Clarity metric to see what its values are? These should all be subsecond.

Laxmikant Pandhare

Aug 4, 2025, 2:02:07 PM
to Druid User
I am unable to get task/action/run/time, as we have not enabled any metrics in our Prod environment.

John Kowtko

Aug 5, 2025, 12:50:11 PM
to Druid User
Can you monitor query performance from the MySQL side then? Either via an admin UI, or by running "show processlist" from the MySQL command line?
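
For example (a sketch):

```sql
-- Show everything currently executing, including full query text
SHOW FULL PROCESSLIST;

-- Or filter for long-running statements only
SELECT id, user, time, state, info
FROM information_schema.processlist
WHERE command <> 'Sleep' AND time > 1
ORDER BY time DESC;
```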

Laxmikant Pandhare

Aug 5, 2025, 1:41:50 PM
to Druid User
Sure, I will try this. But the issue still persists. I set the segment granularity like below:

        "segmentGranularity": {
          "type": "period",
          "period": "P7D",
          "timeZone": "Etc/GMT",
          "origin": "1970-01-01T00:00:00.000Z"
        }

But this dataset has data older than 7 days too, and it is failing with the same error for data prior to the last day. How can I keep segments active for old data? I am unable to skip data in Druid supervisors. Is there any way to skip data older than 7 days in a Druid supervisor?

Laxmikant Pandhare

Aug 6, 2025, 1:48:53 PM
to Druid User
Is there any suggestion for skipping data older than 7 days in the Druid supervisor? There are only a few old rows impacting the supervisor.
I tried giving "lateMessageRejectionPeriod": "PT604800S" (7 days), but it is not skipping any old data. Is there any way to skip old data in Druid, or do I need to handle it on the source side (Kafka or Kinesis)?

Laxmikant Pandhare

Aug 11, 2025, 9:34:47 PM
to Druid User
There were a lot of rows in druid_tasklocks and druid_pendingsegments for my datasource. After a manual cleanup, which is not the recommended way, I was able to restart data ingestion.