Druid migrated from EC2 to EKS


VP

Jan 28, 2025, 9:05:29 AM
to Druid User
Hi,

We recently migrated some dev servers from EC2 to EKS. Since then, the Datasources view reports one segment still to load, while the Services view shows empty load/drop queues. The Historical and Coordinator logs show a segment load exception, but the cause is not clear. There are no errors in ZooKeeper or S3. We have also tried deleting the datasource and re-ingesting. Any ideas?

VP.

org.apache.druid.segment.loading.SegmentLoadingException: Exception loading segment[***_2024-08-01T00:00:00.000Z_2024-08-02T00:00:00.000Z_2025-01-21T23:43:06.031Z]
    at org.apache.druid.server.coordination.SegmentLoadDropHandler.loadSegment(SegmentLoadDropHandler.java:289) ~[druid-server-29.0.1.jar:29.0.1]
    at org.apache.druid.server.coordination.SegmentLoadDropHandler.loadSegment(SegmentLoadDropHandler.java:266) ~[druid-server-29.0.1.jar:29.0.1]
    at org.apache.druid.server.coordination.SegmentLoadDropHandler.addSegment(SegmentLoadDropHandler.java:343) ~[druid-server-29.0.1.jar:29.0.1]
    at org.apache.druid.server.coordination.SegmentLoadDropHandler$1.lambda$addSegment$1(SegmentLoadDropHandler.java:572) ~[druid-server-29.0.1.jar:29.0.1]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) ~[?:?]
    at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
    at java.lang.Thread.run(Thread.java:840) ~[?:?]
Caused by: org.apache.druid.segment.loading.SegmentLoadingException: Failed to load segment ***2024-08-01T00:00:00.000Z_2024-08-02T00:00:00.000Z_2025-01-21T23:43:06.031Z in all locations.
    at org.apache.druid.segment.loading.SegmentLocalCacheManager.loadSegmentWithRetry(SegmentLocalCacheManager.java:278) ~[druid-server-29.0.1.jar:29.0.1]
    at org.apache.druid.segment.loading.SegmentLocalCacheManager.getSegmentFiles(SegmentLocalCacheManager.java:228) ~[druid-server-29.0.1.jar:29.0.1]
    at org.apache.druid.segment.loading.SegmentLocalCacheLoader.getSegment(SegmentLocalCacheLoader.java:56) ~[druid-server-29.0.1.jar:29.0.1]
    at org.apache.druid.server.SegmentManager.getSegmentReference(SegmentManager.java:325) ~[druid-server-29.0.1.jar:29.0.1]
    at org.apache.druid.server.SegmentManager.loadSegment(SegmentManager.java:268) ~[druid-server-29.0.1.jar:29.0.1]
    at org.apache.druid.server.coordination.SegmentLoadDropHandler.loadSegment(SegmentLoadDropHandler.java:281) ~[druid-server-29.0.1.jar:29.0.1]
    ... 9 more


We have also been seeing another error lately:

Unknown exception / org.apache.druid.java.util.common.IOE: Retries exhausted, couldn't fulfill request to [http://x.x.x.x:8081/druid/coordinator/v1/metadata/segments?includeOvershadowedStatus&includeRealtimeSegments]. / java.lang.RuntimeException

VP

Jan 30, 2025, 8:57:15 AM
to Druid User
Here is another message from the same server, for a different datasource:

2025-01-30T13:45:48,411 WARN [SimpleDataSegmentChangeHandler-0] org.apache.druid.server.coordination.BatchDataSegmentAnnouncer - No path to unannounce segment[xxxx_2025-01-15T00:00:00.000Z_2025-01-16T00:00:00.000Z_2025-01-28T18:28:25.347Z]

2025-01-30T13:45:48,411 INFO [SimpleDataSegmentChangeHandler-0] org.apache.druid.server.SegmentManager - Told to delete a queryable for a dataSource[xx] that doesn't exist.

2025-01-30T13:45:48,411 WARN [SimpleDataSegmentChangeHandler-0] org.apache.druid.segment.loading.SegmentLocalCacheManager - Asked to cleanup something[xxx-01-15T00:00:00.000Z_2025-01-16T00:00:00.000Z_2025-01-28T18:28:25.347Z] that didn't exist.  Skipping.

2025-01-30T13:45:48,411 WARN [SimpleDataSegmentChangeHandler-0] org.apache.druid.server.coordination.SegmentLoadDropHandler - Unable to delete segmentInfoCacheFile[var/druid/segment-cache/info_dir/xx-01-15T00:00:00.000Z_2025-01-16T00:00:00.000Z_2025-01-28T18:28:25.347Z]

2025-01-30T13:45:48,411 ERROR [SimpleDataSegmentChangeHandler-0] org.apache.druid.server.coordination.SegmentLoadDropHandler - Failed to load segment for dataSource: {exceptionType=org.apache.druid.segment.loading.SegmentLoadingException, exceptionMessage=Exception loading segment[xx-01-15T00:00:00.000Z_2025-01-16T00:00:00.000Z_2025-01-28T18:28:25.347Z], class=org.apache.druid.server.coordination.SegmentLoadDropHandler, segment=DataSegment{binaryVersion=9, id=xx-01-15T00:00:00.000Z_2025-01-16T00:00:00.000Z_2025-01-28T18:28:25.347Z, loadSpec={type=>s3_zip, bucket=>dev-12-us-druid-data, key=>druid/segments/xx/2025-01-15T00:00:00.000Z_2025-01-16T00:00:00.000Z/2025-01-28T18:28:25.347Z/0/index.zip, S3Schema=>s3a}, dimensions=[xxx], shardSpec=SingleDimensionShardSpec{dimension='xxx', start='null', end='null', partitionNum=0, numCorePartitions=1}, lastCompactionState=null, size=24104}}

org.apache.druid.segment.loading.SegmentLoadingException: Exception loading segment[xx-01-15T00:00:00.000Z_2025-01-16T00:00:00.000Z_2025-01-28T18:28:25.347Z]
    at org.apache.druid.server.coordination.SegmentLoadDropHandler.loadSegment(SegmentLoadDropHandler.java:289) ~[druid-server-29.0.1.jar:29.0.1]
    at org.apache.druid.server.coordination.SegmentLoadDropHandler.loadSegment(SegmentLoadDropHandler.java:266) ~[druid-server-29.0.1.jar:29.0.1]
    at org.apache.druid.server.coordination.SegmentLoadDropHandler.addSegment(SegmentLoadDropHandler.java:343) ~[druid-server-29.0.1.jar:29.0.1]
    at org.apache.druid.server.coordination.SegmentLoadDropHandler$1.lambda$addSegment$1(SegmentLoadDropHandler.java:572) ~[druid-server-29.0.1.jar:29.0.1]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) ~[?:?]
    at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
    at java.lang.Thread.run(Thread.java:840) ~[?:?]
Caused by: org.apache.druid.segment.loading.SegmentLoadingException: Failed to load segment xx-01-15T00:00:00.000Z_2025-01-16T00:00:00.000Z_2025-01-28T18:28:25.347Z in all locations.
    at org.apache.druid.segment.loading.SegmentLocalCacheManager.loadSegmentWithRetry(SegmentLocalCacheManager.java:278) ~[druid-server-29.0.1.jar:29.0.1]
    at org.apache.druid.segment.loading.SegmentLocalCacheManager.getSegmentFiles(SegmentLocalCacheManager.java:228) ~[druid-server-29.0.1.jar:29.0.1]
    at org.apache.druid.segment.loading.SegmentLocalCacheLoader.getSegment(SegmentLocalCacheLoader.java:56) ~[druid-server-29.0.1.jar:29.0.1]
    at org.apache.druid.server.SegmentManager.getSegmentReference(SegmentManager.java:325) ~[druid-server-29.0.1.jar:29.0.1]
    at org.apache.druid.server.SegmentManager.loadSegment(SegmentManager.java:268) ~[druid-server-29.0.1.jar:29.0.1]
    at org.apache.druid.server.coordination.SegmentLoadDropHandler.loadSegment(SegmentLoadDropHandler.java:281) ~[druid-server-29.0.1.jar:29.0.1]
    ... 9 more



gi...@imply.io

Apr 1, 2025, 5:46:30 AM
to Druid User
I think this can happen if your druid.segmentCache.locations do not add up to at least your druid.server.maxSize.
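
A rough way to check for that mismatch is sketched below. This is not an official Druid tool, just a small Python script under some assumptions: both properties are set explicitly in the Historical's runtime.properties, each location's maxSize is a plain byte count, and the sample values are made up for illustration. The helper names parse_properties and cache_capacity_gap are hypothetical.

```python
import json


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def cache_capacity_gap(props):
    """Return druid.server.maxSize minus the summed cache location sizes.

    A positive result means the server advertises more capacity than the
    segment cache locations can hold. Assumes maxSize values are byte counts.
    """
    locations = json.loads(props["druid.segmentCache.locations"])
    cache_total = sum(int(loc["maxSize"]) for loc in locations)
    server_max = int(props["druid.server.maxSize"])
    return server_max - cache_total


# Hypothetical sample config; replace with the Historical's real settings.
SAMPLE = """
druid.segmentCache.locations=[{"path":"var/druid/segment-cache","maxSize":300000000000}]
druid.server.maxSize=500000000000
"""

gap = cache_capacity_gap(parse_properties(SAMPLE))
if gap > 0:
    print(f"druid.server.maxSize exceeds cache locations by {gap} bytes")
else:
    print("cache locations cover druid.server.maxSize")
```

If the script reports a positive gap, either raising the maxSize of the cache locations or lowering druid.server.maxSize should bring the two back in line. On EKS it may also be worth confirming that the volume mounted at the cache path is at least as large as the configured maxSize.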

Gian