Could you please share how you identified this? This shouldn't happen under normal operating conditions.
Just to clarify: move/replication may happen on other servers, but for any given server, LOAD operations always take top priority.
The load queue being stuck could simply mean that your historical is busy downloading the segments.
Once you have metrics available, you can compare the values of `segment/loading/rateKbps` to check if a historical is indeed slower than the others.
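If you don't have metrics wired up yet, a quick way to eyeball the backlog is the coordinator's load queue API. A minimal sketch, assuming a coordinator reachable at `localhost:8081` (host/port are placeholders for your setup):

```
# Per-server counts of segments waiting to load/drop
curl "http://localhost:8081/druid/coordinator/v1/loadqueue?simple"
```

A historical whose `segmentsToLoad` count stays high while the other servers drain is likely the slow one.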
You can try tweaking the following configs to see if they speed up loading on that historical:
- Increase `druid.coordinator.loadqueuepeon.http.batchSize`
- Increase `druid.segmentCache.numLoadingThreads`
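For reference, a sketch of where these properties live; the values below are illustrative assumptions, not recommendations:

```
# Coordinator runtime.properties: number of segments bundled into one HTTP load request
druid.coordinator.loadqueuepeon.http.batchSize=5

# Historical runtime.properties: parallel threads used to download and load segments
druid.segmentCache.numLoadingThreads=10
```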
If the above doesn't work, try increasing `completionTimeout` or tweaking the size/number of segments generated by the indexing tasks.
You can do this by tuning `maxRowsPerSegment` and/or fixing the segment granularity in the streaming supervisor spec.
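For example, here is a hedged sketch of the relevant fields in a Kafka supervisor spec (the values and datasource name are assumptions for illustration; adjust to your workload):

```
{
  "type": "kafka",
  "spec": {
    "dataSchema": {
      "dataSource": "my_datasource",
      "granularitySpec": {
        "segmentGranularity": "HOUR"
      }
    },
    "ioConfig": {
      "completionTimeout": "PT2H"
    },
    "tuningConfig": {
      "type": "kafka",
      "maxRowsPerSegment": 5000000
    }
  }
}
```

Fewer, larger segments generally mean fewer load operations for the coordinator to schedule, at the cost of longer handoff for each segment.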
I wouldn't advise disabling smart segment loading, as it is almost never the reason for slow segment loading.
But if you want to explore that option, you may set the following parameters in your coordinator dynamic config:
- `smartSegmentLoading: false`
- `maxSegmentsInNodeLoadingQueue: 0` (unlimited)
- `maxSegmentsToMove: 0` (disable balancing)
- `replicationThrottleLimit: 100` (reduce replication)
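If you do go down this path, a sketch of the dynamic config payload, posted to the coordinator's `/druid/coordinator/v1/config` endpoint (values are the ones discussed above):

```
{
  "smartSegmentLoading": false,
  "maxSegmentsInNodeLoadingQueue": 0,
  "maxSegmentsToMove": 0,
  "replicationThrottleLimit": 100
}
```

Note that while `smartSegmentLoading` remains enabled, the coordinator computes these values itself and ignores the ones you set explicitly.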
Let me know if any of these solutions work for you.