Hi Team,
I recently upgraded to Druid 25.0.0, and since then compaction tasks (both manual and auto-compaction) fail to start, even though the Overlord tries to assign them to a MiddleManager. The task never starts, and after the configured timeout it simply fails. The Overlord logs report that the task was never run:
org.apache.druid.indexing.overlord.RemoteTaskRunner - Task assignment timed out on worker [worker-endpoint], never ran task [task-id]! Timeout: (300000 >= PT5M)!: {class=org.apache.druid.indexing.overlord.RemoteTaskRunner}
Digging further into the MiddleManager logs, I found the exception below, thrown while parsing the tuningConfig of the compaction task:
2023-08-23T09:35:57,761 ERROR [TaskMonitorCache-0] org.apache.curator.framework.recipes.cache.PathChildrenCache -
com.fasterxml.jackson.databind.exc.InvalidTypeIdException: Could not resolve type id 'compaction' as a subtype of `org.apache.druid.segment.indexing.TuningConfig`: known type ids = [KafkaTuningConfig, index, index_parallel, kafka, realtime] (for POJO property 'tuningConfig')
at [Source: (byte[])"{"type":"compact","id":"myid","resource":{"availabilityGroup":"ag","requiredCapacity":1},"dataSource":"ds","ioConfig":{"type":"compact","inputSpec":{"type":"interval","interval":"2023-08-21T00:00:00.000Z/2023-08-22T00:00:00.000Z","sha256OfSortedSegmentIds":null},"dropExisting":false},"dimensionsSpec":null,"transform"[truncated 1658 bytes]; line: 1, column: 666] (through reference chain: org.apache.druid.indexing.common.task.CompactionTask["tuningConfig"])
at com.fasterxml.jackson.databind.exc.InvalidTypeIdException.from(InvalidTypeIdException.java:43) ~[jackson-databind-2.10.2.jar:2.10.2]
at com.fasterxml.jackson.databind.DeserializationContext.invalidTypeIdException(DeserializationContext.java:1758) ~[jackson-databind-2.10.2.jar:2.10.2]
at com.fasterxml.jackson.databind.DeserializationContext.handleUnknownTypeId(DeserializationContext.java:1265) ~[jackson-databind-2.10.2.jar:2.10.2]
at com.fasterxml.jackson.databind.jsontype.impl.TypeDeserializerBase._handleUnknownTypeId(TypeDeserializerBase.java:290) ~[jackson-databind-2.10.2.jar:2.10.2]
at com.fasterxml.jackson.databind.jsontype.impl.TypeDeserializerBase._findDeserializer(TypeDeserializerBase.java:162) ~[jackson-databind-2.10.2.jar:2.10.2]
at com.fasterxml.jackson.databind.jsontype.impl.AsPropertyTypeDeserializer._deserializeTypedForId(AsPropertyTypeDeserializer.java:113) ~[jackson-databind-2.10.2.jar:2.10.2]
at com.fasterxml.jackson.databind.jsontype.impl.AsPropertyTypeDeserializer.deserializeTypedFromObject(AsPropertyTypeDeserializer.java:97) ~[jackson-databind-2.10.2.jar:2.10.2]
at com.fasterxml.jackson.databind.deser.AbstractDeserializer.deserializeWithType(AbstractDeserializer.java:254) ~[jackson-databind-2.10.2.jar:2.10.2]
at com.fasterxml.jackson.databind.deser.SettableBeanProperty.deserialize(SettableBeanProperty.java:527) ~[jackson-databind-2.10.2.jar:2.10.2]
at com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeWithErrorWrapping(BeanDeserializer.java:528) ~[jackson-databind-2.10.2.jar:2.10.2]
at com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeUsingPropertyBased(BeanDeserializer.java:417) ~[jackson-databind-2.10.2.jar:2.10.2]
at com.fasterxml.jackson.databind.deser.BeanDeserializerBase.deserializeFromObjectUsingNonDefault(BeanDeserializerBase.java:1287) ~[jackson-databind-2.10.2.jar:2.10.2]
at com.fasterxml.jackson.databind.deser.BeanDeserializer.deserializeFromObject(BeanDeserializer.java:326) ~[jackson-databind-2.10.2.jar:2.10.2]
at com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeOther(BeanDeserializer.java:194) ~[jackson-databind-2.10.2.jar:2.10.2]
at com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:161) ~[jackson-databind-2.10.2.jar:2.10.2]
at com.fasterxml.jackson.databind.jsontype.impl.AsPropertyTypeDeserializer._deserializeTypedForId(AsPropertyTypeDeserializer.java:130) ~[jackson-databind-2.10.2.jar:2.10.2]
at com.fasterxml.jackson.databind.jsontype.impl.AsPropertyTypeDeserializer.deserializeTypedFromObject(AsPropertyTypeDeserializer.java:97) ~[jackson-databind-2.10.2.jar:2.10.2]
at com.fasterxml.jackson.databind.deser.AbstractDeserializer.deserializeWithType(AbstractDeserializer.java:254) ~[jackson-databind-2.10.2.jar:2.10.2]
at com.fasterxml.jackson.databind.deser.impl.TypeWrappedDeserializer.deserialize(TypeWrappedDeserializer.java:68) ~[jackson-databind-2.10.2.jar:2.10.2]
at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:4202) ~[jackson-databind-2.10.2.jar:2.10.2]
at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3266) ~[jackson-databind-2.10.2.jar:2.10.2]
at org.apache.druid.indexing.worker.WorkerTaskMonitor$1.childEvent(WorkerTaskMonitor.java:165) ~[druid-indexing-service-0.20.0.jar:0.20.0]
at org.apache.curator.framework.recipes.cache.PathChildrenCache$5.apply(PathChildrenCache.java:538) [curator-recipes-4.3.0.jar:4.3.0]
at org.apache.curator.framework.recipes.cache.PathChildrenCache$5.apply(PathChildrenCache.java:532) [curator-recipes-4.3.0.jar:4.3.0]
at org.apache.curator.framework.listen.ListenerContainer$1.run(ListenerContainer.java:100) [curator-framework-4.3.0.jar:4.3.0]
at org.apache.curator.shaded.com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30) [curator-client-4.3.0.jar:?]
at org.apache.curator.framework.listen.ListenerContainer.forEach(ListenerContainer.java:92) [curator-framework-4.3.0.jar:4.3.0]
at org.apache.curator.framework.recipes.cache.PathChildrenCache.callListeners(PathChildrenCache.java:530) [curator-recipes-4.3.0.jar:4.3.0]
at org.apache.curator.framework.recipes.cache.EventOperation.invoke(EventOperation.java:35) [curator-recipes-4.3.0.jar:4.3.0]
at org.apache.curator.framework.recipes.cache.PathChildrenCache$9.run(PathChildrenCache.java:808) [curator-recipes-4.3.0.jar:4.3.0]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_262]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_262]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_262]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_262]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_262]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_262]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_262]
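As a sanity check on which builds the MiddleManager actually has on its classpath, the jar names that the logger appends to each stack frame can be tallied. This is only a minimal Python sketch, assuming the trace has been pasted into a string; the helper name jar_versions and the regular expression are my own and not part of Druid:

```python
import re
from collections import Counter

# Matches the "[artifact-version.jar:version]" suffix that the logger appends
# to each stack frame. Frames from the JDK itself look like "[?:1.8.0_262]"
# and are intentionally not matched.
JAR_RE = re.compile(r"\[([A-Za-z0-9_.\-]+?)-(\d[\w.\-]*)\.jar:[^\]]*\]")


def jar_versions(trace: str) -> Counter:
    """Count stack frames per (artifact, version) pair found in the trace."""
    counts = Counter()
    for line in trace.splitlines():
        match = JAR_RE.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


# A few frames copied from the trace above.
sample = """\
at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3266) ~[jackson-databind-2.10.2.jar:2.10.2]
at org.apache.druid.indexing.worker.WorkerTaskMonitor$1.childEvent(WorkerTaskMonitor.java:165) ~[druid-indexing-service-0.20.0.jar:0.20.0]
at org.apache.curator.framework.recipes.cache.PathChildrenCache$5.apply(PathChildrenCache.java:538) [curator-recipes-4.3.0.jar:4.3.0]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_262]
"""

for (artifact, version), frames in sorted(jar_versions(sample).items()):
    print(f"{artifact} {version} ({frames} frame(s))")
```

For the frames above, the WorkerTaskMonitor frame comes from druid-indexing-service-0.20.0.jar, so it may be worth comparing that against the version the Overlord is running.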
The compaction task payload shown in the Druid web console also has the type that is reported in the exception:
"tuningConfig": {
"type": "compaction",
Even when I submit a manual compaction task with the tuning config below, the submitted task shown in the console still has the type "compaction":
"tuningConfig": {
  "type": "index_parallel",
  "maxRowsPerSegment": 5000000,
  "maxRowsInMemory": 5000000,
  "maxNumSegmentsToMerge": 100
}
Below is my compaction spec:
{
  "dataSource": "my-datasource",
  "taskPriority": 25,
  "inputSegmentSizeBytes": 100000000000000,
  "maxRowsPerSegment": null,
  "skipOffsetFromLatest": "PT31H",
  "tuningConfig": {
    "maxRowsInMemory": null,
    "appendableIndexSpec": null,
    "maxBytesInMemory": null,
    "maxTotalRows": null,
    "splitHintSpec": null,
    "partitionsSpec": {
      "type": "dynamic",
      "maxRowsPerSegment": 5000000,
      "maxTotalRows": null
    },
    "indexSpec": null,
    "indexSpecForIntermediatePersists": null,
    "maxPendingPersists": null,
    "pushTimeout": null,
    "segmentWriteOutMediumFactory": null,
    "maxNumConcurrentSubTasks": 5,
    "maxRetry": null,
    "taskStatusCheckPeriodMs": null,
    "chatHandlerTimeout": null,
    "chatHandlerNumRetries": null,
    "maxNumSegmentsToMerge": null,
    "totalNumMergeTasks": null,
    "maxColumnsToMerge": 10,
    "forceGuaranteedRollup": false,
    "type": "index_parallel"
  },
  "granularitySpec": null,
  "dimensionsSpec": null,
  "metricsSpec": null,
  "transformSpec": null,
  "ioConfig": null,
  "taskContext": null
}
Any pointers on what could be happening here? Is there a workaround for this, or a config I need to set correctly?