Ingestion task failing due to failure in subtask

Tanay Maheshwari

Jul 28, 2025, 11:25:05 AM
to Druid User
Hi, 

My ingestion task (index_parallel) is failing with the error: Failed in phase[segment generation]

On checking the logs of the failed subtask, I see this error:
2025-07-28T15:15:20,090 ERROR [task-runner-0-priority-0] org.apache.druid.indexing.common.task.batch.parallel.SinglePhaseSubTask - Encountered exception in parallel sub task.
java.lang.RuntimeException: file:/tmp/task/slot1/single_phase_sub_task_demand_view_dlbapgom_2025-07-28T15:15:14.425Z/work/indexing-tmp/druid-input-entity2154864322419852959.tmp is not a Parquet file. Expected magic number at tail, but found [32, 125, 10, 125]
at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:556) ~[?:?]
at org.apache.parquet.hadoop.ParquetFileReader.<init>(ParquetFileReader.java:790) ~[?:?]
at org.apache.parquet.hadoop.ParquetFileReader.open(ParquetFileReader.java:657) ~[?:?]
at org.apache.parquet.hadoop.ParquetReader.initReader(ParquetReader.java:162) ~[?:?]
at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:135) ~[?:?]
at org.apache.druid.data.input.parquet.ParquetReader$1.hasNext(ParquetReader.java:113) ~[?:?]
at org.apache.druid.java.util.common.parsers.CloseableIteratorWithMetadata$1.hasNext(CloseableIteratorWithMetadata.java:71) ~[druid-processing-32.0.0.jar:32.0.0]
at org.apache.druid.data.input.IntermediateRowParsingReader$1.hasNext(IntermediateRowParsingReader.java:66) ~[druid-processing-32.0.0.jar:32.0.0]
at org.apache.druid.java.util.common.parsers.CloseableIterator$1.hasNext(CloseableIterator.java:42) ~[druid-processing-32.0.0.jar:32.0.0]
at org.apache.druid.java.util.common.parsers.CloseableIterator$2.findNextIteratorIfNecessary(CloseableIterator.java:83) ~[druid-processing-32.0.0.jar:32.0.0]
at org.apache.druid.java.util.common.parsers.CloseableIterator$2.hasNext(CloseableIterator.java:93) ~[druid-processing-32.0.0.jar:32.0.0]
at org.apache.druid.java.util.common.parsers.CloseableIterator$1.hasNext(CloseableIterator.java:42) ~[druid-processing-32.0.0.jar:32.0.0]
at org.apache.druid.indexing.common.task.FilteringCloseableInputRowIterator.hasNext(FilteringCloseableInputRowIterator.java:66) ~[druid-indexing-service-32.0.0.jar:32.0.0]
at org.apache.druid.indexing.common.task.batch.parallel.SinglePhaseSubTask.generateAndPushSegments(SinglePhaseSubTask.java:427) ~[druid-indexing-service-32.0.0.jar:32.0.0]
at org.apache.druid.indexing.common.task.batch.parallel.SinglePhaseSubTask.runTask(SinglePhaseSubTask.java:267) [druid-indexing-service-32.0.0.jar:32.0.0]
at org.apache.druid.indexing.common.task.AbstractTask.run(AbstractTask.java:179) [druid-indexing-service-32.0.0.jar:32.0.0]
at org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:478) [druid-indexing-service-32.0.0.jar:32.0.0]
at org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:450) [druid-indexing-service-32.0.0.jar:32.0.0]
at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:131) [guava-32.0.1-jre.jar:?]
at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:75) [guava-32.0.1-jre.jar:?]
at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:82) [guava-32.0.1-jre.jar:?]
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
at java.base/java.lang.Thread.run(Thread.java:840) [?:?]


Since this is a failure in a subtask, I am not able to understand what action I can take here, what caused this error, or how I can mitigate it now.

Any help would be appreciated.


Thanks

Henrique Martins

Jul 29, 2025, 6:16:36 AM
to Druid User
Hi Tanay.

Did you recently change the format of the files you are ingesting?
The error says it expected the Parquet magic number at the end of the file but instead found [32, 125, 10, 125] (in ASCII that is a space, '}', a newline, and '}', which looks like the tail of a JSON document). The file could be in JSON format instead.
Are you using druid-parquet-extensions?

Br,
Henrique Martins.

Tanay Maheshwari

Jul 29, 2025, 6:31:39 AM
to Druid User
Hi Henrique,

Thanks for replying. I was able to debug the issue last night. Posting here for everyone's visibility:

We used to create a 0-byte _SUCCESS file to mark the completion of data. However, due to a change, manifest data started being written into these files. Druid tried to read them as Parquet files and failed. And, as you correctly pointed out, the content is JSON.

FIX:
Used the objectGlob field in ioConfig to select only Parquet files at the source, along the lines of the sketch below.
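Roughly like this (the glob value and inputFormat shown here are illustrative, not my exact spec; note that objectGlob takes a glob pattern such as **.parquet rather than a regex):

"ioConfig": {
  "type": "index_parallel",
  "appendToExisting": true,
  "inputSource": {
    "type": "google",
    "prefixes": ["#REPLACE_ME"],
    "objectGlob": "**.parquet"
  },
  "inputFormat": {
    "type": "parquet"
  }
}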

The reason this did not cause an issue while the files were 0 bytes is that empty files get skipped, as per the Imply doc.

Follow-up question: before trying the objectGlob field, I tried the "filter" field with prefixes, but it was not picked up by the ingestion spec. This is how I used it:
"ioConfig": {      
  "type": "index_parallel",      
  "appendToExisting": true,      
  "inputSource": {        
     "type": "google",        
     "prefixes": ["#REPLACE_ME"],        
     "filter": ".*\\.parquet"
