Bug Fix: CDF Pipeline incorrectly marked FAILED due to Dataproc CancelJob on DONE job (b/460875216)


C.J. Collier

Nov 14, 2025, 7:50:15 PM
to CDAP Developer

Hi CDAP-Dev,

I'd like to give a heads-up about a bug fix I'm proposing for CDAP. Currently, Data Fusion can incorrectly mark a pipeline as FAILED if, while deprovisioning the ephemeral Dataproc cluster, it attempts to cancel a Dataproc job that has already completed successfully.

The Issue:

The CDAP RemoteExecutionTwillController sends a CancelJob request to Dataproc. If the job is already in the DONE state, Dataproc rejects the request with an error. AbstractDataprocProvisioner catches this error and treats it as a pipeline failure, even though the pipeline itself completed successfully. As a result, successful runs are reported as FAILED.

The Fix:

I've implemented changes to:

  1. RemoteExecutionTwillController.java: Add a status check before attempting to kill the remote process in the complete() method's error handling path, to avoid sending a cancel request if the job is already in a terminal state.
  2. AbstractDataprocProvisioner.java: Gracefully handle the specific API error from Dataproc when a CancelJob is attempted on a DONE job, logging a warning instead of failing the pipeline.
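The two changes above can be sketched roughly as follows. This is a minimal, self-contained illustration of the pattern, not the code in the PR: `JobClient`, `JobNotCancellableException`, `cancelIfActive` and `FakeJobClient` are hypothetical names, and the `JobState` values are a subset of Dataproc's job state names used only for illustration. In the real Dataproc client the rejection surfaces as an API exception, and its exact status code should be confirmed against the Dataproc API documentation.

```java
import java.util.EnumSet;
import java.util.Set;
import java.util.logging.Logger;

/**
 * Sketch of "check status, then cancel, tolerate not-cancellable".
 * All types here are hypothetical stand-ins, not real CDAP or Dataproc classes.
 */
public class GuardedCancel {
    private static final Logger LOG = Logger.getLogger(GuardedCancel.class.getName());

    /** Subset of Dataproc job state names, for illustration only. */
    enum JobState { PENDING, SETUP_DONE, RUNNING, CANCEL_PENDING, CANCEL_STARTED, CANCELLED, DONE, ERROR }

    /** States a job will not leave, so a cancel request is pointless. */
    static final Set<JobState> TERMINAL = EnumSet.of(JobState.DONE, JobState.ERROR, JobState.CANCELLED);

    /** Hypothetical stand-in for the API error returned when cancelling a finished job. */
    static class JobNotCancellableException extends Exception {
        JobNotCancellableException(String message) {
            super(message);
        }
    }

    /** Hypothetical minimal job client; not the real Dataproc API. */
    interface JobClient {
        JobState getState(String jobId);

        void cancel(String jobId) throws JobNotCancellableException;
    }

    /** Returns true if a cancel request was sent and accepted, false if the job had already finished. */
    static boolean cancelIfActive(JobClient client, String jobId) {
        // Change 1: skip the cancel entirely when the job is already in a terminal state.
        JobState state = client.getState(jobId);
        if (TERMINAL.contains(state)) {
            LOG.info("Job " + jobId + " already in terminal state " + state + "; not cancelling");
            return false;
        }
        try {
            client.cancel(jobId);
            return true;
        } catch (JobNotCancellableException e) {
            // Change 2: the job may finish between the status check and the cancel call,
            // so a "not cancellable" rejection is logged as a warning, not a failure.
            LOG.warning("Cancel of job " + jobId + " rejected: " + e.getMessage());
            return false;
        }
    }

    /** Fake client for demonstration: fixed state, optionally rejects cancel requests. */
    static class FakeJobClient implements JobClient {
        private final JobState state;
        private final boolean rejectCancel;
        int cancelCalls = 0;

        FakeJobClient(JobState state, boolean rejectCancel) {
            this.state = state;
            this.rejectCancel = rejectCancel;
        }

        @Override
        public JobState getState(String jobId) {
            return state;
        }

        @Override
        public void cancel(String jobId) throws JobNotCancellableException {
            cancelCalls++;
            if (rejectCancel) {
                throw new JobNotCancellableException("job " + jobId + " is not in a cancellable state");
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(cancelIfActive(new FakeJobClient(JobState.DONE, false), "job-done"));    // false
        System.out.println(cancelIfActive(new FakeJobClient(JobState.RUNNING, true), "job-race"));  // false
        System.out.println(cancelIfActive(new FakeJobClient(JobState.RUNNING, false), "job-live")); // true
    }
}
```

One reason to keep both changes rather than only the status check: the check avoids the common case, while the catch covers the window in which a job reaches DONE after its status was read but before the cancel request arrives.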

Unit tests have been added to cover these changes.

Internal tracking for this issue is in Buganizer: b/460875216

A Pull Request on GitHub will follow shortly.

Thanks,

C.J. Collier
Dataproc Subject Matter Expert
