Award Documents getting stuck enroute due to database blocking

7 views
Skip to first unread message

Ronald Gouldner

unread,
Jun 21, 2018, 8:27:35 PM6/21/18
to KC.Tech.Collab
Lately we have been running into an issue with Award documents getting stuck in an enroute status.

The documents should just go final because we have limited routing only for certain rare types of awards.  So most awards go straight to final.

The documents get stuck in the enroute status until we bounce our system which is done daily at 3am or until we locate the blocking session and kill it.
You can see this in the route log.  This document was submitted at 12:21pm 6/20.
image.png

When we investigate we find the following query was blocked from 12:21 until 3:03am.
SELECT DOC_HDR_ID, APP_DOC_ID, APP_DOC_STAT, APP_DOC_STAT_MDFN_DT, APRV_DT, CRTE_DT, STAT_MDFN_DT, RTE_LVL, DOC_HDR_STAT_CD, TTL, DOC_VER_NBR, DOC_TYP_ID, FNL_DT, INITR_PRNCPL_ID, OBJ_ID, RTE_STAT_MDFN_DT, RTE_PRNCPL_ID, VER_NBR
FROM KREW_DOC_HDR_T
WHERE (DOC_HDR_ID = :1 ) FOR UPDATE WAIT 3600000

Note it is waiting for 3600000 which look like 1 hr in milliseconds but it is actually supposed to be in minutes so 3600000 is 60,000 minutes or 1000 hours or 41.66 days which seems maybe a little too long :)

The weird part is we can't determine any other sql in the session which is blocking this statement.  So I now suspect the issue is the code is not closing the transaction for this select for update leaving it running for the 41.66 days or until we happen to bounce our server at 3am, whichever comes first :)   I think it may just be this select statement competing with other instances of the same select for update. 

We caught this blocking while it was happening once in the past however oracle indicated that the session that was blocking with this update had no active SQL.  The other session appeared to not be doing anything.  So we killed that session and this update completed.

The page listed above which explains "for update" also describes why "real-world large online systems" don't use the "for update" clause.  I don't understand why KC (rice) would need to use the select for update.  The status changes in the background from S -> R -> F.  The S-R happens but the R-F gets stuck.  There is no user watching the data or editing it so I think this could just be some bad code that needs to be found and fixed to perform the two updates when they are needed.

Has anyone else run into this?

I see a few possible solutions.
1) find the code that is doing the select for update and change it to a straight update commands.
2) lower the for wait time limit.  I am not sure if this will cause a rollback or let the update complete when it times out.  But this still seems dangerous because during the time limit other updates will potentially be blocked.  I don't see any parameters for this amount of time and I didn't find 3,600,000 in the code anywhere.  Maybe it is something like 60*60*1000 or something in the code.

Reply all
Reply to author
Forward
0 new messages