That stack trace might be related, but probably not directly the cause of the issue.
That error occurs specifically when one part of the UI uses the server APIs to return/view the current state of a stage. It doesn't tell us what happened during the actual build that may have left the stage in a corrupt state, or why things are getting stuck now. (It shouldn't be possible to get into this state; I think I've seen it reported by someone before, but I've never been able to get to the root cause.)
One possibility is that something gets into a bad state if the server is shut down in the middle of running/triggering pipelines (which may or may not relate to what happened with your upgrade), or if there is some kind of DB issue when updating the stage's status. I can see how this might theoretically happen, but I don't have a specific scenario that I know would trigger it.
If the reason the stage gets stuck is the same as the reason you see that stack trace in the UI, I imagine the "fix" might involve correcting the data for the stage in the STAGES table in the database, if we can't find any other workaround.
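If it does come to manually repairing the row, the general shape of the fix might look something like the sketch below. To be clear, this is a hedged illustration only: the column names (state, result) and the values shown are assumptions, not confirmed schema details. Verify them against your own database and a known-healthy stage row first, take a DB backup, and stop the server before changing anything.

```sql
-- Inspect the suspect stage first; the stage name here is a placeholder.
SELECT * FROM STAGES WHERE name = 'my-stage' ORDER BY id DESC;

-- Hypothetical repair (left commented out deliberately):
-- mark the stuck stage as completed/cancelled. Column names and values
-- are assumptions; compare against a healthy row before running anything.
-- UPDATE STAGES SET state = 'Cancelled', result = 'Cancelled'
--   WHERE id = <stuck stage id>;
```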
Perhaps we should focus on what's actually going wrong right now with the stage in question.
- When it hangs forever, what is it doing?
- Are jobs running on an agent or has it not even been allocated to an agent?
- Is an agent allocated but never picks it up?
- If it is running on the agent, what do the agent logs say for the job that is stuck?
- Are there other earlier stack traces/errors in the logs for the same stage/job?
- What does the relevant "stuck" stage/job look like on the dashboard? e.g.
- If the Stage Details view loads without error, what does it look like? (this view)
-Chad