ConcurrentModificationException during pipeline serialization (MAX_SURVIVABILITY)

82 views
Skip to first unread message

Ullrich Hafner

unread,
Nov 16, 2021, 8:09:22 AM11/16/21
to JenkinsCI Developers
As far as I understand the concept of CPS pipelines, steps may be serialized to disk during runtime if the durability level is set to MAX_SURVIVABILITY. It seems that one of my steps causes a ConcurrentModificationException while the step’s memory is written to disk and in parallel the fields (or local variables?) are modified by my step (see [1] for the bug report). Is there anything I need to do in my plugin to avoid such problems?

[1] https://issues.jenkins.io/browse/JENKINS-67145

Devin Nusbaum

unread,
Nov 16, 2021, 11:42:35 AM11/16/21
to jenkin...@googlegroups.com
The CPS VM thread is responsible for saving the Pipeline's execution state, so if you are using a non-blocking step execution (and it looks like you are), it is possible that your step is executing on a background thread while the Pipeline program is being saved on the CPS VM thread. You should account for this in any mutable data reachable from non-blocking step executions that is part of the execution's serialized state, for example by using CopyOnWriteArrayList, replacing fields rather than mutating them, using writeReplace, etc.

The exception in the Jira ticket suggests that one of the objects inside of one of the AnnotatedReport instances returned by a scanIssues step, which is stored a local variable in the user's Pipeline, is being modified. Perhaps the issue is that there is a filename match in the FileStatistics between the issues collected by the user's different parallel branches, so when this code runs for the first time inside of AnnotatedReport.addAll in PublishIssuesStep.Execution.Run it uses the exact FileStatistics instance from an existing AnnotatedReport instance, but when it runs after that it modifies the previous FileStatistics instance here, and then since the FileStatistics instance is also reachable via one of the AnnotatedReport local variables in the Pipeline you have a potential ConcurrentModificationError depending on the serialization timing.

On Tue, Nov 16, 2021 at 8:09 AM Ullrich Hafner <ullrich...@gmail.com> wrote:
As far as I understand the concept of CPS pipelines, steps may be serialized to disk during runtime if the durability level is set to MAX_SURVIVABILITY. It seems that one of my steps causes a ConcurrentModificationException while the step’s memory is written to disk and in parallel the fields (or local variables?) are modified by my step (see [1] for the bug report). Is there anything I need to do in my plugin to avoid such problems?

[1] https://issues.jenkins.io/browse/JENKINS-67145

--
You received this message because you are subscribed to the Google Groups "Jenkins Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-de...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/jenkinsci-dev/6F19E406-45C9-4D4F-82C9-D2712E996199%40gmail.com.

Ullrich Hafner

unread,
Nov 16, 2021, 12:09:59 PM11/16/21
to JenkinsCI Developers
Thanks for this in depth description of the problem! I see how much work is required in order to rewrite that part of the code.

Ullrich Hafner

unread,
Nov 17, 2021, 5:57:40 AM11/17/21
to JenkinsCI Developers
After thinking about the problem in more detail have an additional question:

I can replace all of those list modifying operations with a CopyOnWriteArrayList. What is still not clear for me is, how this works if multiple objects are affected? E.g. I have a loop over a number of files that should be post-processed. The result of the post-processing is stored in a CopyOnWriteArrayList. If your background thread serializes my instance during the loop it can happen that the loop variable is already set, but the list with the results is not. That means I need some lock around the whole block of variables, or how is this handled?

Jesse Glick

unread,
Nov 17, 2021, 8:43:16 AM11/17/21
to jenkin...@googlegroups.com
On Wed, Nov 17, 2021 at 5:57 AM Ullrich Hafner <ullrich...@gmail.com> wrote:
[…] is stored in a CopyOnWriteArrayList. If your background thread serializes my instance during the loop it can happen that the loop variable is already set, but the list with the results is not.

Hard to follow what you are saying without a code reference, but a `CopyOnWriteArrayList` is designed to be safe to save from another thread; that is its whole purpose.

I need some lock around the whole block of variables

Block of fields? Again hard to discuss without a concrete example, but in general if there is a block of data that should be replaced atomically, best to keep it all in its own object with `final` fields, and refer to that one object with a `volatile` field. 

Ullrich Hafner

unread,
Nov 17, 2021, 10:02:42 AM11/17/21
to JenkinsCI Developers
Actually I do not have fields anymore in the step:

As far as I understood the problem in JENKINS-67145 my step is somehow serialized during the execution in lines 205-215. Or can I view the code in a step as an atomic unit? 

--
You received this message because you are subscribed to the Google Groups "Jenkins Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-de...@googlegroups.com.

Jesse Glick

unread,
Nov 17, 2021, 10:17:11 AM11/17/21
to jenkin...@googlegroups.com
On Wed, Nov 17, 2021 at 10:02 AM Ullrich Hafner <ullrich...@gmail.com> wrote:
I do not have fields anymore in the step

Any class which could be referenced directly or indirectly as part of the program state (object graph rooted in Pipeline virtual threads) must be written to expect concurrent access from a serializer thread (JBoss Marshalling into `program.dat`). The code you linked to is uninteresting in this context except insofar as an `AnnotatedReport` object might be passed eventually back to the program as the return value of a step. Therefore that object should either be immutable, or all mutations must have been finished by the time the step completes, or (if necessary, but suspicious) all fields referring to mutable structures must be thread-safe. The same applies to classes referenced from `build.xml` (typically via `Action`s) since these would be serialized via XStream. Other Java objects created transiently during the course of a step’s execution need not be considered.

Devin Nusbaum

unread,
Nov 17, 2021, 10:36:26 AM11/17/21
to jenkin...@googlegroups.com
> As far as I understood the problem in JENKINS-67145 my step is somehow serialized during the execution in lines 205-215.

Kind of, but I was talking about the publishIssues step, not the scanForIssues step. I think the scanForIssues steps in the parallel branches have already completed when the problem manifests and can be ignored based on the information in the Jira ticket.

> Or can I view the code in a step as an atomic unit? 

You only need to worry about the serialized state of the Pipeline, and in this case I think you just need to worry about local variables in the Pipeline (e.g. AnnotatedReport objects returned by scanForIssues) and the StepExecution object for currently-running steps (e.g. publishIssues). Local variables in PublishIssuesStep.Execution.run and other Java code can be ignored. Also, since you are using SynchronousNonBlockingStepExecution, PublishIssuesStep.Execution is not resumable and you may as well mark Execution.step as transient (but I don't think that would fix the issue!).

In your case, I think the main issue is with the AnnotatedReport objects that are returned by scanForIssues that end up being local variables in the Pipeline. You need to be careful about the ways in which you modify those objects in the pubishIssues step, since local Groovy variables are serialized as part of the Pipeline's state, and PublishIssuesStep.Execution.run is running on a background thread. My original message describes the only way that I noticed that publishIssues might modify one of those existing AnnotatedReport objects (on a background thread separate from the CPS VM thread that will serialize the Pipeline) that was returned by a previous scanForIssues step.

My best guess as to minimally fixing the problem is to introduce a copy constructor for FileStatistics and use it here like this:

statisticsMapping.merge(additionalStatistics.getFileName(), new FileStatistics(additionalStatistics), this::merge);

This way, if one of the FileStatistics from one of the other AnnotatedReport instances returned by a scanIssues step has the same file name, RepositoryStatistics.merge will only end up modifying the copied FileStatistics rather than the original object.

A more robust change would be to make AnnotatedReport and all of its fields (and their fields recursively) immutable like Jesse mentioned.

Ullrich Hafner

unread,
Nov 17, 2021, 11:13:36 AM11/17/21
to JenkinsCI Developers

Am 17.11.2021 um 16:36 schrieb 'Devin Nusbaum' via Jenkins Developers <jenkin...@googlegroups.com>:

> As far as I understood the problem in JENKINS-67145 my step is somehow serialized during the execution in lines 205-215.

Kind of, but I was talking about the publishIssues step, not the scanForIssues step. I think the scanForIssues steps in the parallel branches have already completed when the problem manifests and can be ignored based on the information in the Jira ticket.

> Or can I view the code in a step as an atomic unit? 

You only need to worry about the serialized state of the Pipeline, and in this case I think you just need to worry about local variables in the Pipeline (e.g. AnnotatedReport objects returned by scanForIssues) and the StepExecution object for currently-running steps (e.g. publishIssues). Local variables in PublishIssuesStep.Execution.run and other Java code can be ignored. Also, since you are using SynchronousNonBlockingStepExecution, PublishIssuesStep.Execution is not resumable and you may as well mark Execution.step as transient (but I don't think that would fix the issue!).


Ah ok, this is what I thought initially.  Then I can continue using my approach here :-) I totally misunderstood the problem.

In your case, I think the main issue is with the AnnotatedReport objects that are returned by scanForIssues that end up being local variables in the Pipeline. You need to be careful about the ways in which you modify those objects in the pubishIssues step, since local Groovy variables are serialized as part of the Pipeline's state, and PublishIssuesStep.Execution.run is running on a background thread. My original message describes the only way that I noticed that publishIssues might modify one of those existing AnnotatedReport objects (on a background thread separate from the CPS VM thread that will serialize the Pipeline) that was returned by a previous scanForIssues step.

My best guess as to minimally fixing the problem is to introduce a copy constructor for FileStatistics and use it here like this:

statisticsMapping.merge(additionalStatistics.getFileName(), new FileStatistics(additionalStatistics), this::merge);

This way, if one of the FileStatistics from one of the other AnnotatedReport instances returned by a scanIssues step has the same file name, RepositoryStatistics.merge will only end up modifying the copied FileStatistics rather than the original object.


This is the problem, thanks! I thought that I am creating a new object but actually I am doing not. I’m changing the existing ones, that is a design flaw that I did overlook. I think a simple solution is to create new objects as you already mentioned. Thanks!   

A more robust change would be to make AnnotatedReport and all of its fields (and their fields recursively) immutable like Jesse mentioned.

On Wed, Nov 17, 2021 at 10:02 AM Ullrich Hafner <ullrich...@gmail.com> wrote:


Am 17.11.2021 um 14:42 schrieb 'Jesse Glick' via Jenkins Developers <jenkin...@googlegroups.com>:

On Wed, Nov 17, 2021 at 5:57 AM Ullrich Hafner <ullrich...@gmail.com> wrote:
[…] is stored in a CopyOnWriteArrayList. If your background thread serializes my instance during the loop it can happen that the loop variable is already set, but the list with the results is not.

Hard to follow what you are saying without a code reference, but a `CopyOnWriteArrayList` is designed to be safe to save from another thread; that is its whole purpose.

I need some lock around the whole block of variables

Block of fields? Again hard to discuss without a concrete example, but in general if there is a block of data that should be replaced atomically, best to keep it all in its own object with `final` fields, and refer to that one object with a `volatile` field. 


Actually I do not have fields anymore in the step:

As far as I understood the problem in JENKINS-67145 my step is somehow serialized during the execution in lines 205-215. Or can I view the code in a step as an atomic unit? 

--
You received this message because you are subscribed to the Google Groups "Jenkins Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-de...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/jenkinsci-dev/CANfRfr0ouBxhg1arOFwosdrit-O9WdBeiiAbgSAA00%3DPnbBkPQ%40mail.gmail.com.


--
You received this message because you are subscribed to the Google Groups "Jenkins Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-de...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/jenkinsci-dev/80549ED3-122A-49E6-9327-0B2D3A376EB9%40gmail.com.

--
You received this message because you are subscribed to the Google Groups "Jenkins Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-de...@googlegroups.com.
Reply all
Reply to author
Forward
0 new messages