thread safety of scripted pipeline parallel and usage of nested parallel

abstrakta

Oct 30, 2021, 8:18:29 AM
to jenkinsci-users
Hi, Jenkins friends. I hope this is the right place to post these Jenkins usage questions.

I find that the Scripted Pipeline parallel step works like threads. Unlike Declarative Pipeline parallel, Scripted Pipeline parallel uses just one executor, and each closure passed to parallel behaves like a thread.
My questions are:

1. Does Jenkins guarantee thread safety internally for data shared between parallel closures?

2. Do I need to care about the thread safety of the commands executed inside a scripted parallel closure?

3. Are there any limits on the commands that can run in parallel? Can I use nested scripted parallel? Why does the Declarative Pipeline parallel documentation in the Pipeline Syntax reference say: "Note that a stage must have one and only one of steps, stages, parallel, or matrix. It is not possible to nest a parallel or matrix block within a stage directive if that stage directive is nested within a parallel or matrix block itself."

I have run some nested Pipeline code that could cause a thread race condition many times. Jenkins always gives the right answer: the shared data is modified correctly. Is this thread safety guaranteed by the design of the Jenkins parallel step?

Pipeline code like this:

pipeline {
    agent any
    stages {
        stage('Parallel Build') {
            steps {
                script {
                    def i = 0
                    def data = 0
                    def builds = [:]
                    stash name: 'src', includes: 'src/**'
                    // generate 1000 parallel branches
                    for (i = 0; i < 1000; i++) {
                        // build the Map of Closures
                        builds["$i"] = {
                            // modify shared data, need thread mutex lock?
                            data++
                            // unstash or other command, need thread mutex lock?
                            unstash name: 'src'
                            def tests = [:]
                            // ... generate tests Map
                            // Can I use nested parallel?
                            parallel tests
                        }
                    }
                    parallel builds
                    println data // It does always print 1000
                }
            }
        }
    }
}

The variable data always ends up as 1000. So does Jenkins guarantee the thread safety of parallel?

Ivan Fernandez Calvo

Oct 30, 2021, 12:03:34 PM
to Jenkins Users
No. If you plan to share variables across parallel steps, you should use thread-safe Java types; otherwise you will get random or weird results. I have several pipelines that use a map to store results from parallel steps.

abstrakta

Oct 30, 2021, 12:44:57 PM
to jenkinsci-users
Thanks for your reply.
So the parallel step is like spawning some Java threads? Do you have any pipeline code that demonstrates this thread safety issue and how to fix it with thread-safe Java types?
I guess that steps such as stash, unstash and archive are thread safe internally, because I have found some articles that run unstash in parallel on different slave nodes without any locking. Is my guess correct? I can't find any other articles that discuss this parallel thread safety issue.

Ivan Fernandez Calvo

Oct 31, 2021, 7:19:47 AM
to Jenkins Users
The following example should work (I did not test it). In my case I have used maps like the "results" variable: a plain, unsynchronized map that stores data from all tasks but is not read by the different tasks. The other two cases, "data" and "mapSync", use concurrent classes; they are thread safe and synchronized, so you can share data across tasks. I don't know whether they are on the allow list for Pipeline; if not, you have to approve their use in the Jenkins configuration. Finally, the last part of the pipeline uses nested parallel tasks. From my experience that is not a good idea: the explosion of parallel tasks is hard to control, and there are other solutions, such as launching a job from those tasks and running the parallel tasks inside that job. That way you have only one level of parallelism, which is easy to control and to understand when something goes wrong.

import groovy.transform.Field
import java.util.concurrent.atomic.AtomicInteger
import java.util.concurrent.ConcurrentHashMap

@Field def results = [:]
@Field AtomicInteger data = new AtomicInteger()
@Field ConcurrentHashMap<String, AtomicInteger> mapSync = new ConcurrentHashMap<String, AtomicInteger>()

pipeline {
    agent any
    stages {
        stage('Parallel Build') {
            steps {
                script {
                    def i = 0
                    def builds = [:]
                    // one atomic counter per key, so each update is a single atomic call
                    mapSync['odd'] = new AtomicInteger()
                    mapSync['even'] = new AtomicInteger()
                    stash name: 'src', includes: 'src/**'
                    // generate 1000 parallel branches
                    for (i = 0; i < 1000; i++) {
                        // copy the loop counter: closures capture the variable, not its value
                        def index = i
                        // build the Map of Closures
                        builds["$index"] = {
                            results["$index"] = 1
                            // atomic increment; data++ would not update the AtomicInteger in place
                            data.incrementAndGet()
                            if (index % 2 == 0) {
                                mapSync['even'].incrementAndGet()
                            } else {
                                mapSync['odd'].incrementAndGet()
                            }
                        }
                    }
                    parallel builds
                    println results.toString()
                    println data
                    println mapSync
                }
            }
        }
    }
}
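
To make the point about thread-safe Java types concrete, here is a minimal plain-Groovy sketch that runs outside Jenkins with real Java threads. It is only an illustration under assumptions: the pool size, the 30-second timeout, and the names counter and parity are made up, and none of it comes from the pipeline above. It shows two atomic updates: AtomicInteger.incrementAndGet() and ConcurrentHashMap.merge().

```groovy
import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit
import java.util.concurrent.atomic.AtomicInteger
import java.util.function.BiFunction

// Hypothetical shared state, updated from several real Java threads.
def counter = new AtomicInteger()
def parity = new ConcurrentHashMap<String, Integer>()
def pool = Executors.newFixedThreadPool(8)

(0..<1000).each { n ->
    pool.execute({
        // Atomic increment: no update is lost even when threads overlap.
        counter.incrementAndGet()
        // merge() performs the read-modify-write for one key atomically.
        String key = (n % 2 == 0) ? 'even' : 'odd'
        parity.merge(key, 1, { a, b -> a + b } as BiFunction)
    } as Runnable)
}

// Wait for all submitted tasks before reading the results.
pool.shutdown()
boolean finished = pool.awaitTermination(30, TimeUnit.SECONDS)
println "finished=${finished} counter=${counter.get()} parity=${parity}"
```

A plain "map[key] = map[key] + 1" is a separate read and write, so with real threads two tasks can read the same old value; merge() avoids that by doing both under the map's own locking.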

abstrakta

Oct 31, 2021, 10:49:09 AM
to jenkinsci-users
OK, I see what you mean. But I have tested this pipeline with unsynchronized data many times (more than 700 runs; I use a periodic build to test it automatically). The unsynchronized shared data is processed correctly every time. Have you actually encountered shared data being corrupted in parallel tasks?

In some scenarios, I want to build several artifacts in parallel, for different platforms, from the same source code, and then test all artifacts in parallel on their corresponding platforms (one artifact might be used on more than one platform).

build platform: A B C
test platform: A(A1 A2 A3), B(B1 B2), C(C1 C2)

              +--> A --+--> A1
              |        +--> A2
              |        +--> A3
              |
source code --+--> B --+--> B1
              |        +--> B2
              |
              +--> C --+--> C1
                       +--> C2


Pipeline code like this:

pipeline {
    agent any
    stages {
        stage('Parallel Build') {
            steps {
                script {
                    def builds = [:]
                    def tests = [:]

                    stash name: 'src', includes: 'src/**'

                    // Returns a closure for one build branch; nothing runs until parallel calls it.
                    def build_action = { platform, platformTests ->
                        return {
                            // build on some slave node
                            node("${platform}") {
                                // unstash need thread mutex lock?
                                unstash name: 'src'

                                sh "make ${platform}"

                                // archive need thread mutex lock?
                                // (assumes the build output is in a directory named after the platform)
                                archiveArtifacts artifacts: "${platform}/**"
                            }

                            // nested parallel, outside of the node block
                            parallel platformTests
                        }
                    }

                    // Returns a closure for one test branch.
                    def test_action = { platform ->
                        return {
                            // test on some slave node
                            node("${platform}") {
                                sh "make ${platform}-test"
                            }
                        }
                    }

                    tests["A1"] = test_action("A1")
                    tests["A2"] = test_action("A2")
                    tests["A3"] = test_action("A3")
                    builds["A"] = build_action("A", tests)

                    tests = [:]
                    tests["B1"] = test_action("B1")
                    tests["B2"] = test_action("B2")
                    builds["B"] = build_action("B", tests)

                    tests = [:]
                    tests["C1"] = test_action("C1")
                    tests["C2"] = test_action("C2")
                    builds["C"] = build_action("C", tests)

                    parallel builds
                }
            }
        }
    }
}

I have been running this pipeline for some time and it works well. But I worry about the thread safety of unstash (and archive, or other commands). I want to find some evidence that Jenkins guarantees thread safety for these parallel steps and that nested parallel is OK. Otherwise I must split these platforms into different build jobs, which I think makes the projects more difficult to manage, since I assume Jenkins must guarantee that parallel build jobs are safe. I just don't know what Jenkins does under the hood for parallel, and I'm curious about its thread safety. If Jenkins takes care of the synchronization, that is best: I don't need to care about these thread issues. If not, I want to find the correct way to run parallel tasks.


abstrakta

Oct 31, 2021, 1:55:29 PM
to jenkinsci-users
Sorry, the pipeline code should be corrected like this:

kuisathaverat

Nov 1, 2021, 7:03:29 AM
to jenkins...@googlegroups.com
Jenkins pipelines have two types of code: the steps and directives provided by Jenkins, and your own Groovy code.
All the Jenkins steps and directives are thread safe. For your own Groovy code, you must make sure it is thread safe if it needs to be.
Unstash and archive are Jenkins steps, so you can use them in parallel tasks and they should work as expected. I see only one case where archive can cause issues: archiving the same artifact name from different tasks. It is completely logical that this fails or corrupts the file, because you are doing something wrong (the same happens with stash in parallel tasks).
One last thought: try to minimize the Groovy code you add to your pipelines. All Groovy code runs on the Jenkins controller, so delegate to scripts, commands, tools, ... as much as possible.
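
A minimal sketch of avoiding the name clash mentioned above, as a Scripted Pipeline fragment. It only runs inside Jenkins, and several things are assumptions for illustration: the node labels A, B and C, the "bin-" stash prefix, a stash named 'src' created earlier in the build, and build output that lands in a directory named after the platform.

```groovy
// Scripted Pipeline fragment (hypothetical labels and names, see the note above).
def branches = [:]
['A', 'B', 'C'].each { platform ->
    branches[platform] = {
        node(platform) {
            // Reading the same stash from several branches is fine.
            unstash name: 'src'
            sh "make ${platform}"
            // One stash name per branch, so no two branches write the same stash.
            stash name: "bin-${platform}", includes: "${platform}/**"
        }
    }
}
parallel branches
```

The same idea applies to archived artifacts: give each branch its own file or directory name so that branches never write to the same path.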


Devin Nusbaum

Nov 1, 2021, 9:28:14 AM
to jenkins...@googlegroups.com
The Pipeline execution engine uses a "green thread" model internally - there is only ever one Java thread executing the Groovy DSL for a given Pipeline (step execution may use additional Java threads, but this should always be transparent to users). See the final paragraph under https://github.com/jenkinsci/workflow-cps-plugin#technical-design for a few more details. You should not need to worry about Java thread safety across the various branches in a parallel step as long as you are not doing anything with Java threads directly in your Pipeline.
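
One way to picture the "green thread" model is the toy scheduler below. It is a sketch in plain Groovy, not the actual workflow-cps implementation: runParallel, the lists of segments, and the branch names are all made up for illustration, and it assumes that a branch only hands over control at a step boundary.

```groovy
// Toy model only. Each branch is a list of closures; each closure stands for
// the Groovy code that runs between two pipeline steps. A single Java thread
// runs one segment at a time and switches branches only between segments.
def runParallel(Map<String, List<Closure>> branches) {
    def queues = branches.collectEntries { name, segments ->
        [(name): new ArrayDeque<Closure>(segments)]
    }
    // Round-robin over the branches until every queue is drained.
    while (queues.values().any { !it.isEmpty() }) {
        queues.each { name, queue ->
            if (!queue.isEmpty()) {
                // Runs to completion before any other branch gets a turn.
                queue.poll().call()
            }
        }
    }
}

int data = 0
def branches = [:]
(0..<1000).each { n ->
    branches["branch-${n}".toString()] = [
        { -> data++ },   // unsynchronized read-modify-write on shared data
        { -> }           // placeholder for the code after a step such as unstash
    ]
}

runParallel(branches)
println data
```

In this model no two increments ever overlap, so the unsynchronized data++ cannot lose an update. That is consistent with the counter always reaching 1000 in the tests reported earlier in this thread.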

abstrakta

Nov 2, 2021, 12:42:14 AM
to jenkinsci-users
So nested parallel is allowed too?

dnus...@cloudbees.com

Nov 2, 2021, 9:52:06 AM
to Jenkins Users
Scripted Pipeline supports nested parallel steps at the execution level, although visualizations like Blue Ocean only support one level of parallelization, and Declarative Pipeline does not natively support nested parallel. I do not think nested parallel has much real-world use or testing. Also, there have been some subtle bugs related to Pipeline durability with nested parallel steps in the past, so you may want to do some related testing with your Pipeline before committing to nested parallel if durability is important to you.

Thanks,
Devin

abstrakta

Nov 3, 2021, 11:04:33 AM
to jenkinsci-users
https://www.jenkins.io/doc/book/pipeline/scaling-pipeline/ says that durability is the ability to resume running jobs even if the Jenkins server restarts or the system crashes. That does not seem very useful to me; the Jenkins server does not seem to crash easily.
What were the subtle bugs related to Pipeline durability with nested parallel steps in the past? I'd like to use nested parallel if possible. In some situations, nested parallel is useful.
For example, you want to build artifacts for platforms A, B and C in parallel, and then test the artifacts on the corresponding platforms A(A1 A2 A3), B(B1 B2 B3), C(C1 C2 C3). Without nested parallel, you must wait for A, B and C to all finish building before the tests start. If building A takes 10 minutes and building B takes 40 minutes, then with nested parallel branch A does not need to wait the remaining 30 minutes: it starts its own parallel tests as soon as its build finishes, without waiting for branch B. Meanwhile, if branch A fails, branch B still goes on to run B's tests, which are unrelated to branch A's failure.
If you split nested parallel into two sequential parallel steps, you must handle the failures of the different branches and start tests only for the platforms that built successfully. Or, if you split nested parallel into two Jenkins build jobs, you separate two related things, which makes them harder to understand.

Thanks.
