[JIRA] (JENKINS-60526) jenkins master allocating and removing mesos nodes slower

hanbing2133@sina.cn (JIRA)

Dec 18, 2019, 10:36:03 AM
to jenkinsc...@googlegroups.com
bing han created an issue
 
Jenkins / Bug JENKINS-60526
jenkins master allocating and removing mesos nodes slower
Issue Type: Bug
Assignee: Unassigned
Components: core
Created: 2019-12-18 15:35
Environment: CentOS 7, Mesos plugin 0.17, Jenkins 2.138.4
Priority: Critical
Reporter: bing han

When Jenkins runs a large number of build jobs on Mesos nodes (more than two hundred), the master allocates and removes Mesos nodes much more slowly than normal. To investigate, we recompiled Jenkins core with additional debug logging written to jenkins.log, as follows (core/src/main/java/jenkins/model/Nodes.java):

    public void removeNode(final @Nonnull Node node) throws IOException {
        Logger.getLogger(Nodes.class.getName()).log(Level.INFO, "node.getNodeName before");
        if (node == nodes.get(node.getNodeName())) {
            // Logged just before requesting the Queue lock.
            Logger.getLogger(Nodes.class.getName()).log(Level.INFO, "removeNode Queue.withLock(new Runnable()  before");
            Queue.withLock(new Runnable() {
                @Override
                public void run() {
                    // Logged once the Queue lock has been acquired.
                    Logger.getLogger(Nodes.class.getName()).log(Level.INFO, "removeNode public void run()  enter");
                    Computer c = node.toComputer();
                    if (c != null) {
                        c.recordTermination();
                        c.disconnect(OfflineCause.create(hudson.model.Messages._Hudson_NodeBeingRemoved()));
                    }
                    if (node == nodes.remove(node.getNodeName())) {
                        jenkins.updateComputerList();
                        jenkins.trimLabels();
                    }
                }
            });
            // no need for a full save() so we just do the minimum
            Util.deleteRecursive(new File(getNodesDir(), node.getNodeName()));
            NodeListener.fireOnDeleted(node);
        }
    }
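The two log statements above bracket the wait for the Queue lock, so the wait has to be worked out by subtracting timestamps. As a rough sketch of an alternative, the wait can be measured directly around the lock acquisition. The class `TimedLock` and its `ReentrantLock` field are hypothetical stand-ins used only for illustration; they are not part of Jenkins core and do not reproduce how `Queue.withLock` is implemented.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper: not part of Jenkins, only illustrates timing a lock wait.
class TimedLock {
    private static final Logger LOGGER = Logger.getLogger(TimedLock.class.getName());

    // Stand-in (assumption) for the lock that Queue.withLock serializes on.
    private final ReentrantLock lock = new ReentrantLock();

    /** Runs the task under the lock and returns the nanoseconds spent waiting to acquire it. */
    long withLock(String label, Runnable task) {
        long requested = System.nanoTime();
        lock.lock();
        long waitedNanos = System.nanoTime() - requested;
        try {
            LOGGER.log(Level.INFO, "{0}: waited {1} ms for lock",
                    new Object[] {label, TimeUnit.NANOSECONDS.toMillis(waitedNanos)});
            task.run();
        } finally {
            // Always release, even if the task throws.
            lock.unlock();
        }
        return waitedNanos;
    }
}
```

With a helper like this, a single log line per removal would report the wait, which makes long waits easier to spot when many nodes are removed in the same period.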

We reproduced the problem by running a large number of Mesos nodes against the recompiled Jenkins core. The Jenkins master deletes Mesos nodes in a periodic cycle. The log tasksMesos pending deleting slave cleanup.log shows that the longest cleanup run lasted as follows:

Started at Sun Oct 13 16:17:01 CST 2019

Finished at Sun Oct 20:10:01 CST 2019

During this cleanup of Mesos nodes, most of the time is spent waiting for a lock defined in the Queue class. The log is as follows:

Oct 13 2019 4:17:35 PM jenkins.model.Nodes removeNode

INFO: removeNode Queue.withLock(new Runnable()  before

Oct 13 2019 6:50:59 PM jenkins.model.Nodes$6 run

INFO: removeNode public void run()  enter
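To make the size of the wait explicit, the gap between the two log timestamps above can be computed. The following is a small sketch; the class name `LockWait` and the date pattern are assumptions chosen to match the timestamps as they appear in this excerpt.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical helper for reading the log excerpt; not part of Jenkins.
class LockWait {
    // Pattern assumed from the excerpt, e.g. "Oct 13 2019 4:17:35 PM".
    private static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("MMM d uuuu h:mm:ss a", Locale.ENGLISH);

    /** Returns the elapsed time between two log timestamps. */
    static Duration between(String before, String after) {
        return Duration.between(LocalDateTime.parse(before, FMT),
                                LocalDateTime.parse(after, FMT));
    }

    public static void main(String[] args) {
        Duration wait = between("Oct 13 2019 4:17:35 PM", "Oct 13 2019 6:50:59 PM");
        // 4:17:35 PM to 6:50:59 PM is 2 h 33 min 24 s.
        System.out.println(wait.toMinutes() + " minutes"); // prints "153 minutes"
    }
}
```

For these two entries the thread waited about 2 hours 33 minutes between requesting the Queue lock and entering run().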

During the same period, Jenkins is also slow to allocate Mesos nodes because of this lock. Is there a solution for this slow allocation and removal of Mesos nodes?

This message was sent by Atlassian Jira (v7.13.6#713006-sha1:cc4451f)