[JIRA] (JENKINS-58879) EC2 plugin queue locks have resurfaced


joyce.z.yee@gmail.com (JIRA)

Aug 9, 2019, 2:58:02 PM
to jenkinsc...@googlegroups.com
Joyce Yee created an issue
 
Jenkins / Bug JENKINS-58879
EC2 plugin queue locks have resurfaced
Issue Type: Bug
Assignee: FABRIZIO MANFREDI
Components: ec2-plugin
Created: 2019-08-09 18:57
Environment: Jenkins 2.186 (from the jenkins/jenkins:2.186 docker image)
Amazon EC2 plugin (1.44.1)
Priority: Blocker
Reporter: Joyce Yee

After upgrading to 1.44.1, we are observing the return of long-standing queue locking as described in this [PR|https://github.com/jenkinsci/ec2-plugin/pull/346].

It looks like the issue was reintroduced by this commit.

I think this is because the `clock` is now instantiated within the queue lock, which prevents `EC2RetentionStrategy` from skipping the `NodeProvisioner.update()` cycle triggered with `internalCheck` when the last check happened < 1 minute ago.
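
To illustrate the "skip the cycle if the last check happened < 1 minute ago" behaviour referred to above, here is a minimal sketch. It is not the plugin's code: `ThrottledCheck`, `MutableClock`, and `expensiveChecks` are hypothetical names, and the counter increment only stands in for the costly work that `internalCheck` performs in the report.

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;

// Hypothetical stand-in for a retention strategy; not the plugin's actual class.
class ThrottledCheck {
    private static final long MIN_INTERVAL_MILLIS = Duration.ofMinutes(1).toMillis();

    private final Clock clock;
    private long nextCheckAfter = 0L; // epoch millis before which checks are skipped
    int expensiveChecks = 0;          // counts how often the costly path actually ran

    ThrottledCheck(Clock clock) {
        this.clock = clock;
    }

    /** Returns true if the expensive check ran, false if it was skipped. */
    synchronized boolean check() {
        long now = clock.millis();
        if (now < nextCheckAfter) {
            return false; // last check was less than a minute ago: skip the costly work
        }
        expensiveChecks++; // placeholder for the costly work
        nextCheckAfter = now + MIN_INTERVAL_MILLIS;
        return true;
    }
}

// Hand-advanced clock so the throttle can be exercised without sleeping.
class MutableClock extends Clock {
    private Instant now;

    MutableClock(Instant start) {
        this.now = start;
    }

    void advance(Duration d) {
        now = now.plus(d);
    }

    @Override public ZoneId getZone() { return ZoneOffset.UTC; }
    @Override public Clock withZone(ZoneId zone) { return this; }
    @Override public Instant instant() { return now; }
}
```

With a `MutableClock`, a first call to `check()` runs the costly path, a second call 30 seconds later is skipped, and a call after more than a minute runs it again. The throttle depends entirely on the clock being available when `check()` is called.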

Downgrading to version `1.42`, where we patch in the code introduced in the original commit (https://github.com/jenkinsci/ec2-plugin/pull/346), fixes the issue, and we no longer observe these frequently occurring queue locks.

 

 
This message was sent by Atlassian Jira (v7.11.2#711002-sha1:fdc329d)

joyce.z.yee@gmail.com (JIRA)

Aug 9, 2019, 3:00:02 PM
to jenkinsc...@googlegroups.com
Joyce Yee updated an issue
Change By: Joyce Yee
After upgrading to 1.44.1, we are observing the return of long-standing queue locking as described in this PR ([https://github.com/jenkinsci/ec2-plugin/pull/346]).

It looks like the issue was reintroduced by this commit ([https://github.com/jenkinsci/ec2-plugin/commit/4973e9f4371b25abdcf78605380e12e57be336aa#diff-e577be8c21e85e6a1f2ff095f583f5d2R237]).

I think this is because the `clock` is now instantiated within the queue lock, which prevents `EC2RetentionStrategy` from skipping the `NodeProvisioner.update()` cycle triggered with `internalCheck` when the last check happened < 1 minute ago.

Downgrading to version `1.42`, where we patch in the code introduced in the original commit ([https://github.com/jenkinsci/ec2-plugin/pull/346]), fixes the issue, and we no longer observe these frequently occurring queue locks.

 

joyce.z.yee@gmail.com (JIRA)

Aug 9, 2019, 3:00:02 PM
to jenkinsc...@googlegroups.com
Joyce Yee updated an issue
After upgrading to 1.44.1, we are observing the return of long-standing queue locking as described in this PR ([https://github.com/jenkinsci/ec2-plugin/pull/346]).

It looks like the issue was reintroduced by this commit ([https://github.com/jenkinsci/ec2-plugin/commit/4973e9f4371b25abdcf78605380e12e57be336aa#diff-e577be8c21e85e6a1f2ff095f583f5d2R237]).

I think this is because the `clock` is now instantiated within the queue lock, which prevents `EC2RetentionStrategy` from skipping the `NodeProvisioner.update()` cycle triggered with `internalCheck` when the last check happened < 1 minute ago.

Downgrading to version `1.42`, where we patch in the code introduced in the original commit ([https://github.com/jenkinsci/ec2-plugin/pull/346]), fixes the issue, and we no longer observe these frequently occurring queue locks.

 

joyce.z.yee@gmail.com (JIRA)

Aug 9, 2019, 3:14:02 PM
to jenkinsc...@googlegroups.com
Joyce Yee updated an issue
After upgrading to 1.44.1, we are observing the return of long-standing queue locking as described in this PR ([https://github.com/jenkinsci/ec2-plugin/pull/346]).

It looks like the issue was reintroduced by this commit ([https://github.com/jenkinsci/ec2-plugin/commit/4973e9f4371b25abdcf78605380e12e57be336aa#diff-e577be8c21e85e6a1f2ff095f583f5d2R237]).

I think this is because the `clock` is now instantiated within the queue lock, which prevents `EC2RetentionStrategy` from skipping the `NodeProvisioner.update()` cycle triggered with `internalCheck` when the last check happened < 1 minute ago.

Downgrading to version `1.42`, where we patch in the code introduced in the original commit ([https://github.com/jenkinsci/ec2-plugin/pull/346]), fixes the issue, and we no longer observe these frequently occurring queue locks.

 

joyce.z.yee@gmail.com (JIRA)

Aug 9, 2019, 3:47:01 PM
to jenkinsc...@googlegroups.com
Joyce Yee updated an issue
After upgrading to 1.44.1, we are observing the return of long-standing queue locking as described in this PR ([https://github.com/jenkinsci/ec2-plugin/pull/346]).

It looks like the issue was reintroduced by this commit ([https://github.com/jenkinsci/ec2-plugin/commit/4973e9f4371b25abdcf78605380e12e57be336aa#diff-e577be8c21e85e6a1f2ff095f583f5d2R237]).

I think that since the `clock` is now only instantiated within `readResolve`, and not in the constructor of `EC2RetentionStrategy`, the problem is reintroduced for workers that are being newly created (not for workers that were persisted from XML). The clock may not exist when the `EC2RetentionStrategy` `check` method is invoked, which prevents the strategy from skipping the `NodeProvisioner.update()` cycle triggered with `internalCheck` if the last check happened < 1 minute ago. The solution is probably to instantiate the clock in both the constructor and `readResolve`.
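
The proposed fix (instantiate the clock in both the constructor and `readResolve`) could look roughly like the sketch below. This is an illustration under assumptions, not the plugin's code: `RetentionSketch`, `idleMinutes`, `hasClock`, and `roundTrip` are hypothetical names, and plain Java serialization is used as a stand-in for the XML persistence Jenkins uses.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.time.Clock;

// Hypothetical class for illustration; not the plugin's EC2RetentionStrategy.
class RetentionSketch implements Serializable {
    private static final long serialVersionUID = 1L;

    final int idleMinutes;
    private transient Clock clock; // not persisted, so it must be rebuilt after loading

    RetentionSketch(int idleMinutes) {
        this.idleMinutes = idleMinutes;
        this.clock = Clock.systemUTC(); // covers newly created instances
    }

    // Invoked after deserialization, where the constructor above does not run.
    private Object readResolve() {
        this.clock = Clock.systemUTC(); // covers instances restored from storage
        return this;
    }

    boolean hasClock() {
        return clock != null;
    }

    // Serializes and deserializes an instance in memory (stand-in for save/load).
    static RetentionSketch roundTrip(RetentionSketch in) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(in);
            }
            try (ObjectInputStream src =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (RetentionSketch) src.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With both initialisation paths in place, `hasClock()` is true for `new RetentionSketch(30)` as well as for the copy returned by `RetentionSketch.roundTrip(...)`. Dropping the assignment from the constructor would leave new instances without a clock, which corresponds to the case described above for newly created workers.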

Downgrading to version `1.42`, where we patch in the code introduced in the original commit ([https://github.com/jenkinsci/ec2-plugin/pull/346]), fixes the issue, and we no longer observe these frequently occurring queue locks.

 

fabrizio.manfredi@gmail.com (JIRA)

Aug 10, 2019, 3:03:03 PM
to jenkinsc...@googlegroups.com
FABRIZIO MANFREDI started work on Bug JENKINS-58879
 
Change By: FABRIZIO MANFREDI
Status: Open -> In Progress

fabrizio.manfredi@gmail.com (JIRA)

Aug 10, 2019, 3:04:02 PM
to jenkinsc...@googlegroups.com
FABRIZIO MANFREDI commented on Bug JENKINS-58879
 
Re: EC2 plugin queue locks have resurfaced

Some improvements were released in 1.45 to reduce the queries and the locking of the queue. Can you test it?

mrjunzehe@gmail.com (JIRA)

Aug 13, 2019, 11:18:03 AM
to jenkinsc...@googlegroups.com
Junze He commented on Bug JENKINS-58879
 
Re: EC2 plugin queue locks have resurfaced

We tested it with 1.45 and didn't notice any locks. Thanks!
