[JIRA] (JENKINS-57410) Splunk Plugin Causing threadlock When Testing Connections


rsmith@cloudbees.com (JIRA)

May 10, 2019, 6:09:03 PM5/10/19
to jenkinsc...@googlegroups.com
Ryan Smith created an issue
 
Jenkins / Bug JENKINS-57410
Splunk Plugin Causing threadlock When Testing Connections
Issue Type: Bug
Assignee: Ted
Components: splunk-devops-plugin
Created: 2019-05-10 22:08
Environment: CloudBees Jenkins Enterprise 2.138.2.2-rolling, Splunk Plugin for Jenkins 1.7.1
Labels: Splunk
Priority: Minor
Reporter: Ryan Smith

Filing on behalf of john...@viasat.com:

We are using the Splunk Plugin for Jenkins 1.7.1 (latest).

I am seeing some odd behavior and would like some further feedback on it, please.

In the Configure System section there is an option to test the Splunk connection. If this is pressed several times, it appears that the two splunkins-worker threads become locked indefinitely until Jenkins is restarted (no data is sent to Splunk in the meantime). Looking at netstat, there are several sockets in CLOSE_WAIT to the Splunk HTTP input host.
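
To illustrate the suspected failure mode, here is a minimal, self-contained Java sketch (this is not the plugin's actual code, just the pattern): with a fixed pool of two workers, two tasks that block forever on a call that never returns are enough to starve every task queued behind them.

{code:java}
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch of the suspected failure mode -- NOT the plugin's code.
// A fixed pool of two workers (like the two splunkins-worker threads) is
// starved once two tasks block forever on a call that never returns.
public class WorkerStarvationDemo {
    public static void main(String[] args) throws InterruptedException {
        ExecutorService workers = Executors.newFixedThreadPool(2);
        CountDownLatch neverReleased = new CountDownLatch(1); // stands in for a socket read with no timeout

        for (int i = 1; i <= 5; i++) {
            final int id = i;
            workers.submit(() -> {
                System.out.println("connection test " + id + " started");
                try {
                    neverReleased.await(); // blocks indefinitely, like a stuck HTTP call
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        Thread.sleep(500);
        // Only tests 1 and 2 ever print "started"; 3-5 wait in the queue
        // forever, so nothing else (e.g. log events) gets sent to Splunk.
        workers.shutdownNow();
    }
}
{code}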

I am able to reproduce this at will within our test environment. The connection test appears to use a callback to Jenkins, which sits behind NGINX. Once in this state, additional attempts to test the connection fail with a 504 timeout because the Jenkins upstream never responds.

Here is the NGINX log of interest:

{code}
2019/04/29 16:01:36 [error] 9051#9051: *831865 upstream prematurely closed connection while reading response header from upstream, client: 10.68.153.30, server: wdc1jenkinst01.hq.corp.viasat.com, request: "POST /descriptorByName/com.splunk.splunkjenkins.SplunkJenkinsInstallation/testHttpInput HTTP/1.1", upstream: "http://127.0.0.1:8080/descriptorByName/com.splunk.splunkjenkins.SplunkJenkinsInstallation/testHttpInput", host: "jenkins.test.viasat.com", referrer: "https://jenkins.test.viasat.com/configure"
{code}

Here is the netstat output as well:

Before:

{code}
Mon Apr 29 16:46:47 MDT 2019
[root@wdc1jenkinst01 ~]# netstat -ntap | grep java|grep :8088
tcp 0 0 10.68.154.134:35287 10.137.14.19:8088 ESTABLISHED 22706/java
tcp 0 0 10.68.154.134:35291 10.137.14.19:8088 ESTABLISHED 22706/java
{code}

After:

{code}
date;netstat -natp|grep java|grep :8088 
Mon Apr 29 16:52:33 MDT 2019 
tcp 102 0 10.68.154.134:11909 10.137.14.79:8088 CLOSE_WAIT 22706/java 
tcp 102 0 10.68.154.134:36547 10.137.14.19:8088 CLOSE_WAIT 22706/java 
tcp 102 0 10.68.154.134:11895 10.137.14.79:8088 CLOSE_WAIT 22706/java 
tcp 102 0 10.68.154.134:36207 10.137.14.19:8088 CLOSE_WAIT 22706/java 
tcp 102 0 10.68.154.134:36569 10.137.14.19:8088 CLOSE_WAIT 22706/java 
tcp 102 0 10.68.154.134:36581 10.137.14.19:8088 CLOSE_WAIT 22706/java 
tcp 102 0 10.68.154.134:11925 10.137.14.79:8088 CLOSE_WAIT 22706/java 
tcp 102 0 10.68.154.134:36575 10.137.14.19:8088 CLOSE_WAIT 22706/java 
tcp 102 0 10.68.154.134:11885 10.137.14.79:8088 CLOSE_WAIT 22706/java 
tcp 102 0 10.68.154.134:36529 10.137.14.19:8088 CLOSE_WAIT 22706/java 
tcp 102 0 10.68.154.134:36531 10.137.14.19:8088 CLOSE_WAIT 22706/java 
tcp 102 0 10.68.154.134:11919 10.137.14.79:8088 CLOSE_WAIT 22706/java 
tcp 102 0 10.68.154.134:36553 10.137.14.19:8088 CLOSE_WAIT 22706/java 
tcp 102 0 10.68.154.134:11915 10.137.14.79:8088 CLOSE_WAIT 22706/java 
tcp 102 0 10.68.154.134:36519 10.137.14.19:8088 CLOSE_WAIT 22706/java 
tcp 102 0 10.68.154.134:11901 10.137.14.79:8088 CLOSE_WAIT 22706/java 
tcp 102 0 10.68.154.134:36541 10.137.14.19:8088 CLOSE_WAIT 22706/java 
tcp 102 0 10.68.154.134:36223 10.137.14.19:8088 CLOSE_WAIT 22706/java 
tcp 102 0 10.68.154.134:36511 10.137.14.19:8088 CLOSE_WAIT 22706/java 
tcp 102 0 10.68.154.134:36563 10.137.14.19:8088 CLOSE_WAIT 22706/java
{code}
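A socket stuck in CLOSE_WAIT means the peer (the Splunk HTTP Event Collector on :8088) has closed its side and the JVM never closed its own, which usually points at HTTP responses that are never consumed or released. As a hedged sketch only, assuming Apache HttpClient (an assumption; the thread does not show which client the plugin uses, and the URL and token below are placeholders), the leak is avoided by always closing the response:

{code:java}
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.entity.StringEntity;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not the plugin's code. If the response is never
// closed or its entity never consumed, the pooled connection is never
// returned, and the socket lingers in CLOSE_WAIT after the server hangs up.
public class HecPostSketch {
    public static void main(String[] args) throws IOException {
        String hecUrl = "https://splunk.example.com:8088/services/collector"; // placeholder host
        try (CloseableHttpClient client = HttpClients.createDefault()) {
            HttpPost post = new HttpPost(hecUrl);
            post.setHeader("Authorization", "Splunk <token>"); // placeholder token
            post.setEntity(new StringEntity("{\"event\":\"connection test\"}", StandardCharsets.UTF_8));
            // try-with-resources guarantees the response (and its connection
            // lease) is released even on error paths; skipping this is the
            // classic source of CLOSE_WAIT buildup.
            try (CloseableHttpResponse response = client.execute(post)) {
                EntityUtils.consume(response.getEntity());
                System.out.println(response.getStatusLine());
            }
        }
    }
}
{code}
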
Attached is a thread dump taken before and after. Please let me know if anything else is needed to investigate further; I can reproduce this at will, and since this is a test environment we can do pretty much whatever you need to gather the relevant data.
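
For anyone gathering the same evidence: besides jstack, an equivalent dump can be captured with the standard ThreadMXBean API, sketched below (for example from the Jenkins script console); the splunkins-worker threads should show up blocked in a socket read.

{code:java}
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;

// One way to capture a thread dump from any JVM without shell access;
// jstack <pid> gives the same data.
public class DumpThreads {
    public static void main(String[] args) {
        ThreadInfo[] threads =
                ManagementFactory.getThreadMXBean().dumpAllThreads(true, true);
        for (ThreadInfo info : threads) {
            // ThreadInfo.toString() prints the thread name, state, locks
            // held, and a (possibly truncated) stack trace.
            System.out.print(info);
        }
    }
}
{code}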


rsmith@cloudbees.com (JIRA)

May 10, 2019, 6:10:02 PM5/10/19
to jenkinsc...@googlegroups.com
Ryan Smith updated an issue
Change By: Ryan Smith
Environment: CloudBees Jenkins Enterprise 2.138.2.2-rolling
Splunk Plugin for Jenkins 1.7.1

rsmith@cloudbees.com (JIRA)

May 10, 2019, 6:10:03 PM5/10/19
to jenkinsc...@googlegroups.com
Ryan Smith updated an issue

rsmith@cloudbees.com (JIRA)

May 10, 2019, 6:10:04 PM5/10/19
to jenkinsc...@googlegroups.com
Ryan Smith updated an issue

rsmith@cloudbees.com (JIRA)

May 10, 2019, 6:11:03 PM5/10/19
to jenkinsc...@googlegroups.com
Ryan Smith updated an issue

rsmith@cloudbees.com (JIRA)

May 10, 2019, 6:11:05 PM5/10/19
to jenkinsc...@googlegroups.com
Ryan Smith updated an issue

rsmith@cloudbees.com (JIRA)

May 10, 2019, 6:46:02 PM5/10/19
to jenkinsc...@googlegroups.com
Ryan Smith updated an issue
Change By: Ryan Smith
Attachment: threads_before.out
Attachment: threads_after.out

xiao.xj@outlook.com (JIRA)

May 20, 2019, 12:51:03 PM5/20/19
to jenkinsc...@googlegroups.com
Ted updated Bug JENKINS-57410
 

Fixed in 1.7.2.

Change By: Ted
Status: Open → Fixed but Unreleased
Resolution: Fixed
