Jenkins jobs not completing

rssouthw

Dec 20, 2021, 4:27:50 PM
to Jenkins Users

We use Jenkins to check that various things are running on our infrastructure hosts.
This is typically done with something like:

     ssh hostname /the/script/to/check

If the check exits with a zero status, Jenkins goes on to mark the build as good.

Starting sometime in the last couple of months, the script and the ssh both exit with a zero status,
but Jenkins hangs and does not mark the job as complete for a long time, close to 30 minutes.

If we invoke these manually, there is no issue. The check runs very quickly (in under a couple of seconds).

We've updated Jenkins several times (now on 2.325) in an attempt to fix the issue.

Any ideas where to look?

Thanks in advance.


Jeremy Mordkoff

Dec 21, 2021, 10:56:12 AM
to Jenkins Users
30 minutes sounds like a TCP timeout. Are there any firewalls or NAT devices in the path? Sometimes they close the connection after the first FIN is sent but before the last FIN-ACK, and that can cause SSH to hang.

A second possibility is asymmetric routes, where the replies come in on a different interface than the outgoing packets.

In either case, tcpdump can pinpoint the issue. 
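
A minimal sketch of what such a capture could look like, assuming a Linux Jenkins host, SSH on the default port 22, and root access to run tcpdump. The helper name `ssh_capture_cmd` and the capture file name are made up for illustration; it only prints the command so it can be reviewed before running it.

```shell
# Build a tcpdump command line that captures SSH traffic to one check host.
# Assumptions: port 22, and "-i any" (Linux only) so that replies arriving
# on a different interface are captured too.
ssh_capture_cmd() {
    target_host="$1"
    capture_file="${2:-ssh-check.pcap}"
    # -nn: no name or port lookups; -w: write raw packets to a file
    printf 'tcpdump -nn -i any -w %s host %s and tcp port 22\n' \
        "$capture_file" "$target_host"
}

# Print the command for the placeholder host from the original post.
ssh_capture_cmd hostname
```

Run the printed command as root while one of the check jobs is running, then stop it once the job finally completes. Something like `tcpdump -nn -r ssh-check.pcap 'tcp[tcpflags] & (tcp-fin|tcp-rst) != 0'` should then show only the FIN and RST packets, which is where a dropped or one-sided connection teardown would be visible.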

rssouthw

Dec 22, 2021, 1:29:59 PM
to Jenkins Users
Nothing like a TCP timeout, I think. I forgot to mention that for my manual check I run the exact same ssh /path command, as the same user as our Jenkins process, from the same host that Jenkins runs on. It always completes very quickly.
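
For reference, a manual check along these lines can be sketched as a small wrapper that reports the exit status and wall-clock time of whatever command it is given. The function name `timed_check` is made up for illustration, and the `jenkins` account name in the comment is an assumption about how the service user is named.

```shell
# Run a command, then report its exit status and elapsed seconds.
timed_check() {
    start=$(date +%s)
    status=0
    "$@" || status=$?          # capture the status without aborting under "set -e"
    end=$(date +%s)
    echo "exit=$status elapsed=$((end - start))s"
    return "$status"
}

# Example invocation (not executed here), run as the Jenkins service user:
#   sudo -u jenkins sh -c '. ./timed_check.sh; timed_check ssh hostname /the/script/to/check'
```

If the same wrapper is used inside the Jenkins job, the console log shows how long the ssh itself took, which separates time spent in the check from time Jenkins spends afterwards.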

If you click down into the console log of the job itself, you can see it completes (several of our scripts say "done" or "exiting" at the end).
The project page also shows it as complete, but the very top page shows Jenkins still thinking it's running. The job can't be cancelled with the "X".

But yes, it has something to do with the job being on the other end of an ssh, but it's not clear what. If the check is on the same host, Jenkins works great. Again, we've been using this methodology for years, and it's worked great. The issue has only shown up in the last 2-3 months.

Jeremy Mordkoff

Dec 22, 2021, 1:56:54 PM
to jenkins...@googlegroups.com
I'm assuming then that there's no network activity at the end of the 30 minute window. What is the master doing? Perhaps cleaning up old builds? How many do you keep? FYI...I normally keep about 100 per pipeline & branch. 

Jeremy Mordkoff
Director, Engineering Services

rssouthw

Dec 22, 2021, 5:12:56 PM
to Jenkins Users
Typically, we keep only about the last 10 or so for these "check" jobs. 
We also have real builds on this server, and those are probably not configured to have a max.
Let me grep thru the config.xml files (faster than looking thru the UI) to see what kind of distribution I have.
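
A rough sketch of that grep, assuming the jobs live under `$JENKINS_HOME/jobs/<job>/config.xml` and that the build retention setting is stored in a `<numToKeep>` element (jobs without one are tallied as "unset"). The function name `num_to_keep_distribution` is made up for illustration.

```shell
# Tally the <numToKeep> value of every config.xml found under a jobs directory.
num_to_keep_distribution() {
    jobs_dir="$1"
    find "$jobs_dir" -name config.xml | while IFS= read -r cfg; do
        # Take the first <numToKeep> value in the file (may be -1 for "no limit").
        value=$(sed -n 's:.*<numToKeep>\(-\{0,1\}[0-9][0-9]*\)</numToKeep>.*:\1:p' "$cfg" | head -n 1)
        echo "${value:-unset}"
    done | sort | uniq -c | sort -rn    # count per value, most common first
}
```

Calling it as `num_to_keep_distribution "${JENKINS_HOME:-/var/lib/jenkins}/jobs"` prints one line per distinct value with the number of jobs using it; the `/var/lib/jenkins` fallback is only a guess at a typical install location. The search is recursive, so jobs nested inside folders are included as well.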