Hi,
I'm seeing a range of instability on the Ubuntu EC2 build agents on ci.jenkins.io over the past week, including tests randomly taking forever and timing out, and instances randomly being terminated mid-build.
This didn't happen previously, when I think the builds were running in Azure.
Any ideas?
Chris
--
You received this message because you are subscribed to the Google Groups "Jenkins Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-de...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/jenkinsci-dev/cd9d4a3d-fdfb-460a-805b-7bfb12d47d2d%40www.fastmail.com.
Hi Mark,
Thanks, that explains it. Off-hand thought - can AWS PrivateLink - or the Azure equivalent - do anything to improve connection reliability in our multi cloud setup? (I know these services are more intended for on prem to cloud links, but maybe it can help here.)
On Thu, May 7, 2020 at 12:18 PM Chris Kilding <ch...@chriskilding.com> wrote:
> Hi Mark,
> Thanks, that explains it. Off-hand thought - can AWS PrivateLink - or the Azure equivalent - do anything to improve connection reliability in our multi cloud setup? (I know these services are more intended for on prem to cloud links, but maybe it can help here.)

I'm hesitant to speculate on solutions until we better understand the problem. I run 11 agents on AWS spot instances that are connected to a master at my house. There are times when those 11 AWS computers are also connected to a Jenkins master on a different computer. I've run that configuration for a year or more and have not had more than 3 or 4 connectivity issues.

I am reasonably confident that the network between the AWS data center and the Azure data center is no worse than the network between the master inside my house and the 11 spot instances running on AWS. However, that's just speculation on my part.
In the meantime my workaround is to push trivial commits when I need it to try again. That suffices during PR development, but I can’t do that in master for the fun of it.
Chris

On Thu, 7 May 2020, at 4:12 PM, Mark Waite wrote:
On Thu, May 7, 2020 at 8:27 AM Chris Kilding <ch...@chriskilding.com> wrote:
> Hi,
> I'm seeing a range of instability on the Ubuntu EC2 build agents on ci.jenkins.io over the past week, including tests randomly taking forever and timing out, and instances randomly being terminated mid-build.

Yes, plenty of ideas, but no solutions yet. Sorry for the unreliability of those build agents.

We were hosting all our build agents in Azure when Microsoft was sponsoring the Jenkins project infrastructure. That sponsorship expired late in 2019, so the Jenkins project was paying the full price of its infrastructure hosted on Azure.

AWS became a sponsor in early 2020. In order to immediately reduce costs, we've added AWS EC2 agents using the EC2 plugin. The Jenkins master and the Jenkins containerized agents ("ACI") continue to run on Azure for now, while the Ubuntu agents are now provisioned by the EC2 plugin.

Unfortunately, the agents provisioned by the EC2 plugin randomly lose their connection to the Jenkins master on Azure. That loss of connection disrupts the job that is running on the agent and causes the types of annoying behaviors that you have seen.

We're currently putting first focus on completing the core release automation project so that we can deliver Jenkins weekly, LTS, and security releases without requiring Kohsuke to perform the release. Jenkins weekly releases 2.232, 2.233, and 2.234 were delivered from core release automation without requiring Kohsuke to take any action. We're working on the automation for long term support releases and for security releases.

Intense focus on the EC2 agent connection failures will have to wait until we either have more people to assist with infrastructure or we have completed the core release automation project. Those who would like to assist with Jenkins infrastructure are welcome to join the weekly infrastructure meetings and to chat on the IRC channel #jenkins-infra.

Thanks,
Mark Waite
I can't see into the instance logs as a ci.jenkins.io user, so I can't say if this is the cause or not. Maybe the Jenkins infra people can chime in?
I can say that most of the time, only a random handful of tests fail with timeouts while the rest pass, though occasionally the whole suite fails at the integration-test phase due to a timeout.