Restarting Jenkins master

Bjoern Metzdorf

Apr 3, 2015, 5:42:54 PM

to jenkin...@googlegroups.com

Hi,

I am looking into how to restart a Jenkins master while builds are running on Mesos-based slaves and have a couple of questions:

1. It looks like FrameworkID is not saved in state and thus also not provided upon startup, meaning as soon as the scheduler gets stopped, all tasks are stopped as well (FailoverTimeout would also have to be provided). I have developed a small patch to address this, but contributing to upstream will take some time.

I ran into one issue though: If FrameworkID is passed to MesosSchedulerDriver after the FailoverTimeout has been reached, registration will fail with driver status "aborted", because the FrameworkID is invalid (I'm running this on OSX with libmesos-0.22). Was that a recent change in 0.22? Is it sufficient to check for driver status aborted and register without a FrameworkID as a brand new scheduler?

2. When restarting Jenkins with FrameworkID set (and FailoverTimeout previously set), it seems as if the scheduler is able to pick up the Mesos tasks again, but Jenkins does not see the executors again (nor does it resume the tasks from before the master restart). It looks, though, as if Jenkins persisted the active slaves in config.xml correctly. I still need to test that a bit better, but maybe you guys know if that is supposed to work or not?
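To make the fallback asked about in question 1 concrete, here is a minimal sketch of the control flow only: persist the FrameworkID, try to re-register with it, and register as a brand-new framework if the driver reports "aborted". It does not call the Mesos API. `start_driver` is a hypothetical wrapper around whatever starts the scheduler driver, and the status strings and state-file path are assumptions for illustration. Whether an aborted status always means a stale FrameworkID is not established here, so a real implementation may want to inspect the error before retrying.

```python
import os
from typing import Callable, Optional, Tuple

# Hypothetical status strings; a real driver reports its own status values.
DRIVER_RUNNING = "running"
DRIVER_ABORTED = "aborted"

# Hypothetical wrapper type: takes a FrameworkID (or None) and returns
# (driver status, FrameworkID assigned by the master).
StartDriver = Callable[[Optional[str]], Tuple[str, Optional[str]]]


def load_framework_id(path: str) -> Optional[str]:
    """Return the saved FrameworkID, or None if nothing usable was saved."""
    try:
        with open(path, encoding="utf-8") as fh:
            return fh.read().strip() or None
    except FileNotFoundError:
        return None


def save_framework_id(path: str, framework_id: str) -> None:
    """Write the FrameworkID via a temp file so a crash leaves no partial file."""
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as fh:
        fh.write(framework_id)
    os.replace(tmp, path)


def register(start_driver: StartDriver, path: str) -> str:
    """Try the saved FrameworkID first; fall back to a fresh registration."""
    saved = load_framework_id(path)
    status, assigned = start_driver(saved)
    if status == DRIVER_ABORTED and saved is not None:
        # The saved ID was rejected (for example, FailoverTimeout expired):
        # retry without an ID, i.e. as a brand-new framework.
        status, assigned = start_driver(None)
    if status == DRIVER_RUNNING and assigned:
        save_framework_id(path, assigned)
    return status
```

With this shape, a restart inside the failover window reuses the old ID, while a restart after the window costs one failed attempt and then proceeds as a new framework.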

Thanks!

Regards,

Bjoern

Geoffroy Jabouley

Apr 4, 2015, 2:27:36 PM

to jenkin...@googlegroups.com

Hello

I am not 100% sure of this, but I think that when a Jenkins master is force-restarted, the currently running jobs are lost and the job queue is cleared.

This is common Jenkins behavior, so I am not sure the Mesos plugin can do anything about it.

By using <jenkins_url>/safeRestart to restart your instance, Jenkins will:
- wait for currently running jobs to complete
- restart
- keep queued jobs so they are launched after the restart
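As a rough illustration, the restart can be triggered with a plain HTTP POST. The sketch below only builds the request; the hostname is a placeholder, and on a secured instance you would most likely also need credentials and possibly a CSRF crumb, depending on how Jenkins is configured (both are assumptions and not shown here). The `safe=False` branch targets `/restart`, the immediate-restart endpoint.

```python
import urllib.request


def build_restart_request(jenkins_url: str, safe: bool = True) -> urllib.request.Request:
    """Build (but do not send) the POST request for a Jenkins restart."""
    endpoint = "safeRestart" if safe else "restart"
    url = jenkins_url.rstrip("/") + "/" + endpoint
    # An empty body plus method="POST" yields a POST with no payload.
    return urllib.request.Request(url, data=b"", method="POST")


if __name__ == "__main__":
    # Placeholder URL; replace with your own Jenkins base URL.
    req = build_restart_request("http://jenkins.example.com:8080")
    print(req.get_method(), req.full_url)
    # To actually trigger the restart: urllib.request.urlopen(req)
```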

There is also a CloudBees plugin to restart aborted builds after a forced restart or a crash: https://www.cloudbees.com/products/jenkins-enterprise/plugins/restart-aborted-builds-plugin

See http://stackoverflow.com/questions/8072700/jenkins-manual-restart for information about safely restarting Jenkins.

Regards

--
You received this message because you are subscribed to the Google Groups "Jenkins Mesos Plugin" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jenkins-meso...@googlegroups.com.
To post to this group, send email to jenkin...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/jenkins-mesos/CAHCJKBj_dL2Sttn2riJ-gWe1%2B55kNX%3DEZ4J1EGUK5y0UO0zzYw%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Bjoern Metzdorf

Apr 6, 2015, 8:24:45 PM

to jenkin...@googlegroups.com

Hi Geoffroy,

thanks for the pointers. I actually stumbled across similar ones before and couldn't believe that Jenkins jobs don't survive a restart of their master and get aborted. This means that you are at the mercy of your users' job workload for restarting your master. This encourages a sharded multi-master setup, but even then long-running builds could mess up the restart schedule (in the case of safeRestart).

But at least the executors should come back up after a restart (JNLP based slaves generally support this), even if the actual builds got aborted. It seems as if some plumbing was missing in the mesos-plugin to make that happen reliably though.

Vinod, any idea?

Thanks!

Regards,

Bjoern

Manivannan

Apr 9, 2015, 3:02:46 PM

to Bjoern Metzdorf, jenkin...@googlegroups.com

Hi Bjoern,

Please see inline.

Thanks,

Mani

On Tue, Apr 7, 2015 at 5:54 AM, Bjoern Metzdorf <bjo...@metzdorf.de> wrote:

Hi Geoffroy,

thanks for the pointers. I actually stumbled across similar ones before and couldn't believe that Jenkins jobs don't survive a restart of their master and get aborted. This means that you are at the mercy of your users' job workload for restarting your master. This encourages a sharded multi-master setup, but even then long-running builds could mess up the restart schedule (in the case of safeRestart).

- You're right, that's Jenkins behavior. If you want to force a restart, you could do an HTTP POST to the endpoint http://yourjenkins/restart

But at least the executors should come back up after a restart (JNLP based slaves generally support this), even if the actual builds got aborted. It seems as if some plumbing was missing in the mesos-plugin to make that happen reliably though.

- Even if the slave comes back online, the Jenkins build is going to start from the beginning anyway, right? I mean it won't start from the point where it was aborted. In that case I believe provisioning a new slave is as good as bringing the old one back to life.

Bjoern Metzdorf

Apr 9, 2015, 4:01:58 PM

to jenkins-mesos

Hey Mani,

yes, generally speaking there is not much difference. But there are cases when it makes sense to keep the slave alive, for example if you want to retain your repo across builds so you don't have to check it out again, or if you have an m2 cache per workspace, or if spinning up a new Docker container takes too much time for your use case, and so on.

I agree, it's not crucial, but would be nice to have.

Regards,

Bjoern

Manivannan

Apr 10, 2015, 8:19:04 AM

to Bjoern Metzdorf, jenkins-mesos

That is true.

But we have had this specific issue where each build was cloning and downloading artifacts, and we solved it in a different way.

Instead of keeping the workspace on local disk, we use an NFS mount. This mount can then be configured as the 'Remote FS root' in the Mesos plugin configuration.

Vinod Kone

Apr 19, 2015, 6:34:45 PM

to Manivannan, Bjoern Metzdorf, jenkins-mesos

Checkpointing and restoring FrameworkID will be a great addition to the plugin. Looking forward to that patch!

Having said that, if Jenkins itself doesn't support reconnecting Jenkins slaves with a restarted Jenkins master (on the same machine), it is not of much use. I haven't played with HA aspects of Jenkins, but if it works I don't think the plugin has to do much (if anything) to support it.

-- Vinod
