Jira (PUP-5823) Puppet agent with splay on windows begets random interval between runs


Chris Spence (JIRA)

Feb 5, 2016, 6:32:03 AM
to puppe...@googlegroups.com
Chris Spence created an issue
 
Puppet / Bug PUP-5823
Puppet agent with splay on windows begets random interval between runs
Issue Type: Bug
Assignee: Kylo Ginsberg
Components: Client
Created: 2016/02/05 3:31 AM
Priority: Normal
Reporter: Chris Spence

Further to the thread at https://groups.google.com/forum/?fromgroups#!searchin/puppet-users/splay/puppet-users/56pWWnUslR8/C7iO-nmeCQAJ I'm finally getting round to logging this as an issue:

We are running Puppet 3.7.x daemonised on Windows. We recently turned on splay because reasons. Since then the interval between daemonised runs has, counter to expectation, become randomised, although the average run interval is still roughly 30 minutes. The Linux nodes here have identical config and their runs are as regular as a muesli-eating vegetarian. Here are some report times from a representative Windows 2012 server, with the approximate gap in minutes since the previous report in parentheses:
Oct 21 2015 - 16:03:27 (14)
Oct 21 2015 - 15:49:10 (28)
Oct 21 2015 - 15:21:14 (33)
Oct 21 2015 - 14:48:23 (32)
Oct 21 2015 - 14:16:47 (23)
Oct 21 2015 - 13:53:35 (36)
Oct 21 2015 - 13:17:57 (24)
Oct 21 2015 - 12:54:02 (29)
Oct 21 2015 - 12:25:20 (53)
Oct 21 2015 - 11:31:48
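The gaps between those reports can be recomputed directly from the timestamps. Below is a minimal Python sketch using only the standard library. The format string is an assumption matched to the layout of the list above. Gaps are rounded to the nearest minute, so they may not match the hand-computed figures in parentheses exactly.

```python
from datetime import datetime

# Report times copied from the list above (newest first, as listed).
REPORT_TIMES = [
    "Oct 21 2015 - 16:03:27",
    "Oct 21 2015 - 15:49:10",
    "Oct 21 2015 - 15:21:14",
    "Oct 21 2015 - 14:48:23",
    "Oct 21 2015 - 14:16:47",
    "Oct 21 2015 - 13:53:35",
    "Oct 21 2015 - 13:17:57",
    "Oct 21 2015 - 12:54:02",
    "Oct 21 2015 - 12:25:20",
    "Oct 21 2015 - 11:31:48",
]

def gaps_in_minutes(stamps):
    """Minutes between each report and the previous one, oldest first."""
    times = sorted(datetime.strptime(s, "%b %d %Y - %H:%M:%S") for s in stamps)
    return [(later - earlier).total_seconds() / 60
            for earlier, later in zip(times, times[1:])]

gaps = gaps_in_minutes(REPORT_TIMES)
print("gaps (min):", [round(g) for g in gaps])
# gaps (min): [54, 29, 24, 36, 23, 32, 33, 28, 14]
print(f"min {min(gaps):.1f}, max {max(gaps):.1f}, mean {sum(gaps) / len(gaps):.1f}")
# min 14.3, max 53.5, mean 30.2
```

The individual gaps swing from about 14 to 54 minutes, yet the mean stays close to 30. This is the pattern described in the report.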

This message was sent by Atlassian JIRA (v6.4.12#64027-sha1:e3691cc)

Josh Cooper (JIRA)

Mar 1, 2016, 1:48:03 AM

Glenn Sarti (JIRA)

May 5, 2016, 6:24:03 PM
Glenn Sarti commented on Bug PUP-5823
 
Re: Puppet agent with splay on windows begets random interval between runs

Copying Josh's comment...

In 2.7-3.1, splay behaved the same on Windows and *nix: the splay delay varied on every agent run. The behavior was changed on *nix in 3.2.0 for https://projects.puppetlabs.com/issues/14766. As part of that change, runinterval is calculated relative to the start of each agent run (instead of the end), and splay is applied only to the first run, not to the second and later runs. That way, if you're running Puppet via cron, the first daemonized run will splay, and each subsequent run should start a constant runinterval seconds after the previous one. Previously, users would see "peaks" as runs started to overlap.
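The difference between the two *nix models can be illustrated with a small simulation. This is a simplified Python sketch, not Puppet's scheduler. It makes several assumptions: splay is drawn uniformly from [0, splaylimit); splaylimit is left at its default, which equals runinterval; runinterval is 30 minutes; and each run takes a fixed time. The function names are hypothetical. Exactly where the pre-3.2 agent applied splay relative to its interval timer is also an assumption.

```python
import random

RUNINTERVAL = 30 * 60      # seconds (Puppet's default runinterval of 30m)
SPLAYLIMIT = RUNINTERVAL   # assumed: splaylimit left at its default, runinterval

def pre_32_starts(runs, run_seconds, rng):
    """Hypothetical model of 2.7-3.1: a fresh splay before every run, and the
    next runinterval is counted from the END of the previous run."""
    starts, t = [], 0.0
    for _ in range(runs):
        t += rng.uniform(0, SPLAYLIMIT)   # splay varies on every run
        starts.append(t)
        t += run_seconds + RUNINTERVAL    # run, then wait a full runinterval
    return starts

def post_32_starts(runs, rng):
    """Hypothetical model of the 3.2.0+ *nix daemon: splay only before the first
    run; each later run starts runinterval seconds after the previous START."""
    first = rng.uniform(0, SPLAYLIMIT)
    return [first + i * RUNINTERVAL for i in range(runs)]

def gaps_in_minutes(starts):
    # Gap between consecutive run start times, in minutes.
    return [(b - a) / 60 for a, b in zip(starts, starts[1:])]

rng = random.Random(0)
print("pre-3.2 :", [round(g) for g in gaps_in_minutes(pre_32_starts(6, 120, rng))])
print("3.2.0+  :", [round(g) for g in gaps_in_minutes(post_32_starts(6, rng))])
```

In the first model, every gap is at least runinterval plus the run time and can stretch by up to a full splaylimit, so the schedule drifts. In the second model, the gaps are a constant 30 minutes.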

I'm not sure how well known or understood the "new" *nix behavior is. The documentation for the splay and splaylimit settings could use updating (https://docs.puppetlabs.com/references/latest/configuration.html#splay), and I know of at least one related ticket: https://tickets.puppetlabs.com/browse/PUP-5542.

Assuming the current *nix behavior is correct, I agree that Windows is not, and that this should be ticketed. On Windows, Puppet runs as a service that essentially runs `puppet agent --onetime` every runinterval seconds (https://github.com/puppetlabs/puppet/blob/master/ext/windows/service/daemon.rb#L74) and relies on the agent to splay each time. To match the *nix behavior, the daemon would need to calculate the initial splay (using splay and splaylimit) itself and tell the agent to ignore those settings, so we don't get double splay.
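The Windows behavior described here, and the proposed fix, can be sketched the same way. This Python sketch is a simplified model under stated assumptions, not the code in daemon.rb. It assumes the service launches the agent exactly every runinterval seconds and that each `--onetime` run splays by a value drawn uniformly from [0, splaylimit). It also assumes splaylimit is left at its default, which equals runinterval. The function names are hypothetical, and the sketch does not say how the service would tell the agent to skip its own splay. Under these assumptions, each gap equals runinterval plus the difference of two independent splays. The gaps therefore fall between runinterval - splaylimit and runinterval + splaylimit (0 to 60 minutes with a 30-minute runinterval), while averaging out to runinterval. This is consistent with the 14 to 53 minute gaps in the original report.

```python
import random

RUNINTERVAL = 30 * 60      # seconds (Puppet's default runinterval of 30m)
SPLAYLIMIT = RUNINTERVAL   # assumed: splaylimit left at its default, runinterval

def windows_service_starts(runs, rng):
    """Current model: the service launches `puppet agent --onetime` every
    RUNINTERVAL seconds, and every launch applies its own random splay."""
    return [i * RUNINTERVAL + rng.uniform(0, SPLAYLIMIT) for i in range(runs)]

def proposed_service_starts(runs, rng):
    """Proposed model: the service splays once before the first launch and the
    agent's own splay is skipped, so there is no double splay."""
    first = rng.uniform(0, SPLAYLIMIT)
    return [first + i * RUNINTERVAL for i in range(runs)]

def gaps_in_minutes(starts):
    # Gap between consecutive run start times, in minutes.
    return [(b - a) / 60 for a, b in zip(starts, starts[1:])]

rng = random.Random(2016)
current = gaps_in_minutes(windows_service_starts(10, rng))
proposed = gaps_in_minutes(proposed_service_starts(10, rng))
print("current :", [round(g) for g in current])
print(f"          mean {sum(current) / len(current):.1f} min")
print("proposed:", [round(g) for g in proposed])
```

Moving the single splay into the service gives the Windows agent the same shape of schedule as the 3.2.0+ *nix daemon.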


Kylo Ginsberg (JIRA)

May 5, 2016, 8:02:03 PM
Kylo Ginsberg assigned an issue to Unassigned
 
Change By: Kylo Ginsberg
Assignee: Kylo Ginsberg

Glenn Sarti (JIRA)

May 24, 2016, 6:58:05 PM

Rob Reynolds (JIRA)

May 25, 2016, 1:07:03 PM

Craig Gomes (JIRA)

May 27, 2016, 12:10:04 PM

Craig Gomes (JIRA)

May 27, 2016, 12:11:05 PM

Kenaz Kwa (JIRA)

Aug 29, 2016, 7:47:22 PM
Kenaz Kwa updated an issue
Change By: Kenaz Kwa
Team: Agent & Platform Support

Josh Cooper (JIRA)

May 16, 2017, 1:43:03 PM
Josh Cooper assigned an issue to Unassigned
Change By: Josh Cooper
Assignee: Daniel Lu

Josh Cooper (JIRA)

May 16, 2017, 1:43:05 PM

Geoff Nichols (JIRA)

Apr 16, 2018, 9:57:02 PM
Geoff Nichols updated an issue
Change By: Geoff Nichols
Labels: service windows

Branan Riley (JIRA)

May 10, 2018, 8:27:02 PM
Branan Riley updated an issue
Change By: Branan Riley
Labels: service daemon triaged windows windows-parity

Kevin Parry (JIRA)

Apr 26, 2019, 8:36:02 AM
Kevin Parry commented on Bug PUP-5823
 
Re: Puppet agent with splay on windows begets random interval between runs

Hi, has any attempt been made to resolve this issue?

Kevin Reeuwijk (Jira)

Jun 8, 2020, 1:54:03 PM
Kevin Reeuwijk updated an issue
 
Change By: Kevin Reeuwijk
Attachment: agent_startup_time.png

Kevin Reeuwijk (Jira)

Jun 8, 2020, 1:54:04 PM

Kevin Reeuwijk (Jira)

Jun 8, 2020, 1:58:04 PM
Kevin Reeuwijk commented on Bug PUP-5823
 
Re: Puppet agent with splay on windows begets random interval between runs

This issue causes Windows runtimes to be reported as abnormal; see the attached screenshot. Looking deeper at the metrics, the "Agent startup time (sec)" erroneously includes the splay wait time. The actual Puppet run took a normal number of seconds, but this is no longer visible in the higher-level reporting.

Especially on Windows, long agent runs are important to spot. Windows patching can cause a long puppet run, so when I first saw these long runtimes, I was worried all my nodes were continuously re-applying patches or something.

In its current state, the agent runtime metric becomes completely useless and will push admins to disable splay. Please look at fixing this issue as we get more Windows customers.

Jarret Lavallee (Jira)

Jul 8, 2020, 11:54:03 AM

Jarret Lavallee (Jira)

Jul 30, 2020, 6:42:03 PM

Bogdan Irimie (Jira)

Nov 5, 2020, 3:51:04 AM

Bogdan Irimie (Jira)

Nov 5, 2020, 3:52:04 AM

Josh Cooper (Jira)

Feb 5, 2021, 4:01:03 PM
Josh Cooper commented on Bug PUP-5823
 
Re: Puppet agent with splay on windows begets random interval between runs

The issue of splay and startup times is tracked in PUP-10860.

This issue is specifically about how Windows agents handle splay differently from Linux agents.

Related to this is how agents should behave when they receive a 503. Currently, Linux agents return to their original schedule (see PUP-9563), so any changes made here should be done with thundering herds in mind.

Ciprian Badescu (Jira)

Nov 17, 2021, 7:23:02 AM
Ciprian Badescu updated an issue
 
Change By: Ciprian Badescu
Sprint: ready for triage

Ciprian Badescu (Jira)

Nov 17, 2021, 7:23:03 AM