Jira (PUP-9570) Catalog failure on first run due to pluginsync and environment switch

Reid Vandewiele (JIRA)

Mar 19, 2019, 2:56:01 PM
to puppe...@googlegroups.com
Reid Vandewiele moved an issue
 
Puppet / Bug PUP-9570
Catalog failure on first run due to pluginsync and environment switch
Change By: Reid Vandewiele
Key: PE-26130 → PUP-9570
Project: Puppet Enterprise [Internal]
 
This message was sent by Atlassian JIRA (v7.7.1#77002-sha1:e75ca93)

Jorie Tappa (JIRA)

Mar 25, 2019, 1:14:03 PM
Jorie Tappa commented on Bug PUP-9570
 
Re: Catalog failure on first run due to pluginsync and environment switch

This looks like it involves major behavior changes that could break a lot of installations; we'll need to assess the implications in more detail before making decisions.

Reid Vandewiele (JIRA)

Mar 25, 2019, 1:17:04 PM

Happy to talk more. FWIW, I think the simplest way to fix this may not actually constitute a major behavior change. Just let me know who to talk it through with.

Josh Cooper (JIRA)

Mar 29, 2019, 6:59:03 PM
Josh Cooper commented on Bug PUP-9570

It sounds like this issue is due to PUP-7198, which we fixed in 5.0. It used to be that the agent would submit facts to puppetserver, which would save them asynchronously to puppetdb. The server would then request facts from puppetdb and use them to compile the catalog, even though the agent had just sent them. Because of the async save behavior, it was possible for puppetserver to use the facts from the previous agent run.

In puppet 5, we changed how facts are passed through to the compiler so that it doesn't need to query puppetdb for facts it just saved. Assuming this request is coming from 2016.4.x, I think we can close this as a dup.

Reid Vandewiele (JIRA)

Mar 29, 2019, 7:05:02 PM

This is subtly different. It's not about the catalog; it's about the first run and pluginsync, and how that combination can make the first run fail.

Michael Hudson (JIRA)

Mar 29, 2019, 9:18:02 PM

I also want to mention this issue was seen on our 2018.1.7 (agent version 5.5.10) installation and is easily reproducible.

Josh Cooper (JIRA)

Apr 1, 2019, 2:07:02 PM
Josh Cooper commented on Bug PUP-9570

Reid Vandewiele and I discussed this more on Friday. Rather than fail the catalog compile, the manifest code should return an empty catalog (exclude all classes). The agent will detect that the catalog was compiled in a different environment than it requested, switch to the "now correct" environment, and proceed as expected.
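The agent-side behavior described above can be sketched as a small model. This is not Puppet's configurer: `Catalog`, `run_agent`, and the stubbed server lambda are hypothetical. The stub assumes the classifier has placed the node in `dev`. Instead of failing, the server returns an empty catalog tagged with its own environment. The agent then switches environments and tries again, with a retry cap so that a flip-flopping classifier cannot cause an endless loop.

```ruby
# Hypothetical model: a catalog carries the environment it was compiled in.
Catalog = Struct.new(:environment, :resources)

# compile: a callable that takes the environment the agent pluginsynced from
# and returns a Catalog. max_attempts guards against environment flip-flopping.
def run_agent(requested_env, compile, max_attempts: 3)
  env = requested_env
  max_attempts.times do
    catalog = compile.call(env)
    # Compiled in the environment we asked for: safe to apply.
    return [env, catalog] if catalog.environment == env

    # Compiled somewhere else: switch to the "now correct" environment and
    # start over (in real Puppet this would include another pluginsync).
    env = catalog.environment
  end
  raise "environment did not converge after #{max_attempts} attempts"
end

# Stub server: the node belongs in 'dev'. If the agent's plugins came from a
# different environment, return an empty catalog rather than failing.
server_compile = lambda do |agent_env|
  server_env = "dev"
  if agent_env == server_env
    Catalog.new(server_env, ["Notify[true]"])
  else
    Catalog.new(server_env, [])
  end
end

env, catalog = run_agent("production", server_compile)
puts env                        # dev
puts catalog.resources.inspect  # ["Notify[true]"]
```

The empty catalog is harmless to apply, and its environment tag gives the agent everything it needs to converge on the second pass.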

Joel Weierman (JIRA)

May 6, 2019, 2:47:04 PM

Reid Vandewiele (JIRA)

May 15, 2019, 1:38:04 PM
Reid Vandewiele commented on Bug PUP-9570
 
Re: Catalog failure on first run due to pluginsync and environment switch

Josh Cooper, I spent a fair amount of time trying to figure out how to build a pluginsync_environment fact, with no luck. Any ideas or pointers? As near as I can tell, the environment a pluginsync is actually performed against is not saved or referenced anywhere a normal fact could get at it. Nor is that information obviously retrievable from the configurer object, and I don't know whether a fact can even access that object.

I talked to Branan Riley and Nick Lewis last night. There are a number of ideas about how to fix this as a bug, but nothing a user could reasonably implement.

One idea Nick had (I think it was Nick) was for the configurer to do something more intelligent with a failed compilation error. However, at the moment a failure doesn't convey an environment, so to go that route we'd need to pass more data back (e.g. an optional environment key) in the HTTP error response.
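The error-response idea could look roughly like this. It is a sketch that assumes the failed-compilation response has a JSON body. The optional `environment` key is the proposed addition and not an existing field, and `retry_environment` is a hypothetical helper.

```ruby
require "json"

# Hypothetical: decide whether a failed compilation should trigger a retry in
# another environment, based on an optional "environment" key in the error body.
def retry_environment(error_body, current_env)
  data = JSON.parse(error_body)
  server_env = data["environment"]
  # Retry only when the server says it compiled in a different environment.
  server_env if server_env && server_env != current_env
rescue JSON::ParserError
  nil # Not a structured error; treat it as an ordinary failure.
end

body = '{"message":"Evaluation Error: Unknown variable","environment":"dev"}'
p retry_environment(body, "production")                  # "dev"
p retry_environment('{"message":"boom"}', "production")  # nil
```

If the key is absent, the agent's current behavior is unchanged, which keeps the change backward compatible with servers that don't send it.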

It seems cleanest, though, to bake a pluginsync environment value into the agent, as you suggest above. We would then want to follow that up by short-circuiting catalog compilation if the pluginsync environment doesn't match the compilation environment.

All of that seems like Puppet work rather than something a user can fix.
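Baking the value into the agent and short-circuiting on the server can be sketched together. Both helpers are hypothetical. `facts_for_catalog_request` stands in for the agent recording which environment it pluginsynced from. `compile` stands in for a server-side guard that returns an empty catalog on a mismatch, instead of evaluating manifests against facts produced by the wrong environment's plugins.

```ruby
# Hypothetical agent side: record the environment pluginsync ran against and
# send it alongside the other facts in the catalog request.
def facts_for_catalog_request(base_facts, pluginsync_env)
  base_facts.merge({ "pluginsync_environment" => pluginsync_env })
end

# Hypothetical server-side guard: only evaluate the real manifests when the
# agent's plugins (and therefore its custom facts) came from server_env.
def compile(facts, server_env)
  if facts["pluginsync_environment"] != server_env
    # Short-circuit with an empty catalog tagged with the server's environment,
    # so the agent can switch, pluginsync again, and resubmit its facts.
    return { "environment" => server_env, "resources" => [] }
  end

  { "environment" => server_env, "resources" => ["Class[Foo]"] }
end

facts = facts_for_catalog_request({ "os" => "Linux" }, "production")
puts compile(facts, "dev")["resources"].empty?  # true

facts = facts_for_catalog_request({ "os" => "Linux" }, "dev")
puts compile(facts, "dev")["resources"].inspect  # ["Class[Foo]"]
```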

Josh Cooper (JIRA)

May 16, 2019, 9:47:03 AM
Josh Cooper commented on Bug PUP-9570

Ah sorry, my example was bad. You could get access to the current environment in the fact by doing something like:

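Facter.add(:pluginsync_environment) do
  setcode do
    # Use the environment Puppet is currently operating in, if one is bound...
    env = Puppet.lookup(:current_environment) { nil }
    # ...otherwise fall back to the configured environment setting.
    env ? env.name.to_s : Puppet[:environment]
  end
end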

That said, I think adding a built-in pluginsync_environment fact is more direct. I'll put up a PR today.

Josh Cooper (JIRA)

May 28, 2019, 1:26:05 PM

Jorie Tappa (JIRA)

May 28, 2019, 2:12:03 PM

Josh Cooper (JIRA)

May 30, 2019, 5:41:04 PM
Josh Cooper updated an issue
Change By: Josh Cooper
Acceptance Criteria: 1. Agent sends the {{pluginsync_environment}} fact when making a catalog request.

2. It's possible to access this fact in a manifest to determine that the agent pluginsync'ed in a different environment than the one the compiler is currently using to compile a catalog. For example, the manifest can compare the agent's pluginsync environment against the server's environment, as set by the node classifier:

{noformat}
node default {
  $server_env = $server_facts['environment']
  $pluginsync_env = $facts['pluginsync_environment']

  if $server_env != $pluginsync_env {
    warning("Node's environment has not converged yet")
  } else {
    include foo
  }
}
{noformat}

3. New fact added to "agent facts" section of the docs: https://puppet.com/docs/puppet/6.4/lang_facts_and_builtin_vars.html#puppet-agent-facts

Jorie Tappa (JIRA)

Jun 3, 2019, 12:49:03 PM
Jorie Tappa updated an issue
Change By: Jorie Tappa
Sprint: Coremunity Grooming Platform Core KANBAN

Jorie Tappa (JIRA)

Jun 10, 2019, 12:42:03 PM

Josh Cooper (JIRA)

Jun 11, 2019, 7:25:03 PM
Josh Cooper updated an issue
Change By: Josh Cooper
Fix Version/s: PUP 6.5.0
Fix Version/s: PUP 6.6.0

Jorie Tappa (JIRA)

Jun 12, 2019, 12:33:03 PM
Jorie Tappa updated an issue
Change By: Jorie Tappa
Fix Version/s: PUP 6.6.0
Fix Version/s: PUP 6.y

Jorie Tappa (JIRA)

Jun 12, 2019, 1:26:03 PM
Jorie Tappa updated an issue
Change By: Jorie Tappa
Fix Version/s: PUP 6.y
Fix Version/s: PUP 6.6.0

Gabriel Nagy (JIRA)

Jul 18, 2019, 3:42:03 AM
Gabriel Nagy updated an issue
Change By: Gabriel Nagy
Fix Version/s: PUP 6.7.0
Fix Version/s: PUP 6.y

Jorie Tappa (JIRA)

Jul 23, 2019, 11:48:04 AM

Jorie Tappa (JIRA)

Sep 23, 2019, 5:29:04 PM

Reid Vandewiele (JIRA)

Sep 23, 2019, 8:48:03 PM
Reid Vandewiele commented on Bug PUP-9570
 
Re: Catalog failure on first run due to pluginsync and environment switch

Just for fun, I tried to use the example code to create a workable pluginsync_environment fact. It doesn't quite seem to work, though.

I created a simple site.pp with this code in it:

if $facts['pluginsync_environment'] != $server_facts['environment'] {
  notify { 'STOP':
    message => "Now what? pluginsync_environment=${facts['pluginsync_environment']}, server_facts.environment=${server_facts['environment']}",
  }
}
else {
  notify { 'PROCEED':
    message => "Safe to compile catalog. pluginsync_environment=${facts['pluginsync_environment']}, server_facts.environment=${server_facts['environment']}",
  }
}

Then, I observed what happened when trying to switch from a starting environment of env_1 to a new target environment of env_2.

[root@pe-xl-core-2 ~]# FACTER_custom_environment=env_1 puppet agent -t
Notice: Local environment: 'production' doesn't match server specified node environment 'env_1', switching agent to 'env_1'.
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Retrieving locales
Info: Loading facts
Info: Caching catalog for pe-xl-core-2.dev36.puppet.vm
Info: Applying configuration version '1569285650'
Notice: Safe to compile catalog. pluginsync_environment=env_1, server_facts.environment=env_1
Notice: /Stage[main]/Main/Notify[PROCEED]/message: defined 'message' as 'Safe to compile catalog. pluginsync_environment=env_1, server_facts.environment=env_1'
Notice: Applied catalog in 0.08 seconds
[root@pe-xl-core-2 ~]# FACTER_custom_environment=env_2 puppet agent -t
Notice: Local environment: 'production' doesn't match server specified node environment 'env_1', switching agent to 'env_1'.
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Retrieving locales
Info: Loading facts
Info: Caching catalog for pe-xl-core-2.dev36.puppet.vm
Notice: Local environment: 'env_1' doesn't match server specified environment 'env_2', restarting agent run with environment 'env_2'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Notice: /File[/opt/puppetlabs/puppet/cache/lib/facter/env_1.rb]/ensure: removed
Notice: /File[/opt/puppetlabs/puppet/cache/lib/facter/env_2.rb]/ensure: defined content as '{md5}d41d8cd98f00b204e9800998ecf8427e'
Info: Retrieving locales
Info: Loading facts
Info: Caching catalog for pe-xl-core-2.dev36.puppet.vm
Info: Applying configuration version '1569285712'
Notice: Now what? pluginsync_environment=env_1, server_facts.environment=env_2
Notice: /Stage[main]/Main/Notify[STOP]/message: defined 'message' as 'Now what? pluginsync_environment=env_1, server_facts.environment=env_2'
Notice: Applied catalog in 0.09 seconds
[root@pe-xl-core-2 ~]# FACTER_custom_environment=env_2 puppet agent -t
Notice: Local environment: 'production' doesn't match server specified node environment 'env_2', switching agent to 'env_2'.
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Retrieving locales
Info: Loading facts
Info: Caching catalog for pe-xl-core-2.dev36.puppet.vm
Info: Applying configuration version '1569285712'
Notice: Safe to compile catalog. pluginsync_environment=env_2, server_facts.environment=env_2
Notice: /Stage[main]/Main/Notify[PROCEED]/message: defined 'message' as 'Safe to compile catalog. pluginsync_environment=env_2, server_facts.environment=env_2'
Notice: Applied catalog in 0.08 seconds
[root@pe-xl-core-2 ~]#

The problem seems to be that on the catalog retry, the fact's value is already set and is not re-evaluated. So even though the Puppet agent performed a second pluginsync against the correct new environment, this is not reflected in the value of the pluginsync_environment fact. This avoids failing to compile a catalog, but it doesn't achieve the desired end result: finishing a full, real Puppet run in the new target environment.
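The stale value in the second run above can be illustrated with a minimal model. It assumes fact values are resolved once per agent process and then reused. `FactCache` and the resolver lambda are hypothetical stand-ins, not Facter's actual API.

```ruby
# Hypothetical model of a fact value that is resolved once and then reused.
class FactCache
  def initialize
    @values = {}
  end

  # Resolve a fact the first time it is requested; return the cached value after that.
  def value(name, &resolver)
    @values.fetch(name) { @values[name] = resolver.call }
  end

  # Discard cached values so the next lookup re-runs the resolver.
  def flush
    @values.clear
  end
end

current_env = "env_1"
cache = FactCache.new
resolver = -> { current_env }

first = cache.value(:pluginsync_environment, &resolver)  # "env_1"

# The agent restarts the run in env_2 and pluginsyncs again...
current_env = "env_2"
stale = cache.value(:pluginsync_environment, &resolver)  # still "env_1"

# ...but the new environment only becomes visible after the cache is flushed.
cache.flush
fresh = cache.value(:pluginsync_environment, &resolver)  # "env_2"
puts [first, stale, fresh].inspect  # ["env_1", "env_1", "env_2"]
```

In this model, any fix has to re-resolve facts after the second pluginsync, not just change what the fact computes.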

Josh Cooper (JIRA)

Oct 8, 2019, 12:10:08 PM

Josh Cooper (JIRA)

Nov 22, 2019, 6:49:04 PM
Josh Cooper updated an issue
Change By: Josh Cooper
Sprint: Platform Core KANBAN Coremunity Hopper

Josh Cooper (JIRA)

Dec 6, 2019, 1:07:04 PM
Josh Cooper commented on Bug PUP-9570
 
Re: Catalog failure on first run due to pluginsync and environment switch

I'm rewriting many of the configurer tests due to PUP-10160, which will make this easier to implement.

Austin Boyd (JIRA)

Dec 12, 2019, 9:06:23 AM
Austin Boyd updated an issue
 
Change By: Austin Boyd
Zendesk Ticket IDs: 34157
Zendesk Ticket Count: 1

Reid Vandewiele (Jira)

May 19, 2020, 12:01:03 PM

Josh Cooper (Jira)

May 22, 2020, 11:16:03 AM
Josh Cooper commented on Bug PUP-9570

The configurer changes have landed, so I should be able to get this done for 6.17.

Josh Cooper (Jira)

Jun 5, 2020, 5:46:03 PM

Josh Cooper (Jira)

Jun 10, 2020, 3:07:03 PM
Josh Cooper commented on Bug PUP-9570
 
Re: Catalog failure on first run due to pluginsync and environment switch

I think the suggestion to use a fact doesn't work due to PUP-10308.

zendesk.jira (Jira)

Jul 16, 2020, 6:34:03 PM

zendesk.jira (Jira)

Jul 16, 2020, 6:34:04 PM
zendesk.jira updated an issue
Change By: zendesk.jira
Zendesk Ticket Count: 1 → 2
Zendesk Ticket IDs: 34157, 40072

Josh Cooper (Jira)

Oct 23, 2020, 8:00:03 PM

Josh Cooper (Jira)

Oct 23, 2020, 8:02:03 PM

zendesk.jira (Jira)

Nov 12, 2020, 3:54:03 PM
zendesk.jira updated an issue
Change By: zendesk.jira
Zendesk Ticket Count: 2 → 3
Zendesk Ticket IDs: 34157, 40072, 41507

Robert August Vincent II (Jira)

Dec 9, 2020, 8:31:03 AM
Robert August Vincent II commented on Bug PUP-9570
 
Re: Catalog failure on first run due to pluginsync and environment switch

Any updates on this? We've littered our code with workarounds; this prevents Puppet from reliably applying all configured resources on the first run.

Mike Smith (Jira)

Jan 29, 2021, 1:00:04 PM
Mike Smith commented on Bug PUP-9570

ExxonMobil is currently encountering this bug.

Reid Vandewiele (Jira)

May 21, 2021, 11:24:01 AM

Ciprian Badescu yep, probably. That hypothesis—a fact not being available in all environments—is almost certainly the cause. It usually is. It would correspond to step 9 in this ticket's description.

The only way to work around the issue today is for the user to play whack-a-mole with facts when problems are noticed, and create piecemeal dependency entanglements between otherwise unrelated environments. This is highly undesirable and creates friction for a platform team trying to provide Puppet-as-a-Service to other teams in an organization.

This is what we need to fix. We don't want users to have to work around issues like this on a case-by-case basis.


Josh Cooper (Jira)

May 28, 2021, 2:22:26 PM
Josh Cooper commented on Bug PUP-9570

I am able to reproduce the issue as follows:

~/work/puppet retry_pluginsync_9570*
❯ tree ~/.puppetlabs/etc/code                              
/home/josh/.puppetlabs/etc/code
└── environments
    ├── dev
    │   ├── manifests
    │   │   └── site.pp
    │   └── modules
    │       └── only
    │           └── lib
    │               └── facter
    │                   └── onlyindev.rb
    └── production
 
8 directories, 2 files
 
~/work/puppet retry_pluginsync_9570*
❯ cat ~/.puppetlabs/etc/code/environments/dev/manifests/site.pp 
notify { "$onlyindev": }
 
~/work/puppet retry_pluginsync_9570*
❯ cat ~/.puppetlabs/etc/code/environments/dev/modules/only/lib/facter/onlyindev.rb 
Facter.add(:onlyindev) do
  setcode { true }
end
 
~/work/puppet retry_pluginsync_9570*
❯ cat ~/.puppetlabs/etc/puppet/puppet.conf 
[main]
certname = localhost
cadir = /home/josh/.puppetlabs/etc/puppetserver/ca
 
[agent]
server = localhost
 
[server]
node_terminus = exec
external_nodes = /home/josh/.puppetlabs/etc/puppet/enc.sh
 
~/work/puppet retry_pluginsync_9570*
❯ cat ~/.puppetlabs/etc/puppet/enc.sh     
#!/bin/sh
 
if [ -f /tmp/node ];
then
    ENV="dev"
else
    ENV="production"
    touch /tmp/node
fi
 
cat <<EOF
---
environment: ${ENV}
EOF

 
Before the change:

~/work/puppet retry_pluginsync_9570*
❯ rm -rf ~/.puppetlabs/opt/puppet/cache && rm -f /tmp/node && bundle exec puppet agent -t
Info: Using configured environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Error: Could not retrieve catalog from remote server: Error 500 on SERVER: Server Error: Evaluation Error: Empty string title at 0. Title strings must have a length greater than zero. (file: /home/josh/.puppetlabs/etc/code/environments/dev/manifests/site.pp, line: 1, column: 10) on node localhost
Warning: Not using cache on failed catalog
Error: Could not retrieve catalog; skipping run

With the change:

~/work/puppet retry_pluginsync_9570*
❯ rm -rf ~/.puppetlabs/opt/puppet/cache && rm -f /tmp/node && bundle exec puppet agent -t
Info: Using configured environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Notice: Local environment: 'production' doesn't match server specified environment 'dev', restarting agent run with environment 'dev'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Notice: /File[/home/josh/.puppetlabs/opt/puppet/cache/lib/facter]/ensure: created
Notice: /File[/home/josh/.puppetlabs/opt/puppet/cache/lib/facter/onlyindev.rb]/ensure: defined content as '{sha256}cd0179f1853caa082fd3ec17793e4ab915b9e3860d1d4e729cc62fc0e15e3021'
Info: Loading facts
Info: Caching catalog for localhost
Info: Applying configuration version '1622150077'
Notice: true
Notice: /Stage[main]/Main/Notify[true]/message: defined 'message' as 'true'
Info: Creating state file /home/josh/.puppetlabs/opt/puppet/cache/state/state.yaml
Notice: Applied catalog in 0.01 seconds
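The difference between the two runs can be modeled in a few lines. This is a simplified sketch, not the actual patch. `StubServer` mimics `enc.sh`: the first lookup returns `production` and later lookups return `dev`, and only `dev` ships the `onlyindev` plugin. `agent_run` assumes that the fixed agent, after a failed catalog request, re-checks the server-specified environment. If that environment has changed, the agent restarts the run there, including pluginsync.

```ruby
# Hypothetical stand-in for the reproduction setup above.
class StubServer
  PLUGINS = { "production" => {}, "dev" => { "onlyindev" => "true" } }.freeze

  def initialize
    @lookups = 0
  end

  # Mirrors enc.sh: first call -> production, later calls -> dev.
  def node_environment
    @lookups += 1
    @lookups == 1 ? "production" : "dev"
  end

  # Custom facts the agent would have after pluginsyncing the given environment.
  def pluginsync(env)
    PLUGINS.fetch(env)
  end

  # Compiles in the node's current environment; fails like site.pp does when
  # $onlyindev is missing (empty Notify title).
  def compile(facts)
    env = node_environment
    title = facts["onlyindev"].to_s
    raise "Empty string title" if title.empty?
    { "environment" => env, "resources" => ["Notify[#{title}]"] }
  end
end

def agent_run(server, retry_on_env_change:)
  env = server.node_environment
  loop do
    facts = server.pluginsync(env)
    begin
      return [env, server.compile(facts)]
    rescue RuntimeError => e
      # Before the change: give up. After: ask where the node belongs now.
      raise unless retry_on_env_change
      new_env = server.node_environment
      raise e if new_env == env
      env = new_env # Restart the run, including pluginsync, in the new environment.
    end
  end
end

begin
  agent_run(StubServer.new, retry_on_env_change: false)
rescue RuntimeError => e
  puts "before: #{e.message}"  # before: Empty string title
end

env, catalog = agent_run(StubServer.new, retry_on_env_change: true)
puts "after: #{env} #{catalog['resources'].inspect}"  # after: dev ["Notify[true]"]
```

The `raise e if new_env == env` guard keeps genuine compilation errors from being retried forever when the environment has not actually changed.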

 

Jenna McCarthy (Jira)

Jun 25, 2021, 5:55:02 PM
Jenna McCarthy updated an issue
 
Change By: Jenna McCarthy
Labels: 001G000001p3dQmIAI jira_escalated

Josh Cooper (Jira)

Jun 28, 2021, 1:33:01 PM
Josh Cooper updated an issue
Change By: Josh Cooper
Labels: 001G000001p3dQmIAI jira_escalated

Josh Cooper (Jira)

Jun 29, 2021, 1:52:02 PM
Josh Cooper updated an issue
Change By: Josh Cooper
Sprint: Coremunity Hopper Platform Core KANBAN

Josh Cooper (Jira)

Jul 1, 2021, 12:37:02 PM
Josh Cooper updated an issue
Change By: Josh Cooper
Fix Version/s: PUP 6.y
Fix Version/s: PUP 7.9.0
Fix Version/s: PUP 6.24.0

Josh Cooper (Jira)

Jul 7, 2021, 7:24:02 PM
Josh Cooper updated an issue
Change By: Josh Cooper
Fix Version/s: PUP 7.9.0
Fix Version/s: PUP 6.24.0

Jenna McCarthy (Jira)

Jul 8, 2021, 11:29:02 AM
Jenna McCarthy updated an issue
Change By: Jenna McCarthy
Labels: 001G000001p3dQmIAI jira_escalated

Josh Cooper (Jira)

Jul 29, 2021, 9:22:03 PM
Josh Cooper updated an issue
Change By: Josh Cooper
Fix Version/s: PUP 7.10.0
Fix Version/s: PUP 6.25.0

Josh Cooper (Jira)

Jul 29, 2021, 9:25:04 PM
Josh Cooper updated an issue
Change By: Josh Cooper
Release Notes: Bug Fix
Release Notes Summary: Previously, an agent would fail its run if it switched to a new environment whose manifests relied on a fact that only existed in the new environment. Now the agent will be redirected to the server-specified environment and the run will continue using that environment.

Josh Cooper (Jira)

Aug 3, 2021, 6:14:04 PM
Josh Cooper updated an issue
Change By: Josh Cooper
Sprint: Platform Core KANBAN , Coremunity Kanban

Josh Cooper (Jira)

Aug 4, 2021, 1:48:02 PM

Claire Cadman (Jira)

Aug 11, 2021, 9:01:04 AM
Claire Cadman updated an issue
 
Change By: Claire Cadman
Labels: 001G000001p3dQmIAI doc-reviewed jira_escalated