Jira (FACT-3088) EC2 token hangs for minutes on non-AWS machines


Marc Schweikert (Jira)

Oct 28, 2021, 8:16:03 AM
to puppe...@googlegroups.com
Marc Schweikert created an issue
 
Facter / Bug FACT-3088
EC2 token hangs for minutes on non-AWS machines
Issue Type: Bug
Affects Versions: FACT 3.14.20
Assignee: Unassigned
Components: Facter 3
Created: 2021/10/28 5:15 AM
Priority: Major
Reporter: Marc Schweikert

Who found the bug?

Open Source Puppet user

Where was the bug found?

  • System: KVM virtual machines not running in AWS
  • Version: N/A
  • Operating system(s): CentOS 7.9, Rocky Linux 8.4
  • Puppet version: puppet-agent-6.25.0-1.el8.x86_64

What is malfunctioning?

Puppet agent 6.24.0 uses facter 3.14.19; Puppet agent 6.25.0 uses facter 3.14.20.

Diff between facter versions: https://github.com/puppetlabs/facter/compare/3.14.19...3.14.20

Facter added the ability to use an IMDSv2 token when querying EC2 metadata. However, this implementation does not include an explicit timeout, and on non-AWS KVM virtual machines it hangs for minutes.

For comparison, running "time puppet facts --debug" using puppet-agent 6.24.0 takes 4.6 seconds, and the same command on puppet-agent 6.25.0 takes 131.7 seconds - over 2 minutes to gather facts!

 

 

# puppet-agent 6.24.0
$ puppet facts --debug
 
Debug: Facter: executing command: /opt/puppetlabs/puppet/bin/virt-what
Debug: Facter: kvm
Debug: Facter: completed processing output: closing child pipes.
Debug: Facter: process exited with status code 0.
Debug: Facter: fact "is_virtual" has resolved to true.
Debug: Facter: fact "virtual" has resolved to "kvm".
Debug: Facter: not running under a Azure instance.
Debug: Facter: resolving EC2 facts.
Debug: Facter: querying EC2 instance metadata at http://169.254.169.254/latest/meta-data/.
Debug: Facter: requesting http://169.254.169.254/latest/meta-data/.
Debug: Facter: Trying 169.254.169.254:80...
Debug: Facter: Connection timed out after 600 milliseconds
Debug: Facter: Closing connection 0
Debug: Facter: EC2 facts are unavailable: not running under an EC2 instance or EC2 is not responding in a timely manner.
Debug: Facter: resolving cloud facts.
Debug: Facter: resolving cloud fact
Debug: Facter: resolving GCE facts.
Debug: Facter: not running under a GCE instance.

# puppet-agent 6.25.0
$ puppet facts --debug
 
Debug: Facter: executing command: /opt/puppetlabs/puppet/bin/virt-what
Debug: Facter: kvm
Debug: Facter: completed processing output: closing child pipes.
Debug: Facter: process exited with status code 0.
Debug: Facter: fact "is_virtual" has resolved to true.
Debug: Facter: fact "virtual" has resolved to "kvm".
Debug: Facter: not running under a Azure instance.
Debug: Facter: resolving EC2 facts.
Debug: Facter: requesting IMDSv2 token at http://169.254.169.254/latest/api/token.
Debug: Facter: requesting http://169.254.169.254/latest/api/token.
Debug: Facter: Trying 169.254.169.254:80...
Debug: Facter: connect to 169.254.169.254 port 80 failed: Connection timed out
Debug: Facter: Failed to connect to 169.254.169.254 port 80: Connection timed out
Debug: Facter: Closing connection 0
Debug: Facter: EC2 IMDSv2 endpoint is unavailable
Debug: Facter: querying EC2 instance metadata at http://169.254.169.254/latest/meta-data/.
Debug: Facter: requesting http://169.254.169.254/latest/meta-data/.
Debug: Facter: Trying 169.254.169.254:80...
Debug: Facter: Connection timed out after 600 milliseconds
Debug: Facter: Closing connection 1
Debug: Facter: EC2 facts are unavailable: not running under an EC2 instance or EC2 is not responding in a timely manner.
Debug: Facter: resolving cloud facts.
Debug: Facter: resolving cloud fact
Debug: Facter: resolving GCE facts.
Debug: Facter: not running under a GCE instance.

 

 

What does success look like?

Fix the regression introduced in facter 3.14.20 by using the same 600ms timeout when requesting a token.
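
As an illustration of the requested behavior (not facter's actual C++ code), here is a minimal Python sketch of an IMDSv2 token request with an explicit timeout. The token URL is the one from the debug log, and the 0.6 s value mirrors the 600 ms used by the existing metadata query. The function name, the TTL value, and the use of the AWS-documented `X-aws-ec2-metadata-token-ttl-seconds` header are assumptions made for this example.

```python
import http.client
import urllib.request

# URL taken from the debug log; 0.6 s mirrors the existing 600 ms metadata timeout.
TOKEN_URL = "http://169.254.169.254/latest/api/token"
TIMEOUT_SECONDS = 0.6


def request_imdsv2_token(url=TOKEN_URL, timeout=TIMEOUT_SECONDS, ttl_seconds=100):
    """Return the token string, or None if the endpoint does not answer in time.

    Hypothetical helper: the name and the ttl_seconds default are illustrative.
    """
    request = urllib.request.Request(
        url,
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )
    # Ignore any configured HTTP proxy: the metadata service is link-local.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        # The timeout applies to the connect and to each blocking read.
        with opener.open(request, timeout=timeout) as response:
            return response.read().decode("ascii", errors="replace").strip()
    except (OSError, http.client.HTTPException):
        # Covers timeouts, refused connections and HTTP error responses.
        return None
```

On a machine outside AWS, `request_imdsv2_token()` should give up after roughly the timeout and return `None` rather than waiting for the operating system's TCP connect timeout.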

How will success be validated?

On a KVM virtual machine outside of AWS running puppet-agent 6.25, executing the command "puppet facts" should take less than 5 seconds.
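
The validation step above could be scripted roughly as follows. This is only a sketch: `timed_run` is a hypothetical helper name, and the 5-second limit is the threshold stated in this report.

```python
import subprocess
import sys
import time


def timed_run(command, limit_seconds=5.0):
    """Run a command and return (elapsed_seconds, finished_within_limit)."""
    start = time.monotonic()
    # Output is discarded; only the wall-clock duration matters here.
    subprocess.run(
        command,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,
    )
    elapsed = time.monotonic() - start
    return elapsed, elapsed < limit_seconds


# Example call on a machine with puppet-agent installed:
#   elapsed, ok = timed_run(["puppet", "facts"])
#   sys.exit(0 if ok else 1)
```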

Should anyone be contacted after this is fixed?

Open Source Puppet 6.x needs to be updated with this fix.


Ciprian Badescu (Jira)

Oct 29, 2021, 4:39:01 AM
to puppe...@googlegroups.com
Ciprian Badescu updated an issue
Change By: Ciprian Badescu
h2. Who found the bug?

Open Source Puppet user
h2. Where was the bug found?
* System: KVM virtual machines not running in AWS
* Version: N/A
* Operating system(s): CentOS 7.9, Rocky Linux 8.4
* Puppet version: puppet-agent-6.25.0-1.el8.x86_64

h2. What is malfunctioning?


Puppet agent 6.24.0 uses facter 3.14.19; Puppet agent 6.25.0 uses facter 3.14.20.

Diff between facter versions: [https://github.com/puppetlabs/facter/compare/3.14.19...3.14.20]

Facter added the ability to use an IMDSv2 token when querying EC2 metadata. However, this implementation does not include an explicit timeout, and on non-AWS KVM virtual machines it hangs for minutes.

For comparison, running "time puppet facts --debug" using puppet-agent 6.24.0 takes 4.6 seconds, and the same command on puppet-agent 6.25.0 takes 131.7 seconds - *over 2 minutes to gather facts!*

 

 
{noformat}

# puppet-agent 6.24.0
$ puppet facts --debug

Debug: Facter: executing command: /opt/puppetlabs/puppet/bin/virt-what
Debug: Facter: kvm
Debug: Facter: completed processing output: closing child pipes.
Debug: Facter: process exited with status code 0.
Debug: Facter: fact "is_virtual" has resolved to true.
Debug: Facter: fact "virtual" has resolved to "kvm".
Debug: Facter: not running under a Azure instance.
Debug: Facter: resolving EC2 facts.
Debug: Facter: querying EC2 instance metadata at http://169.254.169.254/latest/meta-data/.
Debug: Facter: requesting http://169.254.169.254/latest/meta-data/.
Debug: Facter: Trying 169.254.169.254:80...
Debug: Facter: Connection timed out after 600 milliseconds
Debug: Facter: Closing connection 0
Debug: Facter: EC2 facts are unavailable: not running under an EC2 instance or EC2 is not responding in a timely manner.
Debug: Facter: resolving cloud facts.
Debug: Facter: resolving cloud fact
Debug: Facter: resolving GCE facts.
Debug: Facter: not running under a GCE instance.{noformat}
 

 
h2. What does success look like?


Fix the regression introduced in facter 3.14.20 by using the same 600ms timeout when requesting a token.
h2. How will success be validated?


On a KVM virtual machine outside of AWS running puppet-agent 6.25, executing the command "puppet facts" should take less than 5 seconds.
h2. Should anyone be contacted after this is fixed?


Open Source Puppet 6.x needs to be updated with this fix.

Ciprian Badescu (Jira)

Oct 29, 2021, 4:39:02 AM
to puppe...@googlegroups.com

Lukas Zapletal (Jira)

Nov 3, 2021, 3:25:02 PM
to puppe...@googlegroups.com

Lukas Zapletal (Jira)

Nov 3, 2021, 3:56:04 PM
to puppe...@googlegroups.com

Further investigation showed that my router (MikroTik) is configured to route link-local IPv4 addresses to the internet. No responses come back because the packets are dropped at the third router on the way to nowhere.

This is a combination of a misconfiguration and a too-long TCP timeout in facter. Both need to be fixed, I guess.

Ciprian Badescu (Jira)

Nov 4, 2021, 3:45:03 AM
to puppe...@googlegroups.com

Lukas Zapletal, you could add a firewall rule that answers with a TCP RST for link-local IPv4 addresses instead of silently dropping the packets. The fix will be part of the next release.
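
To check which of the two behaviors a given network shows, a small probe along these lines may help. It is a sketch only: `probe_tcp` is a hypothetical helper, and the 0.6 s timeout is borrowed from the 600 ms value in the debug log. A connection answered with a TCP RST fails immediately as "refused", while silently dropped packets only surface as a timeout.

```python
import socket


def probe_tcp(host, port, timeout=0.6):
    """Classify a TCP connect attempt as 'open', 'refused', 'timeout' or 'unreachable'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The peer or a firewall answered with TCP RST: fails fast.
        return "refused"
    except socket.timeout:
        # No answer at all: packets were most likely dropped on the way.
        return "timeout"
    except OSError:
        # Other failures, for example a "no route to host" error.
        return "unreachable"


# Example call for the metadata address from the log:
#   print(probe_tcp("169.254.169.254", 80))
```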

Marc Schweikert (Jira)

Nov 5, 2021, 8:20:02 AM
to puppe...@googlegroups.com

There is a known workaround, and I have implemented it on my production systems to fix this regression. You can configure facter to skip gathering certain facts by managing facter.conf:

 

$ cat /etc/puppetlabs/facter/facter.conf
# This file is managed by Puppet. DO NOT EDIT.
 
facts: {
    blocklist : [ "AZ", "EC2" ],
}

 

 
