Jira (FACT-3166) Long unexplained pause while resolving facts

24 views
Skip to first unread message

Siebrand Mazeland (Jira)

unread,
Nov 14, 2022, 8:33:02 AM11/14/22
to puppe...@googlegroups.com
Siebrand Mazeland created an issue
 
Facter / Bug FACT-3166
Long unexplained pause while resolving facts
Issue Type: Bug Bug
Affects Versions: FACT 4.2.13
Assignee: Unassigned
Components: Facter 4
Created: 2022/11/14 5:32 AM
Priority: Normal Normal
Reporter: Siebrand Mazeland

Puppet Version: 7.20.0
Facter version: 4.2.13
Puppet Server Version: 7.9.2
OS Name/Version: Amazon Linux 2

I'm unfamiliar with reporting here; please let me know where I can improve

In certain environments, I see unexplained long run times for a puppet run. When using debug output on this, I see the delay occur, but there is nothing being logged, and there is no clear reason. I assume <something> is timing out. I'd like to be able to provide the right information, but not sure how.

This is what I see:

facter --puppet -d
 
[events up to here follow each other quickly]
[2022-11-14 15:13:35.556906 ] DEBUG Facter::FactManager - fact "ec2_metadata" has resolved to: {"ami-id"=>"ami-[REDACTED]", "ami-launch-index"=>"0", "ami-manifest-path"=>"(unknown)", "block-device-mapping"=>{"ami"=>"xvda", "root"=>"/dev/xvda"}, "events"=>{"maintenance"=>{"history"=>"[]", "scheduled"=>"[]"}}, "hostname"=>"[REDACTED].eu-north-1.compute.internal", "iam"=>{"info"=>"{\n  \"Code\" : \"Success\",\n  \"LastUpdated\" : \"2022-11-14T12:54:05Z\",\n  \"InstanceProfileArn\" : \"arn:aws:iam::[REDACTED]:instance-profile/[REDACTED]\",\n  \"InstanceProfileId\" : \"[REDACTED]\"\n}"}, "identity-credentials"=>{"ec2"=>{"info"=>"{\n  \"Code\" : \"Success\",\n  \"LastUpdated\" : \"2022-11-14T12:53:26Z\",\n  \"AccountId\" : \"REDACTED\"\n}"}}, "instance-action"=>"none", "instance-id"=>"i-REDACTED", "instance-life-cycle"=>"on-demand", "instance-type"=>"r5.xlarge", "local-hostname"=>"REDACTED.eu-north-1.compute.internal", "local-ipv4"=>"REDACTED", "mac"=>"REDACTED", "metrics"=>{"vhostmd"=>"<?xml version=\"1.0\" encoding=\"UTF-8\"?>"}, "network"=>{"interfaces"=>{"macs"=>{"REDACTED"=>{"device-number"=>"0", "interface-id"=>"eni-REDACTED", "local-hostname"=>"REDACTED.eu-north-1.compute.internal", "local-ipv4s"=>"REDACTED", "mac"=>"REDACTED", "owner-id"=>"REDACTED", "security-group-ids"=>"sg-REDACTED\nsg-REDACTED", "security-groups"=>"REDACTED-REDACTED\nREDACTED-REDACTED", "subnet-id"=>"subnet-REDACTED", "subnet-ipv4-cidr-block"=>"REDACTED", "vpc-id"=>"vpc-REDACTED", "vpc-ipv4-cidr-block"=>"REDACTED", "vpc-ipv4-cidr-blocks"=>"REDACTED"}}}}, "placement"=>{"availability-zone"=>"eu-north-1b", "availability-zone-id"=>"eun1-az2", "region"=>"eu-north-1"}, "profile"=>"default-hvm", "public-keys"=>{"0"=>{"openssh-key"=>"ssh-rsa REDACTED"}}, "reservation-id"=>"r-REDACTED", "security-groups"=>"REDACTED-REDACTED-1UKQVTNZDI0ZU\nREDACTED-REDACTED-REDACTED", "services"=>{"domain"=>"amazonaws.com", "partition"=>"aws"}, "system"=>"nitro", "tags"=>{"instance"=>{"Name"=>"REDACTED", "REDACTED"=>"REDACTED", "REDACTED"=>"REDACTED", "REDACTED"=>"REDACTED", "REDACTED"=>"REDACTED", "REDACTED"=>"sg-REDACTED", "REDACTED"=>"REDACTED", "application"=>"REDACTED", "aws:cloudformation:logical-id"=>"REDACTED", "aws:cloudformation:stack-id"=>"arn:aws:cloudformation:eu-north-1:REDACTED:stack/REDACTED", "aws:cloudformation:stack-name"=>"REDACTED", "aws:ec2launchtemplate:id"=>"lt-REDACTED", "aws:ec2launchtemplate:version"=>"1", "contact"=>"REDACTED", "costcenter"=>"REDACTED", "environment"=>"REDACTED", "personaldata"=>"false", "service:autoshutdown"=>"true", "service:autostartup"=>"true"}}}
[2022-11-14 15:14:18.784818 ] DEBUG Facter::Core::Execution::Posix - Executing command: /bin/cat /proc/swaps
[events from here follow each other quickly]

The total run takes 49 seconds; the pause takes over 43 seconds.

Is there any way I can zoom in more on what causes the pause?

Add Comment Add Comment
 
This message was sent by Atlassian Jira (v8.20.11#820011-sha1:0629dd8)
Atlassian logo

Siebrand Mazeland (Jira)

unread,
Nov 14, 2022, 8:42:02 AM11/14/22
to puppe...@googlegroups.com
Siebrand Mazeland updated an issue
Change By: Siebrand Mazeland
*Puppet Version: 7.20.0*
*Facter version: 4.2.13*
*Puppet Server Version: 7.9.2*
*OS Name/Version: Amazon Linux 2*


I'm unfamiliar with reporting here; please let me know where I can improve

In certain environments, I see unexplained long run times for a puppet run. When using debug output on this, I see the delay occur, but there is nothing being logged, and there is no clear reason. I assume <something> is timing out. I'd like to be able to provide the right information, but not sure how.

This is what I see:


{noformat}

facter --puppet -d

[events up to here follow each other quickly]
[2022-11-14 15:13:35.556906 ] DEBUG Facter::FactManager - fact "ec2_metadata" has resolved to: {"ami-id"=>"ami-[REDACTED]", "ami-launch-index"=>"0", "ami-manifest-path"=>"(unknown)", "block-device-mapping"=>{"ami"=>"xvda", "root"=>"/dev/xvda"}, "events"=>{"maintenance"=>{"history"=>"[]", "scheduled"=>"[]"}}, "hostname"=>"[REDACTED].eu-north-1.compute.internal", "iam"=>{"info"=>"{\n  \"Code\" : \"Success\",\n  \"LastUpdated\" : \"2022-11-14T12:54:05Z\",\n  \"InstanceProfileArn\" : \"arn:aws:iam::[REDACTED]:instance-profile/[REDACTED]\",\n  \"InstanceProfileId\" : \"[REDACTED]\"\n}"}, "identity-credentials"=>{"ec2"=>{"info"=>"{\n  \"Code\" : \"Success\",\n  \"LastUpdated\" : \"2022-11-14T12:53:26Z\",\n  \"AccountId\" : \"REDACTED\"\n}"}}, "instance-action"=>"none", "instance-id"=>"i-REDACTED", "instance-life-cycle"=>"on-demand", "instance-type"=>"r5.xlarge", "local-hostname"=>"REDACTED.eu-north-1.compute.internal", "local-ipv4"=>"REDACTED", "mac"=>"REDACTED", "metrics"=>{"vhostmd"=>"<?xml version=\"1.0\" encoding=\"UTF-8\"?>"}, "network"=>{"interfaces"=>{"macs"=>{"REDACTED"=>{"device-number"=>"0", "interface-id"=>"eni-REDACTED", "local-hostname"=>"REDACTED.eu-north-1.compute.internal", "local-ipv4s"=>"REDACTED", "mac"=>"REDACTED", "owner-id"=>"REDACTED", "security-group-ids"=>"sg-REDACTED\nsg-REDACTED", "security-groups"=>"REDACTED-REDACTED\nREDACTED-REDACTED", "subnet-id"=>"subnet-REDACTED", "subnet-ipv4-cidr-block"=>"REDACTED", "vpc-id"=>"vpc-REDACTED", "vpc-ipv4-cidr-block"=>"REDACTED", "vpc-ipv4-cidr-blocks"=>"REDACTED"}}}}, "placement"=>{"availability-zone"=>"eu-north-1b", "availability-zone-id"=>"eun1-az2", "region"=>"eu-north-1"}, "profile"=>"default-hvm", "public-keys"=>{"0"=>{"openssh-key"=>"ssh-rsa REDACTED"}}, "reservation-id"=>"r-REDACTED", "security-groups"=>"REDACTED-REDACTED-1UKQVTNZDI0ZU\nREDACTED-REDACTED-REDACTED", "services"=>{"domain"=>"amazonaws.com", "partition"=>"aws"}, "system"=>"nitro", "tags"=>{"instance"=>{"Name"=>"REDACTED", "REDACTED"=>"REDACTED", "REDACTED"=>"REDACTED", "REDACTED"=>"REDACTED", "REDACTED"=>"REDACTED", "REDACTED"=>"sg-REDACTED", "REDACTED"=>"REDACTED", "application"=>"REDACTED", "aws:cloudformation:logical-id"=>"REDACTED", "aws:cloudformation:stack-id"=>"arn:aws:cloudformation:eu-north-1:REDACTED:stack/REDACTED", "aws:cloudformation:stack-name"=>"REDACTED", "aws:ec2launchtemplate:id"=>"lt-REDACTED", "aws:ec2launchtemplate:version"=>"1", "contact"=>"REDACTED", "costcenter"=>"REDACTED", "environment"=>"REDACTED", "personaldata"=>"false", "service:autoshutdown"=>"true", "service:autostartup"=>"true"}}}
[2022-11-14 15:14:18.784818 ] DEBUG Facter::Core::Execution::Posix - Executing command: /bin/cat /proc/swaps
[events from here follow each other quickly]
{noformat}


The total run takes 49 seconds; the pause takes over 43 seconds.

Is there any way I can zoom in more on what causes the pause?


Another way of looking at things:

{noformat}
# time facter --puppet -d --trace --verbose --timing | ts
[2022-11-14 15:39:42.288673 ] INFO Facter - executed with command line: --puppet -d --trace --verbose --timing
Nov 14 15:39:42 fact 'ec2_metadata', took: (0.100138) seconds
Nov 14 15:39:42 fact 'ec2_metadata', took: (0.000070) seconds
Nov 14 15:39:42 fact 'ec2_metadata', took: (0.000074) seconds
Nov 14 15:39:42 fact 'ec2_metadata', took: (0.000063) seconds
Nov 14 15:39:42 fact 'ec2_metadata', took: (0.000070) seconds
Nov 14 15:39:42 fact 'aio_agent_version', took: (0.000083) seconds
Nov 14 15:39:42 fact 'operatingsystem', took: (0.000188) seconds
Nov 14 15:39:42 fact 'osfamily', took: (0.000010) seconds
Nov 14 15:39:42 fact 'operatingsystem', took: (0.000019) seconds
Nov 14 15:39:42 fact 'osfamily', took: (0.000009) seconds
Nov 14 15:39:42 fact 'operatingsystemrelease', took: (0.000091) seconds
Nov 14 15:39:42 fact 'operatingsystem', took: (0.000010) seconds
Nov 14 15:39:42 fact 'operatingsystemmajrelease', took: (0.000017) seconds
Nov 14 15:39:42 fact 'operatingsystem', took: (0.000009) seconds
Nov 14 15:39:42 fact 'ec2_metadata', took: (0.000090) seconds
Nov 14 15:39:42 fact 'ec2_metadata', took: (0.000086) seconds
Nov 14 15:39:43 fact 'ec2_metadata', took: (0.000137) seconds
Nov 14 15:40:28 fact 'os.release', took: (0.000049) seconds
Nov 14 15:40:28 fact 'operatingsystemmajrelease', took: (0.000010) seconds
[..]
{noformat}

Siebrand Mazeland (Jira)

unread,
Nov 14, 2022, 12:27:02 PM11/14/22
to puppe...@googlegroups.com
Siebrand Mazeland commented on Bug FACT-3166
 
Re: Long unexplained pause while resolving facts

I looked closer into the commandline switches for facter, and it appeared that "" wasn't documented in the man page, but was listed in the help of the command. So I added that, and threw in some more, and I ended up with the same unexplained pause, but one extra line. That line if displayed before the pause.

[2022-11-14 19:20:32.500550 ] DEBUG Facter::FactManager - fact "ec2_metadata" has resolved to: {"ami-id"=>"ami-REDACTED[..]}
fact 'ec2_metadata', took: (0.000160) seconds
[2022-11-14 19:21:19.605305 ] DEBUG Facter::Core::Execution::Posix - Executing command: /bin/cat /proc/swaps

Morgan Rhodes (Jira)

unread,
Nov 15, 2022, 4:07:02 PM11/15/22
to puppe...@googlegroups.com
Morgan Rhodes updated an issue
 
Change By: Morgan Rhodes
Team: Comply EMEA Phoenix

Morgan Rhodes (Jira)

unread,
Nov 15, 2022, 4:11:03 PM11/15/22
to puppe...@googlegroups.com
Morgan Rhodes commented on Bug FACT-3166
 
Re: Long unexplained pause while resolving facts

Siebrand Mazeland it may be useful to run facter through strace to look for any network calls, probably connect.

Siebrand Mazeland (Jira)

unread,
Nov 16, 2022, 11:25:03 AM11/16/22
to puppe...@googlegroups.com

Thanks for the help so far, Morgan Rhodes. I had never used strace before.

I used it like this:

strace -r facter --debug --puppet --json --http-debug --sequential --timing --no-color 2>&1 | ts > strace-facter.txt

When looking at the output, it showed that 60k events were logged in the first 6 seconds. Then 263 in the following 41 seconds. After that 17k events.

I have attached the 263 log lines, with a little context around it, which is exactly what's in the gap detailed in my two previous comments. Is that of any use?

Siebrand Mazeland (Jira)

unread,
Nov 16, 2022, 11:25:03 AM11/16/22
to puppe...@googlegroups.com
Siebrand Mazeland updated an issue
 
Change By: Siebrand Mazeland
Attachment: strace-facter.partial.txt

Siebrand Mazeland (Jira)

unread,
Nov 22, 2022, 6:24:03 AM11/22/22
to puppe...@googlegroups.com
 
Re: Long unexplained pause while resolving facts

Morgan Rhodes: Is the added information enough to see what causes the delays?

Siebrand Mazeland (Jira)

unread,
Nov 30, 2022, 10:03:02 AM11/30/22
to puppe...@googlegroups.com

I updated the status to "open". Maybe that makes the issue surface somewhere.

Morgan Rhodes (Jira)

unread,
Dec 6, 2022, 4:27:03 PM12/6/22
to puppe...@googlegroups.com

The team will take a look and see if anything stands out.

Morgan Rhodes (Jira)

unread,
Dec 6, 2022, 4:27:03 PM12/6/22
to puppe...@googlegroups.com
Morgan Rhodes updated an issue
 
Change By: Morgan Rhodes
Labels: needs-validation

Michael Hashizume (Jira)

unread,
Dec 14, 2022, 5:59:02 PM12/14/22
to puppe...@googlegroups.com
Michael Hashizume commented on Bug FACT-3166
 
Re: Long unexplained pause while resolving facts

Hi Siebrand Mazeland

The team took a look at this today. We were unable to reproduce the issue and wanted to get a few more details on your end:

  • Would you be able to send us the full strace output?
  • Can you tell us a bit more about the environments you're seeing this in and how it may differ from the environments where this is not happening?
  • Could you try blocking the ec2-related facts and see if that decreases the run time?

Thanks!

Siebrand Mazeland (Jira)

unread,
Dec 15, 2022, 8:16:02 AM12/15/22
to puppe...@googlegroups.com

> Would you be able to send us the full strace output?

Yes, I can do that, but I do not want to place it in publicly accessible place. Can you please contact me at siebrand at kitano dot nl so we can arrange for a less public exchange of the strace output?

Reply all
Reply to author
Forward
0 new messages