Jira (PUP-10497) Agent hangs randomly applying configuration Ubuntu 20.04

Buddy Scharfenberg (Jira)

May 8, 2020, 4:16:04 PM
to puppe...@googlegroups.com
Buddy Scharfenberg created an issue
 
Puppet / Bug PUP-10497
Agent hangs randomly applying configuration Ubuntu 20.04
Issue Type: Bug
Affects Versions: PUP 6.15.0
Assignee: Unassigned
Attachments: puppetrun.log
Components: Catalog Application
Created: 2020/05/08 1:15 PM
Labels: ubuntu
Priority: Normal
Reporter: Buddy Scharfenberg

Puppet Version: 6.15.0
Puppet Server Version: 5.5.1
OS Name/Version: Ubuntu 20.04

Puppet agent runs hang, seemingly at random, after facts are loaded.

Steps to reproduce: Start a puppet agent run manually, or let the puppet service start one.

Desired Behavior: The puppet agent runs to completion every time.

Actual Behavior: The agent hangs at random.

I know I haven't resolved everything in my environment yet, but I've attached a debug log of an agent run; this particular run succeeded. When I hit another run that hangs, I will upload that log. Once the agent hangs, it can't be killed with SIGTERM or SIGKILL, and if you attach strace to the process, the strace instance can't be killed either.
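
A process that ignores SIGKILL is usually in uninterruptible sleep (state D), waiting inside the kernel. As a rough way to check for that while a run is stuck, the sketch below scans /proc for tasks in state D. It is only an illustration, not part of the original report: the function names are made up here, and it assumes a Linux /proc layout where /proc/<pid>/stat starts with the pid, the command name in parentheses, and the state letter.

```python
import os


def parse_stat(text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the first '(' and the last ')'.
    left = text.index("(")
    right = text.rindex(")")
    pid = int(text[:left].strip())
    comm = text[left + 1:right]
    state = text[right + 1:].split()[0]
    return pid, comm, state


def uninterruptible_tasks(proc_root="/proc"):
    """List (pid, comm) for every process currently in state D."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "sys"
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_stat(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited meanwhile, or the file was unreadable
        if state == "D":
            found.append((pid, comm))
    return found
```

Calling `uninterruptible_tasks()` on the affected host during a hang should list the stuck puppet process if it really is in state D. A task can also pass through state D briefly during normal I/O, so a single hit is only meaningful if it persists.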

This message was sent by Atlassian Jira (v8.5.2#805002-sha1:a66f935)

Buddy Scharfenberg (Jira)

May 11, 2020, 1:33:03 PM
to puppe...@googlegroups.com
Buddy Scharfenberg commented on Bug PUP-10497
 
Re: Agent hangs randomly applying configuration Ubuntu 20.04

I found some call traces in the kernel log. Once a puppet run gets stuck, the kernel detects the hung task and starts dumping information.

Here is one such entry from kern.log, logged while a puppet PID was stuck:

May 11 11:06:40 r01blspcy kernel: [11600.653015] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 11 11:06:40 r01blspcy kernel: [11600.653680] puppet D 0 28708 15365 0x00004000
May 11 11:06:40 r01blspcy kernel: [11600.653682] Call Trace:
May 11 11:06:40 r01blspcy kernel: [11600.653689] __schedule+0x2e3/0x740
May 11 11:06:40 r01blspcy kernel: [11600.653692] ? __switch_to_asm+0x40/0x70
May 11 11:06:40 r01blspcy kernel: [11600.653693] ? __switch_to_asm+0x34/0x70
May 11 11:06:40 r01blspcy kernel: [11600.653694] schedule+0x42/0xb0
May 11 11:06:40 r01blspcy kernel: [11600.653695] schedule_timeout+0x203/0x2f0
May 11 11:06:40 r01blspcy kernel: [11600.653697] wait_for_completion+0xb1/0x120
May 11 11:06:40 r01blspcy kernel: [11600.653701] ? wake_up_q+0x70/0x70
May 11 11:06:40 r01blspcy kernel: [11600.653706] __floppy_read_block_0+0x140/0x190 [floppy]
May 11 11:06:40 r01blspcy kernel: [11600.653708] ? floppy_cmos_show+0x30/0x30 [floppy]
May 11 11:06:40 r01blspcy kernel: [11600.653711] floppy_revalidate+0xfc/0x240 [floppy]
May 11 11:06:40 r01blspcy kernel: [11600.653715] check_disk_change+0x62/0x70
May 11 11:06:40 r01blspcy kernel: [11600.653717] floppy_open+0x28e/0x360 [floppy]
May 11 11:06:40 r01blspcy kernel: [11600.653719] ? disk_block_events+0x5c/0x80
May 11 11:06:40 r01blspcy kernel: [11600.653721] __blkdev_get+0xe1/0x550
May 11 11:06:40 r01blspcy kernel: [11600.653722] blkdev_get+0x3d/0x140
May 11 11:06:40 r01blspcy kernel: [11600.653724] ? blkdev_get_by_dev+0x50/0x50
May 11 11:06:40 r01blspcy kernel: [11600.653725] blkdev_open+0x8f/0xa0
May 11 11:06:40 r01blspcy kernel: [11600.653729] do_dentry_open+0x143/0x3a0
May 11 11:06:40 r01blspcy kernel: [11600.653730] vfs_open+0x2d/0x30
May 11 11:06:40 r01blspcy kernel: [11600.653732] do_last+0x194/0x900
May 11 11:06:40 r01blspcy kernel: [11600.653734] path_openat+0x8d/0x290
May 11 11:06:40 r01blspcy kernel: [11600.653736] do_filp_open+0x91/0x100
May 11 11:06:40 r01blspcy kernel: [11600.653738] ? __alloc_fd+0x46/0x150
May 11 11:06:40 r01blspcy kernel: [11600.653739] do_sys_open+0x17e/0x290
May 11 11:06:40 r01blspcy kernel: [11600.653741] __x64_sys_openat+0x20/0x30
May 11 11:06:40 r01blspcy kernel: [11600.653743] do_syscall_64+0x57/0x190
May 11 11:06:40 r01blspcy kernel: [11600.653745] entry_SYSCALL_64_after_hwframe+0x44/0xa9
May 11 11:06:40 r01blspcy kernel: [11600.653747] RIP: 0033:0x7fc08cc3bd94
May 11 11:06:40 r01blspcy kernel: [11600.653752] Code: Bad RIP value.
May 11 11:06:40 r01blspcy kernel: [11600.653753] RSP: 002b:00007ffde03b0010 EFLAGS: 00000293 ORIG_RAX: 0000000000000101
May 11 11:06:40 r01blspcy kernel: [11600.653754] RAX: ffffffffffffffda RBX: 0000561496472700 RCX: 00007fc08cc3bd94
May 11 11:06:40 r01blspcy kernel: [11600.653755] RDX: 0000000000080000 RSI: 00005614972d4200 RDI: 00000000ffffff9c
May 11 11:06:40 r01blspcy kernel: [11600.653755] RBP: 00005614972d4200 R08: 0000000000000000 R09: 00007fc08cd16b80
May 11 11:06:40 r01blspcy kernel: [11600.653756] R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000080000
May 11 11:06:40 r01blspcy kernel: [11600.653756] R13: 00007fc08b86d3c4 R14: 00000000deb97429 R15: 0000000000000000
May 11 11:08:40 r01blspcy kernel: [11721.479575] INFO: task puppet:28708 blocked for more than 1087 seconds.
May 11 11:08:40 r01blspcy kernel: [11721.480269] Not tainted 5.4.0-29-generic #33-Ubuntu
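
To see at a glance which kernel module a stuck task is waiting in, a trace like the one above can be filtered for frames that carry a module tag such as [floppy]. The following is a minimal sketch under assumptions: the regular expression is fitted to the line format shown in this excerpt (syslog prefix, bracketed timestamp, `function+0xOFF/0xSIZE`, optional `[module]`), and `module_frames` is a hypothetical helper name, not an existing tool. Frames prefixed with `?` are ones the kernel's unwinder marks as unreliable, so they are flagged rather than dropped.

```python
import re

# Matches e.g. "... [11600.653706] __floppy_read_block_0+0x140/0x190 [floppy]"
# Group 1: optional "? " marker, group 2: function name, group 3: module name.
FRAME = re.compile(
    r"\]\s+(\?\s+)?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(\w+)\])?\s*$"
)


def module_frames(lines):
    """Return (function, module, reliable) for frames tagged with a module."""
    frames = []
    for line in lines:
        match = FRAME.search(line)
        if match is None or match.group(3) is None:
            continue  # not a stack frame, or a frame from the core kernel
        reliable = match.group(1) is None
        frames.append((match.group(2), match.group(3), reliable))
    return frames
```

Run over the excerpt above, the reliable module frames all belong to the floppy driver (`__floppy_read_block_0`, `floppy_revalidate`, `floppy_open`), reached through an `openat` call on a block device.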

Josh Cooper (Jira)

May 11, 2020, 2:29:03 PM
to puppe...@googlegroups.com
Josh Cooper commented on Bug PUP-10497

Buddy Scharfenberg, given the low-level calls here, e.g. floppy_open and floppy_revalidate, I don't think this is caused by puppet. You probably need to take this up with Ubuntu at https://bugs.launchpad.net/ubuntu/

Neil Walkden (Jira)

May 26, 2020, 4:22:03 AM
to puppe...@googlegroups.com
Neil Walkden commented on Bug PUP-10497

Looks like Buddy is using puppet agent 6.15 with puppet server 5.x. Is this officially supported?

Josh Cooper (Jira)

Jan 27, 2021, 12:53:04 PM
to puppe...@googlegroups.com
Josh Cooper commented on Bug PUP-10497

Ah yeah, puppetserver5 with agent6 is not a supported configuration; see https://puppet.com/docs/puppet/latest/upgrade_minor.html. We've also been testing and shipping 20.04 for a while now and haven't seen any other reports, so I'm going to close this.
