2019-Jun-06 devel branch and random hangs using winrm?

128 views
Skip to first unread message

Bob Tanner

unread,
Jun 6, 2019, 10:25:30 PM6/6/19
to Ansible Project

Upgraded ansible to 


$ ansible --version

ansible 2.9.0.dev0

  config file = /Users/tanner/projects/ansible.git/playbooks.git/celadonsystems.com/ansible.cfg

  configured module search path = [u'/Users/tanner/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']

  ansible python module location = /Users/tanner/projects/ansible.git/ansible/lib/ansible

  executable location = /Users/tanner/projects/ansible.git/ansible/bin/ansible

  python version = 2.7.10 (default, Feb 22 2019, 21:55:15) [GCC 4.2.1 Compatible Apple LLVM 10.0.1 (clang-1001.0.37.14)]


And now random playbooks and tasks just hang in unusual places. 


Before the upgrade of ansible all this playbooks and tasks worked as expected.


- name: Install programs (choco)
 win_chocolatey
:
 name
: "{{ item }}"
 state
: "present"
 
when:
 
- chocolatey_packages is defined
 loop
: "{{ chocolatey_packages|flatten(levels=1) }}"


The above is a task that randomly hangs.


<win2016.LAB.testing.com> ESTABLISH WINRM CONNECTION FOR USER: Admini...@LAB.testing.com on PORT 5986 TO win2016.LAB.testing.com
Using module file /Users/tanner/projects/ansible.git/ansible/lib/ansible/modules/windows/win_chocolatey.ps1
Pipelining is enabled.


I've let this sit for 12 hours.


No timeouts. No errors. 


$ python -V

Python 2.7.10


Management host: macOS Mojave 10.14.5

Managed host: Windows Server 2016


$ kinit --version

kinit (Heimdal 1.5.1apple1)

Copyright 1995-2011 Kungliga Tekniska Högskolan

Send bug-reports to heimda...@h5l.org


This feels like a personal problem or a python-2.7 problem (still in progress of upgrading to python3)


I didn't open an issue because I'd like to learn how to report more(?) better(?) info regarding this issue.


Any help would be appreciated.


Thanks.


Bob Tanner

unread,
Jun 6, 2019, 10:51:00 PM6/6/19
to Ansible Project
I do notice when things hang the python process goes into a zombie state.

  501 87898 86460   0  9:37PM ttys000    0:00.00 (python)             2006   0  0        0      0 -      Z 
  501 87900 86460   0  9:37PM ttys000    0:00.00 (python)             2006   0  0        0      0 -      Z 
  501 87909 86460   0  9:38PM ttys000    0:00.00 (python)             2006   0  0        0      0 -      Z

Dick Visser

unread,
Jun 6, 2019, 11:44:33 PM6/6/19
to ansible...@googlegroups.com
If everything worked, what was the reason for upgrading to an unreleased development version?



--
You received this message because you are subscribed to the Google Groups "Ansible Project" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ansible-proje...@googlegroups.com.
To post to this group, send email to ansible...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/ansible-project/e918e71c-1088-48d8-8164-3e72e655a06d%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
--
Sent from a mobile device - please excuse the brevity, spelling and punctuation.

Bob Tanner

unread,
Jun 7, 2019, 11:54:22 AM6/7/19
to Ansible Project
On Thursday, June 6, 2019 at 10:44:33 PM UTC-5, Dick Visser wrote:
If everything worked, what was the reason for upgrading to an unreleased development version?


This isn't a production environment and I am testing the devel branch by design. Helping with making ansible better?  

J Hawkesworth

unread,
Jun 8, 2019, 3:48:50 AM6/8/19
to Ansible Project
Hi Bob,

Does it just hang on chocolatey or on other things too?

I think there's been a recent change which may have broken the ansible<->chocolatey integration although iirc its been fixed by latest chocolatey.

Anything in the event logs on the target machine(s)?  I don't use choco much but is it logging and if so is there anything in the log file for it?

Hope this helps,

Jon

Bob Tanner

unread,
Jun 10, 2019, 10:19:46 AM6/10/19
to Ansible Project


Hi Bob,

Does it just hang on chocolatey or on other things too?

I think there's been a recent change which may have broken the ansible<->chocolatey integration although iirc its been fixed by latest chocolatey.

Anything in the event logs on the target machine(s)?  I don't use choco much but is it logging and if so is there anything in the log file for it?

Hope this helps,

Predominately win_chocolatey tasks. And, yes, there was a problem with ansible<->chocolatey but 0.15.0 fixed the problem.
 
Infrequently win_update, which I have always attributed to windows update services being buggy, native patching has problems for me sometimes :-( 

I'll be testing more this week.

Jordan Borean

unread,
Jun 10, 2019, 8:15:20 PM6/10/19
to Ansible Project
There's hasn't been any changes to the winrm or exec process for the devel branch that isn't in 2.8 and I haven't seen any big issues where it is hanging. Some things to check;
  • Do you see the module process actually running on the remote host
  • When it hangs, is the process still running, any left over processes that isn't powershell.exe
  • Is it easily replicable, or does it only happen randomly, e.g. sometimes a task works but other times it doesn't
  • I see you are using a MacOS host with Kerberos auth, there is a problem using the builtin kinit process on MacOS which causes a hang. Try using a self managed Kerberos ticket to rule out this problem

The MacOS Kerberos kinit issue is fixed by making sure pexpect is installed on the Python environment for the controller but it's best to just use your own Kerb ticket to rule it out.

Thanks

Jordan

Bob Tanner

unread,
Jun 11, 2019, 11:45:00 AM6/11/19
to ansible...@googlegroups.com


On Jun 10, 2019, at 7:15 PM, Jordan Borean <jbor...@gmail.com> wrote:

  • I see you are using a MacOS host with Kerberos auth, there is a problem using the builtin kinit process on MacOS which causes a hang. Try using a self managed Kerberos ticket to rule out this problem

The MacOS Kerberos kinit issue is fixed by making sure pexpect is installed on the Python environment for the controller but it's best to just use your own Kerb ticket to rule it out.


I am using MacOS with Kerb auth and I do not understand “self managed Kerberos tickets”

On my macOS controller I pretty much follow https://web.mit.edu/Kerberos/www/krb5-devel/doc/user/tkt_mgmt.html

I’ll install pexpect and see if things work better.

I’ll follow up on the other questions after working things with pexpect installed.



--
Bob Tanner <b...@tanners.org>          Clash of Clan Tag: #LUJ2CPU
Follow me on Facebook and Twitter!  Clash Royale Tag: #2Q98YPC

Bob Tanner

unread,
Jun 11, 2019, 12:23:24 PM6/11/19
to ansible...@googlegroups.com
On Jun 10, 2019, at 7:15 PM, Jordan Borean <jbor...@gmail.com> wrote:

  • Do you see the module process actually running on the remote host
I do not see the module process running on the remote host. System Idle Process and Task Manager are the only things taking CPU.

  • When it hangs, is the process still running, any left over processes that isn't powershell.exe
I do see powershell.exe and choco.exe in the process list.

  • Is it easily replicable, or does it only happen randomly, e.g. sometimes a task works but other times it doesn’t
Easily replicable with this ansible tasks

- name: update installed chocolatey packages
win_chocolatey:
name: all
state: latest
tags: win_chocolatey
  • I see you are using a MacOS host with Kerberos auth, there is a problem using the builtin kinit process on MacOS which causes a hang. Try using a self managed Kerberos ticket to rule out this problem

The MacOS Kerberos kinit issue is fixed by making sure pexpect is installed on the Python environment for the controller but it's best to just use your own Kerb ticket to rule it out.


I installed pexpect and no change in behavior. The “update installed chocolatey packages” task hangs.

Bob Tanner

unread,
Jun 11, 2019, 12:50:32 PM6/11/19
to ansible...@googlegroups.com
Doing a dtruss on the python process on the controller I just get screens of the following.

$ sudo dtruss -p 64569
select(0x0, 0x0, 0x0, 0x0, 0x7FFEE3C1D610) = 0 0
select(0x0, 0x0, 0x0, 0x0, 0x7FFEE3C1D610) = 0 0
select(0x0, 0x0, 0x0, 0x0, 0x7FFEE3C1D610) = 0 0
select(0x0, 0x0, 0x0, 0x0, 0x7FFEE3C1D610) = 0 0
select(0x0, 0x0, 0x0, 0x0, 0x7FFEE3C1D610) = 0 0
select(0x0, 0x0, 0x0, 0x0, 0x7FFEE3C1D610) = 0 0
select(0x0, 0x0, 0x0, 0x0, 0x7FFEE3C1D610) = 0 0
select(0x0, 0x0, 0x0, 0x0, 0x7FFEE3C1D610) = 0 0
select(0x0, 0x0, 0x0, 0x0, 0x7FFEE3C1D610) = 0 0
select(0x0, 0x0, 0x0, 0x0, 0x7FFEE3C1D610) = 0 0
select(0x0, 0x0, 0x0, 0x0, 0x7FFEE3C1D610) = 0 0
select(0x0, 0x0, 0x0, 0x0, 0x7FFEE3C1D610) = 0 0
select(0x0, 0x0, 0x0, 0x0, 0x7FFEE3C1D610) = 0 0
select(0x0, 0x0, 0x0, 0x0, 0x7FFEE3C1D610) = 0 0
select(0x0, 0x0, 0x0, 0x0, 0x7FFEE3C1D610) = 0 0
select(0x0, 0x0, 0x0, 0x0, 0x7FFEE3C1D610) = 0 0
select(0x0, 0x0, 0x0, 0x0, 0x7FFEE3C1D610) = 0 0
select(0x0, 0x0, 0x0, 0x0, 0x7FFEE3C1D610) = 0 0
select(0x0, 0x0, 0x0, 0x0, 0x7FFEE3C1D610) = 0 0
select(0x0, 0x0, 0x0, 0x0, 0x7FFEE3C1D610) = 0 0
select(0x0, 0x0, 0x0, 0x0, 0x7FFEE3C1D610) = 0 0
select(0x0, 0x0, 0x0, 0x0, 0x7FFEE3C1D610) = 0 0
select(0x0, 0x0, 0x0, 0x0, 0x7FFEE3C1D610) = 0 0
select(0x0, 0x0, 0x0, 0x0, 0x7FFEE3C1D610) = 0 0
select(0x0, 0x0, 0x0, 0x0, 0x7FFEE3C1D610) = 0 0
select(0x0, 0x0, 0x0, 0x0, 0x7FFEE3C1D610) = 0 0
select(0x0, 0x0, 0x0, 0x0, 0x7FFEE3C1D610) = 0 0
select(0x0, 0x0, 0x0, 0x0, 0x7FFEE3C1D610) = 0 0
select(0x0, 0x0, 0x0, 0x0, 0x7FFEE3C1D610) = 0 0
select(0x0, 0x0, 0x0, 0x0, 0x7FFEE3C1D610) = 0 0

Jordan Borean

unread,
Jun 11, 2019, 5:01:52 PM6/11/19
to Ansible Project
The self managed kerberos ticket is just you running `kinit` yourself and disabling the auto kinit process that Ansible runs if you supply the password. Because you actually see the process running on the Windows host then that rules out that problem and you can ignore it.

If it is hanging on choco upgrade all and you still see choco.exe running in the background then there's not much we can do. Chocolatey seems to be running something and not exiting so win_chocolatey won't exit for you. You will have to look at Chocolatey's logs, or even run the upgrade all process yourself manually and see if it's prompting for anything.

Thanks

Jordan

Bob Tanner

unread,
Jun 11, 2019, 5:29:52 PM6/11/19
to ansible...@googlegroups.com

On Jun 11, 2019, at 4:01 PM, Jordan Borean <jbor...@gmail.com> wrote:

The self managed kerberos ticket is just you running `kinit` yourself and disabling the auto kinit process that Ansible runs if you supply the password. Because you actually see the process running on the Windows host then that rules out that problem and you can ignore it.

If it is hanging on choco upgrade all and you still see choco.exe running in the background then there's not much we can do. Chocolatey seems to be running something and not exiting so win_chocolatey won't exit for you. You will have to look at Chocolatey's logs, or even run the upgrade all process yourself manually and see if it's prompting for anything.


I am doing the kinit myself.

I did install pexpect but didn’t seem to make a difference, still hung. This time on removal of the chocolatey packages hung, task below:

- name: remove old/deprecated packages (choco)
win_chocolatey:
name: "{{ chocolatey_remove_packages|list }}"
state: "absent"
when:
- chocolatey_remove_packages is defined


Doing a “cup all” there are packages that need to be updated and the update process finishes as expected with 0

Chocolatey upgraded 9/40 packages.
 See the log for details (C:\ProgramData\chocolatey\logs\chocolatey.log).

Upgraded:
 - googlechrome v75.0.3770.80
 - winscp v5.15.2
 - flashplayeractivex v32.0.0.207
 - git v2.22.0
 - winscp.install v5.15.2
 - cleanup v4.5.2.20190522
 - vscode v1.35.0
 - flashplayerplugin v32.0.0.207
 - git.install v2.22.0
PS C:\Users\Administrator.CORP> $LASTEXITCODE
0

Mostly a linux and MacOS guy so I find the Windows Event Viewer super convoluted. 

Where would I look for errors? 

I see Information > Ansible stuff but nothing in Critical, Error, or Warning  related to Ansible.

I’ll also jump over to the chocolatey group but I think they will say the problem only occurs with win_chocolatey it’s an Ansible issue not a chocolatey issue.

Jordan Borean

unread,
Jun 11, 2019, 5:49:27 PM6/11/19
to Ansible Project
It's a matter of narrowing down there the hang happens, the best way to do this is to narrow down is to find a single task that is repeatable and will always result in the hang. When you do this you can then debug further to narrow down where exactly the hang occurs, this can be done by;
  • Running the command manually to see if it works properly
  • Checking the logs that relate to the action, this could be Windows Event logs for Windows stuff but it could also just be log files for the program you are calling
  • Use something like win_command/win_shell to run the same step manually and see if the problem exists in the module or in the calling program + possibly WinRM
If you can't reliably replicate the hang then you need to do some more work in trying to find common scenarios that cause it to happen. Common scenarios you can try and identify are;
  • Whether this happens when calling a certain program, e.g. chocolatey
    • Using Chocolatey as an example, if it only happens when using it look at the Chocolatey logs on the system, IIRC this is at C:\ProgramData\Chocolatey
    • Going further, I would then try win_command/win_shell to run the same task and see if it hangs
    • I would also run the same command locally (outside of WinRM) and see if it hangs/prompts something
  • Whether this happens only on a certain host, or group of hosts
    • If a group of hosts then do they share anything in common, e.g. PS version, Windows version and so on
  • If it just happens random for any task, e.g. for win_file, win_stat, any other module then that is good info to know
    • This would indicate a potential problem with Ansible's exec model but not necessarily
  • Try out older Ansible versions, e.g. 2.6, 2.7, or 2.8 to see if the problem is also there
  • If it only started appearing after a set date then look at what changed then
    • This could be things like a Windows update
    • A program update
    • Python package change
    • Ansible update
    • GPO policy change
    • So much more
Identifying hangs aren't easy but without knowing the full story is next to impossible. From what you are saying the problem looks like it is happening on the execution side and unrelated to the Ansible controller so debugging the Python process probably won't give you any good information.

Thanks

Jordan

Bob Tanner

unread,
Jun 12, 2019, 10:41:27 AM6/12/19
to ansible...@googlegroups.com
Ansible did finally throw an error, this time on a Windows7 systems

TASK [win_chocolatey : update installed chocolatey packages] *******************
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: ConnectionError: HTTPSConnectionPool(host='win7.LAB.testing.com', port=5986): Max retries exceeded with url: /wsman (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x111607d90>: Failed to establish a new connection: [Errno 8] nodename nor servname provided, or not known',))
fatal: [win7.LAB.testing.com]: FAILED! => {"msg": "Unexpected failure during module execution.", "stdout": “"}

Some time stats

real 65m28.845s
user 3m14.159s
sys 0m47.496s

Bob Tanner

unread,
Jun 12, 2019, 1:10:35 PM6/12/19
to ansible...@googlegroups.com


On Jun 11, 2019, at 4:49 PM, Jordan Borean <jbor...@gmail.com> wrote:

Identifying hangs aren't easy but without knowing the full story is next to impossible. From what you are saying the problem looks like it is happening on the execution side and unrelated to the Ansible controller so debugging the Python process probably won't give you any good information.

So how does the win_chocolatey module interactive with chocolatey packages what use something like AutoIt for post-install configuration?

(Linux guy here) so does Windows have something like a controlling tty?

Doing some debugging and I think the hang is on upgrading a package called CleanUp! (which I do maintain) <https://chocolatey.org/packages/cleanup> and the chocolateyintall.ps1 uses AutoIt

$ErrorActionPreference = 'Stop';

$packageName = 'CleanUp'
$version = '452'
$fullPackage = $packageName + $version + '.exe'
$checkSum = '6ac7576c0b48ddee292f85724c7917e11360927ff8e5c5a8d795577a4241c131'
$workSpace = Join-Path $env:TEMP "$packageName-$env:chocolateyPackageVersion"

# silent install requires AutoIT
#
$autoitExe = 'AutoIt3.exe'
$toolsDir = $(Split-Path -parent $MyInvocation.MyCommand.Definition)
$autoitFile = Join-Path $toolsDir 'cleanup.au3'
$fileFullPath = Join-Path $workSpace $fullPackage

$autoitProcess = Start-Process -FilePath $autoitExe -ArgumentList "$autoitFile $fileFullPath" -PassThru

$installArgs = @{
softwareName = 'CleanUp*'
packageName = $packageName
file = Join-Path $workSpace $fullPackage
url = $url
checksum = $checkSum
checksumType = 'sha256'
fileType = 'exe'
silentArgs = "/qn /norestart /l*v `"$($env:TEMP)\$($packageName)-$($env:chocolateyPackageVersion).MsiInstall.log`" ALLUSERS=1"
validExitCodes = @(0, 3010, 1641)
}

Install-ChocolateyPackage @installArgs

Installing the package at the console or upgrading the package at the console all works as expected.

But trying to install or upgrade the package via Ansible and win_chocolatey I see both "CleanUp452.exe *32" and "AutoIt3.exe *32” in Task Manager as well as powershell.exe and choice.exe.

IF remove cancel out (control-c) the Ansible process on the management host and I do a “choco list -lo” on the managed host I get a warning:

“[Pending] Removing incomplete install for ‘cleanup’”

That would seem to hint that the chocolatey was processing the CleanUp! package?

Any recommendations on what to try/do next?

Bob Tanner

unread,
Jun 12, 2019, 1:45:12 PM6/12/19
to ansible...@googlegroups.com
> IF remove cancel out (control-c) the Ansible process on the management host and I do a “choco list -lo” on the managed host I get a warning:
>
> “[Pending] Removing incomplete install for ‘cleanup’”
>
> That would seem to hint that the chocolatey was processing the CleanUp! package?
>
> Any recommendations on what to try/do next?
>

Took CleanUp! out of the package installation and upgrade tasks and now all win_chocolatey tasks complete.

Another chocolatey package I maintain EasyLog USB <https://chocolatey.org/packages/easylogusb>, it also uses AutoIt for installation/updates <https://github.com/basictheprogram/chocolatey-EasyLogUSB/blob/master/tools/chocolateyinstall.ps1> and the win_chocolatey update task hangs on this package too.

So it looks like win_chocolatey modules has problem interacting with AutoIt?




Bob Tanner

unread,
Jun 12, 2019, 2:57:13 PM6/12/19
to ansible...@googlegroups.com
Tested against Windows7, Windows10, Windows Server 2008 R2, Windows Server 2016 and all releases hang on the upgrade of CleanUp!

Looking like the problem with win_chocolatey and AutoIt.

Jordan Borean

unread,
Jun 12, 2019, 5:39:38 PM6/12/19
to Ansible Project
AutoIt can get really fun when dealing with non-interactive sessions that WinRM is run on. The behaviour around console windows in session 0 is not the same as a normal interactive session you get when running it manually. If AutoIt is waiting for specific console state or is trying to click a button in the wrong area it won't work as expected. So effectively the problem isn't Chocolatey, Ansible, or win_chocolatey, it's using AutoIt in a non-interactive session that is a byproduct of WinRM. I don't know enough about AutoIt to give you tips on your scripts but ultimately you want to try and have some sort of timeout that will fail if it exceeds a certain time. This won't help with getting it working but it will at least tell people what the issue may be instead of just hanging.

Thanks

Jordan
Reply all
Reply to author
Forward
0 new messages