Windows playbooks cause cryptographic and Windows Remote Management services to crash

Michael Perzel

Jul 28, 2016, 10:17:03 AM
to Ansible Project
Since upgrading to Ansible 2.0, my Windows playbooks have been failing with the following error. We have seen it when running setup, win_template, and script tasks. The easiest way to reproduce it is to have multiple simultaneous Ansible runs affecting the same host. If we re-run the exact same playbook after a failure, it almost always succeeds.

Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/ansible/plugins/connection/winrm.py", line 240, in exec_command
    result = self._winrm_exec(cmd_parts[0], cmd_parts[1:], from_exec=True)
  File "/usr/lib/python2.6/site-packages/ansible/plugins/connection/winrm.py", line 208, in _winrm_exec
    self.protocol.cleanup_command(self.shell_id, command_id)
  File "/usr/lib/python2.6/site-packages/awx/lib/site-packages/winrm/protocol.py", line 290, in cleanup_command
    rs = self.send_message(xmltodict.unparse(rq))
  File "/usr/lib/python2.6/site-packages/awx/lib/site-packages/winrm/protocol.py", line 193, in send_message
    return self.transport.send_message(message)
  File "/usr/lib/python2.6/site-packages/awx/lib/site-packages/winrm/transport.py", line 136, in send_message
    raise WinRMTransportError('http', error_message)
WinRMTransportError: 500 WinRMTransport. Bad HTTP response returned from server.  Code 500
fatal: [hostname]: FAILED! => {"failed": true, "msg": "failed to exec cmd PowerShell -NoProfile -NonInteractive -ExecutionPolicy Unrestricted -EncodedCommand reallylongencodedcommand=="}

If we capture the TCP traffic on the Windows side, we see the SYN packets arriving, so we know the issue isn't at the network level: the packets are reaching the Windows box. If we run netstat while the playbook is running, we see a bunch of connections, then suddenly none for a bit, and then the listener comes back. If you compare the Windows event log against the times when netstat shows no listeners, they match up perfectly with crashes of the cryptographic services, DNS client, workstation, network location, and Windows Remote Management services. After the services crash, Windows restarts them automatically and the Ansible playbooks start working again.

We've been having this issue on Windows Server 2012 boxes with 8 GB RAM and 4 CPUs. We've been able to reproduce it on a completely vanilla Server 2012 box (no antivirus or other third-party software installed). I'm at a complete loss on how to fix this.
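
For anyone who wants to line up the listener gap with the event log more precisely than repeated netstat runs allow, here is a minimal sketch of a port probe. It assumes WinRM is listening on its default HTTP port, 5985 (use 5986 for HTTPS); the function names are hypothetical and are not part of Ansible or pywinrm.

```python
import socket
import time
from datetime import datetime


def winrm_port_open(host, port=5985, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, or unreachable: treat all of these as "not listening".
        return False


def watch_listener(host, port=5985, interval=1.0, checks=60):
    """Poll the port and return a (timestamp, is_open) pair for each state change."""
    changes = []
    last_state = None
    for _ in range(checks):
        state = winrm_port_open(host, port)
        if state != last_state:
            changes.append((datetime.now().isoformat(), state))
            last_state = state
        time.sleep(interval)
    return changes
```

Calling `watch_listener("hostname")` from the control machine during a playbook run returns the timestamps at which the port stopped and resumed accepting connections, which can then be compared with the service crash entries in the event log.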

Has anyone else seen this behavior? I haven't found anything similar in the issue tracker or in Google searches.


Michael Perzel

Jul 28, 2016, 10:19:45 AM
to Ansible Project
Forgot to mention: we've also experimented with increasing the WinRM MaxConcurrentUsers, MaxProcessesPerShell, and MaxShellsPerUser settings but haven't seen any difference in behavior.

>winrm get winrm/config/winrs
Winrs
    AllowRemoteShellAccess = true
    IdleTimeout = 7200000
    MaxConcurrentUsers = 30
    MaxShellRunTime = 2147483647
    MaxProcessesPerShell = 25
    MaxMemoryPerShellMB = 1024
    MaxShellsPerUser = 30
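
When changing these limits on several hosts, it can help to compare the settings before and after, or between a failing and a working host. A minimal sketch, assuming the `winrm get winrm/config/winrs` output has already been captured as text in the format shown above; both function names are hypothetical helpers, not part of any WinRM tooling.

```python
def parse_winrs_config(text):
    """Parse 'Key = Value' lines from captured `winrm get winrm/config/winrs` output."""
    settings = {}
    for line in text.splitlines():
        if "=" not in line:
            continue  # skip the "Winrs" header, the prompt line, and blank lines
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip()
        if value.lower() in ("true", "false"):
            settings[key] = value.lower() == "true"
        elif value.isdigit():
            settings[key] = int(value)
        else:
            settings[key] = value
    return settings


def diff_settings(before, after):
    """Return {key: (old, new)} for every setting whose value differs."""
    keys = set(before) | set(after)
    return {
        key: (before.get(key), after.get(key))
        for key in keys
        if before.get(key) != after.get(key)
    }
```

Passing the parsed output of two hosts to `diff_settings` lists only the limits that differ between them.
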

J Hawkesworth

Jul 29, 2016, 12:27:08 PM
to Ansible Project
I haven't seen this myself, and I have been running 2.0.0.2 against our herd of Windows Server 2012 boxes for months.

Did you upgrade pywinrm to 0.2.0 by any chance?

Also, I spotted this bug report, which sounds similar to your case: https://github.com/ansible/ansible/issues/16873. The stack trace is not failing at the same point, though, so it could be something different.

Jon

Michael Perzel

Jul 29, 2016, 3:06:28 PM
to Ansible Project
No, I haven't upgraded pywinrm. I'm running 0.1.1.

> pip show pywinrm
DEPRECATION: Python 2.6 is no longer supported by the Python core team, please upgrade your Python. A future version of pip will drop support for Python 2.6
---
Metadata-Version: 1.0
Name: pywinrm
Version: 0.1.1
Summary: Python library for Windows Remote Management
Author: Alexey Diyan
Author-email: alexey...@gmail.com
License: MIT license
Location: /usr/lib/python2.6/site-packages
Requires: xmltodict, isodate
Classifiers:
  Development Status :: 4 - Beta
  Environment :: Console
  Intended Audience :: Developers
  Intended Audience :: System Administrators
  Natural Language :: English
  License :: OSI Approved :: MIT License
  Programming Language :: Python
  Programming Language :: Python :: 2
  Programming Language :: Python :: 2.6
  Programming Language :: Python :: 2.7
  Programming Language :: Python :: 3
  Programming Language :: Python :: 3.2
  Programming Language :: Python :: 3.3
  Programming Language :: Python :: Implementation :: PyPy
  Topic :: Software Development :: Libraries :: Python Modules
  Topic :: System :: Clustering
  Topic :: System :: Distributed Computing
  Topic :: System :: Systems Administration
/usr/lib/python2.6/site-packages/pip/_vendor/requests/packages/urllib3/util/ssl_.py:318: SNIMissingWarning: An HTTPS request has been made, but the SNI (Subject Name Indication) extension to TLS is not available on this platform. This may cause the server to present an incorrect TLS certificate, which can cause validation failures. You can upgrade to a newer version of Python to solve this. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#snimissingwarning.
  SNIMissingWarning
/usr/lib/python2.6/site-packages/pip/_vendor/requests/packages/urllib3/util/ssl_.py:122: InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail. You can upgrade to a newer version of Python to solve this. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#insecureplatformwarning.
  InsecurePlatformWarning

J Hawkesworth

Aug 1, 2016, 8:15:02 AM
to Ansible Project
It might be worth trying pywinrm 0.2.0, even if it's just because it's much quicker than 0.1.1.

However, I don't think it will fix your problem by itself.

Looking again, the machines I'm running are actually S2012 R2, not S2012, and are mostly 2-CPU, 4 GB virtual machines.

If yours are S2012 rather than S2012 R2, it's worth checking the PowerShell and WMF versions. WMF 3.0 had a bug that meant it would fail to run almost anything but the most trivial WinRM command; if that is what you have, upgrading to WMF 4.0 / PowerShell 4.0 is thoroughly recommended.

Jon

Michael Perzel

Aug 1, 2016, 11:30:04 AM
to Ansible Project
Thanks, I'll give 0.2.0 a try.

We are running S2012 R2 and PowerShell version 4.

PS C:\Windows\system32> $PSVersionTable.PSVersion

Major  Minor  Build  Revision
-----  -----  -----  --------
4      0      -1     -1

We are looking at PowerShell 5 for other reasons. Have you tried it out with Ansible?

Michael

J Hawkesworth

Aug 1, 2016, 12:34:42 PM
to Ansible Project
I have run the Ansible integration tests against a Server 2016 Tech Preview 5 build, which runs PowerShell 5 and WMF 5.0.

The only issue I have encountered so far is with uninstalling Windows features: there seems to be a new version of the cmdlet that uninstalls features, and it seems to fail without an interactive user (I haven't tested this thoroughly yet, so I could be wrong about the interactive user).

Jon

Matt Davis

Aug 1, 2016, 4:01:27 PM
to Ansible Project
I heard one other report of this happening a few weeks ago (but it came via Ansible support and I didn't know who the customer was; maybe it was also you?).

The services in question share the WinRM host process, so it's not surprising that they're the ones going down.

pywinrm 0.2.0 could definitely help some with this, as the HTTP(S) connections are reused for the various WinRM calls within a task, whereas 0.1.1 and lower open a new connection for every WinRM operation.
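
To make the difference in connection behavior concrete, here is a minimal sketch using only the Python standard library. It is not pywinrm's code: the local server is a stand-in for the WinRM endpoint, the `/wsman` path and message body are placeholders, and all class and function names are hypothetical. It only counts how many TCP connections each client pattern opens for the same number of requests.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class CountingServer(HTTPServer):
    """Local stand-in server that counts the TCP connections it accepts."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.connections = 0

    def get_request(self):
        request = super().get_request()
        self.connections += 1
        return request


class OkHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # allows keep-alive on the server side

    def do_POST(self):
        self.rfile.read(int(self.headers.get("Content-Length", 0)))
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def send_with_new_connections(host, port, count):
    """Open a fresh TCP connection for every request."""
    for _ in range(count):
        conn = http.client.HTTPConnection(host, port, timeout=5)
        conn.request("POST", "/wsman", body=b"<msg/>")
        conn.getresponse().read()
        conn.close()


def send_with_reused_connection(host, port, count):
    """Send every request over one persistent connection."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    for _ in range(count):
        conn.request("POST", "/wsman", body=b"<msg/>")
        conn.getresponse().read()
    conn.close()


def count_connections(sender, requests=5):
    """Run a sender against a throwaway local server; return connections accepted."""
    server = CountingServer(("127.0.0.1", 0), OkHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        sender("127.0.0.1", server.server_address[1], requests)
    finally:
        server.shutdown()
        server.server_close()
    return server.connections
```

With five requests, `count_connections(send_with_new_connections)` reports five accepted connections and `count_connections(send_with_reused_connection)` reports one. With several playbooks hitting the same host at once, the per-operation pattern multiplies the number of connections the WinRM service has to accept.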

Let us know if it keeps up. It'd definitely be a Microsoft issue (the WinRM service shouldn't crash. Ever.), but we might be able to short-circuit the official support loop and get you in touch with the right folks directly.

-Matt

J Hawkesworth

Aug 2, 2016, 3:51:48 AM
to Ansible Project
Reading this again, I realise that running multiple playbooks against my Windows hosts simultaneously is something I do not do very often, so my experience may not apply.

I hope pywinrm 0.2.0 turns out to fix this for you.

Jon

Michael Perzel

Aug 8, 2016, 11:26:34 AM
to Ansible Project
Since upgrading to 0.2.0, it hasn't occurred, but this issue has been fairly hard to reproduce consistently outside of our production environment.

Michael