WinRM issue with domain user - some hosts work and some don't


Eyal Zarchi

Sep 1, 2015, 4:08:41 AM
to Ansible Project
Hi.

We have a few Windows Server 2008 R2 machines that we would like to manage over WinRM.
The machines are similar, yet some work and some don't. I compared the OS build, the PowerShell build and even the local security policy, and the result is still the same.
We use Kerberos and winbind on the controller machine, and since WinRM works with the domain username for Windows 2012 and for some of the 2008 R2 machines, I am guessing the issue is not on the controller.

I thought it was because it uses the ticket of the LDAP user I logged into the controller machine with, but I am a member of the Administrators group on the target machine and it still doesn't work.
If I create a local user and put it in the Administrators group, WinRM works.

Here is a machine that works:

<rnpl-qa1-bes01> WINRM RESULT <Response code 0, out "C:\Users\deploy_rn\A", err "">
<rnpl-qa1-bes01> PUT /tmp/tmpe8SQvn TO C:\Users\deploy_rn\AppData\Local\Temp\ansible-tmp-1441020926.8-178247757458762\\win_ping
<rnpl-qa1-bes01> WINRM PUT /tmp/tmpe8SQvn to C:\Users\deploy_rn\AppData\Local\Temp\ansible-tmp-1441020926.8-178247757458762\\win_ping.ps1 (offset=0 size=2035)
<rnpl-qa1-bes01> WINRM PUT /tmp/tmpe8SQvn to C:\Users\deploy_rn\AppData\Local\Temp\ansible-tmp-1441020926.8-178247757458762\\win_ping.ps1 (offset=2035 size=2035)
<rnpl-qa1-bes01> WINRM PUT /tmp/tmpe8SQvn to C:\Users\deploy_rn\AppData\Local\Temp\ansible-tmp-1441020926.8-178247757458762\\win_ping.ps1 (offset=4070 size=2035)
<rnpl-qa1-bes01> WINRM PUT /tmp/tmpe8SQvn to C:\Users\deploy_rn\AppData\Local\Temp\ansible-tmp-1441020926.8-178247757458762\\win_ping.ps1 (offset=6105 size=602)
<rnpl-qa1-bes01> PUT /tmp/tmpsiY4YG TO C:\Users\deploy_rn\AppData\Local\Temp\ansible-tmp-1441020926.8-178247757458762\\arguments
<rnpl-qa1-bes01> WINRM PUT /tmp/tmpsiY4YG to C:\Users\deploy_rn\AppData\Local\Temp\ansible-tmp-1441020926.8-178247757458762\\arguments (offset=0 size=2)
<rnpl-qa1-bes01> EXEC PowerShell -NoProfile -NonInteractive -ExecutionPolicy Unrestricted -File C:\Users\deploy_rn\AppData\Local\Temp\ansible-tmp-1441020926.8-178247757458762\\win_ping.ps1 C:\Users\deploy_rn\AppData\Local\Temp\ansible-tmp-1441020926.8-178247757458762\\arguments; Remove-Item "C:\Users\deploy_rn\AppData\Local\Temp\ansible-tmp-1441020926.8-178247757458762\" -Force -Recurse;
<rnpl-qa1-bes01> WINRM EXEC 'PowerShell' ['-NoProfile', '-NonInteractive', '-EncodedCommand', 'UABvAHcAZQByAFMAaABlAGwAbAAgAC0ATgBvAFAAcgBvAGYAaQBsAGUAIAAtAE4AbwBuAEkAbgB0AGUAcgBhAGMAdABpAHYAZQAgAC0ARQB4AGUAYwB1AHQAaQBvAG4AUABvAGwAaQBjAHkAIABVAG4AcgBlAHMAdAByAGkAYwB0AGUAZAAgAC0ARgBpAGwAZQAgAEMAOgBcAFUAcwBlAHIAcwBcAGQAZQBwAGwAbwB5AF8AcgBuAFwAQQBwAHAARABhAHQAYQBcAEwAbwBjAGEAbABcAFQAZQBtAHAAXABhAG4AcwBpAGIAbABlAC0AdABtAHAALQAxADQANAAxADAAMgAwADkAMgA2AC4AOAAtADEANwA4ADIANAA3ADcANQA3ADQANQA4ADcANgAyAFwAXAB3AGkAbgBfAHAAaQBuAGcALgBwAHMAMQAgAEMAOgBcAFUAcwBlAHIAcwBcAGQAZQBwAGwAbwB5AF8AcgBuAFwAQQBwAHAARABhAHQAYQBcAEwAbwBjAGEAbABcAFQAZQBtAHAAXABhAG4AcwBpAGIAbABlAC0AdABtAHAALQAxADQANAAxADAAMgAwADkAMgA2AC4AOAAtADEANwA4ADIANAA3ADcANQA3ADQANQA4ADcANgAyAFwAXABhAHIAZwB1AG0AZQBuAHQAcwA7ACAAUgBlAG0AbwB2AGUALQBJAHQAZQBtACAAIgBDADoAXABVAHMAZQByAHMAXABkAGUAcABsAG8AeQBfAHIAbgBcAEEAcABwAEQAYQB0AGEAXABMAG8AYwBhAGwAXABUAGUAbQBwAFwAYQBuAHMAaQBiAGwAZQAtAHQAbQBwAC0AMQA0ADQAMQAwADIAMAA5ADIANgAuADgALQAxADcAOAAyADQANwA3ADUANwA0ADUAOAA3ADYAMgBcACIAIAAtAEYAbwByAGMAZQAgAC0AUgBlAGMAdQByAHMAZQA7AA==']
<rnpl-qa1-bes01> WINRM RESULT <Response code 0, out "{ "changed": f", err "">
rnpl-qa1-bes01 | success >> {
    "changed": false,
    "ping": "pong"
}
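
The `-EncodedCommand` argument in the log above is not opaque: PowerShell expects that parameter to be Base64 over UTF-16LE text, so it can be decoded to confirm it matches the EXEC line before it. A minimal Python sketch follows; the helper names `decode_encoded_command` and `encode_command` are made up for illustration and are not part of Ansible.

```python
import base64


def decode_encoded_command(encoded):
    """Decode a PowerShell -EncodedCommand value (Base64 of UTF-16LE text)."""
    return base64.b64decode(encoded).decode("utf-16-le")


def encode_command(script):
    """Inverse helper: produce a value suitable for -EncodedCommand."""
    return base64.b64encode(script.encode("utf-16-le")).decode("ascii")


# First 32 characters of the encoded string from the log above.
print(decode_encoded_command("UABvAHcAZQByAFMAaABlAGwAbAAgAC0A"))  # PowerShell -
```

Pasting the full string from the log into `decode_encoded_command` should give back the PowerShell command line, which is handy when comparing what was actually sent to a working and a failing host.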


Here is one that doesn't work:

<rnpl-qa1-sts01> ESTABLISH WINRM CONNECTION FOR USER:  on PORT 5986 TO rnpl-qa1-sts01
<rnpl-qa1-sts02> ESTABLISH WINRM CONNECTION FOR USER:  on PORT 5986 TO rnpl-qa1-sts02
<rnpl-qa1-sts01> WINRM CONNECT: transport=kerberos endpoint=https://rnpl-qa1-sts01:5986/wsman
<rnpl-qa1-sts02> WINRM CONNECT: transport=kerberos endpoint=https://rnpl-qa1-sts02:5986/wsman
rnpl-qa1-sts01 | FAILED => the username/password specified for this server was incorrect
rnpl-qa1-sts02 | FAILED => the username/password specified for this server was incorrect


As soon as I remove the @DOMAIN from the hosts file and use a local username, WinRM works.
I am probably missing something silly, but I can't find it.
Thanks

Eyal Zarchi

Sep 1, 2015, 5:00:16 AM
to Ansible Project
Some more information:
I get this on both the server that doesn't work and the one that does.

winrm get winrm/config/client/auth
Auth
    Basic = true
    Digest = true
    Kerberos = true
    Negotiate = true
    Certificate = true
    CredSSP = false


This is in the Event Viewer:
User authentication using Basic authentication scheme failed. 

 Additional Data 
 Unexpected error received from LogonUser 1326: %%1326.


event ID 10111.


Dimitri Yioulos

Sep 1, 2015, 11:23:44 AM
to Ansible Project
And you're sure the hosts are included in the Ansible hosts file?

Eyal Zarchi

Sep 6, 2015, 7:20:55 AM
to Ansible Project
Hi,
The servers are of course in the hosts file.

OK, some updates on this, but first some information:

Domain controller - 172.16.10.6
Ansible controller - 172.16.19.1
Server that works (STS03) - 172.16.19.41
Server that doesn't work (STS01) - 172.16.1.114


Now if I try to access STS03 (the one that works) from Ansible with a domain username, all is good.
If I try to access STS01 (the one that doesn't work) from Ansible with a domain username, I get "server not found in kerberos database" and "username is incorrect".

Now if I take the server that doesn't work and move it to the same network (172.16.19.42) as the server that works, everything works on both servers.

As soon as it is in another VLAN, the domain username no longer works (a local username on the machine works anywhere).

So I suspected it might be something on the DC (in the firewall I have ANY to ANY on all 4 servers: DC, Ansible, STS01 and STS03).

I ran Wireshark on the DC and ran Ansible against both servers.

When Ansible runs against the server INSIDE the network (STS03), I see this:
172.16.10.6 172.16.19.41 TCP 66 kerberos > 55200 [SYN, ACK] Seq=0 Ack=1 Win=8192 Len=0 MSS=1460 WS=256 SACK_PERM=1
172.16.10.6 172.16.19.41 TCP 54 kerberos > 55200 [RST, ACK] Seq=1441 Ack=1419 Win=0 Len=0

So it seems that the DC is working directly against the destination server.


BUT if I run the same WinRM connection against the server in another VLAN, I see this:
172.16.10.6 172.16.12.71 KRB5 176 KRB Error: KRB5KDC_ERR_S_PRINCIPAL_UNKNOWN
172.16.10.6 172.16.12.71 TCP 54 kerberos > 60772 [RST, ACK] Seq=111 Ack=1441 Win=0 Len=0


It seems that when the destination server is in another VLAN, Kerberos is checked against the controller machine and not the destination server.


Could I be on to something?

Trond Hindenes

Sep 6, 2015, 7:27:53 PM
to Ansible Project
Eyal, just a thought: could you try replacing the IP addresses in your hosts file with the actual server FQDNs (sts03.domain.com) and see if that helps?

Eyal Zarchi

Sep 7, 2015, 4:35:15 AM
to Ansible Project
Trond - thanks for the tip.
It actually helped, because using tcpdump we saw that we had more than one PTR record for the server.
As soon as we fixed that, WinRM worked.
It was still weird, since it did work within the same network but not across VLANs.

Trond Hindenes

Sep 7, 2015, 10:51:13 AM
to Ansible Project
Good. Kerberos relies on service principal names (which in turn rely on name resolution), so you need a working DNS infrastructure for Kerberos to work correctly.
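
Because the culprit here was extra PTR records, a quick forward/reverse lookup check from the controller can help surface this kind of mismatch. The following is only a minimal Python sketch under stated assumptions: the function name `check_forward_reverse` is made up for illustration, it assumes the controller's system resolver sees the same DNS data as the Kerberos library, and whether several PTR records show up as aliases depends on the platform resolver, so an empty result is not proof that DNS is clean.

```python
import socket


def check_forward_reverse(hostname, forward=socket.gethostbyname_ex,
                          reverse=socket.gethostbyaddr):
    """Return a list of problems found when resolving hostname both ways.

    The resolver functions are parameters so the check can be exercised
    without touching the network.
    """
    problems = []
    expected = hostname.lower().rstrip(".")
    _, _, addresses = forward(hostname)
    for address in addresses:
        try:
            primary, aliases, _ = reverse(address)
        except OSError as exc:  # socket.herror/gaierror are OSError subclasses
            problems.append(f"{address}: reverse lookup failed ({exc})")
            continue
        names = {n.lower().rstrip(".") for n in [primary, *aliases]}
        if len(names) > 1:
            problems.append(f"{address}: multiple reverse names {sorted(names)}")
        if expected not in names:
            problems.append(
                f"{address}: reverse names {sorted(names)} do not include {expected}"
            )
    return problems
```

Calling `check_forward_reverse("sts01.example.com")` (a placeholder FQDN, not the real host name) on the Ansible controller returns an empty list when the forward and reverse records agree. Short host names will be reported as mismatches when the PTR record holds the FQDN, which is another reason to use FQDNs in the hosts file.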

Eyal Zarchi

Oct 8, 2015, 9:04:00 AM
to Ansible Project
Hi again.
Well, after removing all the extra PTR records the servers were good to go.
I started deploying Ansible on production servers, and here I have exactly the same issue, but this time DNS and name resolution are correct.
A local user on the machine works perfectly.
A domain user produces the "FAILED => the username/password specified for this server was incorrect" error message.

Are there any logs or additional errors I can check?
Thanks

Trond Hindenes

Oct 8, 2015, 9:17:25 AM
to Ansible Project
Could you test regular PowerShell remoting from another domain-joined Windows node against the problematic servers to see if that works?

Eyal Zarchi

Oct 8, 2015, 9:49:11 AM
to Ansible Project
Yes, that works.
From one machine to another with ps-remotesession I had no problem.
Even with the domain username and password I was able to connect.
This happens on all the Windows machines in the domain.
Besides running the PowerShell script that prepares the machine for Ansible, I tried adding the security permission for the user, but it still doesn't work.

WinRM itself is set up, since I am able to connect with a local username that is in the Administrators group.

I already got a few Windows machines to work with the domain username, so I am probably just missing something.
This is from a machine that works:

PS C:\Users\TEMP.JAJAH> winrm get winrm/config/service
Service
    RootSDDL = O:NSG:BAD:P(A;;GA;;;BA)(A;;GWGR;;;S-1-5-21-1738876665-1027346198-3318579073-26131)(A;;GR;;;IU)S:P(AU;FA;GA;;;WD)(AU;SA;GXGW;;;WD)
    MaxConcurrentOperations = 4294967295
    MaxConcurrentOperationsPerUser = 1500
    EnumerationTimeoutms = 240000
    MaxConnections = 300
    MaxPacketRetrievalTimeSeconds = 120
    AllowUnencrypted = true
    Auth
        Basic = true
        Kerberos = true
        Negotiate = true
        Certificate = false
        CredSSP = false
        CbtHardeningLevel = Relaxed
    DefaultPorts
        HTTP = 5985
        HTTPS = 5986
    IPv4Filter = *
    IPv6Filter = *
    EnableCompatibilityHttpListener = false
    EnableCompatibilityHttpsListener = false
    CertificateThumbprint
    AllowRemoteAccess = true


This is from a machine that doesn't work:
Service
    RootSDDL = O:NSG:BAD:P(A;;GA;;;BA)(A;;GR;;;IU)S:P(AU;FA;GA;;;WD)(AU;SA;GXGW;;;WD)
    MaxConcurrentOperations = 4294967295
    MaxConcurrentOperationsPerUser = 1500
    EnumerationTimeoutms = 240000
    MaxConnections = 300
    MaxPacketRetrievalTimeSeconds = 120
    AllowUnencrypted = false
    Auth
        Basic = true
        Kerberos = true
        Negotiate = true
        Certificate = false
        CredSSP = false
        CbtHardeningLevel = Relaxed
    DefaultPorts
        HTTP = 5985
        HTTPS = 5986
    IPv4Filter = *
    IPv6Filter = *
    EnableCompatibilityHttpListener = false
    EnableCompatibilityHttpsListener = false
    CertificateThumbprint = eb 9b 2d f2 a5 89 03 f2  e2 ca 0e 8a 35 32 39 08c5 a8 42 d7
    AllowRemoteAccess = true
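
Comparing two such dumps by eye is error-prone, so a small script can list every setting that differs. This is a minimal Python sketch, not an Ansible or WinRM tool: the names `parse_winrm_config` and `diff_configs` are hypothetical, and it assumes each setting sits on a single line with sections marked by indentation (a console-wrapped line would need rejoining first).

```python
def parse_winrm_config(text):
    """Parse `winrm get ...` output into a {"Section.Key": "value"} dict."""
    settings = {}
    stack = []  # (indent, name) for each enclosing section
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        indent = len(raw) - len(raw.lstrip())
        # Leave any section that is not an ancestor of this line.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        key, sep, value = line.partition(" = ")
        if sep:
            path = ".".join([name for _, name in stack] + [key])
            settings[path] = value
        else:
            # No " = ": treat the line as a section header (or an empty key).
            stack.append((indent, line))
    return settings


def diff_configs(working, failing):
    """Return {path: (working_value, failing_value)} for differing settings."""
    a, b = parse_winrm_config(working), parse_winrm_config(failing)
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }
```

Fed the two dumps above, this should flag `RootSDDL`, `AllowUnencrypted` and `CertificateThumbprint`; a value of `None` on one side means the setting was empty or absent there.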

Trond Hindenes

Oct 8, 2015, 10:11:47 AM
to Ansible Project
This line differs:
AllowUnencrypted = false

That setting basically dictates whether you're allowed to use basic auth over non-encrypted comms.
Again, from another Windows node, could you check that you're able to connect to the problematic server using basic auth? Try both with and without the -UseSSL parameter and compare to a working node. I suspect you will find some differences.

In general we advise using 5986 (SSL) with Ansible.