Windows Domain/Ansible Kerberos Auth Issues Still

Dave York

Jun 8, 2020, 1:09:02 PM
to Ansible Project
(I've posted a bit about this before, but I want to revisit it because it's frustrating me as I try to optimize my playbooks.)

I have a playbook that builds servers from VMware templates using vmware_guest and joins them to the domain with that same module. Once the servers are built, I have an extremely long "wait_for_connection":

  - name: Wait until server becomes available to connect
    wait_for_connection:
      delay: 900     # Wait 900 seconds (15 minutes) before the first attempt
      sleep: 30      # After the initial delay, retry every 30 seconds
      timeout: 1200  # Maximum amount of time to wait, in seconds

After this wait, I start running tasks on the new hosts. Initially those tasks run fine, but one by one, at random, the servers start failing with Kerberos errors. During this time I can confirm I'm able to log in to these servers using the same credentials, so authentication doesn't seem to be failing outside of Ansible, but it fails within Ansible for some reason.

The longer I wait after building the servers, the less likely this issue is to occur. It just seems insane that I have to keep adding more wait time.

Here's me running the playbook against 4 servers. Each task runs against all four servers, but the "fatal" lines (highlighted in red in the original output) show the Kerberos failures, and the playbook eventually dies off entirely as each host drops out with a Kerberos error:

TASK [Registry fix to enable solution for CVE-2017-8529 Part 1] ****************
Monday 08 June 2020  16:32:22 +0000 (0:00:09.368)       0:33:29.081 *********** 
changed: [server4.fqdn] => {"changed": true, "data_changed": false, "data_type_changed": false}
changed: [server1.fqdn] => {"changed": true, "data_changed": false, "data_type_changed": false}
changed: [server3.fqdn] => {"changed": true, "data_changed": false, "data_type_changed": false}
changed: [server2.fqdn] => {"changed": true, "data_changed": false, "data_type_changed": false}
TASK [Registry fix to enable solution for CVE-2017-8529 Part 2] ****************
Monday 08 June 2020  16:32:25 +0000 (0:00:03.635)       0:33:32.717 *********** 
changed: [server1.fqdn] => {"changed": true, "data_changed": false, "data_type_changed": false}
changed: [server4.fqdn] => {"changed": true, "data_changed": false, "data_type_changed": false}
changed: [server2.fqdn] => {"changed": true, "data_changed": false, "data_type_changed": false}
changed: [server3.fqdn] => {"changed": true, "data_changed": false, "data_type_changed": false}
TASK [Configure UAC] ***********************************************************
Monday 08 June 2020  16:32:29 +0000 (0:00:03.388)       0:33:36.105 *********** 
fatal: [server3.fqdn]: UNREACHABLE! => {"changed": false, "msg": "kerberos: the specified credentials were rejected by the server", "unreachable": true}
changed: [server1.fqdn] => {"changed": true, "data_changed": true, "data_type_changed": false}
changed: [server2.fqdn] => {"changed": true, "data_changed": true, "data_type_changed": false}
changed: [server4.fqdn] => {"changed": true, "data_changed": true, "data_type_changed": false}
TASK [Initialize Disk 1] *******************************************************
Monday 08 June 2020  16:32:32 +0000 (0:00:03.335)       0:33:39.440 *********** 
changed: [server4.fqdn] => {"changed": true, "cmd": "Initialize-Disk -Number 1", "delta": "0:00:04.105311", "end": "2020-06-08 04:32:39.137372", "rc": 0, "start": "2020-06-08 04:32:35.032060", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
changed: [server1.fqdn] => {"changed": true, "cmd": "Initialize-Disk -Number 1", "delta": "0:00:03.903042", "end": "2020-06-08 04:32:39.527549", "rc": 0, "start": "2020-06-08 04:32:35.624506", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
changed: [server2.fqdn] => {"changed": true, "cmd": "Initialize-Disk -Number 1", "delta": "0:00:05.007749", "end": "2020-06-08 04:32:40.903429", "rc": 0, "start": "2020-06-08 04:32:35.895680", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
TASK [Wait 15 seconds for disk initilization] **********************************
Monday 08 June 2020  16:32:41 +0000 (0:00:08.457)       0:33:47.898 *********** 
Pausing for 15 seconds
(ctrl+C then 'C' = continue early, ctrl+C then 'A' = abort)
ok: [server1.fqdn] => {"changed": false, "delta": 15, "echo": true, "rc": 0, "start": "2020-06-08 16:32:41.126472", "stderr": "", "stdout": "Paused for 15.0 seconds", "stop": "2020-06-08 16:32:56.126843", "user_input": ""}
TASK [Partition Disk 1] ********************************************************
Monday 08 June 2020  16:32:56 +0000 (0:00:15.051)       0:34:02.949 *********** 
changed: [server4.fqdn] => {"changed": true}
changed: [server1.fqdn] => {"changed": true}
changed: [server2.fqdn] => {"changed": true}
TASK [Format Disk 1 as E drive] ************************************************
Monday 08 June 2020  16:33:03 +0000 (0:00:06.888)       0:34:09.838 *********** 
changed: [server4.fqdn] => {"changed": true}
changed: [server1.fqdn] => {"changed": true}
changed: [server2.fqdn] => {"changed": true}
TASK [Stage AV Setup Binaries to e:\admin\binaries\] ******************
Monday 08 June 2020  16:33:39 +0000 (0:00:24.463)       0:34:46.237 *********** 
changed: [server4.fqdn] => {"changed": true, "dest": "e:\\admin\\binaries\\AVAgent\\", "operation": "folder_copy", "size": 27713762, "src": "\\\\reposerver\\Applications\\Production\\AV"}
changed: [server1.fqdn] => {"changed": true, "dest": "e:\\admin\\binaries\\AVAgent\\", "operation": "folder_copy", "size": 27713762, "src": "\\\\reposerver\\Applications\\Production\\AV"}
changed: [server2.fqdn] => {"changed": true, "dest": "e:\\admin\\binaries\\AVAgent\\", "operation": "folder_copy", "size": 27713762, "src": "\\\\reposerver\\Applications\\Production\\AV"}
TASK [Stage SecScan Setup Binaries to e:\admin\binaries\] ***********************
Monday 08 June 2020  16:33:42 +0000 (0:00:03.402)       0:34:49.639 *********** 
changed: [server1.fqdn] => {"changed": true, "dest": "e:\\admin\\binaries\\SecScan\\64bit", "operation": "folder_copy", "size": 23530139, "src": "\\\\reposerver\\Applications\\Production\\SecScan"}
changed: [server4.fqdn] => {"changed": true, "dest": "e:\\admin\\binaries\\SecScan\\64bit", "operation": "folder_copy", "size": 23530139, "src": "\\\\reposerver\\Applications\\Production\\SecScan"}
changed: [server2.fqdn] => {"changed": true, "dest": "e:\\admin\\binaries\\SecScan\\64bit", "operation": "folder_copy", "size": 23530139, "src": "\\\\reposerver\\Applications\\Production\\SecScan"}
TASK [Stage LAPS Setup Binaries to e:\admin\binaries\] *************************
Monday 08 June 2020  16:33:46 +0000 (0:00:03.674)       0:34:53.314 *********** 
fatal: [server1.fqdn]: UNREACHABLE! => {"changed": false, "msg": "kerberos: the specified credentials were rejected by the server", "unreachable": true}
changed: [server2.fqdn] => {"changed": true, "dest": "e:\\admin\\binaries\\LAPSAgent\\x64", "operation": "folder_copy", "size": 1019904, "src": "\\\\reposerver\\Applications\\Production\\Microsoft\\LAPS"}
changed: [server4.fqdn] => {"changed": true, "dest": "e:\\admin\\binaries\\LAPSAgent\\x64", "operation": "folder_copy", "size": 1019904, "src": "\\\\reposerver\\Applications\\Production\\Microsoft\\LAPS"}
TASK [Ensure LAPS is installed] ************************************************
Monday 08 June 2020  16:33:49 +0000 (0:00:03.291)       0:34:56.606 *********** 
changed: [server4.fqdn] => {"changed": true, "rc": 0, "reboot_required": false}
changed: [server2.fqdn] => {"changed": true, "rc": 0, "reboot_required": false}
TASK [Ensure Agent is installed] **********************************************
Monday 08 June 2020  16:33:54 +0000 (0:00:04.571)       0:35:01.177 *********** 
fatal: [server2.fqdn]: UNREACHABLE! => {"changed": false, "msg": "kerberos: the specified credentials were rejected by the server", "unreachable": true}
changed: [server4.fqdn] => {"changed": true, "rc": 0, "reboot_required": false}
TASK [Ensure Agent is installed] ************************************************
Monday 08 June 2020  16:34:03 +0000 (0:00:09.009)       0:35:10.187 *********** 
changed: [server4.fqdn] => {"changed": true, "rc": 0, "reboot_required": false}
TASK [Ensure AV is installed] ******************************************
Monday 08 June 2020  16:34:08 +0000 (0:00:04.973)       0:35:15.161 *********** 
fatal: [server4.fqdn]: UNREACHABLE! => {"changed": false, "msg": "kerberos: the specified credentials were rejected by the server", "unreachable": true}


I'm a bit new to the Linux world. Is it possible this is a bug in something on the Linux node I run Ansible/Ansible Tower from? I initially thought it was something to do with AD replication, but I can authenticate fine against these servers within minutes of them being added to the domain through normal Windows/Microsoft processes.

Thanks in advance for any advice!

David Foley

Jun 11, 2020, 7:01:46 AM
to Ansible Project
Are these Linux machines?
How many domain controllers are in your environment? If you have more than one, Kerberos may be round-robining between them and failing against one domain controller but not another. You need to restrict your Linux server so that it connects to only one domain controller.
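One way to check the round-robin theory is to list the KDCs that DNS advertises to the control node. The play below is a minimal sketch under stated assumptions: `example.com` is a placeholder for the real AD DNS domain, and the `dig` utility is assumed to be installed on the Linux node. It only reads DNS and changes nothing.

```yaml
# Sketch only: example.com is a placeholder domain, not taken from the thread.
- name: List the KDCs that DNS advertises for the domain
  hosts: localhost
  gather_facts: no
  tasks:
    - name: Query the Kerberos SRV records
      command: dig +short SRV _kerberos._tcp.example.com
      register: kdc_srv
      changed_when: false   # read-only lookup, never reports a change

    - name: Show the advertised domain controllers
      debug:
        var: kdc_srv.stdout_lines
```

If more than one host comes back, the Kerberos client on the control node may contact any of them, which would be consistent with failures that appear at random.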

Dave York

Jun 15, 2020, 4:00:44 PM
to Ansible Project
The machines being managed here are Windows machines, but the Ansible Tower server itself is Linux (obviously). I wonder if the Kerberos configuration on the Tower machine may be running into a flavor of what you're suggesting, but I'm not sure exactly how I would point the Tower server directly at just one DC for authentication.
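On a Linux host using MIT Kerberos, the KDC is normally chosen through /etc/krb5.conf, so pinning the Tower node to one DC might look like the sketch below. Everything specific here is an assumption: the realm `EXAMPLE.COM`, the host `dc01.example.com`, and the idea that replacing the whole file is acceptable on this node. The task keeps a backup of the existing file, and the result should be checked with `kinit` before relying on it.

```yaml
# Sketch only: realm and DC names are placeholders, not taken from the thread.
- name: Pin the control node to a single KDC
  hosts: localhost
  become: yes
  tasks:
    - name: Write a krb5.conf that names one domain controller explicitly
      copy:
        dest: /etc/krb5.conf
        backup: yes          # keep a timestamped copy of the previous file
        content: |
          [libdefaults]
              default_realm = EXAMPLE.COM
              dns_lookup_realm = false
              dns_lookup_kdc = false

          [realms]
              EXAMPLE.COM = {
                  kdc = dc01.example.com
                  admin_server = dc01.example.com
              }

          [domain_realm]
              .example.com = EXAMPLE.COM
              example.com = EXAMPLE.COM
```

With `dns_lookup_kdc = false` and a single `kdc` entry, the client should stop discovering other DCs through DNS. The trade-off is that authentication from this node then depends on that one DC being available.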

Rafael Tomelin

Jun 16, 2020, 6:52:09 AM
to ansible...@googlegroups.com
Hi Dave,

I didn't fully understand the problem, but I would like to help you.

My wait tasks are:
- name: waiting Windows server
  wait_for:
    port: 5985
    sleep: 50
    timeout: 500
    host: "{{ groups[item[1]][-1] }}"
  when:
    - item[0].value.os_type | lower == 'windows'
    - item[0].key == item[1]
  loop_control:
    index_var: idx
  with_nested:
    - "{{ virtual_machine|dict2items }}"
    - "{{ groups }}"

- name: waiting linux server
  wait_for:
    port: 22
    sleep: 10
    timeout: 300
    host: "{{ groups[item[1]][-1] }}"
  when:
    - item[0].value.os_type | lower == 'linux'
    - item[0].key == item[1]
  loop_control:
    index_var: idx
  with_nested:
    - "{{ virtual_machine|dict2items }}"
    - "{{ groups }}"


As for your authentication problem: Kerberos/sssd may have a default domain defined. If you work with more than one domain, you need to pass the user plus the domain when authenticating.
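In Ansible terms, passing user plus domain usually means giving the WinRM connection a full user principal name. The group variables below are a minimal sketch: the file name, the account `svc_ansible`, the realm `EXAMPLE.COM`, and the vault variable name are all hypothetical.

```yaml
# group_vars/windows.yml (hypothetical file name and values)
ansible_connection: winrm
ansible_winrm_transport: kerberos
# User plus realm; Kerberos realm names are conventionally upper case.
ansible_user: svc_ansible@EXAMPLE.COM
# Hypothetical vaulted variable holding the account password.
ansible_password: "{{ vault_winrm_password }}"
```

Naming the realm explicitly means the login does not depend on whatever default realm the control node happens to have configured.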

--
Regards,

Rafael Tomelin