Promoting domain controller in vagrant causes winrm auth issues

Wil T

Jan 24, 2016, 10:39:32 AM
to Vagrant
Problem:
When I create a domain controller using a chef-solo config, it runs perfectly fine until the machine reboots at the end. After this point Vagrant can't authenticate with WinRM any more, and I get a message like the following:

==> default: Detecting if a remote PowerShell connection can be made with the guest...
Unable to establish a remote PowerShell connection with the guest.
Check if the firewall rules on the guest allow connections to the
Windows remote management service.

Software versions:
Vagrant 1.8.1
Vagrant-vmware-workstation (4.0.6).
VMWare Workstation Pro - 12.1.0 build-3272444
Windows 10

My vagrantfile:
Vagrant.configure(2) do |config|
  config.vm.box = "Windows2012R2x64"

  # This fixes an issue with the chef_solo provisioner where it can't resolve omnitruck.chef.io the first time.
  config.vm.provision "shell", inline: "Resolve-DnsName omnitruck.chef.io | out-null"

  config.vm.provision :chef_solo do |chef|
    chef.add_recipe "domain"
    chef.cookbooks_path = "cookbooks"
  end
end

I have been able to connect to the virtual machine manually via PowerShell and mstsc. However, if I try to connect from Vagrant, it fails with the above message.

Is there a way to change credentials after the chef_solo provisioner runs?

Wil T

Jan 24, 2016, 11:01:01 AM
to Vagrant
Also here is a debug log if that helps:

Jules & Amy Clements

Jan 17, 2017, 2:54:36 PM
to Vagrant
On Vagrant 1.9.1, DC promotion sometimes works. I tried the following:

From https://github.com/rgl/windows-domain-controller-vagrant/blob/master/provision/domain-controller.ps1
Used in combination with the Reload plugin and -NoRebootOnCompletion; Vagrant fails (raise_if_auth_error) after:
   ==> dc: Message        : You must restart this computer to complete the operation.
   ==> dc: Context        : DCPromo.General.4
   ==> dc: RebootRequired : True
   ==> dc: Status         : Success

From https://github.com/dbroeglin/windows-lab/blob/master/provision/02_install_forest.ps1
Tried putting a sleep in, but the reboot triggered a Vagrant error:
rescue in block in parse_header': HTTPClient::KeepAliveDisconnected: An existing connection was forcibly closed by the remote host. @ io_fillbuf - fd:3  (HTTPClient::KeepAliveDisconnected)

This has given more promising results.
From https://github.com/mitchellh/vagrant/issues/6430
Add to the Vagrantfile:
   config.winrm.retry_limit = 60
   config.winrm.retry_delay = 10

However, I have encountered "Server instance not found on the given port." on subsequent operations. I shall continue to provide updates if I make any progress on this extremely frustrating objective.

Jules & Amy Clements

Jan 17, 2017, 9:42:18 PM
to Vagrant
I've increased the retry_limit from 60 to 200, added a reload after provisioning, and then a 1-minute sleep to allow the DC to settle, avoiding "InitializeDefaultDrives operation on the 'ActiveDirectory' provider failed" or "Server instance not found on the given port.". Resulting Vagrantfile snippet:

      override.winrm.retry_limit = 200
      override.winrm.retry_delay = 10
      override.vm.synced_folder "../.provision", "/.provision"
      override.vm.provision 'shell', path: './automation/remote/capabilities.ps1'
      override.vm.provision 'shell', path: './automation/provisioning/setStaticIP.ps1', args: '172.16.17.102'
      override.vm.provision 'shell', path: './automation/provisioning/newForest.ps1', args: 'sky.net hUspefRuKapaga6P8Reh c:\.provision\w2k12R2.wim 2'
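
For readers trying to reproduce this, here is a minimal sketch of how the three pieces described above (higher retry settings, a reload after provisioning, then a 1-minute settle) might fit together in a Vagrantfile. It is an illustration under assumptions, not the poster's actual file: the machine name, box, and provider are placeholders, the script arguments are omitted, and the :reload provisioner is only available when the vagrant-reload plugin is installed.

```ruby
# Sketch only. 'dc', the box name and the provider are placeholders that do
# not come from this thread.
Vagrant.configure(2) do |config|
  config.vm.define 'dc' do |dc|
    dc.vm.box = 'Windows2012R2x64' # placeholder box name

    dc.vm.provider 'virtualbox' do |_provider, override| # assumed provider
      # Allow many reconnect attempts while the guest reboots.
      override.winrm.retry_limit = 200
      override.winrm.retry_delay = 10

      # Forest creation script from the snippet above (arguments omitted here).
      override.vm.provision 'shell', path: './automation/provisioning/newForest.ps1'

      # Reboot the guest between provisioners (needs the vagrant-reload plugin).
      override.vm.provision :reload

      # Give the domain controller a minute to settle before the next step.
      override.vm.provision 'shell', inline: 'Start-Sleep -Seconds 60'
    end
  end
end
```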

Jules & Amy Clements

Jan 18, 2017, 1:55:24 PM
to Vagrant
Even with an extremely high limit, it is still failing. I'm not convinced this setting even does anything.

override.winrm.retry_limit = 200

Alvaro Miranda Aguilera

Jan 18, 2017, 2:47:39 PM
to vagra...@googlegroups.com
Hello

On Windows, it is normal practice to schedule scripts to run with the Task Scheduler, say to start 1 minute from now.

Vagrant will then stop the provisioning, and 1 minute later the DC promotion/reboot will happen.

If your script keeps the session open and then reboots/disconnects, Vagrant will fail, as you have found.

I am not aware of Vagrant being able to handle reboots in or between script provisioners.
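
As a rough sketch of the scheduling idea above, the Ruby helper below builds a PowerShell one-liner that registers a one-off task with schtasks, so the provisioner can return before the promotion and reboot begin. The helper name, the task name, and the script path are hypothetical placeholders. Because the schtasks /ST start time only has minute resolution, the sketch defaults to 2 minutes to guarantee at least one full minute of delay; it does not handle a start time that crosses midnight.

```ruby
# Builds a PowerShell command line that schedules script_path to run once on
# the guest, a little in the future. The task name is a hypothetical default.
def schedule_once_command(script_path, delay_minutes: 2, task_name: 'PromoteDC')
  [
    # Compute the start time on the guest, formatted as HH:mm for schtasks.
    "$start = (Get-Date).AddMinutes(#{delay_minutes}).ToString('HH:mm')",
    # Register a one-off task that runs the script as SYSTEM.
    "schtasks /Create /F /TN #{task_name} /SC ONCE /ST $start /RU SYSTEM " \
      "/TR \"powershell -NoProfile -ExecutionPolicy Bypass -File #{script_path}\""
  ].join('; ')
end

puts schedule_once_command('C:\\provision\\promote.ps1')
```

In a Vagrantfile, the returned string could be passed to a shell provisioner, for example: config.vm.provision 'shell', inline: schedule_once_command('C:\\provision\\promote.ps1').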

Alvaro

--
This mailing list is governed under the HashiCorp Community Guidelines - https://www.hashicorp.com/community-guidelines.html. Behavior in violation of those guidelines may result in your removal from this mailing list.
 
GitHub Issues: https://github.com/mitchellh/vagrant/issues
IRC: #vagrant on Freenode
---
You received this message because you are subscribed to the Google Groups "Vagrant" group.
To unsubscribe from this group and stop receiving emails from it, send an email to vagrant-up+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/vagrant-up/386b2ff7-cad0-4be7-9a97-c88b7d36e649%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.



--
Alvaro

Jules & Amy Clements

Jan 19, 2017, 3:28:06 PM
to Vagrant
As of Vagrant 1.9.1, the reboot during provisioning is successful around 75% of the time. I believe that if the host is performing well, the reboot will complete within the Vagrant reconnect maximum and the provisioning will proceed. I've seen two different strategies from other users (see links above): in one the DC promotion is attempted with reboot disabled, and the other attempts to insert a sleep. I suspect these may have worked in previous versions of Vagrant but consistently fail in 1.9.1. Because the reconnect works most of the time, I'm confident an increase in the number of retries would overcome this occasional failure.
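
To put numbers on the "reconnect maximum" mentioned above, here is a small back-of-the-envelope sketch in Ruby. It assumes the settings behave as their names suggest, with each failed attempt costing roughly retry_delay seconds, so the product is only an approximate upper bound; the helper name is made up for illustration.

```ruby
# Approximate time Vagrant would keep retrying the WinRM connection, assuming
# each failed attempt costs about retry_delay seconds (an assumption).
def retry_window_seconds(retry_limit, retry_delay)
  retry_limit * retry_delay
end

# The two settings tried earlier in this thread.
[[60, 10], [200, 10]].each do |limit, delay|
  window = retry_window_seconds(limit, delay)
  puts format('retry_limit=%d retry_delay=%d -> about %.1f minutes',
              limit, delay, window / 60.0)
end
```

Under that assumption, either setting should allow far longer than a typical reboot, which fits the suspicion voiced earlier that raising the limit alone may not be what decides success.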


On Wed, Jan 18, 2017 at 7:55 PM, Jules & Amy Clements <v8l...@gmail.com> wrote:
Even with an extremely high limit, it is still failing, I'm not convinced this setting even does anything.

override.winrm.retry_limit = 200

rescue in block in parse_header': HTTPClient::KeepAliveDisconnected


Alvaro Miranda Aguilera

Jan 19, 2017, 4:25:58 PM
to vagra...@googlegroups.com
Are you creating different domains, or a single domain?

Have you looked into Packer? Perhaps the best option would be to create a VM that is already a DC.

Alvaro


Jules & Amy Clements

Jan 23, 2017, 2:40:39 PM
to Vagrant
It's a single forest/domain, just enough to support Kerberos and SPN delegation. The lab creates the DC, the web server (IIS), the database server (SQL Server), and a build agent (with Visual Studio installed), which takes 2 hours all up. I desperately want to avoid having to own images, as my users would then need to download two images (one for the DC and another for the member servers), and I don't have the capacity to keep an additional image updated.

When I get some time, I will rummage through the Ruby code for the WinRM connector to see if I can increase the connection timeout. Currently I'm getting around an 80% success rate on the nightly regression run by clean-starting the agent (host) just before the run, so the machine's resources are performing at their best.




Jules & Amy Clements

Feb 6, 2017, 5:47:24 PM
to Vagrant
So the solution I have resorted to is to drop out to the host and apply a PowerShell wrapper around the DC instantiation and provisioning, with traps and tests for WinRM connectivity being re-established, then drop back to the parent session and provision the remaining hosts.

Vagrant.configure(2) do |allhosts|
  puts 'Use script wrapper to manage Domain Controller intermittent failures'
  puts "vagrant action  = #{ARGV[0]}"
  puts "first argument  = #{ARGV[1]}"
  puts "second argument = #{ARGV[2]}"
  result = system("./automation/provisioning/runner.bat ./automation/provisioning/dcProvision.ps1 #{ARGV[0]} #{ARGV[1]} #{ARGV[2]}")
  puts "result        = #{result}"

This is a far from ideal implementation but it works. The runner.bat is a simple handler for switching from CMD to PowerShell.

set "back=\"
set "command=%cd%%back%%1"
echo %command% %2 %3 %4 %5 %6 %7 %8 %9
call powershell -NoProfile -ExecutionPolicy ByPass -command %command% %2 %3 %4 %5 %6 %7 %8 %9
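
The dcProvision.ps1 wrapper itself is not shown in this thread. As a hedged illustration of what a "test for WinRM connectivity being re-established" could look like, here is a minimal Ruby sketch that polls a TCP port until it accepts connections. The function name and defaults are assumptions, and an open port only shows that a listener is back, not that authentication will succeed.

```ruby
require 'socket'

# Returns true as soon as a TCP connection to host:port succeeds, or false if
# every attempt fails. Any connection error counts as "not ready yet".
def wait_for_port(host, port, attempts: 60, delay: 10, connect_timeout: 5)
  attempts.times do |i|
    begin
      # Socket.tcp closes the socket when the block exits.
      Socket.tcp(host, port, connect_timeout: connect_timeout) { return true }
    rescue StandardError
      # Pause before the next attempt, but not after the last one.
      sleep delay if i < attempts - 1
    end
  end
  false
end
```

For example, wait_for_port('172.16.17.102', 5985) would poll the static IP used earlier in this thread on 5985, the default WinRM HTTP port.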

Jules & Amy Clements

Feb 17, 2017, 3:58:55 PM
to Vagrant
The script wrapper was not providing me with the stability I desired, so I have ended up with two solutions. For one project, I've created a pre-provisioned domain controller.


For another, I'm continuing to provision the AD role (this runs nightly as part of a regression suite, and I now have 9 consecutive successes). The script (see http://cdaf.io) executes forest creation with -NoRebootOnCompletion and then performs a restart explicitly. This (so far) appeases Vagrant's WinRM connection tests: the connection is re-established and provisioning proceeds.

==> dc: [newForest.ps1] Install-ADDSForest -Force -NoRebootOnCompletion -DomainName "sky.net" -SafeModeAdministratorPassword $securePassword
==> dc: Message        : You must restart this computer to complete the operation.
==> dc:                 
==> dc: Context        : DCPromo.General.4
==> dc: RebootRequired : True
==> dc:
==> dc: Status         : Success
==> dc:
==> dc: [newForest.ps1] shutdown /r /t 0
==> dc: [newForest.ps1] ---------- stop ----------
==> dc: [newForest.ps1] ---------- start ----------
==> dc:
==> dc: [newForest.ps1] New Active Directory Forest, requires Windows Server 2012 and above.
==> dc:
==> dc: [newForest.ps1] forest        : sky.net
==> dc: [newForest.ps1] password      : **********
==> dc: [newForest.ps1] media         : c:\.provision\w2k12R2.wim
==> dc:
==> dc: [newForest.ps1] wimIndex      : 2
==> dc: [newForest.ps1] controlReboot : yes (default)
==> dc: Host WIN-HAFG83EVPR4 verified domain member of sky.net
==> dc:
==> dc: This is normal in Vagrant run after reboot for the provisioner to re-run.
==> dc:
==> dc: [newForest.ps1] ---------- stop ----------

Note: the Domain Controller image was built using the same script as is executed nightly.
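
The pattern visible in the log above (promote with -NoRebootOnCompletion, restart explicitly, and do nothing when the provisioner re-runs after the reboot) can be sketched roughly as follows. This is not the newForest.ps1 from cdaf.io: the Ruby helper name is hypothetical, the feature-install step is an assumption about what a bare Windows Server 2012 R2 box needs, and the password handling is simplified for illustration.

```ruby
# Builds a PowerShell script body for a Vagrant shell provisioner. The guard
# at the top makes the script a no-op when it is re-run after the reboot.
def new_forest_script(domain, safe_mode_password)
  <<~POWERSHELL
    if ((Get-WmiObject Win32_ComputerSystem).Domain -eq '#{domain}') {
      Write-Host "Host $env:COMPUTERNAME verified domain member of #{domain}"
      exit 0
    }
    # Assumption: the AD DS role is not yet installed on the box.
    Install-WindowsFeature AD-Domain-Services -IncludeManagementTools
    $securePassword = ConvertTo-SecureString '#{safe_mode_password}' -AsPlainText -Force
    Install-ADDSForest -Force -NoRebootOnCompletion -DomainName "#{domain}" -SafeModeAdministratorPassword $securePassword
    # Restart explicitly instead of letting the promotion trigger the reboot.
    shutdown /r /t 0
  POWERSHELL
end

puts new_forest_script('sky.net', 'CHANGE_ME')
```

In a Vagrantfile, the result could be passed as the inline argument of a shell provisioner, with the password read from somewhere safer than a literal string.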
