Vagrant + libvirt under VMWare ESX 6.5 Host (nested)


Mihai

Oct 6, 2017, 9:00:01 AM
to Vagrant
Hi there,

I hope I'm in the right place with my question, and if so, it's going to be quite a headache :)
I have a server (HP DL380G8p) with 2 x 12-core CPUs (Intel E5-2695) + 768 GB RAM.
On top of it I run ESX 6.5 and then my lab experiments.
So: ESX --> Ubuntu/Debian --> Vagrant + libvirt --> Cumulus VX topology/VMs, or just a simple Ubuntu Vagrant box.
Recently I wanted to give Vagrant a try so:
- I installed Ubuntu 16.04.3 LTS, then 17.04, and also Debian 9.1 (to see whether the kernel version or other distro packages have an influence on this)
- From my notes so far:
   - libvirt 3.0
   - vagrant 2.0
   - kernel 4.9.0-3-amd64
- A virtual machine with Linux/Vagrant/QEMU, with 64 GB RAM reserved (no memory ballooning, etc.)
- I cloned this repository: https://github.com/CumulusNetworks/topology_converter.git
- The topologies there bring up Linux VMs via Vagrant; I also tried a simple Vagrant Ubuntu box with no special settings (see the minimal Vagrantfile sketch after the console output below)
- Every time, after "vagrant up", if I do a "virsh console <id>" I get:
Vagrant:
==> leaf-r01n01: Creating shared folders metadata...
==> leaf-r01n01: Starting domain.
==> leaf-r01n01: Waiting for domain to get an IP address...

Virsh:
Connected to domain topology_converter_leaf-r01n01
Escape character is ^]
[   74.380034] INFO: rcu_sched detected stalls on CPUs/tasks:
[   74.380034] (detected by 0, t=60002 jiffies, g=3030, c=3029, q=23)
[   74.380034] All QSes seen, last rcu_sched kthread activity 60002 (4294741676-4294681674), jiffies_till_next_fqs=3, root ->qsmask 0x0
[   74.380034] rcu_sched kthread starved for 60002 jiffies!
[  100.132032] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [dhclient:2179]
[  128.132031] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [dhclient:2179]
[  156.132037] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [dhclient:2179]
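
For reference, the "simple Vagrant Ubuntu box" test mentioned above is nothing more than a minimal Vagrantfile along these lines (the box name and sizing here are only illustrative, assuming the vagrant-libvirt plugin):

Vagrant.configure("2") do |config|
  config.vm.box = "generic/ubuntu1604"      # any Xenial-based libvirt box, name is illustrative
  config.vm.provider :libvirt do |libvirt|
    libvirt.memory = 512
    libvirt.cpus   = 1
  end
end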

I suspected this to be just another qemu nested-setup problem, BUT I have another 2 VMs where I play with qemu and there everything works fine.
This might mean that it is related to the parameters that vagrant -> libvirt uses to start the qemu/kvm machines.
If it is, then the question is what exactly is causing this behavior?
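
One direction still worth experimenting with (just a sketch based on the vagrant-libvirt provider options, not a confirmed fix) is overriding the CPU handling in the Vagrantfile, so the nested guest gets the virtual host CPU passed through instead of the named IvyBridge model that appears in the qemu command line further down:

Vagrant.configure("2") do |config|
  config.vm.provider :libvirt do |libvirt|
    # Pass the (virtual) host CPU straight through instead of a named
    # model; under nested virtualization this changes which CPU flags
    # the guest kernel sees.
    libvirt.cpu_mode = "host-passthrough"
    # Only relevant if the box itself has to start further VMs:
    # libvirt.nested = true
    libvirt.memory = 512
    libvirt.cpus   = 1
  end
end

If the soft lockups depend on how the chosen CPU model interacts with the nested setup, a run with this override should make that visible.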

On top of this, I also tried:
- disabling acpi on boot
- reserving CPU resources on the VMware side as well
- checking that VMware exports the hardware virtualization flags (for nested virtualization):
flags  : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts mmx fxsr sse sse2 ss ht syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts nopl xtopology tsc_reliable nonstop_tsc aperfmperf pni pclmulqdq vmx ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm epb tpr_shadow vnmi ept vpid fsgsbase tsc_adjust smep dtherm ida arat pln pts

Vagrant starts the qemu VM with these parameters:
/usr/bin/qemu-system-x86_64 -name guest=topology_converter_leaf-r01n01,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-3-topology_converter_l/master-key.aes -machine pc-i440fx-zesty,accel=kvm,usb=off,dump-guest-core=off -cpu IvyBridge,+ds,+ss,+ht,+vmx,+pcid,+osxsave,+hypervisor,+arat,+tsc_adjust -m 512 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -uuid 7e4de018-2e63-4686-a110-4cec6d587758 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-3-topology_converter_l/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device ahci,id=sata0,bus=pci.0,addr=0x3 -drive file=/var/lib/libvirt/images/topology_converter_leaf-r01n01.img,format=qcow2,if=none,id=drive-sata0-0-0 -device ide-hd,bus=sata0.0,drive=drive-sata0-0-0,id=sata0-0-0,bootindex=1 -netdev tap,fd=26,id=hostnet0,vhost=on,vhostfd=28 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:86:cd:fe,bus=pci.0,addr=0x5 -netdev socket,udp=127.0.0.1:9008,localaddr=127.0.0.1:8008,id=hostnet1 -device virtio-net-pci,netdev=hostnet1,id=net1,mac=44:38:39:00:00:0d,bus=pci.0,addr=0x6 -netdev socket,udp=127.0.0.1:8002,localaddr=127.0.0.1:9002,id=hostnet2 -device virtio-net-pci,netdev=hostnet2,id=net2,mac=44:38:39:00:00:04,bus=pci.0,addr=0x7 -netdev socket,udp=127.0.0.1:8005,localaddr=127.0.0.1:9005,id=hostnet3 -device virtio-net-pci,netdev=hostnet3,id=net3,mac=44:38:39:00:00:0a,bus=pci.0,addr=0x8 -netdev socket,udp=127.0.0.1:8004,localaddr=127.0.0.1:9004,id=hostnet4 -device virtio-net-pci,netdev=hostnet4,id=net4,mac=44:38:39:00:00:08,bus=pci.0,addr=0x9 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -vnc 127.0.0.1:0 -k en-us -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 -msg timestamp=on

Now I have run out of ideas and this is already driving me crazy.
Does anyone have any clue as to what else I should try?
I'm afraid I've reached a dead end.

Thanks in advance, and hoping for a life-saving solution to get me unstuck :)



Alvaro Miranda Aguilera

Oct 11, 2017, 4:15:45 AM
to vagra...@googlegroups.com
Hello

If full VT / hardware virtualization is enabled on the VM on top of ESXi, then creating a nested setup should be transparent for that VM.

So I would spend a bit more time on the VM in ESXi and check whether you are missing some flag.

Alvaro






Mihai Tanasescu

Oct 11, 2017, 8:16:17 AM
to vagra...@googlegroups.com
Hi,

Thanks for the tip.
In the meantime I got a simple Ubuntu VM to work: no special Vagrant config, very basic, and it booted.
It seems to have something to do with the Vagrant setup in the Cumulus topology_converter repository... but for now I can't figure out which init parameters are causing all this strange behavior.
I will invest more time in this toward the end of the week, and if I find the solution I'll post it here.

Regards,
Mihai



Mihai

Nov 7, 2017, 9:49:11 AM
to Vagrant
Hi all,

I forgot to add updates here, but in the meantime I figured out one thing.
Ubuntu Trusty works.
Ubuntu Xenial does not; it always hits the same lockup cycle (and the Cumulus VMs are based on Xenial).
The ESX does export all the virtualization flags, so I'm not sure what Xenial does not like that Trusty is fine with...
This seems like an illogical issue.

flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts mmx fxsr sse sse2 ss syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts nopl xtopology tsc_reliable nonstop_tsc aperfmperf pni pclmulqdq vmx ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm epb tpr_shadow vnmi ept vpid fsgsbase tsc_adjust smep dtherm ida arat pln pts


Regards
Mihai