Hi there,
I hope I'm in the right place with my question, and if so, it's going to be quite a headache-inducing one :)
I have a server (HP DL380G8p) with 2 x 12-core CPUs (Intel E5-2695) + 768 GB RAM.
On top of it I run ESXi 6.5 and then my lab experiments.
So the stack is: ESXi --> Ubuntu/Debian --> vagrant-libvirt --> Cumulus VX topology VMs (or just a simple Ubuntu vagrant box).
Recently I wanted to give Vagrant a try, so:
- I installed Ubuntu 16.04.3 LTS, then 17.04, and also Debian 9.1 (trying to see whether the kernel version or other distro packages have an influence on this)
- From my notes so far:
- libvirt 3.0
- vagrant 2.0
- kernel 4.9.0-3-amd64
- The Linux/Vagrant/QEMU virtual machine has 64 GB RAM reserved for it (no memory ballooning, etc.)
- The Linux VMs are started with vagrant; I also tried a simple Vagrant Ubuntu box with no special settings (a minimal Vagrantfile sketch follows after the console output below)
- Every time after "vagrant up", if I do a "virsh console <id>", I get:
Vagrant:
==> leaf-r01n01: Creating shared folders metadata...
==> leaf-r01n01: Starting domain.
==> leaf-r01n01: Waiting for domain to get an IP address...
Virsh:
Connected to domain topology_converter_leaf-r01n01
Escape character is ^]
[ 74.380034] INFO: rcu_sched detected stalls on CPUs/tasks:
[ 74.380034] (detected by 0, t=60002 jiffies, g=3030, c=3029, q=23)
[ 74.380034] All QSes seen, last rcu_sched kthread activity 60002 (4294741676-4294681674), jiffies_till_next_fqs=3, root ->qsmask 0x0
[ 74.380034] rcu_sched kthread starved for 60002 jiffies!
[ 100.132032] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [dhclient:2179]
[ 128.132031] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [dhclient:2179]
[ 156.132037] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [dhclient:2179]
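For reference, the plain Ubuntu box case really is nothing special; it is roughly the following (a minimal sketch, and the box name here is just an example, not necessarily the exact box I used):

cat > Vagrantfile <<'EOF'
Vagrant.configure("2") do |config|
  config.vm.box = "generic/ubuntu1604"      # example libvirt-capable box
  config.vm.provider :libvirt do |libvirt|
    libvirt.memory = 512
    libvirt.cpus   = 1
  end
end
EOF
vagrant up --provider=libvirt
virsh console <id>    # the soft lockups show up shortly after boot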
I suspected this to be just another QEMU nested-setup problem, BUT I have two other VMs on the same host where I play with QEMU, and everything works fine there.
This might mean it is related to the parameters that vagrant (via libvirt) uses to start the qemu/kvm machines.
If so, then the question is: what exactly is causing this behavior?
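One candidate I can see in the qemu command line further below is the synthetic CPU model (-cpu IvyBridge,...) that libvirt ends up generating, instead of a full host passthrough. Here is a sketch of how I could test that hypothesis by forcing passthrough via the vagrant-libvirt provider's cpu_mode option (an experiment idea on my side, not a confirmed fix):

cat > Vagrantfile <<'EOF'
Vagrant.configure("2") do |config|
  config.vm.box = "generic/ubuntu1604"      # example box, as above
  config.vm.provider :libvirt do |libvirt|
    # Pass the (already virtual) host CPU straight through instead of
    # emulating a named model; hypothesis: the model/flag mix trips nested KVM.
    libvirt.cpu_mode = "host-passthrough"
  end
end
EOF
vagrant destroy -f && vagrant up --provider=libvirt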
On top of this, I also tried the following:
- disabling ACPI on boot
- reserving CPU resources on the VMware side as well
- checking that VMware really exposes the hardware virtualization flags to the VM (for nested virtualization); see the verification sketch after the flags below:
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts mmx fxsr sse sse2 ss ht syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts nopl xtopology tsc_reliable nonstop_tsc aperfmperf pni pclmulqdq vmx ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm epb tpr_shadow vnmi ept vpid fsgsbase tsc_adjust smep dtherm ida arat pln pts
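For reference, checks along these lines confirm that nesting is actually available (a minimal sketch, not my exact commands; these are the standard KVM paths, nothing specific to my setup):

# On the ESXi-level Linux VM that runs vagrant/libvirt:
grep -c vmx /proc/cpuinfo                      # > 0 means VT-x is exposed to this VM
cat /sys/module/kvm_intel/parameters/nested    # 'Y' or '1' means kvm_intel allows nesting
lsmod | grep kvm                               # kvm_intel and kvm should be loaded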
Vagrant starts the qemu VM with these parameters:
/usr/bin/qemu-system-x86_64 -name guest=topology_converter_leaf-r01n01,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-3-topology_converter_l/master-key.aes -machine pc-i440fx-zesty,accel=kvm,usb=off,dump-guest-core=off -cpu IvyBridge,+ds,+ss,+ht,+vmx,+pcid,+osxsave,+hypervisor,+arat,+tsc_adjust -m 512 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -uuid 7e4de018-2e63-4686-a110-4cec6d587758 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-3-topology_converter_l/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device ahci,id=sata0,bus=pci.0,addr=0x3 -drive file=/var/lib/libvirt/images/topology_converter_leaf-r01n01.img,format=qcow2,if=none,id=drive-sata0-0-0 -device ide-hd,bus=sata0.0,drive=drive-sata0-0-0,id=sata0-0-0,bootindex=1 -netdev tap,fd=26,id=hostnet0,vhost=on,vhostfd=28 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:86:cd:fe,bus=pci.0,addr=0x5 -netdev socket,udp=127.0.0.1:9008,localaddr=127.0.0.1:8008,id=hostnet1 -device virtio-net-pci,netdev=hostnet1,id=net1,mac=44:38:39:00:00:0d,bus=pci.0,addr=0x6 -netdev socket,udp=127.0.0.1:8002,localaddr=127.0.0.1:9002,id=hostnet2 -device virtio-net-pci,netdev=hostnet2,id=net2,mac=44:38:39:00:00:04,bus=pci.0,addr=0x7 -netdev socket,udp=127.0.0.1:8005,localaddr=127.0.0.1:9005,id=hostnet3 -device virtio-net-pci,netdev=hostnet3,id=net3,mac=44:38:39:00:00:0a,bus=pci.0,addr=0x8 -netdev socket,udp=127.0.0.1:8004,localaddr=127.0.0.1:9004,id=hostnet4 -device virtio-net-pci,netdev=hostnet4,id=net4,mac=44:38:39:00:00:08,bus=pci.0,addr=0x9 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -vnc 127.0.0.1:0 -k en-us -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 -msg timestamp=on
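Since my two hand-built QEMU VMs work fine on the same host, the other thing I can do is diff what libvirt generates for the vagrant domain against a known-good domain (a sketch; <working-domain> is a placeholder for one of my working VMs):

virsh dumpxml topology_converter_leaf-r01n01 > vagrant-domain.xml
virsh dumpxml <working-domain> > working-domain.xml
diff -u working-domain.xml vagrant-domain.xml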
Now I've run out of ideas, and this is already driving me crazy.
Does anyone have any clue as to what else I should try?
I'm afraid I've reached a dead end.
Thanks in advance, and I'm hoping for a life-saving solution to get me unstuck :)