Migration fail: Unable to receive data from KVM using the QMP protocol


Jens Larsson

Oct 27, 2025, 12:58:52 PM
to gan...@googlegroups.com
Hi,

I have searched this group and Google, but I haven't found any good leads on this.

There is an issue with migration on my Ganeti clusters. Most VMs migrate
without issue, but a few sometimes fail with this error:


[root@sheridan ~]# gnt-instance migrate draal
Instance draal will be migrated. Note that migration might impact the
instance if anything goes wrong (e.g. due to bugs in the hypervisor).
Continue?
y/[n]/?: y
Mon Oct 27 15:40:26 2025 Migrating instance draal.swestore.se
Mon Oct 27 15:40:26 2025 * checking disk consistency between source
and target
Mon Oct 27 15:40:26 2025 * closing instance disks on node
ivanova.swestore.se
Mon Oct 27 15:40:26 2025 * changing into standalone mode
Mon Oct 27 15:40:26 2025 * changing disks into dual-master mode
Mon Oct 27 15:40:28 2025 * wait until resync is done
Mon Oct 27 15:40:28 2025 * opening instance disks on node
sheridan.swestore.se in shared mode
Mon Oct 27 15:40:28 2025 * opening instance disks on node
ivanova.swestore.se in shared mode
Mon Oct 27 15:40:28 2025 * preparing ivanova.swestore.se to accept the
instance
Mon Oct 27 15:40:28 2025 * migrating instance to ivanova.swestore.se
Mon Oct 27 15:40:28 2025 * starting memory transfer
Mon Oct 27 15:40:31 2025 * memory transfer complete
Failure: command execution error:
Could not finalize instance migration: ivanova.swestore.se: Failed to
finalize migration on the target node: Unable to receive data from KVM
using the QMP protocol: [Errno 104] Connection reset by peer
[root@sheridan ~]#


Unfortunately this makes live migration a bit too
unreliable to be used in production. It looks like the VM is started on
the receiving end and the memory is transferred, but then something happens
and the VM dies. I have not been able to figure out the exact order of
events at the end of the transaction. Is the VM killed due to some
communication issue, or does the VM die and the error appears when Ganeti
tries to talk to it? This used to work well, but I don't know when it
started to fail. I think it predates 3.1.0, though.
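One way to tell those two cases apart is to check, right after the QMP error, whether the QEMU process still exists. The sketch below is a minimal, hypothetical helper (not part of Ganeti); it assumes you already know the PID of the instance's qemu-kvm process on the target node. Sending signal 0 performs only the existence and permission check, so nothing is actually delivered to the process.

```python
import errno
import os


def qemu_still_running(pid: int) -> bool:
    """True if a process with this PID exists; signal 0 only performs the check."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False  # no such process: QEMU has exited
    except PermissionError:
        return True  # process exists but is owned by another user
    return True


def diagnose_qmp_error(exc: OSError, pid: int) -> str:
    """Hypothetical helper: describe a QMP socket error in terms of QEMU's state."""
    name = errno.errorcode.get(exc.errno, str(exc.errno))
    if qemu_still_running(pid):
        return f"{name}: QEMU (pid {pid}) is still running; suspect the monitor connection"
    return f"{name}: QEMU (pid {pid}) has exited; the socket error is a symptom"
```

PIDs can be reused once a process has exited, so a "still running" answer taken long after the failure should be treated with some suspicion.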


Rocky 9
ganeti-3.1.0-1.el9 (but I think this happened in 3.0.2 too)
qemu-kvm-9.1.0-15.el9_6.9


I see this in the messages file on the receiving node:

Oct 27 15:40:32 ivanova systemd-coredump[1633436]: Removed old coredump
core.qemu-kvm.0.c543d3dd011641cbb95d4e75ef66ed8f.1632971.1761575944000000.zst.
Oct 27 15:40:32 ivanova systemd-coredump[1633436]: Process 1633394
(qemu-kvm) of user 0 dumped core.#012#012Stack trace of thread
1633428:#012#0 0x00002b6f2c28bedc __pthread_kill_implementation
(/usr/lib64/libc.so.6 + 0x8bedc)#012#1 0x00002b6f2c23eb46 raise
(/usr/lib64/libc.so.6 + 0x3eb46)#012#2 0x00002b6f2c228833 abort
(/usr/lib64/libc.so.6 + 0x28833)#012#3 0x00002b6f2c22875b
__assert_fail_base.cold (/usr/lib64/libc.so.6 + 0x2875b)#012#4
0x00002b6f2c237886 __assert_fail (/usr/lib64/libc.so.6 + 0x37886)#012#5
0x0000564d81e3706a n/a (n/a + 0x0)#012ELF object binary architecture:
AMD x86-64
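The `#012` sequences are rsyslog's octal escapes for control characters (`#012` is a newline), which is why the stack trace is flattened onto one line. A small sketch to restore the original layout, assuming that default escaping:

```python
import re

# rsyslog writes control characters as '#' followed by three octal digits.
_OCTAL_ESCAPE = re.compile(r"#([0-7]{3})")


def unescape_syslog(line: str) -> str:
    """Turn '#ooo' escapes (e.g. '#012' = newline) back into characters.

    Caveat: a literal '#' followed by three octal digits (such as a frame
    number '#100') is ambiguous and would be decoded as well.
    """
    return _OCTAL_ESCAPE.sub(lambda m: chr(int(m.group(1), 8)), line)
```

Frame markers such as `#0 0x...` are left alone because a space follows the single digit, so the decoded entry shows one stack frame per line.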


There is a core file too:

[root@ivanova /var/lib/systemd/coredump]# ls -l
-rw-r----- 1 root root 30357658 Oct 27 15:40
core.qemu-kvm.0.c543d3dd011641cbb95d4e75ef66ed8f.1633394.1761576030000000.zst

But I don't know what to do with it.
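systemd-coredump names its files `core.<comm>.<uid>.<boot id>.<pid>.<timestamp in microseconds>` plus a compression suffix, so the name alone ties the file to the crash in the log: PID 1633394 matches the `Process 1633394 (qemu-kvm) ... dumped core` entry, and the timestamp decodes to 14:40:30 UTC (15:40:30 local time). A minimal sketch of that decoding, assuming the layout above; the helper and type names are hypothetical.

```python
from datetime import datetime, timezone
from typing import NamedTuple

_COMPRESSION_SUFFIXES = (".zst", ".xz", ".lz4")


class CoreFile(NamedTuple):
    comm: str
    uid: int
    boot_id: str
    pid: int
    time_utc: datetime


def parse_core_name(name: str) -> CoreFile:
    """Split a systemd-coredump file name into its fields.

    Assumes core.<comm>.<uid>.<boot_id>.<pid>.<usec>[.<compression>].
    """
    if not name.startswith("core."):
        raise ValueError(f"not a systemd-coredump file: {name}")
    stem = name[len("core."):]
    for suffix in _COMPRESSION_SUFFIXES:
        if stem.endswith(suffix):
            stem = stem[: -len(suffix)]
            break
    # rsplit keeps any dots inside the command name together
    comm, uid, boot_id, pid, usec = stem.rsplit(".", 4)
    seconds, micros = divmod(int(usec), 1_000_000)
    when = datetime.fromtimestamp(seconds, tz=timezone.utc).replace(microsecond=micros)
    return CoreFile(comm, int(uid), boot_id, int(pid), when)
```

With the PID in hand, `coredumpctl info 1633394` on ivanova should print the stored metadata and stack trace for that dump; with matching debuginfo packages installed, frame #5 (currently `n/a`) would likely resolve to the QEMU function whose assertion failed.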


Should I look for a more modern qemu-kvm? The qemu-kvm package in Rocky
9 is the 9.1.0 release. It is, funny enough, exactly to the day one year
old and there are plenty of releases after this. Or are there settings
that can affect this? I'm running with cpu_type=host and
machine_version=pc (which is pc-i440fx-rhel7.6.0).


Has anyone seen similar things? Or do you all run on Ubuntu? :-)

/jens




Misc log entries:

sheridan:/var/log/ganeti/kvm/draal.swestore.se.log

qemu-kvm: warning: Machine type 'pc-i440fx-rhel7.6.0' is deprecated:
machines from the previous RHEL major release are subject to deletion in
the next RHEL major release

This is probably something to look at in the future. The only other
option available is q35.



sheridan:/var/log/ganeti/commands.log

2025-10-27 15:40:32,560: gnt-instance migrate pid=869465 ERROR Error
during command processing
Traceback (most recent call last):
File "/usr/share/ganeti/3.1/ganeti/cli.py", line 1255, in GenericMain
result = func(options, args)
File "/usr/share/ganeti/3.1/ganeti/client/gnt_instance.py", line 821,
in MigrateInstance
SubmitOrSend(op, cl=cl, opts=opts)
File "/usr/share/ganeti/3.1/ganeti/cli.py", line 1045, in SubmitOrSend
return SubmitOpCode(op, cl=cl, feedback_fn=feedback_fn, opts=opts)
File "/usr/share/ganeti/3.1/ganeti/cli.py", line 1009, in SubmitOpCode
op_results = PollJob(job_id, cl=cl, feedback_fn=feedback_fn,
File "/usr/share/ganeti/3.1/ganeti/cli.py", line 988, in PollJob
return GenericPollJob(job_id, _LuxiJobPollCb(cl), reporter,
File "/usr/share/ganeti/3.1/ganeti/cli.py", line 787, in GenericPollJob
errors.MaybeRaise(msg)
File "/usr/share/ganeti/3.1/ganeti/errors.py", line 550, in MaybeRaise
raise errcls(*args)
ganeti.errors.OpExecError: Could not finalize instance migration:
ivanova.swestore.se: Failed to finalize migration on the target node:
Unable to receive data from KVM using the QMP protocol: [Errno 104] Connection reset by peer



sheridan:/var/log/ganeti/jobs.log

2025-10-27 15:40:32,457: job-625776 pid=869466 ERROR Instance migration
succeeded, but finalization failed on the target node: Failed to
finalize migration on the target node: Unable to
receive data from KVM using the QMP protocol: [Errno 104] Connection
reset by peer
2025-10-27 15:40:32,459: job-625776 pid=869466 ERROR Op 1/1: Caught
exception in INSTANCE_MIGRATE(draal)
Traceback (most recent call last):
File "/usr/share/ganeti/3.1/ganeti/jqueue/__init__.py", line 933, in
_ExecOpCodeUnlocked
result = self.opexec_fn(op.input,
File "/usr/share/ganeti/3.1/ganeti/mcpu.py", line 705, in ExecOpCode
result = self._LockAndExecLU(lu, locking.LEVEL_CLUSTER + 1,
File "/usr/share/ganeti/3.1/ganeti/mcpu.py", line 631, in _LockAndExecLU
result = self._LockAndExecLU(lu, level + 1, calc_timeout,
File "/usr/share/ganeti/3.1/ganeti/mcpu.py", line 639, in _LockAndExecLU
result = self._LockAndExecLU(lu, level + 1, calc_timeout,
pending=pending)
File "/usr/share/ganeti/3.1/ganeti/mcpu.py", line 631, in _LockAndExecLU
result = self._LockAndExecLU(lu, level + 1, calc_timeout,
File "/usr/share/ganeti/3.1/ganeti/mcpu.py", line 631, in _LockAndExecLU
result = self._LockAndExecLU(lu, level + 1, calc_timeout,
File "/usr/share/ganeti/3.1/ganeti/mcpu.py", line 639, in _LockAndExecLU
result = self._LockAndExecLU(lu, level + 1, calc_timeout,
pending=pending)
File "/usr/share/ganeti/3.1/ganeti/mcpu.py", line 547, in _LockAndExecLU
result = self._ExecLU(lu)
File "/usr/share/ganeti/3.1/ganeti/mcpu.py", line 505, in _ExecLU
result = _ProcessResult(submit_mj_fn, lu.op, lu.Exec(self.Log))
File "/usr/share/ganeti/3.1/ganeti/cmdlib/base.py", line 351, in Exec
tl.Exec(feedback_fn)
File "/usr/share/ganeti/3.1/ganeti/cmdlib/instance_migration.py",
line 1156, in Exec
return self._ExecMigration()
File "/usr/share/ganeti/3.1/ganeti/cmdlib/instance_migration.py",
line 986, in _ExecMigration
raise errors.OpExecError(
ganeti.errors.OpExecError: Could not finalize instance migration:
ivanova.swestore.se: Failed to finalize migration on the target node:
Unable to receive data from KVM using the QMP protocol: [Errno 104] Connection reset by peer
2025-10-27 15:40:32,478: job-625776 pid=869466 INFO Finished job 625776,
status = error



ivanova:/var/log/ganeti/node-daemon.log

2025-10-27 15:40:32,455: ganeti-noded pid=1633437 ERROR Failed to
finalize migration on the target node: Unable to receive data from KVM
using the QMP protocol: [Errno 104] Connection reset by peer
Traceback (most recent call last):
File "/usr/share/ganeti/3.1/ganeti/hypervisor/hv_kvm/monitor.py",
line 315, in recv_qmp
data = self.recv(4096)
File "/usr/share/ganeti/3.1/ganeti/hypervisor/hv_kvm/monitor.py",
line 231, in recv
return self.sock.recv(bufsize)
ConnectionResetError: [Errno 104] Connection reset by peer

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/share/ganeti/3.1/ganeti/backend.py", line 3115, in
FinalizeMigrationDst
hyper.FinalizeMigrationDst(instance, info, success)
File "/usr/share/ganeti/3.1/ganeti/hypervisor/hv_kvm/__init__.py",
line 2467, in FinalizeMigrationDst
self._ClearInstanceMigrationCapabilities(instance)
File "/usr/share/ganeti/3.1/ganeti/hypervisor/hv_kvm/__init__.py",
line 185, in wrapper
return fn(self, *args, **kwargs)
File "/usr/share/ganeti/3.1/ganeti/hypervisor/hv_kvm/__init__.py",
line 930, in _ClearInstanceMigrationCapabilities
self.qmp.SetMigrationCapabilities(migration_caps_list, False)
File "/usr/share/ganeti/3.1/ganeti/hypervisor/hv_kvm/monitor.py",
line 179, in wrapper
mon.connect()
File "/usr/share/ganeti/3.1/ganeti/hypervisor/hv_kvm/monitor.py",
line 417, in connect
greeting = self.recv_qmp()
File "/usr/share/ganeti/3.1/ganeti/hypervisor/hv_kvm/monitor.py",
line 329, in recv_qmp
raise errors.HypervisorError("Unable to receive data from KVM using the"
ganeti.errors.HypervisorError: Unable to receive data from KVM using the
QMP protocol: [Errno 104] Connection reset by peer




Sascha Lucas

Oct 30, 2025, 11:45:42 AM
to gan...@googlegroups.com
Hi Jens,

thanks for providing detailed information. If your problem also occurred
with Ganeti 3.0 and still occurs with 3.1, that would indicate it is not a
general problem in 3.1, which I hope.

First, using machine_version=pc is unstable across qemu-kvm upgrades: pc is
an alias that resolves according to the installed qemu-kvm version, so it's
better to set pc-i440fx-rhel7.6.0 explicitly. The same applies to
cpu_type=host when nodes are replaced one by one with newer CPUs. But I
assume your settings are intentional.
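As a rough illustration of that advice, here is a hypothetical check over an instance's HV parameters (for example as listed by `gnt-instance info`). The helper name and the alias set are assumptions for this sketch, not a Ganeti API; `pc` and `q35` are QEMU's unversioned machine aliases, which map to whatever versioned type the installed qemu-kvm considers current.

```python
from __future__ import annotations

# Assumption for this sketch: QEMU's unversioned machine aliases.
FLOATING_MACHINE_ALIASES = {"pc", "q35"}


def floating_hvparams(hvparams: dict[str, str]) -> list[str]:
    """Hypothetical check: flag HV parameters whose meaning can change
    when qemu-kvm is upgraded or a node is replaced."""
    warnings = []
    machine = hvparams.get("machine_version", "")
    if machine in FLOATING_MACHINE_ALIASES:
        warnings.append(
            f"machine_version={machine} follows the installed qemu-kvm; "
            "pin a versioned type such as pc-i440fx-rhel7.6.0")
    if hvparams.get("cpu_type") == "host":
        warnings.append(
            "cpu_type=host passes the host CPU through; "
            "migration targets need a compatible CPU model")
    return warnings
```

For the settings described in the first message (`machine_version=pc`, `cpu_type=host`) this returns two warnings; a pinned machine type with a generic CPU model returns none.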

Your analysis seems right. The receiving side crashes after successful
live migration.

On Mon, 27 Oct 2025, Jens Larsson wrote:

> ivanova:/var/log/ganeti/node-daemon.log
...
> File "/usr/share/ganeti/3.1/ganeti/hypervisor/hv_kvm/__init__.py", line
> 930, in _ClearInstanceMigrationCapabilities
> self.qmp.SetMigrationCapabilities(migration_caps_list, False)
> File "/usr/share/ganeti/3.1/ganeti/hypervisor/hv_kvm/monitor.py", line 179,
> in wrapper
> mon.connect()
> File "/usr/share/ganeti/3.1/ganeti/hypervisor/hv_kvm/monitor.py", line 417,
> in connect
> greeting = self.recv_qmp()
> File "/usr/share/ganeti/3.1/ganeti/hypervisor/hv_kvm/monitor.py", line 329,
> in recv_qmp
> raise errors.HypervisorError("Unable to receive data from KVM using the"
> ganeti.errors.HypervisorError: Unable to receive data from KVM using the QMP
> protocol: [Errno 104] Connection reset by peer

The relevant code path[1] indicates that you have `migration_caps` set.
Can you please send this instance's HV parameters?

It seems that qemu crashes while the migration_caps are being unset;
otherwise there seems to be nothing to "finalize".
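For reference, the finalize step talks QMP, a newline-delimited JSON protocol: QEMU sends a greeting, the client answers with `qmp_capabilities`, and only then can it issue `migrate-set-capabilities`. Per the traceback, Ganeti's connection was reset while it was still reading the greeting in `connect()`, so on that connection the capability command was never sent. The following is a minimal sketch of the exchange, not Ganeti's `monitor.py`; the function names are hypothetical, and a closed socket is reported as `ConnectionResetError` to mirror the error above.

```python
from __future__ import annotations

import json
import socket


def _read_message(f) -> dict:
    """Read one QMP JSON message, skipping asynchronous events."""
    while True:
        line = f.readline()
        if not line:
            # EOF: QEMU closed the monitor socket (for example because it crashed)
            raise ConnectionResetError("QEMU closed the QMP socket")
        msg = json.loads(line)
        if "event" not in msg:
            return msg


def clear_migration_caps(sock: socket.socket, caps: list[str]) -> dict:
    """Greeting, then qmp_capabilities, then migrate-set-capabilities with
    every listed capability set to False. Returns the final reply."""
    f = sock.makefile("rw", encoding="utf-8", newline="\n")
    greeting = _read_message(f)  # the step where Ganeti got ECONNRESET
    if "QMP" not in greeting:
        raise ValueError(f"unexpected greeting: {greeting}")
    commands = [
        {"execute": "qmp_capabilities"},
        {"execute": "migrate-set-capabilities",
         "arguments": {"capabilities": [
             {"capability": c, "state": False} for c in caps]}},
    ]
    reply: dict = {}
    for cmd in commands:
        f.write(json.dumps(cmd) + "\n")
        f.flush()
        reply = _read_message(f)
        if "error" in reply:
            raise RuntimeError(reply["error"])
    return reply
```

Against a live QEMU the socket would be the instance's QMP UNIX socket; the capability name in the usage below (`xbzrle`) is only an example.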

Thanks, Sascha.

[1] https://github.com/ganeti/ganeti/blob/78174afe00428b855d4a255f509a347227522b14/lib/hypervisor/hv_kvm/__init__.py#L916-L925