Hi,
This is a non-converging migration, which Ganeti cannot break by itself.
The affected instance changes memory faster than your migration settings /
network can handle.
You have the following options:
* If you can control the affected instance from inside, stop the process
that changes memory (mostly Java). Then the migration should converge.
* If not, break the current migration manually: on the primary node of
the affected instance run the following command:
echo "migrate_cancel" | socat stdio unix-connect:/var/run/ganeti/kvm-hypervisor/ctrl/<instance-name>.monitor
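To check whether a migration is still running, or to confirm that the cancel went through, the same monitor socket can be asked for "info migrate" (a standard QEMU monitor command). A minimal sketch, assuming socat is installed and you are on the primary node; the helper names monitor_socket and monitor_cmd are made up for illustration and are not part of Ganeti:

```shell
# Hypothetical helper: build the monitor socket path for an instance name.
monitor_socket() {
    printf '/var/run/ganeti/kvm-hypervisor/ctrl/%s.monitor\n' "$1"
}

# Hypothetical helper: send one monitor command to the instance's QEMU process.
# $1 = instance name, $2 = monitor command
monitor_cmd() {
    echo "$2" | socat stdio "unix-connect:$(monitor_socket "$1")"
}

# Usage on the primary node (not executed here):
#   monitor_cmd <instance-name> "info migrate"     # show migration status
#   monitor_cmd <instance-name> "migrate_cancel"   # same cancel as above
```

If the remaining RAM reported by "info migrate" does not shrink over several calls, the migration is most likely not converging.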
For the next migration, tune your migration settings:
* set migration_bandwidth to ~2/3 of the NIC speed. This is
- ~83 on 1G
- ~833 on 10G
- etc.
* set migration_downtime to 1000 (meaning one second)
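The bandwidth numbers above come from simple arithmetic: NIC speed in Gbit/s, converted to megabytes per second, times 2/3. A rough sketch for other NIC speeds; the function name is made up for illustration, and the result is rounded down by integer division:

```shell
# Hypothetical helper: suggested migration_bandwidth for a NIC speed in Gbit/s.
migration_bandwidth_for() {
    nic_gbit=$1
    # Gbit/s -> Mbyte/s is *1000/8; then take two thirds (integer math, rounds down)
    echo $(( nic_gbit * 1000 / 8 * 2 / 3 ))
}

migration_bandwidth_for 1    # prints 83
migration_bandwidth_for 10   # prints 833
migration_bandwidth_for 25   # prints 2083

# Applying the values cluster-wide would then look roughly like this
# (not executed here; check against your Ganeti version first):
#   gnt-cluster modify -H kvm:migration_bandwidth=83,migration_downtime=1000
```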
If all this does not help, the brave admin changes to post-copy migration.
Set it either at cluster level for all instances or just on the affected
instance: migration_caps=postcopy-ram. With that the migration is
guaranteed to finish.
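As a sketch of the two variants, the small helper below only prints the command to run, so you can review it before pasting it on the master node. The function name postcopy_cmd is made up for illustration, and this assumes your Ganeti and QEMU versions support the postcopy-ram capability:

```shell
# Hypothetical helper: print the command that enables post-copy migration.
# $1 = "cluster" for all instances, or an instance name for a single one.
postcopy_cmd() {
    if [ "$1" = "cluster" ]; then
        echo "gnt-cluster modify -H kvm:migration_caps=postcopy-ram"
    else
        echo "gnt-instance modify -H migration_caps=postcopy-ram $1"
    fi
}

postcopy_cmd cluster
postcopy_cmd web1.example.com
```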
HTH, Sascha.