KVM live migration issues


Lucas, Sascha

Jul 19, 2013, 5:42:08 AM
to gan...@googlegroups.com
Hi,

I'm doing live migration with KVM (disk type sharedfile -> NFS). It seems to work in the lab, but in production I sometimes get failures. This is what I see:

* a receiving instance was started on the destination node
* the sending instance hangs/freezes as soon as live migration starts
* top on the source node shows 100% CPU load n times, where n is the number of VCPUs the instance has
* on the source node I see echo "info migrate" and socat processes; strace on the socat PID shows a "connect" call hanging forever
* killing socat/echo leads to a failed migration (the receiving instance gets killed, and Ganeti sends "cont" to the sender's qemu monitor, which still hangs)
* after killing the "cont" command, the machine comes back with the following kernel message:

[81847.957324] INFO: rcu_sched_state detected stalls on CPUs/tasks: { 0 1} (detected by 5, t=15442 jiffies)
[81847.957344] sending NMI to all CPUs:
... followed by a trace on every CPU

* running live migration again on the same instance, to the same destination node, works without problems

Does anyone know what's happening? The system is SLES11-SP2 with kvm-0.15.

In issue 492 ("socat timeouts in KVM hypervisor") it is said that in some environments 500 ms are not enough to get the answer to 'info migrate'. What are the conditions in these environments? Could my issue be related?

Thanks, Sascha.





Lucas, Sascha

Jul 19, 2013, 11:19:05 AM
to gan...@googlegroups.com
From: Sascha Lucas
Date: Fri, 19. July 2013 11:42

> I'm doing live migration with KVM (disk type sharedfile -> NFS). It
> seems to work in the lab, but in production I'm getting sometimes
> failures.

I figured out that the difference between lab and production is the RAM size of the instances. Small instances (8GB) migrate fine. The bigger the RAM, the longer the instance stalls.

The number of ICMP ping losses before switching over to the destination node, while migration is running, is:

* 10GB -> 2 ping losses
* 30GB -> 30 ping losses

It doesn't matter how much memory the instance has actually committed. Tests were done with a freshly rebooted instance with about 1GB committed.

Does this make any sense?
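For what it's worth, the shape of these numbers fits a simple model: if the stall comes from a single-threaded qemu touching every guest page once at migration start (to set up the dirty bitmap and scan for zero pages), the freeze scales with configured RAM, not committed RAM. A back-of-the-envelope sketch; the 1 GB/s scan rate is a made-up number, purely illustrative:

```python
# Toy model: stall ~ configured RAM / page-scan rate. The default scan
# rate here is an assumption for illustration, not a measured figure.
def initial_stall_s(ram_gb, scan_rate_gb_per_s=1.0):
    """Estimated freeze while qemu walks all guest RAM once at the
    start of migration (single-threaded qemu, no migration thread)."""
    return ram_gb / scan_rate_gb_per_s

# Matches the observation: a 30 GB guest stalls three times as long as
# a 10 GB one, regardless of how much memory is actually committed.
assert initial_stall_s(30) == 3 * initial_stall_s(10)
```

Whatever the real per-page cost is, the linear dependence on configured RAM is the signature to look for.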

Helga Velroyen

Jul 19, 2013, 11:28:59 AM
to gan...@googlegroups.com
Does the same thing happen if you live-migrate a kvm instance with that much RAM manually (without the help of Ganeti)?



Lucas, Sascha

Jul 23, 2013, 2:17:58 PM
to gan...@googlegroups.com
Hi Helga,

From: Helga Velroyen
Date: Fri, 19. July 2013 17:29

> Does the same thing happen if you live-migrate a kvm instance with that much RAM manually (without the help of Ganeti)?

I haven't tested this yet. But I'm sure it's not a Ganeti issue. Are related issues welcome on this list?

What I found so far is that my problem may be a known one [1]. Even though my kvm-0.15 is very old, I wonder whether anyone else migrating "not so small" VMs is seeing stalls/downtimes at the beginning of live migration?

[1] http://www.linux-kvm.org/wiki/images/6/66/2012-forum-live-migration.pdf

Helga Velroyen

Jul 24, 2013, 3:27:31 AM
to gan...@googlegroups.com
Hi Lucas,


> > Does the same thing happen if you live-migrate a kvm instance with that much RAM manually (without the help of Ganeti)?
>
> I haven't tested this yet. But I'm sure it's not a Ganeti issue. Are related issues welcome on this list?

They are not unwelcome, but I'm afraid it is less likely to get a helpful answer here than on the kvm/qemu mailing list.

> What I found so far is that my problem may be a known one [1]. Even though my kvm-0.15 is very old, I wonder whether anyone else migrating "not so small" VMs is seeing stalls/downtimes at the beginning of live migration?
>
> [1] http://www.linux-kvm.org/wiki/images/6/66/2012-forum-live-migration.pdf


We haven't encountered this yet, but maybe someone else on the list has.

Cheers,
Helga

Roberto Espinoza

Jul 24, 2013, 3:37:06 AM
to gan...@googlegroups.com
I wanted to add that I have noticed this kind of behavior too.
I never found a solution, but I also lose a few pings when migrating KVM instances.

For the purposes of these instances, that kind of downtime is not so bad for us, but of course it would be nice to see no downtime.

And just like you, the bigger the RAM, the more it stalls while live migrating.

No errors on either side though.


Cheers,
Roberto

tschend

Jul 24, 2013, 6:46:55 AM
to gan...@googlegroups.com
Hi,

we also have a bunch of big instances with 32 GB of memory.

We see similar behavior. When the memory transfer starts we lose 4 pings, and then, while the transfer is in progress, we lose a few more. In total it is about 30 to 40 pings, because the instance changes its memory frequently.

We use wheezy with kernel 3.2.0-4-amd64 #1 SMP Debian 3.2.46-1 x86_64 GNU/Linux and kvm 1.2.
The switches are a 10 GbE stack.

Maybe someone has a hint.

Regards
Thomas

tschend

Jul 24, 2013, 7:15:08 AM
to gan...@googlegroups.com
Hi,

I did a few more tests. If I have a freshly booted VM which is not really using its memory, the transfer only "hangs" for one second. That's it.

The problem only occurs when the VM actually uses its memory; then pings get lost and the VM hangs most of the time while the memory is being transferred.

This was tested with a 32GB machine with our Windows 2008 R2 template.

It looks like kvm has a problem right after taking the memory bitmap / starting the memory transfer?

Can anyone confirm?

Regards
Thomas


tschend

Jul 25, 2013, 2:30:42 AM
to gan...@googlegroups.com
Hi again,

I have done some more testing, and it does not seem to be a problem with CPU count.

No matter whether there are 2, 4, 6 or 8 vCPUs, there is no problem with a small amount of memory.

Here is a log of a VM with 1 vCPU and 32 GB of RAM.

gnt-instance migrate output:

Thu Jul 25 08:21:02 2013 Migrating instance ms-test01
Thu Jul 25 08:21:02 2013 * checking disk consistency between source and target
Thu Jul 25 08:21:03 2013 * switching node sgntko21.tsd.lab to secondary mode
Thu Jul 25 08:21:03 2013 * changing into standalone mode
Thu Jul 25 08:21:03 2013 * changing disks into dual-master mode
Thu Jul 25 08:21:04 2013 * wait until resync is done
Thu Jul 25 08:21:04 2013 * preparing sgntko21.tsd.lab to accept the instance
Thu Jul 25 08:21:05 2013 * migrating instance to sgntko21.iaas.cgm.ag
--> VM starts stalling
Thu Jul 25 08:21:07 2013 * starting memory transfer
Thu Jul 25 08:21:18 2013 * memory transfer progress: 1.27 %
Thu Jul 25 08:21:30 2013 * memory transfer progress: 2.06 %
--> VM is running fine again
Thu Jul 25 08:21:32 2013 * memory transfer complete
Thu Jul 25 08:21:33 2013 * switching node sgntko22.tsd.lab to secondary mode
Thu Jul 25 08:21:33 2013 * wait until resync is done
Thu Jul 25 08:21:33 2013 * changing into standalone mode
Thu Jul 25 08:21:34 2013 * changing disks into single-master mode
Thu Jul 25 08:21:35 2013 * wait until resync is done
Thu Jul 25 08:21:35 2013 * done

and here is a ping with timestamp


Thu Jul 25 08:20:58 CEST 2013: 64 bytes from 10.0.6.90: icmp_req=5 ttl=128 time=0.339 ms
Thu Jul 25 08:20:59 CEST 2013: 64 bytes from 10.0.6.90: icmp_req=6 ttl=128 time=1.93 ms
Thu Jul 25 08:21:00 CEST 2013: 64 bytes from 10.0.6.90: icmp_req=7 ttl=128 time=2.02 ms
Thu Jul 25 08:21:01 CEST 2013: 64 bytes from 10.0.6.90: icmp_req=8 ttl=128 time=0.274 ms
Thu Jul 25 08:21:02 CEST 2013: 64 bytes from 10.0.6.90: icmp_req=9 ttl=128 time=0.245 ms
Thu Jul 25 08:21:03 CEST 2013: 64 bytes from 10.0.6.90: icmp_req=10 ttl=128 time=0.266 ms
Thu Jul 25 08:21:04 CEST 2013: 64 bytes from 10.0.6.90: icmp_req=11 ttl=128 time=0.284 ms
Thu Jul 25 08:21:05 CEST 2013: 64 bytes from 10.0.6.90: icmp_req=12 ttl=128 time=3.58 ms
Thu Jul 25 08:21:06 CEST 2013: 64 bytes from 10.0.6.90: icmp_req=13 ttl=128 time=0.254 ms
--> VM starts stalling
Thu Jul 25 08:21:08 CEST 2013: 64 bytes from 10.0.6.90: icmp_req=14 ttl=128 time=1004 ms
Thu Jul 25 08:21:09 CEST 2013: 64 bytes from 10.0.6.90: icmp_req=16 ttl=128 time=10.8 ms
Thu Jul 25 08:21:10 CEST 2013: 64 bytes from 10.0.6.90: icmp_req=17 ttl=128 time=199 ms
Thu Jul 25 08:21:11 CEST 2013: 64 bytes from 10.0.6.90: icmp_req=18 ttl=128 time=127 ms
Thu Jul 25 08:21:15 CEST 2013: 64 bytes from 10.0.6.90: icmp_req=19 ttl=128 time=2508 ms
Thu Jul 25 08:21:15 CEST 2013: 64 bytes from 10.0.6.90: icmp_req=22 ttl=128 time=81.6 ms
Thu Jul 25 08:21:16 CEST 2013: 64 bytes from 10.0.6.90: icmp_req=23 ttl=128 time=0.404 ms
Thu Jul 25 08:21:17 CEST 2013: 64 bytes from 10.0.6.90: icmp_req=24 ttl=128 time=152 ms
Thu Jul 25 08:21:19 CEST 2013: 64 bytes from 10.0.6.90: icmp_req=25 ttl=128 time=565 ms
Thu Jul 25 08:21:19 CEST 2013: 64 bytes from 10.0.6.90: icmp_req=26 ttl=128 time=37.9 ms
Thu Jul 25 08:21:20 CEST 2013: 64 bytes from 10.0.6.90: icmp_req=27 ttl=128 time=25.8 ms
Thu Jul 25 08:21:21 CEST 2013: 64 bytes from 10.0.6.90: icmp_req=28 ttl=128 time=181 ms
Thu Jul 25 08:21:22 CEST 2013: 64 bytes from 10.0.6.90: icmp_req=29 ttl=128 time=49.0 ms
Thu Jul 25 08:21:23 CEST 2013: 64 bytes from 10.0.6.90: icmp_req=30 ttl=128 time=0.401 ms
Thu Jul 25 08:21:24 CEST 2013: 64 bytes from 10.0.6.90: icmp_req=31 ttl=128 time=158 ms
Thu Jul 25 08:21:25 CEST 2013: 64 bytes from 10.0.6.90: icmp_req=32 ttl=128 time=133 ms
Thu Jul 25 08:21:27 CEST 2013: 64 bytes from 10.0.6.90: icmp_req=33 ttl=128 time=756 ms
Thu Jul 25 08:21:28 CEST 2013: 64 bytes from 10.0.6.90: icmp_req=34 ttl=128 time=1198 ms
Thu Jul 25 08:21:28 CEST 2013: 64 bytes from 10.0.6.90: icmp_req=35 ttl=128 time=199 ms
Thu Jul 25 08:21:29 CEST 2013: 64 bytes from 10.0.6.90: icmp_req=36 ttl=128 time=75.6 ms
--> VM is running fine again
Thu Jul 25 08:21:31 CEST 2013: 64 bytes from 10.0.6.90: icmp_req=38 ttl=128 time=0.646 ms
Thu Jul 25 08:21:32 CEST 2013: 64 bytes from 10.0.6.90: icmp_req=39 ttl=128 time=0.348 ms
Thu Jul 25 08:21:33 CEST 2013: 64 bytes from 10.0.6.90: icmp_req=40 ttl=128 time=0.300 ms
Thu Jul 25 08:21:34 CEST 2013: 64 bytes from 10.0.6.90: icmp_req=41 ttl=128 time=0.324 ms
Thu Jul 25 08:21:35 CEST 2013: 64 bytes from 10.0.6.90: icmp_req=42 ttl=128 time=0.282 ms
Thu Jul 25 08:21:36 CEST 2013: 64 bytes from 10.0.6.90: icmp_req=43 ttl=128 time=0.654 ms
Thu Jul 25 08:21:37 CEST 2013: 64 bytes from 10.0.6.90: icmp_req=44 ttl=128 time=0.324 ms
Thu Jul 25 08:21:38 CEST 2013: 64 bytes from 10.0.6.90: icmp_req=45 ttl=128 time=0.309 ms
Thu Jul 25 08:21:39 CEST 2013: 64 bytes from 10.0.6.90: icmp_req=46 ttl=128 time=0.286 ms
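To put a number on a trace like the one above, one can count the gaps in the icmp_req sequence. A small sketch, assuming the timestamped ping format shown here:

```python
import re

def count_lost_pings(log):
    """Return the number of icmp_req sequence numbers missing from a
    ping log, i.e. probes the guest never answered."""
    seqs = [int(m) for m in re.findall(r"icmp_req=(\d+)", log)]
    if not seqs:
        return 0
    return (seqs[-1] - seqs[0] + 1) - len(seqs)

# Three replies around the stall; reqs 15, 17 and 18 never came back.
sample = """\
08:21:08: 64 bytes from 10.0.6.90: icmp_req=14 ttl=128 time=1004 ms
08:21:09: 64 bytes from 10.0.6.90: icmp_req=16 ttl=128 time=10.8 ms
08:21:15: 64 bytes from 10.0.6.90: icmp_req=19 ttl=128 time=2508 ms
"""
assert count_lost_pings(sample) == 3
```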

So is there anything I can do to debug? With this test VM the time is very short, but we use a lot of heavy VMs where migration can take very long and the VM is stalled the whole time.

I am thankful for every hint.

Regards
Thomas


Sascha Lucas

Jul 25, 2013, 3:57:28 AM
to gan...@googlegroups.com
Hi,

first many thanks for your feedback, also to Roberto Espinoza!

On Wed, 24 Jul 2013 23:30:42 -0700 (PDT) tschend wrote:

> i have done some more testing and it seems not to be a problem with CPU
> count.

Yes. That's also what I found out.

> Here is a log of a vm with 1 vCPU and 32G of RAM

this is a nice analysis: comparing ganeti output and ping times. The
picture I get is the same, just worse: 30 ICMP packets lost at the
beginning, and the same RTT lag while migration is running.

I also compared timings with a "socat wrapper" logging qemu monitor
commands. The stall/hang happens directly after the "migrate -d ..."
command. Ganeti's "info migrate" commands have no influence; they are
just stalled until qemu is responsive again (tested by delaying/suppressing
this command).
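For anyone who wants to reproduce this kind of measurement, here is a minimal sketch of a timestamping monitor client in the spirit of that socat wrapper. The socket path is hypothetical (it depends on your setup), and it speaks the human monitor protocol the way the socat invocation does; a real client would also have to strip the monitor banner and prompt.

```python
import socket
import sys
import time

# Hypothetical path; Ganeti keeps per-instance monitor sockets under
# its run directory, but the exact location depends on your setup.
MONITOR = "/var/run/ganeti/kvm-hypervisor/ctrl/instance1.monitor"

def log_command(cmd, when=None, out=sys.stderr):
    """Write a timestamped record of a monitor command being sent."""
    when = time.time() if when is None else when
    stamp = time.strftime("%H:%M:%S", time.localtime(when))
    out.write("%s> %s\n" % (stamp, cmd.strip()))

def forward(cmd):
    """Send one command to the qemu human monitor, return the raw reply.
    A single recv() is a simplification; long replies may be truncated."""
    log_command(cmd)
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    s.connect(MONITOR)
    s.sendall(cmd.encode() + b"\n")
    reply = s.recv(65536)
    s.close()
    return reply.decode(errors="replace")
```

Logging the timestamp of "migrate -d" versus the first answered "info migrate" makes the initial stall directly visible.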

> but we use a lot of heavy VMs where migration can take very long and the VM is stalled all the time.

I just set the migration speed to a higher value (80-100MB/s on 1G
ethernet). One might think of 10G ethernet -> see the qemu-1.5 release
notes: "4.2 Gbps in 1.5 vs 1.8 Gbps in 1.4".

> I am thankful for every hint.

I'm afraid that this is "normal, enterprise ready" qemu behavior (I will
have to talk to my enterprise OS support).

In [1] it is explained why the VM stalls at the beginning. The comparison
there was done between qemu-1.2 and a version with a separate migration
thread, which found its way into qemu-1.4. I plan to switch to qemu-1.4 in
the lab next month; production will follow in October. I'm looking forward
to it...

tschend

Jul 25, 2013, 5:42:10 AM
to gan...@googlegroups.com
Hi Sascha,


On Thursday, 25 July 2013 09:57:28 UTC+2, Sascha Lucas wrote:

> Hi,
>
> first, many thanks for your feedback, also to Roberto Espinoza!
>
> On Wed, 24 Jul 2013 23:30:42 -0700 (PDT) tschend wrote:
>
> > i have done some more testing and it seems not to be a problem with CPU
> > count.
>
> Yes. That's also what I found out.
>
> > Here is a log of a vm with 1 vCPU and 32G of RAM
>
> this is a nice analysis: comparing ganeti output and ping times. The
> picture I get is the same, just worse: 30 ICMP packets lost at the
> beginning, and the same RTT lag while migration is running.

I have also seen the lost pings with a production machine that takes very long to migrate.

> I also compared timings with a "socat wrapper" logging qemu monitor
> commands. The stall/hang happens directly after the "migrate -d ..."
> command. Ganeti's "info migrate" commands have no influence; they are
> just stalled until qemu is responsive again (tested by delaying/suppressing
> this command).
>
> > but we use a lot of heavy VMs where migration can take very long and the VM is stalled all the time.
>
> I just set the migration speed to a higher value (80-100MB/s on 1G
> ethernet). One might think of 10G ethernet -> see the qemu-1.5 release
> notes: "4.2 Gbps in 1.5 vs 1.8 Gbps in 1.4".

If it works without the stalling, that is fine for me.

> > I am thankful for every hint.
>
> I'm afraid that this is "normal, enterprise ready" qemu behavior (I will
> have to talk to my enterprise OS support).

Maybe they can say whether this is "normal, expected behaviour". I do not see anybody else having this issue.

Have you tried a plain kvm migration without Ganeti?

> In [1] it is explained why the VM stalls at the beginning. The comparison
> there was done between qemu-1.2 and a version with a separate migration
> thread, which found its way into qemu-1.4. I plan to switch to qemu-1.4 in
> the lab next month; production will follow in October. I'm looking forward
> to it...
>
> [1] http://www.linux-kvm.org/wiki/images/6/66/2012-forum-live-migration.pdf
>
> Thanks, Sascha.

For now I want to stay with qemu-1.2 from wheezy. If there is no other way than using 1.4, I will think about upgrading.

Thanks for your help.

Regards
Thomas

Sascha Lucas

Jul 25, 2013, 9:29:30 AM
to gan...@googlegroups.com
On Thu, 25 Jul 2013 02:42:10 -0700 (PDT) tschend wrote:

> On Thursday, 25 July 2013 09:57:28 UTC+2, Sascha Lucas wrote:
> > I just set migration speed to a higher value (80-100MB/s on 1G ethernet).

> If it works without the stalling it is fine for me. 

no. It won't avoid the stall at the beginning, but it makes migration
converge faster (or at all) on busy VMs, so the time spent with RTT lag
will be smaller.

> Maybe they can say if this is a "normal expected behaviour". I do not see
> anybody else having this issue.

At least there are three of us on ganeti, plus the talk at KVM Forum
2012... I won't be surprised to be going, for the first time, where no one
(or at least not so many) has gone before :-).

> Have you tried a plain kvm migration without ganeti? 

No. But I'll do this for my support case report (after the holidays).

> For now i want to stay with qemu-1.2 from wheezy.

Isn't wheezy on 1.1? 1.5 is waiting in sid for a backport :-).

Thanks, Sascha.

tschend

Jul 25, 2013, 10:00:48 AM
to gan...@googlegroups.com
Hi,


On Thursday, 25 July 2013 15:29:30 UTC+2, Sascha Lucas wrote:

> On Thu, 25 Jul 2013 02:42:10 -0700 (PDT) tschend wrote:
>
> > > I just set the migration speed to a higher value (80-100MB/s on 1G ethernet).
>
> > If it works without the stalling, that is fine for me.
>
> no. It won't avoid the stall at the beginning, but it makes migration
> converge faster (or at all) on busy VMs, so the time spent with RTT lag
> will be smaller.

Of course, but we see about 10 minutes of migration time with that setup.

> > Maybe they can say whether this is "normal, expected behaviour". I do
> > not see anybody else having this issue.
>
> At least there are three of us on ganeti, plus the talk at KVM Forum
> 2012... I won't be surprised to be going, for the first time, where no one
> (or at least not so many) has gone before :-).

;-)

> > Have you tried a plain kvm migration without Ganeti?
>
> No. But I'll do this for my support case report (after the holidays).

Great!

> > For now i want to stay with qemu-1.2 from wheezy.
>
> Isn't wheezy on 1.1? 1.5 is waiting in sid for a backport :-).

Indeed, you are right, it is 1.1.2 :-)

> Thanks, Sascha.

I wish you nice holidays.

Best regards
Thomas

David Mitchell

Jul 25, 2013, 10:23:26 AM
to gan...@googlegroups.com
Let me preface by saying that I'm not an expert on live migration, but I'll throw my two cents in anyway. My understanding of how the migration works is that it first tries to get the contents of RAM copied over to the destination server. Once the RAM is in sync, the actual migration happens: the source VM is paused, the CPU state is copied, and finally the VM starts running on the destination server. As you noted, the number of CPUs doesn't matter much, because the CPU state is a very small amount of data to sync compared to the contents of RAM.

One downside to this model is that if the VM is making a lot of changes to memory it's not well defined how long it will take to sync the RAM. It's essentially a race with KVM trying to copy the RAM over to the destination server faster than the VM is changing it. And depending on how much the RAM is being changed a lot of the transfer may end up wasted because the VM may change a block of RAM on the source after it's been copied to the destination. As a result that block has to be copied again. The upside to this model is that if something fails during the migration the VM should be able to keep running on the source server without failing.
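That race can be made concrete with a toy simulation of iterative pre-copy. The 30 ms downtime threshold and the round limit are invented for illustration; qemu's real convergence logic differs in the details.

```python
# Toy simulation of the pre-copy race described above: each round
# re-sends whatever the guest dirtied during the previous copy pass.
def precopy_rounds(ram_mb, dirty_mb_per_s, bw_mb_per_s,
                   max_downtime_s=0.03, max_rounds=30):
    """Return the round count once the leftover fits into the allowed
    downtime, or None if the migration never converges."""
    remaining = float(ram_mb)
    for rounds in range(1, max_rounds + 1):
        copy_time = remaining / bw_mb_per_s
        if copy_time <= max_downtime_s:
            return rounds                        # pause the VM and finish
        remaining = dirty_mb_per_s * copy_time   # re-dirtied meanwhile
    return None

# A fairly idle 32 GB guest converges after a few rounds; a guest that
# dirties memory faster than the link can carry it never does.
assert precopy_rounds(32768, 10, 100) is not None
assert precopy_rounds(32768, 200, 100) is None
```

The second assertion is exactly the "race" failure mode: once the dirty rate exceeds the transfer bandwidth, the remaining set grows every round instead of shrinking.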

An alternative model I've seen discussed is to move the running VM and CPU state first and the RAM second. When the VM needs to read memory which is still on the source server, it has to wait for it to be copied (although nothing prevents KVM from scheduling that block to be transferred immediately). The main advantage of this model is that the time required for the migration is better defined, because it is guaranteed that no RAM block has to be transferred twice. The downside is that during the migration you are vulnerable to any errors that crop up. If something goes wrong halfway through, there is probably no way to recover, since neither server has a complete RAM image of the VM. Any error will likely mean having to reboot the VM.

I don't really follow KVM development so I don't know if the second model ever got discussed in depth or considered for implementation. It sounds appealing and I can imagine that for some workloads it might be the only way to make it work reliably. Maybe somebody who is more in tune with KVM development knows more.


On Jul 25, 2013, at 12:30 AM, tschend <thomas...@gmail.com> wrote:

> Hi agian,
>
> i have done some more testing and it seems not to be a problem with CPU count.
>
> No matter if there are 2,4,6 or 8 vCPUs there is no problem with a small amount of memory.

-----------------------------------------------------------------
| David Mitchell (mitc...@ucar.edu) Network Engineer IV |
| Tel: (303) 497-1845 National Center for |
| FAX: (303) 497-1818 Atmospheric Research |
-----------------------------------------------------------------



Sascha Lucas

Jul 26, 2013, 6:07:03 AM
to gan...@googlegroups.com
Hi David,

thanks for your two cents... your understanding of classic migration and its
downsides is the same as mine. I once asked vmware people what they do if
memory changes too fast... they told me they slow down the VM...
I never verified this, but with KVM it's the same, just unintended :-)

The alternative you describe is called post-copy migration. There are a lot
of references out there... one talk [1] at KVM Forum 2012, given directly
after the one [2] explaining why VMs stall at the beginning and suffer
performance impacts in the meantime.

[1] http://www.linux-kvm.org/wiki/images/f/f7/2012-forum-postcopy.pdf
[2] http://www.linux-kvm.org/wiki/images/6/66/2012-forum-live-migration.pdf

Post-copy migration would be my favorite...

Thanks, Sascha.

Nicolas Sebrecht

Jul 29, 2013, 10:03:20 AM
to gan...@googlegroups.com, Nicolas Sebrecht
On 26/07/13, Sascha Lucas wrote:
> Hi David,
>
> thanks for your cents... your understanding of classic migration and its
> downsides is the same as mine. once I asked vmware people what they do if
> memory changes to fast... they told me, they are slowing down the VM....
> Never verified this, but with KVM it's the same, just unintended :-)

From what I remember, I think I've seen patches floating on the qemu
mailing list that slow down the guest, if required, while migrating.

--
Nicolas Sebrecht

Nicolas Sebrecht

Jul 29, 2013, 10:13:10 AM
to gan...@googlegroups.com, Nicolas Sebrecht
On 24/07/13, Helga Velroyen wrote:

> > What I found so far is, that my problem can be a known one[1]. Even if my
> > kvm-0.15 is very old, I wonder if there is someone else migrating "not so
> > small" VMs and having stalls/downtimes on the beginning of live migration?
> >
> > [1]
> > http://www.linux-kvm.org/wiki/images/6/66/2012-forum-live-migration.pdf
> >
> >
> We haven't encountered this yet, but maybe someone else on the list has.

We have encountered this issue. We have a Windows 2008 R2 guest with 64 GB
of RAM which we can't live-migrate. Unfortunately, we don't get an
expressive error about what's happening, but the other guests with less RAM
(~4 GB) on the exact same cluster migrate just fine.

This is with qemu-kvm v1.1.2 and Ganeti v2.6.2 on Debian Wheezy-DI-b2.

--
Nicolas Sebrecht