[proposal] More fine-grained locking for vcpus

Jeehoon Kang

unread,

May 25, 2019, 12:26:15 PM5/25/19

to Hafnium

Hi all,

I'd like to propose a more fine-grained locking scheme for vcpus, which is implemented here: https://github.com/jeehoonkang/hafnium/tree/fine-grained-cpu-lock

- The first commit splits a vcpu lock into (1) a vcpu execution lock and (2) a vcpu interrupts lock.

(https://github.com/jeehoonkang/hafnium/commit/8c139107e61dfc86f63d309d25546cbcb0001d2b)

A vcpu lock is used for protecting its registers and interrupts, but they are not simultaneously protected.

So we can easily divide a lock into two.

- The second commit adjusts the role of vcpu execution lock.

(https://github.com/jeehoonkang/hafnium/commit/6af9c563f02923a3d1187b25e2a72a80d55c6c8f)

Now when a vcpu is running on a pcpu, the pcpu conceptually has acquired the vcpu execution lock.

As a result, `api_vcpu_prepare_run()` acquires the vcpu execution lock and `complete_saving_state()` releases it.

In my opinion, it significantly streamlines the correctness argument of vcpu synchronization schemes.

If Hafnium developers are happy with this change, I'll send a PR to the upstream.

Thanks,

Jeehoon

Jeehoon Kang

unread,

May 26, 2019, 11:32:00 PM5/26/19

to Hafnium

I updated this branch to reflect the fact that the lock order is reversed.

It was vm lock -> vcpu lock,

but in this PR, it's vcpu execution lock -> vm lock -> vcpu interrupts lock.

Hong-Seok, thank you for reporting this!

The new commits are here:

https://github.com/jeehoonkang/hafnium/commit/16a70d9bff1125d19a24c9ad3645afacf0102200

https://github.com/jeehoonkang/hafnium/commit/a843f463c3abdadb22faf574e16045f58d87a677

Thanks,

Jeehoon

Andrew Walbran

unread,

May 28, 2019, 9:46:34 AM5/28/19

to Jeehoon Kang, Hafnium

Thanks for the proposal! I'd like to wait until Andrew Scull gets back later this week to have a look at it before we decide whether it makes sense, as I think he's looked into this more than I have, but feel free to go ahead and upload it to Gerrit so we can review it there.

--
You received this message because you are subscribed to the Google Groups "Hafnium" group.
To unsubscribe from this group and stop receiving emails from it, send an email to hafnium-discu...@googlegroups.com.
To post to this group, send email to hafnium...@googlegroups.com.
To view this discussion on the web, visit https://groups.google.com/d/msgid/hafnium-discuss/44fb8c73-4b49-461a-af7c-ea83381b52e8%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Andrew Scull

unread,

May 31, 2019, 8:32:34 AM5/31/19

to Jeehoon Kang, Andrew Walbran, Hafnium

I'm going to take a closer look at the changes but I'm interested as to why you suggest finer grained locking will make arguments of correctness easier? In our current position, more locks leave more chances for mistakes at the (as yet unmeasured) cost of contention. Does your thinking over the approach for reasoning also include a link to the source code that will help us ensure the correct locks are held and are being acquired in the correct order with automated tooling since there will then be many more combinations?

The execution lock is the one I will be thinking about the most as I remember there being a reason for not using a spinlock there but I will have to do some digging to remind myself. Regretting the lack of a comment now...

To view this discussion on the web, visit https://groups.google.com/d/msgid/hafnium-discuss/CA%2B_y_2Gcgg%3D%3DpOBJkct%2BumomL-69Cxs5F4Sszhck9u8bHss_Jw%40mail.gmail.com.

Jeehoon Kang

unread,

Jun 1, 2019, 6:49:21 AM6/1/19

to Hafnium

Thank you for feedback!

I'm going to take a closer look at the changes but I'm interested as to why you suggest finer grained locking will make arguments of correctness easier?

My general strategy here is associating data, not code region, to a lock. Because using locks we want to remove concurrent accesses to a shared data, not code region. This strategy is shared among concurrent programming literature, and is culminated by the following interface of mutex in Rust: https://doc.rust-lang.org/std/sync/struct.Mutex.html#method.lock Of course we can do a similar thing in C++.

Back to Hafnium, I think the current locking scheme in vcpu doesn't clearly state which data is protected by which locks. My branch tries to clarify the relation: the "execution lock" protects state and the register file, and "interrupts lock" protects interrupt things. The reason why a running vcpu is locking the execution lock is, it is changing its state and register file. This should increase the number of locks---as we want to inject interrupts to a running vcpu---but it better reflects our intention and is easier to verify than using a single lock, IMO.

Jeehoon Kang

unread,

Jun 1, 2019, 12:51:36 PM6/1/19

to Hafnium

I submitted Gerrit review request. I'm new to Gerrit, and it's totally different from GitHub... Please correct me if I'm doing anything wrong. Please review the following diffs:

https://hafnium-review.googlesource.com/c/hafnium/+/5460

https://hafnium-review.googlesource.com/c/hafnium/+/5461

Preparing the review request, I realized that I don't need to split the vcpu lock into two. Interrupt information is accessed only by the running, current vcpu, which already has the lock in the new scheme. In short, (1) the vcpu's lock protects its state, register file, and interrupt information; and (2) the running pcpu conceptually has the running vcpu's lock. It's review id 5460.

Now I realized that `VCPU_STATE_RUNNING` is no longer needed. A running vcpu's lock is already acquired, so it's state cannot be read by any threads. So I removed it. It's review id 5461.

Thanks,

Jeehoon

Andrew Walbran

unread,

Jun 3, 2019, 8:51:23 AM6/3/19

to Jeehoon Kang, Hafnium

On Sat, 1 Jun 2019 at 17:51, Jeehoon Kang <jeehoo...@cp.kaist.ac.kr> wrote:

I submitted Gerrit review request. I'm new to Gerrit, and it's totally different from GitHub... Please correct me if I'm doing anything wrong. Please review the following diffs:
https://hafnium-review.googlesource.com/c/hafnium/+/5460
https://hafnium-review.googlesource.com/c/hafnium/+/5461

Preparing the review request, I realized that I don't need to split the vcpu lock into two. Interrupt information is accessed only by the running, current vcpu, which already has the lock in the new scheme. In short, (1) the vcpu's lock protects its state, register file, and interrupt information; and (2) the running pcpu conceptually has the running vcpu's lock. It's review id 5460.

I've commented on the change in Gerrit too, but I don't think it's the case that interrupt information is only accessed by the current vCPU. api_inject_interrupt is called from another vCPU (possibly from another VM) to modify the pending interrupts of a given vCPU.

Now I realized that `VCPU_STATE_RUNNING` is no longer needed. A running vcpu's lock is already acquired, so it's state cannot be read by any threads. So I removed it. It's review id 5461.

Thanks,
Jeehoon

On Saturday, June 1, 2019 at 7:49:21 PM UTC+9, Jeehoon Kang wrote:
Thank you for feedback!

I'm going to take a closer look at the changes but I'm interested as to why you suggest finer grained locking will make arguments of correctness easier?

My general strategy here is associating data, not code region, to a lock. Because using locks we want to remove concurrent accesses to a shared data, not code region. This strategy is shared among concurrent programming literature, and is culminated by the following interface of mutex in Rust: https://doc.rust-lang.org/std/sync/struct.Mutex.html#method.lock Of course we can do a similar thing in C++.

Back to Hafnium, I think the current locking scheme in vcpu doesn't clearly state which data is protected by which locks. My branch tries to clarify the relation: the "execution lock" protects state and the register file, and "interrupts lock" protects interrupt things. The reason why a running vcpu is locking the execution lock is, it is changing its state and register file. This should increase the number of locks---as we want to inject interrupts to a running vcpu---but it better reflects our intention and is easier to verify than using a single lock, IMO.

--

You received this message because you are subscribed to the Google Groups "Hafnium" group.
To unsubscribe from this group and stop receiving emails from it, send an email to hafnium-discu...@googlegroups.com.
To post to this group, send email to hafnium...@googlegroups.com.

To view this discussion on the web, visit https://groups.google.com/d/msgid/hafnium-discuss/f2d8ca15-9aa6-450d-853d-57027b11eac3%40googlegroups.com.

Andrew Scull

unread,

Jun 3, 2019, 9:22:41 AM6/3/19

to Andrew Walbran, Jeehoon Kang, Hafnium

In the current locking scheme, the data in `struct vcpu` is protected by the vCPU lock (except, possibly, values that don't change which is a bit of laziness but doesn't apply to the topics in this thread). The registers are further protected by the `regs_available` flag. It's possible that `regs_available` could be changes to a lock with try_lock behavior but I don't see the fundamental difference or improvement this would bring.

I can see why straight forward locking would be nicer and you can see in the CL introducing `regs_available` that this was tried and then changed and was the focus of discussion. Hopefully looking through that thread will give some better insight into this and if you can find a way to simplify it that would be great.

To view this discussion on the web, visit https://groups.google.com/d/msgid/hafnium-discuss/CA%2B_y_2HxXr%2B8drOh3vt%2BuGDWHGqQWwpSoeMndywCTyYvZKAyRg%40mail.gmail.com.

Reply all

Reply to author

Forward