[PATCH 0/3] cell_management: fix logic in prologue

26 views
Skip to first unread message

Veaceslav Falico

unread,
Jul 16, 2015, 7:48:12 AM7/16/15
to jailho...@googlegroups.com, jan.k...@siemens.com
Hi,

I'm only getting starting to learn the code and I don't understand the
logic completely, so this patchset might be complete garbage.

Anyway, hypervisor-side cell_management_prologue() verification for task
actions and their permissions strikes quite odd - both that it seems to be
buggy for the CELL_DESTROY case and that it lacks CELL_START verification.

This patchset tries to address both and introduce a macro for current and
future use - hopefully it gets along the logic of online/offline cells.

The CELL_START verification bug can be easily reproduced:

sudo tools/jailhouse enable configs/qemu-vm.cell
sudo tools/jailhouse cell create configs/apic-demo.cell
sudo tools/jailhouse cell load apic-demo inmates/demos/x86/apic-demo.bin -a 0xf0000
sudo tools/jailhouse cell start apic-demo
sudo tools/jailhouse cell start apic-demo
sudo tools/jailhouse cell start apic-demo

After the 3rd start it hangs for me completely.

It's based on the most up-to-date next branch.

Hope that the fixes make sense :).

Thank you!


Veaceslav Falico

unread,
Jul 16, 2015, 7:48:13 AM7/16/15
to jailho...@googlegroups.com, jan.k...@siemens.com, Veaceslav Falico, Veaceslav Falico
It's useful to see whether the state is currently active or not.

Signed-off-by: Veaceslav Falico <veacesla...@huawei.com>
---
hypervisor/control.c | 3 +--
hypervisor/include/jailhouse/hypercall.h | 4 ++++
2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/hypervisor/control.c b/hypervisor/control.c
index 695c8c1..5973eb5 100644
--- a/hypervisor/control.c
+++ b/hypervisor/control.c
@@ -108,8 +108,7 @@ static bool cell_send_message(struct cell *cell, u32 message,
u32 reply = cell->comm_page.comm_region.reply_from_cell;
u32 cell_state = cell->comm_page.comm_region.cell_state;

- if (cell_state == JAILHOUSE_CELL_SHUT_DOWN ||
- cell_state == JAILHOUSE_CELL_FAILED)
+ if (!JAILHOUSE_CELL_IS_ONLINE(cell_state))
return true;

if ((type == MSG_REQUEST &&
diff --git a/hypervisor/include/jailhouse/hypercall.h b/hypervisor/include/jailhouse/hypercall.h
index 0087d66..649afc0 100644
--- a/hypervisor/include/jailhouse/hypercall.h
+++ b/hypervisor/include/jailhouse/hypercall.h
@@ -88,6 +88,10 @@
#define JAILHOUSE_CELL_SHUT_DOWN 2 /* terminal state */
#define JAILHOUSE_CELL_FAILED 3 /* terminal state */

+/* cell state macros for online/offline state */
+#define JAILHOUSE_CELL_IS_ONLINE(state) (state == JAILHOUSE_CELL_RUNNING || \
+ state == JAILHOUSE_CELL_RUNNING_LOCKED)
+
#define COMM_REGION_GENERIC_HEADER \
volatile __u32 msg_to_cell; \
volatile __u32 reply_from_cell; \
--
2.4.3

Veaceslav Falico

unread,
Jul 16, 2015, 7:48:14 AM7/16/15
to jailho...@googlegroups.com, jan.k...@siemens.com, Veaceslav Falico, Veaceslav Falico
Currently the verification for cell management will fail with EPERM if
either it's being destroyed and there's another locked cell running, or if
it's being created/loaded/destroyed AND it allows being shut down.

The last condition is wrong because only on destruction it should be
verified if the cell itself allows to be shut down.

Fix this by verifying if the cell can be destroyed (other cells aren't
locked and the cell is ok to shut down) only on destroy task.

Signed-off-by: Veaceslav Falico <veacesla...@huawei.com>
---
hypervisor/control.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hypervisor/control.c b/hypervisor/control.c
index 5973eb5..010b645 100644
--- a/hypervisor/control.c
+++ b/hypervisor/control.c
@@ -522,8 +522,8 @@ static int cell_management_prologue(enum management_task task,
return -EINVAL;
}

- if ((task == CELL_DESTROY && !cell_reconfig_ok(*cell_ptr)) ||
- !cell_shutdown_ok(*cell_ptr)) {
+ if (task == CELL_DESTROY &&
+ (!cell_reconfig_ok(*cell_ptr) || !cell_shutdown_ok(*cell_ptr))) {
cell_resume(cpu_data);
return -EPERM;
}
--
2.4.3

Veaceslav Falico

unread,
Jul 16, 2015, 7:48:15 AM7/16/15
to jailho...@googlegroups.com, jan.k...@siemens.com, Veaceslav Falico, Veaceslav Falico
Currently we don't verify if cell is already started when trying to start
it or set it loadable, thus we can start any cell unlimited amount of
times, which can lead to weird consequencies.

Fix this by verifying if the cell is already online and, if yes, return
with an EPERM error.

Signed-off-by: Veaceslav Falico <veacesla...@huawei.com>
---

Notes:
I've decided to use switch () {} instead of if/elif for readability.

No default not to mess with later-added types of tasks.

hypervisor/control.c | 18 ++++++++++++++----
1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/hypervisor/control.c b/hypervisor/control.c
index 010b645..2f9417a 100644
--- a/hypervisor/control.c
+++ b/hypervisor/control.c
@@ -522,10 +522,20 @@ static int cell_management_prologue(enum management_task task,
return -EINVAL;
}

- if (task == CELL_DESTROY &&
- (!cell_reconfig_ok(*cell_ptr) || !cell_shutdown_ok(*cell_ptr))) {
- cell_resume(cpu_data);
- return -EPERM;
+ switch (task) {
+ case CELL_DESTROY:
+ if (!cell_reconfig_ok(*cell_ptr) || !cell_shutdown_ok(*cell_ptr)) {
+ cell_resume(cpu_data);
+ return -EPERM;
+ }
+ break;
+ case CELL_START:
+ case CELL_SET_LOADABLE:
+ if (JAILHOUSE_CELL_IS_ONLINE((*cell_ptr)->comm_page.comm_region.cell_state)) {
+ cell_resume(cpu_data);
+ return -EPERM;
+ }
+ break;
}

cell_suspend(*cell_ptr, cpu_data);
--
2.4.3

Jan Kiszka

unread,
Jul 16, 2015, 8:08:18 AM7/16/15
to Veaceslav Falico, jailho...@googlegroups.com
Yeah, good catch - confirmed.

>
> It's based on the most up-to-date next branch.
>
> Hope that the fixes make sense :).

Will look into them ASAP. Thanks already!

Jan

--
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux

Henning Schild

unread,
Jul 16, 2015, 11:17:59 AM7/16/15
to Veaceslav Falico, jailho...@googlegroups.com, jan.k...@siemens.com
Another start without another load will reuse the old memory content.
Your global state will not be all zeros, just like stack content that
does not get initialized.

I did not look the code of apic-demo or your patches. That is just to
clarify what is going on and why the second and third run can differ
from the first. If you reload before you restart you should always get
the same results.

Henning

Henning Schild

unread,
Jul 16, 2015, 11:50:16 AM7/16/15
to Veaceslav Falico, jailho...@googlegroups.com, jan.k...@siemens.com
In the case of the apic test the third start should actually trigger
the real restart on dirty memory. The second start command should get
rejected because the cell is locked.

Henning

Veaceslav Falico

unread,
Jul 16, 2015, 12:44:17 PM7/16/15
to Henning Schild, jailho...@googlegroups.com, jan.k...@siemens.com
Yeah, pretty much as I've figured it out. The patchset actually prohibits
this kind of behaviour - second (successfull) start without destroying it
first.

>
>Henning

Jan Kiszka

unread,
Jul 17, 2015, 2:21:37 AM7/17/15
to Veaceslav Falico, jailho...@googlegroups.com, Veaceslav Falico
This is not correct. cell_shutdown_ok has to be validated on
CELL_SET_LOADABLE and CELL_START as well because both imply a shutdown
of running cells that the target shall be able to vote on.

Jan Kiszka

unread,
Jul 17, 2015, 2:23:58 AM7/17/15
to Veaceslav Falico, jailho...@googlegroups.com, Veaceslav Falico
On 2015-07-16 13:48, Veaceslav Falico wrote:
> Currently we don't verify if cell is already started when trying to start
> it or set it loadable, thus we can start any cell unlimited amount of
> times, which can lead to weird consequencies.
>
> Fix this by verifying if the cell is already online and, if yes, return
> with an EPERM error.

It used to be a working feature to be able to restart demo cells without
reloading them first. Now this is no longer working with our cells (will
comment on this in a separate reply). But there might be other workload
that supports it, so I'd like to preserve this feature.

Veaceslav Falico

unread,
Jul 17, 2015, 2:24:50 AM7/17/15
to Jan Kiszka, jailho...@googlegroups.com, Veaceslav Falico
That's addressed in the next patch - to prohibit starting/loading of the
cell if it's still running - i.e. to force user to explicitly destroy the
cell first, before (re-)starting it.

This way there will be clear division between starting and destroying the
cell.

Jan Kiszka

unread,
Jul 17, 2015, 2:29:39 AM7/17/15
to Veaceslav Falico, jailho...@googlegroups.com, Veaceslav Falico
Well, then you should have reordered patches. Please no temporal
regressions in patch series. We need to maintain bisectability.

> cell if it's still running - i.e. to force user to explicitly destroy the
> cell first, before (re-)starting it.
>
> This way there will be clear division between starting and destroying the
> cell.

First, the hypervisor has no "shutdown cell" command. It only knows
"start" and "set loadable" and has no idea if the cell was actually
reloaded for a restart. And second, destroying a cell to reload it is
overkill (and not possible with some other cell locked the configuration).

Jan Kiszka

unread,
Jul 17, 2015, 2:29:49 AM7/17/15
to Veaceslav Falico, jailho...@googlegroups.com, Henning Schild
OK, understood the effect now. This pattern used to work for the
apic-demo until ef3b370c64 (SMP support). Since then, the bootstrap code
uses some global variables to define the CPU ID and type (BP vs. AP).
This adds states to the bootstrap process, thus we can no longer restart
the cell without reloading it first (to reset the states).

The real problem is now related to the TODO item "cell software watchdog
via comm region messages": The apic-demo participates in the comm region
protocol, thus the hypervisor expects an answer on the first restart
request. But as the cell is now stuck during boot, there won't be an
answer, and the hypervisor locks up the root cell.

The answer to this is what is listed under that TODO item: adding a
timeout to the communication region protocol and fail a cell that does
not react in time. The only challenge of this is that the hypervisor has
no notion of time so far.

Veaceslav Falico

unread,
Jul 17, 2015, 2:41:59 AM7/17/15
to Jan Kiszka, jailho...@googlegroups.com, Veaceslav Falico
From my POV that wasn't a "feature" (hidden restart on a start command), but
rather a bug and unpredictable behaviour. So, no regressions :).

>
>> cell if it's still running - i.e. to force user to explicitly destroy the
>> cell first, before (re-)starting it.
>>
>> This way there will be clear division between starting and destroying the
>> cell.
>
>First, the hypervisor has no "shutdown cell" command. It only knows
>"start" and "set loadable" and has no idea if the cell was actually
>reloaded for a restart. And second, destroying a cell to reload it is
>overkill (and not possible with some other cell locked the configuration).

That's why I didn't mention shutting down, but rather destroying the cell -
I'm completely aware that there's no way to actually shut it down without
unloading it.

As for the overkill - completely agree, there should be a method to be able
to cleanly restart it. I'll follow up on this in the reply to the main 0/N
thread.

Valentine Sinitsyn

unread,
Jul 17, 2015, 3:00:42 AM7/17/15
to Jan Kiszka, Veaceslav Falico, jailho...@googlegroups.com, Henning Schild
Hi everybody,

On 17.07.2015 11:29, Jan Kiszka wrote:
> The answer to this is what is listed under that TODO item: adding a
> timeout to the communication region protocol and fail a cell that does
> not react in time. The only challenge of this is that the hypervisor has
> no notion of time so far.
I also have a few places that are crying for timeout infrastructure in
amd_iommu code (otherwise we leave the hypervisor potentially vulnerable
to a hard lockup), so patches are very welcome. :)

Valentine

Jan Kiszka

unread,
Jul 17, 2015, 3:06:33 AM7/17/15
to Valentine Sinitsyn, Veaceslav Falico, jailho...@googlegroups.com, Henning Schild
OK, good to know. Would be polling for a timeout sufficient in your use
case? Then we could start with a simpler infrastructure.

The watchdog I was also mentioning in the TODO clearly requires
something active, i.e. a periodic event, likely over a root cell CPU to
check non-root cells. But that could be stacked on top of some
"get_time" service later on.

Veaceslav Falico

unread,
Jul 17, 2015, 3:24:06 AM7/17/15
to Jan Kiszka, jailho...@googlegroups.com, Henning Schild
And that's exaclty why I think that there shouldn't be an implied restart
on start (and shutdown on set_loadable) command - because of these kind of bugs.

Anyway, understood, and it's definitely not a priority... but:

Maybe there should be a more clean devision between (implied) tasks and
what they do?

currently both SET_LOADABLE and START will shut down the running (!) cell,
which doesn't strike me as something intuitive.

What do you think about the following design:

--------------------
CREATE: creates a cell from cell_desc if one doesn't exist already.
DESTROY: destroys a cell if it's offline (implies UNLOAD)

LOAD: maps regions to the cell if it's not mapped already and the cell is
offline, copies the image
UNLOAD: unmaps regions if the cell is offline

START: starts a cell if it's offline and loaded
STOP: shuts down a cell if it's online
RESTART: STOP && START

FLUSH (?): memzeros the loadable memory if it's offline and loaded
--------------------

This way there'll be a lot more flexibility, more defined behaviour and
easier to understand/get into it.

>
>The real problem is now related to the TODO item "cell software watchdog
>via comm region messages": The apic-demo participates in the comm region
>protocol, thus the hypervisor expects an answer on the first restart
>request. But as the cell is now stuck during boot, there won't be an
>answer, and the hypervisor locks up the root cell.
>
>The answer to this is what is listed under that TODO item: adding a
>timeout to the communication region protocol and fail a cell that does
>not react in time. The only challenge of this is that the hypervisor has
>no notion of time so far.

The easiest solution I see is just to add a watchdog to the root cell and a
new hypercall _TICK or whatever - it'll keep the base small, more or less
reliable and it's easy to implement :).

What do you think?

Valentine Sinitsyn

unread,
Jul 17, 2015, 3:25:11 AM7/17/15
to Jan Kiszka, Veaceslav Falico, jailho...@googlegroups.com, Henning Schild
On 17.07.2015 12:06, Jan Kiszka wrote:
> On 2015-07-17 09:00, Valentine Sinitsyn wrote:
>> Hi everybody,
>>
>> On 17.07.2015 11:29, Jan Kiszka wrote:
>>> The answer to this is what is listed under that TODO item: adding a
>>> timeout to the communication region protocol and fail a cell that does
>>> not react in time. The only challenge of this is that the hypervisor has
>>> no notion of time so far.
>> I also have a few places that are crying for timeout infrastructure in
>> amd_iommu code (otherwise we leave the hypervisor potentially vulnerable
>> to a hard lockup), so patches are very welcome. :)
>
> OK, good to know. Would be polling for a timeout sufficient in your use
> case? Then we could start with a simpler infrastructure.
I think so. In fact, I just thought that amd_iommu is naturally
x86-specific, so if I could rely on something like PM Timer/HPET inside
the hypervisor, it would already be enough.

Valentine

Valentine Sinitsyn

unread,
Jul 17, 2015, 3:30:07 AM7/17/15
to Veaceslav Falico, Jan Kiszka, jailho...@googlegroups.com, Henning Schild
On 17.07.2015 12:24, Veaceslav Falico wrote:
>> The answer to this is what is listed under that TODO item: adding a
>> timeout to the communication region protocol and fail a cell that does
>> not react in time. The only challenge of this is that the hypervisor has
>> no notion of time so far.
>
> The easiest solution I see is just to add a watchdog to the root cell and a
> new hypercall _TICK or whatever - it'll keep the base small, more or less
> reliable and it's easy to implement :).
Does this imply the root cell is always assumed to be alive? I'd rather
not off-load a hypervisor task to an inmate, even a privileged one.

Valentine

Veaceslav Falico

unread,
Jul 17, 2015, 3:39:06 AM7/17/15
to Valentine Sinitsyn, Jan Kiszka, jailho...@googlegroups.com, Henning Schild
Sure it does, and it's definitely not a long-term solution, just a proposal
for a quick-n-dirty hack to get the time bookkeeping going, before the
actual implementation of real timekeeping code...

>
>Valentine

Gustavo Lima Chaves

unread,
Jul 25, 2017, 6:52:26 PM7/25/17
to Jailhouse, jan.k...@siemens.com, vfa...@gmail.com, henning...@siemens.com

That's interesting. So we kidnap one of those timers to the hypervisor only (since the cells are using APIC timers and are good with them) and gain timers on that level, right? Leaving the root cell out of the way on the watchdog task looks safer for me as well (reading through the thread). We could have policies to reload cells even without participation of the root one. Has anyone experimented with those different clocks on x86 already?

Jan Kiszka

unread,
Jul 26, 2017, 1:38:44 AM7/26/17
to Gustavo Lima Chaves, Jailhouse, vfa...@gmail.com, henning...@siemens.com
In this last comments, I was not talking about a timer but a clock. We
already share the PM "Timer" (which is a clock in reality) across all
cells (because it is read-only and trivially handed out via PIO access
masks). We could also use it in the hypervisor on x86 in order to gain a
notion of time. However, we may also need a timer in order to implement
a watchdog. And, with safety in mind, we may need some redundancy in
that concept, e.g. a second independent time source, to avoid nasty
error modes when some time source goes mad.

Regarding cell management without the root cell: If it's only about
restart, that would be doable. But generally, this is not possibly
without reload. And loading can only be done with the help of the root
cell. That's an architectural property of Jailhouse, to offload all the
logic and drivers and you-name-it to the root cell.

Jan

Gustavo Lima Chaves

unread,
Jul 26, 2017, 6:53:46 PM7/26/17
to Jan Kiszka, Jailhouse, vfa...@gmail.com, henning...@siemens.com
* Jan Kiszka <jan.k...@siemens.com> [2017-07-26 07:38:41 +0200]:

[...]

> >> Valentine
> >
> > That's interesting. So we kidnap one of those timers to the hypervisor only (since the cells are using APIC timers and are good with them) and gain timers on that level, right? Leaving the root cell out of the way on the watchdog task looks safer for me as well (reading through the thread). We could have policies to reload cells even without participation of the root one. Has anyone experimented with those different clocks on x86 already?
> >
>
> In this last comments, I was not talking about a timer but a clock. We
> already share the PM "Timer" (which is a clock in reality) across all
> cells (because it is read-only and trivially handed out via PIO access
> masks). We could also use it in the hypervisor on x86 in order to gain a

Do you say it is read-only because we only see an inl() call on that
address at timing.c or because any write would be Jailhouse-blocked (I
could not find any code doing said block, but maybe I missed it)?

I also ask that targeting another thing: would it be trivial, as the
code is, to enable the RTC region on both root and inmate cell, so
that an inmate Linux is given a permanent clock as well? I'm doing the
test now, but just in case...

> notion of time. However, we may also need a timer in order to implement
> a watchdog. And, with safety in mind, we may need some redundancy in
> that concept, e.g. a second independent time source, to avoid nasty
> error modes when some time source goes mad.
>
> Regarding cell management without the root cell: If it's only about
> restart, that would be doable. But generally, this is not possibly
> without reload. And loading can only be done with the help of the root
> cell. That's an architectural property of Jailhouse, to offload all the
> logic and drivers and you-name-it to the root cell.

So if if cell-reloading without the root cell is a no-go, we're back
better off generating the watchdog ticks from the root cell (and not
the hypervisor), no?

>
> Jan

--
Gustavo Lima Chaves
Intel - Open Source Technology Center

Gustavo Lima Chaves

unread,
Jul 26, 2017, 7:47:06 PM7/26/17
to Jan Kiszka, Jailhouse, vfa...@gmail.com, henning...@siemens.com
* Gustavo Lima Chaves <gustavo.l...@intel.com> [2017-07-26 15:53:45 -0700]:

> * Jan Kiszka <jan.k...@siemens.com> [2017-07-26 07:38:41 +0200]:
>
> [...]
>
> > >> Valentine
> > >
> > > That's interesting. So we kidnap one of those timers to the hypervisor only (since the cells are using APIC timers and are good with them) and gain timers on that level, right? Leaving the root cell out of the way on the watchdog task looks safer for me as well (reading through the thread). We could have policies to reload cells even without participation of the root one. Has anyone experimented with those different clocks on x86 already?
> > >
> >
> > In this last comments, I was not talking about a timer but a clock. We
> > already share the PM "Timer" (which is a clock in reality) across all
> > cells (because it is read-only and trivially handed out via PIO access
> > masks). We could also use it in the hypervisor on x86 in order to gain a
>
> Do you say it is read-only because we only see an inl() call on that
> address at timing.c or because any write would be Jailhouse-blocked (I
> could not find any code doing said block, but maybe I missed it)?
>
> I also ask that targeting another thing: would it be trivial, as the
> code is, to enable the RTC region on both root and inmate cell, so
> that an inmate Linux is given a permanent clock as well? I'm doing the
> test now, but just in case...

INTx sharing is the issue here, I guess, ugh. Any clues on how to
provide non "Jan 1 1970" dates for Linux inmates?

Ralf Ramsauer

unread,
Jul 27, 2017, 4:40:20 AM7/27/17
to Gustavo Lima Chaves, Jan Kiszka, Jailhouse, vfa...@gmail.com, henning...@siemens.com
I use the ivshmem-net device and NAT to access the internet from the
non-root cell. This allows me to use NTP.

Another quick hack could be to provide a timestamp as kernel parameter.
Not really accurate, though.

Ralf
>
Reply all
Reply to author
Forward
0 new messages