Root cell_suspend in the management prelogue

17 views
Skip to first unread message

Devshatwar, Nikhil

unread,
Dec 13, 2017, 1:22:44 AM12/13/17
to jailho...@googlegroups.com, Vutla, Lokesh, Jan Kiszka, Ralf Ramsauer

Hello all,

 

I am profiling the impact of reloading a cell on the root cell.

My observation is that the hypercall for cell_load and cell_start ends up suspending the root cell while the operation is being performed.

 

This causes delays in the execution of a real time application running in the root cell.

I can understand that the cell_suspend might be necessary when creating or destroying cells because it affects resources being used by root cell.

But the loading and starting of other non-root cell can happen with root cell running in parallel.

 

Was this decision taken to avoid making Jailhouse re-entrant?

If so, we can hold some locks in the hypercall to avoid that

 

For now, I tried[1 HACK patch] to remove the cell_suspend and cell_resume and it seems to work well.

The correct way would be to suspend only when creating or destroying cells.

Let me know if there will be any repercussions of this change?

 

 

 

diff --git a/hypervisor/control.c b/hypervisor/control.c

index 72d3c7e..a25da91 100644

--- a/hypervisor/control.c

+++ b/hypervisor/control.c

@@ -514,26 +514,21 @@ static int cell_management_prologue(enum management_task task,

        if (cpu_data->cell != &root_cell)

                return -EPERM;

 

-       cell_suspend(&root_cell, cpu_data);

-

        for_each_cell(*cell_ptr)

                if ((*cell_ptr)->config->id == id)

                        break;

 

        if (!*cell_ptr) {

-               cell_resume(cpu_data);

                return -ENOENT;

        }

 

        /* root cell cannot be managed */

        if (*cell_ptr == &root_cell) {

-               cell_resume(cpu_data);

                return -EINVAL;

        }

 

        if ((task == CELL_DESTROY && !cell_reconfig_ok(*cell_ptr)) ||

            !cell_shutdown_ok(*cell_ptr)) {

-               cell_resume(cpu_data);

                return -EPERM;

        }

 

@@ -582,8 +577,6 @@ static int cell_start(struct per_cpu *cpu_data, unsigned long id)

        printk("Started cell \"%s\"\n", cell->config->name);

 

out_resume:

-       cell_resume(cpu_data);

-

        return err;

}

 

 

 

Regards,

Nikhil D

Jan Kiszka

unread,
Dec 13, 2017, 2:09:25 AM12/13/17
to Devshatwar, Nikhil, jailho...@googlegroups.com, Vutla, Lokesh, Ralf Ramsauer
On 2017-12-13 07:22, Devshatwar, Nikhil wrote:
> Hello all,
>
> I am profiling the impact of reloading a cell on the root cell.
> My observation is that the hypercall for cell_load and cell_start ends up suspending the root cell while the operation is being performed.
>
> This causes delays in the execution of a real time application running in the root cell.
> I can understand that the cell_suspend might be necessary when creating or destroying cells because it affects resources being used by root cell.
> But the loading and starting of other non-root cell can happen with root cell running in parallel.
>
> Was this decision taken to avoid making Jailhouse re-entrant?
> If so, we can hold some locks in the hypercall to avoid that
>
> For now, I tried[1 HACK patch] to remove the cell_suspend and cell_resume and it seems to work well.
> The correct way would be to suspend only when creating or destroying cells.
> Let me know if there will be any repercussions of this change?
>

There are two reasons why we stop the root cell on load and start
operations (you need both for a cell reload):

- primitive lock to protect the management API against reentry
- need for atomic modification of the 2nd-stage page table
(we map in/out the loadable non-root cell memory regions to/from the
root cell)

While the former might be resolvable with a hypervisor internal lock +
usage counter, the latter is more complex. I'm not saying it's
impossible, but the result will require more code and more review to
ensure the correct behavior.
Oh, and this patch is definitely broken as it also affects cell
destruction and - indirectly - creation.

Jan

--
Siemens AG, Corporate Technology, CT RDA ITP SES-DE
Corporate Competence Center Embedded Linux

Jan Kiszka

unread,
Dec 13, 2017, 2:11:06 AM12/13/17
to Devshatwar, Nikhil, jailho...@googlegroups.com, Vutla, Lokesh, Ralf Ramsauer
On 2017-12-13 08:09, Jan Kiszka wrote:
> On 2017-12-13 07:22, Devshatwar, Nikhil wrote:
>> Hello all,
>>
>> I am profiling the impact of reloading a cell on the root cell.
>> My observation is that the hypercall for cell_load and cell_start ends up suspending the root cell while the operation is being performed.
>>
>> This causes delays in the execution of a real time application running in the root cell.
>> I can understand that the cell_suspend might be necessary when creating or destroying cells because it affects resources being used by root cell.
>> But the loading and starting of other non-root cell can happen with root cell running in parallel.
>>
>> Was this decision taken to avoid making Jailhouse re-entrant?
>> If so, we can hold some locks in the hypercall to avoid that
>>
>> For now, I tried[1 HACK patch] to remove the cell_suspend and cell_resume and it seems to work well.
>> The correct way would be to suspend only when creating or destroying cells.
>> Let me know if there will be any repercussions of this change?
>>
>
> There are two reasons why we stop the root cell on load and start
> operations (you need both for a cell reload):
>
> - primitive lock to protect the management API against reentry
> - need for atomic modification of the 2nd-stage page table
> (we map in/out the loadable non-root cell memory regions to/from the
> root cell)

...across all CPUs of the root cell. It usually requires some remote TLB
shoot-downs.

Nikhil Devshatwar

unread,
Dec 18, 2017, 3:45:36 AM12/18/17
to Devshatwar, Nikhil, jailho...@googlegroups.com, Vutla, Lokesh, Jan Kiszka, Ralf Ramsauer
Hi Jan/Ralf,

Do you have any opinion on the following?

Regards,
Nikhil D

Jan Kiszka

unread,
Dec 18, 2017, 4:15:46 AM12/18/17
to Nikhil Devshatwar, Devshatwar, Nikhil, jailho...@googlegroups.com, Vutla, Lokesh, Ralf Ramsauer
On 2017-12-18 09:45, Nikhil Devshatwar wrote:
> Hi Jan/Ralf,
>
> Do you have any opinion on the following?

I replied already, see mailing list archive if it didn't reach you.

I'm not fundamentally against making "Cell Set Loadable" and "Cell
Start" less invasive to the root cell, but we need to address the
concurrency around those calls properly.

Jan

Devshatwar, Nikhil

unread,
Dec 19, 2017, 4:51:54 AM12/19/17
to Jan Kiszka, jailho...@googlegroups.com, Vutla, Lokesh, Ralf Ramsauer
> From: jailho...@googlegroups.com [mailto:jailhouse-
> d...@googlegroups.com] On Behalf Of Jan Kiszka
> Sent: Monday, December 18, 2017 2:46 PM
> To: Devshatwar, Nikhil; Devshatwar, Nikhil; jailho...@googlegroups.com;
> Vutla, Lokesh; Ralf Ramsauer
> Subject: [EXTERNAL] Re: Root cell_suspend in the management prelogue
>
> On 2017-12-18 09:45, Nikhil Devshatwar wrote:
> > Hi Jan/Ralf,
> >
> > Do you have any opinion on the following?
>
> I replied already, see mailing list archive if it didn't reach you.
>

Yeah, there was some issue with the sync

> I'm not fundamentally against making "Cell Set Loadable" and "Cell Start" less
> invasive to the root cell, but we need to address the concurrency around those
> calls properly.

Alright. I will try to come up with the solution which reduces the impact on root cell while
Making sure the concurrency is handled.
We can add few locks and implement some events to flush remote TLBs etc.

Let us review this when I have implemented this cleanly.

Nikhil D
> --
> You received this message because you are subscribed to the Google Groups
> "Jailhouse" group.
> To unsubscribe from this group and stop receiving emails from it, send an email
> to jailhouse-de...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
Reply all
Reply to author
Forward
0 new messages