On 2/24/22 11:32, Dmitrii Kuvaiskii wrote:
> Sorry, I don't know anything about Gapfruit and Genode, so I'm missing
> some context.
Of course. My apologies for not sharing more details from the beginning.
Let me try to give a short overview. Genode is a framework to design
component-based scenarios or whole operating systems on top of various
microkernels. Every component is strongly isolated and only gets access
to resources and services in an explicit manner. The microkernel
guarantees strong isolation and maintains access control to various
resources via capability objects. This mechanism is often referred to as
capability-based security.
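To make the idea concrete, here is a toy sketch in Python. It is purely
illustrative: `ToyKernel`, `grant`, and `invoke` are made-up names and not
Genode's or any microkernel's API. It only shows the principle that a
component can use a resource solely through a capability it was handed
explicitly.

```python
import secrets


class ToyKernel:
    """Hypothetical stand-in for a microkernel that issues and checks capabilities."""

    def __init__(self):
        # Maps an unguessable token to (resource name, permitted rights).
        self._caps = {}

    def grant(self, resource, rights):
        """Create a capability for `resource` limited to `rights`."""
        token = secrets.token_hex(8)
        self._caps[token] = (resource, frozenset(rights))
        return token

    def invoke(self, token, resource, right):
        """Allow the operation only if the capability covers it."""
        entry = self._caps.get(token)
        if entry is None:
            raise PermissionError("unknown capability")
        cap_resource, cap_rights = entry
        if cap_resource != resource or right not in cap_rights:
            raise PermissionError(f"{right} on {resource} not covered")
        return f"{right} on {resource} permitted"


def is_denied(kernel, token, resource, right):
    """Return True if the kernel rejects the requested operation."""
    try:
        kernel.invoke(token, resource, right)
    except PermissionError:
        return True
    return False
```

A component holding a read-only capability for one resource can read it, but
cannot write to it or touch any other resource, and a component without any
capability cannot do anything at all.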
The system designer defines the component topology in a declarative and
nested manner - making it possible to abstract sub-systems while
maintaining the possibility to define fine-grained trust relationships
between the components. This approach lets you identify the critical
components and make them as simple and robust as possible, which
ultimately makes the overall system more resilient.
Gapfruit OS is an operating system built with the Genode framework. We
currently focus on headless appliances on x86_64 and arm_v8. We support
various runtimes, e.g., FreeBSD libc, WASM/WASI, Ada/SPARK, the Solo5
unikernel runtime, and the JVM, as well as full-blown VMs. For some
workloads, we would like to run unmodified Linux binaries. Obviously,
this would work by running Linux within a VM. However, we would like a
more lightweight solution. Hence my reaching out to y'all about
Gramine :)
>> If I understand correctly, in practice, this means having Gramine+App
>> run inside a VM, right? But this raises the question of how
>> scheduling and memory management etc., would be handled.
>
> I thought that this is what Gapfruit and Genode provide -- the
> primitives to link against and call into, including the primitives for
> scheduling and memory management. So this is not the case?
That's correct. However, everything but the microkernel runs in ring-3.
The microkernel is the only component that runs in ring-0.
> I guess you actually want Gramine to run in ring-3 as a
> service, and then other ring-3 apps will contact Gramine for
> something? Why would ring-3 apps go to Gramine? What's the purpose?
To have a lightweight runtime for unmodified Linux apps on Gapfruit OS.
>> Further, the docs [1] mentioned that on SGX each Linux application
>> process would run in their own SGX enclave. E.g., executing `bash -c
>> ls` would end up spawning two SGX enclaves. In the VM approach, would
>> you also implement that scenario so that it spawns two VMs?
>
> Good question. In my personal experiments, I made a decision to
> support only single-process applications. So no need to spawn a second
> VM, because I simply didn't support fork/clone.
>
> But in general yes, the Gramine philosophy is to spawn a new isolate
> for each new child process. So you would end up with two VMs.
Eventually, I would like to have the ability for Linux processes to
spawn children. From my perspective, the children of "Linux" processes
that run on top of Gramine would belong to the same trust boundary. I
sense that it would be better for performance reasons to share the same
instance of Gramine for the different processes. But since the VM
approach does not seem reasonable anymore, there might be other factors
to consider.
On 2/28/22 02:29, Michał Kowalczyk wrote:
> I'd suggest just doing the same as Linux PAL does: trap `syscall`
> instruction (you can do this, as it's your own kernel where you can add
> such functionality) and additionally (for performance) provide patched
> versions of some popular libcs with `syscall` instructions replaced with
> calls to Gramine.
A little more background regarding microkernels: We consider the
microkernel to be one of many building blocks (albeit an important one)
that form the operating system. We choose different microkernels for
different use cases. We support NOVA, base-hw, seL4, and Linux (using
seccomp for isolation and file descriptors as capability objects). For
example, if virtualization is needed on x86_64, the NOVA hypervisor
works best. In a way, we are users of these microkernels, and I would
certainly not call myself a microkernel expert, especially when it comes
to implementation details. Because of this multi-kernel support, I am
hesitant to implement the syscall-trap feature within each of the
microkernels. However, I would not completely rule out this path. Maybe
this could be a use case that only one microkernel would support.
> I don't really see any justification to put Gramine into ring0 in this
> case, it just makes everything much more complex.
Agreed.
> Not as a service, but as a shim to be able to run binaries based on
> Linux kernel ABI.
That sums it up perfectly.
> Same as we currently do in Linux PAL and in the past
> did with FreeBSD PAL.
Huh. I did not know that Gramine had a FreeBSD PAL. I assume the FreeBSD
support was dropped for maintenance reasons and not because of technical
limitations. Since we support the libc from FreeBSD, it might be more
straightforward to pick up where you left off.
Cheers,
Sid