--
You received this message because you are subscribed to the Google Groups "gVisor Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gvisor-users...@googlegroups.com.
To post to this group, send email to gvisor...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/gvisor-users/fcd51d0b-3925-45ca-ab36-6e1049b25a47%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
On the KVM platform, system call interception works much like a normal OS. When running in guest mode, the platform sets MSR_LSTAR to a system call handler <https://github.com/google/gvisor/blob/master/pkg/sentry/platform/ring0/entry_amd64.go#L23-L32>, which is invoked whenever an application (or the sentry itself) executes a SYSCALL instruction.

System calls from the sentry to the host are a bit more involved, as they require the sentry to switch from guest mode back to host mode before calling into the host kernel.
On Wednesday, May 9, 2018 at 1:23:20 PM UTC-7, Michael Pratt wrote:
> System calls from the sentry to the host are a bit more involved, as they require the sentry to switch from guest mode back to host mode before calling into the host kernel.

Can you explain this a little bit more? When and why would the sentry issue a syscall to the host kernel while it is in guest mode?
On Thu, May 3, 2018 at 2:49 AM <3n4...@gmail.com> wrote:
> Hello all,
>
> Just curious about the technical details of how gVisor traps to the syscall handler using VMX. I would be grateful if you could also point out the source files and functions that do this.
>
> Thanks!
On Wed, May 9, 2018 at 2:27 PM Xiang Li <xiang...@gmail.com> wrote:
> Can you explain this a little bit more? When and why would sentry issue syscall to the host kernel when it is in the guest mode?

The sentry is developed as a normal user-space application (see "How is gVisor different from other container isolation mechanisms?" <https://github.com/google/gvisor#how-is-gvisor-different-from-other-container-isolation-mechanisms> and the following Architecture section of our README). As such, it may make host system calls for many different reasons. For example, external file system access performs read()s and write()s to a 9p server over a Unix domain socket. The Go runtime itself uses clone(), futex(), and mmap() (among others) for host system thread creation, synchronization primitives, and memory allocation, respectively.
The vast majority of sentry code (anything outside of pkg/sentry/platform/kvm or pkg/sentry/platform/ring0) assumes that it is a normal Linux process. Those packages are responsible for ensuring that interactions with the host (syscalls) still work properly.
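To make the point above concrete: any ordinary Go program reaches the host kernel through plain system calls, and the platform-independent sentry code is written the same way. The following is only a minimal sketch, not gVisor code. It assumes a Linux host, and the helper name `anonymousPage` is made up for illustration. It asks the host for one anonymous page with mmap(), touches it, and releases it with munmap().

```go
package main

import (
	"fmt"
	"syscall"
)

// anonymousPage maps one anonymous, private, read-write page from the host
// kernel, writes a byte into it, reads the byte back, and unmaps the page.
// Hypothetical helper for illustration; it is not part of gVisor.
func anonymousPage(value byte) (byte, error) {
	const length = 4096 // assumed mapping size; the kernel rounds up to a page
	mem, err := syscall.Mmap(-1, 0, length,
		syscall.PROT_READ|syscall.PROT_WRITE,
		syscall.MAP_ANON|syscall.MAP_PRIVATE)
	if err != nil {
		return 0, err
	}
	mem[0] = value
	got := mem[0]
	if err := syscall.Munmap(mem); err != nil {
		return 0, err
	}
	return got, nil
}

func main() {
	got, err := anonymousPage(42)
	if err != nil {
		fmt.Println("mmap failed:", err)
		return
	}
	fmt.Println("read back:", got)
}
```

Nothing in this program knows or cares whether it started in host or guest mode. In the sentry, making such calls work from guest mode is the job of the kvm and ring0 packages mentioned above.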
Thanks for the detailed explanation.

> The sentry is normally mapped at a normal userspace address

Is this because the sentry is developed as a normal user-space application with the Go runtime?

If the sentry runs in ring 0 at a normal userspace address, how would its own syscalls (either from the Go runtime or to access host resources) get trapped? Is it handled here (https://github.com/google/gvisor/blob/master/pkg/sentry/platform/ring0/entry_amd64.s#L163-L221)? It seems that CPU_KERNEL_SYSCALL is a HLT instruction used to trigger a VM exit?
Thanks for the hints; they are very useful for understanding the code. Very interesting.

This reminds me of Dune without a kernel module involved. The "dune lib/process" is a Linux emulator, and the untrusted code in ring 3 is the user application. I also read a little of the memory management part, and I am wondering whether gVisor is also similar to Dune but implements the Guest Physical -> Host Virtual mapping with a software approach?
> >>>> <https://github.com/google/gvisor/blob/master/pkg/sentry/platform/kvm/machine_amd64.go#L110> is
> >>>> probably an interesting starting point.
> >>>>
> >>>> The control flow is a bit hard to follow. At a high level it goes:
> >>>> bluepill() -> execute CLI (allowed if already in guest mode, or ...) ->
> >>>> SIGILL signal handler -> bluepillHandler() -> KVM_RUN with RIP @ CLI
> >>>> instruction -> execute CLI in guest mode, bluepill() returns
> >>>>
> >>>>
> >>>>
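The bluepill flow quoted above can be pictured with a toy state machine. This is only a sketch under stated assumptions: the `vCPU` type, its fields, and the trace strings are invented for illustration and are not gVisor's `kvm.vCPU`. The one idea it models is that CLI succeeds when already in guest mode, and otherwise faults so that the signal handler can enter the guest with RIP still pointing at the CLI instruction.

```go
package main

import "fmt"

// vCPU is a toy stand-in for a sentry thread's execution state.
// Hypothetical type for illustration; it is not gVisor's kvm.vCPU.
type vCPU struct {
	guest bool     // true while "running" in guest mode (ring 0)
	trace []string // steps taken, recorded for inspection
}

// bluepill models the quoted flow: executing CLI succeeds in guest mode.
// In host mode it faults, and the signal handler enters the guest with RIP
// still at the CLI instruction, so the instruction is retried there.
func (c *vCPU) bluepill() {
	if !c.guest {
		c.trace = append(c.trace, "CLI faults in host mode")
		c.trace = append(c.trace, "signal handler calls bluepillHandler")
		c.trace = append(c.trace, "KVM_RUN with RIP at CLI")
		c.guest = true
	}
	c.trace = append(c.trace, "CLI succeeds in guest mode")
}

func main() {
	c := &vCPU{}
	c.bluepill() // host -> guest transition
	c.bluepill() // already in guest mode: no transition needed
	for i, step := range c.trace {
		fmt.Println(i, step)
	}
}
```

The second call shows why bluepill() is cheap when the thread is already in guest mode: the only step recorded is the CLI itself.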
> >>>>>
> >>>>> On Thu, May 10, 2018 at 1:58 PM, Michael Pratt <mpr...@google.com>
> >>>>> wrote:
> >>>>>
> >>>>>> Almost, except in guest mode, the sentry always executes in ring 0.
> >>>>>> You can see the core flow here:
> >>>>>> https://github.com/google/gvisor/blob/master/pkg/sentry/platform/ring0/kernel_amd64.go#L215-L231
> >>>>>>
> >>>>>> The sentry is normally mapped at a normal userspace address which
> >>>>>> cannot be mapped into application address spaces (since it would conflict
> >>>>>> with application mappings). So there is a sentry page table with the normal
> >>>>>> mappings, plus a mirror of relevant sentry mappings in the kernel range
> >>>>>> (bit 63 set) in all application page tables. This mirrored copy is what
> >>>>>> executes between jumpToKernel() and jumpToUser().
> >>>>>>
> >>>>>> iret()/sysret() save RSP/RBP so that the syscall handler (sysenter())
> >>>>>> can restore them and then "return" to the call site in SwitchToUser.
> >>>>>>
> >>>>>> The full execution path looks like:
> >>>>>> kernel.runApp.execute -> kernel.Task.p.Switch (kvm.context.Switch) ->
> >>>>>> kvm.vCPU.SwitchToUser -> ring0.CPU.SwitchToUser
> >>>>>>
> >>>>>> kernel.runApp is part of the core task lifecycle state machine which
> >>>>>> handles application syscalls (eventually calling one of the handlers
> >>>>>> <https://github.com/google/gvisor/blob/master/pkg/sentry/syscalls/linux/linux64.go#L48>).
> >>>>>> The kernel package is independent of the execution platform.
> >>>>>>
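The "mirror in the kernel range" idea quoted above can be illustrated with a little address arithmetic. This is a sketch under assumptions, not gVisor's actual scheme: it assumes 48-bit canonical addressing, where the upper half starts at 0xffff800000000000, and the names `kernelBase`, `upperHalfAlias`, and `isUpperHalf` are hypothetical.

```go
package main

import "fmt"

// kernelBase is the lowest canonical upper-half address with 48-bit virtual
// addressing: bits 47 through 63 are all set. Assumed for illustration only;
// the constant gVisor really uses may differ.
const kernelBase uint64 = 0xffff800000000000

// upperHalfAlias returns the upper-half address at which a lower-half
// (userspace) sentry address could be mirrored. The mapping is only
// one-to-one for inputs below 1<<47.
func upperHalfAlias(addr uint64) uint64 {
	return addr | kernelBase
}

// isUpperHalf reports whether bit 63 of addr is set.
func isUpperHalf(addr uint64) bool {
	return addr>>63 == 1
}

func main() {
	sentryAddr := uint64(0x400000) // example userspace address
	alias := upperHalfAlias(sentryAddr)
	fmt.Printf("%#x -> %#x (upper half: %v)\n", sentryAddr, alias, isUpperHalf(alias))
}
```

Because application mappings live entirely in the lower half, an alias computed this way cannot collide with them, which is the property the quoted explanation relies on.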
AFAIU the sentry kernel runs in ring 0 of VMX non-root mode, which is the
host's ring 3/user space.

On Mon, May 21, 2018 at 03:03:06AM -0700, Qixuan Wu wrote:
> And if the sentry is running in ring 0, why is it called a user-space kernel?
>
> On Monday, May 21, 2018 at 11:04:30 AM UTC+8, Qixuan Wu wrote:
> >
> > As per your discussion, the sentry kernel is running in ring 0 of guest
> > mode, so the picture should look like this, right?
> >
> > Ring 3 User App
> > ------------------------------------------------ guest
> > Ring 0 Sentry
> >
> > ///////////////////////////////////////////////////////////////////////
> >
> > Ring 3 Sentry.kvm_platform host
> AFAIU the sentry kernel runs in ring 0 of VMX non-root mode, which is the
> host's ring 3/user space.

Most importantly, gVisor (in either ptrace or kvm mode) depends on normal system calls to the host kernel like any other user-space program, rather than on virtualized block devices, NICs, etc., as a standard virtual machine does. KVM provides access to Intel VMX/AMD SVM hardware virtualization features, which are used primarily for fast system call interception. We don't use KVM to build a full "virtual machine" with other virtual hardware like a standard VMM.

On Mon, May 21, 2018 at 03:03:06AM -0700, Qixuan Wu wrote:
> And if the sentry is running in ring 0, why is it called a user-space kernel?
>
> On Monday, May 21, 2018 at 11:04:30 AM UTC+8, Qixuan Wu wrote:
> >
> > As per your discussion, the sentry kernel is running in ring 0 of guest
> > mode, so the picture should look like this, right?
> >
> > Ring 3 User App
> > ------------------------------------------------ guest
> > Ring 0 Sentry
> >
> > ///////////////////////////////////////////////////////////////////////
> >
> > Ring 3 Sentry.kvm_platform host

Yes, this is more correct. However, it should be noted that host -> guest and guest -> host transitions only occur as necessary (app execution must occur in guest mode, host syscalls must occur in host mode). Otherwise, the sentry remains in whichever mode it happens to be in as long as possible. This means that most sentry operations (e.g., handling a read from tmpfs) may occur in either host or guest mode.
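The "only switch when necessary" rule can be sketched as a small model. The types and names here (`mode`, `op`, `run`) are hypothetical and exist only for illustration; the real sentry does not classify work this way in code. The sketch just counts how many transitions a sequence of operations forces under the rule described above.

```go
package main

import "fmt"

// mode is the CPU mode the sentry thread is currently in.
type mode int

const (
	host mode = iota
	guest
)

// op classifies an operation by the mode it requires.
// Hypothetical classification for illustration only.
type op int

const (
	appExecution op = iota // must occur in guest mode
	hostSyscall            // must occur in host mode
	sentryWork             // e.g. a tmpfs read: runs in the current mode
)

// run walks through ops in order, switching mode only when an operation
// requires it, and returns the number of transitions and the final mode.
func run(start mode, ops []op) (transitions int, final mode) {
	cur := start
	for _, o := range ops {
		switch o {
		case appExecution:
			if cur != guest {
				cur = guest
				transitions++
			}
		case hostSyscall:
			if cur != host {
				cur = host
				transitions++
			}
		case sentryWork:
			// No mode requirement: stay wherever we already are.
		}
	}
	return transitions, cur
}

func main() {
	ops := []op{sentryWork, appExecution, sentryWork, sentryWork, hostSyscall, sentryWork}
	n, final := run(host, ops)
	fmt.Println("transitions:", n, "ended in guest mode:", final == guest)
}
```

In this model the three `sentryWork` steps never cause a switch, so the same kind of work runs once in host mode and twice in guest mode, matching the observation that most sentry operations may occur in either mode.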