On Mon, Sep 06, 2021 at 07:41:48PM -0700, Christopher Clark wrote:
On Sun, Sep 5, 2021 at 7:24 PM AKASHI Takahiro via Stratos-dev <stratos-dev@op-lists.linaro.org> wrote:
Alex,
On Fri, Sep 03, 2021 at 10:28:06AM +0100, Alex Bennée wrote:
AKASHI Takahiro <takahiro.akashi@linaro.org> writes:
Alex,
On Wed, Sep 01, 2021 at 01:53:34PM +0100, Alex Bennée wrote:
Stefan Hajnoczi <stefanha@redhat.com> writes:
On Wed, Aug 04, 2021 at 12:20:01PM -0700, Stefano Stabellini wrote:
> > Could we consider the kernel internally converting IOREQ messages from
> > the Xen hypervisor to eventfd events? Would this scale with other
> > kernel hypercall interfaces?
> >
> > So any thoughts on what directions are worth experimenting with?
>
> One option we should consider is for each backend to connect to Xen via
> the IOREQ interface. We could generalize the IOREQ interface and make it
> hypervisor agnostic. The interface is really trivial and easy to add.
> The only Xen-specific part is the notification mechanism, which is an
> event channel. If we replaced the event channel with something else the
> interface would be generic. See:
> https://gitlab.com/xen-project/xen/-/blob/staging/xen/include/public/hvm/ior...
There have been experiments with something kind of similar in KVM recently (see struct ioregionfd_cmd):
https://lore.kernel.org/kvm/dad3d025bcf15ece11d9df0ff685e8ab0a4f2edd.1613828...
Reading the cover letter was very useful in showing how this provides a separate channel for signalling IO events to userspace instead of using the normal type-2 vmexit type event. I wonder how deeply tied the userspace-facing side of this is to KVM. Could it provide a common FD-type interface to IOREQ?
Why do you stick to an "FD"-type interface?
I mean most user space interfaces on POSIX start with a file descriptor and the usual read/write semantics or a series of ioctls.
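Purely as an illustration of that style of interface, here is a minimal sketch of what an fd-based IO event interface could look like from a backend's side. The device node name and the event structure below are hypothetical, not part of any existing kernel API:

/* Hypothetical sketch only: neither /dev/vmm-ioreq nor struct vmm_io_event
 * exists today; this just illustrates the "fd plus read/write semantics"
 * style of interface being discussed. */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

struct vmm_io_event {            /* one MMIO/PIO access trapped from the guest */
    uint64_t addr;               /* guest physical address */
    uint32_t size;               /* access width in bytes */
    uint32_t is_write;           /* 0 = read, 1 = write */
    uint64_t data;               /* data for writes; filled in for reads */
};

int main(void)
{
    int fd = open("/dev/vmm-ioreq", O_RDWR);    /* hypothetical device node */
    if (fd < 0) { perror("open"); return 1; }

    struct vmm_io_event ev;
    while (read(fd, &ev, sizeof(ev)) == sizeof(ev)) {   /* blocks for next event */
        if (!ev.is_write)
            ev.data = 0;         /* backend supplies the value the guest reads */
        /* writing the same record back acknowledges/completes the access */
        if (write(fd, &ev, sizeof(ev)) != sizeof(ev))
            break;
    }
    close(fd);
    return 0;
}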
Who do you assume is responsible for implementing this kind of fd semantics: the OS on the BE side, or the hypervisor itself?
I think such interfaces can only be easily implemented on type-2 hypervisors.
# In this sense, I don't think rust-vmm, as it is, can be a general solution.
As I understand IOREQ this is currently a direct communication between userspace and the hypervisor using the existing Xen message bus.
With an IOREQ server, IO event occurrences are notified to the BE via Xen's event channel, while the actual contexts of the IO events (see struct ioreq in ioreq.h) are put in a queue on a single shared memory page, which is to be assigned beforehand with the xenforeignmemory_map_resource hypervisor call.
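To make that mechanism concrete, here is a rough sketch of how a userspace BE maps the shared page and services requests, loosely modelled on existing Xen IOREQ backends such as QEMU. The IOREQ-server creation/registration step (xendevicemodel calls) is left out, and the details should be read as approximate rather than authoritative:

/* Rough sketch only: error handling, server registration and multi-vCPU
 * handling are omitted. */
#include <sys/mman.h>
#include <xenctrl.h>
#include <xenevtchn.h>
#include <xenforeignmemory.h>
#include <xen/hvm/ioreq.h>
#include <xen/memory.h>

static void ioreq_loop(domid_t domid, unsigned int ioreq_server_id,
                       evtchn_port_t remote_port)
{
    xenforeignmemory_handle *fmem = xenforeignmemory_open(NULL, 0);
    xenevtchn_handle *xce = xenevtchn_open(NULL, 0);
    void *addr = NULL;

    /* Map the shared page holding the per-vCPU struct ioreq slots */
    xenforeignmemory_resource_handle *res = xenforeignmemory_map_resource(
        fmem, domid, XENMEM_resource_ioreq_server, ioreq_server_id,
        XENMEM_resource_ioreq_server_frame_ioreq(0), 1, &addr,
        PROT_READ | PROT_WRITE, 0);
    if (!res)
        return;
    shared_iopage_t *iopage = addr;

    /* Bind the event channel Xen uses to signal new IO requests */
    int port = xenevtchn_bind_interdomain(xce, domid, remote_port);

    for (;;) {
        xenevtchn_pending(xce);                  /* block until we get a kick */
        xenevtchn_unmask(xce, port);

        ioreq_t *req = &iopage->vcpu_ioreq[0];   /* vCPU 0 only, for brevity */
        /* ... decode req->addr / req->dir / req->size and emulate ... */
        req->state = STATE_IORESP_READY;
        xenevtchn_notify(xce, port);             /* tell Xen the IO completed */
    }
}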
If we abstracted the IOREQ via the kernel interface you would probably just want to put the ioreq structure on a queue rather than expose the shared page to userspace.
Where is that queue?
My worry would be that by adding knowledge of what the underlying hypervisor is, we'd end up with excess complexity in the kernel. For one thing, we certainly wouldn't want an API version dependency on the kernel to understand which version of the Xen hypervisor it was running on.
That's exactly what virtio-proxy in my proposal[1] does; all the hypervisor-specific details of IO event handling are contained in virtio-proxy, and the virtio BE will communicate with virtio-proxy through a virtqueue (yes, virtio-proxy is seen as yet another virtio device on the BE) and will get IO event-related *RPC* callbacks, either MMIO read or write, from virtio-proxy.
See page 8 (protocol flow) and 10 (interfaces) in [1].
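For readers without [1] to hand, something like the following hypothetical layout conveys the idea; it is not the actual encoding from the proposal, just an illustration of MMIO read/write callbacks carried as RPC messages over the proxy virtqueue:

/* Purely illustrative: NOT the message format from [1]. */
#include <stdint.h>

enum proxy_rpc_op {
    PROXY_RPC_MMIO_READ  = 1,   /* guest read of a device register */
    PROXY_RPC_MMIO_WRITE = 2,   /* guest write to a device register */
};

struct proxy_rpc_msg {
    uint32_t op;        /* one of enum proxy_rpc_op */
    uint32_t size;      /* access width in bytes (1/2/4/8) */
    uint64_t offset;    /* offset within the device's MMIO region */
    uint64_t value;     /* value written, or value to return for a read */
    uint64_t req_id;    /* lets the BE pair responses with requests */
};

/* The BE would receive and answer such messages through the proxy device's
 * virtqueue descriptors exactly as for any other virtio device, so the
 * hypervisor-specific notification and memory mapping stay inside
 * virtio-proxy. */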
There are two areas of concern with the proxy approach at the moment. The first is how the bootstrap of the virtio-proxy channel happens.
As I said, from the BE's point of view, virtio-proxy would be seen as yet another virtio device by which the BE could talk to the "virtio-proxy" VM or whatever else.
This way we guarantee the BE's hypervisor-agnosticism instead of having "common" hypervisor interfaces. That is the basis of my idea.
The second is how many context switches are involved in a transaction. Of course, with all things there is a trade-off. Things involving the very tightest latency would probably opt for a bare-metal backend, which I think would imply hypervisor knowledge in the backend binary.
In the configuration phase of a virtio device, latency won't matter much. For device operations (i.e. reads/writes to block devices), if we can resolve the 'mmap' issue, as Oleksandr is proposing right now, the only issue is how efficiently we can deliver notifications to the opposite side. Right? And this is a very common problem whatever approach we take.
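For comparison, on KVM/vhost-user this cross-side notification boils down to an eventfd "kick"; a Xen transport would use an event channel instead, but the shape of the problem is the same. A minimal illustration:

/* Minimal illustration of an eventfd kick as used by KVM/vhost-user backends;
 * shown only as a point of comparison for the notification problem above. */
#include <stdint.h>
#include <stdio.h>
#include <sys/eventfd.h>
#include <unistd.h>

int main(void)
{
    int kick = eventfd(0, EFD_CLOEXEC);   /* fd shared with the other side */
    if (kick < 0) { perror("eventfd"); return 1; }

    uint64_t one = 1;
    write(kick, &one, sizeof(one));       /* notifier side: signal an event */

    uint64_t count;
    read(kick, &count, sizeof(count));    /* listener side: consume the kick(s) */
    printf("received %llu kick(s)\n", (unsigned long long)count);

    close(kick);
    return 0;
}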
Anyhow, if we do care about latency in my approach, most of the virtio-proxy-related code can be re-implemented just as a stub (or shim?) library since the protocols are defined as RPCs. In this case, however, we would lose the benefit of providing a "single binary" BE. (I know this is an arguable requirement, though.)
# Would we be better off discussing what "hypervisor-agnosticism" means?
Is there a call that you could recommend that we join to discuss this and the topics of this thread?
Stratos call? Alex should have more to say.
-Takahiro Akashi
There is definitely interest in pursuing a new interface for Argo that can be implemented in other hypervisors and enable guest binary portability between them, at least on the same hardware architecture, with VirtIO transport as a primary use case.
The notes from the Xen Summit Design Session on VirtIO Cross-Project BoF for Xen and Guest OS, which include context about the several separate approaches to VirtIO on Xen, have now been posted here: https://lists.xenproject.org/archives/html/xen-devel/2021-09/msg00472.html
Christopher