Hi Akashi, Oleksandr,
-----Original Message-----
From: Wei Chen <Wei.Chen@arm.com> (via Xen-devel)
Sent: September 2, 2021 09:31
To: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Oleksandr Tyshchenko; Stefano Stabellini; Alex Bennée; Stratos Mailing List <stratos-dev@op-lists.linaro.org>; virtio-dev@lists.oasis-open.org; Xen Devel <xen-devel@lists.xen.org>
Subject: RE: Enabling hypervisor agnosticism for VirtIO backends
Hi Akashi,
-----Original Message-----
From: AKASHI Takahiro <takahiro.akashi@linaro.org>
Sent: September 1, 2021 20:29
To: Wei Chen <Wei.Chen@arm.com>
Cc: Oleksandr Tyshchenko; Stefano Stabellini; Alex Bennée; Stratos Mailing List <stratos-dev@op-lists.linaro.org>; virtio-dev@lists.oasis-open.org; Xen Devel <xen-devel@lists.xen.org>
Subject: Re: Enabling hypervisor agnosticism for VirtIO backends
Hi Wei,
On Wed, Sep 01, 2021 at 11:12:58AM +0000, Wei Chen wrote:
Hi Akashi,
-----Original Message-----
From: AKASHI Takahiro <takahiro.akashi@linaro.org>
Sent: August 31, 2021 14:18
To: Wei Chen <Wei.Chen@arm.com>
Cc: Oleksandr Tyshchenko; Stefano Stabellini; Alex Bennée; Stratos Mailing List <stratos-dev@op-lists.linaro.org>; virtio-dev@lists.oasis-open.org; Xen Devel <xen-devel@lists.xen.org>
Subject: Re: Enabling hypervisor agnosticism for VirtIO backends
Wei,
On Thu, Aug 26, 2021 at 12:10:19PM +0000, Wei Chen wrote:
Hi Akashi,
-----Original Message-----
From: AKASHI Takahiro <takahiro.akashi@linaro.org>
Sent: August 26, 2021 17:41
To: Wei Chen <Wei.Chen@arm.com>
Cc: Oleksandr Tyshchenko; Stefano Stabellini; Alex Bennée; Stratos Mailing List <stratos-dev@op-lists.linaro.org>; virtio-dev@lists.oasis-open.org; Xen Devel <xen-devel@lists.xen.org>
Subject: Re: Enabling hypervisor agnosticism for VirtIO backends
Hi Wei,
AKASHI Takahiro wrote on Aug 17:
Hi Wei, Oleksandr,

On Mon, Aug 16, 2021 at 10:04:03AM +0000, Wei Chen wrote:
Hi All,

Thanks to Stefano for linking my kvmtool-for-Xen proposal here. This proposal is still being discussed in the Xen and KVM communities. The main work is to decouple kvmtool from KVM so that other hypervisors can reuse the virtual device implementations. In this case, we need to introduce an intermediate hypervisor layer for VMM abstraction, which is, I think, very close to Stratos' virtio hypervisor agnosticism work.

AKASHI Takahiro wrote on Aug 17:
# My proposal[1] comes from my own idea and doesn't always represent
# Linaro's view on this subject nor reflect Alex's concerns. Nevertheless,

Your idea and my proposal seem to share the same background. Both have a similar goal and currently start, at first, with Xen and are based on kvm-tool. (Actually, my work is derived from EPAM's virtio-disk, which is also based on kvm-tool.)

In particular, the abstraction of hypervisor interfaces has the same set of interfaces (for your "struct vmm_impl" and my "RPC interfaces"). This is not coincidental, as we both share the same origin as I said above. And so we will also share the same issues. One of them is a way of "sharing/mapping FE's memory". There is some trade-off between the portability and the performance impact. So we can discuss the topic here in this ML, too. (See Alex's original email, too.)

Wei Chen wrote on Aug 17:
Yes, I agree.
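To make that shared interface set concrete, here is a rough sketch of the kind of per-hypervisor operation table both approaches imply; the names are invented and this is neither the actual "struct vmm_impl" nor the virtio-proxy RPC definition:

    /* Hypothetical per-hypervisor backend operations, only to illustrate the
     * interface set being discussed (memory sharing, notification, config
     * access).  Not the real "struct vmm_impl" nor the virtio-proxy RPC. */
    #include <stdint.h>
    #include <stddef.h>

    struct hyp_backend_ops {
        /* map a frontend memory region into the backend's address space */
        void *(*map_guest_mem)(uint32_t fe_id, uint64_t gpa, size_t len);
        void  (*unmap_guest_mem)(void *va, size_t len);

        /* notifications: kick the frontend / wait for a kick from it */
        int   (*notify_fe)(uint32_t fe_id, uint32_t vq_index);
        int   (*wait_for_event)(uint32_t fe_id, uint32_t *vq_index);

        /* MMIO/config-space accesses trapped by the hypervisor */
        int   (*read_config)(uint32_t fe_id, uint64_t off, void *buf, size_t len);
        int   (*write_config)(uint32_t fe_id, uint64_t off, const void *buf, size_t len);
    };

    /* A backend would be linked (or RPC-connected) against one instance,
     * e.g. xen_ops, kvm_ops, ... selected at build or run time. */

Whether such a table is linked into the backend binary (the vmm_impl direction) or sits behind an RPC boundary in a separate VM (virtio-proxy) is exactly the portability/performance trade-off mentioned above.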
"single-
binary"
> > > solution > > > > > in which the same binary of BE vm could run on any
hypervisors.
> > > > > Somehow similar to your "proposal-#2" in [2], but in
my
solution,
all > > > > > the hypervisor-specific code would be put into another
entity
(VM), > > > > > named "virtio-proxy" and the abstracted operations are
served
via RPC. > > > > > (In this sense, BE is hypervisor-agnostic but might
have
OS
> > > dependency.) > > > > > But I know that we need discuss if this is a
requirement
even
> > > > > in Stratos project or not. (Maybe not) > > > > > > > > > > > > > Sorry, I haven't had time to finish reading your virtio-
proxy
completely > > > > (I will do it ASAP). But from your description, it seems
we
need a
> > > > 3rd VM between FE and BE? My concern is that, if my
assumption
is
right, > > > > will it increase the latency in data transport path?
Even
if
we're
> > > > using some lightweight guest like RTOS or Unikernel, > > > > > > Yes, you're right. But I'm afraid that it is a matter of
degree.
> > > As far as we execute 'mapping' operations at every fetch
of
payload,
> > > we will see latency issue (even in your case) and if we
have
some
solution > > > for it, we won't see it neither in my proposal :) > > > > > > > Oleksandr has sent a proposal to Xen mailing list to reduce
this
kind
> > of "mapping/unmapping" operations. So the latency caused by
this
behavior > > on Xen may eventually be eliminated, and Linux-KVM doesn't
have
that
problem. > > Obviously, I have not yet caught up there in the discussion. > Which patch specifically?
Can you give me the link to the discussion or patch, please?
It's a RFC discussion. We have tested this RFC patch internally. https://lists.xenproject.org/archives/html/xen-devel/2021-
07/msg01532.html
I'm afraid I'm missing something here, but I don't see why this proposed API would eliminate the 'mmap' needed to access the queued payload at every request?
This API gives the Xen device model (QEMU or kvmtool) the ability to map the whole guest RAM into the device model's address space. In that case, the device model doesn't need dynamic hypercalls to map/unmap payload memory; it can use a flat offset to access payload memory directly in its own address space, just like a KVM device model does today.
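To illustrate the flat-offset access Wei describes, a minimal sketch follows (a sketch only: guest_ram_base, guest_ram_start and gpa_to_va are invented names, and it assumes the whole of FE RAM has already been mapped contiguously into the backend's address space):

    #include <stdint.h>
    #include <stddef.h>

    /* Hypothetical state set up once at initialisation, after the whole of
     * the frontend's RAM has been mapped contiguously into our address space. */
    static uint8_t *guest_ram_base;   /* backend-local mapping of guest RAM  */
    static uint64_t guest_ram_start;  /* guest physical address of that RAM  */
    static size_t   guest_ram_size;

    /* Translate a guest physical address found in a virtqueue descriptor into
     * a pointer the backend can dereference directly - no per-request
     * map/unmap hypercall is involved. */
    static void *gpa_to_va(uint64_t gpa, size_t len)
    {
        if (gpa < guest_ram_start ||
            gpa + len > guest_ram_start + guest_ram_size)
            return NULL;               /* descriptor points outside guest RAM */
        return guest_ram_base + (gpa - guest_ram_start);
    }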
Thank you. Quickly, let me make sure one thing: this API itself doesn't do any mapping operations, right? So I suppose that the virtio BE guest is responsible, in the initialization/configuration of emulated virtio devices, to
(1) fetch the information about all the memory regions in the FE,
(2) call this API to allocate a big chunk of unused space in the BE, and
(3) create grant/foreign mappings for the FE onto this region(s).
Is this the way this API is expected to be used? Does Xen already have an interface for (1)?
They are still discussing in that thread to find a proper way to do it, because this API is common and both x86 and Arm have to be considered.

Please ignore my above reply. I hadn't seen that Oleksandr had already replied to this question. Sorry about that!
-Takahiro Akashi
Before this API, when the device model had to map the whole of guest memory, it would severely consume the physical pages of Dom-0/Dom-D.
-Takahiro Akashi
Thanks, -Takahiro Akashi
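Tying Akashi's three steps to Wei's description, a rough sketch of the one-time pre-mapping a Xen backend might do is shown below. It assumes the existing libxenforeignmemory interface; struct fe_region and premap_fe_ram are invented for the sketch, the region list would come from the toolstack/Xenstore/config, and the "unused address space" allocation that the RFC adds is not shown:

    #include <stdlib.h>
    #include <stdint.h>
    #include <sys/mman.h>
    #include <xenctrl.h>
    #include <xenforeignmemory.h>

    struct fe_region {
        uint64_t gpa;      /* start of the region in FE guest-physical space   */
        size_t   nr_pages; /* length in pages                                  */
        void    *va;       /* where it ends up in the backend's address space  */
    };

    /* Map one frontend RAM region once, at device setup time, so that the
     * per-request path never needs a map/unmap hypercall again. */
    static int premap_fe_ram(xenforeignmemory_handle *fmem, uint32_t fe_domid,
                             struct fe_region *r)
    {
        xen_pfn_t *gfns = calloc(r->nr_pages, sizeof(*gfns));
        int *errs = calloc(r->nr_pages, sizeof(*errs));
        size_t i;
        int ret = 0;

        if (!gfns || !errs) { ret = -1; goto out; }

        for (i = 0; i < r->nr_pages; i++)
            gfns[i] = (r->gpa >> XC_PAGE_SHIFT) + i;

        /* One mapping operation at initialisation time. */
        r->va = xenforeignmemory_map(fmem, fe_domid, PROT_READ | PROT_WRITE,
                                     r->nr_pages, gfns, errs);
        if (!r->va)
            ret = -1;
    out:
        free(gfns);
        free(errs);
        return ret;
    }

With all regions mapped once at start-up, the per-request path reduces to the flat-offset lookup sketched earlier.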
AKASHI Takahiro wrote on Aug 17:
Specifically speaking about kvm-tool, I have a concern about its license term; targeting different hypervisors and different OSs (which I assume includes RTOS's), the resultant library should be license permissive and GPL for kvm-tool might be an issue. Any thoughts?

Wei Chen wrote on Aug 17:
Yes. If user want to implement a FreeBSD device model, but the virtio library is GPL. Then GPL would be a problem. If we have another good candidate, I am open to it.

AKASHI Takahiro wrote on Aug 18:
I have some candidates, particularly for vq/vring, in my mind:
* Open-AMP, or
* corresponding Free-BSD code

Wei Chen wrote on Aug 18:
Interesting, I will look into them : )
Cheers,
Wei Chen

AKASHI Takahiro wrote on Aug 17:
[1] https://op-lists.linaro.org/pipermail/stratos-dev/2021-August/000548.html
[2] https://marc.info/?l=xen-devel&m=162373754705233&w=2
Oleksandr Tyshchenko wrote on Aug 14:
Hello, all.

Please see some comments below. And sorry for the possible format issues.

On Wed, Aug 11, 2021 at 9:27 AM AKASHI Takahiro <takahiro.akashi@linaro.org> wrote:
On Wed, Aug 04, 2021 at 12:20:01PM -0700, Stefano Stabellini wrote:

Stefano Stabellini wrote:
CCing people working on Xen+VirtIO and IOREQs. Not trimming the original email to let them read the full context.

My comments below are related to a potential Xen implementation, not because it is the only implementation that matters, but because it is the one I know best.

AKASHI Takahiro wrote:
Please note that my proposal (and hence the working prototype)[1] is based on Xen's virtio implementation (i.e. IOREQ) and particularly EPAM's virtio-disk application (backend server). It has been, I believe, well generalized but is still a bit biased toward this original design.

So I hope you like my approach :)

[1] https://op-lists.linaro.org/pipermail/stratos-dev/2021-August/000546.html

Let me take this opportunity to explain a bit more about my approach below.

Stefano Stabellini wrote:
Also, please see this relevant email thread:
https://marc.info/?l=xen-devel&m=162373754705233&w=2
On Wed, 4 Aug 2021, Alex Bennée wrote:
Hi,

One of the goals of Project Stratos is to enable hypervisor agnostic backends so we can enable as much re-use of code as possible and avoid repeating ourselves. This is the flip side of the front end where multiple front-end implementations are required - one per OS, assuming you don't just want Linux guests. The resultant guests are trivially movable between hypervisors modulo any abstracted paravirt type interfaces.

In my original thumb nail sketch of a solution I envisioned vhost-user daemons running in a broadly POSIX like environment. The interface to the daemon is fairly simple requiring only some mapped memory and some sort of signalling for events (on Linux this is eventfd). The idea was a stub binary would be responsible for any hypervisor specific setup and then launch a common binary to deal with the actual virtqueue requests themselves.

Since that original sketch we've seen an expansion in the sort of ways backends could be created. There is interest in encapsulating backends in RTOSes or unikernels for solutions like SCMI. The interest in Rust has prompted ideas of using the trait interface to abstract differences away as well as the idea of bare-metal Rust backends.

We have a card (STR-12) called "Hypercall Standardisation" which calls for a description of the APIs needed from the hypervisor side to support VirtIO guests and their backends. However we are some way off from that at the moment as I think we need to at least demonstrate one portable backend before we start codifying requirements. To that end I want to think about what we need for a backend to function.

Configuration
=============

In the type-2 setup this is typically fairly simple because the host system can orchestrate the various modules that make up the complete system. In the type-1 case (or even type-2 with delegated service VMs) we need some sort of mechanism to inform the backend VM about key details about the system:

  - where virt queue memory is in it's address space
  - how it's going to receive (interrupt) and trigger (kick) events
  - what (if any) resources the backend needs to connect to
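Purely as an illustration of how little information that list really is, a hypervisor-neutral bootstrap record could look like the following sketch (all names and sizes are invented; how it is delivered - Device Tree, Xenstore, command line, config file - is the open question):

    /* Hypothetical backend bootstrap description - roughly the three items
     * listed above in a hypervisor-neutral form. */
    #include <stdint.h>

    struct vq_region {
        uint64_t gpa;        /* where this virtqueue's memory sits in the BE's view */
        uint64_t size;
    };

    struct be_bootstrap_info {
        uint32_t         fe_id;        /* frontend domain/VM identifier        */
        uint32_t         nr_vqs;
        struct vq_region vq[8];        /* virtqueue memory regions             */
        uint32_t         notify_irq;   /* how the BE receives kicks (interrupt)*/
        uint32_t         kick_method;  /* how the BE raises events: 0 = hypercall,
                                          1 = eventfd, 2 = event channel, ...  */
        char             resource[64]; /* e.g. backing file or device the BE needs */
    };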
Alex Bennée continued:
Obviously you can elide over configuration issues by having static configurations and baking the assumptions into your guest images however this isn't scalable in the long term. The obvious solution seems to be extending a subset of Device Tree data to user space but perhaps there are other approaches?

Before any virtio transactions can take place the appropriate memory mappings need to be made between the FE guest and the BE guest. Currently the whole of the FE guests address space needs to be visible to whatever is serving the virtio requests. I can envision 3 approaches:

 * BE guest boots with memory already mapped

   This would entail the guest OS knowing where in it's Guest Physical Address space is already taken up and avoiding clashing. I would assume in this case you would want a standard interface to userspace to then make that address space visible to the backend daemon.
AKASHI Takahiro wrote:
Yet another way here is that we would have well known "shared memory" between VMs. I think that Jailhouse's ivshmem gives us good insights on this matter and that it can even be an alternative for a hypervisor-agnostic solution.

(Please note memory regions in ivshmem appear as a PCI device and can be mapped locally.)

I want to add this shared memory aspect to my virtio-proxy, but the resultant solution would eventually look similar to ivshmem.

Alex Bennée wrote:
 * BE guest boots with a hypervisor handle to memory

   The BE guest is then free to map the FE's memory to where it wants in the BE's guest physical address space.

Stefano Stabellini wrote:
I cannot see how this could work for Xen. There is no "handle" to give to the backend if the backend is not running in dom0. So for Xen I think the memory has to be already mapped

AKASHI Takahiro wrote:
In Xen's IOREQ solution (virtio-blk), the following information is expected to be exposed to the BE via Xenstore:
(I know that this is a tentative approach though.)
 - the start address of configuration space
 - interrupt number
 - file path for backing storage
 - read-only flag
And the BE server has to call a particular hypervisor interface to map the configuration space.

Oleksandr Tyshchenko wrote:
Yes, Xenstore was chosen as a simple way to pass configuration info to the backend running in a non-toolstack domain. I remember, there was a wish to avoid using Xenstore in the Virtio backend itself if possible, so for a non-toolstack domain, this could be done with adjusting devd (daemon that listens for devices and launches backends) to read the backend configuration from the Xenstore anyway and pass it to the backend via command line arguments.

Wei Chen wrote:
Yes, in current PoC code we're using xenstore to pass device configuration. We also designed a static device configuration parse method for Dom0less or other scenarios that don't have the Xen tools; yes, it's from the device model command line or a config file.

Oleksandr Tyshchenko wrote:
But, if ...

AKASHI Takahiro wrote:
In my approach (virtio-proxy), all those Xen (or hypervisor)-specific stuffs are contained in virtio-proxy, yet another VM, to hide all details.

Oleksandr Tyshchenko wrote:
... the solution how to overcome that is already found and proven to work then even better.

AKASHI Takahiro wrote:
# My point is that a "handle" is not mandatory for executing mapping.

Stefano Stabellini wrote:
and the mapping probably done by the toolstack (also see below.) Or we would have to invent a new Xen hypervisor interface and Xen virtual machine privileges to allow this kind of mapping.

If we run the backend in Dom0 then we have no problems of course.

AKASHI Takahiro wrote:
One of the difficulties on Xen that I found in my approach is that calling such hypervisor interfaces (registering IOREQ, mapping memory) is only allowed on BE servers themselves and so we will have to extend those interfaces. This, however, will raise some concern on security and privilege distribution as Stefan suggested.

Oleksandr Tyshchenko wrote:
We also faced policy related issues with the Virtio backend running in other than the Dom0 domain in a "dummy" xsm mode. In our target system we run the backend in a driver domain (we call it DomD) where the underlying H/W resides. We trust it, so we wrote policy rules (to be used in "flask" xsm mode) to provide it with a little bit more privileges than a simple DomU had. Now it is permitted to issue device-model, resource and memory mappings, etc. calls.
Alex Bennée wrote:
To activate the mapping will require some sort of hypercall to the hypervisor. I can see two options at this point:

  - expose the handle to userspace for daemon/helper to trigger the mapping via existing hypercall interfaces. If using a helper you would have a hypervisor specific one to avoid the daemon having to care too much about the details or push that complexity into a compile time option for the daemon which would result in different binaries although a common source base.

  - expose a new kernel ABI to abstract the hypercall differences away in the guest kernel. In this case the userspace would essentially ask for an abstract "map guest N memory to userspace ptr" and let the kernel deal with the different hypercall interfaces. This of course assumes the majority of BE guests would be Linux kernels and leaves the bare-metal/unikernel approaches to their own devices.
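As a purely hypothetical illustration of the second option, such a kernel ABI could be a small ioctl interface; nothing below exists today and all the names are invented:

    /* Hypothetical, invented ABI - a sketch of what "map guest N memory to
     * userspace ptr" could look like if a guest kernel hid the per-hypervisor
     * hypercalls behind one interface.  No such device or ioctl exists. */
    #include <stdint.h>
    #include <sys/ioctl.h>

    struct xguest_map {
        uint32_t guest_id;   /* which frontend guest                        */
        uint64_t guest_addr; /* guest physical address to map               */
        uint64_t len;        /* length of the region                        */
        uint64_t user_addr;  /* out: where it appears in our address space  */
    };

    #define XGUEST_IOC_MAP   _IOWR('G', 0x01, struct xguest_map)
    #define XGUEST_IOC_UNMAP _IOW('G', 0x02, struct xguest_map)

    /* Usage in a backend daemon (sketch):
     *   int fd = open("/dev/xguest", O_RDWR);
     *   struct xguest_map m = { .guest_id = fe, .guest_addr = gpa, .len = sz };
     *   ioctl(fd, XGUEST_IOC_MAP, &m);     // kernel picks the Xen/KVM/... path
     *   ... use (void *)m.user_addr ...
     */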
Alex Bennée continued:
Operation
=========

The core of the operation of VirtIO is fairly simple. Once the vhost-user feature negotiation is done it's a case of receiving update events and parsing the resultant virt queue for data. The vhost-user specification handles a bunch of setup before that point, mostly to detail where the virt queues are set up, FD's for memory and event communication. This is where the envisioned stub process would be responsible for getting the daemon up and ready to run. This is currently done inside a big VMM like QEMU but I suspect a modern approach would be to use the rust-vmm vhost crate. It would then either communicate with the kernel's abstracted ABI or be re-targeted as a build option for the various hypervisors.

Stefano Stabellini wrote:
One thing I mentioned before to Alex is that Xen doesn't have VMMs the way they are typically envisioned and described in other environments. Instead, Xen has IOREQ servers. Each of them connects independently to Xen via the IOREQ interface. E.g. today multiple QEMUs could be used as emulators for a single Xen VM, each of them connecting to Xen independently via the IOREQ interface.

The component responsible for starting a daemon and/or setting up shared interfaces is the toolstack: the xl command and the libxl/libxc libraries.

AKASHI Takahiro wrote:
I think that VM configuration management (or orchestration in Stratos jargon?) is a subject to debate in parallel. Otherwise, is there any good assumption to avoid it right now?

Stefano Stabellini wrote:
Oleksandr and others I CCed have been working on ways for the toolstack to create virtio backends and setup memory mappings. They might be able to provide more info on the subject. I do think we miss a way to provide the configuration to the backend and anything else that the backend might require to start doing its job.

Oleksandr Tyshchenko wrote:
Yes, some work has been done for the toolstack to handle Virtio MMIO devices in general and Virtio block devices in particular. However, it has not been upstreamed yet. Updated patches on review now:
https://lore.kernel.org/xen-devel/1621626361-29076-1-git-send-email-olekstysh@gmail.com/

There is an additional (also important) activity to improve/fix foreign memory mapping on Arm which I am also involved in. The foreign memory mapping is proposed to be used for Virtio backends (device emulators) if there is a need to run the guest OS completely unmodified. Of course, the more secure way would be to use grant memory mapping. Briefly, the main difference between them is that with foreign mapping the backend can map any guest memory it wants to map, but with grant mapping it is allowed to map only what was previously granted by the frontend.

So, there might be a problem if we want to pre-map some guest memory in advance or to cache mappings in the backend in order to improve performance (because the mapping/unmapping of guest pages every request requires a lot of back and forth to Xen + P2M updates). In a nutshell, currently, in order to map a guest page into the backend address space we need to steal a real physical page from the backend domain. So, with the said optimizations we might end up with no free memory in the backend domain (see XSA-300). And what we try to achieve is to not waste real domain memory at all by providing safe non-allocated-yet (so unused) address space for the foreign (and grant) pages to be mapped into; this enabling work implies Xen and Linux (and likely DTB bindings) changes. However, as it turned out, for this to work in a proper and safe way some prereq work needs to be done. You can find the related Xen discussion at:
https://lore.kernel.org/xen-devel/1627489110-25633-1-git-send-email-olekstysh@gmail.com/

Alex Bennée wrote:
One question is how to best handle notification and kicks. The existing vhost-user framework uses eventfd to signal the daemon (although QEMU is quite capable of simulating them when you use TCG). Xen has it's own IOREQ mechanism. However latency is an important factor and having events go through the stub would add quite a lot.

Stefano Stabellini wrote:
Yeah I think, regardless of anything else, we want the backends to connect directly to the Xen hypervisor.

AKASHI Takahiro wrote:
In my approach,
 a) BE -> FE: interrupts triggered by the BE calling a hypervisor interface via virtio-proxy
 b) FE -> BE: MMIO to config raises events (in event channels), which are converted to a callback to the BE via virtio-proxy (Xen's event channel is internally implemented by interrupts.)

I don't know what "connect directly" means here, but sending interrupts to the opposite side would be the most efficient. Ivshmem, I suppose, takes this approach by utilizing PCI's MSI-X mechanism.

Oleksandr Tyshchenko wrote:
Agree that MSI would be more efficient than SPI... At the moment, in order to notify the frontend, the backend issues a specific device-model call to query Xen to inject a corresponding SPI to the guest.
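On Xen, the backend side of such a notification path can be built on the userspace event-channel library; a minimal sketch, assuming libxenevtchn, with fe_domid and remote_port as placeholders that would come from the device configuration:

    #include <stdint.h>
    #include <xenevtchn.h>

    /* Minimal sketch of a backend notification loop on Xen; fe_domid and
     * remote_port are placeholders (from Xenstore/toolstack configuration). */
    static int run_notify_loop(uint32_t fe_domid, uint32_t remote_port)
    {
        xenevtchn_handle *xce = xenevtchn_open(NULL, 0);
        xenevtchn_port_or_error_t port, p;

        if (!xce)
            return -1;

        /* Bind our end of the interdomain event channel. */
        port = xenevtchn_bind_interdomain(xce, fe_domid, remote_port);
        if (port < 0)
            goto out;

        for (;;) {
            /* Block until the frontend kicks us (FE -> BE). */
            p = xenevtchn_pending(xce);
            if (p < 0)
                break;
            xenevtchn_unmask(xce, p);

            /* ... process the virtqueues here ... */

            /* Notify the frontend that buffers were consumed (BE -> FE). */
            xenevtchn_notify(xce, port);
        }
    out:
        xenevtchn_close(xce);
        return -1;
    }

Injecting the guest-visible interrupt (the SPI Oleksandr mentions) still happens on the hypervisor/device-model side; the sketch only covers the backend's use of the event channel.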
Alex Bennée wrote:
Could we consider the kernel internally converting IOREQ messages from the Xen hypervisor to eventfd events? Would this scale with other kernel hypercall interfaces?

So any thoughts on what directions are worth experimenting with?

Stefano Stabellini wrote:
One option we should consider is for each backend to connect to Xen via the IOREQ interface. We could generalize the IOREQ interface and make it hypervisor agnostic. The interface is really trivial and easy to add.

AKASHI Takahiro wrote:
As I said above, my proposal does the same thing that you mentioned here :) The difference is that I do call hypervisor interfaces via virtio-proxy.

Stefano Stabellini wrote:
The only Xen-specific part is the notification mechanism, which is an event channel. If we replaced the event channel with something else the interface would be generic. See:
https://gitlab.com/xen-project/xen/-/blob/staging/xen/include/public/hvm/ioreq.h#L52
in
the
kernel > > > is > > > > > a > > > > > > > > > good idea: if feels like it would be extra
complexity
and that > > > the > > > > > > > > > kernel shouldn't be involved as this is a
backend-
hypervisor > > > > > interface. > > > > > > > > > > > > > > > > Given that we may want to implement BE as a
bare-
metal
> > > application > > > > > > > > as I did on Zephyr, I don't think that the
translation
would not > > > be > > > > > > > > a big issue, especially on RTOS's. > > > > > > > > It will be some kind of abstraction layer of
interrupt
handling > > > > > > > > (or nothing but a callback mechanism). > > > > > > > > > > > > > > > > > Also, eventfd is very Linux-centric and we are
trying to
> > > design an > > > > > > > > > interface that could work well for RTOSes too.
If we
want to > > > do > > > > > > > > > something different, both OS-agnostic and
hypervisor-
agnostic, > > > > > perhaps > > > > > > > > > we could design a new interface. One that
could
be
> > > implementable > > > > > in the > > > > > > > > > Xen hypervisor itself (like IOREQ) and of
course
any
other > > > > > hypervisor > > > > > > > > > too. > > > > > > > > > > > > > > > > > > > > > > > > > > > There is also another problem. IOREQ is
probably
not
be
the > > > only > > > > > > > > > interface needed. Have a look at > > > > > > > > > https://marc.info/?l=xen-
devel&m=162373754705233&w=2.
Don't we > > > > > also need > > > > > > > > > an interface for the backend to inject
interrupts
into
the > > > > > frontend? And > > > > > > > > > if the backend requires dynamic memory
mappings
of
frontend > > > pages, > > > > > then > > > > > > > > > we would also need an interface to map/unmap
domU
pages.
> > > > > > > > > > > > > > > > My proposal document might help here; All the
interfaces
> > > required > > > > > for > > > > > > > > virtio-proxy (or hypervisor-related interfaces)
are
listed
as > > > > > > > > RPC protocols :) > > > > > > > > > > > > > > > > > These interfaces are a lot more problematic
than
IOREQ:
IOREQ > > > is > > > > > tiny > > > > > > > > > and self-contained. It is easy to add anywhere.
A
new
> > > interface to > > > > > > > > > inject interrupts or map pages is more
difficult
to
manage > > > because > > > > > it > > > > > > > > > would require changes scattered across the
various
emulators. > > > > > > > > > > > > > > > > Exactly. I have no confident yet that my
approach
will
also > > > apply > > > > > > > > to other hypervisors than Xen. > > > > > > > > Technically, yes, but whether people can accept
it
or
not
is a > > > > > different > > > > > > > > matter. > > > > > > > > > > > > > > > > Thanks, > > > > > > > > -Takahiro Akashi > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > Regards, > > > > > > > > > > > > > > Oleksandr Tyshchenko