Hi,
The following is a breakdown (as best I can figure) of the work needed to demonstrate VirtIO backends in Rust on the Xen hypervisor. It requires work across a number of projects, most notably core Rust and VirtIO enabling in the Xen project (building on the work EPAM has already done) and the start of enabling the rust-vmm crates to work with Xen.
The first demo is a fairly simple toy to exercise the direct hypercall approach for a unikernel backend. On its own it isn't super impressive but hopefully it serves as a proof of concept for the idea of having backends running in a single exception level where latency will be important.
The second is a much more ambitious bridge between Xen and vhost-user to allow for re-use of the existing vhost-user backends with the bridge acting as a proxy for what would usually be a full VMM in the type-2 hypervisor case. With that in mind the rust-vmm work is only aimed at doing the device emulation and doesn't address the larger question of how type-1 hypervisors can be integrated into the rust-vmm hypervisor model.
A quick note about the estimates. They are exceedingly rough guesses plucked out of the air and I would be grateful for feedback from the appropriate domain experts on whether I'm being overly optimistic or pessimistic.
The links to the Stratos JIRA should be at least read-accessible to all, although they contain the same information as the attached document (albeit with nicer PNG renderings of my ASCII art ;-). There is a Stratos sync-up call next Thursday:
https://calendar.google.com/event?action=TEMPLATE&tmeid=MWpidm5lbzM5Njly...
and I'm sure there will also be discussion in the various projects (hence the wide CC list). The Stratos calls are open to anyone who wants to attend and we welcome feedback from all who are interested.
So on with the work breakdown:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 STRATOS PLANNING FOR 21 TO 22

 Alex Bennée
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Table of Contents
─────────────────

1. Xen Rust Bindings ([STR-51])
.. 1. Upstream an "official" rust crate for Xen ([STR-52])
.. 2. Basic Hypervisor Interactions hypercalls ([STR-53])
.. 3. Access to XenStore service ([STR-54])
.. 4. VirtIO support hypercalls ([STR-55])
2. Xen Hypervisor Support for Stratos ([STR-56])
.. 1. Stable ABI for foreignmemory mapping to non-dom0 ([STR-57])
.. 2. Tweaks to tooling to launch VirtIO guests
3. rust-vmm support for Xen VirtIO ([STR-59])
.. 1. Make vm-memory Xen aware ([STR-60])
.. 2. Xen IO notification and IRQ injections ([STR-61])
4. Stratos Demos
.. 1. Rust based stubdomain monitor ([STR-62])
.. 2. Xen aware vhost-user master ([STR-63])
1 Xen Rust Bindings ([STR-51]) ══════════════════════════════
There exists a [placeholder repository] with the start of a set of x86_64 bindings for Xen and a very basic hello world unikernel example. This forms the basis of the initial Xen Rust work and will be available as a [xen-sys crate] via cargo.
[STR-51] https://linaro.atlassian.net/browse/STR-51
[placeholder repository] https://gitlab.com/cardoe/oxerun.git
[xen-sys crate] https://crates.io/crates/xen-sys
1.1 Upstream an "official" rust crate for Xen ([STR-52]) ────────────────────────────────────────────────────────
To start with we will want an upstream location for future work to be based upon. The intention is that the crate is independent of the version of Xen it runs on (above whatever baseline version is chosen). This will entail:
• ☐ agree with upstream the name/location for the source
• ☐ document the rules for the "stable" hypercall ABI
• ☐ establish an internal interface to elide the difference between ioctl-mediated and direct hypercalls
• ☐ ensure the crate is multi-arch and has feature parity for arm64
As such we expect the implementation to be standalone, i.e. not wrapping the existing Xen libraries for mediation. There should be a close (1-to-1) mapping between the interfaces in the crate and the eventual hypercall made to the hypervisor.
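To make the shape of this concrete, here is a rough sketch of what that internal interface could look like. All of the names here (XenCall, PrivcmdCall, DirectCall) and the plumbing details are illustrative placeholders rather than the final xen-sys API:

#+begin_src rust
use std::fs::File;
use std::io;

/// A single hypercall: op number plus up to five register arguments.
pub trait XenCall {
    fn call(&self, op: u64, args: [u64; 5]) -> io::Result<u64>;
}

/// Userspace (dom0/domU) flavour: hypercalls are mediated by the kernel
/// through the privcmd device.
pub struct PrivcmdCall {
    privcmd: File, // e.g. opened from /dev/xen/privcmd
}

impl XenCall for PrivcmdCall {
    fn call(&self, op: u64, args: [u64; 5]) -> io::Result<u64> {
        // Would package (op, args) into IOCTL_PRIVCMD_HYPERCALL on self.privcmd.
        let _ = (&self.privcmd, op, args);
        unimplemented!("privcmd ioctl plumbing")
    }
}

/// Unikernel flavour: the binary runs in its own domain and issues the
/// architecture's hypercall instruction directly (hvc #0xea1 on arm64).
pub struct DirectCall;

impl XenCall for DirectCall {
    fn call(&self, _op: u64, _args: [u64; 5]) -> io::Result<u64> {
        unimplemented!("inline asm hypercall stub")
    }
}
#+end_src

The point is that everything above this trait stays identical whether the backend is a dom0 process or a unikernel.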
Estimate: 4w (elapsed likely longer due to discussion)
[STR-52] https://linaro.atlassian.net/browse/STR-52
1.2 Basic Hypervisor Interactions hypercalls ([STR-53]) ───────────────────────────────────────────────────────
These are the bare minimum hypercalls implemented as both ioctl and direct calls. These allow for a very basic binary to:
• ☐ console_io - output IO via the Xen console
• ☐ domctl stub - basic stub for domain control (different API?)
• ☐ sysctl stub - basic stub for system control (different API?)
The idea would be that this provides enough of a hypercall interface to query the list of domains and output their status via the Xen console. There is an open question about whether the domctl and sysctl hypercalls are the way to go.
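As a strawman, the sort of binary this enables might look like the following; list_domains() and console_write() are stand-ins for whatever domctl/sysctl and console_io wrappers the crate ends up exposing, and the data layout is assumed:

#+begin_src rust
struct DomainInfo {
    domid: u16,
    running: bool,
    max_mem_kb: u64,
}

// Stand-in for iterating XEN_SYSCTL_getdomaininfolist / XEN_DOMCTL_getdomaininfo.
fn list_domains() -> Vec<DomainInfo> {
    vec![DomainInfo { domid: 0, running: true, max_mem_kb: 512 * 1024 }]
}

// Stand-in for HYPERVISOR_console_io(CONSOLEIO_write, len, buf).
fn console_write(msg: &str) {
    print!("{msg}");
}

fn main() {
    for d in list_domains() {
        console_write(&format!(
            "dom{:<4} {:>8} kB {}\n",
            d.domid,
            d.max_mem_kb,
            if d.running { "running" } else { "blocked" }
        ));
    }
}
#+end_src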
Estimate: 6w
[STR-53] https://linaro.atlassian.net/browse/STR-53
1.3 Access to XenStore service ([STR-54]) ───────────────────────────────────────────────
This is a shared configuration storage space accessed via either Unix sockets (on dom0) or via the Xenbus. This is used to access configuration information for the domain.
Is this needed for a backend though? Can everything just be passed directly on the command line?
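For illustration, a minimal sketch of talking to xenstored over its Unix socket from dom0 might look like this. The socket path and XS_READ opcode follow the xenstore conventions from xen/io/xs_wire.h but should be treated as assumptions to verify, and error replies are ignored:

#+begin_src rust
use std::io::{Read, Write};
use std::os::unix::net::UnixStream;

const XENSTORED_SOCKET: &str = "/var/run/xenstored/socket";
const XS_READ: u32 = 2; // from enum xsd_sockmsg_type -- verify against xs_wire.h

fn xs_read(path: &str) -> std::io::Result<Vec<u8>> {
    let mut sock = UnixStream::connect(XENSTORED_SOCKET)?;

    // Each request is a xsd_sockmsg header (type, req_id, tx_id, len)
    // followed by the NUL-terminated path.
    let payload = [path.as_bytes(), &[0u8]].concat();
    let mut msg = Vec::new();
    msg.extend_from_slice(&XS_READ.to_le_bytes());
    msg.extend_from_slice(&0u32.to_le_bytes()); // req_id
    msg.extend_from_slice(&0u32.to_le_bytes()); // tx_id
    msg.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    msg.extend_from_slice(&payload);
    sock.write_all(&msg)?;

    // Read the reply header, then `len` bytes of value.
    let mut hdr = [0u8; 16];
    sock.read_exact(&mut hdr)?;
    let len = u32::from_le_bytes(hdr[12..16].try_into().unwrap()) as usize;
    let mut value = vec![0u8; len];
    sock.read_exact(&mut value)?;
    Ok(value)
}

fn main() -> std::io::Result<()> {
    // e.g. read the current domain's name (relative xenstore path).
    let name = xs_read("name")?;
    println!("{}", String::from_utf8_lossy(&name));
    Ok(())
}
#+end_src

The Xenbus (shared ring) path for non-dom0 domains would sit behind the same interface.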
Estimate: 4w
[STR-54] https://linaro.atlassian.net/browse/STR-54
1.4 VirtIO support hypercalls ([STR-55]) ────────────────────────────────────────
These are the hypercalls that need to be implemented to support a VirtIO backend. This includes the ability to map another guest's memory into the current domain's address space, register to receive IOREQ events when the guest knocks on the doorbell, and inject kicks into the guest. The hypercalls we need to support would be:
• ☐ dmop - device model ops (*_ioreq_server, set_irq, nr_vcpus)
• ☐ foreignmemory - map and unmap guest memory
The DMOP space is larger than what we need for an IOREQ backend, so I've based it just on what arch/arm/dm.c exports, which is the subset introduced for EPAM's virtio work.
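The overall flow a backend would drive with these hypercalls is roughly as below; every function named in the comments (create_ioreq_server, map_io_range_to_ioreq_server, foreign_map, set_irq_level, ...) is a placeholder for whatever the xen-sys wrappers end up being called:

#+begin_src rust
struct IoreqServer {
    id: u16,
}

fn setup_virtio_backend(domid: u16, mmio_base: u64, mmio_len: u64) {
    // 1. dmop: create an IOREQ server for the guest and register the
    //    virtio-mmio window with it so accesses trap to us
    //    (create_ioreq_server / map_io_range_to_ioreq_server).
    let server = IoreqServer { id: 0 };

    // 2. foreignmemory: map the IOREQ shared pages and the guest RAM backing
    //    the virtqueues into our address space.
    // let ioreq_pages = foreign_map(domid, ioreq_gfns);
    // let guest_ram = foreign_map(domid, virtqueue_gfns);

    // 3. Event loop: wait for an IOREQ (the guest "knocking on the doorbell"),
    //    emulate the access, then kick the guest back via set_irq_level.
    loop {
        // let req = wait_for_ioreq(&server);
        // handle_mmio(&req, &guest_ram);
        // set_irq_level(domid, virtio_spi, 1);
        break; // placeholder so this sketch terminates
    }

    let _ = (domid, mmio_base, mmio_len, server.id);
}

fn main() {
    // Illustrative values only.
    setup_virtio_backend(1, 0x0200_0000, 0x200);
}
#+end_src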
Estimate: 12w
[STR-55] https://linaro.atlassian.net/browse/STR-55
2 Xen Hypervisor Support for Stratos ([STR-56]) ═══════════════════════════════════════════════
These are the tasks needed to support the various different deployments of Stratos components on Xen.
[STR-56] https://linaro.atlassian.net/browse/STR-56
2.1 Stable ABI for foreignmemory mapping to non-dom0 ([STR-57]) ───────────────────────────────────────────────────────────────
Currently the foreign memory mapping support only works for dom0 due to reference counting issues. If we are to support backends running in their own domains this will need to get fixed.
Estimate: 8w
[STR-57] https://linaro.atlassian.net/browse/STR-57
2.2 Tweaks to tooling to launch VirtIO guests ─────────────────────────────────────────────
There might not be too much to do here. The EPAM work already did something similar for their PoC for virtio-block. Essentially we need to ensure:
• ☐ DT bindings are passed to the guest for virtio-mmio device discovery
• ☐ Our rust backend can be instantiated before the domU is launched
This currently assumes the tools and the backend are running in dom0.
Estimate: 4w
3 rust-vmm support for Xen VirtIO ([STR-59]) ════════════════════════════════════════════
This encompasses the tasks required to get a vhost-user server up and running while interfacing to the Xen hypervisor. This will require the xen-sys crate for the actual interface to the hypervisor.
We need to work out how a Xen configuration option would be passed to the various bits of rust-vmm when something is being built.
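One obvious shape for this is a cargo feature (say "xen", declared in the relevant Cargo.toml) that swaps the hypervisor-facing module at build time; the module and function names below are purely illustrative:

#+begin_src rust
#[cfg(feature = "xen")]
mod hypervisor {
    pub fn map_guest_memory() {
        // Xen path: privcmd/foreignmemory mapping of the guest.
    }
}

#[cfg(not(feature = "xen"))]
mod hypervisor {
    pub fn map_guest_memory() {
        // Default path: plain mmap of guest RAM, as used with KVM today.
    }
}

fn main() {
    hypervisor::map_guest_memory();
}
#+end_src

Whether this is done per-crate or as a single feature propagated through the rust-vmm workspace is part of what needs working out.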
[STR-59] https://linaro.atlassian.net/browse/STR-59
3.1 Make vm-memory Xen aware ([STR-60]) ───────────────────────────────────────
The vm-memory crate is the root crate for abstracting access to the guest's memory. It currently has multiple configuration builds to handle differences between mmap on Windows and Unix. Although mmap isn't directly exposed, the public interfaces support an mmap-like interface. We would need to:
• ☐ work out how to expose foreign memory via the vm-memory mechanism
I'm not sure if this just means implementing the GuestMemory trait for a GuestMemoryXen or if we need to present an mmap-like interface.
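As a sketch of the first option, a GuestMemoryXen could hold a set of foreign-mapped regions and translate guest physical addresses to host pointers; whether that then implements vm-memory's GuestMemory trait directly or hides behind an mmap-like region is exactly the open question. All names below are hypothetical:

#+begin_src rust
/// One contiguous chunk of another guest's RAM mapped via foreignmemory.
struct ForeignRegion {
    guest_base: u64,   // guest physical address of the region
    len: usize,
    host_ptr: *mut u8, // returned by the foreignmemory map call
}

/// Candidate backing store for a Xen-aware vm-memory implementation.
struct GuestMemoryXen {
    regions: Vec<ForeignRegion>,
}

impl GuestMemoryXen {
    /// Translate a guest physical address to a host pointer, if mapped.
    fn get_host_address(&self, gpa: u64) -> Option<*mut u8> {
        self.regions.iter().find_map(|r| {
            let offset = gpa.checked_sub(r.guest_base)?;
            if (offset as usize) < r.len {
                Some(unsafe { r.host_ptr.add(offset as usize) })
            } else {
                None
            }
        })
    }
}
#+end_src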
Estimate: 8w
[STR-60] https://linaro.atlassian.net/browse/STR-60
3.2 Xen IO notification and IRQ injections ([STR-61]) ─────────────────────────────────────────────────────
The KVM world provides ioeventfd (notifications) and irqfd (injection) to signal asynchronously between the guest and the backend. As far as I can tell this is currently handled inside the various VMMs, which assume a KVM backend.
While the vhost-user slave code doesn't see the register_ioevent/register_irqfd events, it does deal with EventFDs throughout the code. Perhaps the best approach here would be to create an IOREQ crate that can create EventFD descriptors which can then be passed to the slaves to use for notification and injection.
Otherwise there might be an argument for a new crate that can encapsulate this behaviour for both KVM/ioeventfd and Xen/IOREQ setups?
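A sketch of that bridge, using vmm-sys-util's EventFd on the slave-facing side and placeholder stubs for the Xen-facing side (wait_for_guest_kick / inject_guest_irq are assumed names, not an existing API):

#+begin_src rust
use std::io;
use vmm_sys_util::eventfd::EventFd;

/// Placeholder for blocking on the next IOREQ event from Xen.
fn wait_for_guest_kick() -> io::Result<()> {
    // Real version: wait on the IOREQ server's event channel / shared page.
    Ok(())
}

/// Placeholder for injecting the virtio interrupt into the guest via dmop.
fn inject_guest_irq() -> io::Result<()> {
    Ok(())
}

fn run_bridge(kick_to_slave: &EventFd, call_from_slave: &EventFd) -> io::Result<()> {
    loop {
        // Guest -> backend: an IOREQ "doorbell" event becomes an EventFd write
        // that the unmodified vhost-user slave is already polling on.
        wait_for_guest_kick()?;
        kick_to_slave.write(1)?;

        // Backend -> guest: the slave signals used buffers on its call
        // EventFd; turn that into an interrupt injection.
        call_from_slave.read()?;
        inject_guest_irq()?;
    }
}

fn main() -> io::Result<()> {
    let kick = EventFd::new(0)?;
    let call = EventFd::new(0)?;
    run_bridge(&kick, &call)
}
#+end_src

The nice property is that the slave side stays identical between KVM and Xen deployments.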
Estimate: 8w?
[STR-61] https://linaro.atlassian.net/browse/STR-61
4 Stratos Demos ═══════════════
These tasks cover the creation of demos that bring together all the previous bits of work to demonstrate a new area of capability that has been opened up by Stratos work.
4.1 Rust based stubdomain monitor ([STR-62]) ────────────────────────────────────────────
This is a basic demo that is a proof of concept for a unikernel style backend written in pure Rust. This work would be a useful precursor for things such as the RTOS Dom0 on a safety island ([STR-11]) or as a carrier for the virtio-scmi backend.
The monitor program will periodically poll the state of the other domains and echo their status to the Xen console.
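A skeleton of that loop might look like the following, reusing the kind of list_domains()/console_write() stand-ins sketched in section 1.2 (these are still placeholders for the real xen-sys calls):

#+begin_src rust
use std::collections::HashMap;
use std::thread::sleep;
use std::time::Duration;

// Placeholder: would come from the domctl/sysctl stubs in xen-sys.
fn list_domains() -> Vec<(u16, &'static str)> {
    vec![(0, "running")]
}

// Placeholder: would be HYPERVISOR_console_io in the unikernel build.
fn console_write(msg: &str) {
    print!("{msg}");
}

fn main() {
    let mut last: HashMap<u16, &str> = HashMap::new();
    loop {
        // Report only when a domain's state changes, to keep the console quiet.
        for (domid, state) in list_domains() {
            if last.insert(domid, state) != Some(state) {
                console_write(&format!("dom{domid}: {state}\n"));
            }
        }
        sleep(Duration::from_secs(1));
    }
}
#+end_src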
Estimate: 4w
#+name: stub-domain-example
[Diagram: stub_domain_example.png — Dom0 (xl tools and QEMU in EL0 userspace over the Dom0 kernel at EL1), a GuestOS DomU, and the Rust Monitor running in its own DomStub at EL1, all hosted on the Xen Hypervisor at EL2]
[STR-62] https://linaro.atlassian.net/browse/STR-62
[STR-11] https://linaro.atlassian.net/browse/STR-11
4.2 Xen aware vhost-user master ([STR-63]) ──────────────────────────────────────────
Usually the master side of a vhost-user system is embedded directly in the VMM itself. However in a Xen deployment there is no overarching VMM, but rather a series of utility programs that query the hypervisor directly. The Xen tooling is also responsible for setting up any support processes that are responsible for emulating HW for the guest.
The task aims to bridge the gap between Xen's normal HW emulation path (ioreq) and VirtIO's userspace device emulation (vhost-user). The process would be started with some information on where the virtio-mmio address space is and what the slave binary will be. It will then (see the code sketch after this list):
• map the guest into Dom0 userspace and attach to a MemFD
• register the appropriate memory regions as IOREQ regions with Xen
• create EventFD channels for the virtio kick notifications (one each way)
• spawn the vhost-user slave process and mediate the notifications and kicks between the slave and Xen itself
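Putting those steps together, the master process might be structured roughly like this; everything Xen-specific is left as a commented placeholder, the binary name and socket path are illustrative, and the vhost-user handshake itself would come from the rust-vmm vhost crates rather than being hand-rolled:

#+begin_src rust
use std::os::unix::net::UnixListener;
use std::process::Command;

fn main() -> std::io::Result<()> {
    let socket_path = "/tmp/vhost-user-blk.sock"; // illustrative
    let listener = UnixListener::bind(socket_path)?;

    // 1. Map the guest into Dom0 userspace; this mapping is what gets handed
    //    to the slave as the vhost-user memory table (SET_MEM_TABLE).
    // let guest_mem = foreign_map(domid, guest_ram_gfns);

    // 2. Register the virtio-mmio window as an IOREQ region with Xen so the
    //    guest's config-space and doorbell accesses trap to this process.
    // register_ioreq_region(domid, virtio_mmio_base, virtio_mmio_size);

    // 3. Spawn the existing vhost-user slave pointing at our socket.
    let mut slave = Command::new("vhost-user-blk") // illustrative binary name
        .arg("--socket-path")
        .arg(socket_path)
        .spawn()?;

    // 4. Accept the slave's connection, run the vhost-user handshake passing
    //    the kick/call EventFds, then loop mediating IOREQ events and IRQ
    //    injections as in section 3.2.
    let (_conn, _) = listener.accept()?;
    // run_vhost_user_master(_conn, guest_mem, ...);

    slave.wait()?;
    Ok(())
}
#+end_src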
#+name: xen-vhost-user-master
[Diagram: xen_vhost_user_master.png — in Dom0 userspace an existing vhost-user slave speaks the vhost-user protocol to the new Rust vhost-user master; the master reaches the DomU guest (running a virtio-dev frontend) through the host kernel via privcmd ioctls and iofd/irqfd EventFDs, while the Xen Hypervisor at EL2 routes the hypercalls, IOREQ traps and IRQ injections between the two domains]
[STR-63] https://linaro.atlassian.net/browse/STR-63
On Fri, Sep 24, 2021 at 05:02:46PM +0100, Alex Bennée wrote:
Hi,
Hi,
2.1 Stable ABI for foreignmemory mapping to non-dom0 ([STR-57]) ───────────────────────────────────────────────────────────────
Currently the foreign memory mapping support only works for dom0 due to reference counting issues. If we are to support backends running in their own domains this will need to get fixed.
Estimate: 8w
I'm pretty sure it was discussed before, but I can't find the relevant (part of the) thread right now: does your model assume the backend (running outside of dom0) will gain the ability to map (or access in another way) _arbitrary_ memory pages of a frontend domain? Or worse: any domain? That is a significant regression in terms of the security model Xen provides. It would give the backend domain _a lot more_ control over the system than it normally has with Xen PV drivers - negating a significant part of the security benefits of using driver domains.
So, does the above require the frontend agreeing (explicitly or implicitly) to specific pages being accessed by the backend? There were several approaches to that discussed, including using grant tables (as PV drivers do), vIOMMU(?), or even a drastically different model with no shared memory at all (Argo). Can you clarify which (if any) approach your attempt at VirtIO on Xen will use?
A more general idea: can we collect info on various VirtIO on Xen approaches (since there is more than one) in a single place, including:
- key characteristics, differences
- who is involved
- status
- links to relevant threads, maybe
I'd propose to revive https://wiki.xenproject.org/wiki/Virtio_On_Xen
Marek Marczykowski-Górecki marmarek@invisiblethingslab.com writes:
On Fri, Sep 24, 2021 at 05:02:46PM +0100, Alex Bennée wrote:
Hi,
Hi,
2.1 Stable ABI for foreignmemory mapping to non-dom0 ([STR-57]) ───────────────────────────────────────────────────────────────
Currently the foreign memory mapping support only works for dom0 due to reference counting issues. If we are to support backends running in their own domains this will need to get fixed.
Estimate: 8w
I'm pretty sure it was discussed before, but I can't find the relevant (part of the) thread right now: does your model assume the backend (running outside of dom0) will gain the ability to map (or access in another way) _arbitrary_ memory pages of a frontend domain? Or worse: any domain?
The aim is for some DomUs to host backends for other DomUs instead of all backends being in Dom0. Those backend DomUs would have to be considered trusted because, as you say, the default memory model of VirtIO is to have full access to the frontend domain's memory map.
That is a significant regression in terms of the security model Xen provides. It would give the backend domain _a lot more_ control over the system than it normally has with Xen PV drivers - negating a significant part of the security benefits of using driver domains.
It's part of the continual trade-off between security and speed. For things like block and network backends there is a penalty if data has to be bounce buffered before it ends up in the guest address space.
So, does the above require the frontend agreeing (explicitly or implicitly) to specific pages being accessed by the backend? There were several approaches to that discussed, including using grant tables (as PV drivers do), vIOMMU(?), or even a drastically different model with no shared memory at all (Argo). Can you clarify which (if any) approach your attempt at VirtIO on Xen will use?
There are separate strands of work in Stratos looking at how we could further secure VirtIO for architectures with distributed backends (e.g. you may accept the block backend having access to the whole of memory but an i2c multiplexer has different performance characteristics).
Currently the only thing we have prototyped is "fat virtqueues", which Arnd has been working on. Here the only actual shared memory required is the VirtIO config space and the relevant virtqueues.
Other approaches have been discussed, including using the virtio-iommu to selectively make areas available to the backend, or using memory zoning so that, for example, network buffers are only allocated in a certain region of guest physical memory that is shared with the backend.
A more general idea: can we collect info on various VirtIO on Xen approaches (since there is more than one) in a single place, including:
- key characteristics, differences
- who is involved
- status
- links to relevant threads, maybe
I'd propose to revive https://wiki.xenproject.org/wiki/Virtio_On_Xen
From the Stratos point of view Xen is a useful proving ground for general VirtIO experimentation due to being both a type-1 hypervisor and open source. Our ultimate aim is to have a high degree of code sharing for backends regardless of the hypervisor choice, so a guest can use a VirtIO device model without having to be locked into KVM.
If your technology choice is already fixed with a Xen hypervisor and portability isn't a concern, you might well just stick to the existing well-tested Xen PV interfaces.
On Mon, Sep 27, 2021 at 3:06 AM Alex Bennée via Stratos-dev < stratos-dev@op-lists.linaro.org> wrote:
Marek Marczykowski-Górecki marmarek@invisiblethingslab.com writes:
On Fri, Sep 24, 2021 at 05:02:46PM +0100, Alex Bennée wrote:
Hi,
Hi,
2.1 Stable ABI for foreignmemory mapping to non-dom0 ([STR-57]) ───────────────────────────────────────────────────────────────
Currently the foreign memory mapping support only works for dom0 due to reference counting issues. If we are to support backends running in their own domains this will need to get fixed.
Estimate: 8w
I'm pretty sure it was discussed before, but I can't find the relevant (part of the) thread right now: does your model assume the backend (running outside of dom0) will gain the ability to map (or access in another way) _arbitrary_ memory pages of a frontend domain? Or worse: any domain?
The aim is for some DomUs to host backends for other DomUs instead of all backends being in Dom0. Those backend DomUs would have to be considered trusted because, as you say, the default memory model of VirtIO is to have full access to the frontend domain's memory map.
I share Marek's concern. I believe that there are Xen-based systems that will want to run guests using VirtIO devices without extending this level of trust to the backend domains.
That is a significant regression in terms of the security model Xen provides. It would give the backend domain _a lot more_ control over the system than it normally has with Xen PV drivers - negating a significant part of the security benefits of using driver domains.
It's part of the continual trade off between security and speed. For things like block and network backends there is a penalty if data has to be bounce buffered before it ends up in the guest address space.
I think we have significant flexibility in being able to modify several layers of the stack here to make this efficient, and it would be beneficial to avoid bounce buffering if possible without sacrificing the ability to enforce isolation. I wonder if there's a viable approach possible with some implementation of a virtual IOMMU (which enforces access control) that would allow a backend to commission I/O on a physical device on behalf of a guest, where the data buffers do not need to be mapped into the backend and so avoid the need for a bounce?
So, does the above require the frontend agreeing (explicitly or implicitly) to specific pages being accessed by the backend? There were several approaches to that discussed, including using grant tables (as PV drivers do), vIOMMU(?), or even a drastically different model with no shared memory at all (Argo). Can you clarify which (if any) approach your attempt at VirtIO on Xen will use?
There are separate strands of work in Stratos looking at how we could further secure VirtIO for architectures with distributed backends (e.g. you may accept the block backend having access to the whole of memory but an i2c multiplexer has different performance characteristics).
Currently the only thing we have prototyped is "fat virtqueues" which Arnd has been working on. Here the only actual shared memory required is the VirtIO config space and the relevant virt queues.
I think the "fat virtqueues" work is a positive path for investigation and I don't think shared memory between front and backend is hard requirement for those to function: a VirtIO-Argo transport driver would be able to operate with them without shared memory.
Other approaches have been discussed including using the virtio-iommu to selectively make areas available to the backend or use memory zoning so for example network buffers are only allocated in a certain region of guest physical memory that is shared with the backend.
A more general idea: can we collect info on various VirtIO on Xen approaches (since there is more than one) in a single place, including:
- key characteristics, differences
- who is involved
- status
- links to relevant threads, maybe
I'd propose to revive https://wiki.xenproject.org/wiki/Virtio_On_Xen
Thanks for the reminder, Marek -- I've just overhauled that page to give an overview of the several approaches in the Xen community to enabling VirtIO on Xen, and have included a first pass at including the content you describe. I'm happy to be involved in improving it further.
From the Stratos point of view Xen is a useful proving ground for general VirtIO experimentation due to being both a type-1 hypervisor and open source. Our ultimate aim is to have a high degree of code sharing for backends regardless of the hypervisor choice, so a guest can use a VirtIO device model without having to be locked into KVM.
Thanks, Alex - this context is useful.
If your technology choice is already fixed with a Xen hypervisor and portability isn't a concern you might well just stick to the existing well tested Xen PV interfaces.
I wouldn't quite agree; there are additional reasons beyond portability to be looking at other options than the traditional Xen PV interfaces: e.g. an Argo-based interdomain transport for PV devices will enable fine-grained enforcement of Mandatory Access Control over the frontend/backend communication, and will not depend on XenStore, which is advantageous for Hyperlaunch / dom0less Xen deployment configurations.
thanks,
Christopher
On Mon, 27 Sep 2021, Christopher Clark wrote:
I share Marek's concern. I believe that there are Xen-based systems that will want to run guests using VirtIO devices without extending this level of trust to the backend domains.
From a safety perspective, it would be challenging to deploy a system with privileged backends; it would be a lot easier if the backend were unprivileged.
This is one of those times where safety and security requirements are actually aligned.
On Tue, Sep 28, 2021 at 9:26 AM Stefano Stabellini sstabellini@kernel.org wrote:
Hi Stefano, all
[Sorry for the possible format issues]
From a safety perspective, it would be challenging to deploy a system with privileged backends; it would be a lot easier if the backend were unprivileged.
This is one of those times where safety and security requirements are actually aligned.
Well, the foreign memory mapping has one advantage in the context of the VirtIO use-case, which is that the VirtIO infrastructure in the guest doesn't require any modifications to run on top of Xen. The only issue with foreign memory here is that guest memory is actually mapped without its agreement, which doesn't perfectly fit into the security model (although there is one more issue with XSA-300, but I think it will go away sooner or later; at least there are some attempts to eliminate it). While the ability to map any part of guest memory is not an issue for the backend running in Dom0 (which we usually trust), this will certainly violate the Xen security model if we want to run it in another domain, so I completely agree with the existing concern.
It was discussed before [1], but I couldn't find any decisions regarding that. As I understand it, one of the possible ideas is to have some entity in Xen (PV IOMMU/virtio-iommu/whatever) that works in protection mode, so it denies all foreign mapping requests from the backend running in DomU by default and only allows requests for mappings which were *implicitly* granted by the guest beforehand. For example, Xen could be informed which MMIOs hold the queue PFN and notify registers (as it traps the accesses to these registers anyway) and could theoretically parse the frontend request and retrieve descriptors to make a decision about which GFNs are actually *allowed*.
I can't say for sure (sorry, not familiar enough with the topic), but by implementing the virtio-iommu device in Xen we could probably avoid guest modifications altogether. Of course, for this to work the VirtIO infrastructure in the guest should use the DMA API as mentioned in [1].
Would the “restricted foreign mapping” solution retain the Xen security model and be accepted by the Xen community? I wonder, has someone already looked in this direction? Are there any pitfalls here, or is this even feasible?
[1] https://lore.kernel.org/xen-devel/464e91ec-2b53-2338-43c7-a018087fc7f6@arm.c...
On Tue, 28 Sep 2021, Oleksandr Tyshchenko wrote:
On Tue, Sep 28, 2021 at 9:26 AM Stefano Stabellini sstabellini@kernel.org wrote:
Hi Stefano, all
[Sorry for the possible format issues]
Well, the foreign memory mapping has one advantage in the context of the VirtIO use-case, which is that the VirtIO infrastructure in the guest doesn't require any modifications to run on top of Xen. The only issue with foreign memory here is that guest memory is actually mapped without its agreement, which doesn't perfectly fit into the security model (although there is one more issue with XSA-300, but I think it will go away sooner or later; at least there are some attempts to eliminate it). While the ability to map any part of guest memory is not an issue for the backend running in Dom0 (which we usually trust), this will certainly violate the Xen security model if we want to run it in another domain, so I completely agree with the existing concern.
Yep, that's what I was referring to.
It was discussed before [1], but I couldn't find any decisions regarding that. As I understand it, one of the possible ideas is to have some entity in Xen (PV IOMMU/virtio-iommu/whatever) that works in protection mode, so it denies all foreign mapping requests from the backend running in DomU by default and only allows requests for mappings which were *implicitly* granted by the guest beforehand. For example, Xen could be informed which MMIOs hold the queue PFN and notify registers (as it traps the accesses to these registers anyway) and could theoretically parse the frontend request and retrieve descriptors to make a decision about which GFNs are actually *allowed*.
I can't say for sure (sorry not familiar enough with the topic), but implementing the virtio-iommu device in Xen we could probably avoid Guest modifications at all. Of course, for this to work the Virtio infrastructure in Guest should use DMA API as mentioned in [1].
Would the “restricted foreign mapping” solution retain the Xen security model and be accepted by the Xen community? I wonder, has someone already looked in this direction, are there any pitfalls here or is this even feasible?
[1] https://lore.kernel.org/xen-devel/464e91ec-2b53-2338-43c7-a018087fc7f6@arm.c...
The discussion that went further is actually one based on the idea that there is a pre-shared memory area and the frontend always passes addresses from it. For ease of implementation, the pre-shared area is the virtqueue itself so this approach has been called "fat virtqueue". But it requires guest modifications and it probably results in additional memory copies.
I am not sure if the approach you mentioned could be implemented completely without frontend changes. It looks like Xen would have to learn how to inspect virtqueues in order to verify implicit grants without frontend changes. With or without guest modifications, I am not aware of anyone doing research and development on this approach.
On Sat, Oct 2, 2021 at 2:58 AM Stefano Stabellini sstabellini@kernel.org wrote:
Hi Stefano, all
[Sorry for the possible format issues] [I have CCed Julien]
The discussion that went further is actually one based on the idea that there is a pre-shared memory area and the frontend always passes addresses from it. For ease of implementation, the pre-shared area is the virtqueue itself so this approach has been called "fat virtqueue". But it requires guest modifications and it probably results in additional memory copies.
I got it. Although we would need to map that pre-shared area anyway (I presume it could be done at once during initialization), I think it is much better than mapping arbitrary pages at runtime. If there is a way for Xen to know the pre-shared area location in advance it will be able to allow mapping this region only and deny other attempts.
I am not sure if the approach you mentioned could be implemented completely without frontend changes. It looks like Xen would have to learn how to inspect virtqueues in order to verify implicit grants without frontend changes.
I looked through the virtio-iommu specification and corresponding Linux driver but I am sure I don't see all the challenges and pitfalls. Having a limited knowledge of IOMMU infrastructure in Linux, below is just my guess, which might be wrong.
1. I think, if we want to avoid frontend changes, the backend in Xen would need to fully conform to the specification. I am afraid that besides just inspecting virtqueues, the backend needs to properly and completely emulate the virtio device, handle shadow page tables, etc. Otherwise we might break the guest. I expect a huge amount of work to implement this properly.
2. Also, if I got things correctly, it looks like when enabling virtio-iommu, all addresses passed in requests to the virtio devices behind the virtio-iommu will be in guest virtual address space (IOVA). So we would need to find a way for userspace (if the backend is an IOREQ server) to translate them to guest physical addresses (IPA) via these shadow page tables in the backend before mapping them via foreign memory map calls. So I expect Xen, toolstack and Linux privcmd driver changes and additional complexity, taking into account how the data structures could be accessed (data structures contiguous in IOVA could be discontiguous in IPA, indirect table descriptors, etc). I am wondering, would it be possible to have an identity IOMMU mapping (IOVA == GPA) on the guest side but without bypassing the IOMMU, as we need the virtio-iommu frontend to send map/unmap requests? Can we control this behaviour somehow? I think this would simplify things.
3. Also, we would probably want to have a single virtio-iommu device instance per guest, so all virtio devices which belong to this guest will share the IOMMU mapping for optimization purposes. For this to work all virtio devices inside a guest should be attached to the same IOMMU domain. Probably, we could control that, but I am not 100% sure.
With or without guest modifications, I am not aware of anyone doing research and development on this approach.
On Sat, 2 Oct 2021, Oleksandr Tyshchenko wrote:
I got it. Although we would need to map that pre-shared area anyway (I presume it could be done at once during initialization), I think it is much better than mapping arbitrary pages at runtime.
Yeah that's the idea
If there is a way for Xen to know the pre-shared area location in advance it will be able to allow mapping this region only and deny other attempts.
No, but there are patches (not yet upstream) to introduce a way to pre-share memory regions between VMs using xl: https://github.com/Xilinx/xen/commits/xilinx/release-2021.1?after=4bd2da58b5...
So I think it would probably be the other way around: xen/libxl advertises on device tree (or ACPI) the presence of the pre-shared regions to both domains. Then frontend and backend would start using it.
I am not sure if the approach you mentioned could be implemented completely without frontend changes. It looks like Xen would have to learn how to inspect virtqueues in order to verify implicit grants without frontend changes.
I looked through the virtio-iommu specification and corresponding Linux driver but I am sure I don't see all the challenges and pitfalls. Having a limited knowledge of IOMMU infrastructure in Linux, below is just my guess, which might be wrong.
- I think, if we want to avoid frontend changes the backend in Xen would need to fully conform to the specification, I am afraid that
besides just inspecting virtqueues, the backend needs to properly and completely emulate the virtio device, handle shadow page tables, etc. Otherwise we might break the guest. I expect a huge amount of work to implement this properly.
Yeah, I think we would want to stay away from shadow pagetables unless we are really forced to go there.
- Also, if I got the things correctly, it looks like when enabling virtio-iommu, all addresses passed in requests to the virtio devices
behind the virtio-iommu will be in guest virtual address space (IOVA). So we would need to find a way for userspace (if the backend is IOREQ server) to translate them to guest physical addresses (IPA) via these shadow page tables in the backend in front of mapping them via foreign memory map calls. So I expect Xen, toolstack and Linux privcmd driver changes and additional complexity taking into account how the data structures could be accessed (data structures being continuously in IOVA, could be discontinuous in IPA, indirect table descriptors, etc). I am wondering, would it be possible to have identity IOMMU mapping (IOVA == GPA) at the guest side but without bypassing an IOMMU, as we need the virtio-iommu frontend to send map/unmap requests, can we control this behaviour somehow? I think this would simplify things.
None of the above looks easy. I think you are right that we would need IOVA == GPA to make the implementation feasible and with decent performance. But if we need a spec change, then I think Juergen's proposal of introducing a new transport that uses grant table references instead of GPAs is worth considering.
- Also, we would probably want to have a single virtio-iommu device instance per guest, so all virtio devices which belong to this guest
will share the IOMMU mapping for the optimization purposes. For this to work all virtio devices inside a guest should be attached to the same IOMMU domain. Probably, we could control that, but I am not 100% sure.
On 05.10.21 00:53, Stefano Stabellini wrote:
Hi Stefano, all
On Sat, 2 Oct 2021, Oleksandr Tyshchenko wrote:
I got it. Although we would need to map that pre-shared area anyway (I presume it could be done at once during initialization), I think it is much better than mapping arbitrary pages at runtime.
Yeah that's the idea
If there is a way for Xen to know the pre-shared area location in advance it will be able to allow mapping this region only and deny other attempts.
No, but there are patches (not yet upstream) to introduce a way to pre-share memory regions between VMs using xl: https://github.com/Xilinx/xen/commits/xilinx/release-2021.1?after=4bd2da58b5...
So I think it would probably be the other way around: xen/libxl advertises on device tree (or ACPI) the presence of the pre-shared regions to both domains. Then frontend and backend would start using it.
Thank you for the explanation. I remember this series has already appeared on the ML. If I got the idea correctly, this way we won't need to map the foreign memory from the backend at all (I assume this eliminates the security concern?). It looks like every pre-shared region (described in the config file) is mapped by the toolstack at domain creation time and the details of this region are also written to Xenstore. All the backend needs to do is map the region into its address space (via mmap). For this to work the guest should allocate the virtqueue from Xen-specific reserved memory [1].
[1] https://www.kernel.org/doc/Documentation/devicetree/bindings/reserved-memory...
I am not sure if the approach you mentioned could be implemented completely without frontend changes. It looks like Xen would have to learn how to inspect virtqueues in order to verify implicit grants without frontend changes.
I looked through the virtio-iommu specification and corresponding Linux driver but I am sure I don't see all the challenges and pitfalls. Having a limited knowledge of IOMMU infrastructure in Linux, below is just my guess, which might be wrong.
- I think, if we want to avoid frontend changes the backend in Xen would need to fully conform to the specification, I am afraid that
besides just inspecting virtqueues, the backend needs to properly and completely emulate the virtio device, handle shadow page tables, etc. Otherwise we might break the guest. I expect a huge amount of work to implement this properly.
Yeah, I think we would want to stay away from shadow pagetables unless we are really forced to go there.
- Also, if I got the things correctly, it looks like when enabling virtio-iommu, all addresses passed in requests to the virtio devices
behind the virtio-iommu will be in guest virtual address space (IOVA). So we would need to find a way for userspace (if the backend is IOREQ server) to translate them to guest physical addresses (IPA) via these shadow page tables in the backend in front of mapping them via foreign memory map calls. So I expect Xen, toolstack and Linux privcmd driver changes and additional complexity taking into account how the data structures could be accessed (data structures being continuously in IOVA, could be discontinuous in IPA, indirect table descriptors, etc). I am wondering, would it be possible to have identity IOMMU mapping (IOVA == GPA) at the guest side but without bypassing an IOMMU, as we need the virtio-iommu frontend to send map/unmap requests, can we control this behaviour somehow? I think this would simplify things.
None of the above looks easy. I think you are right that we would need IOVA == GPA to make the implementation feasible and with decent performance.
Yes. Otherwise, I am afraid, the implementation is going to be quite difficult with questionable performance at the end.
I found out that an IOMMU domain in Linux can be identity mapped (IOMMU_DOMAIN_IDENTITY, where DMA addresses are system physical addresses) and this can be controlled via the command line. I admit I didn't test it, but from the IOMMU framework code it looks like the driver's map/unmap callbacks won't be called in this mode, and as a result the IOMMU mappings never reach the backend. Unfortunately, this is not what we want, as we won't have any understanding of what the GFNs are...
But if we need a spec change, then I think Juergen's proposal of introducing a new transport that uses grant table references instead of GPAs is worth considering.
Agreed, if the spec changes cannot be avoided then yes.
- Also, we would probably want to have a single virtio-iommu device instance per guest, so that all the virtio devices which belong to this guest share the IOMMU mapping for optimization purposes. For this to work all virtio devices inside a guest should be attached to the same IOMMU domain. Probably we could control that, but I am not 100% sure.
Hello all.
[Sorry for the possible format issues]
I have an update regarding a (valid) concern, which has also been raised in the current thread: the virtio backend's ability (when using Xen foreign mapping) to map any guest pages without the guest's "agreement". There is a PoC (with virtio-mmio on Arm) which is based on Juergen Gross’ work to reuse secure Xen grant mappings for the virtio communications. All details are at: https://lore.kernel.org/xen-devel/1649963973-22879-1-git-send-email-olekstys... https://lore.kernel.org/xen-devel/1649964960-24864-1-git-send-email-olekstys...
Oleksandr Tyshchenko olekstysh@gmail.com writes:
[snip]
Thanks for that. I shall try and find some time to have a look at it.
Did you see Viresh's post about getting our rust-vmm vhost-user backends working on Xen?
One thing that came up during that work was how guest pages are mapped into the dom0 domain, where Xen needs to use kernel-allocated pages via privcmd rather than the normal shared mmap that is used on KVM. As I understand it this is to avoid the situation where dom0 may invalidate a user PTE, causing issues for the hypervisor itself. At some point we would like to fix that wrinkle so we can remove the (minor) hack in rust-vmm's mmap code and be truly hypervisor agnostic.
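To illustrate the wrinkle, a rough sketch of the two mapping paths is below. The trait is invented for the example; only the mmap() calls and the /dev/xen/privcmd node are real, and the foreign-mapping ioctls that follow the privcmd mmap() are omitted:

  // Sketch only: not the actual rust-vmm (vm-memory) interface.
  use std::fs::OpenOptions;
  use std::os::unix::io::AsRawFd;

  trait GuestRegionMapper {
      fn map(&self, size: usize) -> std::io::Result<*mut libc::c_void>;
  }

  // KVM-style: an ordinary anonymous shared mapping in userspace.
  struct AnonMmap;

  impl GuestRegionMapper for AnonMmap {
      fn map(&self, size: usize) -> std::io::Result<*mut libc::c_void> {
          let ptr = unsafe {
              libc::mmap(std::ptr::null_mut(), size,
                         libc::PROT_READ | libc::PROT_WRITE,
                         libc::MAP_SHARED | libc::MAP_ANONYMOUS, -1, 0)
          };
          if ptr == libc::MAP_FAILED { Err(std::io::Error::last_os_error()) } else { Ok(ptr) }
      }
  }

  // Xen-style: the backing pages come from the privcmd device so they are
  // kernel allocated; the ioctls that actually populate the mapping with
  // the guest's pages are omitted here.
  struct PrivcmdMmap;

  impl GuestRegionMapper for PrivcmdMmap {
      fn map(&self, size: usize) -> std::io::Result<*mut libc::c_void> {
          let f = OpenOptions::new().read(true).write(true).open("/dev/xen/privcmd")?;
          let ptr = unsafe {
              libc::mmap(std::ptr::null_mut(), size,
                         libc::PROT_READ | libc::PROT_WRITE,
                         libc::MAP_SHARED, f.as_raw_fd(), 0)
          };
          if ptr == libc::MAP_FAILED { Err(std::io::Error::last_os_error()) } else { Ok(ptr) }
      }
  }

Ideally a hypervisor-agnostic interface would hide this difference entirely behind a single mapping abstraction.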
Anyway I hope you and your team are safe and well.
On 15.04.22 12:07, Alex Bennée wrote:
Hello Alex
[snip]
Thanks for that. I shall try and find some time to have a look at it.
Did you see Viresh's post about getting our rust-vmm vhost-user backends working on Xen?
Great work! I see the email in my mailbox, but didn't analyze it yet. I will definitely take a look at it.
One thing that came up during that work was how guest pages are mapped into the dom0 domain, where Xen needs to use kernel-allocated pages via privcmd rather than the normal shared mmap that is used on KVM. As I understand it this is to avoid the situation where dom0 may invalidate a user PTE, causing issues for the hypervisor itself. At some point we would like to fix that wrinkle so we can remove the (minor) hack in rust-vmm's mmap code and be truly hypervisor agnostic.
Anyway I hope you and your team are safe and well.
Thank you!
On Tue, Sep 28, 2021 at 7:55 AM Christopher Clark christopher.w.clark@gmail.com wrote:
On Mon, Sep 27, 2021 at 3:06 AM Alex Bennée via Stratos-dev stratos-dev@op-lists.linaro.org wrote:
Marek Marczykowski-Górecki marmarek@invisiblethingslab.com writes:
On Fri, Sep 24, 2021 at 05:02:46PM +0100, Alex Bennée wrote:

That is a significant regression in terms of the security model Xen provides. It would give the backend domain _a lot more_ control over the system than it normally has with Xen PV drivers, negating a significant part of the security benefits of using driver domains.
It's part of the continual trade-off between security and speed. For things like block and network backends there is a penalty if data has to be bounce-buffered before it ends up in the guest address space.
I think we have significant flexibility in being able to modify several layers of the stack here to make this efficient, and it would be beneficial to avoid bounce buffering if possible without sacrificing the ability to enforce isolation. I wonder if there's a viable approach possible with some implementation of a virtual IOMMU (which enforces access control) that would allow a backend to commission I/O on a physical device on behalf of a guest, where the data buffers do not need to be mapped into the backend and so avoid the need for a bounce?
This may not require much modification for Linux guest drivers. Although the VIRTIO drivers traditionally assumed devices can DMA to any memory location, there are already constraints in other situations like Confidential Computing, where swiotlb is used for bounce buffering.
Stefan
On 24.09.21 19:02, Alex Bennée wrote:
Hi Alex
[snip]
[STR-56] https://linaro.atlassian.net/browse/STR-56
2.1 Stable ABI for foreignmemory mapping to non-dom0 ([STR-57]) ───────────────────────────────────────────────────────────────
Currently the foreign memory mapping support only works for dom0 due to reference counting issues. If we are to support backends running in their own domains this will need to get fixed.
Estimate: 8w
If I got this paragraph correctly, this is already fixed on Arm [1]
[1] https://lore.kernel.org/xen-devel/1611884932-1851-17-git-send-email-olekstys...
[snip]
On 24/09/2021 17:02, Alex Bennée wrote:
1.1 Upstream an "official" rust crate for Xen ([STR-52]) ────────────────────────────────────────────────────────
To start with we will want an upstream location for future work to be based upon. The intention is the crate is independent of the version of Xen it runs on (above the baseline version chosen). This will entail:
• ☐ agreeing with upstream the name/location for the source
Probably github/xen-project/rust-bindings unless anyone has a better suggestion.
We almost certainly want a companion repository configured as a hello-world example using the bindings and (cross-)compiled for each backend target.
• ☐ documenting the rules for the "stable" hypercall ABI
Easy. There shall be no use of unstable interfaces at all.
This is the *only* way to avoid making the bindings dependent on the version of the hypervisor, and will be a major improvement in the Xen ecosystem.
Any unstable hypercall wanting to be used shall be stabilised in Xen first, which has been vehemently agreed to at multiple dev summits in the past, and will be a useful way of guiding the stabilisation effort.
• ☐ establish an internal interface to elide between ioctl mediated and direct hypercalls • ☐ ensure the crate is multi-arch and has feature parity for arm64
As such we expect the implementation to be standalone, i.e. not wrapping the existing Xen libraries for mediation. There should be a close (1-to-1) mapping between the interfaces in the crate and the eventual hypercall made to the hypervisor.
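A rough sketch of what such an internal interface could look like is below. All the trait and type names are placeholders invented for illustration rather than a proposed API; the hypercall number 18 (__HYPERVISOR_console_io) and CONSOLEIO_write come from the public Xen ABI:

  #[derive(Debug)]
  struct XenError(i64);

  // How a hypercall actually reaches the hypervisor.
  trait HypercallBackend {
      fn hypercall(&self, op: u64, args: [u64; 5]) -> Result<u64, XenError>;
  }

  // Direct path: for a unikernel/stubdomain that can execute the
  // hypercall instruction itself (e.g. `hvc #0xea1` on arm64).
  struct DirectCall;

  impl HypercallBackend for DirectCall {
      fn hypercall(&self, _op: u64, _args: [u64; 5]) -> Result<u64, XenError> {
          unimplemented!("arch-specific hypercall instruction goes here")
      }
  }

  // Mediated path: for a process in a Linux domain, forwarding the call
  // through the privcmd ioctl interface.
  struct PrivcmdCall;

  impl HypercallBackend for PrivcmdCall {
      fn hypercall(&self, _op: u64, _args: [u64; 5]) -> Result<u64, XenError> {
          unimplemented!("IOCTL_PRIVCMD_HYPERCALL goes here")
      }
  }

  // The public crate functions are then written once against the trait,
  // keeping a close 1-to-1 mapping to the eventual hypercall.
  fn console_write(xen: &impl HypercallBackend, msg: &str) -> Result<(), XenError> {
      // HYPERVISOR_console_io(CONSOLEIO_write, count, buffer)
      let args = [0, msg.len() as u64, msg.as_ptr() as u64, 0, 0];
      xen.hypercall(18, args).map(|_| ())
  }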
Estimate: 4w (elapsed likely longer due to discussion)
1.2 Basic Hypervisor Interactions hypercalls ([STR-53]) ───────────────────────────────────────────────────────
These are the bare minimum hypercalls implemented as both ioctl and direct calls. These allow for a very basic binary to:
• ☐ console_io - output IO via the Xen console • ☐ domctl stub - basic stub for domain control (different API?) • ☐ sysctl stub - basic stub for system control (different API?)
The idea would be that this provides enough hypercall interface to query the list of domains and output their status via the Xen console. There is an open question about whether the domctl and sysctl hypercalls are the way to go.
console_io probably wants implementing as a backend to println!() or the log module, because users of the crate won't want to change how they printf()/etc. depending on the target.
That said, console_io hypercalls only do anything for unprivileged VMs in debug builds of the hypervisor. This is fine for development, and less fine in production, so logging ought to use the PV console instead (with room for future expansion to an Argo transport).
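For illustration, a minimal sketch of the log-crate wiring could look like this; only the log crate API is real here, and console_write() stands in for whichever transport (console_io hypercall or PV console ring) ends up underneath:

  // Sketch only: a log::Log implementation backed by a placeholder
  // console_write() function; the Xen-side plumbing is not shown.
  use log::{Level, LevelFilter, Metadata, Record};

  struct XenConsoleLogger;

  impl log::Log for XenConsoleLogger {
      fn enabled(&self, metadata: &Metadata) -> bool {
          metadata.level() <= Level::Info
      }

      fn log(&self, record: &Record) {
          if self.enabled(record.metadata()) {
              let line = format!("[{}] {}\n", record.level(), record.args());
              let _ = console_write(&line);
          }
      }

      fn flush(&self) {}
  }

  // Placeholder for the real console_io hypercall or PV console writer.
  fn console_write(_s: &str) -> Result<(), ()> { Ok(()) }

  static LOGGER: XenConsoleLogger = XenConsoleLogger;

  fn init_logging() {
      let _ = log::set_logger(&LOGGER)
          .map(|()| log::set_max_level(LevelFilter::Info));
  }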
domctl/sysctl are unstable interfaces. I don't think they'll be necessary for a basic virtio backend, and they will be the most complicated hypercalls to stabilise.
Estimate: 6w
1.3 [#10] Access to XenStore service ([STR-54]) ───────────────────────────────────────────────
This is a shared configuration storage space accessed either via Unix sockets (on dom0) or via the Xenbus. This is used to access configuration information for the domain.
Is this needed for a backend though? Can everything just be passed direct on the command line?
Currently, if you want a stubdom and you want to instruct it to shut down cleanly, it needs xenstore. Any stubdom which wants disk or network needs xenstore too.
xenbus (the transport) does need a split between ioctl()s and raw hypercalls. xenstore (the protocol) could be in the xen crate, or a separate one, as it is a piece of higher level functionality.
However, we should pay attention to non-xenstore usecases and not paint ourselves into a corner. Some security usecases would prefer not to use shared memory, and e.g. might consider using an Argo transport instead of the traditional grant-shared page.
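One way to keep those options open is to split the protocol from the transport in the crate, along these purely illustrative lines (neither the trait nor the transports are an existing API), so the same xenstore code can sit on top of the Linux device node, a raw ring plus event channel, or later an Argo transport:

  // Moves raw xenstore protocol packets; knows nothing about the
  // request/response format itself.
  trait XsTransport {
      fn send(&mut self, packet: &[u8]) -> std::io::Result<()>;
      fn recv(&mut self) -> std::io::Result<Vec<u8>>;
  }

  // Transport via the Linux device node using ordinary read()/write().
  struct DevXenbus { /* e.g. a std::fs::File for /dev/xen/xenbus */ }

  // Transport via the shared ring + event channel, for unikernel or
  // stubdomain use where there is no kernel to mediate.
  struct RawRing { /* ring mapping + event channel handle */ }

  // A possible future transport avoiding shared memory altogether.
  struct ArgoTransport { /* argo port details */ }

  // The protocol layer is written once against the trait.
  struct XenStore<T: XsTransport> {
      transport: T,
  }

  impl<T: XsTransport> XenStore<T> {
      fn read(&mut self, _path: &str) -> std::io::Result<String> {
          // Would build an XS_READ packet, send it via self.transport,
          // and parse the reply.
          unimplemented!("xenstore wire protocol goes here")
      }
  }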
Estimate: 4w
1.4 VirtIO support hypercalls ([STR-55]) ────────────────────────────────────────
These are the hypercalls that need to be implemented to support a VirtIO backend. This includes the ability to map another guest's memory into the current domain's address space, register to receive IOREQ events when the guest knocks on the doorbell, and inject kicks into the guest. The hypercalls we need to support would be:
• ☐ dmop - device model ops (*_ioreq_server, setirq, nr_vcpus) • ☐ foreignmemory - map and unmap guest memory
also evtchn, which you need for ioreq notifications.
The DMOP space is larger than what we need for an IOREQ backend, so I've based it just on what arch/arm/dm.c exports, which is the subset introduced for EPAM's virtio work.
One thing we will want to be careful with is the interface. The current DMOPs are a mess of units (particularly frames vs addresses, which will need to change in Xen in due course) as well as range inclusivity/exclusivity.
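One cheap way for the crate to help there, sketched below with invented types (not a proposed API), is to wrap frames and addresses in newtypes and settle on half-open ranges at the API boundary, so the unit and inclusivity questions are answered once in Rust even while the underlying DMOP ABI stays as it is:

  // A guest frame number (guest physical address >> PAGE_SHIFT).
  #[derive(Copy, Clone, Debug, PartialEq, Eq)]
  struct Gfn(u64);

  // A byte address in guest physical address space.
  #[derive(Copy, Clone, Debug, PartialEq, Eq)]
  struct GuestAddr(u64);

  const PAGE_SHIFT: u64 = 12;

  impl GuestAddr {
      fn to_gfn(self) -> Gfn {
          Gfn(self.0 >> PAGE_SHIFT)
      }
  }

  // A half-open [start, end) range of frames, so inclusive/exclusive
  // confusion is resolved once at the API boundary.
  struct GfnRange {
      start: Gfn,
      end: Gfn,
  }

  impl GfnRange {
      fn nr_frames(&self) -> u64 {
          self.end.0 - self.start.0
      }
  }

  // DMOP wrappers take the typed values and do whatever conversion the
  // underlying hypercall happens to expect internally.
  fn map_io_range_to_ioreq_server(_id: u16, _range: GfnRange) -> Result<(), ()> {
      unimplemented!("xendevicemodel / dmop call goes here")
  }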
Estimate: 12w
2 Xen Hypervisor Support for Stratos ([STR-56]) ═══════════════════════════════════════════════
This covers the tasks needed to support the various deployments of Stratos components on Xen.
2.1 Stable ABI for foreignmemory mapping to non-dom0 ([STR-57]) ───────────────────────────────────────────────────────────────
Currently the foreign memory mapping support only works for dom0 due to reference counting issues. If we are to support backends running in their own domains this will need to get fixed.
Oh. It appears as if some of this was completed in https://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=4922caf1de5a08d...
~Andrew