On Mon, 12 Oct 2020, Alex Bennée wrote:
On Wed, Oct 7, 2020 at 12:37 AM Stefano Stabellini <stefano.stabellini@xilinx.com> wrote:
On Fri, 2 Oct 2020, Arnd Bergmann via Stratos-dev wrote:
In a previous discussion [1], several people suggested using a vIOMMU to dynamically update the mappings rather than statically setting a memory region usable by the backend. I believe that approach is still worth considering because it satisfies the security requirement and doesn't necessarily have worse performance. There is a trade-off between bounce buffers on one hand, and map notifications on the other.
The problem with static regions is that all of the traffic will require copying. Sub-page payloads will need bounce buffering anyway, for proper isolation. But for large payloads bounce buffering might be prohibitive, and using a virtual IOMMU might actually be more efficient. Instead of copying large buffers the guest would send a MAP request to the hypervisor, which would then map the pages into the backend. Despite taking plenty of cycles for context switching and setting up the maps, it might be less costly than copying.
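For concreteness, the per-buffer request the guest would queue already exists in virtio-iommu; the MAP request looks roughly like this (field layout as in Linux's include/uapi/linux/virtio_iommu.h, shown purely to illustrate what each map/unmap round trip carries):

/* Roughly the virtio-iommu MAP request (see include/uapi/linux/virtio_iommu.h).
 * The guest queues one of these on the vIOMMU request queue for every buffer
 * it wants the backend to be able to access.
 */
struct virtio_iommu_req_head {
	__u8	type;		/* VIRTIO_IOMMU_T_MAP for a map request */
	__u8	reserved[3];
};

struct virtio_iommu_req_tail {
	__u8	status;		/* filled in by the device */
	__u8	reserved[3];
};

struct virtio_iommu_req_map {
	struct virtio_iommu_req_head	head;
	__le32	domain;		/* IOMMU domain the endpoint is attached to */
	__le64	virt_start;	/* IOVA range the backend will see... */
	__le64	virt_end;
	__le64	phys_start;	/* ...mapped to this guest-physical address */
	__le32	flags;		/* VIRTIO_IOMMU_MAP_F_READ / _WRITE */
	struct virtio_iommu_req_tail	tail;
};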
Agreed, I would think the iommu based approach is much more promising here.
The two approaches are not mutually exclusive. The first approach could be demoed in a couple of weeks, while this approach will require months of work and at least one new virtio interface.
My suggestion would be to hack together a pre-shared memory solution, or a hypervisor-mediated solution with Argo, do some benchmarks to understand the degradation, and figure out if the degradation is bad enough that we need to go down the virtio IOMMU route.
Yes, this makes sense. I'll see if I can come up with a basic design for virtio devices based on pre-shared memory in place of the virtqueue, and then we can see if we can prototype the device side in qemu talking to a modified Linux guest. If that works, the next step would be to share the memory with another guest and have that implement the backend instead of qemu.
Just FYI, Xen has mechanisms to pre-share memory areas statically (from VM creation) between VMs. We could pre-share a memory region between domU1 and domU2 and use it for virtio. We would have to come up with a way to mark the memory as "special virtio memory" and let Linux/QEMU know about it. Maybe we could use a reserved-memory binding for it.
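As a rough sketch of what the Linux side could do with such a binding (the device node and function names below are assumptions for illustration, not an existing binding; only the standard "memory-region" phandle property is real):

/* Sketch only: assumes the pre-shared region is described as a reserved-memory
 * node and referenced from a (hypothetical) virtio-shm device node through the
 * standard "memory-region" property.
 */
#include <linux/device.h>
#include <linux/ioport.h>
#include <linux/of.h>
#include <linux/of_address.h>

static int virtio_shm_probe_region(struct device *dev, struct resource *res)
{
	struct device_node *np;
	int ret;

	np = of_parse_phandle(dev->of_node, "memory-region", 0);
	if (!np)
		return -ENODEV;

	ret = of_address_to_resource(np, 0, res);
	of_node_put(np);
	if (ret)
		return ret;

	dev_info(dev, "pre-shared virtio region: %pR\n", res);
	return 0;
}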
That seems like a reasonable set of steps. So the pre-shared region will be the source of memory for the virtqueues as well as the "direct" buffers the virtqueues reference?
My original thinking was to use the pre-shared region for everything, in a dma_ops swiotlb fashion: the kernel (frontend) would end up picking bounce buffers out of the pre-shared region thanks to a special swiotlb instance made for the purpose. The backend would be told to map the pre-shared region at initialization and only use already-mapped pages from it.
However, I am not sure this is the best way to do it -- you and Arnd might have better ideas on how to integrate the pre-shared region with the rest of the virtio infrastructure.
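To make the swiotlb idea a bit more concrete, the sketch below shows the kind of bounce-buffering involved, hand-rolled over the pre-shared region purely for illustration; in practice this would be a dedicated swiotlb instance rather than new code, and all the names are made up:

/* Illustrative only: a hand-rolled bounce allocator over the pre-shared
 * region. In a real implementation this role would be played by a dedicated
 * swiotlb pool backed by the same region.
 */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SHM_SLOT_SIZE	4096
#define SHM_NR_SLOTS	1024

struct shm_pool {
	void	 *base;			/* frontend mapping of the region */
	uint64_t  dev_base;		/* address the backend uses for it */
	uint8_t	  used[SHM_NR_SLOTS];	/* trivial slot bitmap */
};

/* Copy a payload into the pre-shared region and return the address the
 * backend should be given in the descriptor. Returns 0 when the pool is
 * exhausted or the payload does not fit a slot.
 */
static uint64_t shm_bounce_out(struct shm_pool *p, const void *buf, size_t len)
{
	size_t i;

	if (len > SHM_SLOT_SIZE)
		return 0;

	for (i = 0; i < SHM_NR_SLOTS; i++) {
		if (!p->used[i]) {
			p->used[i] = 1;
			memcpy((char *)p->base + i * SHM_SLOT_SIZE, buf, len);
			return p->dev_base + i * SHM_SLOT_SIZE;
		}
	}
	return 0;
}

/* Copy the result back (for device-to-driver transfers) and free the slot. */
static void shm_bounce_in(struct shm_pool *p, uint64_t dev_addr,
			  void *buf, size_t len)
{
	size_t i = (dev_addr - p->dev_base) / SHM_SLOT_SIZE;

	if (buf && len <= SHM_SLOT_SIZE)
		memcpy(buf, (char *)p->base + i * SHM_SLOT_SIZE, len);
	p->used[i] = 0;
}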
Xen PV drivers have started with the equivalent of a virtio IOMMU in place, which we call "grant table". A virtual machine uses the grant table to share memory explicitly with another virtual machine. Specifically, the frontend uses the grant table to share memory with the backend, otherwise the backend is not allowed to map the memory.
(There is a question on whether we could standardize the grant table interface.)
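For comparison, the per-page operation on a Xen frontend is a single call into the existing grant-table API; a minimal sketch (assuming the backend domid has already been obtained through the usual xenstore handshake, error handling trimmed):

/* Sketch: how a Xen frontend shares one page of a buffer with the backend
 * domain using the existing Linux grant-table API. "backend_domid" is
 * assumed to come from the usual xenstore handshake.
 */
#include <xen/grant_table.h>
#include <xen/page.h>

static int share_page_with_backend(domid_t backend_domid, void *vaddr,
				   grant_ref_t *ref)
{
	int err;

	/* Grant the backend read/write access to this frame. */
	err = gnttab_grant_foreign_access(backend_domid,
					  virt_to_gfn(vaddr), 0 /* rw */);
	if (err < 0)
		return err;

	*ref = err;	/* the grant reference goes into the ring/descriptor */
	return 0;
}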
Speaking from that experience, we ended up switching PV network to use hypervisor-based copy (without Argo; Argo came later and it is a more generic solution) because it was faster than the alternatives. We are still using the grant table for everything else (block, framebuffer, etc.)
My feeling is that the grant table approach is too specific to Xen and wouldn't lend itself to porting to most other hypervisors. The idea of picking virtio seems to be based on the assumption that it is already portable.
Yeah, you are right. I didn't mean the grant table "as is"; I meant making a few changes to it so that it becomes easy to implement in other hypervisors and turn it into a virtio interface. But I don't know, it could be easier to start from scratch.
Adding a vIOMMU requirement for regular virtio devices seems possible and builds on existing guest drivers, but it certainly adds complexity in all areas (front-end, hypervisor and back-end) without being obviously faster than a simpler approach.
OTOH, we can probably use the existing grant table implementation in Xen for a performance comparison, assuming that a virtio+viommu based approach would generally be slower than that.
With the existing grant tables we should be able to test a DomU vhost-user with a pre-shared chunk of memory, if we can find some way to pass signalling events to the DomU guest. The front-end could be in the main Dom0 initially.
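One possible shape for the signalling piece, purely as a sketch: a userspace backend in the DomU could stand in for vhost-user's eventfd kick/call with plain event channels via libxenevtchn (the frontend domid and the port handshake over xenstore are assumed here):

/* Sketch: replacing vhost-user's eventfd kick/call with Xen event channels
 * for a backend running in a DomU. The frontend domid and the agreed port
 * would normally be exchanged over xenstore; both are assumed here.
 */
#include <stdio.h>
#include <xenevtchn.h>

static int wait_for_kick(xenevtchn_handle *xce)
{
	xenevtchn_port_or_error_t port = xenevtchn_pending(xce);

	if (port < 0)
		return -1;
	xenevtchn_unmask(xce, port);	/* re-arm before processing */
	return port;
}

int main(void)
{
	xenevtchn_handle *xce = xenevtchn_open(NULL, 0);
	xenevtchn_port_or_error_t port;

	if (!xce)
		return 1;

	/* Bind an unbound port that the frontend domain (domid 1 here,
	 * purely as an example) can later connect to. */
	port = xenevtchn_bind_unbound_port(xce, 1);
	if (port < 0)
		return 1;

	printf("backend listening on event channel port %d\n", port);

	for (;;) {
		int p = wait_for_kick(xce);
		if (p < 0)
			break;
		/* ... process descriptors in the pre-shared region ... */
		xenevtchn_notify(xce, p);	/* "call" back to the frontend */
	}

	xenevtchn_close(xce);
	return 0;
}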
I'm hoping this is something that Akashi can look at once he's up to speed with Xen although I expect it might lag the KVM based PoC a bit.
Excellent idea! We should definitely be able to use the existing grant table to do measurements and get some useful data points.