New subject: Limited memory sharing investigation

13 Oct 2020


      Arnd Bergmann via Stratos-dev stratos-dev@op-lists.linaro.org writes:
...
On Wed, Oct 7, 2020 at 12:37 AM Stefano Stabellini
stefano.stabellini@xilinx.com wrote:
...
On Fri, 2 Oct 2020, Arnd Bergmann via Stratos-dev wrote:
...
...
In a previous discussion [1], several people suggested using a vIOMMU to
dynamically update the mappings rather than statically setting a memory
region usable by the backend. I believe that approach is still worth
considering because it satisfies the security requirement and doesn't
necessarily have worse performance. There is a trade-off between bounce
buffers on one hand, and map notifications on the other.
The problem with static regions is that all of the traffic will require
copying. Sub-page payloads will need bounce buffering anyway, for proper
isolation. But for large payloads bounce buffering might be prohibitive,
and using a virtual IOMMU might actually be more efficient. Instead of
copying large buffers the guest would send a MAP request to the
hypervisor, which would then map the pages into the backend. Despite
taking plenty of cycles for context switching and setting up the maps, it
might be less costly than copying.
Agreed, I would think the iommu based approach is much more promising
here.
The two approaches are not mutually exclusive. The first approach could be
demoed in a couple weeks, while this approach will require months of
work and at least one new virtio interface.
My suggestion would be to hack together a pre-shared memory solution, or
a hypervisor-mediated solution with Argo, do some benchmarks to
understand the degradation, and figure out if the degradation is bad
enough that we need to go down the virtio IOMMU route.
Yes, this makes sense. I'll see if I can come up with a basic design for
virtio devices based on pre-shared memory in place of the virtqueue,
and then we can see if we can prototype the device side in qemu talking
to a modified Linux guest. If that works, the next step would be to
share the memory with another guest and have that implement
the backend instead of qemu.
That seems like a reasonable set of steps. So the pre-shared region will
be the source of memory for the virtqueues as well as the "direct"
buffers the virtqueues reference?
...
...
Xen PV drivers have started with the equivalent of a virtio IOMMU in
place, which we call "grant table". A virtual machine uses the grant
table to share memory explicitly with another virtual machine.
Specifically, the frontend uses the grant table to share memory with the
backend, otherwise the backend is not allowed to map the memory.
(There is a question on whether we could standardize the grant table
interface.)
Speaking from that experience, we ended up switching PV network to use
hypervisor-based copy (without Argo; Argo came later and it is a more
generic solution) because it was faster than the alternatives. We are
still using the grant table for everything else (block, framebuffer,
etc.)
My feeling is that the grant table approach is too specific to Xen
and wouldn't lean itself to porting to most other hypervisors. The
idea of picking virtio seems to be based around the assumption that
this is already portable.
Adding a vIOMMU requirement for regular virtio devices seems
possible and builds on existing guest drivers, but it certainly adds
complexity in all areas (front-end, hypervisor and back-end) without
being an obviously faster than a simpler approach.
OTOH, we can probably use the existing grant table implementation
in Xen for a performance comparison, assuming that a virtio+viommu
based approach would generally be slower than that.
With the existing grant tables we should be able to test a DomU
vhost-user with a pre-shared chunk of memory if we can someway to pass
signalling events to the DomU guest. The front-end could be in the main
Dom0 initially.
I'm hoping this is something that Akashi can look at once he's up to
speed with Xen although I expect it might lag the KVM based PoC a bit.
...
...
The take away is that the results might differ significantly, not just
between a protocol and the other (net and block), but also between
hypervisors (Xen and KVM), and between SoCs. These are difficult
waters to navigate.
Definitely - more data points required ;-)
...
Agreed.
 Arnd

-- 
Alex Bennée

Re: [Stratos-dev] Limited memory sharing investigation