Hi Alex,
It looks like buffer allocation should also be tracked all the way up to the application. If we don't do that, we may end up with a nice in-kernel technology that can't properly serve applications.
From the network application's perspective, we can distinguish between DPDK/ODP applications and "normal" applications. I believe applications that use AF_XDP sockets may need the same support as "normal" applications, but this probably needs to be evaluated.
DPDK apps could be adapted so that they "naturally" use memory that is part of the right SVA.
For "normal" applications, can bzero-copy be really transparent? I am not convinced because page translation maintenance may end-up killing the benefit of zero-copy. That's why I think applications that need performance should be mindful of a zero-copy capable sockets. Should the environment not support zero-copy, the behavior falls back to normal handling. - A socket is opened with an AF_ZERO_COPY flag - mmap(NULL, size, AF_ZERO_COPY, socketfd, offset) to get memory from the right SVA (something along those lines) - send to the socket from malloced memory would either fail or fall back to best effort (may be controlled by a flag for simplifying porting to the new scheme)
Cheers
FF
On Fri, 14 Jan 2022 at 18:16, Jean-Philippe Brucker via Stratos-dev <stratos-dev@op-lists.linaro.org> wrote:
On Fri, Jan 14, 2022 at 02:28:11PM +0000, Alex Bennée wrote:
Stefan Hajnoczi <stefanha@redhat.com> writes:
On Thu, Jan 06, 2022 at 05:03:38PM +0000, Alex Bennée wrote:
Hi,
To start the new year I thought I would dump some of my thoughts on zero-copy between VM domains. For project Stratos we've gamely avoided thinking too hard about this while we've been concentrating on solving more tractable problems. However we can't put it off forever, so let's work through the problem.
Memory Sharing
<snip> >> Buffer Allocation <snip> >> Transparent fallback and scaling <snip> >> >> - what other constraints we need to take into account? >> - can we leverage existing sub-systems to build this support? >> >> I look forward to your thoughts ;-) > > (Side note: Shared Virtual Addressing (
https://lwn.net/Articles/747230/)
is an interesting IOMMU feature. It would be neat to have a CPU equivalent where loads and stores from/to another address space could
be
done cheaply with CPU support. I don't think this is possible today and that's why software IOMMUs are slow for per-transaction page
protection.
In other words, a virtio-net TX VM would set up a page table allowing read access only to the TX buffers and vring and the virtual network switch VM would have the capability to access the vring and buffers through the TX VM's dedicated address space.)
Does binding a device to an address space mean the driver allocations will be automatically done from the address space, or do the drivers need modifying to take advantage of that? Jean-Philippe?
The drivers do need modifying and the APIs are very different: with SVA you're assigning partitions of devices (for example virtqueues) to different applications. So the kernel driver only does management and userspace takes care of the data path. A device partition accesses the whole address space of the process it is assigned to, so there is no explicit DMA allocation.
Note that it also requires special features from the device (multiple address spaces with PCIe PASID, recoverable DMA page faults with PRI). And the concept doesn't necessarily fit nicely with all device classes - you probably don't want to load a full network stack in any application that needs to send a couple of packets. One that demonstrates the concept well in my opinion is the crypto/compression accelerators that use SVA in Linux [1]: any application that needs fast compression or encryption on some of its memory can open a queue in the accelerator, submit jobs with pointers to input and output buffers and wait for completion while doing something else. It is several orders of magnitude faster than letting the CPU do this work and only the core mm deals with memory management.
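For illustration, the usage model looks roughly like this. The device node, ioctl number and job structure below are invented placeholders rather than the actual uacce/UADK interface:

/* Rough illustration of the SVA accelerator model: the application hands
 * the device plain pointers into its own address space, with no DMA
 * mapping or bounce buffers.  The device path, ioctl and job layout are
 * hypothetical. */
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

struct comp_job {                 /* hypothetical job descriptor */
    const void *src;              /* input: any address mapped in this process */
    void       *dst;              /* output: likewise, just a pointer */
    size_t      src_len;
    size_t      dst_len;
};

#define COMP_SUBMIT _IOWR('Z', 0, struct comp_job)   /* hypothetical ioctl */

int main(void)
{
    /* Open a queue on the accelerator; with SVA the queue is bound to this
     * process's address space (PASID), so the device sees exactly what the
     * CPU sees, including ordinary pageable heap memory. */
    int q = open("/dev/compress-queue0", O_RDWR);     /* hypothetical node */
    if (q < 0)
        return 1;

    size_t len = 1 << 20;
    void *src = malloc(len);
    void *dst = malloc(len);
    if (!src || !dst)
        return 1;
    memset(src, 'a', len);

    struct comp_job job = {
        .src = src, .dst = dst, .src_len = len, .dst_len = len,
    };

    /* Submit and wait; faults on not-yet-mapped pages are handled through
     * recoverable DMA page faults (PRI) rather than up-front pinning. */
    ioctl(q, COMP_SUBMIT, &job);

    close(q);
    return 0;
}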
Anyway this seems mostly off-topic. As Stefan said, what would be useful for our problem is help from the CPUs to share bits of address space without disclosing the whole VM memory. At the moment any vIOMMU mapping needs to be shadowed by the hypervisor somehow. Unless we use static shared memory regions, the frontend VM needs to tell the hypervisor which pages it chooses to share with the backend VM, and the hypervisor has to map and then unmap those pages into the backend VM's stage-2. I think that operation requires two context switches to the hypervisor any way we look at it.
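Spelled out, the per-transaction flow is roughly the following. The hyp_share_pages()/hyp_unshare_pages() wrappers are invented for illustration only, to show where the two traps happen; no such interface exists today:

/* Pseudocode sketch of dynamic page sharing between frontend and backend
 * VMs, with hypothetical hypercall wrappers provided by the frontend
 * guest kernel. */
#include <stdint.h>
#include <stddef.h>

extern int hyp_share_pages(uint64_t ipa, size_t nr_pages, uint32_t backend_vm);
extern int hyp_unshare_pages(uint64_t ipa, size_t nr_pages, uint32_t backend_vm);

/* One transaction: expose the buffer to the backend VM for the duration of
 * the request, then pull it back.  Each call traps into the hypervisor so
 * it can update the backend VM's stage-2 tables, hence the two context
 * switches per transaction mentioned above. */
static int do_shared_transaction(uint64_t buf_ipa, size_t nr_pages,
                                 uint32_t backend_vm)
{
    int ret = hyp_share_pages(buf_ipa, nr_pages, backend_vm);   /* trap #1 */
    if (ret)
        return ret;

    /* ... kick the backend and wait for it to consume the buffer ... */

    return hyp_unshare_pages(buf_ipa, nr_pages, backend_vm);    /* trap #2 */
}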
Thanks, Jean
[1] https://lore.kernel.org/linux-iommu/1579097568-17542-1-git-send-email-zhangf...
Some storage and networking applications use buffered I/O where the guest kernel owns the DMA buffer while others use zero-copy I/O where guest userspace pages are pinned for DMA. I think both cases need to be considered.
Are guest userspace-visible API changes allowed (e.g. making the userspace application aware at buffer allocation time)?
I assume you mean enhanced rather than breaking APIs here? I don't see why not. Certainly for the vhost-user backends we are writing we aren't beholden to sticking to an old API.
Ideally the requirement would be that zero-copy must work for unmodified applications, but I'm not sure if that's realistic.
By the way, VIRTIO 1.2 introduces Shared Memory Regions. These are memory regions (e.g. PCI BAR ranges) that the guest driver can map. If the host pages must come from a special pool then Shared Memory Regions would be one way to configure the guest to use this special zero-copy area for data buffers and even vrings. New VIRTIO feature bits and transport-specific information may need to be added to do this.
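As a rough illustration of how a guest driver could locate such a region: the region id and its use as a generic zero-copy buffer pool below are assumptions for the sake of the example, not something any current device class defines:

/* Sketch: a guest virtio driver finding a Shared Memory Region and mapping
 * it as a zero-copy buffer pool. */
#include <linux/virtio.h>
#include <linux/virtio_config.h>
#include <linux/io.h>
#include <linux/err.h>

#define ZC_SHM_REGION_ID 0    /* hypothetical shmid advertised by the device */

static void *zc_pool;
static struct virtio_shm_region zc_region;

static int zc_map_buffer_pool(struct virtio_device *vdev)
{
    /* Ask the transport (e.g. a PCI BAR range) where the host placed the
     * shared memory region. */
    if (!virtio_get_shm_region(vdev, &zc_region, ZC_SHM_REGION_ID))
        return -ENODEV;

    /* Map it and carve data buffers (and possibly vrings) out of this
     * area instead of using guest anonymous memory. */
    zc_pool = devm_memremap(&vdev->dev, zc_region.addr, zc_region.len,
                            MEMREMAP_WB);
    if (IS_ERR(zc_pool))
        return PTR_ERR(zc_pool);

    return 0;
}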
Are these fixed sizes or could we accommodate a growing/shrinking region?
Thanks for the pointers.
-- Alex Bennée
--
Stratos-dev mailing list -- stratos-dev@op-lists.linaro.org
To unsubscribe send an email to stratos-dev-leave@op-lists.linaro.org