On Tue, 13 Oct 2020, Arnd Bergmann wrote:
On Tue, Oct 13, 2020 at 2:53 AM Stefano Stabellini <stefano.stabellini@xilinx.com> wrote:
On Mon, 12 Oct 2020, Alex Bennée wrote:
On Wed, Oct 7, 2020 at 12:37 AM Stefano Stabellini <stefano.stabellini@xilinx.com> wrote:
On Fri, 2 Oct 2020, Arnd Bergmann via Stratos-dev wrote:
My suggestion would be to hack together a pre-shared memory solution, or a hypervisor-mediated solution with Argo, do some benchmarks to understand the degradation, and figure out if the degradation is bad enough that we need to go down the virtio IOMMU route.
Yes, this makes sense. I'll see if I can come up with a basic design for virtio devices based on pre-shared memory in place of the virtqueue, and then we can see if we can prototype the device side in qemu talking to a modified Linux guest. If that works, the next step would be to share the memory with another guest and have that implement the backend instead of qemu.
Just FYI, Xen has mechanisms to pre-share memory areas statically (from VM creation) between VMs. We could pre-share a memory region between domU1 and domU2 and use it for virtio. We would have to come up with a way to mark the memory as "special virtio memory" and let Linux/QEMU know about it. Maybe we could use a reserved-memory binding for it.
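For illustration, the frontend domU could locate such a statically shared carveout through a reserved-memory node and map it before touching any virtio machinery. The rough sketch below assumes a "memory-region" phandle on the transport node pointing at the reserved node; that name and the overall flow are placeholders, since the actual binding would still need to be defined:

#include <linux/device.h>
#include <linux/err.h>
#include <linux/io.h>
#include <linux/ioport.h>
#include <linux/of.h>
#include <linux/of_address.h>

/*
 * Illustrative only: find the pre-shared carveout via a reserved-memory
 * node referenced by a (hypothetical) "memory-region" phandle and map it.
 */
static void *map_preshared_region(struct device *dev, resource_size_t *size)
{
	struct device_node *np;
	struct resource res;
	void *base;
	int ret;

	np = of_parse_phandle(dev->of_node, "memory-region", 0);
	if (!np)
		return ERR_PTR(-ENODEV);

	ret = of_address_to_resource(np, 0, &res);
	of_node_put(np);
	if (ret)
		return ERR_PTR(ret);

	/* Both domains see the same pages; map them normally cached. */
	base = devm_memremap(dev, res.start, resource_size(&res), MEMREMAP_WB);
	if (IS_ERR(base))
		return base;

	*size = resource_size(&res);
	return base;
}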
If the host allocates that memory, I'd probably just define it as a device-specific area (e.g. a PCI BAR) rather than making it part of system RAM and then marking it as reserved.
What I had in mind would be more like the existing virtio though: allocate a contiguous guest physical memory area at device initialization time and register it in place of the normal virtqueue descriptor table.
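As a minimal sketch of that idea on the frontend side: allocate one contiguous area at device init time and publish its guest-physical address where the split-ring descriptor table address would normally go. The register offsets below follow the virtio-mmio layout, but repurposing them for a pre-shared area like this is of course not what the current spec defines:

#include <linux/dma-mapping.h>
#include <linux/io.h>
#include <linux/kernel.h>

#define VIRTIO_MMIO_QUEUE_DESC_LOW	0x080	/* as in the virtio-mmio layout */
#define VIRTIO_MMIO_QUEUE_DESC_HIGH	0x084

/*
 * Illustrative only: one contiguous allocation that would back both the
 * descriptors and the data, registered in place of the descriptor table.
 */
static void *register_shared_area(struct device *dev, void __iomem *mmio,
				  size_t size, dma_addr_t *dma)
{
	void *va = dma_alloc_coherent(dev, size, dma, GFP_KERNEL);

	if (!va)
		return NULL;

	writel(lower_32_bits(*dma), mmio + VIRTIO_MMIO_QUEUE_DESC_LOW);
	writel(upper_32_bits(*dma), mmio + VIRTIO_MMIO_QUEUE_DESC_HIGH);
	return va;
}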
And the memory allocation would come from the guest with the frontends, right? I think it would be best if the memory were allocated by the domU with the frontends rather than the domain with the backends, because otherwise we risk the backend domain running low on memory (many frontend domains connect to a single backend domain).
That seems like a reasonable set of steps. So the pre-shared region will be the source of memory for the virtqueues as well as the "direct" buffers the virtqueues reference?
My original thinking was to use the pre-shared region for everything, in a dma_ops/swiotlb fashion: the kernel (frontend) would end up picking bounce buffers out of the pre-shared region by means of a dedicated swiotlb instance created for the purpose. The backend would be told to map the pre-shared region at initialization and only use already-mapped pages from it.
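Roughly, the dma_ops end of that idea could look like the sketch below. This is conceptual only, not how swiotlb is structured today; the preshared_* names are placeholders and a real implementation would reuse the existing swiotlb code rather than open-code the bouncing:

#include <linux/dma-mapping.h>
#include <linux/genalloc.h>
#include <linux/mm.h>
#include <linux/string.h>

/*
 * Conceptual sketch: assume a gen_pool has already been populated with the
 * kernel mapping of the pre-shared window (preshared_va / preshared_pa).
 */
static struct gen_pool *preshared_pool;
static void *preshared_va;		/* kernel mapping of the window */
static phys_addr_t preshared_pa;	/* guest-physical base of the window */

static dma_addr_t preshared_map_page(struct device *dev, struct page *page,
				     unsigned long offset, size_t size,
				     enum dma_data_direction dir,
				     unsigned long attrs)
{
	void *bounce = (void *)gen_pool_alloc(preshared_pool, size);

	if (!bounce)
		return DMA_MAPPING_ERROR;

	/* Bounce the payload into the window the backend already mapped. */
	if (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL)
		memcpy(bounce, page_address(page) + offset, size);

	return preshared_pa + (bounce - preshared_va);
}

static const struct dma_map_ops preshared_dma_ops = {
	.map_page	= preshared_map_page,
	/* .unmap_page would copy back for DMA_FROM_DEVICE and free the slot */
};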
However, I am not sure this is the best way to do it -- you and Arnd might have better ideas on how to integrate the pre-shared region with the rest of the virtio infrastructure.
The swiotlb approach is also what Jean-Philippe described earlier. The advantage would be that it would be superficially compatible with the virtio specification, but in practice the implementation would remain incompatible with existing guests since this is not how swiotlb works today. It's also more complicated to implement from scratch and less efficient than just having everything in a single FIFO, because the swiotlb code now has to manage allocations in the address space, go through multiple indirections in the dma-mapping code, and touch two memory areas instead of one.
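For comparison, the single-FIFO alternative could be as simple as one ring living entirely in the pre-shared region, carrying both the descriptors and the inline payload, so neither side has to copy between two memory areas. The layout below is purely illustrative, not a proposed format:

#include <stdint.h>

/* Illustrative only: one pre-shared ring holding descriptors and data. */
struct shmem_fifo_desc {
	uint32_t offset;	/* payload offset from the start of the region */
	uint32_t len;		/* payload length in bytes */
	uint16_t id;		/* request identifier for completions */
	uint16_t flags;
};

struct shmem_fifo {
	uint32_t head;		/* written by the producer (frontend) */
	uint32_t tail;		/* written by the consumer (backend) */
	struct shmem_fifo_desc desc[256];
	uint8_t payload[];	/* inline data, in the same shared region */
};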
OK