On Fri, Oct 2, 2020 at 4:26 PM Arnd Bergmann arnd@linaro.org wrote:
On Fri, Oct 2, 2020 at 3:44 PM Jean-Philippe Brucker jean-philippe@linaro.org wrote:
- it is a fairly substantial departure from the virtio specification, which defines that transfers can be made to any part of the guest physical address space
One more thought on this: If we do something that is a significant departure from today's virtio-1.1 specification in order to do what we want, it makes sense to consider an even wider change, in this case to the way the virtqueues are defined. We currently have four fundamental layouts:
- split virtqueue, direct descriptors
- split virtqueue, indirect descriptors
- packed virtqueue, direct descriptors
- packed virtqueue, indirect descriptors
In a scenario that always requires bounce buffers and limits the address range accessed by the virtio device implementation, a more logical way to handle this would be a shared memory ring with no descriptors pointing to external memory at all, but with all data packed into a FIFO, similar to how 9p packs its messages into a simple 2-way FIFO.
This may be possible to implement with just one additional virtio ring layout and only minimal changes to any of the higher levels, essentially just flags for negotiating the ring type. (It's also likely that I'm missing a major problem here that would make it way more complicated.)
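To make the FIFO idea a bit more concrete, here is a minimal sketch in C of a ring that carries the message payloads inline instead of descriptors pointing elsewhere. Everything in it is an assumption made for illustration: the names (struct fifo_ring, FIFO_SIZE, fifo_send, fifo_recv), the 4-byte native-endian length prefix, and the single-producer/single-consumer model. None of it comes from the virtio specification, and a real layout would also need feature negotiation and notifications.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

#define FIFO_SIZE 4096u                        /* hypothetical ring size */
#define FIFO_HDR  ((uint32_t)sizeof(uint32_t)) /* length prefix per message */

static_assert((FIFO_SIZE & (FIFO_SIZE - 1)) == 0, "FIFO_SIZE must be a power of two");

/* Hypothetical layout, one instance per direction; lives in shared memory. */
struct fifo_ring {
    _Atomic uint32_t head;   /* total bytes written, advanced by the producer */
    _Atomic uint32_t tail;   /* total bytes read, advanced by the consumer */
    uint8_t data[FIFO_SIZE]; /* payloads are stored inline, no external pointers */
};

/* Copy into the ring at free-running position pos, wrapping at the end. */
static void fifo_copy_in(struct fifo_ring *r, uint32_t pos, const void *src, uint32_t len)
{
    uint32_t off = pos & (FIFO_SIZE - 1);
    uint32_t first = FIFO_SIZE - off < len ? FIFO_SIZE - off : len;

    memcpy(&r->data[off], src, first);
    memcpy(&r->data[0], (const uint8_t *)src + first, len - first);
}

/* Copy out of the ring at free-running position pos, wrapping at the end. */
static void fifo_copy_out(const struct fifo_ring *r, uint32_t pos, void *dst, uint32_t len)
{
    uint32_t off = pos & (FIFO_SIZE - 1);
    uint32_t first = FIFO_SIZE - off < len ? FIFO_SIZE - off : len;

    memcpy(dst, &r->data[off], first);
    memcpy((uint8_t *)dst + first, &r->data[0], len - first);
}

/* Producer side: returns 0 on success, -1 if the message does not fit. */
int fifo_send(struct fifo_ring *r, const void *msg, uint32_t len)
{
    uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    uint32_t used = head - tail;

    /* Do not trust the peer's index: reject an inconsistent ring. */
    if (used > FIFO_SIZE || len > FIFO_SIZE - FIFO_HDR ||
        FIFO_SIZE - used < FIFO_HDR + len)
        return -1;

    fifo_copy_in(r, head, &len, FIFO_HDR);
    fifo_copy_in(r, head + FIFO_HDR, msg, len);
    /* Publish the data only after it has been written. */
    atomic_store_explicit(&r->head, head + FIFO_HDR + len, memory_order_release);
    return 0;
}

/* Consumer side: returns the message length, or -1 if the ring is empty,
 * inconsistent, or the message is larger than cap. */
int fifo_recv(struct fifo_ring *r, void *buf, uint32_t cap)
{
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    uint32_t used = head - tail;
    uint32_t len;

    if (used < FIFO_HDR || used > FIFO_SIZE)
        return -1;

    fifo_copy_out(r, tail, &len, FIFO_HDR);
    /* The length comes from the other side, so validate it before use. */
    if (len > used - FIFO_HDR || len > cap)
        return -1;

    fifo_copy_out(r, tail + FIFO_HDR, buf, len);
    atomic_store_explicit(&r->tail, tail + FIFO_HDR + len, memory_order_release);
    return (int)len;
}
```

The point of the sketch is that both sides only ever touch struct fifo_ring itself, so the device implementation needs no access to any other guest memory. The cost is a copy on each side, which the bounce-buffer scenario would require anyway.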
For inter-guest communication, the ideal outcome would be that the virtio driver in one guest makes its ring buffers available to others by allocating and mapping the guest-physical pages through its IOMMU, while the virtio device implementation in another guest can map the shared ring through a PCI BAR or an MMIO range.
Similarly, a host user implementation of the virtio device would mmap() that ring buffer into its address space and require a notification mechanism but no other access to the memory of the guest running the virtio driver.
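The host user case could look roughly like the sketch below, which assumes a Linux host. The helper names (ring_map, ring_kick, ring_wait, ring_demo) are hypothetical, and using an eventfd as the notification mechanism is my assumption, not something settled above. In ring_demo() a temporary file stands in for whatever object the hypervisor would actually export for the ring.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/eventfd.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map the shared ring. ring_fd and ring_size are hypothetical and would be
 * handed over by the hypervisor or VMM. Returns NULL on failure. */
void *ring_map(int ring_fd, size_t ring_size)
{
    void *p;

    assert(ring_size > 0);
    p = mmap(NULL, ring_size, PROT_READ | PROT_WRITE, MAP_SHARED, ring_fd, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Signal the other side that the ring changed (assumes notify_fd is an eventfd). */
int ring_kick(int notify_fd)
{
    uint64_t one = 1;

    return write(notify_fd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/* Block until signalled; returns the number of coalesced kicks, 0 on error. */
uint64_t ring_wait(int notify_fd)
{
    uint64_t n = 0;

    if (read(notify_fd, &n, sizeof(n)) != (ssize_t)sizeof(n))
        return 0;
    return n;
}

/* Usage example: two mappings of the same object play "driver" and "device". */
int ring_demo(void)
{
    const size_t size = 4096;
    FILE *f = tmpfile();            /* stand-in for the exported ring object */
    int fd, efd, ok;
    uint8_t *driver_view, *device_view;

    if (!f)
        return -1;
    fd = fileno(f);
    if (ftruncate(fd, (off_t)size) != 0)
        return -1;
    efd = eventfd(0, 0);
    if (efd < 0)
        return -1;

    driver_view = ring_map(fd, size);
    device_view = ring_map(fd, size);
    if (!driver_view || !device_view)
        return -1;

    driver_view[0] = 0x42;          /* driver writes into the ring */
    if (ring_kick(efd) != 0)        /* ... and notifies the device */
        return -1;

    /* Device wakes up and sees the data through its own mapping only. */
    ok = ring_wait(efd) == 1 && device_view[0] == 0x42;

    munmap(driver_view, size);
    munmap(device_view, size);
    close(efd);
    fclose(f);
    return ok ? 0 : -1;
}
```

The device side here holds nothing but the ring mapping and the notification fd, which is the isolation property described above.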
I don't know if this approach has been discussed before, but if so, I'd like to know if it has already been rejected or if someone has tried implementing it this way.
Arnd