* Arnd Bergmann via Stratos-dev stratos-dev@op-lists.linaro.org [2020-10-12 15:56:01]:
Yes, my rough idea would be to stay as close as possible to the existing virtio/virtqueue design, but replace the existing descriptors pointing to guest physical memory with TLV headers describing the data in the ring buffer itself. This might not actually be possible, but my hope is that we can get away with making the ring buffer large enough for all data that needs to be in flight at any time, require all data to be processed in order,
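For illustration, the in-ring layout being proposed might look something like the sketch below. This is purely illustrative; all struct, field and function names are made up rather than taken from any existing spec.

#include <stdint.h>

/*
 * Hypothetical element of the shared ring: a type/length header is
 * followed directly by the payload bytes inside the ring itself,
 * instead of a descriptor pointing at guest physical memory.
 */
struct inring_hdr {
	uint16_t type;		/* e.g. request, response, padding */
	uint16_t flags;
	uint32_t len;		/* number of payload bytes that follow */
	/* 'len' payload bytes follow, padded to 8-byte alignment */
};

/* With strict in-order processing, the consumer just walks forward. */
static inline struct inring_hdr *inring_next(struct inring_hdr *hdr)
{
	uintptr_t next = (uintptr_t)(hdr + 1) + ((hdr->len + 7) & ~(uintptr_t)7);

	return (struct inring_hdr *)next;	/* wrap-around handled by caller */
}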
I had the impression that virtio allows out-of-order processing of requests, i.e. requests can be completed in an order different from the FIFO order in which they were submitted. I think your proposed virtqueue design should still allow that?
Also I think reserving 512kB of memory per virtqueue may not be acceptable in all cases. Some device virtqueues could be busier than others at a given time: a busy virtqueue fills up fast and stalls waiting for more memory to become available, while free memory sits idle in other devices' virtqueues. A global (swiotlb) pool shared among multiple devices would allow better sharing of the available memory in that sense. For memory-constrained configurations, a global shared pool may be a more desirable solution than dedicating fixed-size per-virtqueue buffers.
On Fri, Oct 23, 2020 at 2:29 PM Srivatsa Vaddagiri vatsa@codeaurora.org wrote:
- Arnd Bergmann via Stratos-dev stratos-dev@op-lists.linaro.org [2020-10-12 15:56:01]:
Yes, my rough idea would be to stay as close as possible to the existing virtio/virtqueue design, but replace the existing descriptors pointing to guest physical memory with TLV headers describing the data in the ring buffer itself. This might not actually be possible, but my hope is that we can get away with making the ring buffer large enough for all data that needs to be in flight at any time, require all data to be processed in order,
I had the impression that virtio allows out-of-order processing of requests, i.e. requests can be completed in an order different from the FIFO order in which they were submitted. I think your proposed virtqueue design should still allow that?
It does allow this, but I was trying to simplify the design by not doing it.
If there is a single ring containing both descriptors and data, then processing them out of order would require keeping track of which memory is still in use, and it would be more likely to run into an out-of-memory condition because of fragmentation.
As I understand it, in-order processing can be negotiated as a virtio device feature, so I would simply require it for this purpose.
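In Linux terms that would presumably come down to requiring the VIRTIO_F_IN_ORDER feature bit. A minimal sketch of such a check, using the standard virtio core helper and leaving out the rest of the driver:

#include <linux/virtio.h>
#include <linux/virtio_config.h>

/* Bail out if the device cannot guarantee in-order use of the ring. */
static int example_validate(struct virtio_device *vdev)
{
	if (!virtio_has_feature(vdev, VIRTIO_F_IN_ORDER))
		return -EINVAL;

	return 0;
}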
Also I think reserving 512kB of memory per virtqueue may not be acceptable in all cases. Some device virtqueues could be busier than others at a given time: a busy virtqueue fills up fast and stalls waiting for more memory to become available, while free memory sits idle in other devices' virtqueues.
Can you give an example where 512KB is insufficient? Is this just a performance issue when you cannot submit more than 512KB in a single command and have to do it in a loop, or is there loss of functionality because all that data has to be in flight at the same time?
A global (swiotlb) pool shared among multiple devices would allow better sharing of the available memory in that sense. For memory-constrained configurations, a global shared pool may be a more desirable solution than dedicating fixed-size per-virtqueue buffers.
Sharing the bounce buffers across all devices would seem to add even more complexity, in particular when the devices are not all provided by the same virtual machine or hardware behind it. This means you would either have to track ownership of each page, or give all back-ends access to all the shared memory and lose isolation between them.
If only some devices share a common shmem area and swiotlb, that would seem to require additional complexity in the swiotlb implementation. My preference would be to use swiotlb as little as possible, and in particular not add features to it.
Arnd
* Arnd Bergmann arnd@linaro.org [2020-10-23 14:44:44]:
Also I think reserving 512kB of memory per virtqueue may not be acceptable in all cases. Some device virtqueues could be busier than others at a given time: a busy virtqueue fills up fast and stalls waiting for more memory to become available, while free memory sits idle in other devices' virtqueues.
Can you give an example where 512KB is insufficient? Is this just a performance issue when you cannot submit more than 512KB in a single command and have to do it in a loop, or is there loss of functionality because all that data has to be in flight at the same time?
We are running with 1MB of shared memory (largely used for the block device) and it sometimes fills up pretty fast during boot, causing the guest to delay submitting new requests until space becomes available (which hurts I/O performance). We also have use cases that require an application (in the guest) to read megabytes of data when it initializes, and we want that to complete as quickly as possible.
A global (swiotlb) pool shared among multiple devices would allow better sharing of the available memory in that sense. For memory-constrained configurations, a global shared pool may be a more desirable solution than dedicating fixed-size per-virtqueue buffers.
Sharing the bounce buffers across all devices would seem to add even more complexity, in particular when the devices are not all provided by the same virtual machine or hardware behind it. This means you would either have to track ownership of each page, or give all back-ends access to all the shared memory and lose isolation between them.
If only some devices share a common shmem area and swiotlb, that would seem to require additional complexity in the swiotlb implementation. My preference would be to use swiotlb as little as possible, and in particular not add features to it.
We are currently reusing the swiotlb driver with just a minor change: have it use memory from an indicated range (described via device tree) rather than allocating it at runtime. For our use case, we are fine with having all the virtio devices use the common shared memory pool.
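To make that concrete, here is a rough sketch of the kind of change described, assuming a hypothetical "example,shared-bounce-pool" device-tree node and using the existing swiotlb_init_with_tbl() helper; where exactly this gets called from would depend on the architecture's early setup code.

#include <linux/init.h>
#include <linux/io.h>
#include <linux/of.h>
#include <linux/of_address.h>
#include <linux/swiotlb.h>

static int __init shared_swiotlb_init(void)
{
	struct device_node *np;
	struct resource res;
	unsigned long nslabs;

	/* Hypothetical node describing the pre-arranged shared region. */
	np = of_find_compatible_node(NULL, NULL, "example,shared-bounce-pool");
	if (!np)
		return -ENODEV;
	if (of_address_to_resource(np, 0, &res))
		return -EINVAL;

	/* Hand the fixed range to swiotlb instead of allocating at runtime. */
	nslabs = resource_size(&res) >> IO_TLB_SHIFT;
	return swiotlb_init_with_tbl(phys_to_virt(res.start), nslabs, 1);
}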