On Sat, Oct 3, 2020 at 12:20 PM François Ozog francois.ozog@linaro.org wrote:
IO Model
...
- the device is provided an unstructured memory area (or multiple) and rings. The device does memory allocation as it pleases on the unstructured area(s) [this is very common on Arm platform devices]. The structured buffer method is to be understood as computer-driven IO, while the other is device-driven IO.
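A rough sketch of the contrast, with hypothetical descriptor layouts that are not taken from any real driver, only meant to show who picks the buffer addresses:

/* Illustration only: made-up layouts for the two IO models. */
#include <stdint.h>

/* Computer-driven IO: the driver allocates each buffer and tells the
 * device exactly where to place data through a descriptor ring. */
struct host_driven_desc {
	uint64_t buf_addr;	/* buffer chosen by the driver */
	uint32_t buf_len;
	uint32_t flags;
};

/* Device-driven IO: the driver only hands over a large unstructured
 * region; the device sub-allocates packets inside it and reports each
 * one back as an offset into that region. */
struct device_driven_region {
	uint64_t region_addr;	/* base of the unstructured area */
	uint64_t region_len;
};

struct device_driven_completion {
	uint32_t offset;	/* where the device placed the packet */
	uint32_t len;
};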
I still need to read up on this. Can you point to a device driver that implements this so I can see the implications?
User land IO
I think the use of userland IO (in the backend and/or the frontend VM) may impact the data path in various ways, so this should be factored into the analysis. A key element of performance: use metadata (prepended or appended to the data) to get information, rather than the ring descriptor, which is in uncached memory (we have concrete examples, and the cost of the different strategies may be up to 50% of the base performance).
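A minimal sketch of the metadata idea, assuming a hypothetical layout where per-packet information is prepended to the payload so the fast path only touches the (cached) data buffer rather than the ring entry:

/* Illustration only: invented header, not from an existing driver. */
#include <stdint.h>

struct pkt_meta {
	uint16_t len;		/* payload length */
	uint16_t vlan;		/* pre-parsed VLAN tag */
	uint32_t flags;		/* checksum status etc. */
};

/* The payload follows the prepended metadata, so the consumer reads
 * length/flags from the same cacheline(s) as the data it is about to
 * touch anyway, and the ring entry can shrink to a bare pointer. */
static inline void *pkt_payload(struct pkt_meta *meta)
{
	return (char *)meta + sizeof(*meta);
}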
Can you clarify why any of the descriptors would be uncached? Do you mean just the descriptors of the physical device in case of a noncoherent bus, or also shared memory between the guests?
Virtio
There are significant performance differences between virtio 1.0 and virtio 1.1: virtio 1.0 touches something like six cachelines to insert an element into a queue, while virtio 1.1 touches only one (not sure about the exact numbers, but they should not be far off). As a reference, 6WIND has a VM-to-VM network driver that is not virtio based and can go beyond 100Gbps per x86 vCPU, so I expect virtio 1.1 to reach that level of performance.
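For reference, the descriptor formats (simplified from the virtio spec) show where the difference comes from: with the 1.0 split ring the driver writes the descriptor table, the avail ring and the avail index, which usually live in separate cachelines, while with the 1.1 packed ring a single in-place descriptor update (flipping the avail/used wrap bits in the flags) publishes the buffer.

#include <stdint.h>

/* virtio 1.0 split ring: descriptor table entry; the avail and used
 * rings are separate structures that also have to be written/read. */
struct vring_desc {
	uint64_t addr;
	uint32_t len;
	uint16_t flags;
	uint16_t next;
};

/* virtio 1.1 packed ring: one descriptor, updated in place; the
 * avail/used wrap bits live in the flags field. */
struct vring_packed_desc {
	uint64_t addr;
	uint32_t len;
	uint16_t id;
	uint16_t flags;
};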
I tried to get some information about the 6WIND driver but couldn't find it. Do you have a link to their sources, or do you know what they do specifically?
Memory allocation backend
Shared memory allocation shall be controlled by the VMM (Qemu, Xen...), but the VMM may further ask FF-A to do so. I think there have been efforts in FF-A to find memory with the proper attributes (coherent between the device and all VMs). I frankly have no clue here, but this may be worth digging into.
I would expect any shared memory between guests to just use the default memory attributes: for incoming data you end up copying each packet twice anyway (from the buffer allocated by the hw driver into shared memory, and from shared memory into the actual skb). For outbound data over a noncoherent device, the driver needs to take care of proper barriers and cache management.
Just to clarify, the shared memory buffer in this case would be a fixed host-physical buffer, rather than a fixed guest-physical location with page flipping, right?
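A minimal sketch of the two copies described above; the helper names and buffer management are invented for illustration only:

#include <string.h>
#include <stddef.h>

/* Backend: copy from the buffer the hw driver received into, into the
 * fixed shared-memory window visible to the frontend. On a noncoherent
 * bus the hw driver has already done the cache maintenance for hw_buf. */
void backend_rx_copy(const void *hw_buf, size_t len, void *shm_slot)
{
	memcpy(shm_slot, hw_buf, len);	/* copy 1: hw buffer -> shared memory */
}

/* Frontend: copy out of shared memory into the guest's own packet
 * buffer (an skb in the kernel case, an mbuf for a DPDK frontend). */
void frontend_rx_copy(const void *shm_slot, size_t len, void *pkt_buf)
{
	memcpy(pkt_buf, shm_slot, len);	/* copy 2: shared memory -> skb/mbuf */
}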
inbound traffic:
- traffic is received in device SRAM
- the packet is marshalled to a memory area associated with a VLAN (i.e. a VM)
- the descriptor is updated to point to this packet
- the backend VM kernel handles the queues (IRQ, busy polling...) and creates virtio descriptors pointing to the data as part of the bridging, stripping the underlay network VLAN tag (sketched below)
- the frontend DPDK app reads the descriptor and the packet; if it is DNS at the expected IP, the application handles the packet, otherwise it drops it
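A hedged sketch of the bridging step above: detect and strip the underlay 802.1Q tag before handing the packet to the frontend queue. This is illustration only, not code from an existing bridge, and the function name is hypothetical.

#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

#define ETH_ALEN	6
#define ETH_P_8021Q	0x8100

/* Tagged frame layout: dst MAC (6) | src MAC (6) | TPID (2) | TCI (2) |
 * ethertype (2) | payload. The TCI carries the VLAN id identifying the
 * destination (frontend) VM. */

/* Strip the 4-byte tag in place and return the new start of the frame;
 * the backend then points a virtio descriptor at this address. */
uint8_t *strip_vlan(uint8_t *frame)
{
	uint16_t tpid;

	memcpy(&tpid, frame + 2 * ETH_ALEN, sizeof(tpid));
	if (tpid != htons(ETH_P_8021Q))
		return frame;				/* already untagged */

	memmove(frame + 4, frame, 2 * ETH_ALEN);	/* slide MACs over the tag */
	return frame + 4;
}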
Would you expect the guests in this scenario to run simultaneously on different CPUs and send IPIs between them, or would each queue always be on a fixed CPU across both guests?
Arnd