Julien Grall julien@xen.org writes:
(+CC Arnd, JPB)
On 05/11/2020 10:46, Alex Bennée wrote:
Stefano Stabellini stefano.stabellini@xilinx.com writes:
Hi Alex,
Thank you for doing this. I am looking forward to STR-20, I have been wanting a clean-up in that area for years. Also STR-19 is well worded and looks mostly fine. I only have a comment on the last statement:
QEMU should also be able to act as the stub setup to pass IOREQ events to a separate vhost-user daemon as a stepping stone to a future Xen aware vhost-user stub program.
Xen can actually have multiple IOREQ servers running for the same domain. Given that the code for receiving IOREQs is minimal and easily portable, it would be better to make any other daemon a proper IOREQ server talking to Xen directly, rather than having to go via QEMU.
The intention is to enable testing of the existing virtio daemons (for example virtio-gpu) where we can translate an IOREQ message and forward a vhost-user event to the daemon. I'm not expecting it to be the final performance-oriented solution.
The architecture would look cleaner and would lead to far better performance, removing a step in the hot path. In other words I think the vhost-user daemon should be run as its own IOREQ server.
Well this is where we may need to consider supporting another message format from Xen. Remember the individual daemons are attempting to be hypervisor-agnostic, with a minimal shim dealing with the specifics of setting up on each hypervisor. That said, maybe terminating IOREQ and vhost-user messages isn't that tricky, or maybe another message stream makes more sense (vfio-user messages?). I haven't dug into the differences yet but I suspect there is a fair degree of commonality given the problem domain.
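To make that concrete, roughly what I have in mind for the translation is something like the sketch below: a guest queue-notify write delivered as an IOREQ becomes the eventfd kick the daemon already expects from VHOST_USER_SET_VRING_KICK. This is purely illustrative - the stub_virtio_dev structure and the kick_fd bookkeeping are made up; only the ioreq_t layout and the virtio-mmio register offset come from the Xen public headers and the virtio spec.

  #include <stdint.h>
  #include <unistd.h>
  #include <xen/hvm/ioreq.h>        /* ioreq_t, IOREQ_WRITE */

  #define VIRTIO_MMIO_QUEUE_NOTIFY 0x50   /* register offset per the virtio-mmio spec */

  struct stub_virtio_dev {
      uint64_t mmio_base;   /* guest-physical base of the virtio-mmio device */
      int kick_fd[8];       /* eventfds handed over via VHOST_USER_SET_VRING_KICK */
  };

  static void stub_handle_ioreq(struct stub_virtio_dev *dev, ioreq_t *req)
  {
      uint64_t offset = req->addr - dev->mmio_base;

      if (req->dir == IOREQ_WRITE && !req->data_is_ptr &&
          offset == VIRTIO_MMIO_QUEUE_NOTIFY) {
          /* The guest wrote a queue index; kick the matching eventfd so the
           * vhost-user daemon goes and processes that virtqueue. */
          uint64_t one = 1;
          write(dev->kick_fd[req->data & 0x7], &one, sizeof(one));
      }
      /* Other register accesses stay emulated in the stub itself. */
  }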
The main difference you are going to face is accessing the guest memory. In the case of Xen, the process has to issue hypercalls in order to map/unmap guest memory, which then gets mapped into its own address space.
Currently QEMU does this for vhost-user backends and passes over a FD to the daemon.
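For comparison, the Xen side would look something like this using the stable libxenforeignmemory interface - a minimal sketch only; map_guest_frames is an invented helper, the pfn list handling and the missing error/unmap paths are simplifications:

  #include <sys/mman.h>
  #include <xenforeignmemory.h>

  /* Map 'nr' guest frames of domain 'domid' into our address space.
   * Each of these is a hypercall-backed mapping, unlike the memfd/shm
   * file descriptor that QEMU shares with a vhost-user daemon. */
  void *map_guest_frames(uint32_t domid, const xen_pfn_t *pfns, size_t nr)
  {
      xenforeignmemory_handle *fmem = xenforeignmemory_open(NULL, 0);
      if (!fmem)
          return NULL;

      /* A real daemon would keep fmem around so it can
       * xenforeignmemory_unmap() the range again later. */
      return xenforeignmemory_map(fmem, domid, PROT_READ | PROT_WRITE,
                                  nr, pfns, NULL);
  }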
Arguably you could map all the guest memory at boot, but there are some pitfalls:
1) A guest is allowed to change the memory layout, so you would need to be able to re-map when the guest does that.
2) The implementation in Linux doesn't scale today (see XSA-300) because we steal one dom0 page for each guest page we map. IOW, if your guest has 1GB of RAM and you are using 2 daemons mapping all of it, then you will need to make sure that you can afford to lose 2GB of RAM in dom0.
That said, none of this is impossible to overcome, although it will require some work to get it right.
To some extent the problem will be the same even if you decide to only map part of the memory (for instance, QEMU will try to cache mappings).
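(For illustration only, a daemon-side mapping cache in its simplest form might look like the below; the names are invented, and a real implementation needs locking, actual unmapping and a hook on the guest's layout changes.)

  #include <stdbool.h>
  #include <stddef.h>
  #include <stdint.h>

  /* One cached hypercall-backed mapping of a contiguous guest range. */
  struct map_cache_entry {
      uint64_t gfn_start;    /* first guest frame covered */
      size_t   nr_frames;    /* number of frames mapped */
      void    *va;           /* local virtual address of the mapping */
      bool     valid;
  };

  /* When the guest changes its memory layout, any cached mapping that
   * might overlap has to be dropped and re-established on next access. */
  static void map_cache_invalidate(struct map_cache_entry *cache, size_t n)
  {
      for (size_t i = 0; i < n; i++)
          cache[i].valid = false;   /* real code would also unmap cache[i].va */
  }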
Well, the next step is investigating what it would take to limit the window into the guest that the vhost-user daemon has. This is what Arnd & Jean-Philippe are currently looking at, in terms of what would be required from:
- the core virtio spec
- the guest's driver stack
and finally how this is best exposed to the host/hypervisor so it can do the appropriate mapping. Jean-Philippe did a write-up of the current thoughts:
Date: Fri, 2 Oct 2020 15:43:36 +0200
From: Jean-Philippe Brucker jean-philippe@linaro.org
Subject: Limited memory sharing investigation
Message-ID: 20201002134336.GA2196245@myrica
The eventual goal is to allow backends to exist in any arbitrary DomU (or KVM guest) to service the main guest.
Happy to provide more details if there is any interest in solving it.
I suspect the Xen grant table API will be close to the model that will be required on the hypervisor configuration side but I don't know.
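For reference, a backend relying on grants rather than foreign mappings would use something like the stable libxengnttab interface below. Again just a sketch - how the grant reference reaches the backend (xenstore, for instance) is out of scope here, and map_granted_page is an invented helper:

  #include <stdint.h>
  #include <sys/mman.h>
  #include <xengnttab.h>

  /* Map a single page the guest has explicitly granted to us. Unlike a
   * foreign mapping, this only ever exposes pages the guest chose to
   * share, which is closer to a "limited window" model. */
  void *map_granted_page(uint32_t domid, uint32_t gref)
  {
      xengnttab_handle *xgt = xengnttab_open(NULL, 0);
      if (!xgt)
          return NULL;

      /* A real backend would keep xgt so it can xengnttab_unmap() later. */
      return xengnttab_map_grant_ref(xgt, domid, gref,
                                     PROT_READ | PROT_WRITE);
  }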
Cheers,