On 05/11/2020 10:46, Alex Bennée wrote:
Stefano Stabellini <stefano.stabellini@xilinx.com> writes:
Hi Alex,
Thank you for doing this. I am looking forward to STR-20; I have been wanting a clean-up in that area for years. STR-19 is also well worded and looks mostly fine. I only have a comment on the last statement:
QEMU should also be able to act as the stub setup to pass IOREQ events to a separate vhost-user daemon as a stepping stone to a future Xen aware vhost-user stub program.
Xen can actually have multiple IOREQ servers running for the same domain. Given that the code for receiving IOREQs is minimal and easily portable, it would be better to make any other daemon a proper IOREQ server talking to Xen directly, rather than having to go via QEMU.
The intention is to enable testing of the existing virtio daemons (for example virtio-gpu) where we can translate an IOREQ message and forward a vhost-user event to the daemon. I'm not expecting it to be the final performance-oriented solution.
The architecture would look cleaner and would lead to far better performance, removing a step in the hot path. In other words I think the vhost-user daemon should be run as its own IOREQ server.
Well, this is where we may need to consider supporting another message format from Xen. Remember the individual daemons are attempting to be hypervisor agnostic, with a minimal shim dealing with the specifics of setting up on each hypervisor. That said, maybe terminating IOREQ and vhost-user messages isn't that tricky, or maybe another message stream makes more sense (vfio user messages?). I haven't dug into the differences yet but I suspect there is a fair degree of commonality given the problem domain.
The main difference you are going to face is accessing the guest memory. In the case of Xen, the process has to issue hypercalls in order to map/unmap guest memory. This will be mapped in its memory address space.
Arguably you could map all of the guest's memory at boot, but there are some pitfalls:
  1) A guest is allowed to change its memory layout, so you would need to be able to re-map when the guest does that.
  2) The implementation in Linux doesn't scale today (see XSA-300) because we are stealing a page to map each guest page. IOW, if your guest has 1GB of RAM and you are using 2 daemons then you will need to make sure that you can afford to lose 2GB of RAM in dom0.
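To make the arithmetic in the second pitfall concrete, here is a rough sketch of the cost model. It assumes one dom0 page is given up for every guest page a daemon maps, and that each daemon maps the whole guest. The function names and the 4 KiB page size are illustrative assumptions, not taken from any Xen or Linux interface.

```python
GIB = 1024 ** 3


def dom0_pages_consumed(guest_ram_bytes: int, daemons: int,
                        page_size: int = 4096) -> int:
    """Pages of dom0 RAM given up if every daemon maps the whole guest.

    Assumption: one dom0 page is consumed per mapped guest page.
    """
    pages_per_guest = -(-guest_ram_bytes // page_size)  # ceiling division
    return pages_per_guest * daemons


def dom0_bytes_consumed(guest_ram_bytes: int, daemons: int,
                        page_size: int = 4096) -> int:
    """Same cost as dom0_pages_consumed, expressed in bytes."""
    return dom0_pages_consumed(guest_ram_bytes, daemons, page_size) * page_size


if __name__ == "__main__":
    # The case from the mail: a 1GB guest served by 2 daemons.
    lost = dom0_bytes_consumed(1 * GIB, 2)
    print(f"dom0 RAM consumed: {lost / GIB:.0f} GiB")
```

Under this model the cost grows linearly with both guest size and the number of daemons, which is why mapping everything up front gets expensive quickly.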
That said, none of this is impossible to overcome, although it will require some work to get right.
To some extent the problem will be the same even if you decide to map only part of the memory (for instance, QEMU will try to cache mappings).
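As a rough illustration of why a partial mapping cache faces the same issues, here is a toy model of a bounded cache of guest-page mappings. It is not QEMU's actual implementation: `map_page` and `unmap_page` are hypothetical callbacks standing in for the hypercall-backed map/unmap operations, and `invalidate` stands in for whatever notification the daemon would need when the guest changes its memory layout.

```python
from collections import OrderedDict
from typing import Callable, Hashable


class MappingCache:
    """Toy LRU cache of guest-frame mappings (hypothetical, not a Xen API)."""

    def __init__(self, map_page: Callable[[int], Hashable],
                 unmap_page: Callable[[Hashable], None],
                 capacity: int) -> None:
        self._map = map_page        # stand-in for the "map" hypercall path
        self._unmap = unmap_page    # stand-in for the "unmap" hypercall path
        self._capacity = capacity   # bounds how many pages are mapped at once
        self._entries: "OrderedDict[int, Hashable]" = OrderedDict()

    def get(self, gfn: int) -> Hashable:
        """Return a mapping for guest frame `gfn`, mapping it on a miss."""
        if gfn in self._entries:
            self._entries.move_to_end(gfn)  # mark as most recently used
            return self._entries[gfn]
        if len(self._entries) >= self._capacity:
            # Evict the least recently used mapping to stay within budget.
            _, old_handle = self._entries.popitem(last=False)
            self._unmap(old_handle)
        handle = self._map(gfn)
        self._entries[gfn] = handle
        return handle

    def invalidate(self, gfn: int) -> None:
        """Drop a mapping, e.g. after the guest changed its memory layout."""
        if gfn in self._entries:
            self._unmap(self._entries.pop(gfn))


if __name__ == "__main__":
    cache = MappingCache(lambda gfn: f"handle-{gfn}",
                         lambda handle: print("unmap", handle), capacity=2)
    cache.get(1)
    cache.get(2)
    cache.get(3)          # evicts the mapping for frame 1
    cache.invalidate(2)   # guest remapped frame 2; the stale mapping must go
```

The capacity bound limits how much dom0 memory is tied up at once, but the cache still has to learn about guest layout changes, otherwise a stale entry would keep pointing at the old page.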
Happy to provide more details if there is any interest in solving it.
Cheers,