Hi Alex,
Thank you for doing this. I am looking forward to STR-20, I have been wanting a clean-up in that area for years. Also STR-19 is well worded and looks mostly fine. I only have a comment on the last statement:
QEMU should also be able to act as the stub setup to pass IOREQ events to a separate vhost-user daemon as a stepping stone to a future Xen aware vhost-user stub program.
Xen can actually have multiple IOREQ servers running for the same domain. Given that the code for receiving IOREQs is minimal and easily portable, it would be better to make any other daemon a proper IOREQ server talking to Xen directly, rather than having to go via QEMU. The architecture would look cleaner and would lead to far better performance, removing a step in the hot path. In other words I think the vhost-user daemon should be run as its own IOREQ server.
On Wed, 4 Nov 2020, Alex Bennée via Stratos-dev wrote:
Hi Stefano,
I've re-written STR-19 (https://projects.linaro.org/browse/STR-19) now that I hopefully have a better understanding of the relationship between IOREQ backends and how QEMU is launched with Xen. I see this approach as a stop-gap for testing additional virtio devices. Eventually we need to specify a new card for a Xen-aware vhost-user launching stub that will link our hypervisor-agnostic user-space daemons to the Xen-specific virtio instance. I think we can hold off on that until we have a better idea of the sort of interface Xen and other type-1s need to provide for them.
I've also raised STR-20 (https://projects.linaro.org/browse/STR-20) to cover the work to fix up the current build and allow a leaner native qemu-system-aarch64 to be used instead of qemu-system-i386 and its additional baggage. I have patches in train for that which I'll post soon.
Could you have a look over both for any obvious snafus?
--
Alex Bennée
Stefano Stabellini stefano.stabellini@xilinx.com writes:
Hi Alex,
Thank you for doing this. I am looking forward to STR-20, I have been wanting a clean-up in that area for years. Also STR-19 is well worded and looks mostly fine. I only have a comment on the last statement:
QEMU should also be able to act as the stub setup to pass IOREQ events to a separate vhost-user daemon as a stepping stone to a future Xen aware vhost-user stub program.
Xen can actually have multiple IOREQ servers running for the same domain. Given that the code for receiving IOREQs is minimal and easily portable, it would be better to make any other daemon a proper IOREQ server talking to Xen directly, rather than having to go via QEMU.
The intention is to enable testing of the existing virtio daemons (for example virtio-gpu), where we can translate an IOREQ message and forward a vhost-user event to the daemon. I'm not expecting it to be the final performance-oriented solution.
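To make the stop-gap concrete, here is a rough sketch of that translation step. It is purely illustrative: vu_dev_t, kick_fds and the missing bounds checking are made up for the example, only the ioreq_t fields and the virtio-mmio register offset come from the real headers. A guest write to QueueNotify simply becomes a kick on the eventfd we earlier handed to the daemon via VHOST_USER_SET_VRING_KICK:

  /* Stop-gap sketch: turn a Xen MMIO IOREQ into a vhost-user queue kick. */
  #include <stdint.h>
  #include <sys/eventfd.h>
  #include <xen/hvm/ioreq.h>

  #define VIRTIO_MMIO_QUEUE_NOTIFY 0x050   /* from the virtio-mmio layout */

  typedef struct {
      uint64_t mmio_base;   /* guest-physical base of the virtio-mmio region */
      int kick_fds[8];      /* one kick eventfd per virtqueue */
  } vu_dev_t;

  static void handle_ioreq(vu_dev_t *dev, const ioreq_t *req)
  {
      /* only MMIO writes are interesting here; reads are elided */
      if (req->type != IOREQ_TYPE_COPY || req->dir != IOREQ_WRITE)
          return;

      if (req->addr - dev->mmio_base == VIRTIO_MMIO_QUEUE_NOTIFY) {
          /* req->data carries the queue index written by the guest driver */
          eventfd_write(dev->kick_fds[req->data], 1);
      }
  }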
The architecture would look cleaner and would lead to far better performance, removing a step in the hot path. In other words I think the vhost-user daemon should be run as its own IOREQ server.
Well, this is where we may need to consider supporting another message format from Xen. Remember, the individual daemons are attempting to be hypervisor agnostic, with a minimal shim dealing with the specifics of setting up on each hypervisor. That said, maybe terminating both IOREQ and vhost-user messages isn't that tricky, or maybe another message stream makes more sense (vfio-user messages?). I haven't dug into the differences yet, but I suspect there is a fair degree of commonality given the problem domain.
On Wed, 4 Nov 2020, Alex Bennée via Stratos-dev wrote:
Hi Stefano,
I've re-written STR-19 (https://projects.linaro.org/browse/STR-19) now that I hopefully have a better understanding of the relationship between IOREQ backends and how QEMU is launched with Xen. I see this approach as a stop-gap for testing additional virtio devices. Eventually we need to specify a new card for a Xen-aware vhost-user launching stub that will link our hypervisor-agnostic user-space daemons to the Xen-specific virtio instance. I think we can hold off on that until we have a better idea of the sort of interface Xen and other type-1s need to provide for them.
I've also raised STR-20 (https://projects.linaro.org/browse/STR-20) to cover the work to fix up the current build and allow a leaner native qemu-system-aarch64 to be used instead of qemu-system-i386 and its additional baggage. I have patches in train for that which I'll post soon.
Could you have a look over both for any obvious snafus?
--
Alex Bennée
On 05/11/2020 10:46, Alex Bennée wrote:
Stefano Stabellini stefano.stabellini@xilinx.com writes:
Hi Alex,
Thank you for doing this. I am looking forward to STR-20, I have been wanting a clean-up in that area for years. Also STR-19 is well worded and looks mostly fine. I only have a comment on the last statement:
QEMU should also be able to act as the stub setup to pass IOREQ events to a separate vhost-user daemon as a stepping stone to a future Xen aware vhost-user stub program.
Xen can actually have multiple IOREQ servers running for the same domain. Given that the code for receiving IOREQs is minimal and easily portable, it would be better to make any other daemon a proper IOREQ server talking to Xen directly, rather than having to go via QEMU.
The intention is to enable testing of the existing virtio daemons (for example virtio-gpu), where we can translate an IOREQ message and forward a vhost-user event to the daemon. I'm not expecting it to be the final performance-oriented solution.
The architecture would look cleaner and would lead to far better performance, removing a step in the hot path. In other words I think the vhost-user daemon should be run as its own IOREQ server.
Well, this is where we may need to consider supporting another message format from Xen. Remember, the individual daemons are attempting to be hypervisor agnostic, with a minimal shim dealing with the specifics of setting up on each hypervisor. That said, maybe terminating both IOREQ and vhost-user messages isn't that tricky, or maybe another message stream makes more sense (vfio-user messages?). I haven't dug into the differences yet, but I suspect there is a fair degree of commonality given the problem domain.
The main difference you are going to face is accessing the guest memory. In the case of Xen, the process has to issue hypercalls in order to map/unmap guest memory, which is then mapped into its own address space.
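For reference, the mapping side from a dom0 process looks roughly like this with libxenforeignmemory (a minimal sketch, error handling trimmed, and the gfn list is whatever the caller has already obtained):

  #include <stdlib.h>
  #include <sys/mman.h>
  #include <xenforeignmemory.h>

  void *map_guest_pages(domid_t domid, const xen_pfn_t *gfns, size_t nr)
  {
      xenforeignmemory_handle *fmem = xenforeignmemory_open(NULL, 0);
      int *err;
      void *addr;

      if (!fmem)
          return NULL;

      err = calloc(nr, sizeof(*err));   /* per-page error codes */
      addr = xenforeignmemory_map(fmem, domid, PROT_READ | PROT_WRITE,
                                  nr, gfns, err);
      free(err);
      /* nr pages of guest memory are now in our address space; release
       * them later with xenforeignmemory_unmap(fmem, addr, nr) */
      return addr;
  }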
Arguably you could map all the memory at boot from the guest, but there are some pitfalls:

1) A guest is allowed to change the memory layout, so you would need to be able to re-map when the guest does that.

2) The implementation in Linux doesn't scale today (see XSA-300) because we are stealing one of the pages to map each guest page. IOW, if your guest has 1GB of RAM and you are using 2 daemons, then you will need to make sure that you can afford to lose 2GB of RAM in dom0.
That said, none of this is impossible to overcome, although it will require some work to get it right.
To some extent the problem will be the same even if you decide to only map part of the memory (for instance, QEMU will try to cache mappings).
Happy to provide more details if there is any interest in solving it.
Cheers,
Julien Grall julien@xen.org writes:
(+CC Arnd, JPB)
On 05/11/2020 10:46, Alex Bennée wrote:
Stefano Stabellini stefano.stabellini@xilinx.com writes:
Hi Alex,
Thank you for doing this. I am looking forward to STR-20, I have been wanting a clean-up in that area for years. Also STR-19 is well worded and looks mostly fine. I only have a comment on the last statement:
QEMU should also be able to act as the stub setup to pass IOREQ events to a separate vhost-user daemon as a stepping stone to a future Xen aware vhost-user stub program.
Xen can actually have multiple IOREQ servers running for the same domain. Given that the code for receiving IOREQs is minimal and easily portable, it would be better to make any other daemon a proper IOREQ server talking to Xen directly, rather than having to go via QEMU.
The intention is to enable testing of the existing virtio daemons (for example virtio-gpu), where we can translate an IOREQ message and forward a vhost-user event to the daemon. I'm not expecting it to be the final performance-oriented solution.
The architecture would look cleaner and would lead to far better performance, removing a step in the hot path. In other words I think the vhost-user daemon should be run as its own IOREQ server.
Well, this is where we may need to consider supporting another message format from Xen. Remember, the individual daemons are attempting to be hypervisor agnostic, with a minimal shim dealing with the specifics of setting up on each hypervisor. That said, maybe terminating both IOREQ and vhost-user messages isn't that tricky, or maybe another message stream makes more sense (vfio-user messages?). I haven't dug into the differences yet, but I suspect there is a fair degree of commonality given the problem domain.
The main difference you are going to face is accessing the guest memory. In the case of Xen, the process has to issue hypercalls in order to map/unmap guest memory, which is then mapped into its own address space.
Currently QEMU does this for vhost-user backends and passes an FD over to the daemon.
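The hand-over itself is just Unix fd passing: the regions are described in a VHOST_USER_SET_MEM_TABLE message and the backing fds ride along as SCM_RIGHTS ancillary data, roughly like this (a sketch only; the real message layout is abbreviated to an opaque payload here):

  #include <string.h>
  #include <sys/socket.h>
  #include <sys/uio.h>

  static int send_mem_fd(int sock, const void *payload, size_t len, int mem_fd)
  {
      struct iovec iov = { .iov_base = (void *)payload, .iov_len = len };
      char control[CMSG_SPACE(sizeof(int))] = { 0 };
      struct msghdr msg = {
          .msg_iov = &iov, .msg_iovlen = 1,
          .msg_control = control, .msg_controllen = sizeof(control),
      };
      struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);

      cmsg->cmsg_level = SOL_SOCKET;
      cmsg->cmsg_type = SCM_RIGHTS;            /* pass the fd itself */
      cmsg->cmsg_len = CMSG_LEN(sizeof(int));
      memcpy(CMSG_DATA(cmsg), &mem_fd, sizeof(int));

      /* the daemon receives the fd and can mmap() the region directly */
      return sendmsg(sock, &msg, 0);
  }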
Arguably you could map all the memory at boot from the guest, but there are some pitfalls:

1) A guest is allowed to change the memory layout, so you would need to be able to re-map when the guest does that.

2) The implementation in Linux doesn't scale today (see XSA-300) because we are stealing one of the pages to map each guest page. IOW, if your guest has 1GB of RAM and you are using 2 daemons, then you will need to make sure that you can afford to lose 2GB of RAM in dom0.
That said, none of this is impossible to overcome, although it will require some work to get it right.
To some extent the problem will be the same even if you decide to only map part of the memory (for instance, QEMU will try to cache mappings).
Well, the next step is investigating what it would take to limit the window into the guest that the vhost-user daemon has. This is what Arnd & Jean-Philippe are currently looking at, namely what would be required from:

- the core virtio spec
- the guest's driver stack

and finally how this is best exposed to the host/hypervisor so it can do the appropriate mapping. Jean-Philippe did a write-up of the current thoughts:
Date: Fri, 2 Oct 2020 15:43:36 +0200
From: Jean-Philippe Brucker <jean-philippe@linaro.org>
Subject: Limited memory sharing investigation
Message-ID: <20201002134336.GA2196245@myrica>
The eventual goal is to allow backends to exist in any arbitrary DomU (or KVM guest) to service the main guest.
Happy to provide more details if there is any interest in solving it.
I suspect the Xen grant table API will be close to the model that will be required on the hypervisor configuration side, but I don't know.
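For illustration, the grant model means the backend only ever maps pages the guest explicitly shared, something like the following (a sketch, with error handling trimmed):

  #include <sys/mman.h>
  #include <xengnttab.h>

  void *map_granted_page(uint32_t guest_domid, uint32_t gref)
  {
      xengnttab_handle *xgt = xengnttab_open(NULL, 0);

      if (!xgt)
          return NULL;

      /* maps exactly one page the guest chose to grant; nothing else in
       * the guest is visible to this process */
      return xengnttab_map_grant_ref(xgt, guest_domid, gref,
                                     PROT_READ | PROT_WRITE);
  }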
Cheers,
On Thu, 5 Nov 2020, Alex Bennée wrote:
Julien Grall julien@xen.org writes:
(+CC Arnd, JPB)
On 05/11/2020 10:46, Alex Bennée wrote:
Stefano Stabellini stefano.stabellini@xilinx.com writes:
Hi Alex,
Thank you for doing this. I am looking forward to STR-20, I have been wanting a clean-up in that area for years. Also STR-19 is well worded and looks mostly fine. I only have a comment on the last statement:
QEMU should also be able to act as the stub setup to pass IOREQ events to a separate vhost-user daemon as a stepping stone to a future Xen aware vhost-user stub program.
Xen can actually have multiple IOREQ servers running for the same domain. Given that the code for receiving IOREQs is minimal and easily portable, it would be better to make any other daemon a proper IOREQ server talking to Xen directly, rather than having to go via QEMU.
The intention is to enable testing of the existing virtio daemons (for example virtio-gpu), where we can translate an IOREQ message and forward a vhost-user event to the daemon. I'm not expecting it to be the final performance-oriented solution.
The architecture would look cleaner and would lead to far better performance, removing a step in the hot path. In other words I think the vhost-user daemon should be run as its own IOREQ server.
Well, this is where we may need to consider supporting another message format from Xen. Remember, the individual daemons are attempting to be hypervisor agnostic, with a minimal shim dealing with the specifics of setting up on each hypervisor. That said, maybe terminating both IOREQ and vhost-user messages isn't that tricky, or maybe another message stream makes more sense (vfio-user messages?).
Receiving IOREQs directly in the virtio daemon would be trivial. It can be done in about 100 lines of code. I wouldn't worry about it.
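The registration side looks roughly like the following with libxendevicemodel (a sketch: the event-channel wiring and the ioreq page mapping are elided, and the MMIO range is whatever the device needs):

  #include <xendevicemodel.h>

  int become_ioreq_server(domid_t domid, uint64_t mmio_start, uint64_t mmio_end)
  {
      xendevicemodel_handle *dmod = xendevicemodel_open(NULL, 0);
      ioservid_t srvid;

      if (!dmod)
          return -1;

      /* register a new IOREQ server alongside any existing ones */
      if (xendevicemodel_create_ioreq_server(dmod, domid,
                                             HVM_IOREQSRV_BUFIOREQ_OFF,
                                             &srvid))
          return -1;

      /* claim the MMIO window for the virtio device (1 = MMIO, 0 = port I/O) */
      xendevicemodel_map_io_range_to_ioreq_server(dmod, domid, srvid, 1,
                                                  mmio_start, mmio_end);

      /* start receiving IOREQs */
      return xendevicemodel_set_ioreq_server_state(dmod, domid, srvid, 1);
  }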
We could discuss more standard message formats. I am certainly open to the idea. However, keep in mind that VFIO is not a standard, it is a Linux interface. Also, Xen IOREQs go back 15 years, predating VFIO, and it would be good to align with what is already used in production in x86-land (they have been running multiple IOREQ servers for years now).
Overall, given that it would take only a couple of hours to add IOREQ handling to something like virtio-gpu, I don't think going via QEMU is worth the effort.
To me, the interesting questions are the ones Julien raised about the memory mapping hypercalls.
I haven't dug into the differences yet but I suspect there is a fair degree of commonality given the problem domain.
The main difference you are going to face is accessing the guest memory. In the case of Xen, the process has to issue hypercalls in order to map/unmap guest memory. This will be mapped in its memory address space.
Currently QEMU does this for vhost-user backends and passes over a FD to the daemon.
It looks oriented toward an architecture where QEMU is central to the VM management, i.e. the VMM. QEMU is not a VMM when run on Xen. In a Xen-based architecture, QEMU is like one of the virtio daemons, and we can have several.
On Thu, Nov 05, 2020 at 04:30:50PM +0000, Alex Bennée wrote:
Well, this is where we may need to consider supporting another message format from Xen. Remember, the individual daemons are attempting to be hypervisor agnostic, with a minimal shim dealing with the specifics of setting up on each hypervisor. That said, maybe terminating both IOREQ and vhost-user messages isn't that tricky, or maybe another message stream makes more sense (vfio-user messages?).
I just remembered the IOTLB feature in the vhost-user protocol. I'm mentioning it since I haven't seen it discussed so far, but I don't think it can help with the memory mapping problem.
The vhost protocol has bidirectional IOTLB messages, to deal with virtio devices behind vIOMMUs. It works like this:

* The vhost device wants to read a buffer described by the vring, but it is behind an IOVA. The device sends an "IOTLB miss" message to the hypervisor.
* The hypervisor sends an "IOTLB update" message with the IOVA->PA translation. Or "IOTLB access fail" on error.
* When the mapping disappears, the hypervisor sends "IOTLB invalidate".
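For reference, both directions use the same structure, which Linux defines in linux/vhost_types.h (the helper functions below are just illustrative):

  #include <string.h>
  #include <linux/vhost_types.h>

  /* device side: ask for the translation of an IOVA it cannot resolve */
  static struct vhost_iotlb_msg iotlb_miss(__u64 iova)
  {
      struct vhost_iotlb_msg msg;

      memset(&msg, 0, sizeof(msg));
      msg.iova = iova;
      msg.perm = VHOST_ACCESS_RW;
      msg.type = VHOST_IOTLB_MISS;
      return msg;
  }

  /* hypervisor side: reply with the IOVA -> address translation */
  static struct vhost_iotlb_msg iotlb_update(__u64 iova, __u64 size, __u64 uaddr)
  {
      struct vhost_iotlb_msg msg = iotlb_miss(iova);

      msg.size = size;
      msg.uaddr = uaddr;                  /* translated address */
      msg.type = VHOST_IOTLB_UPDATE;      /* or VHOST_IOTLB_INVALIDATE later */
      return msg;
  }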
It's inefficient, though it can be accelerated with page tables (not upstream, I have a prototype somewhere). But I don't think we need it. In the scenarios where the shared memory is static, this feature isn't necessary (vhost has SET_MEM_TABLE to communicate static memory translations). And even when using a vIOMMU, for zero-copy, there need not be memory-mapping messages between device and hypervisor, because a range of GPAs could be reserved at boot for the device side, and the hypervisor could populate stage-2 when the driver side issues mappings.
Thanks,
Jean