Hi,
This email is driven by a brainstorming session at a recent sprint where we considered what VirtIO devices we should look at implementing next. I ended up going through all the assigned device IDs hunting for missing spec discussion and existing drivers, so I'd welcome feedback from anybody actively using them - especially as my suppositions about device types I'm not familiar with may be way off!
Work so far
===========
The devices we've tackled so far have been relatively simple ones, focused more on embedded workloads. Both the i2c and gpio virtio devices allow for a fairly simple backend which can multiplex multiple client VM requests onto a set of real HW presented via the host OS.
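To make the "multiplex onto real HW" idea concrete, here is a minimal sketch (in Rust, assuming the libc crate) of how a backend might forward a single guest write request to real hardware via the host's Linux i2c-dev interface. The device path, client address and payload are illustrative placeholders, and the virtqueue handling that would produce them is omitted.

// Minimal sketch: forward one guest "write" request to real HW through
// the host's /dev/i2c-N character device (Linux i2c-dev interface).
use std::fs::OpenOptions;
use std::io::Write;
use std::os::unix::io::AsRawFd;

const I2C_SLAVE: libc::c_ulong = 0x0703; // from linux/i2c-dev.h

fn forward_write(bus: &str, addr: u16, payload: &[u8]) -> std::io::Result<()> {
    let mut dev = OpenOptions::new().read(true).write(true).open(bus)?;
    // Select the client device the guest addressed.
    let ret = unsafe { libc::ioctl(dev.as_raw_fd(), I2C_SLAVE, addr as libc::c_ulong) };
    if ret < 0 {
        return Err(std::io::Error::last_os_error());
    }
    // A plain write() then becomes an I2C write transaction to that address.
    dev.write_all(payload)
}

fn main() -> std::io::Result<()> {
    // Hypothetical request a guest might have queued: write 0x00 to client 0x48.
    forward_write("/dev/i2c-1", 0x48, &[0x00])
}

A real backend would more likely use the I2C_RDWR ioctl so that combined write-then-read transfers from the guest map onto a single bus transaction.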
We have also done some work on a vhost-user backend for virtio-video and have a working PoC although it is a couple of iterations behind the latest submission to the virtio spec. Continuing work on this is currently paused while Peter works on libcamera related things (although more on that later).
Upstream first
==============
We've been pretty clear about the need to do things in an upstream-compatible way, which means devices should be:
- properly specified in the OASIS spec
- have at least one driver up-streamed (probably in Linux)
- have a working public backend
For Stratos I think we are pretty happy to implement all new backends in Rust under the auspices of the rust-vmm project and the vhost-device repository.
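For anyone not familiar with what such a backend involves, here is a very rough sketch of its shape. The trait and types below are invented purely for illustration; they are not the actual rust-vmm vhost-user-backend API, which defines its own traits for feature negotiation, memory mapping and the daemon event loop.

// Illustrative only: a stand-in trait showing the responsibilities a
// vhost-device style daemon has to cover for each device type.
trait DeviceBackend {
    /// Number of virtqueues the device exposes.
    fn num_queues(&self) -> usize;
    /// VIRTIO feature bits the backend is prepared to negotiate.
    fn features(&self) -> u64;
    /// Called when the guest kicks a queue: pop descriptors, act on the
    /// request (e.g. perform an i2c transfer on the host), push a reply.
    fn handle_queue_event(&mut self, queue_index: u16);
}

// A trivial device used only to show the control flow.
struct NullDevice;

impl DeviceBackend for NullDevice {
    fn num_queues(&self) -> usize { 1 }
    fn features(&self) -> u64 { 0 }
    fn handle_queue_event(&mut self, queue_index: u16) {
        println!("queue {queue_index} kicked: would process descriptors here");
    }
}

fn main() {
    // In a real daemon this is driven by the vhost-user socket and an event
    // loop; here we just invoke the handler directly.
    let mut dev = NullDevice;
    dev.handle_queue_event(0);
}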
We obviously also need a reasonable use case for why abstracting a HW type is useful. For example i2c was envisioned as useful on mobile devices where a lot of disparate auxiliary HW is often hanging off an i2c bus.
Current reserved IDs
====================
Looking at the spec there are currently 42 listed device types in the reserved ID table. While there are quite a lot that have Linux driver implementations, a number are nothing more than reserved numbers:
ioMemory / 6
------------
No idea what this was meant to be.
rpmsg / 7
---------
Not formalised in the specification but there is a driver in the Linux kernel. AFAIUI it's a fairly simple wrapper around the existing rpmsg bus. I think this has also been used for OpenAMP's hypervisor-less VirtIO experiments to communicate between processor domains.
mac80211 wlan / 10
mac80211 hwsim wireless simulation device / 29
----------------------------------------------
When the discussion about a virtio-wifi comes up there is inevitably a debate about what the use case is. There are usually two potential use cases:
- simulation environment
Here the desire is to have something that looks like a real WiFi device in simulation so the rest of the stack (up from the driver) can be the same as when running on real HW.
- abstraction environment
Devices with WiFi are different from fixed networking as they need to deal with portability events like changing networks and reporting connection status and quality. If the guest VM is responsible for the UI, it needs to gather this information and generally wants its userspace components to use the same kernel APIs to get it as it would with real HW.
Neither of these has had its specification up-streamed to OASIS, but there is an implementation of mac80211_hwsim in the Linux kernel. I found evidence of a plain 80211 virtio_wifi.c existing in the Android kernel trees. So far I've been unable to find backends for these devices but I assume they must exist if the drivers do!
Debates about what sort of features and control channels need to be supported often run into questions about why existing specifications can't be expanded (for example expanding virtio-net with a control channel to report additional wifi-related metadata) or why pass-through sockets can't be used for talking to the host netlink channel.
rproc serial / 11
-----------------
Again this isn't documented in the standard. I'm not sure if this is related to rpmsg but there is an implementation as part of the kernel virtio_console code.
virtio CAIF / 12
----------------
Not documented in the specification although there is a driver in the kernel as part of the orphaned CAIF networking subsystem. From the kernel documentation this was a sub-system for talking to modem parts.
memory balloon / 13
-------------------
This seems like an abandoned attempt at a next generation version of the memory ballooning interface.
Timer/Clock device / 17
-----------------------
This looks like a simple reservation with no proposed implementation.
I don't know if there is a case for this on most modern architectures which usually have virtualised architected timers anyway.
Access to RTC information may be something that is mediated by firmware/system control buses. For emulation there are a fair number of industry-standard RTC chips modelled, and RTC access tends not to be performance critical.
Signal Distribution Module / 21
-------------------------------
This appears to be an intra-domain communication channel for which an RFC was posted:
https://lists.oasis-open.org/archives/virtio-dev/201606/msg00030.html
It came with references to kernel and QEMU implementations. I don't know if this approach has been obviated by other communication channels like vsock or scmi.
pstore device / 22
------------------
This appears to be a persistent storage device that was intended to allow guests to dump information like crash dumps. There was a proposed kernel driver:
https://lwn.net/Articles/698744/
and a proposed QEMU backend:
https://lore.kernel.org/all/1469632111-23260-1-git-send-email-namhyung@kerne...
which were never merged. As far as I can tell there was no proposal for the virtio spec itself.
Video encoder device / 30
Video decoder device / 31
-------------------------
This is an ongoing development which has iterated several versions of the spec and the kernel side driver.
NitroSecureModule / 33
----------------------
This is a stripped down Trusted Platform Module (TPM) intended to expose TPM functionality such as cryptographic functions and attestation to guests. This looks like it is closely tied with AWS's Nitro Enclaves.
I haven't been able to find any public definition of the spec or implementation details. How would this interact with other TPM functionality solutions?
Watchdog / 35
-------------
Discussion about this is usually conflated with reset functionality as the two are intimately related.
An early interest in this was in providing well-specified reset functionality to firmware running on the -M virt machine model in QEMU. The need has been reduced somewhat with the provision of the sbsa-ref model, which does have a defined reset pin.
Other questions that would need to be answered include how the functionality would interact with the hypervisor, given a vCPU could easily not be scheduled by it and therefore miss its kick window.
Currently there have been no proposals for the spec or implementations.
CAN / 36
--------
This is a device of interest to the Automotive industry as it looks to consolidate numerous ECUs into VM-based workloads. There was a proposed RFC last year:
https://markmail.org/message/hdxj35fsthypllkt?q=virtio-can+list:org%2Eoasis-...
and it is presumed there are frontend and backend drivers in vendor trees. At the last AGL virtualization expert meeting the Open Synergy guys said they hoped to post new versions of the spec and kernel driver soon:
https://confluence.automotivelinux.org/pages/viewpage.action?spaceKey=VE&...
During our discussion it became clear that while the message bus itself is fairly simple, real HW often has a vendor-specific control plane to enable specific features. Being able to present this flexibility via the virtio interface without baking in a direct mapping of the HW would be the challenge.
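To illustrate why the data plane looks simple: a transmit request is essentially a classical CAN frame in a virtqueue buffer. The struct below is a hypothetical Rust rendering loosely modelled on Linux's struct can_frame; it is not the layout from the Open Synergy RFC, and the vendor-specific control plane (bitrate, filters, error handling) is exactly the part it leaves out.

// Hypothetical sketch of what a virtio-can transmit request could carry.
// Loosely modelled on Linux's `struct can_frame`; the real RFC defines
// its own message layout and feature bits.
#[repr(C)]
#[derive(Debug, Clone, Copy)]
struct CanTxRequest {
    can_id: u32,   // 11- or 29-bit identifier plus flag bits
    len: u8,       // payload length in bytes (0..=8 for classical CAN)
    _pad: [u8; 3], // keep the payload naturally aligned
    data: [u8; 8], // frame payload
}

fn main() {
    let req = CanTxRequest {
        can_id: 0x123,
        len: 2,
        _pad: [0; 3],
        data: [0xde, 0xad, 0, 0, 0, 0, 0, 0],
    };
    println!("{:?}", req);
}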
Parameter Server / 38
---------------------
This is a proposal for a key-value parameter store over virtio. The exact use case is unclear but I suspect for Arm at least there is overlap with what is already supported by DT and UEFI variables.
The proposal only seems to have been partially archived on the lists:
https://www.mail-archive.com/virtio-dev@lists.oasis-open.org/msg07201.html
It may be Android related?
Audio policy device / 39
------------------------
Again I think this stems from the Android world and provides a policy and control device to work in concert with the virtio-sound device. The initial proposal to the list is here:
https://www.mail-archive.com/virtio-dev@lists.oasis-open.org/msg07255.html
The idea seems to be to have a control layer for dealing with routing and priority of multiple audio streams.
Bluetooth device / 40
---------------------
Bluetooth suffers from similar complexity problems to 802.11 WiFi. However the virtio_bt driver in the kernel concentrates on providing a pipe for the standardised Host Controller Interface (HCI), albeit with support for a selection of vendor-specific commands.
I could not find any submission of the specification for standardisation.
Specified but missing backends?
===============================
GPU device / 16
---------------
This is now a fairly mature part of the spec and has implementations in the kernel, QEMU and a vhost-user backend. However, as is commensurate with the complexity of GPUs, there is ongoing development moving from the VirGL OpenGL encapsulation to a thing called GFXSTREAM which is meant to make some things easier.
A potential area of interest here is working out what the differences are in use cases between virtio-gpu and virtio-wayland. virtio-wayland is currently a ChromeOS-only invention, so hasn't seen any upstreaming or specification work, but it may make more sense where multiple VMs are drawing only elements of a final display which is composited by a master program. For further reading see Alyssa's write-up:
https://alyssa.is/using-virtio-wl/
I'm not sure how widely used the existing vhost-user backend is for virtio-gpu but it could present an opportunity for a more beefy rust-vmm backend implementation?
Audio device / 25
-----------------
This has a specification and a working kernel driver. However there isn't a working backend for QEMU, although one has been proposed:
Subject: [RFC PATCH 00/27] Virtio sound card implementation
Date: Thu, 29 Apr 2021 17:34:18 +0530
Message-Id: 20210429120445.694420-1-chouhan.shreyansh2702@gmail.com
This could be a candidate for a rust-vmm version?
Other suggestions
=================
When we started Project Stratos there was a survey amongst members on where there was interest.
virtio-spi/virtio-greybus
-------------------------
Yet another serial bus. We chose to do i2c but doing another similar bus wouldn't be pushing the state of the art. We could certainly mentor/guide someone else who wants to get involved in rust-vmm though.
virtio-tuner/virtio-radio
-------------------------
These were early automotive requests. I don't know where these would sit in relation to the existing virtio-sound and audio policy devices.
virtio-camera
-------------
We have a prototype of virtio-video but, as the libcamera project shows, interfacing with modern cameras is quite a complex task these days. Modern cameras have all sorts of features powered by complex IP blocks, including various amounts of AI. Perhaps it makes more sense to wait and see how the libcamera project progresses before deciding what common features could be exposed.
Conclusion
==========
Considering the progress we've made so far and our growing confidence with rust-vmm, I think the next device we implement a backend for should be a more complex device. Discussing this with Viresh and Mathieu earlier today, we thought it would be nice if the device was more demo-friendly as CLIs don't often excite.
My initial thought is that a rust-vmm backend for virtio-gpu would fit the bill because:
- already up-streamed in specification and kernel
- known working implementations in QEMU and a C-based vhost-user daemon
- ongoing development would be a good test of Rust's flexibility
I think virtio-can would also be a useful target for the automotive use case. Given there will be a new release of the spec soon, we should certainly keep an eye on it.
Anyway, I welcome people's thoughts.
Hello Alex,
Thank you for the detailed email.
On 5/31/2022 1:07 AM, Alex Bennée via Stratos-dev wrote:
Hi,
This email is driven by a brain storming session at a recent sprint where we considered what VirtIO devices we should look at implementing next. I ended up going through all the assigned device IDs hunting for missing spec discussion and existing drivers so I'd welcome feedback from anybody actively using them - especially as my suppositions about device types I'm not familiar with may be way off!
Work so far
The devices we've tackled so far have been relatively simple ones and more focused on the embedded workloads. Both the i2c and gpio virtio devices allow for a fairly simple backend which can multiplex multiple client VM requests onto a set of real HW presented via the host OS.
We have also done some work on a vhost-user backend for virtio-video and have a working PoC although it is a couple of iterations behind the latest submission to the virtio spec. Continuing work on this is currently paused while Peter works on libcamera related things (although more on that later).
I am not sure what we are using for testing the rust-vmm backends. It would be nice to improve the "vmm-reference" implementation available in the "rust-vmm" project so that we can do independent testing, and it could then also help with testing w/ a Type-1 hypervisor.
Though vmm-reference can't be used as a product, it would be a good example for anyone to test with, without bringing in the complexity of QEMU, CrosVM or Firecracker.
Upstream first
We've been pretty clear about the need to do things in an upstream compatible way which means devices should be:
- properly specified in the OASIS spec
- have at least one driver up-streamed (probably in Linux)
- have a working public backend
Yes, all of the above points are really good, and for me having an open-source guest frontend and backend is very important. The industry trend is to have an open-source frontend (in Linux most of the time), but a lot of implementations keep the backend proprietary even though they follow all aspects of the VirtIO specs. In my view that limits the usefulness of the frontends, and it doesn't help the community with testing coverage. "virtio-scmi" comes to mind as an example.
for Stratos I think we are pretty happy to implement all new backends in Rust under the auspices of the rust-vmm project and the vhost-device repository.
Yes, agreed.
We obviously also need a reasonable use case for why abstracting a HW type is useful. For example i2c was envisioned as useful on mobile devices where a lot of disparate auxillary HW is often hanging of an i2c bus.
mac80211 wlan / 10 mac80211 hwsim wireless simulation device / 29
I am not sure if this is related, but virtio-ethernet keeps coming to us as a requirement; I am not sure what support is available in the various projects, including Xen. This is a non-mobile requirement, particularly from the IoT or Auto segments. It would be nice to do adb over ethernet to the guest VM from the host shell.
memory balloon / 13
This seems like an abandoned attempt at a next generation version of the memory ballooning interface.
virtio-mem has more features compared to the virtio-balloon spec. We had an offline discussion w/ David H last year on whether virtio-mem is suitable for Type-1 hypervisors or not, and in the process we found various limitations, even though the guest driver code for virtio-mem can largely be re-used with modifications to make it work for Type-1. I will check whether Qualcomm can summarize our thoughts on virtio-mem: why it is only suitable for Type-2 as of now, and what changes would be needed to make it work on Type-1 hypervisors.
It is very clear that embedded use cases need something virtio-mem/balloon like, since you can't accurately predict the amount of memory a guest VM requires, and carveouts or other static options can end up wasting memory.
It would be nice to add / hotplug memory for the guest VM, in kernel and userspace, on demand.
Another limitation w/ virtio-mem is that we don't have a unified / open-source way of determining the memory pressure in the guest VM and then asking the host to hotplug memory. From what we read, for the server use case an administrator plugs in memory from the host as needed. It would be nice to have a PSI-based implementation in the guest VM which actively monitors pressure and asks the host OS to hotplug memory.
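As a sketch of what that guest-side piece could look like: the kernel already exposes memory pressure through PSI at /proc/pressure/memory, so a small agent could poll it and decide when to ask the host for more memory. The threshold and the notification step below are placeholders, since the guest-to-host channel is exactly the open design question.

// Sketch of a guest-side agent that watches PSI memory pressure and
// decides when to request a hotplug from the host.
use std::fs;
use std::thread::sleep;
use std::time::Duration;

/// Parse the avg10 value from the "some" line of /proc/pressure/memory,
/// e.g. "some avg10=1.23 avg60=0.50 avg300=0.10 total=12345".
fn memory_pressure_avg10() -> Option<f64> {
    let psi = fs::read_to_string("/proc/pressure/memory").ok()?;
    let some_line = psi.lines().find(|l| l.starts_with("some"))?;
    let avg10 = some_line
        .split_whitespace()
        .find_map(|field| field.strip_prefix("avg10="))?;
    avg10.parse().ok()
}

fn request_memory_hotplug() {
    // Placeholder: a real design would poke the host here, e.g. over vsock
    // or a virtio control queue.
    eprintln!("pressure high: asking host for more memory");
}

fn main() {
    let threshold = 10.0; // percent of time stalled, arbitrary for illustration
    loop {
        if let Some(avg10) = memory_pressure_avg10() {
            if avg10 > threshold {
                request_memory_hotplug();
            }
        }
        sleep(Duration::from_secs(5));
    }
}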
CAN / 36
This is a device of interest to the Automotive industry as it looks to consolidate numerous ECUs into VM based work loads. There was a proposed RFC last year:
https://markmail.org/message/hdxj35fsthypllkt?q=virtio-can+list:org%2Eoasis-...
and it is presumed there are frontend and backend drivers in vendor trees. At the last AGL virtualization expert meeting the Open Synergy guys said they hoped to post new versions of the spec and kernel driver soon:
https://confluence.automotivelinux.org/pages/viewpage.action?spaceKey=VE&...
During our discussion it became clear that while the message bus itself was fairly simple real HW often has a vendor specific control plane to enable specific features. Being able to present this flexibility via the virtio interface without baking in a direct mapping of the HW would be the challenge.
Yes, we are interested in continuing the discussion here and seeing what the suitable approach may be for the CAN protocol in Auto-like use cases.
Parameter Server / 38
This is a proposal for a key-value parameter store over virtio. The exact use case is unclear but I suspect for Arm at least there is overlap with what is already supported by DT and UEFI variables.
The proposal only seems to have been partially archived on the lists:
https://www.mail-archive.com/virtio-dev@lists.oasis-open.org/msg07201.html
It may be Android related?
I have never heard of it, but I will check the mailing list discussion.
---Trilok Soni
Hi,
Not sure if there was anything you wanted me to comment on, but since I'm "the wifi guy" ... :)
mac80211 wlan / 10
FWIW, even though I'm the mac80211 maintainer, I'm not aware of a specification or implementation of this ... I don't know what this is at all.
mac80211 hwsim wireless simulation device / 29
This I implemented (both a driver in mac80211-hwsim in the kernel, as well as a device in wmediumd), but I wouldn't really necessarily recommend using it for anything but testing.
I am not sure if this related but virtio-ethernet keeps coming to us as requirement, I am not sure about the what is the support available in the various projects including Xen. This is a non-Mobile requirement particularly from the IOT or Auto segments. It will be nice to do adb over ethernet in the guest VM from the host shell.
For ethernet you have normal virtio-net.
johannes
On 6/1/2022 1:06 PM, Johannes Berg wrote:
Hi,
Not sure if there was anything you wanted me to comment on, but since I'm "the wifi guy" ... :)
mac80211 wlan / 10
FWIW, even though I'm the mac80211 maintainer, I'm not aware of a specification or implementation of this ... I don't know what this is at all.
mac80211 hwsim wireless simulation device / 29
This I implemented (both a driver in mac80211-hwsim in the kernel, as well as a device in wmediumd), but I wouldn't really necessarily recommend using it for anything but testing.
I am not sure if this related but virtio-ethernet keeps coming to us as requirement, I am not sure about the what is the support available in the various projects including Xen. This is a non-Mobile requirement particularly from the IOT or Auto segments. It will be nice to do adb over ethernet in the guest VM from the host shell.
For ethernet you have normal virtio-net.
Thanks. Virtio-net is available, but an end-to-end use case w/ a Type-1 hypervisor is what I am looking for. I believe CrosVM also supports virtio-net, but I am not sure whether it works w/ Xen or not.
---Trilok Soni
Trilok Soni quic_tsoni@quicinc.com writes:
On 6/1/2022 1:06 PM, Johannes Berg wrote:
Hi, Not sure if there was anything you wanted me to comment on, but since I'm "the wifi guy" ... :)
mac80211 wlan / 10
FWIW, even though I'm the mac80211 maintainer, I'm not aware of a specification or implementation of this ... I don't know what this is at all.
mac80211 hwsim wireless simulation device / 29
This I implemented (both a driver in mac80211-hwsim in the kernel, as well as a device in wmediumd), but I wouldn't really necessarily recommend using it for anything but testing.
I assume the use case for this is something like a virtualised Android OS. For cloud-native testing I guess a simulation device provides enough of what you need to exercise the guest's network stack. However for real deployments you need something to allow selection of networks and reporting of network quality.
I'm not super familiar with the wifi stack, but is this all usually handled in one place or do multiple userspace daemons interrogate the kernel APIs for this information?
If it all comes through one place perhaps it's enough for it to be given a pipe to the host to make those queries - effectively creating a proxy to the real host kernel interface?
I am not sure if this related but virtio-ethernet keeps coming to us as requirement, I am not sure about the what is the support available in the various projects including Xen. This is a non-Mobile requirement particularly from the IOT or Auto segments. It will be nice to do adb over ethernet in the guest VM from the host shell.
For ethernet you have normal virtio-net.
Thanks. Virtio-net is available, but I think e2e usecase w/ Type-1 Hypervisor is what I am looking for. I believe CrosVM also supports Virtio-net but I am not sure if it works w/ Xen or not.
In normal Xen you would have a Dom0 with a traditional kernel driver to service the backend. In a more modular setup you might want to have a driver domain that combines the backend with the real HW driver running as a unikernel?
---Trilok Soni
On Wed, 1 Jun 2022 at 22:02, Trilok Soni via Stratos-dev stratos-dev@op-lists.linaro.org wrote:
Hello Alex,
Thank you for the detailed email.
On 5/31/2022 1:07 AM, Alex Bennée via Stratos-dev wrote:
Hi,
This email is driven by a brain storming session at a recent sprint where we considered what VirtIO devices we should look at implementing next. I ended up going through all the assigned device IDs hunting for missing spec discussion and existing drivers so I'd welcome feedback from anybody actively using them - especially as my suppositions about device types I'm not familiar with may be way off!
Work so far
The devices we've tackled so far have been relatively simple ones and more focused on the embedded workloads. Both the i2c and gpio virtio devices allow for a fairly simple backend which can multiplex multiple client VM requests onto a set of real HW presented via the host OS.
We have also done some work on a vhost-user backend for virtio-video and have a working PoC although it is a couple of iterations behind the latest submission to the virtio spec. Continuing work on this is currently paused while Peter works on libcamera related things (although more on that later).
I am not sure what we are using for the rust-vmm backends testing. It will be nice to improve the "vmm-reference" implementation available at the "rust-vmm" project so that we can do the independent testing and it can then also help testing w/ Type-1 hypervisor.
Though vmm-reference can't be used as product it will be a good example for any one to test without bringing in lot of complexity of the QEMU, CrosVM or FireCracker.
Upstream first
We've been pretty clear about the need to do things in an upstream compatible way which means devices should be:
- properly specified in the OASIS spec
- have at least one driver up-streamed (probably in Linux)
- have a working public backend
Yes, all of the above points are really nice and for me having the open-source guest frontend and backend are very important. Industry trend is to have the open-source frontend (in Linux most of the time) but lot of implementations keep proprietary backend eventhough they follow all the aspects of the Virtio specs. It limits the uses of the frontends in my view and it can't help community on testing coverage. "virtio-scmi" comes to my mind for this example.
For virtio-scmi we are working on adding virtio-scmi support in SCP-firmware, which is also used for bare-metal power coprocessors, and it will be available as a pseudo TA SCMI backend as well. The goal is to use the same SW reference for running on a coprocessor, as an OP-TEE PTA or as a virtio-scmi backend. Upstreaming is ongoing on the SCP-firmware side and has started in libopenamp; the main open point that needs to be tackled properly is an efficient transport channel. This will be the next step.
for Stratos I think we are pretty happy to implement all new backends in Rust under the auspices of the rust-vmm project and the vhost-device repository.
Yes, agreed.
We obviously also need a reasonable use case for why abstracting a HW type is useful. For example i2c was envisioned as useful on mobile devices where a lot of disparate auxillary HW is often hanging of an i2c bus.
mac80211 wlan / 10 mac80211 hwsim wireless simulation device / 29
I am not sure if this related but virtio-ethernet keeps coming to us as requirement, I am not sure about the what is the support available in the various projects including Xen. This is a non-Mobile requirement particularly from the IOT or Auto segments. It will be nice to do adb over ethernet in the guest VM from the host shell.
memory balloon / 13
This seems like an abandoned attempt at a next generation version of the memory ballooning interface.
virtio-mem is having more features compared to the virtio-balloon spec. We had a offline discussion w/ David H last year on if virtio-mem is suitable for the Type-1 hypervisors or not and in the process we had found various limitations eventhough guest driver code for virtio-mem can be re-used lot w/ modifications to make it work for the Type-1. I will check if Qualcomm can summarize our thoughts on the virtio-mem and why it is only suitable for Type-2 as of now and what changes we may need to do if it can be modified to work on Type-1 hypervisors.
It is very clear that in the embedded usecases virtio-mem/balloon like usecase is needed since you can't predict the size of the memory required for guest VM accurately and sometimes it could be waste of the memory if we do the carveouts or some other options.
It will be nice to add / hotplug the memory for the guest VM in kernel and userspace on-demand.
Another limitation w/ virtio-mem is that we don't have unified / open-source way of determining the pressure in the guest VM and then asking the hotplugging of memory from the host. From what we read Administrator plugs memory from the host VM as needed for the server usecase. It will be nice to have a PSI based implementation on the guest VM which will actively monitor the pressure on the guest VM and asks Host OS to hotplug the memory.
CAN / 36
This is a device of interest to the Automotive industry as it looks to consolidate numerous ECUs into VM based work loads. There was a proposed RFC last year:
https://markmail.org/message/hdxj35fsthypllkt?q=virtio-can+list:org%2Eoasis-...
and it is presumed there are frontend and backend drivers in vendor trees. At the last AGL virtualization expert meeting the Open Synergy guys said they hoped to post new versions of the spec and kernel driver soon:
https://confluence.automotivelinux.org/pages/viewpage.action?spaceKey=VE&...
During our discussion it became clear that while the message bus itself was fairly simple real HW often has a vendor specific control plane to enable specific features. Being able to present this flexibility via the virtio interface without baking in a direct mapping of the HW would be the challenge.
Yes, we are interested to continue the discussion here and see what may be the suitable approach for CAN protocol for Auto like usecases.
Parameter Server / 38
This is a proposal for a key-value parameter store over virtio. The exact use case is unclear but I suspect for Arm at least there is overlap with what is already supported by DT and UEFI variables.
The proposal only seems to have been partially archived on the lists:
https://www.mail-archive.com/virtio-dev@lists.oasis-open.org/msg07201.html
It may be Android related?
I have never heard of it, but I will check the mailing list discussion.
---Trilok Soni
Hello Vincent,
On 6/2/2022 12:11 AM, Vincent Guittot wrote:
On Wed, 1 Jun 2022 at 22:02, Trilok Soni via Stratos-dev stratos-dev@op-lists.linaro.org wrote:
Hello Alex,
Thank you for the detailed email.
On 5/31/2022 1:07 AM, Alex Bennée via Stratos-dev wrote:
Hi,
This email is driven by a brain storming session at a recent sprint where we considered what VirtIO devices we should look at implementing next. I ended up going through all the assigned device IDs hunting for missing spec discussion and existing drivers so I'd welcome feedback from anybody actively using them - especially as my suppositions about device types I'm not familiar with may be way off!
Work so far
The devices we've tackled so far have been relatively simple ones and more focused on the embedded workloads. Both the i2c and gpio virtio devices allow for a fairly simple backend which can multiplex multiple client VM requests onto a set of real HW presented via the host OS.
We have also done some work on a vhost-user backend for virtio-video and have a working PoC although it is a couple of iterations behind the latest submission to the virtio spec. Continuing work on this is currently paused while Peter works on libcamera related things (although more on that later).
I am not sure what we are using for the rust-vmm backends testing. It will be nice to improve the "vmm-reference" implementation available at the "rust-vmm" project so that we can do the independent testing and it can then also help testing w/ Type-1 hypervisor.
Though vmm-reference can't be used as product it will be a good example for any one to test without bringing in lot of complexity of the QEMU, CrosVM or FireCracker.
Upstream first
We've been pretty clear about the need to do things in an upstream compatible way which means devices should be:
- properly specified in the OASIS spec
- have at least one driver up-streamed (probably in Linux)
- have a working public backend
Yes, all of the above points are really nice and for me having the open-source guest frontend and backend are very important. Industry trend is to have the open-source frontend (in Linux most of the time) but lot of implementations keep proprietary backend eventhough they follow all the aspects of the Virtio specs. It limits the uses of the frontends in my view and it can't help community on testing coverage. "virtio-scmi" comes to my mind for this example.
For the virtio-scmi we are working on adding virtio-scmi support in SCP-firmware which is also used for bare metal power coprocessors and will be available as a Pseudo TA SCMI backend as well. The goal is to use the same SW reference for running on a coprocessor, as a OP-TEE PTA or as a virtio-scmi backend. The upstream is ongoing for the SCP-firmware side and as started in libopenamp and the main open point that needs to be tackled in a proper way, is an efficient transport channel. This will be the next step
We have now posted a Linux kernel based SCMI backend on the mailing list, so you may want to provide comments there and we can weigh up the options?
https://lkml.org/lkml/2022/6/9/153
---Trilok Soni
Hi Trilok,
On Thu, 9 Jun 2022 at 19:15, Trilok Soni quic_tsoni@quicinc.com wrote:
Hello Vincent,
On 6/2/2022 12:11 AM, Vincent Guittot wrote:
On Wed, 1 Jun 2022 at 22:02, Trilok Soni via Stratos-dev stratos-dev@op-lists.linaro.org wrote:
Hello Alex,
Thank you for the detailed email.
On 5/31/2022 1:07 AM, Alex Bennée via Stratos-dev wrote:
Hi,
This email is driven by a brain storming session at a recent sprint where we considered what VirtIO devices we should look at implementing next. I ended up going through all the assigned device IDs hunting for missing spec discussion and existing drivers so I'd welcome feedback from anybody actively using them - especially as my suppositions about device types I'm not familiar with may be way off!
Work so far
The devices we've tackled so far have been relatively simple ones and more focused on the embedded workloads. Both the i2c and gpio virtio devices allow for a fairly simple backend which can multiplex multiple client VM requests onto a set of real HW presented via the host OS.
We have also done some work on a vhost-user backend for virtio-video and have a working PoC although it is a couple of iterations behind the latest submission to the virtio spec. Continuing work on this is currently paused while Peter works on libcamera related things (although more on that later).
I am not sure what we are using for the rust-vmm backends testing. It will be nice to improve the "vmm-reference" implementation available at the "rust-vmm" project so that we can do the independent testing and it can then also help testing w/ Type-1 hypervisor.
Though vmm-reference can't be used as product it will be a good example for any one to test without bringing in lot of complexity of the QEMU, CrosVM or FireCracker.
Upstream first
We've been pretty clear about the need to do things in an upstream compatible way which means devices should be:
- properly specified in the OASIS spec
- have at least one driver up-streamed (probably in Linux)
- have a working public backend
Yes, all of the above points are really nice and for me having the open-source guest frontend and backend are very important. Industry trend is to have the open-source frontend (in Linux most of the time) but lot of implementations keep proprietary backend eventhough they follow all the aspects of the Virtio specs. It limits the uses of the frontends in my view and it can't help community on testing coverage. "virtio-scmi" comes to my mind for this example.
For the virtio-scmi we are working on adding virtio-scmi support in SCP-firmware which is also used for bare metal power coprocessors and will be available as a Pseudo TA SCMI backend as well. The goal is to use the same SW reference for running on a coprocessor, as a OP-TEE PTA or as a virtio-scmi backend. The upstream is ongoing for the SCP-firmware side and as started in libopenamp and the main open point that needs to be tackled in a proper way, is an efficient transport channel. This will be the next step
We have now posted Linux kernel based scmi backend on the mailing list, so you may want to provide comments there and we can weigh in the options?
Yes, I have seen the patchset on the mailing list but haven't reviewed it yet. It is on my to-do list.
From previous discussion with Satyaki, Azzedine and Srinivas, one goal of the patchset is to leverage the Linux clock drivers that are already implemented in the kernel.
https://lkml.org/lkml/2022/6/9/153
---Trilok Soni
Hi Alex and everyone else, just catching up on some mail and wanted to clarify some things:
Alex Bennée alex.bennee@linaro.org writes:
This email is driven by a brain storming session at a recent sprint where we considered what VirtIO devices we should look at implementing next. I ended up going through all the assigned device IDs hunting for missing spec discussion and existing drivers so I'd welcome feedback from anybody actively using them - especially as my suppositions about device types I'm not familiar with may be way off!
[...snip...]
GPU device / 16
This is now a fairly mature part of the spec and has implementations is the kernel, QEMU and a vhost-user backend. However as is commensurate with the complexity of GPUs there is ongoing development moving from the VirGL OpenGL encapsulation to a thing called GFXSTREAM which is meant to make some things easier.
A potential area of interest here is working out what the differences are in use cases between virtio-gpu and virtio-wayland. virtio-wayland is currently a ChromeOS only invention so hasn't seen any upstreaming or specification work but may make more sense where multiple VMs are drawing only elements of a final display which is composited by a master program. For further reading see Alyssa's write-up:
https://alyssa.is/using-virtio-wl/
I'm not sure how widely used the existing vhost-user backend is for virtio-gpu but it could present an opportunity for a more beefy rust-vmm backend implementation?
As I understand it, virtio-wayland is effectively deprecated in favour of sending Wayland messages over cross-domain virtio-gpu contexts. It's possible to do this now with an upstream kernel, whereas virtio-wayland always required a custom driver in the Chromium kernel.
But crosvm is still the only implementation of a virtio-gpu device that supports Wayland over cross-domain contexts, so it would be great to see a more generic implementation. Especially because, while crosvm can share its virtio-gpu device over vhost-user, it does so in a way that's incompatible with the standardised vhost-user-gpu as implemented by QEMU. When I asked the crosvm developers in their Matrix channel what it would take to use the standard vhost-user-gpu variant, they said that the standard variant was lacking functionality they needed, like mapping and unmapping GPU buffers into the guest.
So if we wanted to push forward with making Wayland over virtio-gpu less crosvm-specific, I suppose the first step would be to figure out with the crosvm developers what functionality is missing in the vhost-user-gpu protocol. That would then make it possible to use crosvm's device (with the Wayland support) with other VMMs like QEMU.
(CCing my colleague Puck, who has also been working with me on getting Wayland over virtio-gpu up and running outside of Chrome OS.)
Alyssa Ross hi@alyssa.is writes:
Hi Alex and everyone else, just catching up on some mail and wanted to clarify some things:
Alex Bennée alex.bennee@linaro.org writes:
This email is driven by a brain storming session at a recent sprint where we considered what VirtIO devices we should look at implementing next. I ended up going through all the assigned device IDs hunting for missing spec discussion and existing drivers so I'd welcome feedback from anybody actively using them - especially as my suppositions about device types I'm not familiar with may be way off!
[...snip...]
GPU device / 16
This is now a fairly mature part of the spec and has implementations is the kernel, QEMU and a vhost-user backend. However as is commensurate with the complexity of GPUs there is ongoing development moving from the VirGL OpenGL encapsulation to a thing called GFXSTREAM which is meant to make some things easier.
A potential area of interest here is working out what the differences are in use cases between virtio-gpu and virtio-wayland. virtio-wayland is currently a ChromeOS only invention so hasn't seen any upstreaming or specification work but may make more sense where multiple VMs are drawing only elements of a final display which is composited by a master program. For further reading see Alyssa's write-up:
https://alyssa.is/using-virtio-wl/
I'm not sure how widely used the existing vhost-user backend is for virtio-gpu but it could present an opportunity for a more beefy rust-vmm backend implementation?
As I understand it, virtio-wayland is effectively deprecated in favour of sending Wayland messages over cross-domain virtio-gpu contexts. It's possible to do this now with an upstream kernel, whereas virtio-wayland always required a custom driver in the Chromium kernel.
That's good to know. I guess there is nothing that prevents the final display of virtual GPUs from multiple guests being mapped onto the final presentation? The automotive use case seems to treat each individual VM with a UI as presenting a surface which the final console manager composites together, depending on safety rules, to display to the user.
But crosvm is still the only implementation of a virtio-gpu device that supports Wayland over cross-domain contexts, so it would be great to see a more generic implementation. Especially because, while crosvm can share its virtio-gpu device over vhost-user, it does so in a way that's incompatible with the standardised vhost-user-gpu as implemented by QEMU. When I asked the crosvm developers in their Matrix channel what it would take to use the standard vhost-user-gpu variant, they said that the standard variant was lacking functionality they needed, like mapping and unmapping GPU buffers into the guest.
Is this related to ensuring allocated buffers are properly aligned in the host address space so the HW can use them without needing to copy them again? I seem to recall this was one of the topics that came up in one of the AGL VirtIO GPU workshops with the OpenSynergy people:
https://confluence.automotivelinux.org/pages/viewpage.action?spaceKey=VE&...
Zero-copy is a goal everyone seems to want, to make the mapping from virtual to real hardware as efficient as possible. Of course zero-copy is very much in opposition to stronger memory isolation between guests and hosts (e.g. Xen/pKVM). Everyone, it seems, wants the moon on a stick.
So if we wanted to push forward with getting making Wayland over virttio-gpu less crosvm specific, I suppose the first step would be to figure out with the crosvm developers what functionality is missing in the vhost-user-gpu protocol. That would then make it possible to use crosvm's device (with the Wayland support) with other VMMs like QEMU.
(CCing my colleage Puck, who has also been working with me on getting Wayland over virtio-gpu up and running outside of Chrome OS.)
Thanks. I'm very much an outside observer when it comes to GPUs so welcome the expert input ;-)
Alex Bennée via Stratos-dev stratos-dev@op-lists.linaro.org writes:
Alyssa Ross hi@alyssa.is writes:
As I understand it, virtio-wayland is effectively deprecated in favour of sending Wayland messages over cross-domain virtio-gpu contexts. It's possible to do this now with an upstream kernel, whereas virtio-wayland always required a custom driver in the Chromium kernel.
That's good to know. I guess there is nothing that prevents the final display of virtual GPUs from multiple guests being mapped onto the final presentation? The automotive use case seems to treat each individual VM with a UI as presenting a surface which the final console manager composites up together depending on safety rules to display to the user.
Well, in the Wayland use case, AIUI virtio-gpu is just a transport for the Wayland protocol + shared memory resources. The simplest case is just sending shared CPU memory buffers around (wl_shm), so there's not really any GPU involved in anything but name. Alternatively, it's possible to use dma-bufs, and graphics acceleration through the virtio-gpu devices, and yes, when doing that it's still possible for the host Wayland compositor to combine everything into one presentation — I think they're all just dma-bufs to it.
Does that make sense? I'm also no expert on this but hopefully I'm not too far off.
On Sat, Sep 03, 2022 at 07:43:08AM +0000, Alyssa Ross wrote:
Hi Alex and everyone else, just catching up on some mail and wanted to clarify some things:
Alex Bennée alex.bennee@linaro.org writes:
This email is driven by a brain storming session at a recent sprint where we considered what VirtIO devices we should look at implementing next. I ended up going through all the assigned device IDs hunting for missing spec discussion and existing drivers so I'd welcome feedback from anybody actively using them - especially as my suppositions about device types I'm not familiar with may be way off!
[...snip...]
GPU device / 16
This is now a fairly mature part of the spec and has implementations is the kernel, QEMU and a vhost-user backend. However as is commensurate with the complexity of GPUs there is ongoing development moving from the VirGL OpenGL encapsulation to a thing called GFXSTREAM which is meant to make some things easier.
A potential area of interest here is working out what the differences are in use cases between virtio-gpu and virtio-wayland. virtio-wayland is currently a ChromeOS only invention so hasn't seen any upstreaming or specification work but may make more sense where multiple VMs are drawing only elements of a final display which is composited by a master program. For further reading see Alyssa's write-up:
https://alyssa.is/using-virtio-wl/
I'm not sure how widely used the existing vhost-user backend is for virtio-gpu but it could present an opportunity for a more beefy rust-vmm backend implementation?
As I understand it, virtio-wayland is effectively deprecated in favour of sending Wayland messages over cross-domain virtio-gpu contexts. It's possible to do this now with an upstream kernel, whereas virtio-wayland always required a custom driver in the Chromium kernel.
But crosvm is still the only implementation of a virtio-gpu device that supports Wayland over cross-domain contexts, so it would be great to see a more generic implementation. Especially because, while crosvm can share its virtio-gpu device over vhost-user, it does so in a way that's incompatible with the standardised vhost-user-gpu as implemented by QEMU. When I asked the crosvm developers in their Matrix channel what it would take to use the standard vhost-user-gpu variant, they said that the standard variant was lacking functionality they needed, like mapping and unmapping GPU buffers into the guest.
That sounds somewhat similar to virtiofs and its DAX Window, which needs vhost-user protocol extensions because of how memory is handled. David Gilbert wrote the QEMU virtiofs DAX patches, which are under development.
I took a quick look at the virtio-gpu specs. If the crosvm behavior you mentioned is covered in the VIRTIO spec then I guess it's the "host visible memory region"?
(If it's not in the VIRTIO spec then a spec change needs to be proposed first and a vhost-user protocol spec change can then support that new virtio-gpu feature.)
The VIRTIO_GPU_CMD_RESOURCE_MAP_BLOB command maps the device's resource into the host visible memory region so that the driver can see it.
The virtiofs DAX window uses vhost-user slave channel messages to provide file descriptors and offsets for QEMU to mmap. QEMU mmaps the file pages into the shared memory region seen by the guest driver.
Maybe an equivalent mechanism is needed for virtio-gpu so a device resource file descriptor can be passed to QEMU and then mmapped so the guest driver can see the pages?
I think it's possible to unify the virtiofs and virtio-gpu extensions to the vhost-user protocol. Two new slave channel messages are needed: "map <fd, offset, len> to shared memory resource <n>" and "unmap <offset, len> from shared memory resource <n>". Both devices could use these messages to implement their respective DAX Window and Blob Resource functionality.
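Purely to make that proposal concrete, here is a hypothetical Rust rendering of what the two message payloads might carry. The field names and layout are invented rather than taken from the vhost-user spec, and the file descriptor itself would travel as SCM_RIGHTS ancillary data on the socket rather than inside the payload.

// Hypothetical payloads for the proposed generic "map"/"unmap" slave
// channel messages; not from the vhost-user spec, just making the
// <fd, offset, len> -> shared memory resource <n> idea concrete.
#[repr(C)]
#[derive(Debug, Clone, Copy)]
struct ShmemMapRequest {
    shm_region: u32, // which shared memory resource <n> to map into
    _pad: u32,
    shm_offset: u64, // offset inside that region
    fd_offset: u64,  // offset inside the passed file descriptor
    len: u64,        // length of the mapping
    flags: u64,      // e.g. read/write permissions
}

#[repr(C)]
#[derive(Debug, Clone, Copy)]
struct ShmemUnmapRequest {
    shm_region: u32,
    _pad: u32,
    shm_offset: u64,
    len: u64,
}

fn main() {
    let map = ShmemMapRequest {
        shm_region: 0,
        _pad: 0,
        shm_offset: 0x10_0000,
        fd_offset: 0,
        len: 0x1000,
        flags: 0x3,
    };
    println!("{:?}", map);
}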
So if we wanted to push forward with getting making Wayland over virttio-gpu less crosvm specific, I suppose the first step would be to figure out with the crosvm developers what functionality is missing in the vhost-user-gpu protocol. That would then make it possible to use crosvm's device (with the Wayland support) with other VMMs like QEMU.
(CCing my colleage Puck, who has also been working with me on getting Wayland over virtio-gpu up and running outside of Chrome OS.)
I have CCed David Gilbert (virtiofs DAX Window) and Gurchetan Singh (virtio-gpu shared memory region).
Stefan
* Stefan Hajnoczi (stefanha@redhat.com) wrote:
On Sat, Sep 03, 2022 at 07:43:08AM +0000, Alyssa Ross wrote:
Hi Alex and everyone else, just catching up on some mail and wanted to clarify some things:
Alex Bennée alex.bennee@linaro.org writes:
This email is driven by a brain storming session at a recent sprint where we considered what VirtIO devices we should look at implementing next. I ended up going through all the assigned device IDs hunting for missing spec discussion and existing drivers so I'd welcome feedback from anybody actively using them - especially as my suppositions about device types I'm not familiar with may be way off!
[...snip...]
GPU device / 16
This is now a fairly mature part of the spec and has implementations is the kernel, QEMU and a vhost-user backend. However as is commensurate with the complexity of GPUs there is ongoing development moving from the VirGL OpenGL encapsulation to a thing called GFXSTREAM which is meant to make some things easier.
A potential area of interest here is working out what the differences are in use cases between virtio-gpu and virtio-wayland. virtio-wayland is currently a ChromeOS only invention so hasn't seen any upstreaming or specification work but may make more sense where multiple VMs are drawing only elements of a final display which is composited by a master program. For further reading see Alyssa's write-up:
https://alyssa.is/using-virtio-wl/
I'm not sure how widely used the existing vhost-user backend is for virtio-gpu but it could present an opportunity for a more beefy rust-vmm backend implementation?
As I understand it, virtio-wayland is effectively deprecated in favour of sending Wayland messages over cross-domain virtio-gpu contexts. It's possible to do this now with an upstream kernel, whereas virtio-wayland always required a custom driver in the Chromium kernel.
But crosvm is still the only implementation of a virtio-gpu device that supports Wayland over cross-domain contexts, so it would be great to see a more generic implementation. Especially because, while crosvm can share its virtio-gpu device over vhost-user, it does so in a way that's incompatible with the standardised vhost-user-gpu as implemented by QEMU. When I asked the crosvm developers in their Matrix channel what it would take to use the standard vhost-user-gpu variant, they said that the standard variant was lacking functionality they needed, like mapping and unmapping GPU buffers into the guest.
That sounds somewhat similar to virtiofs and its DAX Window, which needs vhost-user protocol extensions because of how memory is handled. David Gilbert wrote the QEMU virtiofs DAX patches, which are under development.
I took a quick look at the virtio-gpu specs. If the crosvm behavior you mentioned is covered in the VIRTIO spec then I guess it's the "host visible memory region"?
(If it's not in the VIRTIO spec then a spec change needs to be proposed first and a vhost-user protocol spec change can then support that new virtio-gpu feature.)
The VIRTIO_GPU_CMD_RESOURCE_MAP_BLOB command maps the device's resource into the host visible memory region so that the driver can see it.
The virtiofs DAX window uses vhost-user slave channel messages to provide file descriptors and offsets for QEMU to mmap. QEMU mmaps the file pages into the shared memory region seen by the guest driver.
Maybe an equivalent mechanism is needed for virtio-gpu so a device resource file descriptor can be passed to QEMU and then mmapped so the guest driver can see the pages?
I think it's possible to unify the virtiofs and virtio-gpu extensions to the vhost-user protocol. Two new slave channel messages are needed: "map <fd, offset, len> to shared memory resource <n>" and "unmap <offset, len> from shared memory resource <n>". Both devices could use these messages to implement their respective DAX Window and Blob Resource functionality.
It might be possible; but there's a bunch of lifetime/alignment/etc questions to be answered.
For virtiofs DAX we carve out a chunk of a BAR as a 'cache' (unfortunate name) that we can then do mappings into.
The VHOST_USER_SLAVE_FS_MAP/UNMAP commands can do the mapping: https://gitlab.com/virtio-fs/qemu/-/commit/7c29854da484afd7ca95acbd2e4acfc2c... https://gitlab.com/virtio-fs/qemu/-/commit/f32bc2524035931856aa218ce18efa029...
those might do what you want if you can figure out a way to generalise the BAR to map them into.
There are some problems; KVM gets really really upset if you try and access an area that doesn't have a mapping or is mapped to a truncated file; do you want the guest to be able to crash like that?
Dave
So if we wanted to push forward with getting making Wayland over virttio-gpu less crosvm specific, I suppose the first step would be to figure out with the crosvm developers what functionality is missing in the vhost-user-gpu protocol. That would then make it possible to use crosvm's device (with the Wayland support) with other VMMs like QEMU.
(CCing my colleage Puck, who has also been working with me on getting Wayland over virtio-gpu up and running outside of Chrome OS.)
I have CCed David Gilbert (virtiofs DAX Window) and Gurchetan Singh (virtio-gpu shared memory region).
Stefan
On Tue, Sep 06, 2022 at 06:33:36PM +0100, Dr. David Alan Gilbert wrote:
- Stefan Hajnoczi (stefanha@redhat.com) wrote:
On Sat, Sep 03, 2022 at 07:43:08AM +0000, Alyssa Ross wrote:
Hi Alex and everyone else, just catching up on some mail and wanted to clarify some things:
Alex Bennée alex.bennee@linaro.org writes:
This email is driven by a brain storming session at a recent sprint where we considered what VirtIO devices we should look at implementing next. I ended up going through all the assigned device IDs hunting for missing spec discussion and existing drivers so I'd welcome feedback from anybody actively using them - especially as my suppositions about device types I'm not familiar with may be way off!
[...snip...]
GPU device / 16
This is now a fairly mature part of the spec and has implementations is the kernel, QEMU and a vhost-user backend. However as is commensurate with the complexity of GPUs there is ongoing development moving from the VirGL OpenGL encapsulation to a thing called GFXSTREAM which is meant to make some things easier.
A potential area of interest here is working out what the differences are in use cases between virtio-gpu and virtio-wayland. virtio-wayland is currently a ChromeOS only invention so hasn't seen any upstreaming or specification work but may make more sense where multiple VMs are drawing only elements of a final display which is composited by a master program. For further reading see Alyssa's write-up:
https://alyssa.is/using-virtio-wl/
I'm not sure how widely used the existing vhost-user backend is for virtio-gpu but it could present an opportunity for a more beefy rust-vmm backend implementation?
As I understand it, virtio-wayland is effectively deprecated in favour of sending Wayland messages over cross-domain virtio-gpu contexts. It's possible to do this now with an upstream kernel, whereas virtio-wayland always required a custom driver in the Chromium kernel.
But crosvm is still the only implementation of a virtio-gpu device that supports Wayland over cross-domain contexts, so it would be great to see a more generic implementation. Especially because, while crosvm can share its virtio-gpu device over vhost-user, it does so in a way that's incompatible with the standardised vhost-user-gpu as implemented by QEMU. When I asked the crosvm developers in their Matrix channel what it would take to use the standard vhost-user-gpu variant, they said that the standard variant was lacking functionality they needed, like mapping and unmapping GPU buffers into the guest.
That sounds somewhat similar to virtiofs and its DAX Window, which needs vhost-user protocol extensions because of how memory is handled. David Gilbert wrote the QEMU virtiofs DAX patches, which are under development.
I took a quick look at the virtio-gpu specs. If the crosvm behavior you mentioned is covered in the VIRTIO spec then I guess it's the "host visible memory region"?
(If it's not in the VIRTIO spec then a spec change needs to be proposed first and a vhost-user protocol spec change can then support that new virtio-gpu feature.)
The VIRTIO_GPU_CMD_RESOURCE_MAP_BLOB command maps the device's resource into the host visible memory region so that the driver can see it.
The virtiofs DAX window uses vhost-user slave channel messages to provide file descriptors and offsets for QEMU to mmap. QEMU mmaps the file pages into the shared memory region seen by the guest driver.
Maybe an equivalent mechanism is needed for virtio-gpu so a device resource file descriptor can be passed to QEMU and then mmapped so the guest driver can see the pages?
I think it's possible to unify the virtiofs and virtio-gpu extensions to the vhost-user protocol. Two new slave channel messages are needed: "map <fd, offset, len> to shared memory resource <n>" and "unmap <offset, len> from shared memory resource <n>". Both devices could use these messages to implement their respective DAX Window and Blob Resource functionality.
It might be possible, but there are a bunch of lifetime/alignment/etc. questions to be answered.
For virtiofs DAX we carve out a chunk of a BAR as a 'cache' (unfortunate name) that we can then do mappings into.
The VHOST_USER_SLAVE_FS_MAP/UNMAP commands can do the mapping: https://gitlab.com/virtio-fs/qemu/-/commit/7c29854da484afd7ca95acbd2e4acfc2c... https://gitlab.com/virtio-fs/qemu/-/commit/f32bc2524035931856aa218ce18efa029...
those might do what you want if you can figure out a way to generalise the BAR to map them into.
There are some problems; KVM gets really really upset if you try and access an area that doesn't have a mapping or is mapped to a truncated file; do you want the guest to be able to crash like that?
I think you are pointing out the existing problems with virtiofs map/unmap and not new issues related to virtio-gpu or generalizing the vhost-user messages?
There are a few possibilities for dealing with unmapped ranges in Shared Memory Regions:
1. Reserve the unused Shared Memory Region ranges with mmap(PROT_NONE) so that accesses to unmapped pages result in faults.

2. Map zero pages that are either:
   a. read-only
   b. read-write but discard stores
   c. private/anonymous memory
virtiofs does #1 and has trouble with accesses to unmapped areas because KVM's MMIO dispatch loop gets upset. On top of that virtiofs also needs a way to inject the fault into the guest so that the truncated mmap case can be detected in the guest.
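As a minimal sketch of what option #1 looks like from the device/VMM side (generic illustration of the idea, not the actual QEMU/virtiofsd code): reserve the whole window as PROT_NONE at startup, overlay file pages with MAP_FIXED when servicing a map request, and put the PROT_NONE reservation back on unmap.

  #include <stdint.h>
  #include <sys/mman.h>

  /* Reserve the whole Shared Memory Region up front; anything not
   * explicitly mapped stays PROT_NONE and faults on access. */
  static void *reserve_window(size_t window_len)
  {
      return mmap(NULL, window_len, PROT_NONE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  }

  /* Service a "map" request: overlay part of the window with pages from
   * the file descriptor handed over by the backend. */
  static int map_range(void *window, uint64_t shm_offset,
                       int fd, uint64_t fd_offset, size_t len)
  {
      void *p = mmap((char *)window + shm_offset, len,
                     PROT_READ | PROT_WRITE, MAP_SHARED | MAP_FIXED,
                     fd, (off_t)fd_offset);
      return p == MAP_FAILED ? -1 : 0;
  }

  /* Service an "unmap" request: drop the file mapping and restore the
   * PROT_NONE reservation over that range. */
  static int unmap_range(void *window, uint64_t shm_offset, size_t len)
  {
      void *p = mmap((char *)window + shm_offset, len, PROT_NONE,
                     MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
      return p == MAP_FAILED ? -1 : 0;
  }

Any guest access to a part of the window that is still PROT_NONE then faults in the VMM (or bounces through KVM's MMIO path), which is exactly the awkward case being discussed here.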
The situation is probably easier for virtio-gpu than for virtiofs. I think the underlying host files won't be truncated and guest userspace processes cannot access unmapped pages. So virtio-gpu is less susceptible to unmapped accesses.
But we still need to implement unmapped access semantics. I don't know enough about CPU memory to suggest a solution for injecting unmapped access faults. Maybe you can find someone who can help. I wonder if pmem or CXL devices have similar requirements?
Stefan
* Stefan Hajnoczi (stefanha@redhat.com) wrote:
[...snip...]
I think you are pointing out the existing problems with virtiofs map/unmap and not new issues related to virtio-gpu or generalizing the vhost-user messages?
Right, although what I don't have a feel of here is the semantics of the things that are being mapped in the GPU case, and what possibility that the driver mapping them has to pick some bad offset.
Dave
On Wed, Sep 07, 2022 at 03:09:27PM +0100, Dr. David Alan Gilbert wrote:
[...snip...]
Right, although what I don't have a feel of here is the semantics of the things that are being mapped in the GPU case, and what possibility that the driver mapping them has to pick some bad offset.
I don't know either. I hope Gurchetan or Gerd can explain how the virtio-gpu Shared Memory Region is used and whether accesses to unmapped portions of the region are expected.
Stefan