Hi all,
I came across this work [1] from Oleksandr today while shuffling
through patches on LKML. I haven't looked at the details but from the
cover letter it seems to provide the same kind of functionality as
P-KVM. I will monitor the progress of this patchset.
Regards,
Mathieu
[1]. https://www.spinics.net/lists/arm-kernel/msg970906.html
Hi,
This email is driven by a brainstorming session at a recent sprint
where we considered what VirtIO devices we should look at implementing
next. I ended up going through all the assigned device IDs hunting for
missing spec discussion and existing drivers so I'd welcome feedback
from anybody actively using them - especially as my suppositions about
device types I'm not familiar with may be way off!
Work so far
===========
The devices we've tackled so far have been relatively simple ones and
more focused on embedded workloads. Both the i2c and gpio virtio
devices allow for a fairly simple backend which can multiplex multiple
client VM requests onto a set of real HW presented via the host OS.
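To illustrate what "fairly simple" means in practice, the host-facing
side of the i2c backend essentially boils down to forwarding each
decoded guest request to the right adapter through the Linux i2c-dev
interface. A rough C sketch (the helper name and arguments are purely
illustrative - the real vhost-device backend is written in Rust and the
virtqueue handling is elided):

  #include <stdint.h>
  #include <sys/ioctl.h>
  #include <linux/i2c.h>
  #include <linux/i2c-dev.h>

  /* Forward one guest transfer to the host adapter. adapter_fd is an
   * open /dev/i2c-N; addr, is_read, buf and len are assumed to have
   * been decoded from the virtio request by the backend. */
  static int forward_xfer(int adapter_fd, uint16_t addr, int is_read,
                          uint8_t *buf, uint16_t len)
  {
      struct i2c_msg msg = {
          .addr  = addr,
          .flags = is_read ? I2C_M_RD : 0,   /* reads are flagged with I2C_M_RD */
          .len   = len,
          .buf   = buf,
      };
      struct i2c_rdwr_ioctl_data data = {
          .msgs  = &msg,
          .nmsgs = 1,
      };

      return ioctl(adapter_fd, I2C_RDWR, &data);
  }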
We have also done some work on a vhost-user backend for virtio-video and
have a working PoC although it is a couple of iterations behind the
latest submission to the virtio spec. Continuing work on this is
currently paused while Peter works on libcamera related things (more on
that later).
Upstream first
==============
We've been pretty clear about the need to do things in an upstream
compatible way, which means devices should:
- be properly specified in the OASIS spec
- have at least one driver up-streamed (probably in Linux)
- have a working public backend
For Stratos I think we are pretty happy to implement all new backends in
Rust under the auspices of the rust-vmm project and the vhost-device
repository.
We obviously also need a reasonable use case for why abstracting a HW
type is useful. For example i2c was envisioned as useful on mobile
devices where a lot of disparate auxiliary HW is often hanging off an i2c
bus.
Current reserved IDs
====================
Looking at the spec there are currently 42 listed device types in the
reserved ID table. While there are quite a lot that have Linux driver
implementations, a number are nothing more than reserved numbers:
ioMemory / 6
------------
No idea what this was meant to be.
rpmsg / 7
---------
Not formalised in the specification but there is a driver in the Linux
kernel. AFAIUI it's a fairly simple wrapper around the existing
rpmsg bus. I think this has also been used for OpenAMP's hypervisor-less
VirtIO experiments to communicate between processor domains.
mac80211 wlan / 10
mac80211 hwsim wireless simulation device / 29
----------------------------------------------
When the discussion about a virtio-wifi comes up there is inevitably a
debate about what the use case is. There are usually two potential use
cases:
- simulation environment
Here the desire is to have something that looks like a real WiFi
device in simulation so the rest of the stack (up from the driver)
can be the same as when running on real HW.
- abstraction environment
Devices with WiFi are different from fixed networking as they need
to deal with portability events like changing networks and reporting
connection status and quality. If the guest VM is responsible for
the UI it needs to gather this information and generally wants its
userspace components to use the same kernel APIs to get it as it
would with real HW.
Neither of these has had a specification up-streamed to OASIS but there
is an implementation of the mac80211_hwsim in the Linux kernel. I found
evidence of a plain 80211 virtio_wifi.c existing in the Android kernel
trees. So far I've been unable to find backends for these devices but I
assume they must exist if the drivers do!
Debates about what sort of features and control channels need to be
supported often run into questions about why existing specifications
can't be expanded (for example expanding virtio-net with a control channel
to report additional wifi related metadata) or why pass-through sockets
can't be used for talking to the host netlink channel.
rproc serial / 11
-----------------
Again this isn't documented in the standard. I'm not sure if this is
related to rpmsg but there is an implementation as part of the kernel
virtio_console code.
virtio CAIF / 12
----------------
Not documented in the specification although there is a driver in the
kernel as part of the orphaned CAIF networking subsystem. From the
kernel documentation this was a sub-system for talking to modem parts.
memory balloon / 13
-------------------
This seems like an abandoned attempt at a next generation version of the
memory ballooning interface.
Timer/Clock device / 17
-----------------------
This looks like a simple reservation with no proposed implementation.
I don't know if there is a case for this on most modern architectures
which usually have virtualised architected timers anyway.
Access to RTC information may be something that is mediated by
firmware/system control buses. For emulation there are a fair number of
industry standard RTC chips modelled and RTC access tends not to be
performance critical.
Signal Distribution Module / 21
-------------------------------
This appears to be an intra-domain communication channel for which an RFC
was posted:
https://lists.oasis-open.org/archives/virtio-dev/201606/msg00030.html
It came with references to kernel and QEMU implementations. I don't know
if this approach has been obviated by other communication channels like
vsock or scmi.
pstore device / 22
------------------
This appears to be a persistent storage device that was intended to
allow guests to dump information like crash dumps. There was a proposed
kernel driver:
https://lwn.net/Articles/698744/
and a proposed QEMU backend:
https://lore.kernel.org/all/1469632111-23260-1-git-send-email-namhyung@kern…
which were never merged. As far as I can tell there was never a proposal for the virtio spec itself.
Video encoder device / 30
Video decoder device / 31
-------------------------
This is an ongoing development which has iterated several versions of
the spec and the kernel side driver.
NitroSecureModule / 33
----------------------
This is a stripped down Trusted Platform Module (TPM) intended to expose
TPM functionality such as cryptographic functions and attestation to
guests. This looks like it is closely tied with AWS's Nitro Enclaves.
I haven't been able to find any public definition of the spec or
implementation details. How would this interact with other TPM
functionality solutions?
Watchdog / 35
-------------
Discussion about this is usually conflated with reset functionality as
the two are intimately related.
An early interest in this was for providing a well specified reset
functionality for firmware running on the -M virt machine model in QEMU. The
need has been reduced somewhat with the provision of the sbsa-ref model
which does have a defined reset pin.
Other questions that would need to be answered include how the
functionality would interact with the hypervisor given a vCPU could
easily not be scheduled by it and therefore miss its kick window.
Currently there have been no proposals for the spec or implementations.
CAN / 36
--------
This is a device of interest to the Automotive industry as it looks to
consolidate numerous ECUs into VM based workloads. There was a proposed
RFC last year:
https://markmail.org/message/hdxj35fsthypllkt?q=virtio-can+list:org%2Eoasis…
and it is presumed there are frontend and backend drivers in vendor
trees. At the last AGL virtualization expert meeting the Open Synergy
guys said they hoped to post new versions of the spec and kernel driver
soon:
https://confluence.automotivelinux.org/pages/viewpage.action?spaceKey=VE&ti…
During our discussion it became clear that while the message bus itself
was fairly simple, real HW often has a vendor specific control plane to
enable specific features. Being able to present this flexibility via the
virtio interface without baking in a direct mapping of the HW would be
the challenge.
Parameter Server / 38
---------------------
This is a proposal for a key-value parameter store over virtio. The
exact use case is unclear but I suspect for Arm at least there is
overlap with what is already supported by DT and UEFI variables.
The proposal only seems to have been partially archived on the lists:
https://www.mail-archive.com/virtio-dev@lists.oasis-open.org/msg07201.html
It may be Android related?
Audio policy device / 39
------------------------
Again I think this stems from the Android world and provides a policy
and control device to work in concert with the virtio-sound device. The
initial proposal to the list is here:
https://www.mail-archive.com/virtio-dev@lists.oasis-open.org/msg07255.html
The idea seems to be to have a control layer for dealing with routing
and priority of multiple audio streams.
Bluetooth device / 40
---------------------
Bluetooth suffers from similar complexity problems to 802.11 WiFi.
However the virtio_bt driver in the kernel concentrates on providing a
pipe for a standardised Host Control Interface (HCI) albeit with support
for a selection of vendor specific commands.
I could not find any submission of the specification for standardisation.
Specified but missing backends?
===============================
GPU device / 16
---------------
This is now a fairly mature part of the spec and has implementations in
the kernel, QEMU and a vhost-user backend. However as is commensurate
with the complexity of GPUs there is ongoing development moving from the
VirGL OpenGL encapsulation to a thing called GFXSTREAM which is meant to
make some things easier.
A potential area of interest here is working out what the differences
are in use cases between virtio-gpu and virtio-wayland. virtio-wayland
is currently a ChromeOS only invention so hasn't seen any upstreaming or
specification work but may make more sense where multiple VMs are
drawing only elements of a final display which is composited by a master
program. For further reading see Alyssa's write-up:
https://alyssa.is/using-virtio-wl/
I'm not sure how widely used the existing vhost-user backend is for
virtio-gpu but it could present an opportunity for a more beefy rust-vmm
backend implementation?
Audio device / 25
-----------------
This has a specification and a working kernel driver. However there
isn't a working backend for QEMU although one has been proposed:
Subject: [RFC PATCH 00/27] Virtio sound card implementation
Date: Thu, 29 Apr 2021 17:34:18 +0530
Message-Id: <20210429120445.694420-1-chouhan.shreyansh2702(a)gmail.com>
This could be a candidate for a rust-vmm version?
Other suggestions
=================
When we started Project Stratos there was a survey amongst members on
where there was interest.
virtio-spi/virtio-greybus
-------------------------
Yet another serial bus. We chose to do i2c but doing another similar bus
wouldn't be pushing the state of the art. We could certainly
mentor/guide someone else who wants to get involved in rust-vmm though.
virtio-tuner/virtio-radio
-------------------------
These were early automotive requests. I don't know where these would sit
in relation to the existing virtio-sound and audio policy devices.
virtio-camera
-------------
We have a prototype of virtio-video but, as the libcamera project shows,
interfacing with modern cameras is quite a complex task these days.
Modern cameras have all sorts of features powered by complex IP blocks
including various amounts of AI. Perhaps it makes more sense to wait and
see how the libcamera project progresses before deciding what common
features could be exposed.
Conclusion
==========
Considering the progress we've made so far and our growing confidence
with rust-vmm I think the next device we implement a backend for should
be a more complex device. Discussing this with Viresh and Mathieu
earlier today we thought it would be nice if the device was more demo
friendly as CLIs don't often excite.
My initial thought is that a rust-vmm backend for virtio-gpu would fit
the bill because:
- already up-streamed in specification and kernel
- known working implementations in QEMU and a C based vhost-user daemon
- ongoing development would be a good test of Rust's flexibility
I think virtio-can would also be a useful target for the automotive use
case. Given there will be a new release of the spec soon we should
certainly keep an eye on it.
Anyway I welcome people's thoughts.
--
Alex Bennée
See also:
Remaining Xen enabling work for rust-vmm - 87pmk472ii.fsf(a)linaro.org
vhost-device outstanding tasks - 87zgj87alq.fsf(a)linaro.org
I am pretty sure the reasons have to do with old x86 PV guests, so I am
CCing Juergen and Boris.
> Hi,
>
> While we've been working on the rust-vmm virtio backends on Xen we
> obviously have to map guest memory into the userspace of the daemon.
> However following the logic of what is going on is a little confusing.
> For example in the Linux backend we have this:
>
> void *osdep_xenforeignmemory_map(xenforeignmemory_handle *fmem,
> uint32_t dom, void *addr,
> int prot, int flags, size_t num,
> const xen_pfn_t arr[/*num*/], int err[/*num*/])
> {
> int fd = fmem->fd;
> privcmd_mmapbatch_v2_t ioctlx;
> size_t i;
> int rc;
>
> addr = mmap(addr, num << XC_PAGE_SHIFT, prot, flags | MAP_SHARED,
> fd, 0);
> if ( addr == MAP_FAILED )
> return NULL;
>
> ioctlx.num = num;
> ioctlx.dom = dom;
> ioctlx.addr = (unsigned long)addr;
> ioctlx.arr = arr;
> ioctlx.err = err;
>
> rc = ioctl(fd, IOCTL_PRIVCMD_MMAPBATCH_V2, &ioctlx);
>
> Where the fd passed down is associated with the /dev/xen/privcmd device
> for issuing hypercalls on userspace's behalf. What is confusing is why
> the function does its own mmap - one would assume the passed addr would
> be associated with an anonymous or file backed mmap region already that
> the calling code has set up. Applying a mmap to a special device seems a
> little odd.
>
> Looking at the implementation on the kernel side it seems the mmap
> handler only sets a few flags:
>
> static int privcmd_mmap(struct file *file, struct vm_area_struct *vma)
> {
> /* DONTCOPY is essential for Xen because copy_page_range doesn't know
> * how to recreate these mappings */
> vma->vm_flags |= VM_IO | VM_PFNMAP | VM_DONTCOPY |
> VM_DONTEXPAND | VM_DONTDUMP;
> vma->vm_ops = &privcmd_vm_ops;
> vma->vm_private_data = NULL;
>
> return 0;
> }
>
> So can I confirm that the mmap of /dev/xen/privcmd is being called for
> side effects? Is it so that when the actual ioctl is called the correct flags
> are set on the pages associated with the user space virtual address
> range?
>
> Can I confirm there shouldn't be any limitation on where and how the
> userspace virtual address space is set up for the mapping of the guest
> memory?
>
> Is there a reason why this isn't done in the ioctl path itself?
>
> I'm trying to understand the differences between Xen and KVM in the API
> choices here. I think the equivalent is the KVM_SET_USER_MEMORY_REGION
> ioctl for KVM which brings a section of the guest physical address space
> into the userspace's vaddr range.
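>
> For reference, my mental model of the KVM flow is roughly the following
> (an illustrative sketch, not taken from any particular VMM - the slot
> number, guest address and names like register_ram/vm_fd/ram_ptr are
> made up):
>
>   #include <stdint.h>
>   #include <sys/ioctl.h>
>   #include <linux/kvm.h>
>
>   /* vm_fd comes from KVM_CREATE_VM on /dev/kvm; ram_ptr is an
>    * ordinary anonymous mmap() the VMM created beforehand. */
>   static int register_ram(int vm_fd, void *ram_ptr, uint64_t ram_size)
>   {
>       struct kvm_userspace_memory_region region = {
>           .slot            = 0,
>           .flags           = 0,
>           .guest_phys_addr = 0x40000000,  /* where the RAM appears to the guest */
>           .memory_size     = ram_size,    /* bytes, page aligned */
>           .userspace_addr  = (uint64_t)(uintptr_t)ram_ptr,
>       };
>
>       return ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &region);
>   }
>
> i.e. the userspace mapping is created however the VMM likes and only
> registered with the hypervisor afterwards, rather than the mmap of a
> special device being part of the mechanism itself.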
Hi All,
I'll be on holiday (moving house) next week so I won't be able to chair
the Stratos sync meeting. As it is the middle of summer I'm going to
propose we skip next week's sync and re-convene on the 3rd of August. Any
objections?
While on the subject of sync meetings, are there any topics to discuss?
We've had some discussions on the next rust-vmm device to implement but
beyond a vague "maybe virtio-gpu to help with demos" I don't think we've
nailed it down. virtio-can also keeps getting mentioned and while useful
I'm wary it's not pushing our exploration of the possibilities of virtio
further.
I did a talk at GST22 last week which was an overview of VirtIO and what
we had done so far as well as discussing some future directions. You can
see the talk at:
https://huawei-events.de/en/gsts22-j83dco-vod.htm
(Day 2 stream, Chapter 6/TS 05:13:00)
In it potential future areas of exploration were:
Improve Xen API
═══════════════
• More standard mmap
• direct irqfd/eventfd routing
Memory Isolation
════════════════
• fat virtqueue
• iommu/grants vs regions
(x-over with pKVM/CCA?)
Bare metal rust
═══════════════
• re-use existing VirtIO logic
• but without POSIX layer
I'd like to get a better steer on what we should focus on next after
we've demoed our existing rust-vmm daemons and the Xen vhost-master
work.
--
Alex Bennée