Hi all,
I came across this work [1] from Oleksandr today while shuffling
through patches on LKML. I haven't looked at the details but from the
cover letter it seems to provide the same kind of functionality as
P-KVM. I will monitor the progress of this patchset.
Regards,
Mathieu
[1]. https://www.spinics.net/lists/arm-kernel/msg970906.html
Hi,
This email is driven by a brainstorming session at a recent sprint
where we considered what VirtIO devices we should look at implementing
next. I ended up going through all the assigned device IDs hunting for
missing spec discussion and existing drivers, so I'd welcome feedback
from anybody actively using them - especially as my suppositions about
device types I'm not familiar with may be way off!
Work so far
===========
The devices we've tackled so far have been relatively simple ones and
more focused on embedded workloads. Both the i2c and gpio virtio
devices allow for a fairly simple backend which can multiplex multiple
client VM requests onto a set of real HW presented via the host OS.
We have also done some work on a vhost-user backend for virtio-video and
have a working PoC although it is a couple of iterations behind the
latest submission to the virtio spec. Continuing work on this is
currently paused while Peter works on libcamera related things (although
more on that later).
Upstream first
==============
We've been pretty clear about the need to do things in an upstream
compatible way which means devices should be:
- properly specified in the OASIS spec
- have at least one driver up-streamed (probably in Linux)
- have a working public backend
For Stratos I think we are pretty happy to implement all new backends in
Rust under the auspices of the rust-vmm project and the vhost-device
repository.
We obviously also need a reasonable use case for why abstracting a HW
type is useful. For example i2c was envisioned as useful on mobile
devices where a lot of disparate auxiliary HW is often hanging off an i2c
bus.
Current reserved IDs
====================
Looking at the spec there are currently 42 listed device types in the
reserved ID table. While there are quite a lot that have Linux driver
implementations, a number are nothing more than reserved numbers:
ioMemory / 6
------------
No idea what this was meant to be.
rpmsg / 7
---------
Not formalised in the specification but there is a driver in the Linux
kernel. AFAIUI it's a fairly simple wrapper around the existing
rpmsg bus. I think this has also been used for OpenAMP's hypervisor-less
VirtIO experiments to communicate between processor domains.
mac80211 wlan / 10
mac80211 hwsim wireless simulation device / 29
----------------------------------------------
When the discussion about a virtio-wifi comes up there is inevitably a
debate about what the use case is. There are usually two potential use
cases:
- simulation environment
Here the desire is to have something that looks like a real WiFi
device in simulation so the rest of the stack (up from the driver)
can be the same as when running on real HW.
- abstraction environment
Devices with WiFi are different from fixed networking as they need
to deal with portability events like changing networks and reporting
connection status and quality. If the guest VM is responsible for
the UI it needs to gather this information and generally wants its
userspace components to use the same kernel APIs to get it as it
would with real HW.
Neither of these has had a specification up-streamed to OASIS, but there
is an implementation of mac80211_hwsim in the Linux kernel. I found
evidence of a plain 80211 virtio_wifi.c existing in the Android kernel
trees. So far I've been unable to find backends for these devices but I
assume they must exist if the drivers do!
Debates about what sort of features and control channels need to be
supported often run into questions about why existing specifications
can't be expanded (for example expanding virtio-net with a control channel
to report additional wifi related metadata) or why pass-through sockets
couldn't be used for talking to the host netlink channel.
rproc serial / 11
-----------------
Again this isn't documented in the standard. I'm not sure if this is
related to rpmsg but there is an implementation as part of the kernel
virtio_console code.
virtio CAIF / 12
----------------
Not documented in the specification although there is a driver in the
kernel as part of the orphaned CAIF networking subsystem. From the
kernel documentation this was a sub-system for talking to modem parts.
memory balloon / 13
-------------------
This seems like an abandoned attempt at a next generation version of the
memory ballooning interface.
Timer/Clock device / 17
-----------------------
This looks like a simple reservation with no proposed implementation.
I don't know if there is a case for this on most modern architectures
which usually have virtualised architected timers anyway.
Access to RTC information may be something that is mediated by
firmware/system control buses. For emulation there are a fair number of
industry standard RTC chips modelled and RTC access tends not to be
performance critical.
Signal Distribution Module / 21
-------------------------------
This appears to be an intra-domain communication channel for which an RFC
was posted:
https://lists.oasis-open.org/archives/virtio-dev/201606/msg00030.html
it came with references to kernel and QEMU implementations. I don't know
if this approach has been obviated by other communication channels like
vsock or scmi.
pstore device / 22
------------------
This appears to be a persistent storage device that was intended to
allow guests to dump information like crash dumps. There was a proposed
kernel driver:
https://lwn.net/Articles/698744/
and a proposed QEMU backend:
https://lore.kernel.org/all/1469632111-23260-1-git-send-email-namhyung@kern…
which were never merged. As far as I can tell there was no proposal for the virtio spec itself.
Video encoder device / 30
Video decoder device / 31
-------------------------
This is an ongoing development which has iterated several versions of
the spec and the kernel side driver.
NitroSecureModule / 33
----------------------
This is a stripped down Trusted Platform Module (TPM) intended to expose
TPM functionality such as cryptographic functions and attestation to
guests. This looks like it is closely tied with AWS's Nitro Enclaves.
I haven't been able to find any public definition of the spec or
implementation details. How would this interact with other TPM
functionality solutions?
Watchdog / 35
-------------
Discussion about this is usually conflated with reset functionality as
the two are intimately related.
An early interest in this was in providing well specified reset
functionality for firmware running on the -M virt machine model in QEMU. The
need has been reduced somewhat with the provision of the sbsa-ref model
which does have a defined reset pin.
Other questions that would need to be answered include how the
functionality would interact with the hypervisor given a vCPU could
easily not be scheduled by it and therefore miss its kick window.
Currently there have been no proposals for the spec or implementations.
CAN / 36
--------
This is a device of interest to the Automotive industry as it looks to
consolidate numerous ECUs into VM based workloads. There was a proposed
RFC last year:
https://markmail.org/message/hdxj35fsthypllkt?q=virtio-can+list:org%2Eoasis…
and it is presumed there are frontend and backend drivers in vendor
trees. At the last AGL virtualization expert meeting the Open Synergy
guys said they hoped to post new versions of the spec and kernel driver
soon:
https://confluence.automotivelinux.org/pages/viewpage.action?spaceKey=VE&ti…
During our discussion it became clear that while the message bus itself
was fairly simple, real HW often has a vendor specific control plane to
enable specific features. Being able to present this flexibility via the
virtio interface without baking in a direct mapping of the HW would be
the challenge.
Parameter Server / 38
---------------------
This is a proposal for a key-value parameter store over virtio. The
exact use case is unclear but I suspect for Arm at least there is
overlap with what is already supported by DT and UEFI variables.
The proposal only seems to have been partially archived on the lists:
https://www.mail-archive.com/virtio-dev@lists.oasis-open.org/msg07201.html
It may be Android related?
Audio policy device / 39
------------------------
Again I think this stems from the Android world and provides a policy
and control device to work in concert with the virtio-sound device. The
initial proposal to the list is here:
https://www.mail-archive.com/virtio-dev@lists.oasis-open.org/msg07255.html
The idea seems to be to have a control layer for dealing with routing
and priority of multiple audio streams.
Bluetooth device / 40
---------------------
Bluetooth suffers from similar complexity problems as 802.11 WiFi.
However the virtio_bt driver in the kernel concentrates on providing a
pipe for a standardised Host Control Interface (HCI) albeit with support
for a selection of vendor specific commands.
I could not find any submission of the specification for standardisation.
Specified but missing backends?
===============================
GPU device / 16
---------------
This is now a fairly mature part of the spec and has implementations in
the kernel, QEMU and a vhost-user backend. However, as befits the
complexity of GPUs, there is ongoing development moving from the
VirGL OpenGL encapsulation to a thing called GFXSTREAM which is meant to
make some things easier.
A potential area of interest here is working out what the differences
are in use cases between virtio-gpu and virtio-wayland. virtio-wayland
is currently a ChromeOS only invention so hasn't seen any upstreaming or
specification work but may make more sense where multiple VMs are
drawing only elements of a final display which is composited by a master
program. For further reading see Alyssa's write-up:
https://alyssa.is/using-virtio-wl/
I'm not sure how widely used the existing vhost-user backend is for
virtio-gpu but it could present an opportunity for a more beefy rust-vmm
backend implementation?
Audio device / 25
-----------------
This has a specification and a working kernel driver. However there
isn't a working backend for QEMU although one has been proposed:
Subject: [RFC PATCH 00/27] Virtio sound card implementation
Date: Thu, 29 Apr 2021 17:34:18 +0530
Message-Id: <20210429120445.694420-1-chouhan.shreyansh2702(a)gmail.com>
This could be a candidate for a rust-vmm version?
Other suggestions
=================
When we started Project Stratos there was a survey amongst members on
where there was interest.
virtio-spi/virtio-greybus
-------------------------
Yet another serial bus. We chose to do i2c but doing another similar bus
wouldn't be pushing the state of the art. We could certainly
mentor/guide someone else who wants to get involved in rust-vmm though.
virtio-tuner/virtio-radio
-------------------------
These were early automotive requests. I don't know where these would sit
in relation to the existing virtio-sound and audio policy devices.
virtio-camera
-------------
We have a prototype of virtio-video but, as the libcamera project shows,
interfacing with modern cameras is quite a complex task these days.
Modern cameras have all sorts of features powered by complex IP blocks
including various amounts of AI. Perhaps it makes more sense to wait and
see how the libcamera project progresses before deciding what common
features could be exposed.
Conclusion
==========
Considering the progress we've made so far and our growing confidence
with rust-vmm I think the next device we implement a backend for should
be a more complex device. Discussing this with Viresh and Mathieu
earlier today we thought it would be nice if the device was more demo
friendly, as CLIs don't often excite.
My initial thought is that a rust-vmm backend for virtio-gpu would fit
the bill because:
- already up-streamed in specification and kernel
- known working implementations in QEMU and C based vhost-user daemon
- ongoing development would be a good test of Rust's flexibility
I think virtio-can would also be a useful target for the automotive use
case. Given there will be a new release of the spec soon we should
certainly keep an eye on it.
Anyway I welcome people's thoughts.
--
Alex Bennée
See also:
Remaining Xen enabling work for rust-vmm - 87pmk472ii.fsf(a)linaro.org
vhost-device outstanding tasks - 87zgj87alq.fsf(a)linaro.org
I am pretty sure the reasons have to do with old x86 PV guests, so I am
CCing Juergen and Boris.
> Hi,
>
> While we've been working on the rust-vmm virtio backends on Xen we
> obviously have to map guest memory into the userspace of the daemon.
> However following the logic of what is going on is a little confusing.
> For example in the Linux backend we have this:
>
> void *osdep_xenforeignmemory_map(xenforeignmemory_handle *fmem,
> uint32_t dom, void *addr,
> int prot, int flags, size_t num,
> const xen_pfn_t arr[/*num*/], int err[/*num*/])
> {
> int fd = fmem->fd;
> privcmd_mmapbatch_v2_t ioctlx;
> size_t i;
> int rc;
>
> addr = mmap(addr, num << XC_PAGE_SHIFT, prot, flags | MAP_SHARED,
> fd, 0);
> if ( addr == MAP_FAILED )
> return NULL;
>
> ioctlx.num = num;
> ioctlx.dom = dom;
> ioctlx.addr = (unsigned long)addr;
> ioctlx.arr = arr;
> ioctlx.err = err;
>
> rc = ioctl(fd, IOCTL_PRIVCMD_MMAPBATCH_V2, &ioctlx);
>
> Where the fd passed down is associated with the /dev/xen/privcmd device
> for issuing hypercalls on userspace's behalf. What is confusing is why
> the function does its own mmap - one would assume the passed addr would
> be associated with an anonymous or file-backed mmap region already that
> the calling code has set up. Applying a mmap to a special device seems a
> little odd.
>
> Looking at the implementation on the kernel side it seems the mmap
> handler only sets a few flags:
>
> static int privcmd_mmap(struct file *file, struct vm_area_struct *vma)
> {
> /* DONTCOPY is essential for Xen because copy_page_range doesn't know
> * how to recreate these mappings */
> vma->vm_flags |= VM_IO | VM_PFNMAP | VM_DONTCOPY |
> VM_DONTEXPAND | VM_DONTDUMP;
> vma->vm_ops = &privcmd_vm_ops;
> vma->vm_private_data = NULL;
>
> return 0;
> }
>
> So can I confirm that the mmap of /dev/xen/privcmd is being called for
> side effects? Is it so when the actual ioctl is called the correct flags
> are set on the pages associated with the user space virtual address
> range?
>
> Can I confirm there shouldn't be any limitation on where and how the
> userspace virtual address space is set up for the mapping of the guest
> memory?
>
> Is there a reason why this isn't done in the ioctl path itself?
>
> I'm trying to understand the differences between Xen and KVM in the API
> choices here. I think the equivalent is the KVM_SET_USER_MEMORY_REGION
> ioctl for KVM which brings a section of the guest physical address space
> into the userspace's vaddr range.
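For reference, this is roughly what the KVM flow being compared against looks
like when driven through the rust-vmm kvm-ioctls and kvm-bindings crates (a
sketch only - the slot number, guest physical address and size are arbitrary
illustrative values):

use kvm_bindings::kvm_userspace_memory_region;
use kvm_ioctls::Kvm;

fn main() {
    let kvm = Kvm::new().expect("open /dev/kvm");
    let vm = kvm.create_vm().expect("create VM");

    // Back the guest RAM with an ordinary anonymous mapping in our own
    // address space - no special device is involved.
    let size = 0x10000usize;
    let host_addr = unsafe {
        libc::mmap(std::ptr::null_mut(), size,
                   libc::PROT_READ | libc::PROT_WRITE,
                   libc::MAP_PRIVATE | libc::MAP_ANONYMOUS | libc::MAP_NORESERVE,
                   -1, 0)
    };
    assert_ne!(host_addr, libc::MAP_FAILED);

    // Tell KVM that this range of guest physical addresses is backed by the
    // mapping above - a single ioctl, with no per-page follow-up.
    let region = kvm_userspace_memory_region {
        slot: 0,
        flags: 0,
        guest_phys_addr: 0x4000_0000,
        memory_size: size as u64,
        userspace_addr: host_addr as u64,
    };
    unsafe { vm.set_user_memory_region(region).expect("set memory region") };
}

The contrast with the quoted osdep_xenforeignmemory_map() is that KVM never
needs a preparatory mmap() of a special device before the ioctl.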
Hello,
We verified our hypervisor-agnostic Rust based vhost-user backends with a
QEMU based setup earlier, but there was growing concern about whether they
were truly hypervisor-agnostic.
In order to prove that, we decided to give it a try with Xen, a type-1
bare-metal hypervisor.
We are happy to announce that we were able to make progress on that front and
have a working setup where we can test our existing Rust based backends, like
I2C, GPIO, RNG (though only I2C is tested as of now) over Xen.
Key components:
--------------
- Xen: https://github.com/vireshk/xen
Xen requires MMIO and device specific support in order to populate the
required devices at the guest. This tree contains four patches on top of
mainline Xen, two from Oleksandr (mmio/disk) and two from me (I2C).
- libxen-sys: https://github.com/vireshk/libxen-sys
We currently depend on the userspace tools/libraries provided by Xen, like
xendevicemodel, xenevtchn, xenforeignmemory, etc. This crate provides Rust
wrappers over those calls, generated automatically with the help of the
bindgen utility, that allow us to use the installed Xen libraries. We plan
to replace this with the Rust based "oxerun" (see below) in the longer run.
- oxerun (WIP): https://gitlab.com/mathieupoirier/oxerun/-/tree/xen-ioctls
This is a Rust based implementation of the ioctls and hypercalls to Xen. It is
WIP and should eventually replace the "libxen-sys" crate entirely (which wraps
the C based implementation of the same).
- vhost-device: https://github.com/vireshk/vhost-device
These are Rust based vhost-user backends, maintained inside the rust-vmm
project. This already contains support for I2C and RNG, while GPIO is under
review. These do not need to be modified for a particular hypervisor and are
truly hypervisor-agnostic.
Ideally the backends are hypervisor agnostic, as explained earlier, but
because of the way Xen maps the guest memory currently, we need a minor update
for the backends to work. Xen maps the memory via a kernel file
/dev/xen/privcmd, which needs calls to mmap() followed by an ioctl() to make
it work. For this a hack has been added to one of the rust-vmm crates,
vm-memory, which is used by vhost-user.
https://github.com/vireshk/vm-memory/commit/54b56c4dd7293428edbd7731c4dbe57…
The update to vm-memory is responsible for doing the ioctl() after the already
present mmap(); a rough sketch of that sequence is included below, after this
list of components.
- vhost-user-master (WIP): https://github.com/vireshk/vhost-user-master
This implements the master side interface of the vhost protocol, and is like
the vhost-user-backend (https://github.com/rust-vmm/vhost-user-backend) crate
maintained inside the rust-vmm project, which provides similar infrastructure
for the backends to use. This shall be hypervisor independent and provide APIs
for the hypervisor specific implementations. This will eventually be
maintained inside the rust-vmm project and used by all Rust based hypervisors.
- xen-vhost-master (WIP): https://github.com/vireshk/xen-vhost-master
This is the Xen specific implementation and uses the APIs provided by
"vhost-user-master", "oxerun" and "libxen-sys" crates for its functioning.
It is designed based on EPAM's "virtio-disk" repository
(https://github.com/xen-troops/virtio-disk/) and is quite similar to it.
One can see the analogy as:
Virtio-disk == "Xen-vhost-master" + "vhost-user-master" + "oxerun" + "libxen-sys" + "vhost-device".
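To make the two-step mapping concrete, here is a rough Rust sketch of what the
vm-memory change has to do. The ioctl request number and struct layout shown
here are placeholders - in reality both come from the bindgen-generated
bindings (libxen-sys) - and the function name is made up for illustration:

use std::fs::OpenOptions;
use std::os::unix::io::AsRawFd;

// Placeholder only: the real request number must come from the generated
// bindings, not from here.
const IOCTL_PRIVCMD_MMAPBATCH_V2: libc::c_ulong = 0;

// Mirrors the fields used with privcmd_mmapbatch_v2_t in the C code quoted
// earlier in the thread; the authoritative layout is in the bindings.
#[repr(C)]
struct PrivcmdMmapBatchV2 {
    num: u32,        // number of pages to map
    dom: u32,        // domid of the guest
    addr: u64,       // userspace address returned by mmap() below
    arr: *const u64, // guest frame numbers (xen_pfn_t)
    err: *mut i32,   // per-page error codes filled in by the kernel
}

fn map_guest_frames(dom: u32, gfns: &[u64]) -> std::io::Result<*mut libc::c_void> {
    let privcmd = OpenOptions::new().read(true).write(true)
        .open("/dev/xen/privcmd")?;
    let len = gfns.len() << 12; // XC_PAGE_SHIFT

    // Step 1: mmap() of the special device only creates the VMA and sets the
    // VM_IO/VM_PFNMAP style flags - nothing is actually mapped yet.
    let addr = unsafe {
        libc::mmap(std::ptr::null_mut(), len,
                   libc::PROT_READ | libc::PROT_WRITE,
                   libc::MAP_SHARED, privcmd.as_raw_fd(), 0)
    };
    if addr == libc::MAP_FAILED {
        return Err(std::io::Error::last_os_error());
    }

    // Step 2: the ioctl() does the real work of mapping the foreign guest
    // frames into the VMA created above.
    let mut err = vec![0i32; gfns.len()];
    let batch = PrivcmdMmapBatchV2 {
        num: gfns.len() as u32,
        dom,
        addr: addr as u64,
        arr: gfns.as_ptr(),
        err: err.as_mut_ptr(),
    };
    let rc = unsafe { libc::ioctl(privcmd.as_raw_fd(), IOCTL_PRIVCMD_MMAPBATCH_V2, &batch) };
    if rc != 0 {
        return Err(std::io::Error::last_os_error());
    }
    Ok(addr)
}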
Test setup:
----------
1. Build Xen:
$ ./configure --libdir=/usr/lib --build=x86_64-unknown-linux-gnu --host=aarch64-linux-gnu --disable-docs --disable-golang --disable-ocamltools --with-system-qemu=/root/qemu/build/i386-softmmu/qemu-system-i386;
$ make -j9 debball CROSS_COMPILE=aarch64-linux-gnu- XEN_TARGET_ARCH=arm64
2. Run Xen via Qemu on X86 machine:
$ qemu-system-aarch64 -machine virt,virtualization=on -cpu cortex-a57 -serial mon:stdio \
-device virtio-net-pci,netdev=net0 -netdev user,id=net0,hostfwd=tcp::8022-:22 \
-device virtio-scsi-pci -drive file=/home/vireshk/virtio/debian-bullseye-arm64.qcow2,index=0,id=hd0,if=none,format=qcow2 -device scsi-hd,drive=hd0 \
-display none -m 8192 -smp 8 -kernel /home/vireshk/virtio/xen/xen \
-append "dom0_mem=5G,max:5G dom0_max_vcpus=7 loglvl=all guest_loglvl=all" \
-device guest-loader,addr=0x46000000,kernel=/home/vireshk/kernel/barm64/arch/arm64/boot/Image,bootargs="root=/dev/sda2 console=hvc0 earlyprintk=xen" \
-device ds1338,address=0x20 # This is required to create a virtual I2C based RTC device on Dom0.
This should get Dom0 up and running.
3. Build rust crates:
$ cd /root/
$ git clone https://github.com/vireshk/xen-vhost-master
$ cd xen-vhost-master
$ cargo build
$ cd ../
$ git clone https://github.com/vireshk/vhost-device
$ cd vhost-device
$ cargo build
4. Setup I2C based RTC device
$ echo ds1338 0x20 > /sys/bus/i2c/devices/i2c-0/new_device; echo 0-0020 > /sys/bus/i2c/devices/0-0020/driver/unbind
5. Let's run everything now
# Start the I2C backend in one terminal (open new terminal with "ssh
# root@localhost -p8022"). This tells the I2C backend to hook up to
# "/root/vi2c.sock0" socket and wait for the master to start transacting.
$ /root/vhost-device/target/debug/vhost-device-i2c -s /root/vi2c.sock -c 1 -l 0:32
# Start the xen-vhost-master in another terminal. This provides the path of
# the socket to the master side and the device to look from Xen, which is I2C
# here.
$ /root/xen-vhost-master/target/debug/xen-vhost-master --socket-path /root/vi2c.sock0 --name i2c
# Start guest in another terminal, i2c_domu.conf is attached. The guest kernel
# should have Virtio related config options enabled, along with i2c-virtio
# driver.
$ xl create -c i2c_domu.conf
# The guest should boot fine now. Once the guest is up, you can create the I2C
# RTC device and use it. The following will create /dev/rtc0 in the guest,
# which you can then read and set with the 'hwclock' utility.
$ echo ds1338 0x20 > /sys/bus/i2c/devices/i2c-0/new_device
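# Read the new RTC from within the guest, for example (assuming it appears
# as /dev/rtc0):
$ hwclock -r -f /dev/rtc0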
Hope this helps.
--
viresh
Hi,
In my last survey I went through all of the currently assigned device
numbers and attempted to glean their current
status. However we currently don't have any devices that might be useful
in a Cloud Native development environment.
To define terms, cloud native is the idea that you can build a workload
processing element as a VM and run it in the cloud. It consumes data
from virtio devices and processes it in some way. This VM can then be
moved from being hosted in the cloud onto a real platform which
still provides its data via a virtio device. The idea being you get the
same behaviour (as well as allowing for data to be recorded so future
debugging/tuning work can be done in the cloud).
Currently most of the virtio devices are actually data sinks - for
example for virtio-video the guest pushes data to the video device for
it to process. What we need is a device (or devices?) acting as a source of
data to feed these workloads.
Why virtio-media-source? Well, rather than creating a device for every
data type, maybe it would make more sense to have a generic device which
can advertise the data stream info in its configuration space. This
would allow the kernel driver to then route the data to the appropriate
kernel subsystem (e.g. v4l or alsa).
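As a purely hypothetical illustration (none of these names or fields come
from any spec - they are made up for discussion), the config space for such
a generic source device might carry something along these lines, which the
driver would use to pick the subsystem to register with:

/// Stream classes the device could advertise (hypothetical).
#[repr(u32)]
enum StreamClass {
    Video = 1, // route to v4l2
    Audio = 2, // route to ALSA
}

/// Hypothetical layout the driver would read from the virtio config space.
#[repr(C)]
struct MediaSourceConfig {
    stream_class: u32,       // one of StreamClass
    format: u32,             // class-specific format id (fourcc, PCM format, ...)
    rate: u32,               // frame rate or sample rate
    channels_or_planes: u32, // audio channels or video planes
    width: u32,              // video only
    height: u32,             // video only
}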
Would having a virtio driver potentially feeding different sub-systems
based on configuration be a problem?
What do people think?
--
Alex Bennée