Matias Ezequiel Vara Larsen matiasevara@gmail.com writes:
Hello Alex,
I can tell you about my experience from working on a PoC (library) that allows the implementation of virtio-devices that are hypervisor/OS agnostic. I focused on two use cases:

1. A type-1 hypervisor in which the backend runs as a VM. This is an in-house hypervisor that does not support VMExits.
2. Linux user-space. In this case, the library is just used to communicate between threads. The goal of this use case is merely testing.
I have chosen virtio-mmio as the way to exchange information between the frontend and the backend. I found it hard to synchronize access to the virtio-mmio layout without VMExits. I had to add some extra bits to allow the frontend and backend to synchronize, which is required during device-status initialization. These extra bits would not be needed if the hypervisor supported VMExits, e.g., KVM.
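To give an idea of the kind of synchronization I mean, here is a minimal sketch in C of such a polled handshake. The layout and names (vm_shared_regs, status_ack, ...) are only illustrative, not what the PoC actually implements:

/* Shared page that holds the virtio-mmio registers plus one extra
 * synchronization word, since there is no trap on the write. */
#include <stdint.h>
#include <stdatomic.h>

struct vm_shared_regs {
    _Atomic uint32_t device_status;  /* written by the frontend */
    _Atomic uint32_t status_ack;     /* extra bits: backend echoes the last
                                        status value it has processed */
};

/* Frontend: without a VMExit the write is not trapped, so wait until the
 * backend has acknowledged each step of the device-status negotiation. */
static void frontend_set_status(struct vm_shared_regs *r, uint32_t status)
{
    atomic_store(&r->device_status, status);
    while (atomic_load(&r->status_ack) != status)
        ;  /* spin (or yield) until the backend has seen the update */
}

/* Backend: poll the status word and acknowledge every change it sees. */
static void backend_poll_status(struct vm_shared_regs *r)
{
    uint32_t s = atomic_load(&r->device_status);
    if (s != atomic_load(&r->status_ack)) {
        /* ... react to ACKNOWLEDGE, DRIVER, FEATURES_OK, DRIVER_OK ... */
        atomic_store(&r->status_ack, s);
    }
}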
The support for a vmexit seems rather fundamental to type-2 hypervisors (like KVM), as the VMM is intrinsically linked to a vCPU's run loop. This makes handling a condition like a trapped MMIO access fairly natural to implement. For type-1 cases the line of execution between "guest accesses MMIO" and "something services that request" is a little trickier to pin down. Ultimately at that point you are relying on the hypervisor itself to make the scheduling decision to stop executing the guest and allow the backend to do its thing. We don't really want to expose the exact details of that, as they probably vary a lot between hypervisors. However, would a backend API semantic that expresses:
- guest has done some MMIO
- hypervisor has stopped execution of the guest
- guest will be restarted when response conditions are set by the backend
cover the needs of a virtio backend, and could the userspace-facing portion of that be agnostic?
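Roughly, I imagine the userspace-facing part could look something like the sketch below. All the names here (mmio_request, backend_ops, handle_mmio, handle_kick) are made up for illustration rather than an existing interface:

#include <stdint.h>
#include <stdbool.h>

struct mmio_request {
    uint64_t addr;      /* guest physical address that was accessed */
    uint32_t len;       /* access width in bytes */
    bool     is_write;  /* true for stores, false for loads */
    uint64_t data;      /* store payload, or load result on return */
};

struct backend_ops {
    /* Called once the hypervisor, however it arranges it, has stopped
     * the guest on an MMIO access. The guest is only resumed after this
     * returns, i.e. once the backend has set the response conditions. */
    int (*handle_mmio)(void *opaque, struct mmio_request *req);

    /* Called when the guest kicks a queue (doorbell, hypercall, ...). */
    void (*handle_kick)(void *opaque, unsigned int queue_index);
};

The point being the backend only sees "the guest touched this address and is paused until I answer", and never how the hypervisor got there.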
Each guest has a memory region that is shared with the backend. This memory region is used by the frontend to allocate the io-buffers. This region also maps the virtio-mmio layout that is initialized by the backend. For the moment, this region is defined when the guest is created. One limitation is that the memory for io-buffers is fixed; at some point, the guest shall be able to balloon this region. Notifications between the frontend and the backend are implemented by using a hypercall. The hypercall mechanism and the memory allocation are abstracted away by a platform layer that exposes an interface that is hypervisor/OS agnostic.
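For illustration, the platform layer could expose an interface roughly like the following. The names are only an approximation I am using for this mail, not the PoC's actual API, and the notification mechanisms mentioned in the comments are examples rather than what each use case actually does:

#include <stddef.h>
#include <stdint.h>

struct platform_ops {
    /* Map (or locate) the shared region that holds the virtio-mmio
     * layout and the fixed pool used for io-buffers. */
    void *(*map_shared_region)(uint64_t guest_id, size_t *size);

    /* Allocate/free io-buffers out of that fixed pool. */
    void *(*alloc_io_buffer)(size_t size);
    void  (*free_io_buffer)(void *buf);

    /* Notify the other side: a hypercall on the type-1 hypervisor, or
     * e.g. an eventfd/futex wake-up in the Linux user-space case. */
    void  (*notify)(uint64_t guest_id, uint32_t vector);
};

The frontend and backend only program against this interface, so the same device code can run on both use cases.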
I split the backend into a virtio-device driver and a backend driver. The virtio-device driver handles the virtqueues, and the backend driver gets packets from the virtqueue for post-processing. For example, in the case of virtio-net, the backend driver would decide whether the packet goes to the hardware or to another virtio-net device. The virtio-device drivers may be implemented in different ways, e.g., a single thread per device, multiple threads per device, or one thread for all the virtio-devices.
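As a rough sketch of that split (with illustrative names only; virtqueue_pop() here stands in for whatever the virtio-device driver uses internally):

#include <stddef.h>
#include <stdbool.h>

struct virtqueue;                        /* owned by the virtio-device driver */
bool virtqueue_pop(struct virtqueue *vq, void **buf, size_t *len);

struct backend_driver {
    /* For virtio-net this decides whether the packet goes to the
     * hardware or to another virtio-net device. */
    void (*process)(void *opaque, void *buf, size_t len);
    void *opaque;
};

/* Virtio-device driver side: pop buffers from the virtqueue and hand
 * them to the backend driver. This loop may run as one thread per
 * device, several threads per device, or one thread serving all the
 * virtio-devices. */
static void virtio_device_service(struct virtqueue *vq,
                                  struct backend_driver *be)
{
    void *buf;
    size_t len;

    while (virtqueue_pop(vq, &buf, &len))
        be->process(be->opaque, buf, len);
}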
In this PoC, I just tackled two very simple use cases. These use cases allowed me to extract some requirements for a hypervisor to support virtio.
Matias
<snip>