Hi Alex,
On Mon, Apr 12, 2021 at 08:34:54AM +0000, Alex Bennée via Stratos-dev wrote:
Alex Bennée via Stratos-dev stratos-dev@op-lists.linaro.org writes:
Hi All,
We've been discussing various ideas for Stratos in and around STR-7 (common virtio library). I'd originally de-emphasised the STR-7 work because I wasn't sure if it was duplicated effort, given we already had libvhost-user as well as interest in rust-vmm for portable backends in user-space. However, we have seen from Windriver's hypervisor-less virtio, NXP's Zephyr/Jailhouse work and the requirements for the SCMI server that there is a use case for a small, liberally licensed C library that is suitable for embedding in lightweight backends without a full Linux stack behind them. These workloads would run in simple command loops, RTOSes or unikernels.
Given the multiple interested parties I'm hoping we have enough people who can devote time to collaborate on the project to make the following realistic over the next cycle, culminating in the demo described below in 6 months:
Components
portable virtio backend library
- source based library (so you include directly in your project)
- liberally licensed (Apache? to facilitate above)
- tailored for non-POSIX, limited resource setups
- e.g. avoid malloc/free, provide abstraction hooks where needed
- not assume OS facilities (so embeddable in RTOS or Unikernel)
- static virtio configuration supplied from outside library (location of queues etc)
- hypervisor agnostic
- provide a hook to signal when queues need processing
Following on from a discussion with Vincent and Akashi-san last week we need to think more about how to make this hypervisor agnostic. There will always be some code that has to live outside the library but if it ends up being the same amount again what have we gained?
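To make that boundary concrete, here is a rough sketch of how the hypervisor-specific hooks and the externally supplied state might look; all of the names are placeholders rather than an agreed API:

  #include <stddef.h>
  #include <stdint.h>

  /* Sketch only: the surrounding environment (RTOS, unikernel or bare
   * command loop) fills in these hooks; the library itself never calls
   * into OS or hypervisor services directly. */
  struct virtio_be_ops {
      /* kick the front-end after the used ring has been updated */
      void (*notify_frontend)(void *opaque);
      /* translate a front-end "guest physical" address to a local pointer
       * using whatever static mapping the deployment has set up */
      void *(*gpa_to_va)(void *opaque, uint64_t gpa, size_t len);
      /* optional trace hook so the library needs no stdio */
      void (*log)(void *opaque, const char *msg);
  };

  struct virtio_be_dev {
      const struct virtio_be_ops *ops;
      void *opaque;                   /* per-hypervisor glue state */
      uint32_t status;                /* virtio device status register */
      /* static queue layout, supplied from outside the library */
      uint64_t desc_gpa, avail_gpa, used_gpa;
      uint16_t queue_size;
  };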
I suspect this should be a from scratch implementation but it's certainly worth investigating the BSD implementation as Windriver have suggested.
SCMI server
This is the work product of STR-4, embedded in an RTOS. I'm suggesting Zephyr makes the most sense here given the previous work done by Peng and Akashi-san but I guess an argument could be made for a unikernel. I would suggest demonstrating portability to a unikernel would be a stretch goal for the demo.
The server would be *build* time selectable to deploy either in a Jailhouse partition or as a Xen DomU. Either way there will need to be some out-of-band communication of the location of the virtqueues and memory maps - I assume via DTB.
From our discussion last week Zephyr's DTB support is very limited and not designed to cope with dynamic setups. So rather than having two build-time selectable configurations we should opt for a single binary with a fixed expectation of where the remote guest's memory and virtqueues will exist in its memory space.
I'm still a bit skeptical about the "single binary with a fixed expectation" concept.
- Is it feasible to expect that all the hypervisors would configure a BE domain in the same way (in respect of assigning a memory region or an interrupt number)?
- What if we want a BE domain to provide different types of virtio devices, or to support more than one frontend domain at the same time?
How can a single binary without any ability for dynamic configuration deal with those requirements?
It will then be up to the VMM setting things up to ensure everything is mapped in the appropriate place in the RTOS memory map. There would also be a fixed IRQ map for signalling changes towards the RTOS and a fixed doorbell for signalling the other way.
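As an illustration (addresses invented), the fixed contract between the VMM/glue and the RTOS blob could be captured in one static, build-time description:

  #include <stdint.h>

  /* Illustration only: a single build-time description of where the
   * front-end's memory and the signalling resources appear in the BE's
   * address space. Real values would be agreed per platform/hypervisor. */
  struct stratos_be_config {
      uint64_t guest_ram_base;   /* FE memory as mapped into the BE */
      uint64_t guest_ram_size;
      uint64_t virtq_base;       /* start of the shared virtqueue area */
      uint32_t notify_irq;       /* IRQ raised towards the BE on queue update */
      uint64_t doorbell_addr;    /* BE writes here to signal the other way */
  };

  static const struct stratos_be_config be_config = {
      .guest_ram_base = 0x80000000ULL,   /* example values only */
      .guest_ram_size = 0x40000000ULL,
      .virtq_base     = 0xc0000000ULL,
      .notify_irq     = 33,
      .doorbell_addr  = 0x0a000000ULL,
  };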
We should think of two different phases: 1) virtio setup/negotiation via MMIO configuration space 2) (virtio device specific) operation via virtqueue
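Roughly speaking, phase 1 is a register dispatch on the virtio-mmio layout while phase 2 is plain virtqueue processing. A minimal skeleton, reusing the placeholder device struct sketched earlier (only a few registers shown, offsets per the virtio-mmio spec; virtio_be_process_queue() stands in for the device-specific handler):

  #include <stdint.h>

  #define VIRTIO_MMIO_MAGIC_VALUE   0x000   /* reads back "virt" */
  #define VIRTIO_MMIO_QUEUE_NOTIFY  0x050
  #define VIRTIO_MMIO_STATUS        0x070

  void virtio_be_process_queue(struct virtio_be_dev *dev, uint16_t queue);

  /* Phase 1: the transport/hypervisor glue traps the FE's MMIO accesses
   * and feeds them to the library. */
  uint64_t virtio_be_mmio_read(struct virtio_be_dev *dev, uint64_t offset)
  {
      switch (offset) {
      case VIRTIO_MMIO_MAGIC_VALUE: return 0x74726976; /* "virt" */
      case VIRTIO_MMIO_STATUS:      return dev->status;
      /* ... features, queue setup registers, config space ... */
      default:                      return 0;
      }
  }

  void virtio_be_mmio_write(struct virtio_be_dev *dev, uint64_t offset,
                            uint64_t value)
  {
      switch (offset) {
      case VIRTIO_MMIO_STATUS:
          dev->status = (uint32_t)value;     /* negotiation state machine */
          break;
      case VIRTIO_MMIO_QUEUE_NOTIFY:
          /* Phase 2: device-specific virtqueue processing */
          virtio_be_process_queue(dev, (uint16_t)value);
          break;
      /* ... */
      }
  }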
Anyway, the signalling mechanism can differ from hypervisor to hypervisor. On Xen, for example:
- notification of the FE's MMIO accesses to the configuration space will be trapped and delivered via an event channel plus a dedicated "shared IO page"
- notification of a virtqueue update from BE to FE will be done via another event channel, not by an interrupt
Another topic is "virtio device specific configuration parameters", for instance a file path as backend storage for a virtio block device. We might need an out-of-band (side-band?) communication channel to feed that information to a BE domain. (For Xen, xenstore is used for this purpose in EPAM's virtio-disk implementation.)
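Whatever the channel ends up being (xenstore, DTB, command line), the library probably only wants to see the result as a plain per-device parameter block; something like this, with purely hypothetical names:

  #include <stdbool.h>
  #include <stdint.h>

  /* Hypothetical example: device-specific parameters delivered out of band
   * (xenstore, DTB, etc.) by the glue code and handed to the backend at
   * initialisation time. */
  struct virtio_blk_be_params {
      const char *image_path;   /* backing storage for the virtio block device */
      uint64_t    capacity;     /* capacity in 512-byte sectors */
      bool        read_only;
  };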
I'm unfamiliar with the RTOS build process but I guess this would be a single git repository with the RTOS and glue code and git sub-projects for the virtio and scmi libraries?
I think that Zephyr's build process (cmake) allows us to import a library from an external repository.
Deployments
To demonstrate portability we would have:
- Xen hypervisor
- Dom0 with Linux/xen tools
- DomU with Linux with a virtio-scmi front-end
- DomU with RTOS/SCMI server with virtio-scmi back-end
The Dom0 in this case is just for convenience of the demo as we don't currently have a fully working dom0-less setup. The key communication is between the two DomU guests.
However the Dom0 will also need the glue code to set up the communication and memory mapping between the two DomU guests.
I initially thought so, but after looking into the Xen APIs, I found that we have to issue the IOREQ-related hypercalls directly from the BE domain. At least under the current implementation, dom0 cannot issue them on behalf of a BE domain.
This could happily link with the existing Xen library for setting up the guest table mappings.
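For reference, from a Linux domain with the Xen userspace libraries the mapping side looks roughly like the following with libxenforeignmemory (treat the exact calls as approximate; a Zephyr BE would need an equivalent built on the raw hypercalls):

  #include <stddef.h>
  #include <stdint.h>
  #include <sys/mman.h>
  #include <xenforeignmemory.h>

  /* Approximate sketch: map a set of the front-end domain's pages into the
   * backend so the virtqueues and buffers can be accessed directly. */
  void *map_fe_pages(uint32_t fe_domid, const xen_pfn_t *pfns, size_t nr)
  {
      xenforeignmemory_handle *fmem = xenforeignmemory_open(NULL, 0);

      if (!fmem)
          return NULL;
      /* the last argument can report per-page errors; ignored for brevity */
      return xenforeignmemory_map(fmem, fe_domid, PROT_READ | PROT_WRITE,
                                  nr, pfns, NULL);
  }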
- Jailhouse
- Linux kernel partition with virtio-scmi front-end
- RTOS/SCMI server partition with a virtio-scmi back-end
The RTOS/SCMI server would be the same binary blob as for Xen. Again some glue setup code would be required. I'm still unsure on how this would work for Jailhouse so if we don't have any Jailhouse expertise joining the project we could do this with KVM instead.
- Linux/KVM host
- setup code in main host (kvmtool/QEMU launch)
- KVM guest with Linux with a virtio-scmi front-end
- KVM guest with RTOS/SCMI server with virtio-scmi back-end
The easiest way of implementing a BE for KVM is to utilize the vhost-user library, but please note that this library internally uses socket(AF_UNIX), eventfd and mmap(), which are in some sense hypervisor-specific interfaces given that Linux works as a type-2 hypervisor :) It is therefore not quite straightforward to port it to an RTOS like Zephyr, and I don't think a single binary would work on both Xen and KVM.
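That said, the hooks sketched earlier could be backed by exactly those primitives on a Linux/KVM host; a rough illustration:

  #include <stdint.h>
  #include <unistd.h>

  /* Sketch: on Linux/KVM the notify hook can simply write to an eventfd
   * (the call fd that vhost-user passes over the AF_UNIX socket), while
   * guest memory arrives already mmap()ed from the VMM. */
  struct linux_be_glue {
      int call_fd;          /* eventfd used to interrupt the front-end */
      uint8_t *guest_mem;   /* mmap()ed guest RAM region */
  };

  static void linux_notify_frontend(void *opaque)
  {
      struct linux_be_glue *glue = opaque;
      uint64_t one = 1;

      /* eventfd writes must be exactly 8 bytes; errors ignored here */
      (void)write(glue->call_fd, &one, sizeof(one));
  }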
-Takahiro Akashi
This is closer to Windriver's hypervisor-less virtio deployment, as Jailhouse is not a "proper" hypervisor in this case, just a way of partitioning up the resources. There will need to be some way for the kernel and server partitions to signal each other when queues are updated.
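In the hypervisor-less case that signal could be as simple as counters in the shared region that each side polls (plus an inter-processor interrupt if the platform provides one); a loose sketch:

  #include <stdint.h>

  /* Loose sketch for the partitioned/hypervisor-less case: each side bumps
   * a counter in shared memory after updating its ring and the other side
   * polls for a change. */
  struct shm_doorbell {
      volatile uint32_t fe_to_be;   /* FE bumps after updating the avail ring */
      volatile uint32_t be_to_fe;   /* BE bumps after updating the used ring */
  };

  static int be_kick_pending(struct shm_doorbell *db, uint32_t *last_seen)
  {
      uint32_t now = db->fe_to_be;
      int pending = (now != *last_seen);

      *last_seen = now;
      return pending;
  }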
Platform
We know we have working Xen on Synquacer and Jailhouse on the iMX. Should we target those as well as a QEMU -M virt for those who wish to play without hardware?
Stretch Goals
Integrate Arnd's fat virtqueues
Hopefully this will be ready soon enough in the cycle that we can add this to the library and prototype the minimal memory cross section.
This is dependent on having something at least sketched out early in the cycle. It would allow us to simplify the shared memory mapping to just plain virtqueues.
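My (possibly wrong) understanding of the idea is that the payload travels inline with the ring element, so the BE only ever needs the queue region mapped rather than arbitrary guest memory. Purely as an illustration of the principle, not Arnd's actual layout:

  #include <stdint.h>

  /* Illustration only: a "fat" ring element carries its data inline instead
   * of referencing it by guest physical address, so sharing the queue region
   * alone is enough for the BE. */
  struct fat_virtq_elem {
      uint32_t len;             /* bytes of inline payload in use */
      uint32_t flags;
      uint8_t  payload[496];    /* inline data instead of addr/len pairs */
  };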
Port the server/library to another RTOS/unikernel
This would demonstrate the core code hasn't grown any assumptions about what it is running in.
Run the server blob on another hypervisor
Running in KVM is probably boring at this point. Maybe investigate running it under Hafnium? Or in an R-profile safety island setup?
So what do people think? Thoughts? Comments? Volunteers?
--
Alex Bennée

--
Stratos-dev mailing list
Stratos-dev@op-lists.linaro.org
https://op-lists.linaro.org/mailman/listinfo/stratos-dev