AKASHI Takahiro <takahiro.akashi@linaro.org> writes:
Does anybody have any comments, agree or disagree? Any better ideas? I'm willing to share the technical details behind my thoughts if you like.
I too would like to know if anyone else has any thoughts.
-Takahiro Akashi
On Fri, Apr 16, 2021 at 09:18:37PM +0900, AKASHI Takahiro wrote:
On Thu, Apr 15, 2021 at 11:42:04AM +0100, Alex Bennée wrote:
AKASHI Takahiro <takahiro.akashi@linaro.org> writes:
Hi Alex,
On Mon, Apr 12, 2021 at 08:34:54AM +0000, Alex Bennée via Stratos-dev wrote:
Alex Bennée via Stratos-dev <stratos-dev@op-lists.linaro.org> writes:
Hi All,
We've been discussing various ideas for Stratos in and around STR-7 (common virtio library). I'd originally de-emphasised the STR-7 work because I wasn't sure if this was duplicate effort given we already had libvhost-user as well as interest in rust-vmm for portable backends in user-space. However we have seen from the Windriver hypervisor-less virtio, NXP's Zephyr/Jailhouse and the requirements for the SCMI server that there is a use-case for a small, liberally licensed C library that is suitable for embedding in lightweight backends without a full Linux stack behind it. These workloads would run in either simple command loops, RTOSes or Unikernels.
Given the multiple interested parties I'm hoping we have enough people who can devote time to collaborate on the project to make the following realistic over the next cycle and culminate in the following demo in 6 months:
Components
portable virtio backend library
- source-based library (so you include it directly in your project)
- liberally licensed (Apache? to facilitate above)
- tailored for non-POSIX, limited resource setups
- e.g. avoid malloc/free, provide abstraction hooks where needed
- not assume OS facilities (so embeddable in RTOS or Unikernel)
- static virtio configuration supplied from outside library (location of queues etc)
- hypervisor agnostic
- provide a hook to signal when queues need processing
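To make that concrete, here is a minimal sketch of what such an embedding interface could look like. Every name below is hypothetical, not an agreed API; it just illustrates the "no malloc, no OS assumptions, config supplied from outside" points above:

  /* Hypothetical embedding interface for the portable backend
   * library; all names are illustrative only. */

  #include <stddef.h>
  #include <stdint.h>

  /* Static virtio configuration supplied from outside the library:
   * the embedder (RTOS, unikernel, command loop) says where guest
   * memory and the virtqueues live; the library never allocates. */
  struct vio_static_config {
      uintptr_t guest_mem_base;  /* FE memory as mapped into BE space */
      size_t    guest_mem_size;
      uintptr_t vq_base;         /* location of the virtqueues */
      uint16_t  vq_size;         /* descriptors per queue */
  };

  /* Abstraction hooks so the library assumes no OS facilities. */
  struct vio_hooks {
      void (*notify_fe)(void *opaque);  /* ring the FE's doorbell */
      void (*mem_barrier)(void);        /* e.g. dmb on Arm */
  };

  struct vio_device;  /* storage layout defined by the library */

  int vio_init(struct vio_device *dev,
               const struct vio_static_config *cfg,
               const struct vio_hooks *hooks, void *opaque);

  /* The embedder calls this when signalled (IRQ, event channel, ...)
   * that a queue needs processing. */
  int vio_kick(struct vio_device *dev, unsigned int vq_idx);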
Following on from a discussion with Vincent and Akashi-san last week, we need to think more about how to make this hypervisor agnostic. There will always be some code that has to live outside the library, but if it ends up being the same amount again, what have we gained?
<snip>
The server would be *build* time selectable to deploy either in a Jailhouse partition or a Xen DomU deployment. Either way there will need to be out-of-band communication of the location of the virtqueues and memory maps - I assume via DTB.
From our discussion last week, Zephyr's DTB support is very limited and not designed to cope with dynamic setups. So rather than having two build-time selectable configurations, we should opt for a single binary with a fixed expectation of where the remote guest's memory and virtqueues will exist in its memory space.
I'm still a bit skeptical about "single binary with a fixed expectation" concept.
- Is it feasible to expect that all the hypervisors would configure a BE domain in the same way (with respect to assigning a memory region or an interrupt number)?
I think the configuration mechanism will be different but surely it's possible to give the same guest view to the BE from any hypervisor.
- what if we want a BE domain, at the same time, to
  - provide different types of virtio devices
  - support more than one frontend domain?
That is certainly out of scope for this proposed demo which is a single statically configured device servicing a single frontend domain.
How can a single binary without any dynamic configuration capability deal with those requirements?
I don't think it can. For complex topologies of devices and backends I think you will need a degree of flexibility, so while layouts will be static on the device, the components will need to be flexible/portable. This isn't really a topic we've explored in detail yet, but it would be further work under STR-10 (hypervisor boot orchestration).
I'd like to see a 'big picture' of system/device configuration.
It will then be up to the VMM setting things up to ensure everything is mapped in the appropriate place in the RTOS memory map. There would also be a fixed IRQ map for signalling changes to the RTOS and a fixed doorbell for signalling the other way.
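As a very rough sketch, the fixed expectation might just be a handful of constants baked into the BE binary, which each hypervisor's tooling then has to honour when setting the domain up. All addresses and numbers below are invented for illustration:

  /* Hypothetical fixed guest-facing layout for the BE binary. */
  #define FE_MEM_BASE    0x40000000UL  /* frontend RAM window in BE space */
  #define FE_MEM_SIZE    0x08000000UL  /* 128 MiB */
  #define VQ_BASE        0x48000000UL  /* virtqueues live here */
  #define NOTIFY_IRQ     42            /* FE -> BE "queues need processing" */
  #define DOORBELL_ADDR  0x49000000UL  /* BE -> FE signalling */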
We should think of two different phases:
- virtio setup/negotiation via MMIO configuration space
There is a stage before this, which is knowing there is an MMIO device in the first place (on PCI this is simplified a little by the PCI probe).
- (virtio device specific) operation via virtqueue
Anyway, the signaling mechanism can differ from hypervisor to hypervisor. On Xen, for example:
- the notification of MMIO accesses to the configuration space by the FE will be trapped and delivered via an event channel plus a dedicated "shared IO page"
If we want to keep hypervisor specifics out of the BE, can't the VMM or equivalent then trigger an IRQ in the BE domain as a result of the signal?
For MMIO configuration, that's not enough. The BE needs to know the details of each MMIO request:
- address (or offset) in the configuration space
- IO size (mostly 4 bytes)
- type of access (read or write)
- value (in case of write)
On Xen, that information is exposed to the BE through a dedicated page (the "shared IO page"). So when the BE is notified of such an event via an event channel, it is expected to read that page. Once the BE has recognized and completed an MMIO request, it signals another event channel to notify the FE. This emulation mechanism is quite hypervisor specific. How can we handle that with a single binary?
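The shape of that flow is roughly the following; the struct fields and function names are illustrative stand-ins for Xen's real ioreq/event-channel machinery, just to show what the BE loop has to do:

  #include <stdint.h>

  struct mmio_request {       /* what the shared IO page conveys */
      uint64_t addr;          /* offset into the configuration space */
      uint32_t size;          /* IO size, mostly 4 bytes */
      uint8_t  is_write;      /* type of access: read or write */
      uint64_t value;         /* data for a write, result for a read */
  };

  /* Hypothetical helpers the BE would provide per hypervisor. */
  void wait_for_event(void);  /* block until the event channel fires */
  void notify_fe_done(void);  /* event channel back to the FE */
  uint64_t vio_cfg_read(uint64_t addr, uint32_t size);
  void vio_cfg_write(uint64_t addr, uint32_t size, uint64_t value);

  void be_mmio_loop(volatile struct mmio_request *io_page)
  {
      for (;;) {
          wait_for_event();   /* FE's MMIO access was trapped */
          if (io_page->is_write)
              vio_cfg_write(io_page->addr, io_page->size, io_page->value);
          else
              io_page->value = vio_cfg_read(io_page->addr, io_page->size);
          notify_fe_done();   /* completion, back via event channel */
      }
  }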
Is there a halfway house - can we keep as much of the hypervisor specifics as possible on the host side and just bring in the minimal hypervisor interface needed for each hypervisor? It could be runtime selectable, or it sounds like we need to fall back to a build-time selectable interface.
What I'm trying to avoid, though, is a binary that has more hypervisor-specific code in it than generic virtio backend handling code.