On Fri, 11 Feb 2022, Alex Bennée wrote:
FYI, a good and promising approach to handle both SCMI and SCPI is the series recently submitted by EPAM to mediate SCMI and SCPI requests in Xen: https://marc.info/?l=xen-devel&m=163947444032590
(Another "special" virtio backend is virtio-iommu for similar reasons: the guest p2m address mappings and also the IOMMU drivers are in Xen. It is not immediately clear whether a virtio-iommu backend would need to be in Xen or run as a process in dom0/domU.)
On the other hand, for all the other "normal" protocols (e.g. virtio-net, virtio-block, etc.) the backend would naturally run as a process in dom0 or domU (e.g. QEMU in Dom0) as one would expect.
Can domUs not be given particular access to HW they might want to tweak? I assume at some point a block device backend needs to actually talk to real HW to store the blocks (even if in most cases it would be a kernel doing the HW access on its behalf).
Yes, it would. Block and network are subsystems with limited visibility, access, and harmful capabilities (assuming IOMMU).
If the block device goes down or is misused, block might not work but everything else is expected to work. Block only requires visibility of the block device for it to work. The same is true for network, GPU, USB, etc.
SCMI is different. If SCMI is misused the whole platform is affected. SCMI implies visibility of everything in the system. It is not so much about emulating SCMI as about mediating SCMI calls.
In other words, SCMI is not a device, it is a core interface. In a Xen model, Xen virtualizes CPU and memory and other core features/interfaces (timers, interrupt controller, IOMMU, etc). The PCI root complex is handled by Xen too. Individual (PCI and non-PCI) devices are assigned to guests.
These are the reasons why I think the best way to enable SCMI in upstream Xen is with a mediator in the hypervisor, as is currently in development. Any chance you could combine your efforts with EPAM's outstanding series? You might be able to spot gaps, if any, and might even already have code to fill those gaps. It would be fantastic to have your reviews and/or contributions on xen-devel.
Otherwise, if you have to run the virtio-scmi backend in userspace, why not try to get it to work on Xen :-) It might not be the ideal solution, but it could be a good learning experience and pave the way for the other virtio backends which definitely will be in userspace (virtio-block, virtio-gpu, etc).
Currently the demo setup is intermediated by a double-ended vhost-user daemon running on the devbox acting as a go-between for a number of QEMU instances representing the front and back-ends. You can view the architecture with Vincent's diagram here:
https://docs.google.com/drawings/d/1YSuJUSjEdTi2oEUq4oG4A9pBKSEJTAp6hhcHKKhm...
The key virtq handling is done over the special carve outs of shared memory between the front end and guest. However the signalling is currently over a virtio device on the backend. This is useful for the PoC but obviously in a real system we don't have a hidden POSIX system acting as a go-between, not to mention the additional latency it causes with all those context switches.
I was hoping we could get some more of the Xen experts to the next Stratos sync (17th Feb) to go over options for a properly Xen-hosted approach. From my recollection (Vincent please correct me if I'm wrong) of last week the issues that need solving are:
Unfortunately I have a regular conflict which prevents me from being able to join the Stratos calls. However, I can certainly make myself available for one call (unless something unexpected comes up).
- How to handle configuration steps as FE guests come up
The SCMI server will be a long-running persistent backend because it is managing real HW resources. However the guests may be ephemeral (or just restarted) so we can't just hard-code everything in a DTB. While the virtio-negotiation in the config space covers most things we still need information like where in the guest's address space the shared memory lives and at what offset into that the queues are created. As far as I'm aware the canonical source of domain information is XenStore (https://wiki.xenproject.org/wiki/XenStore) but this relies on a Dom0 type approach. Is there an alternative for dom0less systems or do we need a dom0-light approach, for example using STR-21 (Ensure Zephyr can run cleanly as a Dom0 guest) providing just enough services for FEs to register metadata and BEs to read it?
I'll try to answer the question for a generic virtio frontend and backend instead (not SCMI because SCMI is unique due to the reasons above.)
Yes, xenstore is the easiest way to exchange configuration information between domains. I think EPAM used xenstore to exchange the configuration information in their virtio-block demo. There is a way to use xenstore even between dom0less VMs: https://marc.info/?l=xen-devel&m=164340547602391 Not just xenstore but full PV drivers too. However, in the dom0less case xenstore is going to become available some time after boot, not immediately at startup time. That's because you need to wait until xenstored is up and running.
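To make the kind of data being exchanged concrete, here is a minimal sketch of how the frontend metadata mentioned above (shared memory location and per-queue offsets) could be flattened into xenstore-style key/value entries and read back by a backend. It does not talk to xenstored at all, it only builds and parses a dictionary. The /local/domain/<domid>/device/... prefix follows the usual xenstore layout, but the device name "virtio-example" and the key names (shm-base, shm-size, num-queues, queue-N/offset) are assumptions made up for illustration, not an existing protocol.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class VirtioFrontendInfo:
    """Metadata a backend needs about one frontend (hypothetical layout)."""
    domid: int
    shm_base: int                    # guest address of the shared-memory carve-out
    shm_size: int                    # size of the carve-out in bytes
    queue_offsets: Tuple[int, ...]   # offset of each virtqueue inside the carve-out


def device_path(domid: int, device: str = "virtio-example") -> str:
    # Key prefix for device instance 0 of the given domain.
    return f"/local/domain/{domid}/device/{device}/0"


def to_store_entries(info: VirtioFrontendInfo,
                     device: str = "virtio-example") -> Dict[str, str]:
    """Flatten the metadata into string key/value pairs, as a frontend would publish them."""
    base = device_path(info.domid, device)
    entries = {
        f"{base}/shm-base": hex(info.shm_base),
        f"{base}/shm-size": hex(info.shm_size),
        f"{base}/num-queues": str(len(info.queue_offsets)),
    }
    for index, offset in enumerate(info.queue_offsets):
        # Reject queues that would start outside the shared region.
        if not 0 <= offset < info.shm_size:
            raise ValueError(f"queue {index} offset {offset:#x} is outside the region")
        entries[f"{base}/queue-{index}/offset"] = hex(offset)
    return entries


def from_store_entries(entries: Dict[str, str], domid: int,
                       device: str = "virtio-example") -> VirtioFrontendInfo:
    """Rebuild the metadata on the backend side from the published entries."""
    base = device_path(domid, device)
    count = int(entries[f"{base}/num-queues"])
    offsets = tuple(int(entries[f"{base}/queue-{i}/offset"], 16) for i in range(count))
    return VirtioFrontendInfo(
        domid=domid,
        shm_base=int(entries[f"{base}/shm-base"], 16),
        shm_size=int(entries[f"{base}/shm-size"], 16),
        queue_offsets=offsets,
    )
```

In a real setup the dictionary would be replaced by writes and reads against xenstore, and the backend would watch the path so that it notices when a frontend appears or is restarted.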
There are other ways to send data from one VM to another which are available immediately at boot, such as Argo and static shared memory.
But dom0less is all about static partitioning, so it makes sense to exploit the build-time tools to the fullest. In the dom0less case, we already know what is going to run on the target before it is even turned on. As an example, we might have already prepared an environment with 3 VMs using Yocto and ImageBuilder. We could also generate all the configurations needed and place them inside each VM using Yocto's standard tools and ImageBuilder. So for dom0less, I recommend going a different route and pre-generating the configuration directly where needed instead of doing dynamic discovery.
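As a rough illustration of what pre-generating the configuration could look like, the sketch below takes a static description of the frontends and emits one small config per VM plus one for the backend, all at build time. This is not how Yocto or ImageBuilder do it today; the function names, the JSON layout, and the idea of carving one fixed-size shared-memory region per frontend out of a contiguous range are all assumptions for the sake of the example.

```python
import json
from typing import Dict, List


def generate_configs(shm_base: int, region_size: int,
                     frontends: List[str]) -> Dict[str, dict]:
    """Assign each frontend a fixed shared-memory region and build per-VM configs."""
    if len(set(frontends)) != len(frontends) or "backend" in frontends:
        raise ValueError("frontend names must be unique and must not be 'backend'")

    backend_regions = {}
    configs = {}
    for index, name in enumerate(frontends):
        # Regions are laid out back to back starting at shm_base.
        region = {"shm_base": shm_base + index * region_size, "shm_size": region_size}
        backend_regions[name] = region
        configs[name] = {"role": "frontend", **region}

    # The backend gets the full map so it needs no discovery at runtime.
    configs["backend"] = {"role": "backend", "regions": backend_regions}
    return configs


def render(configs: Dict[str, dict]) -> Dict[str, str]:
    """Serialize each config to JSON text, keyed by the file name to install in that VM."""
    return {f"{name}.json": json.dumps(cfg, indent=2, sort_keys=True)
            for name, cfg in configs.items()}
```

The build would then drop each generated file into the matching VM image, so every VM boots already knowing where its shared memory lives and the backend already knows about every frontend.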
Even in a full dom0less setup you still need to manage lifetimes somehow if a guest reboots.
Sure but that's not a problem: all the info and configuration related to rebooting the guest can also be pre-generated in Yocto or ImageBuilder.
As an example, it is already possible (although rudimentary) in ImageBuilder to generate the dom0less configuration and also the domU xl config file for the same domU with passthrough devices.