On Tue, 20 Oct 2020, AKASHI Takahiro via Stratos-dev wrote:
> On Tue, Oct 20, 2020 at 10:52:08AM +0000, François Ozog via Stratos-dev wrote:
> > To fuel discussion on inter-VM communication over virtio...
> > I think virtio end-points may be in the normal world, the secure world
> > (Secure Media Path?) or the "real time" world (an RTOS on a Cortex-M).
> > I believe the FF-A backend for this topic was also a key element of the
> > discussion (is that correct, Azzedine?)
>
> FYI,
> I'm currently trying to run Zephyr in a guest VM on Xen/qemu (arm64) and
> have now confirmed that a simple application, samples/hello_world, runs
> successfully, though we needed a couple of tweaks :)
>
> qemu-arm64 -> TF-A -> U-Boot -> grub -> Xen(4.15+) -> Debian testing
> -> U-Boot(!, domU) -> Zephyr/hello_world
>
> I started this task just for myself, to learn more about Xen,
> but if people are interested in it (for demo, evaluation or testing),
> I'm keen to continue working on this.
Would you be up for writing a new wiki page on wiki.xenproject.org
(linking the page from
https://wiki.xenproject.org/wiki/Xen_ARM_with_Virtualization_Extensions)
on how to run Zephyr on Xen as domU? That would be very helpful!
Cheers,
Stefano
On Fri, 16 Oct 2020, Alex Bennée via Stratos-dev wrote:
> Masami Hiramatsu <masami.hiramatsu(a)linaro.org> writes:
>
> > Hi Alex,
> >
> > On Fri, 16 Oct 2020 at 2:01, Alex Bennée <alex.bennee(a)linaro.org> wrote:
> >>
> >>
> >> Masami Hiramatsu <masami.hiramatsu(a)linaro.org> writes:
> >>
> >> > Hi,
> >> >
> >> > I've succeeded in getting X.org running on Dom0.
> >> > It seems that Xorg's nouveau driver caused the SIGBUS issue. A custom
> >> > nouveau kernel driver + the Xorg fbdev driver seems stable.
> >> > (If it breaks again, I'll try a USB-HDMI adaptor next time.)
> >> >
> >> > So, I would like to test the virtio-video for the next step.
> >> > Alex, how can I help you to test it?
> >>
> >> In one window you need the vhost-user gpu daemon:
> >>
> >> ./vhost-user-gpu --socket-path=vgpu.sock -v
> >
> > Hmm, I couldn't find vhost-user-gpu (I've installed the Xen tools under
> > /usr/local, but I cannot find the vhost* tools).
>
> The vhost-user-gpu tool is part of the QEMU source tree (contrib/vhost-user-gpu).
>
> >>
> >> and then on the QEMU command line you need the memory sharing and the
> >> socket connection as well as the device:
> >>
> >> $QEMU $ARGS \
> >> -object memory-backend-file,id=mem,size=4G,mem-path=/dev/shm,share=on \
> >> -numa node,memdev=mem \
> >> -chardev socket,path=vgpu.sock,id=vgpu \
> >> -device vhost-user-gpu-pci,chardev=vgpu
> >
> > I'm using the xl command (xl create -dc CONFIGFILE) to boot up the
> > guest domain. Can I boot it up via qemu too?
>
> Hmm, so this is where we might need some extra tooling. I'm not sure how
> QEMU gets invoked by the xl tooling, but QEMU for Xen is a fairly
> different beast from the normal hypervisor interaction: rather than
> handling vmexits from the hypervisor, it just gets commands via the Xen
> control interface to service emulation and I/O requests.
>
> The above QEMU commands ensure that:
>
> - the guest memory space is shared with the vhost-user-gpu daemon
> - a control path is wired up so vhost-user messages can be sent
> during setup (initialise the device etc.)
> - the same socket path is fed eventfd messages as the guest triggers
> virtio events. These can all come from QEMU if the kernel isn't
> translating MMIO accesses to the virtqueues into eventfd events.
>
> Stefano, how does the xl tooling invoke QEMU, and can the command line
> be modified?
Yes, it can, in a couple of different ways. You can add arguments to the
QEMU command line by specifying:
device_model_args=["arg1", "arg2", "etc."]
in the vm config file. You can also choose a different QEMU binary to
run with:
device_model_override="/path/to/your/qemu"
You can use it to point to your own QEMU wrapper script that adds
additional arguments before calling the actual QEMU binary (which by
default is /usr/lib/xen/bin/qemu-system-i386, even on ARM).
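For example, a hypothetical vm config fragment wiring up Alex's
vhost-user-gpu arguments from above might look like this (the wrapper path
and socket location are made up for illustration, untested on Xen):

device_model_override="/usr/local/bin/qemu-wrapper"
device_model_args=["-object",
                   "memory-backend-file,id=mem,size=4G,mem-path=/dev/shm,share=on",
                   "-numa", "node,memdev=mem",
                   "-chardev", "socket,path=vgpu.sock,id=vgpu",
                   "-device", "vhost-user-gpu-pci,chardev=vgpu"]

where qemu-wrapper could be a trivial shell script along the lines of:

#!/bin/sh
# Hypothetical wrapper: forward xl's arguments to a custom QEMU build.
exec /path/to/your/qemu-system-aarch64 "$@"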
On Tue, Oct 20, 2020 at 10:52:08AM +0000, François Ozog via Stratos-dev wrote:
> To fuel discussion on inter-VM communication over virtio...
> I think virtio end-points may be in the normal world, the secure world
> (Secure Media Path?) or the "real time" world (an RTOS on a Cortex-M).
> I believe the FF-A backend for this topic was also a key element of the
> discussion (is that correct, Azzedine?)
FYI,
I'm currently trying to run Zephyr in a guest VM on Xen/qemu (arm64) and
have now confirmed that a simple application, samples/hello_world, runs
successfully, though we needed a couple of tweaks :)
qemu-arm64 -> TF-A -> U-Boot -> grub -> Xen(4.15+) -> Debian testing
-> U-Boot(!, domU) -> Zephyr/hello_world
I started this task just for myself, to learn more about Xen,
but if people are interested in it (for demo, evaluation or testing),
I'm keen to continue working on this.
Thanks,
-Takahiro Akashi
> Cheers
>
> FF
>
> <snip: the forwarded Hafnium message is included in full below>
To fuel discussion on inter-VM communication over virtio...
I think virtio end-points may be in the normal world, the secure world
(Secure Media Path?) or the "real time" world (an RTOS on a Cortex-M).
I believe the FF-A backend for this topic was also a key element of the
discussion (is that correct, Azzedine?)
Cheers
FF
---------- Forwarded message ---------
From: Joao Alves via Hafnium <hafnium(a)lists.trustedfirmware.org>
Date: Tue, 20 Oct 2020 at 12:18
Subject: [Hafnium] FFA Memory Management Interfaces
To: hafnium(a)lists.trustedfirmware.org <hafnium(a)lists.trustedfirmware.org>,
Andrew Walbran <qwandor(a)google.com>
Hello All,
We have been working on enabling the use of memory management interfaces
between the SWd and NWd, with Hafnium as the SPMC. We identified a few
points that would benefit from a general discussion/clarification on the
mailing list, for this work to progress in time for the first release.
The whole of memory has been mapped into a VM representing the NWd FFA
endpoints (found in the code as a global variable called 'other_world_vm')
in this patch: https://review.trustedfirmware.org/c/hafnium/hafnium/+/6008.
All the memory is configured as RW, permissive enough to ease the
validation of the sender's original permissions against the requested
sharing permissions.
This approach seems valid given the following assumptions: the
'other_world_vm' is never scheduled, which means there is no risk of
memory leakage; and the hypervisor should do the necessary validation of
the permissions that a given NWd FFA endpoint has on a given memory
region before forwarding the handling of the operation to the SPMC.
The first point to bring up for discussion is the RW configuration. Is
this really the best configuration we would expect for memory shared from
the NWd to the SWd? Or does it exclude any use-case we are not aware of
(e.g. at this point we are restricting instruction permissions to NX)?
Whilst testing the memory management interfaces, I noticed that if the
instruction access in the memory region descriptor (described in section
5.12 of the spec, and defined in the code as 'struct ffa_memory_region')
is set to 'not specified' for the Lend and Donate operations, the receiver
acquires permission to execute the memory region. This is due to the
implementation of the function 'ffa_memory_permissions_to_mode'
<https://git.trustedfirmware.org/hafnium/hafnium.git/tree/src/ffa_memory.c#n…>,
which sets the memory mode to MM_MODE_X for the instruction access case
FFA_INSTRUCTION_ACCESS_NOT_SPECIFIED. My understanding is that in such a
case MM_MODE_X shouldn't be set. Please check this change:
https://review.trustedfirmware.org/c/hafnium/hafnium/+/6008.
@Andrew Walbran <mailto:qwandor@google.com>, is the change I provided
correct? Or is there a reason for the current implementation of
'ffa_memory_permissions_to_mode'?
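For clarity, a minimal C sketch of the behaviour in question (with
simplified names; Hafnium's real function operates on the full permissions
field):

#include <stdint.h>

/* Sketch: map FF-A instruction access to a memory mode. The current
 * code also returns MM_MODE_X for NOT_SPECIFIED; the proposed change
 * leaves execute clear unless it was explicitly requested. */
static uint32_t instruction_access_to_mode(uint32_t access)
{
        switch (access) {
        case FFA_INSTRUCTION_ACCESS_X:
                return MM_MODE_X;
        case FFA_INSTRUCTION_ACCESS_NX:
        case FFA_INSTRUCTION_ACCESS_NOT_SPECIFIED:
        default:
                return 0; /* proposed: no execute by default */
        }
}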
I also noticed that after a memory reclaim, the sender/owner of a memory
region acquires RW and X permissions, even if these were not part of the
original permissions when sending the memory region. This is due to the
definition of the variable 'memory_to_attributes' in the function
'ffa_memory_reclaim'
<https://git.trustedfirmware.org/hafnium/hafnium.git/tree/src/ffa_memory.c#n…>:

uint32_t memory_to_attributes = MM_MODE_R | MM_MODE_W | MM_MODE_X;

Down the call stack, the value of this variable is used to update the
variable 'to_mode' in the function 'ffa_retrieve_check_transition'
<https://git.trustedfirmware.org/hafnium/hafnium.git/tree/src/ffa_memory.c#n…>.
The value of 'to_mode' is later passed to 'ffa_region_group_identity_map'
<https://git.trustedfirmware.org/hafnium/hafnium.git/tree/src/ffa_memory.c#n…>
to update the owner's memory mapping. My understanding is that the
sender's memory mode should be reset to the mode it had prior to the
original memory send operation, and that this is going to be a problem for
memory management operations (even between VMs and between SPs). I have
seen this behavior on my setup, but I would like to validate my analysis.
@Andrew Walbran <mailto:qwandor@google.com>, would you be able to provide
any comments on this?
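To make the expected behaviour concrete, a hypothetical sketch (not
Hafnium's actual data structures) of what I would expect reclaim to do:

#include <stdint.h>

/* Sketch: restore the sender's pre-send mode on reclaim instead of
 * unconditionally granting R|W|X. Assumes the share state captured
 * the original mode at FFA_MEM_LEND/FFA_MEM_DONATE time. */
struct share_state {
        uint32_t sender_orig_mode;
};

static uint32_t reclaim_mode(const struct share_state *s)
{
        /* current code: return MM_MODE_R | MM_MODE_W | MM_MODE_X; */
        return s->sender_orig_mode;
}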
Please let me know if anyone has any comments/questions.
Best regards,
João Alves
--
Hafnium mailing list
Hafnium(a)lists.trustedfirmware.org
https://lists.trustedfirmware.org/mailman/listinfo/hafnium
--
François-Frédéric Ozog | *Director Linaro Edge & Fog Computing Group*
T: +33.67221.6485
francois.ozog(a)linaro.org | Skype: ffozog
Hi
Today/tomorrow is NXP Connects; there are many sessions relevant to
Stratos, but I just watched this one:
https://experience.nxp.com/connects/session/225/225
And it fuels the discussion of AGL virtualization.
Cheers
FF
--
François-Frédéric Ozog | *Director Linaro Edge & Fog Computing Group*
T: +33.67221.6485
francois.ozog(a)linaro.org | Skype: ffozog
On Fri, 16 Oct 2020, Arnd Bergmann via Stratos-dev wrote:
> On Fri, Oct 16, 2020 at 9:19 AM Jean-Philippe Brucker
> <jean-philippe(a)linaro.org> wrote:
> > On Fri, Oct 02, 2020 at 04:26:10PM +0200, Arnd Bergmann wrote:
> > > > At the moment the bounce buffer is allocated from a global pool in the low
> > > > physical pages. However a recent proposal by Chromium would add support
> > > > for per-device swiotlb pools:
> > > >
> > > > https://lore.kernel.org/linux-iommu/20200728050140.996974-1-tientzu@chromiu…
> > > >
> > > > And quoting Tomasz from the discussion on patch 4:
> > > >
> > > > For this, I'd like to propose a "restricted-dma-region" (feel free
> > > > to suggest a better name) binding, which is explicitly specified
> > > > to be the only DMA-able memory for this device and make Linux use
> > > > the given pool for coherent DMA allocations and bouncing
> > > > non-coherent DMA.
> > >
> > > Right, I think this can work, but there are very substantial downsides to it:
> > >
> > > - it is a fairly substantial departure from the virtio specification, which
> > > defines that transfers can be made to any part of the guest physical
> > > address space
> >
> > Coming back to this point, that was originally true but prevented
> > implementing hardware virtio devices or putting a vIOMMU in front of the
> > device. The spec now defines feature bit VIRTIO_F_ACCESS_PLATFORM
> > (previously called VIRTIO_F_IOMMU_PLATFORM):
> >
> > A device SHOULD offer VIRTIO_F_ACCESS_PLATFORM if its access to memory
> > is through bus addresses distinct from and translated by the platform to
> > physical addresses used by the driver, and/or if it can only access
> > certain memory addresses with said access specified and/or granted by
> > the platform. A device MAY fail to operate further if
> > VIRTIO_F_ACCESS_PLATFORM is not accepted.
> >
> > With this the driver has to follow the DMA layout given by the platform,
> > in our case given by the device tree.
>
> Ok, got it.
>
> > Another point about solution #1: since the backend (secondary VM)
> > accesses the virtqueue directly (we probably want to avoid a
> > scatter-gather translation step in the backend), the pointers written by
> > the frontend in the virtqueue have to be guest-physical addresses of the
> > backend. So the static memory region needs to have an identical
> > guest-physical address in the primary and secondary VM. In practice I
> > doubt this would be a problem.
>
> I believe this is different from what Srivatsa explained in yesterday's call
> about Qualcomm's current work, which makes all pointer values relative
> to the start of the shared memory area, making them relocatable between
> frontend and backend guest physical address spaces.
Of course, as long as the backend is aware of the start address of the
shared buffer at the frontend side, it can very easily add/subtract any
offsets as required.
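Concretely, the translation could be as simple as this hypothetical helper
(a sketch with made-up names, not Qualcomm's actual code):

#include <stdint.h>

/* Convert a frontend guest-physical address into a backend virtual
 * address, given each side's view of the shared region's base. */
static inline void *fe_gpa_to_be_va(uint64_t fe_gpa,
                                    uint64_t fe_region_base,
                                    uint8_t *be_region_base)
{
        return be_region_base + (fe_gpa - fe_region_base);
}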
Disambiguation:
- frontend: Linux virtio drivers, drivers/net/virtio_net.c
- backend: QEMU
Hi,
I've looked in more detail at limited memory sharing (STR-6, STR-8,
STR-15), mainly from the Linux guest perspective. Here are a few thoughts.
Problem
-------
We have a primary VM running a guest, and a secondary one running a
backend that manages one hardware resource (for example network access).
They communicate with virtio (for example virtio-net). The guest
implements a virtio driver, the backend a virtio device. The problem is:
how do we ensure that the backend and guest only share the memory required
for the virtio communication, and that the backend cannot access any other
memory of the guest?
Static shared region
--------------------
Let's first look at static DMA regions. The hypervisor allocates a subset
of memory to be shared between guest and backend. The hypervisor
communicates this per-device DMA restriction to the guest during boot. It
could be using a firmware property, or a discovery protocol. I did start
drafting such a protocol in virtio-iommu, but I now think the
reserved-memory mechanism in device-tree, below, is preferable. Would we
need an equivalent for ACPI, though?
How would we implement this in a Linux guest? The virtqueue of a virtio
device has two components. Static ring buffers, allocated at boot with
dma_alloc_coherent(), and the actual data payload, mapped with
dma_map_page() and dma_map_single(). Linux calls the former "coherent"
DMA, and the latter "streaming" DMA.
Coherent DMA can already obtain its pages from a per-device memory pool.
dma_init_coherent_memory() defines a range of physical memory usable by a
device. Importantly, this region has to be distinct from system RAM and
reserved by the platform. It is mapped non-cacheable. If it exists,
dma_alloc_coherent() will only get its pages from that region.
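From the driver's point of view nothing changes; a sketch of the usual
allocation, with placeholder names and error handling omitted:

#include <linux/dma-mapping.h>

/* Sketch: allocate a virtqueue ring. When the device has a dedicated
 * coherent region, dma_alloc_coherent() draws pages from it rather
 * than from system RAM. */
static void *alloc_ring(struct device *dev, size_t ring_size,
                        dma_addr_t *ring_dma)
{
        return dma_alloc_coherent(dev, ring_size, ring_dma, GFP_KERNEL);
}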
On the other hand streaming DMA doesn't allocate memory. The virtio
drivers don't control where that memory comes from, since the pages are
fed to them by an upper layer of the subsystem, specific to the device
type (net, block, video, etc). Often they are pages from the slab cache
that contain other unrelated objects, a notorious problem for DMA
isolation. If the page is not accessible by the device, swiotlb_map()
allocates a bounce buffer somewhere more convenient and copies the data
when needed.
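For reference, the streaming side from a driver looks like the usual
map/unmap pattern (again a sketch with placeholder names); any bouncing
happens transparently inside dma_map_single():

#include <linux/dma-mapping.h>

/* Sketch: map a payload for device access. If 'buf' lies outside the
 * device's DMA-able region, the DMA layer bounces it via swiotlb. */
static int send_payload(struct device *dev, void *buf, size_t len)
{
        dma_addr_t dma = dma_map_single(dev, buf, len, DMA_TO_DEVICE);

        if (dma_mapping_error(dev, dma))
                return -ENOMEM;
        /* ... hand 'dma' to the device, wait for completion, then: */
        dma_unmap_single(dev, dma, len, DMA_TO_DEVICE);
        return 0;
}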
At the moment the bounce buffer is allocated from a global pool in the low
physical pages. However a recent proposal by Chromium would add support
for per-device swiotlb pools:
https://lore.kernel.org/linux-iommu/20200728050140.996974-1-tientzu@chromiu…
And quoting Tomasz from the discussion on patch 4:
For this, I'd like to propose a "restricted-dma-region" (feel free
to suggest a better name) binding, which is explicitly specified
to be the only DMA-able memory for this device and make Linux use
the given pool for coherent DMA allocations and bouncing
non-coherent DMA.
That seems to be precisely what we need. Even when using the virtio-pci
transport, it is still possible to define per-device properties in the
device-tree, for example:
/* PCI root complex node */
pcie@10000000 {
    compatible = "pci-host-ecam-generic";

    /* Add properties to endpoint with BDF 00:01.0 */
    ep@0008 {
        reg = <0x00000800 0 0 0 0>;
        restricted-dma-region = <&dma_region_1>;
    };
};

reserved-memory {
    /* Define 64MB reserved region at address 0x50400000 */
    dma_region_1: restricted_dma_region {
        reg = <0x50400000 0x4000000>;
    };
};
Dynamic regions
---------------
In a previous discussion [1], several people suggested using a vIOMMU to
dynamically update the mappings rather than statically setting a memory
region usable by the backend. I believe that approach is still worth
considering because it satisfies the security requirement and doesn't
necessarily have worse performance. There is a trade-off between bounce
buffers on one hand, and map notifications on the other.
The problem with static regions is that all of the traffic will require
copying. Sub-page payloads will need bounce buffering anyway, for proper
isolation. But for large payloads bounce buffering might be prohibitive,
and using a virtual IOMMU might actually be more efficient. Instead of
copying large buffers the guest would send a MAP request to the
hypervisor, which would then map the pages into the backend. Despite
taking plenty of cycles for context switching and setting up the maps, it
might be less costly than copying.
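If so, the guest-side policy might reduce to a simple size threshold,
something like this entirely hypothetical sketch (both helpers and the
threshold are made up; the threshold would come from measuring both paths):

/* Sketch: copy small payloads through the bounce pool, map large
 * ones into the backend via the vIOMMU. */
static int share_buffer(void *buf, size_t len)
{
        if (len < MAP_THRESHOLD)                 /* hypothetical constant */
                return bounce_copy(buf, len);             /* hypothetical */
        return viommu_map_for_backend(buf, len);          /* hypothetical */
}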
Since it depends on the device, I guess we'll need a survey of memory
access patterns by the different virtio devices that we're considering.
In the end a mix of both solutions might be necessary.
Thanks,
Jean
[1] https://lists.oasis-open.org/archives/virtio-dev/202006/msg00037.html
Alex Bennée via Stratos-dev <stratos-dev(a)op-lists.linaro.org> writes:
> Hi Ulf,
>
> Ilias told me you were the eMMC guru so your might be best placed to
> advise on this problem.
>
Ping?
<snip>
> However I'm wary of adding an open-ended pass-through definition,
> especially anything that might tempt an implementer to start trying to
> read and write data using eMMC commands instead of the proper virtio
> commands.
>
> - What eMMC commands are needed for a probe?
> - What are the bounds of frame sizes for those commands?
> - I'm currently overloading the wasted stuff[196] bytes for the
> encapsulation, and it would get complicated if we extended into the
> used fields.
> - Should we mandate certain responses?
> - e.g. 0 "normal" size, eMMC version 4.1 etc
>
> Thanks,
>
> --
> Alex Bennée
--
Alex Bennée
Hi,
On Fri, 16 Oct 2020 at 15:10, Masami Hiramatsu via Stratos-dev
<stratos-dev(a)op-lists.linaro.org> wrote:
>
> Hi Alex,
>
> On Fri, 16 Oct 2020 at 2:01, Alex Bennée <alex.bennee(a)linaro.org> wrote:
> >
> >
> > Masami Hiramatsu <masami.hiramatsu(a)linaro.org> writes:
> >
> > > Hi,
> > >
> > > I've succeeded in getting X.org running on Dom0.
> > > It seems that Xorg's nouveau driver caused the SIGBUS issue. A custom
> > > nouveau kernel driver + the Xorg fbdev driver seems stable.
> > > (If it breaks again, I'll try a USB-HDMI adaptor next time.)
BTW, Xorg broke the screen again... I'll try with USB-HDMI again.
Also, when I updated the kernel from 5.9-rc4+ to 5.9, the netsec ethernet
driver stopped working. I'm trying to figure out the root cause by bisecting.
Thank you,
--
Masami Hiramatsu
Stefano Stabellini via Stratos-dev <stratos-dev(a)op-lists.linaro.org> writes:
> On Thu, 15 Oct 2020, Stefano Stabellini via Stratos-dev wrote:
>> On Thu, 15 Oct 2020, Masami Hiramatsu via Stratos-dev wrote:
>> > Hi,
>> >
>> > I've succeeded in getting X.org running on Dom0.
>> > It seems that Xorg's nouveau driver caused the SIGBUS issue. A custom
>> > nouveau kernel driver + the Xorg fbdev driver seems stable.
>> > (If it breaks again, I'll try a USB-HDMI adaptor next time.)
>> >
>> > So, I would like to test the virtio-video for the next step.
>> > Alex, how can I help you to test it?
>>
>> FYI in case it is helpful the last version of the patch series to enable
>> virtio (specifically virtio-mmio) in Xen by EPAM is here:
>>
>> https://marc.info/?l=xen-devel&m=159976941026226
>
> And it is funny because while I was sending this email, they sent a new
> version to xen-devel!
>
> https://marc.info/?l=xen-devel&m=160278030131796
I'll have a look. I wanted to check if I could test ACPI without a
firmware blob first, but I guess I'll just stick to FDT for now.
I don't suppose there is an easy way of running the tools directly out
of the source tree? The Debian package seems to have some scripting to
deal with different versions but nothing that I could easily re-use to
point at a custom install directory.
--
Alex Bennée