Hi Jonathan,
On 21/10/2024 09:54, Jonathan Cameron via Linaro-open-discussions wrote:
I'm checking the availability of people for the MPAM topic.
In the meantime one other topic might be good to discuss briefly.
Flush by PA range - need for a mini subsystem...
This is needed for CXL and we've jumped through various possible ways to handle it.
- PSCI call (not currently in spec). Doesn't work well as the 'only' solution, as some hardware provides direct memory-mapped IO interfaces for this.
I have the code for this - but have been unable to (re)test it. I really want to kill the stop_machine() 'rendezvous' in the kernel cases, but equally the firmware people don't want to do it in firmware!
- ACPI wrapping of both PSCI call and memory mapped. Unfortunately you can wrap SCMI (via a PCC channel) but not (as I understand it) SMCCC, so I don't think an ACPI wrapper in general works.
I thought the opposite - the FFH "Operation Region" allows SMC calls, and Linux supports this via acpi_ffh_address_space_arch_handler(). It's restricted to a few SMC ranges - but we should be able to do this in the 'standard' space as I bet a hypervisor is going to have to emulate this one day in the future... (not touching CXL hardware directly - but providing an encryption key to unlock a device)
I was thinking of a device that is all firmware in the DSDT that has a _DSM to do this - and a driver to poke it in Linux.
- Current option: Face up to the fact we need to just do this kernel first with drivers for the various hardware that surfaces. We aren't particularly far into a design yet so maybe email is fine for now if people are busy.
For this I'm thinking a really small subsystem / class. Lots of design options but one might be
- Driver registers with class and provides PA ranges for which it needs to be notified (simpler option is it gets notified of everything). Class is there for userspace to be able to see what is involved...
- Notification chain follows similar design to mmu notifiers, just on physical addresses and global rather than tied to each mm.
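To make the two bullets above concrete, here is a rough userspace C sketch of a global notifier chain keyed on physical address ranges. Every name in it (pa_flush_notifier, pa_flush_register(), pa_flush_range(), demo_flush()) is made up for illustration and is not an existing kernel interface; a real version would presumably need locking and a class device, both left out here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; none of this is an existing kernel API. */
struct pa_flush_notifier {
	uint64_t start;	/* first PA the driver wants to hear about */
	uint64_t end;	/* last PA, inclusive */
	int (*flush)(struct pa_flush_notifier *nb, uint64_t start, uint64_t end);
	struct pa_flush_notifier *next;
};

/* One global chain, rather than one per mm as with mmu notifiers. */
static struct pa_flush_notifier *pa_flush_chain;

/* Add a driver to the head of the chain. */
static void pa_flush_register(struct pa_flush_notifier *nb)
{
	assert(nb->flush && nb->start <= nb->end);
	nb->next = pa_flush_chain;
	pa_flush_chain = nb;
}

/*
 * Notify every driver whose registered range overlaps [start, end],
 * passing each one only the overlapping part. Returns the number of
 * drivers notified, or the first negative error a driver returns.
 */
static int pa_flush_range(uint64_t start, uint64_t end)
{
	int notified = 0;

	for (struct pa_flush_notifier *nb = pa_flush_chain; nb; nb = nb->next) {
		if (nb->end < start || nb->start > end)
			continue;	/* no overlap with this driver */

		uint64_t s = start > nb->start ? start : nb->start;
		uint64_t e = end < nb->end ? end : nb->end;
		int ret = nb->flush(nb, s, e);

		if (ret < 0)
			return ret;
		notified++;
	}
	return notified;
}

/* Example driver callback: only counts how often it was called. */
static int demo_flush_calls;

static int demo_flush(struct pa_flush_notifier *nb, uint64_t start, uint64_t end)
{
	(void)nb; (void)start; (void)end;
	demo_flush_calls++;
	return 0;
}
```

Two drivers registering overlapping ranges would both be called for a flush that hits the overlap, which is the interleaving case in the requirements.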
Requirements for design:
- Multiple entities may need to be told to flush for a given range (interleaving etc may be going on).
- Flush should be range. Any hardware that doesn't support range will need to flush everything.
This needs advertising to the OS to avoid repeatedly flushing everything!
- Support rich set of flushes. Drivers may have to upgrade the lighter ones to what they support (cache coherency protocols all allow this so should be technically fine to do rather than reject the flush)
Is this just clean / clean+invalidate, or is there another option?
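As a sketch of the 'upgrade' and 'advertise range support' requirements, assuming the flush types really are just clean and clean+invalidate (that is the open question above, not a settled answer). The enum, the capability struct and both helpers are hypothetical names, nothing from an existing interface.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flush types, ordered lightest to heaviest. */
enum pa_flush_type {
	PA_FLUSH_CLEAN,		/* write back dirty lines, leave them valid */
	PA_FLUSH_CLEAN_INVAL,	/* write back, then invalidate */
};

/* What a driver would advertise to the core (assumed layout). */
struct pa_flush_caps {
	unsigned int types;	/* bitmask of (1u << enum pa_flush_type) */
	bool by_range;		/* false: hardware can only flush everything */
};

/*
 * Pick the lightest supported type that is at least as strong as the
 * one requested, upgrading if needed. Returns -1 if nothing suitable.
 */
static int pa_flush_pick_type(const struct pa_flush_caps *caps,
			      enum pa_flush_type want)
{
	assert(caps);
	for (int t = want; t <= PA_FLUSH_CLEAN_INVAL; t++)
		if (caps->types & (1u << t))
			return t;
	return -1;
}

/*
 * Hardware operations needed for nr_ranges separate ranges: one per
 * range if the hardware is range-capable, otherwise a single full
 * flush covers them all, so the caller should batch.
 */
static unsigned int pa_flush_ops_needed(const struct pa_flush_caps *caps,
					unsigned int nr_ranges)
{
	assert(caps);
	if (nr_ranges == 0)
		return 0;
	return caps->by_range ? nr_ranges : 1;
}
```

With by_range advertised, a caller holding several ranges for flush-everything hardware can issue one operation instead of one per range.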
Other things I've forgotten?
Asynchronous completion? We may want to do some other work while the hardware does its thing.
In particular I'd like any insights people have on plausible hardware designs that this approach might not work for.
When thrashing out the PSCI spec proposal, the "don't build that" corner case we had was "system caches" after the PoC that lack a mechanism to clean+invalidate while running, and can't be power-cycled to reset them if they share a power-domain with something important. Even by-VA clean+invalidate would be insufficient on such a platform...
(Is it plausible? I hope not ...)
A sticky corner is how to know that all drivers for architecturally necessary flushes are present. Easy to check if there is one covering the range, not so easy to check if one instance of an interleaved solution is missing. We can probably solve this at the individual driver level, but it's ugly.
Wouldn't this 'just' need to cover the CXL.mem decoders' address windows?
Thanks,
James