Hi Hesham,
On 12/01/2023 10:34, Hesham Almatary wrote:
> Thanks for getting back to me on this. I have done some changes to my ACPI tables and got your code working fine without this patch. In particular, I matched the proximity domain field with the NUMA node ID for each memory controller. If they differ, the code won't work (as it has the assumption that the proximity domain is the same as NUMA id,
Right, if there is an extra level of indirection in there, it's something I wasn't aware of. I'll need to dig into it. I agree this explains what you were seeing.
> from which the affinity/accessibility is set). This leaves me with a few questions regarding the design and implementation of the driver. I'd appreciate your input on that.
> - What does a memory MSC correspond to? A class (with a unique ID) or a
> component? From the code, it seems like it maps to a component to me.
An MSC is the device: it has registers and generates interrupts. If it's part of your memory controller, it gets described like that in the ACPI tables, which lets Linux guess that this MSC (or the RIS within it) controls some policy in the memory controller.
Components exist to group devices that should be configured the same. This happens where designs are sliced up, but the slicing makes no sense to the software. Classes are a group of components that do the same thing, but not to the same resource, e.g. they all control memory controllers.
The ACPI tables should describe the MSC; it's up to the driver to build the class and component structures from what it can infer from the other ACPI tables.
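As a rough illustration of that three-level grouping, here is a minimal sketch in C. These are hypothetical structures and names for discussion, not the driver's actual data structures:

```c
#include <stddef.h>

/* Hypothetical sketch of the MSC/component/class grouping described
 * above; the real driver's structures are richer, and these names are
 * illustrative only. */

struct msc {                    /* one device: registers, interrupts */
	int identifier;         /* the ACPI MSC "identifier" field */
	int proximity_domain;   /* from the ACPI affinity tables */
};

struct component {              /* devices that must be programmed the same */
	int comp_id;            /* e.g. the NUMA node number for memory */
	struct msc *devices[8];
	int nr_devices;
};

struct mpam_class {             /* components controlling the same resource type */
	int class_id;           /* private to the driver */
	struct component *components[8];
	int nr_components;
};

/* Find the component a newly discovered MSC belongs to, matching on
 * proximity domain (the assumption discussed in this thread). */
static struct component *component_of(struct mpam_class *c, struct msc *m)
{
	for (int i = 0; i < c->nr_components; i++)
		if (c->components[i]->comp_id == m->proximity_domain)
			return c->components[i];
	return NULL;    /* the driver would allocate a new component here */
}
```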
> - Could we have a use case in which we have different class IDs with
> the same class type? If yes could you please give an example?
Your L2 and L3 are both caches, but use the level number as the id. I doubt anyone builds a system with an MSC on both, but it's possible under the architecture, and we could expose both via resctrl.
> - What should a component ID for a memory MSC be/represent? The code
> assumes it's a (NUMA?) node ID.
The component-ids are some number that makes sense to Linux and matches something in the ACPI tables. These are exposed via the schema file to user-space. For the caches, it's the cache-id property from the PPTT table. This is exposed to user-space via /sys/devices/system/cpu/cpu0/cache/index3/id or equivalent.
It's important that user-space can work out which CPUs share a component/domain in the schema. Using a sensible id is the prerequisite for that.
Intel's memory bandwidth control appears to be implemented on the L3, so they re-use the id of the L3 cache. These seem to correspond to NUMA nodes already.
For MPAM, we have no idea if the memory controllers map 1:1 with any level in the cache. Instead, the driver expects to use the NUMA node number directly.
(I'll put this on the list of KNOWN_ISSUES; the Intel side of this ought to be cleaned up so it doesn't break if they build a SoC where the L3 doesn't map 1:1 with NUMA nodes. It looks like they are getting away with it because Atom doesn't support L3 or memory bandwidth.)
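The id-selection policy above could be sketched as a hypothetical helper (not a real driver function), assuming cache MSCs reuse the PPTT cache-id that sysfs already exposes and memory MSCs use the NUMA node number directly:

```c
/* Sketch of the component-id selection described above. Illustrative
 * only; the names and the helper itself are assumptions. */
enum msc_resource { MSC_CACHE, MSC_MEMORY };

static int component_id_for(enum msc_resource type, int cache_id, int numa_node)
{
	switch (type) {
	case MSC_CACHE:
		return cache_id;   /* matches .../cache/indexN/id in sysfs */
	case MSC_MEMORY:
	default:
		return numa_node;  /* no reliable 1:1 mapping onto a cache level */
	}
}
```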
> - What should a class ID represent for a memory MSC? Which is different
> from the class type itself.
The class id is private to the driver; for the caches it needs to be the cache level. Because of that, memory is shoved at the end, on the assumption no-one has an L255 cache, and 'unknown' devices are shoved at the beginning... L0 caches probably do exist, but I doubt anyone would add an MSC to them.
Classes can't be created arbitrarily: the resctrl picking code needs to know how they map to resctrl schemas, and we can't invent new schemas without messing up user-space.
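The numbering described above could be encoded like this. Again a hypothetical sketch, with assumed names:

```c
/* Hypothetical encoding of driver-private class ids, following the
 * scheme above: caches use their level, 'unknown' sits at 0, memory at
 * 255 on the assumption no-one has an L255 cache. */
enum class_type { CLASS_UNKNOWN, CLASS_CACHE, CLASS_MEMORY };

#define CLASS_ID_UNKNOWN   0
#define CLASS_ID_MEMORY  255

static int class_id_for(enum class_type type, int cache_level)
{
	switch (type) {
	case CLASS_CACHE:
		return cache_level;        /* e.g. 2 for the L2, 3 for the L3 */
	case CLASS_MEMORY:
		return CLASS_ID_MEMORY;    /* shoved at the end */
	default:
		return CLASS_ID_UNKNOWN;   /* shoved at the beginning */
	}
}
```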
> - How would 4 memory MSCs (with different proximity domains) map to
> classes and components?
Each MSC would be a device. There would be one device per component, because each proximity domain is different. They would all be the same class, as you'd have described them all with a memory type in the ACPI tables.
If you see a problem with this, let me know! The folk who write the ACPI specs didn't find any systems where this would lead to problems... but that doesn't mean you haven't built something that looks quite different.
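A toy illustration of that mapping: grouping by proximity domain, four memory MSCs in four different domains give four components in one class, while MSCs sharing a domain collapse into one component. (Hypothetical counting code, only for illustrating the grouping.)

```c
/* Count how many components a set of MSCs would produce, treating each
 * distinct proximity domain as one component. */
static int count_components(const int *proximity_domains, int nr_msc)
{
	int nr_components = 0;

	for (int i = 0; i < nr_msc; i++) {
		int seen = 0;

		for (int j = 0; j < i; j++)
			if (proximity_domains[j] == proximity_domains[i])
				seen = 1;
		if (!seen)
			nr_components++;
	}
	return nr_components;
}
```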
> - How would 2 Memory MSCs with(in) the same proximity domain and/or
> same NUMA node work, if at all?
If you build this, I bet your hardware people say those two MSCs must be programmed the same for the regulation to work. (If not, how is software expected to understand the hashing scheme used to map physical addresses to memory controllers?!)
Each MSC would be a device. They would both be part of the same component as they have the same proximity domain.
Configuration is applied to the component, so each device/MSC within the component is always configured the same.
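"Configuration is applied to the component" could be sketched like this: a hypothetical apply function writes the same configuration to every MSC in the component, so two MSCs sharing a proximity domain can never diverge. (Assumed names and a stand-in register field, not driver code.)

```c
struct mem_msc {                /* stand-in for a device's config state */
	unsigned int cpbm;      /* e.g. a cache/bandwidth portion bitmap */
};

struct mem_component {
	struct mem_msc *devices[8];
	int nr_devices;
};

/* Apply one configuration to the whole component: every device/MSC
 * within it always ends up programmed identically. */
static void apply_config(struct mem_component *c, unsigned int cpbm)
{
	for (int i = 0; i < c->nr_devices; i++)
		c->devices[i]->cpbm = cpbm;
}
```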
> - Should the ACPI/MPAM MSC's "identifier" field be mapped to class IDs
> or component IDs at all?
Classes, no - these are just for the driver to keep track of the groups. Components, probably... but another number may make more sense. This should line up with something that is already exposed to user-space via sysfs.
Thanks,
James