Hi all,
This is a fusion of Mike's notes and my own. Please add anything I missed!
People may well be misidentified (sorry about that). It was a very good, active discussion.
Thanks to all involved and Mike in particular for organizing it and taking live notes.
General request
- Slides for all topics next time, to introduce them, as not everyone on the call will have the necessary
background (and those that do might need reminding!)
Hanjun is sending Mike his slides (uncore DVFS) to add to the collaborate page.
IORT - Reserved memory regions (RMR)
====================================
* Shameer gave summary
- IORT Revision E (https://developer.arm.com/documentation/den0049/latest/) introduced new node type.
- A way to describe memory regions that should have unity mapping in the SMMU.
- Use case is a PCIe RAID card that has FW that uses a pool of host memory (hidden from OS).
* Status
- Patches out for ACPICA
- Question raised by ACPICA reviewers on whether spec is final
- Spec appears final (Lorenzo to check) but there may be a minor unrelated fix in the doc to come (Sami).
- Patches out for kernel on relevant lists.
- Mail from Steven Price (Arm), who is interested in this for the EFI framebuffer use case
(Sami Mujawar, who was on the call, is also involved).
* Open questions
- Equivalent from AMD has a flag to indicate that the unity mapping is only needed until the driver has taken over
(end of kernel boot assumed). Avoids an issue of holes in the address space for VMs.
- Huawei not raising this as a requirement, but Lorenzo observed it is interesting and deserves discussion.
- Kexec interaction needs discussion. Steve is looking at this and will bring it to the list.
- Lorenzo brought up issue of IORT spec using PCI BDF (stream ID?) which may be reenumerated.
- Noted x86 doesn't do this but ARM traditionally does.
- There is a DSM that tells the kernel not to reenumerate the PCI bus which ACPI obeys.
- Jonathan's suggestion was that there is potentially an opportunity to cache the original stream ID before doing the
reenumeration in the kernel.
- Lorenzo observed we may need a universal solution for all OSes on this.
Lorenzo took AI to go away and think about it before next call.
- Issues stalling the patch? Probably only Kexec, though we should be careful around possible future
regressions on the BDF issue (not a blocker).
- Related DT story. Huawei server team not interested as no DT support and can't test.
Lorenzo suggested looping in Thierry Reding and referencing a patch set
(probably https://lore.kernel.org/linux-iommu/20200904130000.691933-1-thierry.reding@…)
Huawei more than happy to have others add the DT support :)
AI summary:
* Kexec discussion - on list.
* Use of BDF discussion - revisit here next time.
* DT alignment. Don't want different solutions for each firmware type.
* Lorenzo / Sami to check IORT revision E is final.
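To make the BDF concern above concrete, here is a minimal Python sketch. It is
illustrative only: `requester_id`, `RmrEntry`, `stream_id_for`, `find_rmr` and the
example addresses are invented for this note and are not kernel or IORT structures.
It assumes the usual 16-bit PCI requester ID packing (8-bit bus, 5-bit device,
3-bit function) and, as a simplification, a stream ID that is the requester ID
plus a fixed offset.

```python
from dataclasses import dataclass


def requester_id(bus: int, dev: int, fn: int) -> int:
    """Pack a PCI bus/device/function triple into a 16-bit requester ID."""
    if not (0 <= bus <= 0xFF and 0 <= dev <= 0x1F and 0 <= fn <= 0x7):
        raise ValueError("BDF out of range")
    return (bus << 8) | (dev << 3) | fn


@dataclass(frozen=True)
class RmrEntry:
    """Hypothetical, simplified view of one firmware-described reserved region."""
    stream_id: int  # ID the firmware associated with the region at boot
    base: int       # start of the region that needs a unity mapping
    length: int


def stream_id_for(rid: int, offset: int = 0) -> int:
    # Assumption: a fixed-offset mapping from requester ID to stream ID.
    return rid + offset


def find_rmr(entries, stream_id):
    """Return the reserved regions recorded against a given stream ID."""
    return [e for e in entries if e.stream_id == stream_id]


# Firmware saw the RAID card at 01:00.0 and described its region accordingly.
firmware_rid = requester_id(0x01, 0, 0)
entries = [RmrEntry(stream_id_for(firmware_rid), 0x8000_0000, 0x10_0000)]

# After the OS reenumerates the bus, the same card shows up at 02:00.0.
current_rid = requester_id(0x02, 0, 0)

# Looking up by the new ID misses the region; the cached boot-time ID finds it.
stale_lookup = find_rmr(entries, stream_id_for(current_rid))
cached_lookup = find_rmr(entries, stream_id_for(firmware_rid))
```

The point of the sketch is only that a lookup keyed on the post-reenumeration ID
no longer matches what firmware described, which is why caching the original ID
(or not reenumerating) came up.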
SVA
===
Zhangfei gave summary:
- Huawei has devices that are not PCIe but are presented as such.
- They support stall mode for SVA (spec violation)
- Resistance from kernel maintainers to maintaining a white list for any quirk. Fine to fix
it once (JPB), but not to keep doing so.
- Note that stall mode not yet supported at all (JPB to send out this cycle).
- If a longer term fix is needed and can't be done via PCI-SIG etc. then we need to convince
the PCI and SMMU maintainers. Noted that the quirk is very little code.
* Other SVA topics.
- Mentioned virtual SVA (no actual problems, just expressing interest!)
- Would need Eric Auger, wasn't on topic list so Eric not on call.
AI: Nothing planned until after JPB has upstreamed stall mode. Hard to have discussion before that.
DVFS
====
guohanjun
Solutions exist for
* CPU DVFS (voltage + frequency scaling)
* PCIe device power states etc
No standard way of controlling Uncore voltage and frequency for ACPI based systems.
3 options:
1. MMIO / kernel driver.
2. PSCI via trusted firmware and system management controller.
3. ACPI (wrapping up an op region and SCMI)
Clarifications / discussions.
* Vincent G: Are power states or voltage / frequency of interest? Ans: Voltage / frequency.
* Considered SCMI? Ans: Works only for DT as SCMI under ACPI is wrapped up in AML
so looks like an ACPI interface.
* Sudeep H: Necessary to trace CPU freq? Yes.
* Sudeep H: Why not do it in firmware entirely? Ans. Not just CPU. For example PCI device accessing
memory may well need the ring bus to be fast.
* Vincent G: Bandwidth affected? Yes. VG: mobile does this by specifying a BW requirement (via SCMI).
* Sudeep H Observed need to expose it via ACPI spec. (option 3 above).
* Sudeep H: Does PCI also need fine-grain control? We might need to add to the spec.
* Sudeep H: What are the requirements? guohanjun: For now we just need frequency scaling.
* Jonathan C: Noted PCI power state is not enough. It's workload dependent.
* Sudeep H: We need to gather all the info, and need to talk in ASWG about DVFS.
* Jonathan C: For now direct control probably makes sense. Whilst it would be nice to have
a system description detailed enough, in a standard way, to write general software, that is a
big spec job.
* Jonathan C: Seems like truly standard SW will not happen any time soon.
AI: Send an RFC to linux-pm / linux-acpi, Rafael, and those in this discussion to ask about
interest in adding per-device DVFS to the ACPI spec. Possibly pursue a code-first ACPI
approach.
If I've mislisted or "volunteered" anyone for AIs they didn't agree to then please
correct that.
Thanks all for contributions. I for one found it a very useful call!
Jonathan
On 2020/11/2 17:47, Mike Holmes via Linaro-open-discussions wrote:
> Topic: Linaro Open Discussions [1] - Kernel related
> Time: Nov 4, 2020 02:00 PM London
> ------
> Agenda [2]
>
> - Hanjun/Shameer - IORT reserved memory support
> - Zhangfei/Wangzhou - SVA support for SMMU stall mode
> - Hanjun - Uncore DVFS and how to support it (needs spec update, either
> ARM specs or ACPI)
Please see attached slide for the discussion.
Thanks
Hanjun
>
>
> [1] Linaro Open Discussions Home
> <https://collaborate.linaro.org/display/LOD/Linaro+Open+Discussions+Home>
> For all open meeting schedules
> [2] 2020-11-04 Proposed Meeting Agenda
> <https://collaborate.linaro.org/display/LOD/2020-11-04+Proposed+Meeting+Meet…>
>
> ----------
> Join Zoom Meeting
> https://linaro-org.zoom.us/j/98027304997
>
> Meeting ID: 980 2730 4997
> One tap mobile
> +16699009128,,98027304997# US (San Jose)
> +12532158782,,98027304997# US (Tacoma)
>
> Dial by your location
> +1 669 900 9128 US (San Jose)
> +1 253 215 8782 US (Tacoma)
> +1 301 715 8592 US (Germantown)
> +1 312 626 6799 US (Chicago)
> +1 346 248 7799 US (Houston)
> +1 646 558 8656 US (New York)
> 888 788 0099 US Toll-free
> 877 853 5247 US Toll-free
> Meeting ID: 980 2730 4997
> Find your local number: https://linaro-org.zoom.us/u/aehAhwidV2
>
Hi All,
This one made it onto the list of topics to discuss (now marked as no need
to discuss). I've been meaning to give a status update by email including
what is outstanding here. Please let me know if this fails to cover
some aspect of interest.
Background:
https://github.com/hisilicon/acpi-numa-whitepaper/releases/tag/v0.93 chapter 3.
Generic initiators are a concept in ACPI 6.3 (sec 5.2.16.6) to plug a hole
in the definition of proximity domains.
Proximity domains in ACPI (NUMA nodes in the kernel) are defined by entries in the SRAT
table. There is a whole range of different types of SRAT entry, but before
ACPI 6.3 this in practice meant that a proximity domain only
existed if it contained memory, CPUs, or both. Other initiators
of memory transactions such as network cards can be assigned to an existing
proximity domain via _PXM in ACPI DSDT. This restricted them to sharing a domain
with either memory or processors.
That doesn't always reflect system architecture, particularly with the addition
of richer descriptions of access characteristics (latency / bandwidth) brought
in by HMAT. Hence Generic Initiator domains to allow you to specify a
proximity domain with some other type of device in it (such as a network card)
and get all of the descriptive capability available for CPU / memory nodes.
Note that this was brought in prior to CXL becoming public, but the CXL 1.1 spec
states that initiators on CXL should be described using Generic Initiator nodes.
This should increase the number of users of this feature considerably.
It is also useful in some existing systems.
What support was needed in kernel:
1) Parsing of the SRAT Generic Initiator Affinity Structure
2) Instantiating the NUMA nodes that map to the GI PXM nodes to ensure stuff
like fallback lists for memory allocation work as normal.
3) Richer use of HMAT access characteristics to differentiate nearest CPU
to memory from nearest initiator to memory.
4) PXM assignment from the SRAT record rather than _PXM (not yet done).
5) PCI PXM assignment (not yet done)
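
Item 3 can be illustrated with a small Python sketch. The node numbers,
latencies and the `nearest_initiator` helper are all invented for illustration;
the real kernel code works from the HMAT tables and is more involved. The idea
is simply that, once Generic Initiator domains exist, "nearest initiator to
this memory" and "nearest CPU to this memory" can give different answers.

```python
# Hypothetical topology: node 0 has CPUs and memory, node 1 is a Generic
# Initiator domain (e.g. a network card), node 2 is memory only.
CPU_NODES = {0}
ALL_INITIATOR_NODES = {0, 1}

# Made-up read latencies in nanoseconds, keyed by (initiator node, memory node).
LATENCY_NS = {
    (0, 0): 80,
    (0, 2): 200,
    (1, 0): 150,
    (1, 2): 100,
}


def nearest_initiator(latency_ns, memory_node, candidates):
    """Return the candidate initiator node with the lowest latency to memory_node.

    Candidates with no recorded latency to the target are ignored; returns
    None if nothing can reach it. Ties go to the lowest node number.
    """
    best_node = None
    best_latency = None
    for node in sorted(candidates):
        latency = latency_ns.get((node, memory_node))
        if latency is None:
            continue
        if best_latency is None or latency < best_latency:
            best_node, best_latency = node, latency
    return best_node


# For the memory-only node, the closest initiator overall is the GI domain,
# but the closest CPU is still node 0.
closest_any = nearest_initiator(LATENCY_NS, 2, ALL_INITIATOR_NODES)
closest_cpu = nearest_initiator(LATENCY_NS, 2, CPU_NODES)
```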
Status:
The kernel patches sat on the list (with rebases) for well over a year
failing to get the architecture review needed (as there was significant
risk of breakage in both ARM64 and x86). It was to break this blockage
that we were interested in an open discussion on this. However, they did
recently get x86 review and Rafael queued them for 5.10 (now merged).
The PCI PXM issue has been long standing due to some buggy firmware
on certain x86 boards and the need for a clarification in the ACPI spec
(added in 6.3). To make this safe, we needed to ensure that NUMA nodes on
ACPI systems can only be instantiated during the main parse of SRAT.
https://lore.kernel.org/linux-mm/20200818142430.1156547-1-Jonathan.Cameron@…
That fix is now in place, and we'll resend the PCI fix shortly.
Note it may be "interesting" to support nodes from CXL CDAT tables at runtime
but that is another topic.
( https://uefi.org/node/4093
https://lore.kernel.org/linux-cxl/20201102183428.00005f4f@Huawei.com/T/#m52… )
For Generic Initiator nodes, there are two ways a device can be assigned
to the proximity domain. Conventional _PXM in DSDT can be used and
that is now supported. The SRAT entry itself also contains an address
(PCI seg + BDF or Platform UID / HID based). There is no obligation to
provide both. The SRAT based method will require some level of alternative
infrastructure to that used for _PXM. We may look at this at some stage.
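
As a rough picture of the two routes, here is a short Python sketch. The
`proximity_domain` helper, the table layout and the choice to prefer _PXM when
both are present are assumptions made for illustration, not a description of
what the kernel does or will do.

```python
from typing import Dict, Optional, Tuple

# Hypothetical parsed form of SRAT Generic Initiator entries that use a PCI
# device handle: (PCI segment, 16-bit BDF) -> proximity domain.
PciHandle = Tuple[int, int]


def proximity_domain(
    pxm_from_dsdt: Optional[int],
    handle: PciHandle,
    srat_gi_by_handle: Dict[PciHandle, int],
) -> Optional[int]:
    """Resolve a device's proximity domain from either source.

    Assumption: a _PXM value from the DSDT wins if present; otherwise fall
    back to a matching SRAT Generic Initiator entry. Returns None if the
    firmware provided neither.
    """
    if pxm_from_dsdt is not None:
        return pxm_from_dsdt
    return srat_gi_by_handle.get(handle)


# Example: segment 0, bus 1 device 0 function 0 is described only in SRAT.
srat_entries = {(0, 0x0100): 3}
from_srat = proximity_domain(None, (0, 0x0100), srat_entries)
from_pxm = proximity_domain(1, (0, 0x0100), srat_entries)
undescribed = proximity_domain(None, (0, 0x0200), srat_entries)
```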
So a few outstanding things but probably not worth discussing on a call
at this stage unless anyone is seeing problems with the stuff already merged.
Thanks,
Jonathan
Topic: Linaro Open Discussions [1] - Kernel related
Time: Nov 4, 2020 02:00 PM London
------
Agenda [2]
- Hanjun/Shameer - IORT reserved memory support
- Zhangfei/Wangzhou - SVA support for SMMU stall mode
- Hanjun - Uncore DVFS and how to support it (needs spec update, either
ARM specs or ACPI)
[1] Linaro Open Discussions Home
<https://collaborate.linaro.org/display/LOD/Linaro+Open+Discussions+Home>
For all open meeting schedules
[2] 2020-11-04 Proposed Meeting Agenda
<https://collaborate.linaro.org/display/LOD/2020-11-04+Proposed+Meeting+Meet…>
--
Mike Holmes | Director, Foundation Technologies, Linaro
Mike.Holmes(a)linaro.org <mike.holmes(a)linaro.org>
"Work should be fun and collaborative, the rest follows"