Hi James,
> > Qemu isn't responding with PSCI_DENIED when CPUs are forbidden. ('SUCCESS' means you
> > hit a 5 second timeout in the guest, on each CPU)
I have tested the straightforward case and it works.
Could you please elaborate on this so that I can look into the issue?
Thanks
Salil
Hello!
v0.1? Yeah, this is v0 from gitlab, rebased onto v6.0-rc3.
This series has only been lightly tested....
---
Hello!
This series adds what looks like cpuhotplug support to arm64 for use in
virtual machines. It does this by switching architectures that support
ACPI over to GENERIC_CPU_DEVICES, then moving the cpu_register() calls
out of the arch code and into the ACPI processor driver.
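For a rough idea of the first step, a minimal sketch (not the series'
actual diff) of what GENERIC_CPU_DEVICES provides: a weak
arch_register_cpu() that registers each present CPU, which an
architecture, or later the ACPI processor driver, can override. The
cpu_devices per-cpu variable name here is an assumption for illustration:

#include <linux/cpu.h>
#include <linux/percpu.h>

/* Sketch only: default registration of a CPU's sysfs device. */
static DEFINE_PER_CPU(struct cpu, cpu_devices);

int __weak arch_register_cpu(int num)
{
	return register_cpu(&per_cpu(cpu_devices, num), num);
}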
The kubernetes folk really want to be able to add CPUs to an existing VM,
in exactly the same way they do on x86. The use-case is pre-booting guests
with one CPU, then adding the number actually needed when the
workload is provisioned.
Wait? Doesn't arm64 support cpuhotplug already!?
In the arm world, cpuhotplug gets used to mean removing the power from a CPU.
The CPU is offline, and remains present. For x86, and ACPI, cpuhotplug
has the additional step of physically removing the CPU, so that it isn't
present anymore.
Arm64 doesn't support this, and can't support it: CPUs are really a slice
of the SoC, and there is not enough information in the existing ACPI tables
to describe which bits of the slice also got removed. Without a reference
machine, adding this support to the spec is a wild goose chase.
Critically: everything described in the firmware tables must remain present.
For a virtual machine this is easy as all the other bits of 'virtual SoC'
are emulated, so they can (and do) remain present when a vCPU is 'removed'.
On a system that supports cpuhotplug, the MADT has to describe every possible
CPU at boot. Under KVM, the vGIC needs to know about every possible vCPU before
the guest is started.
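(As a sketch of what this means for the parser: counting possible CPUs
is a walk over every GICC entry in the MADT, whatever its enabled state.
Illustrative, not the kernel's exact code:)

#include <linux/acpi.h>
#include <linux/init.h>

static int nr_possible_cpus;

/* Sketch only: each GICC entry is a possible CPU, enabled or not. */
static int __init count_gicc(union acpi_subtable_headers *header,
			     const unsigned long end)
{
	nr_possible_cpus++;
	return 0;
}

static void __init count_possible_cpus(void)
{
	acpi_table_parse_madt(ACPI_MADT_TYPE_GENERIC_INTERRUPT,
			      count_gicc, 0);
}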
With these constraints, virtual-cpuhotplug is really just a hypervisor/firmware
policy about which CPUs can be brought online.
This series adds support for virtual-cpuhotplug as exactly that: firmware
policy. It may even work on a physical machine; for a guest, the part of
firmware is played by the VMM (typically Qemu).
PSCI support is modified to return 'DENIED' if the CPU can't be brought
online/enabled yet. The CPU object's _STA method's enabled bit is used to
indicate firmware's current disposition. If the CPU has its enabled bit clear,
it will not be registered with sysfs, and attempts to bring it online will
fail. The notifications that _STA has changed its value then work in the same
way, and firmware can cause the CPU to be registered some time later, allowing
it to be brought online.
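As a minimal sketch of that registration gate (the ACPI_STA_* values
below match include/acpi/actypes.h; the helper itself is illustrative,
not the series' code):

#include <linux/types.h>

#define ACPI_STA_DEVICE_PRESENT	0x01	/* as in include/acpi/actypes.h */
#define ACPI_STA_DEVICE_ENABLED	0x02

/* Sketch only: a CPU is registered with sysfs when firmware's _STA
 * reports it both present and enabled. */
static bool cpu_should_be_registered(unsigned long long sta)
{
	return (sta & ACPI_STA_DEVICE_PRESENT) &&
	       (sta & ACPI_STA_DEVICE_ENABLED);
}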
This creates something that looks like cpuhotplug to user-space, as the sysfs
files appear and disappear, and the udev notifications look the same.
One notable difference is the CPU present mask, which is exposed via sysfs.
Because the CPUs remain present throughout, they can still be seen in that mask.
This value does get used by web browsers to estimate the number of CPUs,
as the CPU online mask is constantly changing on mobile phones.
Linux is tolerant of PSCI returning errors, as it's always been allowed to do
that. To avoid confusing an OS that can't tolerate this, we'd need an additional
bit in the MADT GICC flags. This series copies ACPI_MADT_ONLINE_CAPABLE, which
appears to be for this purpose, but calls it ACPI_MADT_GICC_CPU_CAPABLE as it
has a different bit position in the GICC.
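Roughly, as a sketch (the new flag's bit position below is an assumption
for illustration, not the spec's value):

#include <linux/types.h>

#define ACPI_MADT_ENABLED		(1 << 0)	/* existing GICC flag */
#define ACPI_MADT_GICC_CPU_CAPABLE	(1 << 3)	/* new; position assumed */

/* Sketch only: a GICC entry is usable if the CPU is enabled now, or
 * disabled but allowed to come online later. */
static bool acpi_gicc_cpu_is_usable(u32 flags)
{
	return flags & (ACPI_MADT_ENABLED | ACPI_MADT_GICC_CPU_CAPABLE);
}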
I assume all x86 firmware vendors set the ENABLED bit in the CPU object's _STA
method. This code is unconditionally enabled for all ACPI architectures.
If there are problems with firmware tables on some devices, the CPUs will
already be online by the time acpi_processor_make_enabled() is called.
A mismatch here causes a firmware-bug message and kernel taint. This should
only affect people with broken firmware who also boot with maxcpus=1, and
bring CPUs online later.
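Something like this sketch of the mismatch check (illustrative, not the
actual hunk; it reuses the _STA define from the earlier sketch):

#include <linux/cpumask.h>
#include <linux/kernel.h>
#include <linux/printk.h>

/* Sketch only: a CPU that is already online, but which firmware now
 * claims is not enabled, indicates broken tables. */
static void check_sta_mismatch(int cpu, unsigned long long sta)
{
	if (cpu_online(cpu) && !(sta & ACPI_STA_DEVICE_ENABLED)) {
		pr_err_once(FW_BUG "CPU %d is online, but _STA says it is not enabled\n",
			    cpu);
		add_taint(TAINT_FIRMWARE_WORKAROUND, LOCKDEP_STILL_OK);
	}
}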
I had a go at switching the remaining architectures over to GENERIC_CPU_DEVICES,
so that the Kconfig symbol can be removed, but I got stuck with powerpc
and s390.
Thanks,
James Morse (22):
ACPI: Move ACPI_HOTPLUG_CPU to be enabled per architecture
drivers: base: Use present CPUs in GENERIC_CPU_DEVICES
drivers: base: Allow parts of GENERIC_CPU_DEVICES to be overridden
drivers: base: Move node_dev_init() before cpu_dev_init()
arm64: setup: Switch over to GENERIC_CPU_DEVICES using
arch_register_cpu()
ia64/topology: Switch over to GENERIC_CPU_DEVICES
x86/topology: Switch over to GENERIC_CPU_DEVICES
LoongArch: Switch over to GENERIC_CPU_DEVICES
ACPI: processor: Register all CPUs from acpi_processor_get_info()
ACPI: Rename ACPI_HOTPLUG_CPU to include 'present'
ACPI: Rename acpi_processor_hotadd_init and remove pre-processor
guards
ACPI: Check _STA present bit before making CPUs not present
ACPI: Warn when the present bit changes but the feature is not enabled
drivers: base: Implement weak arch_unregister_cpu()
LoongArch: Use the __weak version of arch_unregister_cpu()
arm64: acpi: Move get_cpu_for_acpi_id() to a header
ACPICA: Add new MADT GICC flags fields [code first?]
arm64, irqchip/gic-v3, ACPI: Move MADT GICC enabled check into a
helper
irqchip/gic-v3: Don't return errors from gic_acpi_match_gicc()
irqchip/gic-v3: Add support for ACPI's disabled but 'online capable'
CPUs
ACPI: add support to register CPUs based on the _STA enabled bit
arm64: document virtual CPU hotplug's expectations
Jean-Philippe Brucker (3):
arm64: psci: Ignore DENIED CPUs
KVM: arm64: Pass hypercalls to userspace
KVM: arm64: Pass PSCI calls to userspace
Documentation/arm64/cpu-hotplug.rst | 79 ++++++++++++++++++
Documentation/arm64/index.rst | 1 +
Documentation/virt/kvm/api.rst | 31 ++++++-
Documentation/virt/kvm/arm/hypercalls.rst | 1 +
arch/arm64/Kconfig | 1 +
arch/arm64/include/asm/acpi.h | 11 +++
arch/arm64/include/asm/cpu.h | 1 -
arch/arm64/include/asm/kvm_host.h | 2 +
arch/arm64/kernel/acpi_numa.c | 11 ---
arch/arm64/kernel/psci.c | 2 +-
arch/arm64/kernel/setup.c | 13 +--
arch/arm64/kernel/smp.c | 5 +-
arch/arm64/kvm/arm.c | 15 +++-
arch/arm64/kvm/hypercalls.c | 28 ++++++-
arch/arm64/kvm/psci.c | 13 +++
arch/ia64/Kconfig | 2 +
arch/ia64/include/asm/acpi.h | 2 +-
arch/ia64/include/asm/cpu.h | 11 ---
arch/ia64/kernel/acpi.c | 6 +-
arch/ia64/kernel/setup.c | 2 +-
arch/ia64/kernel/topology.c | 35 ++------
arch/loongarch/Kconfig | 2 +
arch/loongarch/kernel/topology.c | 31 +------
arch/x86/Kconfig | 2 +
arch/x86/include/asm/cpu.h | 6 --
arch/x86/kernel/acpi/boot.c | 4 +-
arch/x86/kernel/topology.c | 19 +----
drivers/acpi/Kconfig | 5 +-
drivers/acpi/acpi_processor.c | 99 ++++++++++++++++-------
drivers/acpi/processor_core.c | 2 +-
drivers/base/cpu.c | 21 +++--
drivers/base/init.c | 2 +-
drivers/firmware/psci/psci.c | 2 +
drivers/irqchip/irq-gic-v3.c | 38 +++++----
include/acpi/actbl2.h | 1 +
include/kvm/arm_hypercalls.h | 1 +
include/kvm/arm_psci.h | 4 +
include/linux/acpi.h | 10 ++-
include/linux/cpu.h | 6 ++
include/uapi/linux/kvm.h | 2 +
40 files changed, 339 insertions(+), 190 deletions(-)
create mode 100644 Documentation/arm64/cpu-hotplug.rst
--
2.30.2
OpenEuler / Bigtop discussion
Wednesday Feb 1, 2023 ⋅ 10am – 10:50am
China Standard Time - Shanghai
Location
https://linaro-org.zoom.us/j/99098081330
Hi, All

I would like to call a meeting to discuss our progress in OpenEuler
/ Bigtop enablement. You are welcome to dial in. As a topic, we proposed
the following text as a presentation to Bigtop's dev mailing list. Please
help review. Your comments of any kind are highly appreciated.

==
1. Introduction
OpenEuler is an innovative open source OS platform built on kernel
innovations and a solid cloud base to cover all scenarios.
(1) The openEuler operating system covers mainstream technical
architectures, including ARM/X86/RISC-V/NPU/GPU/DPU etc.
(2) All-scenario application support and a development tool chain.
More information can be found on the official website:
https://www.openeuler.org/en/
More and more manufacturers use openEuler OS, so we are ready to support
openEuler OS for Bigtop.

2. Related work
(1) Support the build of the puppet and slaves dockers; the base docker
can be pulled from the OpenEuler web.
PR address: https://github.com/apache/bigtop/pull/1051
Issue address: https://issues.apache.org/jira/browse/BIGTOP-3873
(2) All components have compiled. (Bigtop master branch, ARM/X86)
(3) All components passed the smoke tests, except the ambari component
(fixing the problem to support python3).
(4) JIRA: BIGTOP-3875 introduced the adaptation work content, including
11 sub-JIRAs.
https://issues.apache.org/jira/browse/BIGTOP-3875

3. Next
(1) A discussion on introducing Bigtop into the OpenEuler community has
been initiated, as a community maintainer of the openEuler Bigdata SIG.
(2) We will support and maintain the openEuler version for a long time.
==
──────────
Guodong Xu is inviting you to a scheduled Zoom meeting.

Join Zoom Meeting
https://linaro-org.zoom.us/j/99098081330

Meeting ID: 990 9808 1330
──────────
Organizer
guodong.xu(a)linaro.org
Guests
guodong.xu(a)linaro.org - organizer
linaro-open-discussions(a)op-lists.linaro.org
Linaro Open Discussions monthly meeting
Friday 3 Feb 2023 ⋅ 22:00 – 23:00
Hong Kong Standard Time
Location
https://linaro-org.zoom.us/j/95682500341
Joyce QI is inviting you to a scheduled Zoom meeting.

Join Zoom Meeting
https://linaro-org.zoom.us/j/95682500341

Meeting ID: 956 8250 0341
Organiser
joyce.qi(a)linaro.org
Guests
joyce.qi(a)linaro.org - organiser
Mike Holmes
jonathan.cameron(a)huawei.com
lorenzo.pieralisi(a)arm.com
james.morse(a)arm.com
shameerali.kolothum.thodi(a)huawei.com
linaro-open-discussions(a)op-lists.linaro.org
linux(a)armlinux.org.uk
Hi all,
Are there any topics that need to be discussed next Tuesday?
Thanks:)
Joyce
> On 17 Jan 2023, at 08:00, linaro-open-discussions-request@op-lists.linaro.org wrote:
>
> Today's Topics:
>
> 1. Re: [RFC PATCH 2/2] MPAM/resctrl: allocate a domain per component
> (Hesham Almatary)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Mon, 16 Jan 2023 11:31:06 +0000
> From: Hesham Almatary <hesham.almatary(a)huawei.com>
> Subject: [Linaro-open-discussions] Re: [RFC PATCH 2/2] MPAM/resctrl:
> allocate a domain per component
> To: James Morse <james.morse(a)arm.com>
> Cc: linaro-open-discussions(a)op-lists.linaro.org, linuxarm(a)huawei.com
> Message-ID: <20230116113106.00001a2d(a)huawei.com>
> Content-Type: text/plain; charset="US-ASCII"
>
> Hello James,
>
> Many thanks for clearing things up; that definitely helps with my
> understanding.
>
> On Thu, 12 Jan 2023 13:38:17 +0000
> James Morse <james.morse(a)arm.com> wrote:
>
>> Hi Hesham,
>>
>> On 12/01/2023 10:34, Hesham Almatary wrote:
>>> Thanks for getting back to me on this. I have made some changes to
>>> my ACPI tables and got your code working fine without this patch. In
>>> particular, I matched the proximity domain field with the NUMA node
>>> ID for each memory controller. If they differ, the code won't work
>>> (as it assumes that the proximity domain is the same as the
>>> NUMA id,
>>
>> Right, if there is an extra level of indirection in there, it's
>> something I wasn't aware of. I'll need to dig into it. I agree this
>> explains what you were seeing.
>>
>>
>>> from which the affinity/accessibility is set). This leaves me with a
>>> few questions regarding the design and implementation of the driver.
>>> I'd appreciate your input on that.
>>
>>> 1) What does a memory MSC correspond to? A class (with a unique ID)
>>> or a component? From the code, it seems like it maps to a component
>>> to me.
>>
>> An MSC is the device; it has registers and generates interrupts. If
>> it's part of your memory controller, it gets described like that in
>> the ACPI tables, which lets linux guess that this MSC (or the RIS
>> within it) controls some policy in the memory controller.
>>
>> Components exist to group devices that should be configured the same.
>> This happens where designs are sliced up, but this slicing makes no
>> sense to the software. Classes are a group of components that do the
>> same thing, but not to the same resource. e.g. they control memory
>> controllers.
>>
>> The ACPI tables should describe the MSC; it's up to the driver to
>> build the class and component structures from what it can infer from
>> the other ACPI tables.
>>
>>
>>> 2) Could we have a use case in which we have different class IDs
>>> with the same class type? If yes could you please give an example?
>>
>> Your L2 and L3 are both caches, but use the level number as the id.
>> I doubt anyone builds a system with an MSC on both, but it's allowed by
>> the architecture, and we could expose both via resctrl.
>>
>>
>>> 3) What should a component ID for a memory MSC be/represent? The
>>> code assumes it's a (NUMA?) node ID.
>>
>> The component-ids are some number that makes sense to linux, and
>> matches something in the ACPI tables. These are exposed via the
>> schema file to user-space. For the caches, it's the cache-id property
>> from the PPTT table. This is exposed to user-space via
>> /sys/devices/system/cpu/cpu0/cache/index3/id or equivalent.
>>
>> It's important that user-space can work out which CPUs share a
>> component/domain in the schema. Using a sensible id is the
>> pre-requisite for that.
>>
>> Intel's memory bandwidth control appears to be implemented on the L3,
>> so they re-use the id of the L3 cache. These seem to correspond to
>> NUMA nodes already.
>>
>> For MPAM - we have no idea if the memory controllers map 1:1 with any
>> level in the cache. Instead, the driver expects to use the numa node
>> number directly.
>>
>> (I'll put this on the list of KNOWN_ISSUES, the Intel side of this
>> ought to be cleaned up so it doesn't break if they build a SoC where
>> L3 doesn't map 1:1 with Numa nodes. It looks like they are getting
>> away with it because Atom doesn't support L3 or memory bandwidth)
>>
> That's very useful and informative. Some form of documentation (in
> KNOWN_ISSUES, comments, or a README) would be quite useful for such
> assumptions, as the ACPI/MPAM spec doesn't mention them (i.e., the
> logical ID assignments from the OS point of view).
>
>>
>>> 4) What should a class ID represent for a memory MSC? Which is
>>> different from the class type itself.
>>
>> The class id is private to the driver; for the caches it needs to be
>> the cache level. Because of that, memory is shoved at the end, on the
>> assumption no-one has an L255 cache, and 'unknown' devices are shoved
>> at the beginning... L0 caches probably do exist, but I doubt anyone
>> would add an MSC to them.
>>
>> Classes can't be arbitrarily created, as the resctrl picking code
>> needs to know how they map to resctrl schemas; we can't invent new
>> schemas without messing up user-space.
>>
>>
>>> 5) How would 4 memory MSCs (with different proximity domains) map to
>>> classes and components?
>>
>> Each MSC would be a device. There would be one device per component,
>> because each proximity domain is different. They would all be the
>> same class, as you'd described them all with a memory type in the
>> ACPI tables.
>>
>> If you see a problem with this, let me know! The folk who write the
>> ACPI specs didn't find any systems where this would lead to
>> problems... that doesn't mean you haven't built something that looks
>> quite different.
>>
>>
>>> 6) How would 2 Memory MSCs with(in) the same proximity domain and/or
>>> same NUMA node work, if at all?
>>
>> If you build this, I bet your hardware people say those two MSCs must
>> be programmed the same for the regulation to work. (if not - how is
>> software expected to understand the hashing scheme used to map
>> physical-addresses to memory controllers?!)
>>
>> Each MSC would be a device. They would both be part of the same
>> component as they have the same proximity domain.
>>
>> Configuration is applied to the component, so each device/MSC within
>> the component is always configured the same.
>>
>>
>>> 7) Should the ACPI/MPAM MSC's "identifier" field be mapped to class
>>> IDs or component IDs at all?
>>
>> Classes, no - these are just for the driver to keep track of the
>> groups. Components, probably ... but another number may make more
>> sense. This should line up with something that is already exposed to
>> user-space via sysfs.
>>
>>
>>
>> Thanks,
>>
>> James
Hi,
It's soon time for another LOC monthly meeting. For time and connection
details see the calendar at https://www.trustedfirmware.org/meetings/
I have one topic to discuss:
OP-TEE 3.20.0 has just been released, and a PR [1] to upgrade from TEE Internal
Core API version 1.1 to 1.3.1 is soon to be merged. This may bump the
major OP-TEE version in the next release. In particular, the define
TEE_ALG_SM2_PKE is tricky to keep compatible when upgrading. We have
some time until the next release to determine if this is reason enough
to bump the major version.
Any other topics?
[1] https://github.com/OP-TEE/optee_os/pull/5688
Thanks,
Jens
Potential issues (and fixes for them) in the current MPAM support by Arm [1].
[1] https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git/log/?h=mpam…
Hesham Almatary (2):
MPAM: Fix calculating the bandwidth granularity
MPAM/resctrl: allocate a domain per component
drivers/platform/mpam/mpam_resctrl.c | 30 +++++++++++++---------------
1 file changed, 14 insertions(+), 16 deletions(-)
--
2.33.0