[Linaro-open-discussions] Re: [RFC PATCH v0.1 22/25] ACPI: add support to register CPUs based on the _STA enabled bit

26 Sep 2022


Hi Salil,
On 12/09/2022 19:09, Salil Mehta wrote:
...
...
From: James Morse [mailto:james.morse@arm.com]
Sent: Friday, September 9, 2022 5:53 PM
To: Salil Mehta salil.mehta@huawei.com;
linaro-open-discussions@op-lists.linaro.org
Cc: lorenzo.pieralisi@linaro.org; Jean-Philippe Brucker
jean-philippe@linaro.org; Jonathan Cameron jonathan.cameron@huawei.com
Subject: Re: [RFC PATCH v0.1 22/25] ACPI: add support to register CPUs based
on the _STA enabled bit
...
...
On 09/09/2022 15:53, Salil Mehta wrote:
...
...
From: James Morse [mailto:james.morse@arm.com]
Sent: Wednesday, August 31, 2022 12:09 PM
To: linaro-open-discussions@op-lists.linaro.org
Cc: Salil Mehta salil.mehta@huawei.com; james.morse@arm.com;
lorenzo.pieralisi@linaro.org; Jean-Philippe Brucker
jean-philippe@linaro.org
Subject: [RFC PATCH v0.1 22/25] ACPI: add support to register CPUs based on
the _STA enabled bit
...
...
acpi_processor_get_info() registers all present CPUs. Registering a
CPU is what creates the sysfs entries and triggers the udev
notifications.
arm64 virtual machines that support 'virtual cpu hotplug' use the
enabled bit to indicate whether the CPU can be brought online, as
the existing ACPI tables require all hardware to be described and
present.
If firmware describes a CPU as present, but disabled, skip the
registration. Such CPUs are present, but can't be brought online for
whatever reason. (e.g. firmware/hypervisor policy).
Once firmware sets the enabled bit, the CPU can be registered and
brought online by user-space. Online CPUs, or CPUs that are missing
an _STA method must always be registered.
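The registration rule in the commit message above can be sketched as a small predicate. This is a minimal illustration, not the code from the patch: `should_register_cpu` and its parameters are hypothetical names, and representing a missing _STA method as `None` is an assumption of the sketch. The bit values follow the _STA definition in the ACPI specification (bit 0 present, bit 1 enabled).

```python
from typing import Optional

# _STA bit values as defined by the ACPI specification.
ACPI_STA_DEVICE_PRESENT = 0x01
ACPI_STA_DEVICE_ENABLED = 0x02


def should_register_cpu(sta: Optional[int], is_online: bool) -> bool:
    """Decide whether a CPU described by ACPI should be registered.

    sta       -- value returned by the CPU's _STA method, or None if the
                 CPU has no _STA method (hypothetical representation).
    is_online -- True if the CPU is already running.
    """
    # CPUs without an _STA method must always be registered.
    if sta is None:
        return True
    # Online CPUs must always be registered, whatever firmware claims.
    if is_online:
        return True
    # Not present at all: nothing to register.
    if not sta & ACPI_STA_DEVICE_PRESENT:
        return False
    # Present but disabled: skip registration until firmware sets the bit.
    return bool(sta & ACPI_STA_DEVICE_ENABLED)
```

With this rule a CPU reporting `_STA == 0x01` (present, disabled) stays unregistered, and is registered once firmware reports `0x03`.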
...
...
...
This change and setting all possible cpus as *present* in smp_prepare_cpus()
will always cause all present == possible in the guest kernel.
This is quite deliberate. I don't want to redefine present without a machine that actually
supports hotplug/package-hotadd. This stuff is the tip of an ill-defined iceberg in the
ACPI spec. Once there is hardware that supports this, we will have a better idea of what
needs changing. Until then: everything described by ACPI must be present.
...
The present mask operates on the logical cpuids. The latter are more closely related to
the Linux abstract model. I see no problem in masking certain available devices (in
this case cpus) from the user. This is done in many places inside the kernel to
intentionally not expose, or only conditionally expose, certain devices to the user even
after they have been discovered at boot time or later.
And that is what this series does via sysfs.
But! The cpu numbers are primarily for making 0x4 and 0x100 contiguous so that the percpu
allocator and other kernel data structures can use 'cpu' as an index.
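The role of the logical cpu number can be illustrated with a toy mapping. This is a sketch only: the hardware IDs 0x4 and 0x100 come from the sentence above, while `assign_logical_ids` and the per-CPU list are hypothetical stand-ins for the kernel's own structures.

```python
from typing import Dict, List


def assign_logical_ids(hw_ids: List[int]) -> Dict[int, int]:
    """Map sparse hardware CPU IDs to contiguous logical cpu numbers."""
    # Logical numbers are handed out in discovery order: 0, 1, 2, ...
    return {hw_id: cpu for cpu, hw_id in enumerate(hw_ids)}


# Sparse hardware IDs, as firmware might describe them.
hw_ids = [0x4, 0x100]
logical = assign_logical_ids(hw_ids)

# A dense per-CPU array can now be indexed by the logical number,
# instead of needing 0x101 slots to cover the hardware ID space.
per_cpu_counter = [0] * len(logical)
per_cpu_counter[logical[0x100]] += 1
```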
I am strongly against touching the cpu present mask as these CPUs really are present. We
depend on everything described in ACPI being present. Having an ugly hack where we removed
the CPUs in this case means we need a new word for 'present' when we get machines that
really do have not-present CPUs. (simon-says-present?). That will be hard enough, without
muddying the waters with things like this.
Diverging the definitions of linux:present and acpi:present will put the maintainer in an
asylum.
Part of the problem here is selling this as 'virtual CPU hotplug', while it is really
"firmware CPU online policy", which is the problem description the Kubernetes folk had.
...
As such, this change can co-exist irrespective of whether Hotplug or Hotadd will
ever exist in the system.
I agree with the ACPI part and maybe interface is broken but then you have used
ACPI_STA_DEVICE_ENABLED which has not been used yet in acpi_processor.c code
which is ACPI related. How can you make sure this bit is being set by firmware
of other architectures, especially legacy?
I assume that this is broken on many platforms. The cover letter describes how the
workaround works, if the CPUs are online because firmware isn't enforcing the policy, then
the CPUs still get registered. From memory you get a warning and a kernel taint.
[..]
...
...
...
This shall ensure that we correctly reflect only present vcpus to the linux
kernel although the sizing and initialization of the GICC/GICR would have
already happened for the complete set for possible vcpus i.e. the ones with
[1] _STA[0] is set & _STA[1] bit is set and
[2] Either GICC_flag_Intf_Flag.Enabled set  OR GICC_flag.online_capable set
so effectively we are only deferring populating the cpu present mask for the
disabled cpus but which are now online capable(or Hotplug capable in future?)
What is the user observable effect of the kernel knowing these CPUs are really
present?
...
The user interface looks inconsistent and can break existing scripts.
As you can see, the user requested max possible cpus (=6) and cold-booted cpus (=4).
Hence, the number of cpu directories is correctly shown as 4, but the
total number of cpus present is shown as 6 (i.e. 0-5).
If we can defer the registration of the disabled cpus (which are online capable,
i.e. possible minus present) then I don't see why we can't mask the availability
of these cpus by not marking them as present to the user, so that the entries
are consistent. With this, scripts/utils using these values can go horribly
wrong.
At Guest Kernel
estuary:/$ ls -al /sys/devices/system/cpu/
total 0
drwxr-xr-x   12 root     0                0 Sep  9 19:19 .
drwxr-xr-x    8 root     0                0 Sep  9 19:19 ..
drwxr-xr-x    7 root     0                0 Sep  9 19:19 cpu0
drwxr-xr-x    7 root     0                0 Sep  9 19:19 cpu1
drwxr-xr-x    7 root     0                0 Sep  9 19:19 cpu2
drwxr-xr-x    7 root     0                0 Sep  9 19:19 cpu3
drwxr-xr-x    2 root     0                0 Sep  9 19:19 cpufreq
drwxr-xr-x    2 root     0                0 Sep  9 19:19 cpuidle
[...]
estuary:/$ cat /sys/devices/system/cpu/possible
0-5
estuary:/$ cat /sys/devices/system/cpu/present
0-5
estuary:/$ cat /sys/devices/system/cpu/offline
4-5
estuary:/$
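The mismatch in the listing above can be checked mechanically. This is a minimal sketch, assuming the usual sysfs cpu-list format (comma-separated ranges such as `0-3,5`); `parse_cpu_list` is a hypothetical helper, and the strings are copied from the guest output rather than read from a live system.

```python
from typing import Set


def parse_cpu_list(text: str) -> Set[int]:
    """Parse a sysfs cpu list such as '0-3,5' into a set of cpu numbers."""
    cpus: Set[int] = set()
    for chunk in text.strip().split(","):
        if not chunk:
            continue  # an empty file means an empty mask
        first, _, last = chunk.partition("-")
        # A chunk without '-' is a single cpu, so the range is one element.
        cpus.update(range(int(first), int(last or first) + 1))
    return cpus


possible = parse_cpu_list("0-5")
present = parse_cpu_list("0-5")
offline = parse_cpu_list("4-5")
registered = {0, 1, 2, 3}  # the cpuN directories in the listing

# CPUs that are marked present but have no sysfs directory.
present_unregistered = present - registered
```

For the values shown, `present_unregistered` is the same set as `offline`: cpus 4 and 5 are counted as present but have no `cpuN` directory.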
At Qemu
$QEMUBIN --enable-kvm -machine virt,gic-version=3 -cpu host -smp cpus=4,maxcpus=6
-append "console=ttyAMA0 root=/dev/ram earlycon rdinit=/init maxcpus=4 acpi=force"
I allege there is no user-space that does this.
About a year ago I trawled through the debian codesearch for use of the cpu present mask
in sysfs. All I found were web browsers that are using it as a hack because they can't
trust the enabled list on android. This fed into the memory allocator, so the outcome
would be more memory use on a system where not all the CPUs are online.
The present and possible masks really shouldn't be exposed to user-space at all.
...
...
The intention of this series is to do this as pure policy.
I anticipate pressure on the "use the MADT GICR" line, even though ACPI doesn't say
anything about the presence of MADT GICC's redistributor entry. If this happens, we'd
depend on present meaning present.
If we are confident that flag ACPI_STA_DEVICE_ENABLED is being set properly by
ARM and other architecture firmware,
I don't follow. I expect this to be wrong on many, many, x86 laptops. The warning and the
kernel taint exist to spot such platforms and keep them running.
...
then Qemu can take care of that policy. It
has all the information of the vcpus which are possible and disabled (but are
online capable). We can use this info to conditionally return appropriate status
when _STA ACPI method is evaluated.
I intentionally refrained from using this approach in my first RFC[1] as the
default code in acpi_processor.c was only making use of the
ACPI_STA_DEVICE_PRESENT bit after evaluation of the _STA method. Qemu was also
setting only the present bit in the returned status value. Plus, I wanted to
minimize the changes in the kernel in the first version of the RFC.
We must avoid undermining what present means. The whole ACPI edifice is standing on the
assumption that everything in ACPI is present.
Once we get systems that really do support physical hotadd, arm will have to specify
whether the cpu present bit implies the presence of the associated GICR, ITS, PMU, SMMU
etc etc. Until then, any change to the way linux handles the present bit is creating
problems for the future.
I have tried to get some guidance from the folk who write the specs on this, but as there
are no systems that support the feature, it isn't possible to know what the hardware
constraints are.
...
...
All the hotplug/package-hotadd machinery is triggered by udev. We don't need to hack the
cpu present mask to make that work.
May I know what exactly are your apprehensions with 'udev'?
I have no apprehensions. As far as I can see the 'requirement' is that the udev event
corresponding to cpu-register is seen by user-space.
The Kubernetes folk don't want to put their online/offline policy in a user-space agent.
(or to manage the _physical_ resources that are being consumed in the hypervisor). But
they must have some user-space agent to bring CPUs online because the kernel doesn't do
it. I assume those are udev rules. (if not - why not?)
With this series, the output for 'udevadm monitor' should be the same for x86 doing
physical hot-add of a package (including all the things that come with a CPU), and
notifying arm64 of a change in the firmware online/offline policy.
Can we get the Qemu parts done so we can check with the Kubernetes folk that this works
for them?
...
As such 'udev' should make use of the Linux device model and it is not necessary
to present a 1:1 picture of the hardware to the abstract model (which, by the way,
we are not doing by not registering the disabled cpus). It will just expose
whatever limited picture of the hardware is being presented by the
kernel.
AFAICS it should work just fine but we need to limit the present cpus.
which would be an ugly hack. They really are present - we need them to be present because
the irqchip driver has to access all the GICR to find the system wide supported features.
The GIC maintainer said he would not consider bringing GICR online late until there is
hardware that needs it. Leaving the present bit alone means we can spot that hardware when
it comes.
...
...
...
Question:
Q1: Current acpi_processor.c code is not using the ACPI_STA_DEVICE_{ENABLED, UI}
bits. Could it break other architectures if we use these bits but some of their
legacy devices or firmware do not initialize these bits to their defaults?
Almost certainly! I'm pretty confident some vendors generate their ACPI tables using
markov-models. (It boots! Ship it!)
The approach that used the UI bit to mean sysfs had to be hidden behind a Kconfig symbol,
which is only marginally better than #ifdef CONFIG_ARM64.
If there are problems in using the ACPI_STA_DEVICE_UI Bit because it might
conflict with the legacy firmware of other architectures then let us drop that.
...
We can alternatively use the ACPI_STA_DEVICE_ENABLED Bit in the _STA method
which can be conditionally set by the Qemu?
Yes ... this corresponds with the enabled bit in the MADT GICC flags that was left clear
at boot. Sorry, I thought this MADT:GICC:online_capable thing had been discussed on one of
the LOD calls.
The expectation is firmware clears the MADT:GICC:enabled bit, and sets 'online capable'
instead, to indicate this CPU is present, but subject to some kind of policy. The OS can
read _STA to find the current policy, and try to online it if the policy has changed.
Linux keys this on the _STA:Enabled bit, as that has the same name as the bit that was
left clear in the MADT:GICC flags field. The _STA:Present bit has to remain set throughout
as the MADT:GICC entry exists - and everything we have today in ACPI assumes everything is
present.
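The boot-time and runtime checks described above can be sketched as follows. This is an illustration only: the bit positions assumed for the MADT GICC flags (Enabled in bit 0, Online Capable in bit 3) and the helper names `boot_state` and `may_online_now` are assumptions of this sketch and should be checked against the ACPI specification.

```python
# Assumed MADT GICC flag bits (see the note above; verify against the spec).
GICC_ENABLED = 1 << 0
GICC_ONLINE_CAPABLE = 1 << 3

# _STA bits.
STA_PRESENT = 1 << 0
STA_ENABLED = 1 << 1


def boot_state(gicc_flags: int) -> str:
    """Classify a CPU from its MADT GICC flags at boot."""
    if gicc_flags & GICC_ENABLED:
        return "enabled"            # usable straight away
    if gicc_flags & GICC_ONLINE_CAPABLE:
        return "policy-disabled"    # present, but held back by firmware policy
    return "unusable"               # neither bit set: never bring online


def may_online_now(gicc_flags: int, sta: int) -> bool:
    """Re-check the firmware policy by looking at the current _STA value."""
    state = boot_state(gicc_flags)
    if state == "enabled":
        return True
    if state == "unusable":
        return False
    # Present must remain set throughout; Enabled carries the policy.
    return (sta & STA_PRESENT) != 0 and (sta & STA_ENABLED) != 0
```

A CPU with only the online-capable flag set stays offline while `_STA` reports present-only, and becomes eligible once `_STA` also reports enabled.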
I did ask to replicate the _STA flags in the MADT:GICC flags field, but it didn't happen.
...
...
This new version walks a fine line described in the cover-letter: any platform with
firmware tables that get this wrong should get the same user-experience as there is no
policy enforcement on x86, so the !online_capable CPUs can be detected as being online,
and the policy stuff gets ignored.
...
Yes, I do understand your predicament, but ideally user experience is dictated
by what the *end* user sees. Here, by not masking the disabled cpus in the cpu present
mask, the user will not have a similar experience on ARM64 and x86_64 platforms, and that
is undeniable and will in the end matter the most since this feature will mostly
be used on servers.
But the CPUs are present. If they are not, we don't know how to describe the system in
ACPI - the whole edifice stands on the assumption that everything is present.
I agree this is a different solution to the "I want a webpage to add CPUs to VMs" problem
than x86's physical package hot add. x86 can do that because they had these physical
machines before they had virtual machines.
Thanks,
James
