Hi Salil,
OK, I will postpone this call to next Tuesday, same time, for now.
Everyone is welcome to join.
Thanks:) Joyce
On 27 September 2022, at 12:02 AM, Salil Mehta salil.mehta@huawei.com wrote:
Hi Joyce, Lorenzo,
From: Lorenzo Pieralisi [mailto:lorenzo.pieralisi@linaro.org] Sent: Monday, September 26, 2022 9:06 AM To: Joyce Qi joyce.qi@linaro.org Cc: Jonathan Cameron jonathan.cameron@huawei.com; Ilkka Koskinen ilkka@os.amperecomputing.com; Salil Mehta salil.mehta@huawei.com; James Morse james.morse@arm.com; Jonathan Cameron via Linaro-open-discussions linaro-open-discussions@op-lists.linaro.org Subject: Re: Linaro-open-discussions Digest, Vol 24, Issue 4
On Sat, 24 Sept 2022 at 14:58, Joyce Qi joyce.qi@linaro.org wrote:
Hi Jonathan, Lorenzo, all,
Do we have any topic to sync next week?
I am just back from a week off. I don't know whether it is worth having a sync-up on virtual CPU hotplug; I have not been able to check the current status.
I have a few updates, but I will not be available this week, as I may be in and out of the office due to some medical issues and related appointments.
I have experimented a bit with James's patches and found some issues with the approach, especially the use of _STA.ENA. I have mentioned a few others in the earlier email discussions. I have also fixed some trivial memory corruption issues in my earlier QEMU patches, which have now been forward-ported. These changes still need to be properly tested before I can share them.
If you would like to discuss the above further, could we have a call sometime next week, or at any other time that works for everyone involved or interested?
Many thanks Salil
Thanks, Lorenzo
Thanks:) Joyce
On 13 September 2022, at 8:00 AM, linaro-open-discussions-request@op-lists.linaro.org wrote:
Send Linaro-open-discussions mailing list submissions to linaro-open-discussions@op-lists.linaro.org
To subscribe or unsubscribe via email, send a message with subject or body 'help' to linaro-open-discussions-request@op-lists.linaro.org
You can reach the person managing the list at linaro-open-discussions-owner@op-lists.linaro.org
When replying, please edit your Subject line so it is more specific than "Re: Contents of Linaro-open-discussions digest..."
Today's Topics:
- Re: [RFC PATCH v0.1 22/25] ACPI: add support to register CPUs based on the _STA enabled bit (Salil Mehta)
Message: 1 Date: Mon, 12 Sep 2022 18:09:34 +0000 From: Salil Mehta salil.mehta@huawei.com Subject: [Linaro-open-discussions] Re: [RFC PATCH v0.1 22/25] ACPI: add support to register CPUs based on the _STA enabled bit To: James Morse james.morse@arm.com, "linaro-open-discussions@op-lists.linaro.org" linaro-open-discussions@op-lists.linaro.org Cc: "lorenzo.pieralisi@linaro.org" lorenzo.pieralisi@linaro.org Message-ID: 65c52e8ba75e4cc59ec4b88a44c8a13b@huawei.com Content-Type: text/plain; charset="utf-8"
Hi James
From: James Morse [mailto:james.morse@arm.com] Sent: Friday, September 9, 2022 5:53 PM To: Salil Mehta salil.mehta@huawei.com; linaro-open-discussions@op-lists.linaro.org Cc: lorenzo.pieralisi@linaro.org; Jean-Philippe Brucker jean-philippe@linaro.org; Jonathan Cameron
Subject: Re: [RFC PATCH v0.1 22/25] ACPI: add support to register CPUs based on the _STA enabled bit
Hi Salil,
On 09/09/2022 15:53, Salil Mehta wrote:
> From: James Morse [mailto:james.morse@arm.com]
> Sent: Wednesday, August 31, 2022 12:09 PM
> To: linaro-open-discussions@op-lists.linaro.org
> Cc: Salil Mehta salil.mehta@huawei.com; james.morse@arm.com; lorenzo.pieralisi@linaro.org; Jean-Philippe Brucker jean-philippe@linaro.org
> Subject: [RFC PATCH v0.1 22/25] ACPI: add support to register CPUs based on the _STA enabled bit
>
> acpi_processor_get_info() registers all present CPUs. Registering a
> CPU is what creates the sysfs entries and triggers the udev
> notifications.
>
> arm64 virtual machines that support 'virtual cpu hotplug' use the
> enabled bit to indicate whether the CPU can be brought online, as
> the existing ACPI tables require all hardware to be described and
> present.
>
> If firmware describes a CPU as present, but disabled, skip the
> registration. Such CPUs are present, but can't be brought online for
> whatever reason. (e.g. firmware/hypervisor policy).
>
> Once firmware sets the enabled bit, the CPU can be registered and
> brought online by user-space. Online CPUs, or CPUs that are missing
> an _STA method must always be registered.
> diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
> index 1bd6e4b8ab66..42521d89c378 100644
> --- a/drivers/acpi/acpi_processor.c
> +++ b/drivers/acpi/acpi_processor.c
> @@ -194,6 +194,32 @@ static int acpi_processor_make_present(struct acpi_processor *pr)
>  	return ret;
>  }
>  
> +static int acpi_processor_make_enabled(struct acpi_processor *pr)
> +{
> +	unsigned long long sta;
> +	acpi_status status;
> +	bool present, enabled;
> +
> +	if (!acpi_has_method(pr->handle, "_STA"))
> +		return arch_register_cpu(pr->id);
> +
> +	status = acpi_evaluate_integer(pr->handle, "_STA", NULL, &sta);
> +	if (ACPI_FAILURE(status))
> +		return -ENODEV;
> +
> +	present = sta & ACPI_STA_DEVICE_PRESENT;
> +	enabled = sta & ACPI_STA_DEVICE_ENABLED;
> +
> +	if (cpu_online(pr->id) && (!present || !enabled)) {
> +		pr_err_once(FW_BUG "CPU %u is online, but described as not present or disabled!\n", pr->id);
> +		add_taint(TAINT_FIRMWARE_WORKAROUND, LOCKDEP_STILL_OK);
> +	} else if (!present || !enabled) {
> +		return -ENODEV;
> +	}
This change, together with marking all possible CPUs as *present* in smp_prepare_cpus(), will always result in present == possible in the guest kernel.
This is quite deliberate. I don't want to redefine present without a machine that actually supports hotplug/package-hotadd. This stuff is the tip of an ill-defined iceberg in the ACPI spec. Once there is hardware that supports this, we will have a better idea of what needs changing. Until then: everything described by ACPI must be present.
The present mask operates on logical CPU IDs, and these are more closely related to the Linux abstract model. I see no problem in masking certain available devices (in this case, CPUs) from user space. This is done in many places in the kernel to intentionally hide, or only conditionally expose, certain devices to the user even after they have been discovered at boot time or later.
As such, this change can coexist regardless of whether hotplug or hot-add ever exists in the system.
I agree about the ACPI part, and maybe the interface is broken, but you have used ACPI_STA_DEVICE_ENABLED, which has not been used so far in the acpi_processor.c code. How can you make sure this bit is set by the firmware of other architectures, especially legacy firmware?
I think we can avoid that with the trick Jean-Philippe used in his patch set[1], sent last year.
That was the other side of this:
https://gitlab.arm.com/linux-arm/linux-jm/-/commit/3106cccf5b9f01f44789b748aaee3a95fee99a97
This was an attempt to do all this without changes to the ACPI spec; it doesn't touch the present cpumask.
Yes, I did refer to those, but the idea was not to use that change as is.
[..]
This ensures that only present vCPUs are reflected to the Linux kernel, even though the sizing and initialization of the GICC/GICR would already have happened for the complete set of possible vCPUs, i.e. the ones with:
(1) _STA[0] set and _STA[1] set, and
(2) either GICC_flag_Intf_Flag.Enabled set OR GICC_flag.online_capable set.
So effectively we are only deferring the population of the CPU present mask for CPUs that are disabled but online capable (or hotplug capable in future?).
What is the user-observable effect of the kernel knowing these CPUs are really present?
The user interface looks inconsistent and can break existing scripts.
As you can see, the user requested a maximum of 6 possible CPUs, with 4 cold-booted CPUs. Hence, 4 cpuN directories are correctly shown, but the total number of present CPUs is shown as 6 (i.e. 0-5).
If we can defer the registration of the disabled CPUs (those that are online capable, i.e. possible - present), then I don't see why we can't hide these CPUs from the user by not marking them as present, so that the entries are consistent. Otherwise, scripts and utilities that use these values can go horribly wrong.
In the guest kernel:
estuary:/$ ls -al /sys/devices/system/cpu/
total 0
drwxr-xr-x 12 root 0 0 Sep 9 19:19 .
drwxr-xr-x 8 root 0 0 Sep 9 19:19 ..
drwxr-xr-x 7 root 0 0 Sep 9 19:19 cpu0
drwxr-xr-x 7 root 0 0 Sep 9 19:19 cpu1
drwxr-xr-x 7 root 0 0 Sep 9 19:19 cpu2
drwxr-xr-x 7 root 0 0 Sep 9 19:19 cpu3
drwxr-xr-x 2 root 0 0 Sep 9 19:19 cpufreq
drwxr-xr-x 2 root 0 0 Sep 9 19:19 cpuidle
[...]
estuary:/$ cat /sys/devices/system/cpu/possible
0-5
estuary:/$ cat /sys/devices/system/cpu/present
0-5
estuary:/$ cat /sys/devices/system/cpu/offline
4-5
estuary:/$
In QEMU:
$QEMUBIN --enable-kvm -machine virt,gic-version=3 -cpu host \
    -smp cpus=4,maxcpus=6 \
    -append "console=ttyAMA0 root=/dev/ram earlycon rdinit=/init maxcpus=4 acpi=force"
The intention of this series is to do this as pure policy.
I anticipate pressure on the "use the MADT GICR" line, even though ACPI doesn't say anything about the presence of the MADT GICC's redistributor entry. If this happens, we'd depend on present meaning present.
If we are confident that the ACPI_STA_DEVICE_ENABLED flag is set properly by the firmware of ARM and other architectures, then QEMU can take care of that policy. It has all the information about which vCPUs are possible and disabled (but online capable), and can use it to conditionally return the appropriate status when the _STA ACPI method is evaluated.
I intentionally refrained from using this approach in my first RFC[1], as the default code in acpi_processor.c only used the ACPI_STA_DEVICE_PRESENT bit after evaluating the _STA method. QEMU was also setting only the present bit in the returned status value. Plus, I wanted to minimize the kernel changes in the first version of the RFC.
[1] https://lore.kernel.org/qemu-devel/38a034f82da78b8861af6d25a83fddea@kernel.org/T/#m4586668cb8e3acf0426c1de2c520f85cea78f142
All the hotplug/package-hotadd machinery is triggered by udev. We don't need to hack the cpu present mask to make that work.
May I know what exactly your apprehensions about 'udev' are?
As such, 'udev' should make use of the Linux device model, and it is not necessary to present a 1:1 picture of the hardware in the abstract model (which, by the way, we are already not doing, since the disabled CPUs are not registered). It will just expose to the user whatever limited picture of the hardware the kernel presents.
AFAICS it should work just fine, but we need to limit the present CPUs.
Question:
Q1: The current acpi_processor.c code does not use the ACPI_STA_DEVICE_{ENABLED,UI} bits. Could using these bits break other architectures if some of their legacy devices or firmware do not initialize these bits to their defaults?
Almost certainly! I'm pretty confident some vendors generate their ACPI tables using Markov models. (It boots! Ship it!)
The approach that used the UI bit to mean sysfs had to be hidden behind a Kconfig symbol, which is only marginally better than #ifdef CONFIG_ARM64.
If there are problems with using the ACPI_STA_DEVICE_UI bit because it might conflict with the legacy firmware of other architectures, then let us drop that.
Could we instead use the ACPI_STA_DEVICE_ENABLED bit in the _STA method, which QEMU can set conditionally?
This new version walks a fine line described in the cover letter: any platform with firmware tables that get this wrong should get the same user experience, as there is no policy enforcement on x86, so the !online_capable CPUs can be detected as being online, and the policy stuff gets ignored.
Yes, I do understand your predicament, but ideally the user experience is dictated by what the *end* user sees. Here, by not masking the disabled CPUs in the CPU present mask, users will not get a similar experience on ARM64 and x86_64 platforms. That is undeniable, and it will matter most in the end, since this feature will mostly be used on servers.
Thanks Salil
Subject: Digest Footer
Linaro-open-discussions mailing list -- linaro-open-discussions@op-lists.linaro.org
To unsubscribe send an email to linaro-open-discussions-leave@op-lists.linaro.org
End of Linaro-open-discussions Digest, Vol 24, Issue 4