Hi James,
From: James Morse <james.morse@arm.com>
Sent: Monday, October 31, 2022 5:51 PM
To: Salil Mehta <salil.mehta@huawei.com>; joyce.qi@linaro.org
Cc: linaro-open-discussions@op-lists.linaro.org; lorenzo.pieralisi@linaro.org; ilkka@os.amperecomputing.com; Jean-Philippe Brucker <jean-philippe.brucker@arm.com>; salil.mehta@opnsrc.net
Subject: Re: [Linaro-open-discussions] Re: Invitation: Linaro Open Discussions monthly meeting @ Wed 5 Oct 2022 19:00 - 20:00 (HKT) (linaro-open-discussions@op-lists.linaro.org)
Hi Salil,
On 31/10/2022 16:48, James Morse via Linaro-open-discussions wrote:
On 31/10/2022 12:40, Salil Mehta wrote:
On 28/10/2022 17:17, James Morse wrote:
Still using KVM, firmware should clear the enabled bit from _STA for any call after the eject-request. I see this is still 0xF before _EJx is called.
This is how the existing handshake protocol works. Firmware notifies the OS of the ejection request, and the OS confirms that ejection is in progress. Firmware/VMM then sends the ACPI event to remove the CPU from the kernel (which is effectively a try-to-offline followed by marking it not-present). Until this point _STA.ENABLED=1, which is correct behaviour, since the firmware/VMM cannot disable the CPU before the OS has taken it offline - can we snatch the physical resources away before the OS has completed its housekeeping? Once offlining/removal is done, the OS informs the firmware/VMM of completion via the _EJ0 method, which is effectively the green signal for the firmware/VMM to go ahead and remove the physical resources.
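As a hedged sketch of the handshake above (plain C with illustrative stage names; the bit values are the standard _STA return bits from the ACPI spec), this is what _STA would report at each point, using the thread's eventual conclusion that the CPU stays present after _EJ0:

```c
#include <assert.h>

/* _STA return bits, per the ACPI spec's "_STA (Device Status)" */
#define STA_PRESENT     0x1
#define STA_ENABLED     0x2
#define STA_SHOWN_IN_UI 0x4
#define STA_FUNCTIONING 0x8

/* Illustrative VMM-side stages of the eject handshake */
enum hp_stage {
	STAGE_RUNNING,         /* vCPU online in the guest                 */
	STAGE_EJECT_REQUESTED, /* eject-request delivered, OS offlining    */
	STAGE_AFTER_EJ0,       /* OS evaluated _EJ0, firmware removed vCPU */
};

/* What _STA would report at each stage of the handshake */
static unsigned int sta_for_stage(enum hp_stage s)
{
	switch (s) {
	case STAGE_RUNNING:
	case STAGE_EJECT_REQUESTED:
		/* ENABLED must stay set until the OS has offlined the CPU */
		return STA_PRESENT | STA_ENABLED |
		       STA_SHOWN_IN_UI | STA_FUNCTIONING; /* 0xF */
	case STAGE_AFTER_EJ0:
		/* ENABLED cleared once the OS has given the green signal;
		 * PRESENT stays set (redistributor etc. still exist) */
		return STA_PRESENT;
	}
	return 0;
}
```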
Okay, so the scope is narrower than I thought - this is after the eject-request... The problem is acpi_processor's remove method calls _STA and finds "no change", so it does nothing. But it appears this is what Qemu does on x86 too. Once _EJ0 has been called, I'd expect _STA to reflect the conclusion of the eject-request.
Linux needs to see _STA change, as otherwise it can't know which bits in _STA have changed.
Firmware then removes the device, and any further evaluation of _STA would show _STA.ENABLED=0 and ideally should also show not-present (but this is something I have not changed in QEMU - maybe we need to, or does it even matter at the QEMU level?).
No, it looks like linux is calling these in the wrong order. The ACPI processor driver needs to have its remove call made once the CPU is offline and _EJ0 has been called,...
(I thought this is what qemu was doing on x86, but now that I test it again...!)
[..]
It looks like a post_eject callback on the scan handler is the right way to do this.
This works - but now it looks like you are returning 0 from _STA after _EJ0. Please don't do this: the present bit needs to remain set because, while the CPU has been disabled, the redistributor, any RAS ERR nodes, and anything else that hangs off the CPU is still present.
Sure, as mentioned in the earlier update, I intentionally deferred this change in QEMU. I will introduce it, i.e. after _EJ0 the flags will be _STA.ENABLED=0 and _STA.PRESENT=1.
Does anyone think we need _OSI strings for this? If so, now is the time to define them. My suggestions would be "ACPI0007 present updates" and "ACPI0007 enabled updates", which makes it fairly clear this is about toggling bits in the processor object.
It sounds like kubernetes might have some kind of GUI that shows something based on the cpu present mask. If it's needed, my current thinking is to have a cpu_enabled_mask in the kernel (defaulting to cpu_present_mask on everything but arm64), and use it to filter CPUs out of the sysfs 'offline' file. This avoids creating new files in sysfs, and may even fix whatever it is that kubernetes is doing...
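A rough sketch of that filtering, modelled with plain bitmasks rather than the kernel's struct cpumask (the function and parameter names here are made up for illustration):

```c
#include <assert.h>

/* Toy model: each bit is one CPU. The real kernel would operate on
 * struct cpumask with helpers like cpumask_andnot()/cpumask_and(). */
static unsigned long sysfs_offline_cpus(unsigned long present,
					unsigned long online,
					unsigned long enabled)
{
	/* Today 'offline' effectively shows present & ~online;
	 * additionally masking with a cpu_enabled_mask hides CPUs
	 * that are present but disabled at the ACPI level. */
	return present & ~online & enabled;
}
```

On architectures where the hypothetical cpu_enabled_mask defaults to cpu_present_mask, the extra mask is a no-op, so the 'offline' file would be unchanged there.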
Maybe, but if possible I would still prefer to use cpu_present_mask; it is dynamic. I do understand, as you mentioned earlier, that right now the present mask may be tied to the presence of the CPU in the ACPI tables.
File: https://elixir.bootlin.com/linux/latest/source/include/linux/cpumask.h
[...]
 * If HOTPLUG is enabled, then cpu_present_mask varies dynamically,
 * depending on what ACPI reports as currently plugged in, otherwise
 * cpu_present_mask is just a copy of cpu_possible_mask.
[...]
Any upper layer using the present mask should already know that it is dynamic and should handle it changing. The question is really about the association between a CPU being present in Linux and being present at the ACPI level. Can we make this association conditional?
After all, we are already making the presence of CPU devices conditional (by calling arch_{un}register_cpu() selectively), even though the CPU will always be present at the ACPI level?
AFAICS nothing breaks inside the kernel?
But yes, maybe it is safer to introduce a new mask. If we decide to use another mask, then there will be more changes than just hiding sysfs entries and conditionally making CPU devices appear/disappear.
For example, any upper layer that currently relies on the present mask might need to consult this new enabled/available mask as well, like the CPUHP state machine in the kernel:
https://elixir.bootlin.com/linux/latest/source/kernel/cpu.c#L1335
static int _cpu_up(unsigned int cpu, int tasks_frozen, enum cpuhp_state target)
{
	[...]
	if (!cpu_present(cpu)) {	/* and a similar check in _cpu_down() */
		ret = -EINVAL;
		goto out;
	}
	[...]
}
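To illustrate the concern, here is a toy model of that gate (plain bitmasks and hypothetical names; the real _cpu_up() calls cpu_present() on struct cpumask) showing how it might have to grow a parallel check against the new mask:

```c
#include <assert.h>
#include <errno.h>

/* Toy model of the _cpu_up()/_cpu_down() presence gate; each bit of
 * 'present'/'enabled' represents one CPU. Names are illustrative. */
static int cpu_up_gate(unsigned int cpu, unsigned long present,
		       unsigned long enabled)
{
	if (!(present & (1UL << cpu)))
		return -EINVAL;
	/* With a separate cpu_enabled_mask, a present-but-disabled
	 * CPU would presumably also have to be rejected here. */
	if (!(enabled & (1UL << cpu)))
		return -EINVAL;
	return 0;
}
```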
Also, all future changes will now need to be careful with this new meaning of a vCPU 'being enabled' while it is present inside the kernel.
Thanks,
Salil