Hi Salil,
On 31/10/2022 16:48, James Morse via Linaro-open-discussions wrote:
On 31/10/2022 12:40, Salil Mehta wrote:
From: James Morse [mailto:james.morse@arm.com] Sent: Friday, October 28, 2022 5:17 PM To: Salil Mehta salil.mehta@huawei.com; joyce.qi@linaro.org Cc: linaro-open-discussions@op-lists.linaro.org; lorenzo.pieralisi@linaro.org; ilkka@os.amperecomputing.com; Jean-Philippe Brucker jean-philippe.brucker@arm.com; salil.mehta@opnsrc.net Subject: Re: [Linaro-open-discussions] Re: Invitation: Linaro Open Discussions monthly meeting @ Wed 5 Oct 2022 19:00 - 20:00 (HKT) (linaro-open-discussions@op-lists.linaro.org)
Still using KVM, firmware should clear the enabled bit from _STA for any call after the eject-request. I see this is still 0xF before _EJx is called.
This is how the existing handshake protocol works. Firmware intimates the OS about ejection request which confirms its status about ejection-in-progress. Firmware/VMM then sends the ACPI event to remove the CPU from the kernel(which effectively is try-to-offline process and making not-present), Until this point _STA.ENABLED=1 (which is a correct behavior since firmware/VMM cannot disable the CPU before it has been made offline by the OS - can we snatch the physical resources before completing the housekeeping at OS?). Once offline'ing/removal is done, OS then informs the firmware/VMM about this completion using _EJ0 method (Which is effective way of giving green signal to the firmware/VMM to go ahead and remove the physical resources).
Okay, the scope is narrower than after the eject-request... The problem is apci_processor's remove method calls _STA, and finds "no change", so it does nothing. But it appears this is what Qemu does on x86 too. Once _EJ0 has been called, I'd expect _STA to reflect the conclusion of the eject-request.
Linux needs to see _STA change, as otherwise it can't know which bits in _STA have changed.
Firmware then removes the device and any further evaluation of _STA would show _STA.ENABLED=0 and ideally should show is not present(but this is something which I have not changed in the QEMU - maybe we need to do or does it even matter at QEMU level?).
No, it looks like linux is calling these in the wrong order. The ACPI processor driver needs to have its remove call made once the CPU is offline and _EJ0 has been called,...
(I thought this is what qemu was doing on x86, but now that I test it again...!)
[..]
It looks like a post_eject callback on the scan handler is the right way to do this.
This works - but now it looks like you are returning 0 from _STA after _EJ0. Please don't do this, the present bit needs to remain set as while the CPU has been disabled, the redistributor, any RAS ERR nodes, and anything else that hangs from the CPU is still present.
Clearing that bit takes extra work, which has not been defined for the firmware or the kernel. The only way to win is not to play!
Does anyone think we need _OSI strings for this? If so, now is the time to define them. My suggestions would be "ACPI0007 present updates" and "ACPI0007 enabled updates", which makes it fairly clear this is about toggling bits in the processor object.
It sounds like kubernetes might have some kind of gui that shows something based on the cpu present mask. If its needed, my current thinking is to have a cpu_enabled_mask in the kernel, (that defaults to be the cpu_present_mask on all but arm64), and use that to filter cpus out of the sysfs 'offline' file. This avoids creating new files in sysfs, and may even fix whatever it is that kubernetes is doing...
Thanks,
James IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.