Hi James,
From: James Morse james.morse@arm.com Sent: Thursday, June 8, 2023 5:47 PM Hi Salil,
On 07/06/2023 12:29, Salil Mehta wrote:
From: James Morse james.morse@arm.com Sent: Wednesday, June 7, 2023 11:00 AM On 06/06/2023 15:49, Salil Mehta wrote:
From: James Morse james.morse@arm.com Sent: Tuesday, June 6, 2023 11:32 AM On 05/06/2023 11:26, Salil Mehta wrote:
A few questions related to the new _OSC capabilities to negotiate the type of CPU hotplug (if any):
Q1: ACPI Standard Update: Do we have any Bugzilla-id for the change related to these new _OSC capabilities for the type of CPU hotplug support (i.e. Enabled/Present cpu)?
I don't think so. I was intending on this being code-first, as and when this is on the mailing list I'd poke the arm people that look after the ACPI stuff to do ... whatever is necessary.
Ok. When do you plan to send the 'RFC vN' with this _OSC change?
Once I can test it! How are the qemu changes going?
There is no point posting it without testing the changes Oliver proposed. I've done bits of that with the kvmtool version I shared, but no-one can test this thing end to end until the Qemu side of it is updated to match.
Sure, but to be able to test the Qemu patches one needs your rebased version of the kernel. It is up to you whether you want to make it public or not. I did wait for that and then proceeded to create my own version of your 24th Feb 2023 kernel changes.
This? https://op-lists.linaro.org/archives/list/linaro-open-discussions@op-lists.l...
Yes - that was to allow the Qemu changes to be tested. I'm aware of the chicken/egg problem.
I didn't keep rebasing it as it didn't look like any progress was being made.
Ok, yes. I was in the middle of testing/fixing various other bugs you and others had reported at that time, and testing the migration part as agreed on the 3rd Feb 2023 LOD call.
But early in March I got pulled into a higher-priority project with an immediate deadline (prioritized internally by management), so I had to focus entirely on that during March and April. I only picked this up again in May, when I realized Oliver's patches had already become part of the mainline at the start of April.
Plus, the branch mentioned in the messages at the link you shared above
https://gitlab.arm.com/linux-arm/linux-jm/-/tree/virtual_cpu_hotplug/snapsho...
was missing some fixes that Miguel reported to me through a different channel. These fixes were present in your other, similar RFC branch of 24th Feb 2023 (link below).
https://gitlab.arm.com/linux-arm/linux-jm/-/tree/virtual_cpu_hotplug/rfc/sna...
Hence, I had to create a new kernel version in any case to test all the reported fixes as no single kernel branch had all the fixes in it.
But yes, I agree it is a chicken-and-egg problem. Anyway, that is history now, and we do have the latest working Qemu and kernel branches here. Vishnu and Miguel have lightly tested them; basic operation seems to be okay, and further testing is in progress.
Qemu repo of the latest port (with Oliver Upton's patches): Link: https://github.com/salil-mehta/qemu.git virt-cpuhp-armv8/rfc-v1-port11052023.dev-1
Thanks for this,
Maybe you can share a quick port of your kernel changes on gitlab so that people can get unblocked, although I can share my own version of the port as well?
There is no point doing it quickly if it's not done correctly. I'll try and post an RFC before kvm-forum but I can't promise anything.
Agreed. You can take your time. No rush. Testing is being done using the kernel branch I had shared. Please share your branch once you are satisfied and have done your part of the testing. Your branch can be used later for final testing.
Q2: Have you intentionally omitted the function below in the _OSC related kernel change shared in the repository [1]? bool acpi_processor_hotplug_enabled_supported(void)
Is this a build error? Or do you mean the lack of symmetry between handshaking support for the present and enabled bits?
There is no reason to forbid the enabled/disabled support if firmware didn't do the handshake. The handshake is there for firmware's benefit, not the OS.
Do we always need to support some form of hotplug in the kernel? Can we have the below:
- Physical Hotplug Support
- Virtual Hotplug Support
The kernel can't tell these two apart. What we actually have is:
- Dynamically making a CPU present/not-present
- Dynamically making a CPU enabled/not-enabled
I see no reason not to enable (2) for all architectures, unless it turns out x86 has platforms which randomly change that bit assuming it's ignored.
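For anyone following the distinction above, here is a minimal sketch in C of how the two _STA bits map onto the two kinds of change. The bit positions are the ones the ACPI specification defines for _STA; the enum and function names are hypothetical and are not taken from the kernel series.

```c
#include <assert.h>
#include <stdint.h>

/* _STA bits as defined by the ACPI specification. */
#define STA_PRESENT (1u << 0)
#define STA_ENABLED (1u << 1)

/* Hypothetical classification, for illustration only. */
enum sta_change {
    STA_NO_CHANGE,
    STA_PRESENT_CHANGED, /* (1) CPU became present or not-present */
    STA_ENABLED_CHANGED, /* (2) CPU stayed present, enabled bit toggled */
};

static enum sta_change classify_sta_change(uint32_t old_sta, uint32_t new_sta)
{
    uint32_t diff = old_sta ^ new_sta;

    /* A change of the present bit takes precedence over the enabled bit. */
    if (diff & STA_PRESENT)
        return STA_PRESENT_CHANGED;

    if (diff & STA_ENABLED) {
        /* A device that is not present cannot be enabled. */
        assert(new_sta & STA_PRESENT);
        return STA_ENABLED_CHANGED;
    }

    return STA_NO_CHANGE;
}
```

With this split, a CPU going from present+enabled to present-only counts as case (2), while any change of the present bit counts as case (1).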
Agreed. So the check for both capabilities is a requirement.
Not sure where that came from. The OS should advertise which of these it supports. On arm64, firmware can rely on both being advertised, as neither is supported today. On x86, firmware can only rely on the enabled version, as existing kernels support toggling present, but don't use the _OSC.
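A rough sketch in C of the advertising side described here. The capability bit names and positions are placeholders invented for illustration; they are not the values from the ACPI specification or from the series. Only the policy (the OS advertises whichever mechanisms it supports) comes from the discussion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder capability bits: names and positions are assumptions. */
#define OSC_HP_PRESENT (1u << 0) /* OS handles present/not-present */
#define OSC_HP_ENABLED (1u << 1) /* OS handles enabled/not-enabled */

/* Hypothetical description of what a given OS build supports. */
struct os_hotplug_support {
    bool present;
    bool enabled;
};

/* Build the mask the OS would hand to firmware through _OSC. */
static uint32_t osc_hotplug_caps(struct os_hotplug_support s)
{
    uint32_t caps = 0;

    if (s.present)
        caps |= OSC_HP_PRESENT;
    if (s.enabled)
        caps |= OSC_HP_ENABLED;

    /* Nothing outside the two hotplug bits is ever set here. */
    assert((caps & ~(OSC_HP_PRESENT | OSC_HP_ENABLED)) == 0);
    return caps;
}

/* Firmware-side query: did the OS advertise this mechanism? */
static bool os_advertised(uint32_t caps, uint32_t bit)
{
    return (caps & bit) != 0;
}
```

Note the x86 caveat from the text: an existing x86 kernel may handle present toggling without advertising it, so a missing bit there tells firmware less than it does on arm64.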
Ah, yes, I missed the last point.
Could we have a case where Hotplug is not supported at all?
Yes, that is what arm64 has today.
Should we put a print in the kernel checking nothing else but these capabilities to detect any such random event in case hotplug is not supported at all?
The series adds something like that for the present case: https://gitlab.arm.com/linux-arm/linux-jm/-/commit/8e3e891c8fff7260f14a2c21a...
I still don't see a reason to disable the 'enabled' version.
- No Hotplug Support at all i.e. defaults to earlier kernel behavior without using any other information except capabilities from firmware/VMM
I'm not sure what your question is here. Why would you want to disable OS support for toggling the enabled bit? Blocking eject-requests is the only reason I can think of ... but there is no way to do that today... why do we need one?
Sorry, I just meant to disable Hotplug support by detecting these _OSC capability Bits.
In firmware, sure - that is what the _OSC bits are for. In the OS ... that would be extra work, and for what reason? If the OS doesn't support it, the device-check/eject-request stuff will never be triggered.
The _OSC is for firmware's benefit, I don't see a need to try and double-check that the firmware implemented it correctly.
In contrast, toggling the present bit is forbidden if the OS said it didn't support it, as the architecture code expects the machine to catch fire if you do this.
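The firmware-side rule described above could be sketched in C roughly as follows. The _STA bit positions are from the ACPI specification; the capability bit names and positions and the function name are hypothetical. Treating a combined present+enabled change as needing only the present capability is an assumption made for this sketch, not something stated in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* _STA bits as defined by the ACPI specification. */
#define STA_PRESENT (1u << 0)
#define STA_ENABLED (1u << 1)

/* Placeholder _OSC capability bits: names and positions are assumptions. */
#define OSC_HP_PRESENT (1u << 0)
#define OSC_HP_ENABLED (1u << 1)

/*
 * Hypothetical firmware/VMM check: may _STA move from old_sta to new_sta,
 * given the capabilities the OS advertised?
 */
static bool firmware_may_apply(uint32_t os_caps, uint32_t old_sta,
                               uint32_t new_sta)
{
    uint32_t diff = old_sta ^ new_sta;

    /* A device that is not present cannot be enabled. */
    assert(!(new_sta & STA_ENABLED) || (new_sta & STA_PRESENT));

    /* Toggling present is forbidden unless the OS said it supports it. */
    if (diff & STA_PRESENT)
        return (os_caps & OSC_HP_PRESENT) != 0;

    /* Without OS support, an enabled toggle would simply go unhandled. */
    if (diff & STA_ENABLED)
        return (os_caps & OSC_HP_ENABLED) != 0;

    return true; /* nothing changes */
}
```

All of the checking sits on the firmware side here, in line with the point that the OS does not need to validate what firmware does with the bits.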
I do understand and we have an agreement on this. We are not manipulating _STA.PRES bit.
Q3: I am sure you would want to extend usage of these APIs in the acpi_processor.c as well to decide what kind of hotplug (if any) is enabled? Is code related to _OSC changes complete?
This is a leading question, but I can't tell what it is...
I haven't been able to test that _OSC code beyond what I can hack up with the FVP, so there are probably bits missing. Did you have something specific in mind?
Ok.
Why not use the 'osc_sb_hotplug_enabled_support_confirmed' flag to totally disable ANY kind of cpu hotplug at init time?
Because that would be extra work and extra code. Why is it needed?
I think I'm missing the use-case for linux not supporting this when the firmware does.
Just to make the guest user aware of any such spurious hotplug event when the VMM/firmware does not support it.
This sounds like validating firmware from the OS. I'd prefer not to add extra code doing this.
Just a knob to enable/disable the hotplug behavior without relying on the ACPI _STA.{PRES, ENA} Bits?
For what reason? Why would someone want to do this? Without a reason it is pointless configurability that has to be tested.
The only reason I can fathom is "I don't trust my firmware - ignore its attempts to offline CPUs" ... but firmware can prevent them coming online in the first place, and eject-request is the one thing that does work on arm64 today (by accident). I think we can kick this down the road until someone has a strong reason for needing it.
Ok. Let us defer this discussion.
Thanks Salil.