Hi Vishnu,
(CC: +Salil^2)
On 23/02/2023 06:10, Vishnu Pajjuri via Linaro-open-discussions wrote:
On 31-08-2022 16:39, James Morse wrote:
acpi_processor_get_info() registers all present CPUs. Registering a CPU is what creates the sysfs entries and triggers the udev notifications.
arm64 virtual machines that support 'virtual cpu hotplug' use the enabled bit to indicate whether the CPU can be brought online, as the existing ACPI tables require all hardware to be described and present.
If firmware describes a CPU as present, but disabled, skip the registration. Such CPUs are present, but can't be brought online for whatever reason. (e.g. firmware/hypervisor policy).
Once firmware sets the enabled bit, the CPU can be registered and brought online by user-space. Online CPUs, or CPUs that are missing an _STA method must always be registered.
Enumeration failure while trying to hot plug vcpus after guest reboot
Following sequence of steps causes the issue
- Boot VM with max 80 vcpus and current 72 vcpus(For example)
- hot plug 3 more vcpus
# virsh setvcpus vm2 75 --live 3. hot unplug 3 vcpus ( which were added earlier ) # virsh setvcpus vm2 72 --live 4. reboot VM # reboot 5. Again try to hot plug 3 vcpus -- ERROR # virsh setvcpus vm2 75 --live [ 47.367823] acpi ACPI0007:48: Enumeration failure [ 47.565173] acpi ACPI0007:49: Enumeration failure [ 47.703423] acpi ACPI0007:4a: Enumeration failure
I can reproduce this, thanks for the detailed description. (I didn't use 'virsh', that doesn't seem to be relevant)
I observed the following sequence while doing hotplug and unplug
When we do hotplug, making vcpu as *present*(_STA.Pres=1) and *enabled*(_STA.Enabled=1) When we do hot unplug, we just *disabled*(_STA.Enabled=0) vcpu but not *removed*(_STA.Pres=1)
In such a case CPU shall still be present at the firmware/VMM level and could be kept disabled i.e. _STA.Enable=0 and _STA.Present=1
After reboot, system is trying to enumerate vcpu(as *present*(_STA.Pres=1)) and it is marking flags flags.initialized = true and flags.visited = true
This would be no different to the first boot...
as the vcpu *disabled*(_STA.Enabled=0), vcpu has been made not present and flags flags.initialized and flags.visited were not cleared
(It's important that everything remains present all the time. When a CPU is disabled, its offline and unregistered, but all the bits and pieces of logic that go with the CPU are still present. This is what makes it safe to do this on physical hardware.)
And following fix resolves the issue
diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c index 558664d169fc..e93b933cb8a4 100644 --- a/drivers/acpi/scan.c +++ b/drivers/acpi/scan.c @@ -2158,6 +2158,9 @@ static int acpi_scan_attach_handler(struct acpi_device *device) break;
device->handler = NULL; + device->flags.initialized = false; + acpi_device_clear_enumerated(device); + device->flags.power_manageable = 0; if (ret < 0) break; }
I think the bigger question is why is acpi_dev_ready_for_enumeration() false on the first boot, and true on the second, when the state of the CPU should be the same in both cases. Nothing should have been preserved over the reboot.
~
Turns out Qemu is returning in '0' in _STA until the CPU is added, which makes it 0xf. Once its removed, it becomes 0xd.
Qemu should be reporting the CPU as at least present from the very beginning, as it is described in the MADT.
Salil, could look into this on the qemu side?
~
I'd much prefer we change the ACPI logic to only enumerate devices that are both present and enabled. Do the last four patches here, (or indeed the whole branch), fix this issue for you: https://gitlab.arm.com/linux-arm/linux-jm/-/tree/virtual_cpu_hotplug/rfc/sna...
The only reason not to do this, is a widely deploy set of hardware where only the present bit is set, because previously that was sufficient. If this is found to be true, the workaround would be to use acpi_dev_ready_for_enumeration() in all the places I've added acpi_device_is_present_and_enabled(), and abuse acpi_dev_ready_for_enumeration() to return different values depending on the type of device, the architecture and/or phase of the moon.
Thanks for the bug report!
James