Hi Salil,
On 03/11/2022 15:30, Salil Mehta wrote:
Hi James,
From: James Morse [mailto:james.morse@arm.com] Sent: Thursday, November 3, 2022 11:47 AM To: Salil Mehta salil.mehta@huawei.com; salil.mehta@opnsrc.net; mehta.salil.lnk@gmail.com Cc: Russell King linux@armlinux.org.uk; Jonathan Cameron jonathan.cameron@huawei.com; Lorenzo Pieralisi lorenzo.pieralisi@linaro.org; Jean-Philippe Brucker jean-philippe@linaro.org; linaro-open-discussions@op-lists.linaro.org Subject: Enabling/Disabling vCPUs and Qemu 'coldplug'
You mentioned 'cold plug behaves differently'.
Yes, it was because cold plugged vCPUs will have GICC.flags.Enabled Bit set and you mentioned below in one of the documentation patch:
<<excerpt from the patch[1]>> [...] +CPUs described as ``enabled`` in the static table, should not have their _STA +modified dynamically by firmware. Soft-restart features such as kexec will +re-read the static properties of the system from these static tables, and +may malfunction if these no longer describe the running system. [...]
This would be a bug in Qemu, if those CPUs are meant to be removable, they should be 'not-enabled and online-capable' in the MADT:GICC. Just because they were present at boot doesn't affect whether they are removable.
Linux will try to bring them online during boot regardless, and will poke _STA to register the CPUs.
The problem is if you kexec an older kernel that doesn't understand PSCI_DENIED, you'll get a bunch or warnings printed at boot. If you kexec that other operating system, (I don't know why you would), it will choke on enabled CPUs that it can't online.
Linux may be changed in the future to track online-capable and use it to enforce what firmware is allowed to do. I'm not a fan of this sort of thing, but its worth making sure Qemu's firmware descriptions are correct.
[1] [RFC PATCH v0.1 25/25] arm64: document virtual CPU hotplug's expectations
From the last call you described 'coldplug' as starting Qemu with '-S' so it doesn't actually run the guest, then adding vCPUs before releasing Qemu to run the guest. You said CPUs added this way can't be disabled.
(am I right so far?)
Yes, that is one of the way cold plugged vCPUs could be added. I think on x86 there is a way to distinguish cold-booted vCPUs which are managed by applications (i.e. will have 'Id') and the ones which will not be (so 'Id' get assigned automatically)
The 'id' right now do not appear with very helpful naming so for the initial patches I used some very naïve way of allocating the 'Ids' and which needs to be eventually changed. Please check[2].
I'll file this under qemu-internals!
This turns out to be a bit murkier than that. You can disable these vCPUs, but the first call will fail. The reason is very simple: Qemu is sending a device-check for the first call, not an eject-request.
Ok, it looks odd. I need to rest this case. I have not properly tested this case.
Linux prints a warning for the spurious device-check because the CPU already exists and is even online.
An example flow, with the below debug[0], is:
# qemu -S smp cpus=1,maxcpus=3,cores=3,threads=1,sockets=1 ${REST_OF_OPTIONS}
On the Qemu monitor: | device_add driver=host-arm-cpu,core-id=1,id=cpu1 | cont
[Qemu boots the guest]
acpi_processor_add() is called twice during boot, once for vCPU0 and once for the vCPU that was 'coldplugged' vCPU1.
On the Qemu monitor: | device_del cpu1
[ 56.427089] ACPI: XYZZY:ACPI_NOTIFY_DEVICE_CHECK on ACPI0007:1 [ 56.428239] ACPI: XYZZY: acpi_scan_device_check() 1 | 1 [ 56.429335] CPU: 1 PID: 105 Comm: kworker/u6:2 Not tainted 6.1.0-rc2-00028-g6eaecb5ffd26-dirty #14644 [ 56.431043] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015 [ 56.432431] Workqueue: kacpi_hotplug acpi_hotplug_work_fn [ 56.433520] Call trace: [ 56.434015] dump_backtrace.part.0+0xe0/0xf0 [ 56.434875] show_stack+0x18/0x40 [ 56.435546] dump_stack_lvl+0x68/0x84 [ 56.436308] dump_stack+0x18/0x34 [ 56.436983] acpi_device_hotplug+0x234/0x4e0 [ 56.437847] acpi_hotplug_work_fn+0x24/0x40 [ 56.438695] process_one_work+0x1d0/0x320 [ 56.439515] worker_thread+0x14c/0x444 [ 56.440283] kthread+0x10c/0x110 [ 56.440935] ret_from_fork+0x10/0x20 [ 56.441680] acpi ACPI0007:01: Already enumerated
[ This is because Qemu is adding a CPU that already exists ]
[Out of quick speculation] Not sure why but just going through it quickly looks like it could also happen since 'Ids' in Qemu are conflicting and the check to verify if the 'Id' already exists is missing. AFAICR, this was one of the pending items in Qemu. Please check the earlier discussion[1] on this with Igor Mammedov.
Suggestion was to use below library for generating Ids. Maybe this change could be common to both x86 and ARM eventually.
Patch: util - add automated ID generation utility File: https://github.com/qemu/qemu/blob/master/util/id.c Commit-id: https://github.com/qemu/qemu/commit/a0f1913637e6
...Will debug later today and get back to you with the confirmation.
Thanks. No need for that to be today, I don't think its any more urgent than the other bits. (PSCI support in Qemu for PSCI_DENIED and _STA describing CPUs as present).
Thanks,
James