From: James Morse james.morse@arm.com
Sent: Tuesday, March 14, 2023 10:45 AM
To: Vishnu Pajjuri vishnu@amperemail.onmicrosoft.com; Salil Mehta salil.mehta@huawei.com; Joyce Qi joyce.qi@linaro.org
Cc: linaro-open-discussions@op-lists.linaro.org; Russell King linux@armlinux.org.uk; lorenzo.pieralisi@linaro.org; salil.mehta@opnsrc.net
Subject: Re: [Linaro-open-discussions] Re: Qemu isn't responding with PSCI_DENIED when CPUs are forbidden.

Hi Vishnu, Salil,
On 14/03/2023 10:14, Vishnu Pajjuri wrote:
On 01-02-2023 23:18, James Morse via Linaro-open-discussions wrote:
On 25/01/2023 09:40, Salil Mehta wrote:
Qemu isn't responding with PSCI_DENIED when CPUs are forbidden.
('SUCCESS' means you hit a 5 second timeout in the guest, on each CPU)
I have tested the straightforward case and it works. Could you please elaborate on this so that I can look into the issue?
Sorry for the delayed response (I've been debugging CPU_SUSPEND and the arch-timer when using this PSCI to user-space stuff).
This test ran with a vanilla v6.1 kernel, so it doesn't have the PSCI or HVC to user-space capabilities. It looks like Qemu ignores this, and offers the policy-management/hotplug stuff anyway. The result is hitting the 5-second timeout whenever a 'denied' CPU fails to come online. This kind of thing will break migration if a guest using these features is allowed to start on a host that doesn't have them.
While trying VM live migration with the vcpu hotplug stuff, I observed self-detected stalls on vCPUs in the guest. Note that I did no vcpu hotplug/hot unplug before, during or after the VM live migration.
Is this similar to the issue you guys mentioned here?
It's not. The description above was about the enforcement disappearing over a migration because the destination host kernel lacked the kernel patches.
What you are seeing is new.
Yes, live migration was never tested and does not work (in fact, I discussed the same problem internally with Jonathan last week). I have been working on fixing live migration recently. Need some time.
The scenario is as follows:
1. Migrate VM from src.host to dest.host
   Migration is successful, didn't see any CPU stall messages on the console
2. Migrate the same VM from dest.host to src.host
   Migration is successful, but self-detected stalls were observed on CPUs in the guest
It sounds like Qemu hasn't migrated its own PSCI state, which holds which CPUs are on/off. The result is the guest on CPU-0 thinks CPU-3 is on, but Qemu thinks it's off, hence the stall warnings.
That’s correct. Debugging further as there are other bits too.
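For illustration only, the mismatch described above can be modelled in a few lines of Python. This is a toy sketch, not Qemu code: `VmmPsciState`, `migrate` and `stalled_cpus` are hypothetical names, and the fallback of "only vCPU 0 is on" when the state is missing from the migration stream is an assumption made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class VmmPsciState:
    """Hypothetical model of the VMM's per-vCPU PSCI on/off bookkeeping."""
    powered_on: dict = field(default_factory=dict)  # vCPU index -> bool


def migrate(src: VmmPsciState, include_psci_state: bool) -> VmmPsciState:
    """Return the destination VMM's view of the vCPUs after a migration."""
    if include_psci_state:
        # The on/off map travels with the migration stream.
        return VmmPsciState(dict(src.powered_on))
    # Assumption for this sketch: without the state, the destination
    # falls back to a boot-time default where only vCPU 0 is on.
    return VmmPsciState({cpu: cpu == 0 for cpu in src.powered_on})


def stalled_cpus(guest_view: dict, vmm: VmmPsciState) -> list:
    """vCPUs the guest believes are on but the VMM believes are off."""
    return sorted(
        cpu for cpu, on in guest_view.items()
        if on and not vmm.powered_on.get(cpu, False)
    )


if __name__ == "__main__":
    guest_view = {0: True, 1: True, 2: True, 3: True}
    src = VmmPsciState(dict(guest_view))
    print(stalled_cpus(guest_view, migrate(src, include_psci_state=False)))
    print(stalled_cpus(guest_view, migrate(src, include_psci_state=True)))
```

Any vCPU reported by `stalled_cpus` is one the guest would wait on while the VMM never runs it, which is the kind of disagreement that would show up as stall warnings.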
It may be possible to store the on/off state in the KVM vCPU's mp_state if that makes it easier for Qemu to migrate this state. This shouldn't get used by KVM as the PSCI CPU_AFFINITY call has moved to user-space, and Qemu won't (shouldn't) call VCPU_RUN against a vCPU that is PSCI:off when Qemu is managing PSCI.
It never does as it is only called when threads are launched.
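As a rough sketch of the mp_state idea, the PSCI on/off flag could be mapped to and from a KVM mp_state value around migration. The helper names below are hypothetical, and the two constants are written out by hand on the assumption that they match KVM_MP_STATE_RUNNABLE and KVM_MP_STATE_STOPPED in the uapi linux/kvm.h header; please check them against your headers. A real implementation would go through the KVM_GET_MP_STATE / KVM_SET_MP_STATE vCPU ioctls rather than plain integers.

```python
# Assumed to mirror the uapi <linux/kvm.h> values; verify before relying on them.
KVM_MP_STATE_RUNNABLE = 0
KVM_MP_STATE_STOPPED = 5


def psci_on_to_mp_state(powered_on: bool) -> int:
    """Encode the VMM's PSCI on/off flag as an mp_state value (hypothetical helper)."""
    return KVM_MP_STATE_RUNNABLE if powered_on else KVM_MP_STATE_STOPPED


def mp_state_to_psci_on(mp_state: int) -> bool:
    """Decode an mp_state value back to a PSCI on/off flag (hypothetical helper)."""
    if mp_state == KVM_MP_STATE_RUNNABLE:
        return True
    if mp_state == KVM_MP_STATE_STOPPED:
        return False
    raise ValueError(f"unexpected mp_state {mp_state}")


def save_power_states(powered_on: dict) -> dict:
    """Source side: turn {vcpu: on/off} into {vcpu: mp_state} for the stream."""
    return {cpu: psci_on_to_mp_state(on) for cpu, on in powered_on.items()}


def restore_power_states(mp_states: dict) -> dict:
    """Destination side: rebuild {vcpu: on/off} from the migrated mp_states."""
    return {cpu: mp_state_to_psci_on(s) for cpu, s in mp_states.items()}


if __name__ == "__main__":
    before = {0: True, 1: True, 2: False, 3: True}
    after = restore_power_states(save_power_states(before))
    print(after == before)
```

The point of the round trip is only that the destination ends up with the same on/off map as the source; whether mp_state is the right carrier for this in Qemu is exactly the open question in the thread.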
Thanks