Hi Vishnu, Salil,
On 14/03/2023 10:14, Vishnu Pajjuri wrote:
On 01-02-2023 23:18, James Morse via Linaro-open-discussions wrote:
On 25/01/2023 09:40, Salil Mehta wrote:
Qemu isn't responding with PSCI_DENIED when CPUs are forbidden. ('SUCCESS' means you hit a 5 second timeout in the guest, on each CPU)
I have tested the straight forward case and it works. Could you please elaborate on this so that I can look into the issue?
Sorry for the delay'd response - (I've been debugging CPU_SUSPEND and the arch-timer when using this PSCI to user-space stuff).
This test ran with a vanilla v6.1 kernel, so it doesn't have the PSCI or HVC to user-space capabilities. It looks like Qemu ignores this, and offers the policy-management/hotplug stuff anyway. The result is hitting the 5-second timeout whenever a 'denied' CPU fails to come online. This kind of thing will break migration if a guest using these features is allowed to start on a host that doesn't have them.
While I'm trying VM live migration with vcpu hotplug stuff I observed a self-detected stall on vCPUs on guest And note that I did no vcpu hotplug/hot unplug before or after or during VM live migration
is it the similar issue you guys mentioned here?
It's not. The description above was about the enforcement disappearing over a migration because the destination host kernel lacked the kernel patches.
What you are seeing is new.
The scenario is as follows 1. Migrate VM from src.host to dest.host Migration is successful, didn't see any CPU stall messages on the console 2. Migrate the same VM from dest.host to src.host Migration is successful, but self-detected stalls were observed on CPUs in the guest
It sounds like Qemu hasn't migrated its own PSCI state, which holds which CPUs are on/off. The result is the guest on CPU-0 thinks CPU-3 is on, but Qemu thinks its off, hence the stall warnings.
It may be possible to store the on/off state in the KVM vCPU's mp_state if that makes it easier for Qemu to migrate this state. This shouldn't get used by KVM as the PSCI CPU_AFFINITY call has moved to user-space, and Qemu won't (shouldn't) call VCPU_RUN against a vCPU that is PSCI:off when Qemu is managing PSCI.
Thanks for the bug report!
Thanks,
James
The kernel version on all three(src.host, dest.host and VM) machines is same. https://git.gitlab.arm.com/linux-arm/linux-jm.git virtual_cpu_hotplug/rfc/v/0.3 # uname -r 6.1.0-rc2+
Qemu version on both(src.host, dest.host) hosts is same https://github.com/salil-mehta/qemu.git virt-cpuhp-armv8/rfc-v1-port29092022.psci.present # qemu-system-aarch64 --version QEMU emulator version 7.1.50 (v5.0.0-21802-gbe6cc696c2)
At the same time, I tried distro's qemu binary for live migration, it works fine # qemu-system-aarch64 --version QEMU emulator version 7.0.0 (qemu-7.0.0-14.fc37)
VM vCPU config: current 4, maximum 8 Note: Live migration performed with virsh commands.
VM console log: [ 60.881616] rcu: INFO: rcu_sched self-detected stall on CPU [ 60.884881] rcu: 0-...!: (1 ticks this GP) idle=c8b4/0/0x1 softirq=4255/4255 fqs=0 [ 60.888888] (t=53688 jiffies g=2773 q=30 ncpus=4) [ 60.891386] rcu: rcu_sched kthread timer wakeup didn't happen for 53687 jiffies! g2773 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 [ 60.897325] rcu: Possible timer handling issue on cpu=0 timer-softirq=531 [ 60.901491] rcu: rcu_sched kthread starved for 53688 jiffies! g2773 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=0 [ 60.907503] rcu: Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior. [ 60.912603] rcu: RCU grace-period kthread stack dump: [ 60.915490] task:rcu_sched state:I stack:0 pid:14 ppid:2 flags:0x00000008 [ 60.920284] Call trace: [ 60.921691] __switch_to+0xc8/0xf0 [ 60.923654] __schedule+0x200/0x5d0 [ 60.925661] schedule+0x58/0xf0 [ 60.927449] schedule_timeout+0x90/0x160 [ 60.929675] rcu_gp_fqs_loop+0x11c/0x3f4 [ 60.931897] rcu_gp_kthread+0x158/0x1bc [ 60.934076] kthread+0xd0/0xd4 [ 60.935826] ret_from_fork+0x10/0x20 [ 60.937859] rcu: Stack dump where RCU GP kthread last ran: [ 60.940922] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.1.0-rc2+ #1 [ 60.944213] Hardware name: QEMU KVM Virtual Machine, BIOS edk2-20221117gitfff6d81270b5-14.fc37 11/17/2022 [ 60.948968] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 60.952133] pc : arch_cpu_idle+0x18/0x2c [ 60.953912] lr : arch_cpu_idle+0x10/0x2c [ 60.955723] sp : ffffd76c41f43d60 [ 60.957204] x29: ffffd76c41f43d60 x28: 0000000078bb0010 x27: 000000013bbefa88 [ 60.960107] x26: 000000013bbefa80 x25: 000000013e635d98 x24: 000000000016dcd0 [ 60.963001] x23: 0000576c380a0000 x22: ffffd76c41f63c80 x21: ffffd76c41f63c80 [ 60.965975] x20: ffffd76c419b5008 x19: ffffd76c419b5008 x18: 0000000000000000 [ 60.968621] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000 [ 60.971168] x14: 0000000000000000 x13: 0000000000000000 x12: ffffd76c41f4f8d8 [ 60.973709] x11: 0000000000000001 x10: 95fa3f5fd68e625e x9 : ffffd76c401a70d4 [ 60.976296] x8 : ffffd76c41f65dc8 x7 : 0000000000000004 x6 : 00000010c242cffc [ 60.978592] x5 : 4000000000000000 x4 : ffff2894bd126000 x3 : ffff2894bd126000 [ 60.980848] x2 : ffff2894bd126000 x1 : ffff0000feaeb530 x0 : 00000000000000e0 [ 60.983104] Call trace: [ 60.983881] arch_cpu_idle+0x18/0x2c [ 60.985025] default_idle_call+0x50/0x114 [ 60.986311] cpuidle_idle_call+0x154/0x1a0 [ 60.987586] do_idle+0xb8/0x110 [ 60.988471] cpu_startup_entry+0x30/0x34 [ 60.989576] rest_init+0xdc/0xe0 [ 60.990486] arch_post_acpi_subsys_init+0x0/0x28 [ 60.991782] start_kernel+0x47c/0x498 [ 60.992811] __primary_switched+0xbc/0xc4 [ 60.993945] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.1.0-rc2+ #1 [ 60.995693] Hardware name: QEMU KVM Virtual Machine, BIOS edk2-20221117gitfff6d81270b5-14.fc37 11/17/2022 [ 60.998283] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 61.000084] pc : arch_cpu_idle+0x18/0x2c [ 61.001102] lr : arch_cpu_idle+0x10/0x2c [ 61.002118] sp : ffffd76c41f43d60 [ 61.002978] x29: ffffd76c41f43d60 x28: 0000000078bb0010 x27: 000000013bbefa88 [ 61.004826] x26: 000000013bbefa80 x25: 000000013e635d98 x24: 000000000016dcd0 [ 61.006682] x23: 0000576c380a0000 x22: ffffd76c41f63c80 x21: ffffd76c41f63c80 [ 61.008425] x20: ffffd76c419b5008 x19: ffffd76c419b5008 x18: 0000000000000000 [ 61.010149] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000 [ 61.011877] x14: 0000000000000000 x13: 0000000000000000 x12: ffffd76c41f4f8d8 [ 61.013605] x11: 0000000000000001 x10: 95fa3f5fd68e625e x9 : ffffd76c401a70d4 [ 61.015331] x8 : ffffd76c41f65dc8 x7 : 0000000000000004 x6 : 00000010c242cffc [ 61.017024] x5 : 4000000000000000 x4 : ffff2894bd126000 x3 : ffff2894bd126000 [ 61.018652] x2 : ffff2894bd126000 x1 : ffff0000feaeb530 x0 : 00000000000000e0 [ 61.020271] Call trace: [ 61.020830] arch_cpu_idle+0x18/0x2c [ 61.021654] default_idle_call+0x50/0x114 [ 61.022571] cpuidle_idle_call+0x154/0x1a0 [ 61.023514] do_idle+0xb8/0x110 [ 61.024232] cpu_startup_entry+0x30/0x34 [ 61.025134] rest_init+0xdc/0xe0 [ 61.025894] arch_post_acpi_subsys_init+0x0/0x28 [ 61.026934] start_kernel+0x47c/0x498 [ 61.027750] __primary_switched+0xbc/0xc4 [ 61.028636] Task dump for CPU 3: [ 61.029346] task:swapper/3 state:R running task stack:0 pid:0 ppid:1 flags:0x00000008 [ 61.031534] Call trace: [ 61.032074] __switch_to+0xc8/0xf0 [ 61.032827] ct_kernel_exit.constprop.0+0x84/0xb0 [ 61.033861] arch_cpu_idle+0x18/0x2c [ 61.034649] default_idle_call+0x50/0x114 [ 61.035530] cpuidle_idle_call+0x154/0x1a0 [ 61.036430] do_idle+0xb8/0x110 [ 61.037131] cpu_startup_entry+0x30/0x34 [ 61.037987] secondary_start_kernel+0xe0/0x104 [ 61.038966] __secondary_switched+0xb0/0xb4 [ 88.448798] rcu: INFO: rcu_sched self-detected stall on CPU [ 88.450066] rcu: 3-...!: (1 ticks this GP) idle=24f4/1/0x4000000000000000 softirq=3722/3722 fqs=1 [ 88.452158] (t=230585897971 jiffies g=2773 q=817 ncpus=4) [ 88.454360] Task dump for CPU 0: [ 88.455031] task:systemd state:R running task stack:0 pid:873 ppid:1 flags:0x00000000 [ 88.457115] Call trace: [ 88.457630] __switch_to+0xc8/0xf0 [ 88.458352] 0x0 [ 88.458818] Task dump for CPU 1: [ 88.459569] task:systemd-userwor state:R running task stack:0 pid:914 ppid:663 flags:0x0000080c [ 88.461747] Call trace: [ 88.462248] __switch_to+0xc8/0xf0 [ 88.462940] __schedule+0x200/0x5d0 [ 88.463658] 0x0 [ 88.464036] Task dump for CPU 2: [ 88.464705] task:systemd-userwor state:R running task stack:0 pid:670 ppid:663 flags:0x00000a00 [ 88.466729] Call trace: [ 88.467231] __switch_to+0xc8/0xf0 [ 88.467930] del_timer_sync+0x48/0xa4 [ 88.468691] schedule_timeout+0x98/0x160 [ 88.469526] sockfs_ops+0x0/0x100 [ 88.470219] CPU: 3 PID: 1 Comm: systemd Not tainted 6.1.0-rc2+ #1 [ 88.471457] Hardware name: QEMU KVM Virtual Machine, BIOS edk2-20221117gitfff6d81270b5-14.fc37 11/17/2022 [ 88.473405] pstate: 00400005 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 88.474814] pc : pcpu_memcg_post_alloc_hook+0x40/0xd0 [ 88.475861] lr : pcpu_alloc+0x4e0/0xac4 [ 88.476655] sp : ffff80000805ba90 [ 88.477330] x29: ffff80000805ba90 x28: ffff0000d2246500 x27: 000000000000a408 [ 88.478779] x26: 0000000000000008 x25: ffffd76c4039b1a4 x24: ffffd76c416e8fa8 [ 88.480235] x23: 0000d36b32cf4408 x22: 0000000000400cc0 x21: 0000000000000560 [ 88.481688] x20: 00000000000002b0 x19: ffff0000d0490980 x18: 0000000000000000 [ 88.483144] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000 [ 88.484600] x14: 0000000000000002 x13: 0000000000000102 x12: 0000000000000001 [ 88.486064] x11: 00000000000000ac x10: 0000000000000401 x9 : ffffd76c4039aba0 [ 88.487521] x8 : fffffbffefef36b8 x7 : 0000000000000000 x6 : 000000000000003f [ 88.488972] x5 : 0000000000000040 x4 : fffffffffffffff0 x3 : 00000000000002b0 [ 88.490425] x2 : 0000000000002902 x1 : ffff0000d2246500 x0 : ffff800008ae5000 [ 88.491876] Call trace: [ 88.492381] pcpu_memcg_post_alloc_hook+0x40/0xd0 [ 88.493343] pcpu_alloc+0x4e0/0xac4 [ 88.494051] __alloc_percpu_gfp+0x20/0x2c [ 88.494964] mem_cgroup_alloc+0x108/0x2bc [ 88.495848] mem_cgroup_css_alloc+0x68/0x240 [ 88.496770] cgroup_apply_control_enable+0x140/0x32c [ 88.497850] cgroup_mkdir+0xf4/0x1e0 [ 88.498621] kernfs_iop_mkdir+0x68/0xa0 [ 88.499453] vfs_mkdir+0xb8/0x170 [ 88.500173] do_mkdirat+0x130/0x144 [ 88.500928] __arm64_sys_mkdirat+0x58/0x8c [ 88.501812] invoke_syscall+0x50/0x120 [ 88.502624] el0_svc_common.constprop.0+0x4c/0xf4 [ 88.503636] do_el0_svc+0x34/0x4c [ 88.504357] el0_svc+0x34/0xdc [ 88.505018] el0t_64_sync_handler+0xf4/0x120 [ 88.505946] el0t_64_sync+0x190/0x194 [ 88.510138] systemd-journald[572]: Oldest entry in /var/log/journal/165dbe8e5fa0400fa86b537c22caaa1f/system.journal is older than the configured file retention duration (1month), suggesting rotation. [ 88.517590] systemd-journald[572]: /var/log/journal/165dbe8e5fa0400fa86b537c22caaa1f/system.journal: Journal header limits reached or header out-of-date, rotating.