Hi James,
Qemu isn't responding with PSCI_DENIED when CPUs are forbidden. ('SUCCESS' means you hit a 5 second timeout in the guest, on each CPU)
I have tested the straightforward case and it works. Could you please elaborate on this so that I can look into the issue?
Thanks,
Salil
Hi Salil,
On 25/01/2023 09:40, Salil Mehta wrote:
Qemu isn't responding with PSCI_DENIED when CPUs are forbidden. ('SUCCESS' means you hit a 5 second timeout in the guest, on each CPU)
I have tested the straightforward case and it works. Could you please elaborate on this so that I can look into the issue?
Sorry for the delayed response - (I've been debugging CPU_SUSPEND and the arch-timer when using this PSCI to user-space stuff).
This test ran with a vanilla v6.1 kernel, so it doesn't have the PSCI or HVC to user-space capabilities. It looks like Qemu ignores this, and offers the policy-management/hotplug stuff anyway. The result is hitting the 5-second timeout whenever a 'denied' CPU fails to come online. This kind of thing will break migration if a guest using these features is allowed to start on a host that doesn't have them.
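The sort of start-up check I'd expect on the Qemu side is sketched below. This is only a sketch: KVM_CAP_ARM_HVC_TO_USER is a placeholder name/number for whatever capability the kernel series ends up exposing, while kvm_check_extension() and error_report() are existing QEMU helpers.

#include "qemu/osdep.h"
#include "qemu/error-report.h"
#include "sysemu/kvm.h"

#ifndef KVM_CAP_ARM_HVC_TO_USER
#define KVM_CAP_ARM_HVC_TO_USER 240  /* placeholder: real name/value comes from the kernel series */
#endif

/*
 * Sketch only: refuse to offer the policy-management/hotplug machinery
 * unless the host kernel can actually forward PSCI/HVC to user-space.
 */
static bool virt_vcpu_hotplug_usable(KVMState *s)
{
    if (kvm_check_extension(s, KVM_CAP_ARM_HVC_TO_USER) <= 0) {
        error_report("host kernel lacks PSCI/HVC-to-userspace support; "
                     "not enabling vCPU hotplug policy management");
        return false;
    }
    return true;
}

A check like this (plus a matching migration blocker) would also stop a guest that relies on the enforcement from being started on, or migrated to, a host that can't provide it.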
As we should have a list of issues, the others are:
* libvirt, "first vcpu unplug times out", reported by Russell on the 5th-Jan. (private email to you and me)
* Guest can't shut down properly: "reboot: Power down" is displayed, but then nothing happens. Reported by Russell on the 20th-December. It looks like the mail I sent here didn't get delivered... (sorry if this one is new!)
I've had a look at the KVM code for the last one, I don't think it requires all the vCPU threads to be in the kernel. I've also been unable to reproduce it.
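(For reference, in mainline KVM a guest PSCI SYSTEM_OFF surfaces as a KVM_EXIT_SYSTEM_EVENT on the vCPU that made the call, roughly as in the simplified sketch below, which is why I wouldn't expect it to depend on where the other vCPU threads happen to be.)

#include <linux/kvm.h>

/*
 * Simplified VMM exit handling (a sketch, not QEMU's actual code): the
 * shutdown only needs the calling vCPU to return from KVM_RUN.
 */
static int handle_vcpu_exit(struct kvm_run *run)
{
    switch (run->exit_reason) {
    case KVM_EXIT_SYSTEM_EVENT:
        if (run->system_event.type == KVM_SYSTEM_EVENT_SHUTDOWN) {
            /* QEMU would call qemu_system_shutdown_request() here. */
            return 0;   /* stop the VM */
        }
        if (run->system_event.type == KVM_SYSTEM_EVENT_RESET) {
            return 2;   /* reset the VM and continue */
        }
        break;
    default:
        break;
    }
    return 1;           /* keep running this vCPU */
}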
Thanks,
James
Hi James/Salil,
On 01-02-2023 23:18, James Morse via Linaro-open-discussions wrote:
Hi Salil,
On 25/01/2023 09:40, Salil Mehta wrote:
Qemu isn't responding with PSCI_DENIED when CPUs are forbidden. ('SUCCESS' means you hit a 5 second timeout in the guest, on each CPU)
I have tested the straightforward case and it works. Could you please elaborate on this so that I can look into the issue?
Sorry for the delayed response - (I've been debugging CPU_SUSPEND and the arch-timer when using this PSCI to user-space stuff).
This test ran with a vanilla v6.1 kernel, so it doesn't have the PSCI or HVC to user-space capabilities. It looks like Qemu ignores this, and offers the policy-management/hotplug stuff anyway. The result is hitting the 5-second timeout whenever a 'denied' CPU fails to come online. This kind of thing will break migration if a guest using these features is allowed to start on a host that doesn't have them.
While trying VM live migration with the vcpu hotplug stuff, I observed self-detected stalls on vCPUs in the guest. Note that I did no vcpu hotplug/hot-unplug before, during, or after the live migration.
Is this similar to the issue you mentioned here?
The scenario is as follows:
1. Migrate the VM from src.host to dest.host. Migration is successful; I didn't see any CPU stall messages on the console.
2. Migrate the same VM from dest.host back to src.host. Migration is successful, but self-detected stalls were observed on CPUs in the guest.
The kernel version on all three machines (src.host, dest.host and the VM) is the same:
https://git.gitlab.arm.com/linux-arm/linux-jm.git virtual_cpu_hotplug/rfc/v/0.3
# uname -r
6.1.0-rc2+
The Qemu version on both hosts (src.host, dest.host) is the same:
https://github.com/salil-mehta/qemu.git virt-cpuhp-armv8/rfc-v1-port29092022.psci.present
# qemu-system-aarch64 --version
QEMU emulator version 7.1.50 (v5.0.0-21802-gbe6cc696c2)
At the same time, I tried the distro's qemu binary for live migration, and it works fine:
# qemu-system-aarch64 --version
QEMU emulator version 7.0.0 (qemu-7.0.0-14.fc37)
VM vCPU config: current 4, maximum 8.
Note: live migration was performed with virsh commands.
VM console log: [ 60.881616] rcu: INFO: rcu_sched self-detected stall on CPU [ 60.884881] rcu: 0-...!: (1 ticks this GP) idle=c8b4/0/0x1 softirq=4255/4255 fqs=0 [ 60.888888] (t=53688 jiffies g=2773 q=30 ncpus=4) [ 60.891386] rcu: rcu_sched kthread timer wakeup didn't happen for 53687 jiffies! g2773 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 [ 60.897325] rcu: Possible timer handling issue on cpu=0 timer-softirq=531 [ 60.901491] rcu: rcu_sched kthread starved for 53688 jiffies! g2773 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=0 [ 60.907503] rcu: Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior. [ 60.912603] rcu: RCU grace-period kthread stack dump: [ 60.915490] task:rcu_sched state:I stack:0 pid:14 ppid:2 flags:0x00000008 [ 60.920284] Call trace: [ 60.921691] __switch_to+0xc8/0xf0 [ 60.923654] __schedule+0x200/0x5d0 [ 60.925661] schedule+0x58/0xf0 [ 60.927449] schedule_timeout+0x90/0x160 [ 60.929675] rcu_gp_fqs_loop+0x11c/0x3f4 [ 60.931897] rcu_gp_kthread+0x158/0x1bc [ 60.934076] kthread+0xd0/0xd4 [ 60.935826] ret_from_fork+0x10/0x20 [ 60.937859] rcu: Stack dump where RCU GP kthread last ran: [ 60.940922] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.1.0-rc2+ #1 [ 60.944213] Hardware name: QEMU KVM Virtual Machine, BIOS edk2-20221117gitfff6d81270b5-14.fc37 11/17/2022 [ 60.948968] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 60.952133] pc : arch_cpu_idle+0x18/0x2c [ 60.953912] lr : arch_cpu_idle+0x10/0x2c [ 60.955723] sp : ffffd76c41f43d60 [ 60.957204] x29: ffffd76c41f43d60 x28: 0000000078bb0010 x27: 000000013bbefa88 [ 60.960107] x26: 000000013bbefa80 x25: 000000013e635d98 x24: 000000000016dcd0 [ 60.963001] x23: 0000576c380a0000 x22: ffffd76c41f63c80 x21: ffffd76c41f63c80 [ 60.965975] x20: ffffd76c419b5008 x19: ffffd76c419b5008 x18: 0000000000000000 [ 60.968621] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000 [ 60.971168] x14: 0000000000000000 x13: 0000000000000000 x12: ffffd76c41f4f8d8 [ 60.973709] x11: 0000000000000001 x10: 95fa3f5fd68e625e x9 : ffffd76c401a70d4 [ 60.976296] x8 : ffffd76c41f65dc8 x7 : 0000000000000004 x6 : 00000010c242cffc [ 60.978592] x5 : 4000000000000000 x4 : ffff2894bd126000 x3 : ffff2894bd126000 [ 60.980848] x2 : ffff2894bd126000 x1 : ffff0000feaeb530 x0 : 00000000000000e0 [ 60.983104] Call trace: [ 60.983881] arch_cpu_idle+0x18/0x2c [ 60.985025] default_idle_call+0x50/0x114 [ 60.986311] cpuidle_idle_call+0x154/0x1a0 [ 60.987586] do_idle+0xb8/0x110 [ 60.988471] cpu_startup_entry+0x30/0x34 [ 60.989576] rest_init+0xdc/0xe0 [ 60.990486] arch_post_acpi_subsys_init+0x0/0x28 [ 60.991782] start_kernel+0x47c/0x498 [ 60.992811] __primary_switched+0xbc/0xc4 [ 60.993945] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.1.0-rc2+ #1 [ 60.995693] Hardware name: QEMU KVM Virtual Machine, BIOS edk2-20221117gitfff6d81270b5-14.fc37 11/17/2022 [ 60.998283] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 61.000084] pc : arch_cpu_idle+0x18/0x2c [ 61.001102] lr : arch_cpu_idle+0x10/0x2c [ 61.002118] sp : ffffd76c41f43d60 [ 61.002978] x29: ffffd76c41f43d60 x28: 0000000078bb0010 x27: 000000013bbefa88 [ 61.004826] x26: 000000013bbefa80 x25: 000000013e635d98 x24: 000000000016dcd0 [ 61.006682] x23: 0000576c380a0000 x22: ffffd76c41f63c80 x21: ffffd76c41f63c80 [ 61.008425] x20: ffffd76c419b5008 x19: ffffd76c419b5008 x18: 0000000000000000 [ 61.010149] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000 [ 61.011877] x14: 0000000000000000 x13: 0000000000000000 x12: ffffd76c41f4f8d8 [ 61.013605] x11: 
0000000000000001 x10: 95fa3f5fd68e625e x9 : ffffd76c401a70d4 [ 61.015331] x8 : ffffd76c41f65dc8 x7 : 0000000000000004 x6 : 00000010c242cffc [ 61.017024] x5 : 4000000000000000 x4 : ffff2894bd126000 x3 : ffff2894bd126000 [ 61.018652] x2 : ffff2894bd126000 x1 : ffff0000feaeb530 x0 : 00000000000000e0 [ 61.020271] Call trace: [ 61.020830] arch_cpu_idle+0x18/0x2c [ 61.021654] default_idle_call+0x50/0x114 [ 61.022571] cpuidle_idle_call+0x154/0x1a0 [ 61.023514] do_idle+0xb8/0x110 [ 61.024232] cpu_startup_entry+0x30/0x34 [ 61.025134] rest_init+0xdc/0xe0 [ 61.025894] arch_post_acpi_subsys_init+0x0/0x28 [ 61.026934] start_kernel+0x47c/0x498 [ 61.027750] __primary_switched+0xbc/0xc4 [ 61.028636] Task dump for CPU 3: [ 61.029346] task:swapper/3 state:R running task stack:0 pid:0 ppid:1 flags:0x00000008 [ 61.031534] Call trace: [ 61.032074] __switch_to+0xc8/0xf0 [ 61.032827] ct_kernel_exit.constprop.0+0x84/0xb0 [ 61.033861] arch_cpu_idle+0x18/0x2c [ 61.034649] default_idle_call+0x50/0x114 [ 61.035530] cpuidle_idle_call+0x154/0x1a0 [ 61.036430] do_idle+0xb8/0x110 [ 61.037131] cpu_startup_entry+0x30/0x34 [ 61.037987] secondary_start_kernel+0xe0/0x104 [ 61.038966] __secondary_switched+0xb0/0xb4 [ 88.448798] rcu: INFO: rcu_sched self-detected stall on CPU [ 88.450066] rcu: 3-...!: (1 ticks this GP) idle=24f4/1/0x4000000000000000 softirq=3722/3722 fqs=1 [ 88.452158] (t=230585897971 jiffies g=2773 q=817 ncpus=4) [ 88.454360] Task dump for CPU 0: [ 88.455031] task:systemd state:R running task stack:0 pid:873 ppid:1 flags:0x00000000 [ 88.457115] Call trace: [ 88.457630] __switch_to+0xc8/0xf0 [ 88.458352] 0x0 [ 88.458818] Task dump for CPU 1: [ 88.459569] task:systemd-userwor state:R running task stack:0 pid:914 ppid:663 flags:0x0000080c [ 88.461747] Call trace: [ 88.462248] __switch_to+0xc8/0xf0 [ 88.462940] __schedule+0x200/0x5d0 [ 88.463658] 0x0 [ 88.464036] Task dump for CPU 2: [ 88.464705] task:systemd-userwor state:R running task stack:0 pid:670 ppid:663 flags:0x00000a00 [ 88.466729] Call trace: [ 88.467231] __switch_to+0xc8/0xf0 [ 88.467930] del_timer_sync+0x48/0xa4 [ 88.468691] schedule_timeout+0x98/0x160 [ 88.469526] sockfs_ops+0x0/0x100 [ 88.470219] CPU: 3 PID: 1 Comm: systemd Not tainted 6.1.0-rc2+ #1 [ 88.471457] Hardware name: QEMU KVM Virtual Machine, BIOS edk2-20221117gitfff6d81270b5-14.fc37 11/17/2022 [ 88.473405] pstate: 00400005 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 88.474814] pc : pcpu_memcg_post_alloc_hook+0x40/0xd0 [ 88.475861] lr : pcpu_alloc+0x4e0/0xac4 [ 88.476655] sp : ffff80000805ba90 [ 88.477330] x29: ffff80000805ba90 x28: ffff0000d2246500 x27: 000000000000a408 [ 88.478779] x26: 0000000000000008 x25: ffffd76c4039b1a4 x24: ffffd76c416e8fa8 [ 88.480235] x23: 0000d36b32cf4408 x22: 0000000000400cc0 x21: 0000000000000560 [ 88.481688] x20: 00000000000002b0 x19: ffff0000d0490980 x18: 0000000000000000 [ 88.483144] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000 [ 88.484600] x14: 0000000000000002 x13: 0000000000000102 x12: 0000000000000001 [ 88.486064] x11: 00000000000000ac x10: 0000000000000401 x9 : ffffd76c4039aba0 [ 88.487521] x8 : fffffbffefef36b8 x7 : 0000000000000000 x6 : 000000000000003f [ 88.488972] x5 : 0000000000000040 x4 : fffffffffffffff0 x3 : 00000000000002b0 [ 88.490425] x2 : 0000000000002902 x1 : ffff0000d2246500 x0 : ffff800008ae5000 [ 88.491876] Call trace: [ 88.492381] pcpu_memcg_post_alloc_hook+0x40/0xd0 [ 88.493343] pcpu_alloc+0x4e0/0xac4 [ 88.494051] __alloc_percpu_gfp+0x20/0x2c [ 88.494964] mem_cgroup_alloc+0x108/0x2bc [ 88.495848] 
mem_cgroup_css_alloc+0x68/0x240 [ 88.496770] cgroup_apply_control_enable+0x140/0x32c [ 88.497850] cgroup_mkdir+0xf4/0x1e0 [ 88.498621] kernfs_iop_mkdir+0x68/0xa0 [ 88.499453] vfs_mkdir+0xb8/0x170 [ 88.500173] do_mkdirat+0x130/0x144 [ 88.500928] __arm64_sys_mkdirat+0x58/0x8c [ 88.501812] invoke_syscall+0x50/0x120 [ 88.502624] el0_svc_common.constprop.0+0x4c/0xf4 [ 88.503636] do_el0_svc+0x34/0x4c [ 88.504357] el0_svc+0x34/0xdc [ 88.505018] el0t_64_sync_handler+0xf4/0x120 [ 88.505946] el0t_64_sync+0x190/0x194 [ 88.510138] systemd-journald[572]: Oldest entry in /var/log/journal/165dbe8e5fa0400fa86b537c22caaa1f/system.journal is older than the configured file retention duration (1month), suggesting rotation. [ 88.517590] systemd-journald[572]: /var/log/journal/165dbe8e5fa0400fa86b537c22caaa1f/system.journal: Journal header limits reached or header out-of-date, rotating.
Regards,
-Vishnu
Hi Vishnu, Salil,
On 14/03/2023 10:14, Vishnu Pajjuri wrote:
On 01-02-2023 23:18, James Morse via Linaro-open-discussions wrote:
On 25/01/2023 09:40, Salil Mehta wrote:
Qemu isn't responding with PSCI_DENIED when CPUs are forbidden. ('SUCCESS' means you hit a 5 second timeout in the guest, on each CPU)
I have tested the straightforward case and it works. Could you please elaborate on this so that I can look into the issue?
Sorry for the delayed response - (I've been debugging CPU_SUSPEND and the arch-timer when using this PSCI to user-space stuff).
This test ran with a vanilla v6.1 kernel, so it doesn't have the PSCI or HVC to user-space capabilities. It looks like Qemu ignores this, and offers the policy-management/hotplug stuff anyway. The result is hitting the 5-second timeout whenever a 'denied' CPU fails to come online. This kind of thing will break migration if a guest using these features is allowed to start on a host that doesn't have them.
While trying VM live migration with the vcpu hotplug stuff, I observed self-detected stalls on vCPUs in the guest. Note that I did no vcpu hotplug/hot-unplug before, during, or after the live migration.
Is this similar to the issue you mentioned here?
It's not. The description above was about the enforcement disappearing over a migration because the destination host kernel lacked the kernel patches.
What you are seeing is new.
The scenario is as follows:
1. Migrate the VM from src.host to dest.host. Migration is successful; I didn't see any CPU stall messages on the console.
2. Migrate the same VM from dest.host back to src.host. Migration is successful, but self-detected stalls were observed on CPUs in the guest.
It sounds like Qemu hasn't migrated its own PSCI state, which holds which CPUs are on/off. The result is the guest on CPU-0 thinks CPU-3 is on, but Qemu thinks it's off, hence the stall warnings.
It may be possible to store the on/off state in the KVM vCPU's mp_state if that makes it easier for Qemu to migrate this state. This shouldn't get used by KVM as the PSCI CPU_AFFINITY call has moved to user-space, and Qemu won't (shouldn't) call VCPU_RUN against a vCPU that is PSCI:off when Qemu is managing PSCI.
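A rough sketch of what that could look like on the Qemu side (the KVM_GET_MP_STATE/KVM_SET_MP_STATE ioctls and the KVM_MP_STATE_* values are existing KVM ABI, and kvm_vcpu_ioctl() is QEMU's existing per-vCPU wrapper; the 'powered_off' flag stands in for however Qemu tracks its PSCI view of the vCPU):

#include "qemu/osdep.h"
#include "hw/core/cpu.h"
#include "sysemu/kvm.h"
#include <linux/kvm.h>

/*
 * Sketch: mirror QEMU's idea of a vCPU's PSCI on/off state into KVM's
 * mp_state so it travels with the rest of the KVM vCPU state on migration.
 */
static int psci_power_state_to_kvm(CPUState *cs, bool powered_off)
{
    struct kvm_mp_state mp_state = {
        .mp_state = powered_off ? KVM_MP_STATE_STOPPED : KVM_MP_STATE_RUNNABLE,
    };

    return kvm_vcpu_ioctl(cs, KVM_SET_MP_STATE, &mp_state);
}

static int psci_power_state_from_kvm(CPUState *cs, bool *powered_off)
{
    struct kvm_mp_state mp_state;
    int ret = kvm_vcpu_ioctl(cs, KVM_GET_MP_STATE, &mp_state);

    if (ret == 0) {
        *powered_off = (mp_state.mp_state == KVM_MP_STATE_STOPPED);
    }
    return ret;
}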
Thanks for the bug report!
Thanks,
James
From: James Morse <james.morse@arm.com>
Sent: Tuesday, March 14, 2023 10:45 AM
To: Vishnu Pajjuri <vishnu@amperemail.onmicrosoft.com>; Salil Mehta <salil.mehta@huawei.com>; Joyce Qi <joyce.qi@linaro.org>
Cc: linaro-open-discussions@op-lists.linaro.org; Russell King <linux@armlinux.org.uk>; lorenzo.pieralisi@linaro.org; salil.mehta@opnsrc.net
Subject: Re: [Linaro-open-discussions] Re: Qemu isn't responding with PSCI_DENIED when CPUs are forbidden.
Hi Vishnu, Salil,
On 14/03/2023 10:14, Vishnu Pajjuri wrote:
On 01-02-2023 23:18, James Morse via Linaro-open-discussions wrote:
On 25/01/2023 09:40, Salil Mehta wrote:
Qemu isn't responding with PSCI_DENIED when CPUs are forbidden. ('SUCCESS' means you hit a 5 second timeout in the guest, on each CPU)
I have tested the straightforward case and it works. Could you please elaborate on this so that I can look into the issue?
Sorry for the delayed response - (I've been debugging CPU_SUSPEND and the arch-timer when using this PSCI to user-space stuff).
This test ran with a vanilla v6.1 kernel, so it doesn't have the PSCI or HVC to user-space capabilities. It looks like Qemu ignores this, and offers the policy-management/hotplug stuff anyway. The result is hitting the 5-second timeout whenever a 'denied' CPU fails to come online. This kind of thing will break migration if a guest using these features is allowed to start on a host that doesn't have them.
While trying VM live migration with the vcpu hotplug stuff, I observed self-detected stalls on vCPUs in the guest. Note that I did no vcpu hotplug/hot-unplug before, during, or after the live migration.
Is this similar to the issue you mentioned here?
It's not. The description above was about the enforcement disappearing over a migration because the destination host kernel lacked the kernel patches.
What you are seeing is new.
Yes, live migration was never tested and does not work (in fact, I discussed the same problem internally with Jonathan last week). I have been working on fixing live migration recently. Need some time.
The scenario is as follows:
1. Migrate the VM from src.host to dest.host. Migration is successful; I didn't see any CPU stall messages on the console.
2. Migrate the same VM from dest.host back to src.host. Migration is successful, but self-detected stalls were observed on CPUs in the guest.
It sounds like Qemu hasn't migrated its own PSCI state, which holds which CPUs are on/off. The result is the guest on CPU-0 thinks CPU-3 is on, but Qemu thinks it's off, hence the stall warnings.
That’s correct. I'm debugging further, as there are other bits too.
It may be possible to store the on/off state in the KVM vCPU's mp_state if that makes it easier for Qemu to migrate this state. This shouldn't get used by KVM as the PSCI CPU_AFFINITY call has moved to user-space, and Qemu won't (shouldn't) call VCPU_RUN against a vCPU that is PSCI:off when Qemu is managing PSCI.
It never does, as VCPU_RUN is only called once the vCPU threads have been launched.
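For illustration, the usual shape of a vCPU thread loop (just a sketch with a hypothetical struct vcpu, not Qemu's actual code):

#include <sys/ioctl.h>
#include <linux/kvm.h>
#include <stdbool.h>

/* Hypothetical per-vCPU bookkeeping for the sketch below. */
struct vcpu {
    int fd;                  /* KVM vCPU file descriptor */
    struct kvm_run *run;     /* mmap'd kvm_run structure */
    bool powered_off;        /* user-space/PSCI view of the vCPU's power state */
};

/*
 * Sketch of a typical vCPU thread body: KVM_RUN is only issued once the
 * thread for an online vCPU has been launched, so a PSCI:off vCPU never
 * has KVM_RUN called against it in the first place.
 */
static void vcpu_thread_body(struct vcpu *vcpu)
{
    while (!vcpu->powered_off) {
        if (ioctl(vcpu->fd, KVM_RUN, NULL) < 0) {
            break;
        }
        /* ... dispatch vcpu->run->exit_reason here ... */
    }
}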
Thanks
Hi Vishnu,
From: Vishnu Pajjuri <vishnu@amperemail.onmicrosoft.com>
Sent: Tuesday, March 14, 2023 10:14 AM
To: James Morse <james.morse@arm.com>; Salil Mehta <salil.mehta@huawei.com>; Joyce Qi <joyce.qi@linaro.org>
Cc: linaro-open-discussions@op-lists.linaro.org; Russell King <linux@armlinux.org.uk>; lorenzo.pieralisi@linaro.org; salil.mehta@opnsrc.net
Subject: Re: [Linaro-open-discussions] Re: Qemu isn't responding with PSCI_DENIED when CPUs are forbidden.
Hi James/Salil,
On 01-02-2023 23:18, James Morse via Linaro-open-discussions wrote:
Hi Salil,
On 25/01/2023 09:40, Salil Mehta wrote:
Qemu isn't responding with PSCI_DENIED when CPUs are forbidden. ('SUCCESS' means you hit a 5 second timeout in the guest, on each CPU)
I have tested the straightforward case and it works. Could you please elaborate on this so that I can look into the issue?
Sorry for the delayed response - (I've been debugging CPU_SUSPEND and the arch-timer when using this PSCI to user-space stuff).
This test ran with a vanilla v6.1 kernel, so it doesn't have the PSCI or HVC to user-space capabilities. It looks like Qemu ignores this, and offers the policy-management/hotplug stuff anyway. The result is hitting the 5-second timeout whenever a 'denied' CPU fails to come online. This kind of thing will break migration if a guest using these features is allowed to start on a host that doesn't have them.
While trying VM live migration with the vcpu hotplug stuff, I observed self-detected stalls on vCPUs in the guest. Note that I did no vcpu hotplug/hot-unplug before, during, or after the live migration.
Is this similar to the issue you mentioned here?
The scenario is as follows:
1. Migrate the VM from src.host to dest.host. Migration is successful; I didn't see any CPU stall messages on the console.
2. Migrate the same VM from dest.host back to src.host. Migration is successful, but self-detected stalls were observed on CPUs in the guest.
Yes, we know about this issue and I am trying to analyze it. The problem is that it isn't always repeatable, and the destination VM does not always show these prints but rather ends up in an abrupt hang.
I have been working on analyzing precisely this w.r.t. Qemu over the past few days.
<...log excerpt @destination migrated VM>
(qemu) (qemu) (qemu)
[ 435.349036] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 435.351880] rcu: 1-...!: (55 GPs behind) idle=6d58/0/0x0 softirq=5655/5657 fqs=0 (false positive?)
[ 435.352979] rcu: 2-...!: (56 GPs behind) idle=fc40/0/0x0 softirq=7022/7023 fqs=0 (false positive?)
[ 435.354409] rcu: rcu_preempt kthread starved for 36762 jiffies! g2513 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
[ 435.355651] rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
[ 435.356749] rcu: RCU grace-period kthread stack dump:
[ 435.357395] rcu: Stack dump where RCU GP kthread last ran:
[ 498.368421] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 498.369247] rcu: 1-...!: (55 GPs behind) idle=2da0/0/0x0 softirq=5655/5657 fqs=0 (false positive?)
[ 498.370301] rcu: 2-...!: (56 GPs behind) idle=fe00/0/0x0 softirq=7022/7023 fqs=0 (false positive?)
[ 498.371387] rcu: rcu_preempt kthread starved for 52517 jiffies! g2513 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
[ 498.372577] rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
[ 498.373645] rcu: RCU grace-period kthread stack dump:
[ 498.374262] rcu: Stack dump where RCU GP kthread last ran:
[ 561.387806] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 561.393161] rcu: 1-...!: (55 GPs behind) idle=9a10/0/0x0 softirq=5655/5657 fqs=0 (false positive?)
[ 561.394257] rcu: 2-...!: (56 GPs behind) idle=6658/0/0x0 softirq=7022/7023 fqs=0 (false positive?)
[ 561.395396] rcu: rcu_preempt kthread starved for 68272 jiffies! g2513 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
[ 561.396642] rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
[ 561.397741] rcu: RCU grace-period kthread stack dump:
[ 561.398377] rcu: Stack dump where RCU GP kthread last ran:
[ 624.407193] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 624.408044] rcu: 1-...!: (55 GPs behind) idle=4870/0/0x0 softirq=5655/5657 fqs=0 (false positive?)
[ 624.409097] rcu: 2-...!: (56 GPs behind) idle=c910/0/0x0 softirq=7022/7023 fqs=0 (false positive?)
[ 624.410176] rcu: rcu_preempt kthread starved for 84027 jiffies! g2513 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
[ 624.411360] rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
[ 624.412420] rcu: RCU grace-period kthread stack dump:
[ 624.413031] rcu: Stack dump where RCU GP kthread last ran:
[ 687.426578] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 687.442912] rcu: 1-...!: (55 GPs behind) idle=8da0/0/0x0 softirq=5655/5657 fqs=0 (false positive?)
[ 687.445696] rcu: 2-...!: (56 GPs behind) idle=d3d0/0/0x0 softirq=7022/7023 fqs=0 (false positive?)
[ 687.447635] rcu: rcu_preempt kthread starved for 99782 jiffies! g2513 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
[ 687.448884] rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
[ 687.449974] rcu: RCU grace-period kthread stack dump:
[ 687.450603] rcu: Stack dump where RCU GP kthread last ran:
[ 750.445966] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 750.446971] rcu: 1-...!: (55 GPs behind) idle=7b00/0/0x0 softirq=5655/5657 fqs=0 (false positive?)
[ 750.448118] rcu: 2-...!: (56 GPs behind) idle=baa0/0/0x0 softirq=7022/7023 fqs=0 (false positive?)
[ 750.449233] rcu: rcu_preempt kthread starved for 115537 jiffies! g2513 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
[ 750.450454] rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
[ 750.451546] rcu: RCU grace-period kthread stack dump:
[ 750.452167] rcu: Stack dump where RCU GP kthread last ran:
[ 813.465350] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 813.466976] rcu: 1-...!: (55 GPs behind) idle=4d70/0/0x0 softirq=5655/5657 fqs=0 (false positive?)
[ 813.468121] rcu: 2-...!: (56 GPs behind) idle=6068/0/0x0 softirq=7022/7023 fqs=0 (false positive?)
[ 813.469215] rcu: rcu_preempt kthread starved for 131292 jiffies! g2513 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
[ 813.470413] rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
[ 813.471493] rcu: RCU grace-period kthread stack dump:
[ 813.472115] rcu: Stack dump where RCU GP kthread last ran:
<log excerpt ends...>
As we should have a list of issues, the others are:
- libvirt, "first vcpu unplug times out", reported by Russell on the 5th-Jan. (private email to you and me)
- Guest can't shut down properly: "reboot: Power down" is displayed, but then nothing happens. Reported by Russell on the 20th-December. It looks like the mail I sent here didn't get delivered... (sorry if this one is new!)
I've had a look at the KVM code for the last one, I don't think it requires all the vCPU threads to be in the kernel. I've also been unable to reproduce it.
We have already discussed the above on the 5th-Feb call. These are different issues.
Thanks,
Salil