Hi All,
Sorry for the long email, but I need your advice on debugging my current problem.
[Issue]
SynchronousExceptionSPx() is taken on a secondary core while the primary core is booting edk2.
# I use nearly the latest TF-A and edk2 source code.
# TF-A includes Graeme's patches to support 512 cores.
[Detail]
In the sbsa-qemu boot flow, the primary core runs bl1 -> bl2 -> bl31 -> bl32 -> edk2. My sbsa-qemu build enables UEFI secure variables, so edk2 calls the MM_COMMUNICATE service and accesses UEFI variables through the StandaloneMM variable/flash drivers running in Secure-EL0.
All secondary cores sleep in wfe() in plat/qemu/common/aarch64/plat_helpers.S::plat_secondary_cold_boot_setup() in bl1 until the Linux kernel issues a PSCI CPU_ON call.
The gdb output below was taken during edk2 boot. At this point everything is fine and edk2 <-> StandaloneMM communication works.
---
(gdb) info thread
  Id   Target Id                    Frame
* 1    Thread 1.1 (CPU#0 [running]) spm_mm_smc_handler (smc_fid=3288334433, x1=0, x2=0, x3=0, x4=0, cookie=0x0 <bl1_entrypoint>, handle=0x3ff8a010 <sp_ctx+16>, flags=0) at services/std_svc/spm_mm/spm_mm_main.c:280
  2    Thread 1.2 (CPU#1 [running]) plat_secondary_cold_boot_setup () at plat/qemu/common/aarch64/plat_helpers.S:84
---
After several MM_COMMUNICATE service calls from edk2, SynchronousExceptionSPx() is suddenly invoked on the secondary core. (To keep the issue simple, I start qemu with "-smp 2".)
# I set a breakpoint at SynchronousExceptionSPx() on the secondary core.
---
(gdb) info thread
  Id   Target Id                    Frame
  1    Thread 1.1 (CPU#0 [running]) 0x0000000020aef988 in ?? ()
* 2    Thread 1.2 (CPU#1 [running]) SynchronousExceptionSPx () at bl1/aarch64/bl1_exceptions.S:54
---
After that, the secondary core goes into panic_handler() and the system cannot boot any further.
EDK2 and StandaloneMM run only on the primary core until Linux issues a PSCI call, so I have no idea why a synchronous exception (generated by software?) occurs on the secondary core.
[Observation]
1) Some EL3 registers are as follows when the secondary core stops at the breakpoint on SynchronousExceptionSPx().

ELR_EL3 0x31c4
ESR_EL3 0x2000000
FAR_EL3 0x0

ELR_EL3 points to 0x31c4, which is the wfe() in plat_secondary_cold_boot_setup(). Does this mean the synchronous exception occurs while the secondary core is sleeping in wfe()?
# The full register dump and the disassembled bl1.elf are attached.
2) The PSCI-related area "PLAT_QEMU_HOLD_BASE" (starting from the bottom of the secure SRAM (0x20000000)) is as expected. Secondary core#1 checks 0x20000010 in the poll_mailbox() loop; the value (0) means the secondary core shall keep sleeping in wfe().
---
INFO: 0x20000000 3fcee114
INFO: 0x20000008 0
INFO: 0x20000010 0
INFO: 0x20000018 0
---
This log is output on every spm_mm SMC handler call on the primary core.
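The hold-area check described in 2) can be sketched as follows. This is only an illustration in Python, not the TF-A code: it assumes the first 8 bytes at 0x20000000 are a mailbox word followed by one 8-byte hold entry per core, which is consistent with core#1 polling 0x20000010. The constant and function names are hypothetical.

```python
# Illustrative model of the hold-area check; layout values are assumptions.
SRAM_BASE = 0x20000000       # bottom of the secure SRAM (from the log above)
MAILBOX_SIZE = 8             # assumption: one 8-byte mailbox word comes first
HOLD_ENTRY_SIZE = 8          # assumption: one 8-byte hold entry per core
HOLD_STATE_WAIT = 0          # value 0 means "keep sleeping" (from the text above)


def hold_entry_addr(core_index: int) -> int:
    """Address of the hold entry that the given core polls."""
    return SRAM_BASE + MAILBOX_SIZE + core_index * HOLD_ENTRY_SIZE


def should_keep_waiting(memory: dict, core_index: int) -> bool:
    """True if the core's hold entry still holds the wait value."""
    return memory[hold_entry_addr(core_index)] == HOLD_STATE_WAIT


# Values copied from the INFO log above.
dump = {
    0x20000000: 0x3FCEE114,
    0x20000008: 0,
    0x20000010: 0,
    0x20000018: 0,
}

print(hex(hold_entry_addr(1)), should_keep_waiting(dump, 1))
```

With the logged values, the entry for core#1 is still 0, so under these assumptions the core has no reason to leave the wfe() loop.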
3) When I load the same TF-A/BL32 (StandaloneMM) image and disable UEFI secure boot in edk2, this issue does not occur. This suggests that calling the MM_COMMUNICATE service triggers the issue.
Does anyone have any ideas or clues for debugging this issue?
Thanks, Masahisa
On Fri, 26 Feb 2021 at 09:22, Masahisa Kojima masahisa.kojima@linaro.org wrote:
> [Observation]
> 1) Some EL3 registers are as follows when the secondary core stops at the breakpoint on SynchronousExceptionSPx().
>
> ELR_EL3 0x31c4
> ESR_EL3 0x2000000
> FAR_EL3 0x0
>
> ELR_EL3 points to 0x31c4, which is the wfe() in plat_secondary_cold_boot_setup(). Does this mean the synchronous exception occurs while the secondary core is sleeping in wfe()?
> # The full register dump and the disassembled bl1.elf are attached.
Hello Masahisa,
This means QEMU is delivering an exception with unknown exception class to EL3 on the secondary core. It should be possible to track this down in QEMU - exceptions are not delivered by accident so you should be able to run QEMU in the debugger and figure out what is going on when this happens.
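As a quick check of that reading, the ESR_EL3 value from the dump can be split into its fields. This is a minimal Python sketch, assuming the standard ESR_ELx layout (EC in bits [31:26], IL in bit 25, ISS in bits [24:0]); the helper name decode_esr is only for illustration.

```python
def decode_esr(esr: int) -> dict:
    """Split an ESR_ELx value into exception class, IL bit and ISS."""
    return {
        "ec": (esr >> 26) & 0x3F,    # exception class, bits [31:26]
        "il": (esr >> 25) & 0x1,     # instruction length bit, bit 25
        "iss": esr & 0x1FFFFFF,      # instruction specific syndrome, bits [24:0]
    }


fields = decode_esr(0x2000000)       # ESR_EL3 value from the register dump
print(fields)                        # {'ec': 0, 'il': 1, 'iss': 0}
```

For 0x2000000 this gives EC = 0 (unknown reason), IL = 1 and ISS = 0, so the syndrome itself carries no further detail about the cause.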
> 2) The PSCI-related area "PLAT_QEMU_HOLD_BASE" (starting from the bottom of the secure SRAM (0x20000000)) is as expected. Secondary core#1 checks 0x20000010 in the poll_mailbox() loop; the value (0) means the secondary core shall keep sleeping in wfe().
>
> INFO: 0x20000000 3fcee114
> INFO: 0x20000008 0
> INFO: 0x20000010 0
> INFO: 0x20000018 0
>
> This log is output on every spm_mm SMC handler call on the primary core.
The EL3 code and logic look correct to me. I suspect this is a QEMU issue.
On Fri, 26 Feb 2021 at 17:32, Ard Biesheuvel ardb@kernel.org wrote:
> Hello Masahisa,
>
> This means QEMU is delivering an exception with unknown exception class to EL3 on the secondary core. It should be possible to track this down in QEMU - exceptions are not delivered by accident so you should be able to run QEMU in the debugger and figure out what is going on when this happens.
Thank you for your comment. I will continue debugging, including on the QEMU side.
Thanks, Masahisa