Hi All,
Sorry for the long email, but I need your advice on debugging my current problem.
[Issue]
SynchronousExceptionSPx() occurs on the secondary core while the primary core is booting edk2.
# I use almost the latest TF-A and edk2 source code.
# TF-A includes Graeme's patches to support 512 cores.
[Detail]
In the sbsa-qemu boot flow, the primary core runs bl1 -> bl2 -> bl31 -> bl32 -> edk2. My sbsa-qemu build enables UEFI secure variables, so edk2 calls the MM_COMMUNICATE service and accesses UEFI variables through the StandaloneMM variable/flash drivers running in Secure-EL0.
All secondary cores sleep with wfe() in plat/qemu/common/aarch64/plat_helpers.S::plat_secondary_cold_boot_setup() in bl1 until the Linux kernel calls PSCI CPU_ON.
The gdb output below was taken during edk2 boot. At this point, everything is fine and edk2 <-> StandaloneMM communication works.
---
(gdb) info thread
  Id   Target Id                    Frame
* 1    Thread 1.1 (CPU#0 [running]) spm_mm_smc_handler (smc_fid=3288334433, x1=0, x2=0, x3=0, x4=0, cookie=0x0 <bl1_entrypoint>, handle=0x3ff8a010 <sp_ctx+16>, flags=0) at services/std_svc/spm_mm/spm_mm_main.c:280
  2    Thread 1.2 (CPU#1 [running]) plat_secondary_cold_boot_setup () at plat/qemu/common/aarch64/plat_helpers.S:84
---
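The smc_fid in the gdb frame above is printed in decimal. As a reading aid, here is a minimal sketch that splits it into the fields defined by the SMC Calling Convention. The helper name decode_smc_fid is made up for illustration and is not part of TF-A.

```python
def decode_smc_fid(fid: int) -> dict:
    """Split a 32-bit SMC function ID into its SMCCC fields."""
    return {
        "hex": f"{fid:#010x}",
        "fast_call": bool((fid >> 31) & 0x1),   # bit 31: 1 = fast call
        "smc64": bool((fid >> 30) & 0x1),       # bit 30: 1 = SMC64 convention
        "owning_entity": (fid >> 24) & 0x3F,    # bits 29:24: owning entity number
        "function_number": fid & 0xFFFF,        # bits 15:0: function number
    }


# smc_fid taken from the gdb frame of CPU#0
print(decode_smc_fid(3288334433))
```

For 3288334433 this gives 0xc4000061: a fast SMC64 call with owning entity 4 (standard secure service) and function number 0x61. This appears to correspond to MM_SP_EVENT_COMPLETE_AARCH64 in the TF-A SPM-MM headers rather than to MM_COMMUNICATE itself, but that mapping should be checked against the tree in use.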
After several MM_COMMUNICATE service calls from edk2, SynchronousExceptionSPx() is suddenly invoked on the secondary core. (To keep the issue simple, I start qemu with "-smp 2".)
# I set a breakpoint at SynchronousExceptionSPx() on the secondary core.
---
(gdb) info thread
  Id   Target Id                    Frame
  1    Thread 1.1 (CPU#0 [running]) 0x0000000020aef988 in ?? ()
* 2    Thread 1.2 (CPU#1 [running]) SynchronousExceptionSPx () at bl1/aarch64/bl1_exceptions.S:54
---
After that, the secondary core goes into panic_handler() and cannot boot anymore.
EDK2 and StandaloneMM run only on the primary core until Linux issues the PSCI call, so I have no idea why a synchronous exception (generated by software?) occurs on the secondary core.
[Observation]
1) Some EL3 registers are as follows when the secondary core stops at the breakpoint in SynchronousExceptionSPx().
ELR_EL3 0x31c4
ESR_EL3 0x2000000
FAR_EL3 0x0
ELR_EL3 points to 0x31c4, which is the wfe() in plat_secondary_cold_boot_setup(). Does this mean the synchronous exception occurs while the secondary core is sleeping in wfe()?
# The full register dump and the disassembled bl1.elf are attached.
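The ESR_EL3 value above can be split into its architectural fields (EC in bits 31:26, IL in bit 25, ISS in bits 24:0). A minimal sketch follows; the helper name decode_esr and the small EC table are illustrative and cover only a few exception classes.

```python
# Partial table of ESR_ELx exception classes (EC); not exhaustive.
EC_NAMES = {
    0x00: "Unknown reason",
    0x01: "Trapped WFI/WFE instruction",
    0x17: "SMC instruction execution in AArch64 state",
    0x20: "Instruction Abort from a lower Exception level",
    0x21: "Instruction Abort without a change in Exception level",
    0x22: "PC alignment fault",
    0x24: "Data Abort from a lower Exception level",
    0x25: "Data Abort without a change in Exception level",
    0x26: "SP alignment fault",
}


def decode_esr(esr: int) -> dict:
    """Split an ESR_ELx value into EC, IL and ISS."""
    ec = (esr >> 26) & 0x3F          # bits 31:26: exception class
    return {
        "ec": ec,
        "ec_name": EC_NAMES.get(ec, "not in this partial table"),
        "il": (esr >> 25) & 0x1,     # bit 25: 1 = 32-bit instruction
        "iss": esr & 0x1FFFFFF,      # bits 24:0: instruction specific syndrome
    }


# ESR_EL3 value observed on the secondary core
print(decode_esr(0x2000000))
```

For 0x2000000 this yields EC = 0x00 (unknown reason), IL = 1 and ISS = 0, so the syndrome alone does not identify the cause. EC 0x00 is the class reported, among other cases, when an instruction is treated as UNDEFINED.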
2) The PSCI-related area "PLAT_QEMU_HOLD_BASE" (starting from the bottom of the secure SRAM (0x20000000)) is as expected. Secondary core#1 checks 0x20000010 in the poll_mailbox() loop; the value (0) means the secondary core shall keep sleeping with wfe().
---
INFO: 0x20000000 3fcee114
INFO: 0x20000008 0
INFO: 0x20000010 0
INFO: 0x20000018 0
---
This log is output on every spm_mm SMC handler call on the primary core.
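The relation between the core index and the polled address can be sketched as follows, assuming an 8-byte mailbox word at 0x20000000 followed by one 8-byte hold entry per core. That layout is an assumption inferred from the addresses in the log above (core#1 polling 0x20000010), and the constant names are illustrative rather than copied from the TF-A platform headers.

```python
SHARED_RAM_BASE = 0x20000000   # bottom of secure SRAM, as stated above
MAILBOX_SIZE = 8               # assumed: one 64-bit entry-point word
HOLD_ENTRY_SIZE = 8            # assumed: one 64-bit hold word per core


def hold_entry_addr(core_idx: int) -> int:
    """Address of the hold word polled by the given core (assumed layout)."""
    return SHARED_RAM_BASE + MAILBOX_SIZE + core_idx * HOLD_ENTRY_SIZE


for core in range(2):
    print(f"core#{core}: {hold_entry_addr(core):#x}")
```

With this layout core#1 maps to 0x20000010, which matches the address mentioned above. The word at 0x20000000 (3fcee114 in the log) would then be the mailbox entry point rather than a hold entry.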
3) When I load the same TF-A/BL32 (StandaloneMM) image and disable UEFI secure boot in edk2, this issue does not occur. This suggests that calling the MM_COMMUNICATE service triggers the issue.
Does anyone have any idea or clue for debugging this issue?
Thanks,
Masahisa