SynchronousException occurs in secondary core while booting edk2 - Asa-dev - op-lists.linaro.org

26 Feb 2021


Hi All,
Sorry for the long email, but I need your advice to debug my current problem.
[Issue]
SynchronousExceptionSPx() is taken on a secondary core
while the primary core is still booting edk2.
 # I use almost the latest TF-A and edk2 source code.
 # TF-A includes Graeme's patches to support 512 cores.
[Detail]
In the sbsa-qemu boot flow, the primary core runs
bl1 -> bl2 -> bl31 -> bl32 -> edk2.
My sbsa-qemu build enables UEFI secure variables, so edk2 calls the
MM_COMMUNICATE service and accesses UEFI variables through the StandaloneMM
variable/flash drivers running in Secure-EL0.
All secondary cores sleep with wfe()
in plat/qemu/common/aarch64/plat_helpers.S::plat_secondary_cold_boot_setup()
in bl1,
until the Linux kernel calls PSCI (CPU_ON).
The gdb output below was taken during edk2 boot. At this point, everything is
fine and edk2 <-> StandaloneMM communication works.
---
(gdb) info thread
  Id   Target Id         Frame
* 1    Thread 1.1 (CPU#0 [running]) spm_mm_smc_handler (smc_fid=3288334433,
    x1=0, x2=0, x3=0, x4=0, cookie=0x0 <bl1_entrypoint>,
    handle=0x3ff8a010 <sp_ctx+16>, flags=0)
    at services/std_svc/spm_mm/spm_mm_main.c:280
  2    Thread 1.2 (CPU#1 [running]) plat_secondary_cold_boot_setup ()
    at plat/qemu/common/aarch64/plat_helpers.S:84
---
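As a side note for reading the trace above, the decimal smc_fid that gdb prints can be split into its SMC Calling Convention fields. The following is a minimal sketch in Python; the helper name decode_smc_fid is made up for illustration, and only the bit positions come from the SMCCC specification.

```python
def decode_smc_fid(fid: int) -> dict:
    """Split a 32-bit SMC function ID into its SMCCC fields."""
    return {
        "hex": f"0x{fid:08X}",
        "fast_call": bool((fid >> 31) & 0x1),  # bit 31: 1 = fast call
        "smc64": bool((fid >> 30) & 0x1),      # bit 30: 1 = SMC64 convention
        "owner": (fid >> 24) & 0x3F,           # bits 29:24: owning entity number
        "function": fid & 0xFFFF,              # bits 15:0: function number
    }


# Value taken from the gdb frame of spm_mm_smc_handler() above.
fields = decode_smc_fid(3288334433)
print(fields)
```

The value decodes to 0xC4000061: a fast SMC64 call in the standard secure service range (owner 4) with function number 0x61. Which named MM service that number maps to should be checked against the TF-A spm_mm headers.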
After several MM_COMMUNICATE service calls from edk2,
SynchronousExceptionSPx() is suddenly invoked on the secondary core.
(To keep the issue simple, I start qemu with "-smp 2".)
 # I set a breakpoint at SynchronousExceptionSPx() on the secondary core.
---
(gdb) info thread
  Id   Target Id         Frame
  1    Thread 1.1 (CPU#0 [running]) 0x0000000020aef988 in ?? ()
* 2    Thread 1.2 (CPU#1 [running]) SynchronousExceptionSPx ()
    at bl1/aarch64/bl1_exceptions.S:54
---
After that, the secondary core goes into panic_handler() and cannot boot anymore.
EDK2 and StandaloneMM run only on the primary core until Linux issues a PSCI
call, so I have no idea why a SynchronousException (generated by software?)
occurs on the secondary core.
[Observation]
1) Some EL3 registers read as follows when the secondary core stops
at the breakpoint on SynchronousExceptionSPx().
ELR_EL3        0x31c4
ESR_EL3        0x2000000
FAR_EL3        0x0
ELR_EL3 points to 0x31c4, which is the wfe() in plat_secondary_cold_boot_setup().
Does this mean the SynchronousException occurs while the secondary core is
sleeping in wfe()?
 # The full register dump and the disassembled bl1.elf are attached.
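To help interpret the ESR_EL3 value above, here is a minimal Python sketch that splits it into the EC, IL and ISS fields defined by the Arm architecture. The helper name decode_esr and the small EC_NAMES table are illustrative only; the table lists just a few exception classes, not the full set.

```python
# A few AArch64 exception classes (ESR_ELx.EC); not an exhaustive list.
EC_NAMES = {
    0x00: "Unknown reason",
    0x01: "Trapped WFI or WFE instruction",
    0x17: "SMC instruction execution in AArch64 state",
    0x21: "Instruction Abort taken without a change in Exception level",
    0x25: "Data Abort taken without a change in Exception level",
}


def decode_esr(esr: int) -> dict:
    """Split an ESR_ELx value into its top-level fields."""
    ec = (esr >> 26) & 0x3F          # bits 31:26: exception class
    return {
        "ec": ec,
        "ec_name": EC_NAMES.get(ec, "not in this table"),
        "il": (esr >> 25) & 0x1,     # bit 25: 1 = 32-bit instruction
        "iss": esr & 0x1FFFFFF,      # bits 24:0: instruction specific syndrome
    }


# Value read from ESR_EL3 at the breakpoint.
esr = decode_esr(0x2000000)
print(esr)
```

With ESR_EL3 = 0x2000000 this gives EC = 0x00 ("Unknown reason"), IL = 1 and ISS = 0. A trapped WFE would report EC = 0x01, so the syndrome does not look like a WFE trap. EC = 0x00 is the class used, among other cases, when the core attempts to execute an instruction encoding it treats as UNDEFINED.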
2) The PSCI-related area "PLAT_QEMU_HOLD_BASE" (starting from the bottom of
the secure SRAM, 0x20000000) is as expected.
Secondary core #1 checks 0x20000010 in the poll_mailbox() loop; the value (0)
means the secondary core shall keep sleeping with wfe().
---
INFO:    0x20000000 3fcee114
INFO:    0x20000008 0
INFO:    0x20000010 0
INFO:    0x20000018 0
---
This log is printed on every entry to the spm_mm SMC handler on the primary core.
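The address that core #1 polls can be cross-checked against the dump with a small calculation. The sketch below is in Python and assumes a layout that is consistent with the log but not confirmed here: one 8-byte entrypoint word at the bottom of secure SRAM, followed by one 8-byte hold entry per core. The constant and function names are hypothetical and should be checked against the platform_def.h in use.

```python
SEC_SRAM_BASE = 0x20000000   # from the email: bottom of secure SRAM
MAILBOX_WORD_SIZE = 8        # assumption: one 8-byte entrypoint word first
HOLD_ENTRY_SIZE = 8          # assumption: one 8-byte hold entry per core


def hold_entry_addr(core_index: int) -> int:
    """Address of the hold entry polled by the given core (assumed layout)."""
    return SEC_SRAM_BASE + MAILBOX_WORD_SIZE + core_index * HOLD_ENTRY_SIZE


# Values copied from the INFO log above.
dump = {
    0x20000000: 0x3FCEE114,
    0x20000008: 0x0,
    0x20000010: 0x0,
    0x20000018: 0x0,
}

core1_addr = hold_entry_addr(1)
core1_value = dump[core1_addr]
print(hex(core1_addr), core1_value)
```

Under these assumptions core #1 polls 0x20000010, which matches the address given above, and the dumped value there is 0, so the hold entry itself is not telling the core to leave the wfe() loop.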
3) When I load the same TF-A/BL32 (StandaloneMM) image and disable UEFI secure
boot on edk2, this issue does not occur. This suggests that calling the
MM_COMMUNICATE service triggers the issue.
Does anyone have any idea or clue for debugging this issue?
Thanks,
Masahisa