Hi,
I am looking for suggestions about DVM [1].
We are testing SVA with OpenSSL, and DVM is enabled by default to broadcast TLB maintenance. THP is also enabled by default on the system, so the khugepaged daemon (mm/khugepaged.c) runs periodically and collapses memory into huge pages.
We found that THP, through khugepaged, can cause problems for the SVA test cases: https://github.com/Linaro/uadk/issues/215
Two cases:
1. Heavyweight test case, async mode, 36+ jobs:
     openssl speed -elapsed -engine uadk -async_jobs 36 rsa2048
   Once collapse_huge_page happens, the hardware may hang in IO page fault handling. There may be a huge number of page faults that keep happening, whereas usually only a few IO page faults are reported.
2. Lightweight test case with a high THP scan frequency, sync mode, 1 job. The data may be incorrect:
     sudo openssl speed -engine uadk -seconds 1 rsa2048
     Doing 2048 bits public rsa's for 1s: RSA verify failure
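For anyone trying to reproduce case 2, the khugepaged scan rate can be raised through its sysfs knobs. The following is only a minimal sketch: it assumes the standard scan_sleep_millisecs and pages_to_scan files under /sys/kernel/mm/transparent_hugepage/khugepaged, the helper name set_khugepaged_knob and the KHP variable are hypothetical, and the values in the example are illustrative, not the ones used in the tests above.

```shell
#!/bin/sh
# Sketch: helper to tune khugepaged scanning (needs root on a real system).
# KHP can be overridden, e.g. to point at a scratch directory for a dry run.
KHP=${KHP:-/sys/kernel/mm/transparent_hugepage/khugepaged}

set_khugepaged_knob() {
    # $1 = knob name, $2 = new value.
    # Returns non-zero if the knob is missing or not writable.
    knob="$KHP/$1"
    [ -w "$knob" ] || { echo "cannot write $knob" >&2; return 1; }
    echo "$2" > "$knob"
}

# Example (as root), illustrative values only:
#   set_khugepaged_knob scan_sleep_millisecs 10   # shorter sleep between scans
#   set_khugepaged_knob pages_to_scan 4096        # more pages per scan pass
```

A shorter sleep and a larger page count per pass should make collapse_huge_page run more often, which is presumably what makes the failure easier to hit.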
Two workarounds:
1. Disable THP:
     echo never > /sys/kernel/mm/transparent_hugepage/enabled
2. Keep THP enabled, but add an explicit TLBI instead of relying on DVM: add a call to arm_smmu_tlb_inv_range in arm_smmu_mm_invalidate_range (arm-smmu-v3-sva.c). That function is reached from khugepaged via collapse_huge_page -> mmu_notifier_invalidate_range_end.
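To confirm that workaround 1 is actually in effect before rerunning the tests, the active THP mode can be read back from the same sysfs file. This is a small sketch, assuming the usual kernel format in which the active mode is shown in brackets (for example "always madvise [never]"); the function name thp_active_mode and the THP_ENABLED variable are hypothetical.

```shell
#!/bin/sh
# Sketch: report which THP mode is currently selected.
THP_ENABLED=${THP_ENABLED:-/sys/kernel/mm/transparent_hugepage/enabled}

thp_active_mode() {
    # $1 = contents of the "enabled" file, e.g. "always [madvise] never".
    # Prints the bracketed word; prints nothing if no bracketed word is found.
    printf '%s\n' "$1" | sed -n 's/.*\[\([a-z]*\)\].*/\1/p'
}

# Example:
#   thp_active_mode "$(cat "$THP_ENABLED")"
```

If this reports anything other than "never", khugepaged may still be collapsing pages during the test run.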
It looks like DVM is not taking effect in some corner cases.
Questions:
1. If khugepaged collapses memory used by the device and then changes the TLB, can DVM sync this TLB change to the SMMU?
2. If DMA is actively using memory while khugepaged collapses it, can DVM handle this case? Or should khugepaged not touch memory that is in use by a device? It looks like khugepaged cannot tell the difference.
[1] DVM: Distributed Virtual Memory, a protocol for interconnect messages that provides broadcast TLB maintenance operations (among other things).
Thanks
linaro-open-discussions@op-lists.linaro.org