On Wed, Nov 24, 2021 at 11:17 PM Yicong Yang <yangyicong@hisilicon.com> wrote:
This is the follow-up work to support the cluster scheduler. Previously we added a cluster level to the scheduler[1] so that tasks spread between clusters, which brings more memory bandwidth and decreases cache contention. But it may hurt workloads that are sensitive to communication latency, as their tasks will be placed across clusters.
In this series we modified select_idle_cpu() on the wake-affine path, expecting a wake-affine task to be more likely woken in the same cluster as the waker. Latency will decrease, as a waker and wakee in the same cluster may benefit from the hot L3 cache tag.
If the task runs in the same cluster as the scanned target, the data synchronization cost will be lower. The task can wake up either in the cluster of the waker or in the cluster of the wakee, depending on the return value of wake_wide(). So, to be more accurate, we are not always putting the waker and wakee in the same cluster. We are trying to put the task in the same cluster as the target, so it can either get active cache from the waker or get its old cache from the wakee.
In the case where A wakes up B: if we scan from A, we get the new cache that A wrote to B just now; if we scan from B as the target, we get the old cache of B.
In both cases, the cache synchronization cost is lowered by finding an idle CPU within the cluster of A or B.
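To make the A-wakes-B case concrete, here is a rough userspace sketch. It is a toy model, not the kernel code: the cluster_id[] table, pick_target() and find_idle_in_cluster() are hypothetical names invented for illustration, and the boolean "wide" collapses the real wake_wide()/wake_affine() decision into a single flag.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 8

/* Hypothetical topology: two clusters of four CPUs each. */
static const int cluster_id[NR_CPUS] = { 0, 0, 0, 0, 1, 1, 1, 1 };

/*
 * Toy stand-in for the target choice: scan from the wakee's previous
 * CPU when the wakeup is "wide", otherwise scan from the waker's CPU.
 */
static int pick_target(int waker_cpu, int wakee_prev_cpu, bool wide)
{
    return wide ? wakee_prev_cpu : waker_cpu;
}

/* Return an idle CPU sharing the target's cluster, or -1 if none. */
static int find_idle_in_cluster(const bool idle[NR_CPUS], int target)
{
    assert(target >= 0 && target < NR_CPUS);

    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        if (idle[cpu] && cluster_id[cpu] == cluster_id[target])
            return cpu;
    }
    return -1;
}
```

With A on CPU 1 and B previously on CPU 5, scanning from A only considers CPUs 0-3 (next to the data A just wrote), while scanning from B only considers CPUs 4-7 (next to B's old cache).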
[1] https://lore.kernel.org/lkml/20210924085104.44806-1-21cnbao@gmail.com/
We can now refer to the commit ID directly, as it has been mainlined.
Hi Tim and Barry,
This is the modified patch for the packing path of the cluster scheduler. Tests have been done on a Kunpeng 920 2-socket, 4-NUMA, 128-core platform, with 8 clusters on each NUMA node. The patches are based on 5.16-rc1.
Compared to the previous version[2], we gave up scanning from the first CPU of the cluster, as the CPU IDs may not be contiguous. Instead, we scan the cluster first, before the LLC. The results from tbench and pgbench are rather positive.
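The scan order described above can be sketched roughly as follows. This is an illustrative userspace model only, not the code from the patch: cpumask_t here is a plain 64-bit bitmap, first_cpu() and scan_cluster_then_llc() are hypothetical helpers, and picking the lowest-numbered idle CPU is a simplification of how the kernel actually walks a mask. Using a mask for the cluster is what makes non-contiguous CPU IDs a non-issue.

```c
#include <assert.h>
#include <stdint.h>

/* Toy CPU mask: bit n set means CPU n is in the set. */
typedef uint64_t cpumask_t;

/* Return the lowest-numbered CPU in mask, or -1 if the mask is empty. */
static int first_cpu(cpumask_t mask)
{
    for (int cpu = 0; cpu < 64; cpu++) {
        if (mask & ((cpumask_t)1 << cpu))
            return cpu;
    }
    return -1;
}

/*
 * Look for an idle CPU in the target's cluster first; only if that
 * fails, look at the rest of the LLC. Returns -1 if nothing is idle.
 */
static int scan_cluster_then_llc(cpumask_t idle, cpumask_t cluster,
                                 cpumask_t llc)
{
    int cpu;

    /* The cluster is assumed to be a subset of the LLC. */
    assert((cluster & ~llc) == 0);

    cpu = first_cpu(idle & cluster);
    if (cpu >= 0)
        return cpu;

    /* Skip the cluster CPUs already checked above. */
    return first_cpu(idle & llc & ~cluster);
}
```

For example, with a cluster made of CPUs 0, 2, 4 and 6 (mask 0x55) inside an LLC of CPUs 0-7 (mask 0xff), idle CPUs 3 and 6 yield CPU 6, even though CPU 3 has the lower ID, because the cluster is searched first.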
[2] https://op-lists.linaro.org/pipermail/linaro-open-discussions/2021-October/0...
Barry Song (2):
  sched: Add per_cpu cluster domain info
  sched/fair: Scan cluster before scanning LLC in wake-up path
 include/linux/sched/sd_flags.h |  9 ++++++++
 include/linux/sched/topology.h |  2 +-
 kernel/sched/fair.c            | 41 +++++++++++++++++++++++++++++++++-
 kernel/sched/sched.h           |  1 +
 kernel/sched/topology.c        |  5 +++++
 5 files changed, 56 insertions(+), 2 deletions(-)

--
2.33.0
Thanks
Barry