This is follow-up work to support the cluster scheduler. Previously we added a cluster level to the scheduler topology[1] so that tasks spread between clusters, which brings more memory bandwidth and decreases cache contention. But this may hurt workloads that are sensitive to communication latency, as related tasks will be placed across clusters.
This series modifies select_idle_cpu() on the wake affine path so that a wake-affined task is more likely to be woken on the same cluster as the waker. Latency should decrease, since a waker and wakee in the same cluster may benefit from the hot L3 cache tag.
[1] https://lore.kernel.org/lkml/20210924085104.44806-1-21cnbao@gmail.com/
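As an illustration only (not the kernel code in these patches), the standalone C model below sketches the intended scan order: CPUs in the waker's cluster are probed first, then the remaining CPUs of the LLC. The cluster/LLC sizes, the idle[] array and pick_idle_cpu() are invented for the example.

/*
 * Standalone model of the intended scan order, not the kernel
 * implementation: probe the waker's cluster first, then fall back
 * to the rest of the LLC. Sizes and the idle[] state are made up.
 */
#include <stdbool.h>
#include <stdio.h>

#define LLC_CPUS	16	/* assumed CPUs sharing the LLC */
#define CLUSTER_CPUS	4	/* assumed CPUs sharing one cluster (L3 tag) */

static bool idle[LLC_CPUS];	/* idle[i] == true if CPU i is idle */

/* Return an idle CPU, preferring the waker's cluster; -1 if none. */
static int pick_idle_cpu(int waker_cpu)
{
	int cluster_first = waker_cpu / CLUSTER_CPUS * CLUSTER_CPUS;
	int cpu;

	/* Scan the waker's cluster first: wakee shares the hot L3 tag. */
	for (cpu = cluster_first; cpu < cluster_first + CLUSTER_CPUS; cpu++)
		if (idle[cpu])
			return cpu;

	/* Fall back to the remaining CPUs of the LLC. */
	for (cpu = 0; cpu < LLC_CPUS; cpu++)
		if (cpu / CLUSTER_CPUS != waker_cpu / CLUSTER_CPUS && idle[cpu])
			return cpu;

	return -1;
}

int main(void)
{
	idle[6] = idle[12] = true;	/* pretend CPUs 6 and 12 are idle */
	/* Waker on CPU 5 (cluster 4-7): CPU 6 is chosen over CPU 12. */
	printf("picked CPU %d\n", pick_idle_cpu(5));
	return 0;
}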
Hi Tim and Barry,

This is the modified patch for the packing path of the cluster scheduler. Tests have been done on a Kunpeng 920 2-socket 4-NUMA 128-core platform, with 8 clusters on each NUMA node. The patches are based on 5.15-rc1.
Barry Song (2):
  sched: Add per_cpu cluster domain info
  sched/fair: Scan from the first cpu of cluster if presents in select_idle_cpu
 include/linux/sched/sd_flags.h |  9 +++++++++
 include/linux/sched/topology.h |  2 +-
 kernel/sched/fair.c            | 10 +++++++---
 kernel/sched/sched.h           |  1 +
 kernel/sched/topology.c        |  5 +++++
 5 files changed, 23 insertions(+), 4 deletions(-)