On 6/17/21 4:33 AM, Yicong Yang wrote:
(w/o is without the patch; the higher the rate, the better)
Then I tested mcf_r alone with different numbers of copies, bound to NUMA node 0:
             Base Run Time           Base Rate
             -------------           ---------
 4 Copies    w/o 618 (w 580)         w/o 10.5 (w 11.1)
 8 Copies    w/o 645 (w 647)         w/o 20.0 (w 20.0)
16 Copies    w/o 849 (w 844)         w/o 30.4 (w 30.6)
From htop, I could see that the tasks running on the CPUs did not spread strictly across the clusters.
Looking at the code, it seems the active load balance path should run and move a running task from a CPU in the overloaded cluster to the empty cluster in the 4-copies test.
I wonder whether the task has slept in the meantime, since we do active balance via cpu_stop, which takes some time to stop the CPU. If so, we fail to move the task because it is no longer running.
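For reference, a heavily simplified paraphrase of the path I have in mind (illustration only, not the literal kernel/sched/fair.c code), showing where that race can bite:

/* In load_balance(), on the CPU that found itself underloaded: */
	if (need_active_balance(&env)) {
		busiest->push_cpu = this_cpu;
		/*
		 * Ask the stopper thread on the busiest CPU to push its
		 * *currently running* task over to us.  This is asynchronous:
		 * the stopper still has to preempt that task, which takes a
		 * while.
		 */
		stop_one_cpu_nowait(cpu_of(busiest),
				    active_load_balance_cpu_stop, busiest,
				    &busiest->active_balance_work);
	}

/* Later, in stopper context on the busiest CPU: */
static int active_load_balance_cpu_stop(void *data)
{
	struct rq *busiest_rq = data;

	/*
	 * If the task we meant to push went to sleep between the kick and
	 * the stopper actually running, there is nothing left to move and
	 * the cpu_stop was pure overhead.
	 */
	if (busiest_rq->nr_running <= 1)
		goto out;

	/* detach one task and attach it to push_cpu (elided) */
out:
	return 0;
}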
I am wondering if we are incurring more active load balance cpu stop overhead without reaping the benefit of actually balancing the tasks.
Do you notice an increase in the rate of calls to active_load_balance_cpu_stop() for the 4-copies case compared to the vanilla kernel?
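One low-overhead way to check is the per-domain active balance counters in /proc/schedstat. A rough sketch below, assuming schedstats are enabled (sched_schedstats=1) and the usual field layout where the three counters after the 24 load_balance() fields on each "domainN" line are alb_count/alb_failed/alb_pushed; sample before and after a run and compare:

/*
 * Sketch: dump the active_load_balance() counters from /proc/schedstat.
 * Field positions assume the common sched-stats layout described in
 * Documentation/scheduler/sched-stats.rst.
 */
#include <stdio.h>
#include <string.h>

int main(void)
{
	FILE *f = fopen("/proc/schedstat", "r");
	char line[4096], cpu[32] = "?";

	if (!f) {
		perror("/proc/schedstat");
		return 1;
	}
	while (fgets(line, sizeof(line), f)) {
		if (!strncmp(line, "cpu", 3)) {
			sscanf(line, "%31s", cpu);	/* remember current CPU */
			continue;
		}
		if (!strncmp(line, "domain", 6)) {
			char name[32], mask[256];
			unsigned long long v[27];
			int n = sscanf(line,
				"%31s %255s"
				" %llu %llu %llu %llu %llu %llu %llu %llu %llu"
				" %llu %llu %llu %llu %llu %llu %llu %llu %llu"
				" %llu %llu %llu %llu %llu %llu %llu %llu %llu",
				name, mask,
				&v[0], &v[1], &v[2], &v[3], &v[4], &v[5], &v[6], &v[7], &v[8],
				&v[9], &v[10], &v[11], &v[12], &v[13], &v[14], &v[15], &v[16], &v[17],
				&v[18], &v[19], &v[20], &v[21], &v[22], &v[23], &v[24], &v[25], &v[26]);
			/* v[24..26] are alb_count, alb_failed, alb_pushed */
			if (n >= 29)
				printf("%s %s alb_count=%llu alb_failed=%llu alb_pushed=%llu\n",
				       cpu, name, v[24], v[25], v[26]);
		}
	}
	fclose(f);
	return 0;
}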
I didn't apply Patch #3 as I hit some conflicts and didn't try to resolve them. Since we're testing on arm64, I think it's okay to test without Patch #3.
The machine I tested has 128 cores in 2 sockets and 4 NUMA nodes with 32 cores each, still with 4 cores per cluster. Below is the per-node memory info:
Any comments? I noticed Tim observed that sleep and wakeup have some influence, so I wonder whether the SPEC CPU intrate test also suffers from this.
This could be the case. We should probably check whether a single copy of mcf sleeps or gets blocked frequently.
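A quick way to check that (a sketch only, assuming /proc/<pid>/status exposes the usual voluntary_ctxt_switches / nonvoluntary_ctxt_switches fields) is to sample one mcf process over its run; a steadily growing voluntary count means the task is sleeping or blocking, and so may no longer be running when the active balance stopper finally looks at it:

/*
 * Sketch: periodically sample the context-switch counters of a pid.
 * Usage: ./ctxsw <pid>   (hypothetical helper, not part of any tool)
 */
#include <stdio.h>
#include <unistd.h>

static int read_ctxsw(const char *path, unsigned long *vol, unsigned long *nonvol)
{
	char line[256];
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	while (fgets(line, sizeof(line), f)) {
		sscanf(line, "voluntary_ctxt_switches: %lu", vol);
		sscanf(line, "nonvoluntary_ctxt_switches: %lu", nonvol);
	}
	fclose(f);
	return 0;
}

int main(int argc, char **argv)
{
	char path[64];
	unsigned long vol = 0, nonvol = 0;

	if (argc != 2) {
		fprintf(stderr, "usage: %s <pid>\n", argv[0]);
		return 1;
	}
	snprintf(path, sizeof(path), "/proc/%s/status", argv[1]);

	/* one sample per second until the process exits */
	while (read_ctxsw(path, &vol, &nonvol) == 0) {
		printf("voluntary=%lu nonvoluntary=%lu\n", vol, nonvol);
		sleep(1);
	}
	return 0;
}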
Tim