On Wed, Dec 21, 2022 at 05:39:32PM +0000, Teo Couprie Diaz wrote:
On 21/12/2022 17:00, Beata Michalska wrote:
On Wed, Dec 21, 2022 at 09:59:04AM +0000, Teo Couprie Diaz wrote:
On 20/12/2022 17:18, Kevin Brodsky wrote:
On 20/12/2022 10:45, Teo Couprie Diaz wrote:
In our Debian-based distribution, the two files used in lib/tst_pid are not available, but systemd still imposes a task limit far lower than the kernel's pid_max. Add another file, which appears to always be available, from which to read the maximum number of PIDs.
This fixed msgstress04, but it appeared that msgstress03 didn't account for all of its PIDs, so it still hit the limit. Reduce the number of free PIDs by 10% in msgstress03 to account for this.
Signed-off-by: Teo Couprie Diaz <teo.coupriediaz@arm.com>
 lib/tst_pid.c                                          | 4 ++++
 testcases/kernel/syscalls/ipc/msgstress/msgstress03.c  | 2 +-
 2 files changed, 5 insertions(+), 1 deletion(-)
diff --git a/lib/tst_pid.c b/lib/tst_pid.c
index 21cadef2a..3c10e0298 100644
--- a/lib/tst_pid.c
+++ b/lib/tst_pid.c
@@ -33,6 +33,7 @@
 #define PID_MAX_PATH "/proc/sys/kernel/pid_max"
 #define CGROUPS_V1_SLICE_FMT "/sys/fs/cgroup/pids/user.slice/user-%d.slice/pids.max"
 #define CGROUPS_V2_SLICE_FMT "/sys/fs/cgroup/user.slice/user-%d.slice/pids.max"
+#define CGROUPS_V2_INIT_SCOPE "/sys/fs/cgroup/pids/init.scope/pids.max"
On Arch I don't get pids/ under cgroup/; what I have is init.scope/ directly underneath, that is:
    /sys/fs/cgroup/init.scope/pids.max
Not sure if that's a configuration difference? In any case the idea sounds sensible.
Interesting, thanks for pointing it out. I do have the same on my personal Arch machine, but on my Ubuntu laptop I have the same layout as in our Debian image. Looking at the defines more closely, it seems like it could be a cgroups v1/v2 difference: the path I added looks like a v1 path (similar to CGROUPS_V1_SLICE_FMT) rather than a v2 one.
I'll try to have a look and see if I can get our image to use cgroups v2.
Would it make sense, if I don't find another solution, to add two paths to cover all of our bases?
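Something like this, maybe (just a sketch: it keeps the path from this patch as the v1 variant and the path you reported as the v2 one; the define names are made up):

#define CGROUPS_V1_INIT_SCOPE "/sys/fs/cgroup/pids/init.scope/pids.max"
#define CGROUPS_V2_INIT_SCOPE "/sys/fs/cgroup/init.scope/pids.max"

	/* Prefer the cgroup v2 location, fall back to the v1-style one. */
	if (max_pids < 0)
		max_pids = read_session_pids_limit(CGROUPS_V2_INIT_SCOPE, uid,
						   cleanup_fn);
	if (max_pids < 0)
		max_pids = read_session_pids_limit(CGROUPS_V1_INIT_SCOPE, uid,
						   cleanup_fn);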
 /* Leave some available processes for the OS */
 #define PIDS_RESERVE 50
@@ -103,6 +104,9 @@ static int get_session_pids_limit(void (*cleanup_fn) (void))
 	if (max_pids < 0)
 		max_pids = read_session_pids_limit(CGROUPS_V1_SLICE_FMT, uid,
 						   cleanup_fn);
+	if (max_pids < 0)
+		max_pids = read_session_pids_limit(CGROUPS_V2_INIT_SCOPE, uid,
+						   cleanup_fn);
 
 	if (max_pids < 0)
 		return -1;
diff --git a/testcases/kernel/syscalls/ipc/msgstress/msgstress03.c b/testcases/kernel/syscalls/ipc/msgstress/msgstress03.c
index 3cb70ab18..f0a631479 100644
--- a/testcases/kernel/syscalls/ipc/msgstress/msgstress03.c
+++ b/testcases/kernel/syscalls/ipc/msgstress/msgstress03.c
@@ -109,7 +109,7 @@ int main(int argc, char **argv)
 		}
 	}
-	free_pids = tst_get_free_pids(cleanup);
+	free_pids = tst_get_free_pids(cleanup) * 0.9;
Floating point calculations on integers are typically avoided where possible; you could simply use * 9 / 10 here.
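I.e. something like:

	/* Integer arithmetic, no float round trip; still keeps ~90%. */
	free_pids = tst_get_free_pids(cleanup) * 9 / 10;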
Interesting, I didn't know: thanks for pointing it out!
Otherwise that idea sounds sensible to me too; in fact, I found it quite strange that the test attempts to use literally all the remaining PIDs.
It's one of the more aggressive tests, but in theory it leaves 50 PIDs free (that's handled directly by tst_get_free_pids). But it goes over that quite a bit anyway, as explained.
I think I am still missing something: "....it seems like msgstress03 had some accounting issues, the real number of PIDs in use always being 5-10% greater than what it thought it was."
Isn't that expected? Especially if the system is busy?
I think it _would_ indeed be expected if the system were busy. The error message does highlight it!
However, this is clearly not the case on my system during the test: before running msgstress03, the current task count in the scope was always less than 10, and quite stable. As the test is the only thing loading the system in my case, that should be quite close to the real task count. Yet when it hit the limit (which was about 4900 PIDs), it was forking its 4500-4600th child, which is what led me to this conclusion.
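For reference, the live task count can be read from the scope's pids.current file; watching it could look something like this (a minimal sketch, with the path assuming the v1-style layout from our image):

#include <stdio.h>

/* Sketch: read the current task count of the init scope.
 * The path is an assumption; adjust it to the scope under test. */
static long read_pids_current(void)
{
	FILE *f = fopen("/sys/fs/cgroup/pids/init.scope/pids.current", "r");
	long count = -1;

	if (f) {
		if (fscanf(f, "%ld", &count) != 1)
			count = -1;
		fclose(f);
	}
	return count;
}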
msgstress04 doesn't seem to suffer from such a large disparity. (It being much more cautious and halving free_pids probably helps!)
Any reason for not incorporating free_pids in msgstress03? (Just a quick thought)
tst_get_free_pids takes a 'snapshot' of the number of processes running at the time the function is called, but that does not mean that number will stand still - the number of running processes may change behind the scenes.
Yep, completely agree.
The worrying part is that the buffer of 50 is not big enough to accommodate that. Is my understanding correct that the test reports a failure if it fails to spawn the expected number of processes?
The test fails if any of the forks fail, so in a way, yes? (To be exact, it will report a TFAIL for each failed fork.)
Is there a chance the fork fails for a different reason than exceeding the max pids limit?
If so, wouldn't it be better to, at that point, compare the number of running processes minus the number of test forks against the number of expected forks, to see if there are other processes eating up the max pids allowance? Or did I misunderstand the issue in the first place?
I believe that's what I tried to do in my explanation above. Do tell me if I didn't understand the suggestion properly. I could always go back and be more thorough in investigating why there is such a disparity, but there's not really much going on on the system when running those tests, and other tests don't appear to have such an issue.
So what I was thinking is: at the point of failure, check the number of active processes (within a cgroup) and compare that with the max pids - for debugging purposes only. There also seems to be an event counter in the pids controller reflecting the number of forks that failed due to the imposed limit - it might be worth checking that too (again for debugging purposes; it can be done outside of the test itself).

I must admit I am not a big fan of arbitrary limits which might work for some cases but not others, and it does feel slightly like covering up the issue rather than fixing it. On the other hand, the test failure might be somewhat 'expected'.

BTW: did we try to run that on a cgroup-free system?
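Reading that counter could look roughly like this (a sketch only, meant for debugging; it assumes a cgroup v2 pids.events file, where forks rejected because of the limit show up on the "max" line):

#include <stdio.h>
#include <string.h>

/* Sketch: return the "max" event count from a pids.events file,
 * i.e. how many forks failed because the pids limit was hit.
 * Which path to pass in depends on the cgroup under test. */
static long read_failed_fork_count(const char *path)
{
	FILE *f = fopen(path, "r");
	char key[32];
	long val;

	if (!f)
		return -1;
	while (fscanf(f, "%31s %ld", key, &val) == 2) {
		if (!strcmp(key, "max")) {
			fclose(f);
			return val;
		}
	}
	fclose(f);
	return -1;
}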
---
BR
B.
Thanks for the comments, Téo
BR
B.
Kevin
Thanks for the review!
Téo
	if (nprocs >= free_pids) {
		tst_resm(TINFO,
			 "Requested number of processes higher than limit (%d > %d), "
linux-morello-ltp mailing list -- linux-morello-ltp@op-lists.linaro.org
To unsubscribe send an email to linux-morello-ltp-leave@op-lists.linaro.org