Hi,
Grouping up what should be some small and fairly uncontroversial patches to the transitional runtest lists and runltp script.
Available as review branch at: https://git.morello-project.org/zdleaf/morello-linux-test-project/-/commits/...
Thanks, Zach
Zachary Leaf (3):
  runtest: add shmat to extended PCuABI syscall list
  runtest: remove mmap1/2/3 tests from PCuABI list
  runltp: increase default timeout
 runltp                                | 2 ++
 runtest/morello_transitional          | 4 ----
 runtest/morello_transitional_extended | 4 ++++
 3 files changed, 6 insertions(+), 4 deletions(-)
Signed-off-by: Zachary Leaf zachary.leaf@arm.com
---
 runtest/morello_transitional_extended | 4 ++++
 1 file changed, 4 insertions(+)
diff --git a/runtest/morello_transitional_extended b/runtest/morello_transitional_extended
index a31f0884c..a7699ce5a 100644
--- a/runtest/morello_transitional_extended
+++ b/runtest/morello_transitional_extended
@@ -17,3 +17,7 @@ semget06 semget06
 semop01 semop01
 semop02 semop02
 semop03 semop03
+
+shmat01 shmat01
+shmat02 shmat02
+shmat03 shmat03
On 8/31/22 13:42, Zachary Leaf wrote:
Signed-off-by: Zachary Leaf zachary.leaf@arm.com
 runtest/morello_transitional_extended | 4 ++++
 1 file changed, 4 insertions(+)
diff --git a/runtest/morello_transitional_extended b/runtest/morello_transitional_extended
index a31f0884c..a7699ce5a 100644
--- a/runtest/morello_transitional_extended
+++ b/runtest/morello_transitional_extended
@@ -17,3 +17,7 @@ semget06 semget06
 semop01 semop01
 semop02 semop02
 semop03 semop03
+
+shmat01 shmat01
+shmat02 shmat02
+shmat03 shmat03
+1
Commit 35cda2b9c8b9 (build.sh: Allow specifying make targets) enabled building only specific test suites, e.g. only syscall tests.
For testing PCuABI, only the kernel/syscall set of tests is targeted.
Since the mmap1/mmap2/mmap3 tests are part of the mm (kernel/mem) tests, remove them from the morello_transitional list as they will no longer be built by default when testing PCuABI.
Reported-by: Kevin Brodsky kevin.brodsky@arm.com
Signed-off-by: Zachary Leaf zachary.leaf@arm.com
---
 runtest/morello_transitional | 4 ----
 1 file changed, 4 deletions(-)
diff --git a/runtest/morello_transitional b/runtest/morello_transitional
index ddef45787..568d00ca1 100644
--- a/runtest/morello_transitional
+++ b/runtest/morello_transitional
@@ -193,10 +193,6 @@ madvise10 madvise10
 
 mmap001 mmap001 -m 1
 
-mmap1 mmap1
-mmap2 mmap2 -a -p
-mmap3 mmap3 -p
-
 mmap01 mmap01
 mmap02 mmap02
 mmap03 mmap03
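As context for the suite split: with make targets supported in build.sh, a syscall-only build boils down to building just the kernel/syscalls subtree. A minimal sketch, assuming the standard top-level LTP configure step has already been run (the exact build.sh arguments may differ):

$ make autotools && ./configure        # standard LTP setup, if not already done
$ make -C testcases/kernel/syscalls    # build only the syscall tests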
On 8/31/22 13:42, Zachary Leaf wrote:
Commit 35cda2b9c8b9 (build.sh: Allow specifying make targets) enabled building only specific test suites, e.g. only syscall tests.
For testing PCuABI, only the kernel/syscall set of tests is targeted.
Since the mmap1/mmap2/mmap3 tests are part of the mm (kernel/mem) tests, remove them from the morello_transitional list as they will no longer be built by default when testing PCuABI.
Reported-by: Kevin Brodsky kevin.brodsky@arm.com
Signed-off-by: Zachary Leaf zachary.leaf@arm.com
 runtest/morello_transitional | 4 ----
 1 file changed, 4 deletions(-)
diff --git a/runtest/morello_transitional b/runtest/morello_transitional
index ddef45787..568d00ca1 100644
--- a/runtest/morello_transitional
+++ b/runtest/morello_transitional
@@ -193,10 +193,6 @@ madvise10 madvise10
 
 mmap001 mmap001 -m 1
 
-mmap1 mmap1
-mmap2 mmap2 -a -p
-mmap3 mmap3 -p
-
 mmap01 mmap01
 mmap02 mmap02
 mmap03 mmap03
+1
Now applied on both: master & next. Thanks!
---
BR
B.

On Wed, Aug 31, 2022 at 01:42:48PM +0100, Zachary Leaf wrote:
Commit 35cda2b9c8b9 (build.sh: Allow specifying make targets) enabled building only specific test suites, e.g. only syscall tests.
For testing PCuABI, only the kernel/syscall set of tests is targeted.
Since the mmap1/mmap2/mmap3 tests are part of the mm (kernel/mem) tests, remove them from the morello_transitional list as they will no longer be built by default when testing PCuABI.
Reported-by: Kevin Brodsky kevin.brodsky@arm.com
Signed-off-by: Zachary Leaf zachary.leaf@arm.com
 runtest/morello_transitional | 4 ----
 1 file changed, 4 deletions(-)
diff --git a/runtest/morello_transitional b/runtest/morello_transitional
index ddef45787..568d00ca1 100644
--- a/runtest/morello_transitional
+++ b/runtest/morello_transitional
@@ -193,10 +193,6 @@ madvise10 madvise10
 
 mmap001 mmap001 -m 1
 
-mmap1 mmap1
-mmap2 mmap2 -a -p
-mmap3 mmap3 -p
-
 mmap01 mmap01
 mmap02 mmap02
 mmap03 mmap03
--
2.25.1
Some tests, e.g. madvise06 can timeout on slow systems for some operations:
...
madvise06.c:104: TINFO: dropping caches
Test timeouted, sending SIGKILL!
When running tests with runltp, export LTP_TIMEOUT_MUL to double the DEFAULT_TIMEOUT value, as set in tst_test.c.
Reported-by: Kevin Brodsky kevin.brodsky@arm.com
Signed-off-by: Zachary Leaf zachary.leaf@arm.com
---
 runltp | 2 ++
 1 file changed, 2 insertions(+)
diff --git a/runltp b/runltp
index 4447da156..32cf39dfb 100755
--- a/runltp
+++ b/runltp
@@ -79,6 +79,8 @@ setup()
 export LTPROOT=${PWD}
 export TMPBASE="/tmp"
 export PATH="${PATH}:${LTPROOT}/testcases/bin:${LTPROOT}/bin"
+# double the DEFAULT_TIMEOUT (tst_test.c) when running tests
+export LTP_TIMEOUT_MUL=2
 export LTP_DEV_FS_TYPE="ext2"
Do you have some idea why it's taking so long to need a longer timeout?
On 8/31/22 13:42, Zachary Leaf wrote:
Some tests, e.g. madvise06 can timeout on slow systems for some operations:
...
madvise06.c:104: TINFO: dropping caches
Test timeouted, sending SIGKILL!
When running tests with runltp, export LTP_TIMEOUT_MUL to double the DEFAULT_TIMEOUT value, as set in tst_test.c.
Reported-by: Kevin Brodsky kevin.brodsky@arm.com
Signed-off-by: Zachary Leaf zachary.leaf@arm.com
 runltp | 2 ++
 1 file changed, 2 insertions(+)
diff --git a/runltp b/runltp
index 4447da156..32cf39dfb 100755
--- a/runltp
+++ b/runltp
@@ -79,6 +79,8 @@ setup()
 export LTPROOT=${PWD}
 export TMPBASE="/tmp"
 export PATH="${PATH}:${LTPROOT}/testcases/bin:${LTPROOT}/bin"
+# double the DEFAULT_TIMEOUT (tst_test.c) when running tests
+export LTP_TIMEOUT_MUL=2
 export LTP_DEV_FS_TYPE="ext2"
On 02/09/2022 16:14, Carsten Haitzler wrote:
Do you have some idea why it's taking so long to need a longer timeout?
Not sure exactly in this particular case. The madvise06 test calls sync() and writes a "3" to /proc/sys/vm/drop_caches; if there's a lot of stuff in the cache, that seems to take longer than the default timeout of 30s on the Morello board. Subsequent calls are fast but then there's nothing further to write to storage or drop from cache.
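For reference, the step that stalls boils down to the usual cache-drop sequence; from a shell (as root) it is roughly the following, a sketch of the generic mechanism rather than the exact madvise06 code, which goes through the LTP helpers:

$ sync                                 # flush dirty pages to storage first; a slow disk stalls here
$ echo 3 > /proc/sys/vm/drop_caches    # then ask the kernel to drop page cache, dentries and inodes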
30s does seem like a long time for this (?) but I don't know much about
sync() or dropping caches to say. Slow writes to fs? Slow cache access?
In any case I've been finding it hard to replicate. Attempting to fill the cache by reading the entire fs to /dev/null, but no luck so far:
$ find / -type f -exec cat '{}' >> /dev/null \;
Any ideas?
Thanks, Zach
On 8/31/22 13:42, Zachary Leaf wrote:
Some tests, e.g. madvise06 can timeout on slow systems for some operations:
...
madvise06.c:104: TINFO: dropping caches
Test timeouted, sending SIGKILL!
When running tests with runltp, export LTP_TIMEOUT_MUL to double the DEFAULT_TIMEOUT value, as set in tst_test.c.
Reported-by: Kevin Brodsky kevin.brodsky@arm.com
Signed-off-by: Zachary Leaf zachary.leaf@arm.com
 runltp | 2 ++
 1 file changed, 2 insertions(+)
diff --git a/runltp b/runltp
index 4447da156..32cf39dfb 100755
--- a/runltp
+++ b/runltp
@@ -79,6 +79,8 @@ setup()
 export LTPROOT=${PWD}
 export TMPBASE="/tmp"
 export PATH="${PATH}:${LTPROOT}/testcases/bin:${LTPROOT}/bin"
+# double the DEFAULT_TIMEOUT (tst_test.c) when running tests
+export LTP_TIMEOUT_MUL=2
 export LTP_DEV_FS_TYPE="ext2"
On 9/2/22 17:33, Zachary Leaf wrote:
On 02/09/2022 16:14, Carsten Haitzler wrote:
Do you have some idea why it's taking so long to need a longer timeout?
Not sure exactly in this particular case. The madvise06 test calls sync() and writes a "3" to /proc/sys/vm/drop_caches; if there's a lot of stuff in the cache, that seems to take longer than the default timeout of 30s on the Morello board. Subsequent calls are fast but then there's nothing further to write to storage or drop from cache.
hmm and you're using the sata ssd on the device? or nfsroot write?
30s does seem like a long time for this (?) but I don't know much about
sync() or dropping caches to say. Slow writes to fs? Slow cache access?
In any case I've been finding it hard to replicate. Attempting to fill cache by reading the entire fs to /dev/null but no luck so far: find / -type f -exec cat '{}' >> /dev/null ;
Any ideas?
well other than "you're using nfs or remote storage" ... or something very peculiar is going on in the io layer?
Thanks, Zach
On 8/31/22 13:42, Zachary Leaf wrote:
Some tests, e.g. madvise06 can timeout on slow systems for some operations:
...
madvise06.c:104: TINFO: dropping caches
Test timeouted, sending SIGKILL!
When running tests with runltp, export LTP_TIMEOUT_MUL to double the DEFAULT_TIMEOUT value, as set in tst_test.c.
Reported-by: Kevin Brodsky kevin.brodsky@arm.com
Signed-off-by: Zachary Leaf zachary.leaf@arm.com
 runltp | 2 ++
 1 file changed, 2 insertions(+)
diff --git a/runltp b/runltp
index 4447da156..32cf39dfb 100755
--- a/runltp
+++ b/runltp
@@ -79,6 +79,8 @@ setup()
 export LTPROOT=${PWD}
 export TMPBASE="/tmp"
 export PATH="${PATH}:${LTPROOT}/testcases/bin:${LTPROOT}/bin"
+# double the DEFAULT_TIMEOUT (tst_test.c) when running tests
+export LTP_TIMEOUT_MUL=2
 export LTP_DEV_FS_TYPE="ext2"
On 05/09/2022 15:00, Carsten Haitzler wrote:
On 9/2/22 17:33, Zachary Leaf wrote:
On 02/09/2022 16:14, Carsten Haitzler wrote:
Do you have some idea why it's taking so long to need a longer timeout?
Not sure exactly in this particular case. The madvise06 test calls sync() and writes a "3" to /proc/sys/vm/drop_caches; if there's a lot of stuff in the cache, that seems to take longer than the default timeout of 30s on the Morello board. Subsequent calls are fast but then there's nothing further to write to storage or drop from cache.
hmm and you're using the sata ssd on the device? or nfsroot write?
Ah - actually using a 64GB USB drive to boot from and as root fs, not using the SSD at all. That would explain slow write speeds.
I actually didn't know they had SSDs, our setup is a board farm so the shared USB drive allows for easy re-imaging.
30s does seem like a long time for this (?) but I don't know much about sync() or dropping caches to say. Slow writes to fs? Slow cache access?
In any case I've been finding it hard to replicate. Attempting to fill cache by reading the entire fs to /dev/null but no luck so far: find / -type f -exec cat '{}' >> /dev/null ; Any ideas?
well other than "you're using nfs or remote storage" ... or something very peculiar is going on in the io layer?
Okay, I can replicate it now. It is slow USB write speed that is the issue.
If there are a lot of cached writes for some reason (e.g. some LTP tests write data to disk, other activity on the system), then calling sync() can be slow.
I can replicate this by writing enough of /dev/urandom to a file, then running the madvise06 test:
$ head -c 1G /dev/urandom > file
$ ./runltp -f syscalls -s madvise06
[...]
madvise06.c:104: TINFO: dropping caches
Test timeouted, sending SIGKILL!
Kevin suggested if write speed is an issue, then we can keep the current USB fs/boot sequence unchanged and mount the SSD as temporary storage for LTP tests. Temporary directory can be changed by passing -d /path/to/storage to runltp script. I think that's a good idea for a general improvement to CI, and should help speed things up (depending on how much disk I/O the tests do - TBC).
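As a rough example, assuming the SSD ends up mounted somewhere like /mnt/ssd (the path is only a placeholder), that would look something like:

$ mkdir -p /mnt/ssd/ltp-tmp
$ ./runltp -f syscalls -d /mnt/ssd/ltp-tmp    # use the SSD for the tests' temporary files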
For now, and to cover all bases for all users I think doubling the default timeout here is fine. If we see further issues we can look into it again.
Thanks, Zach
Thanks, Zach
On 8/31/22 13:42, Zachary Leaf wrote:
Some tests, e.g. madvise06 can timeout on slow systems for some operations:
...
madvise06.c:104: TINFO: dropping caches
Test timeouted, sending SIGKILL!
When running tests with runltp, export LTP_TIMEOUT_MUL to double the DEFAULT_TIMEOUT value, as set in tst_test.c.
Reported-by: Kevin Brodsky kevin.brodsky@arm.com
Signed-off-by: Zachary Leaf zachary.leaf@arm.com
 runltp | 2 ++
 1 file changed, 2 insertions(+)
diff --git a/runltp b/runltp
index 4447da156..32cf39dfb 100755
--- a/runltp
+++ b/runltp
@@ -79,6 +79,8 @@ setup()
 export LTPROOT=${PWD}
 export TMPBASE="/tmp"
 export PATH="${PATH}:${LTPROOT}/testcases/bin:${LTPROOT}/bin"
+# double the DEFAULT_TIMEOUT (tst_test.c) when running tests
+export LTP_TIMEOUT_MUL=2
 export LTP_DEV_FS_TYPE="ext2"
On 9/6/22 11:42, Zachary Leaf wrote:
On 05/09/2022 15:00, Carsten Haitzler wrote:
On 9/2/22 17:33, Zachary Leaf wrote:
On 02/09/2022 16:14, Carsten Haitzler wrote:
Do you have some idea why it's taking so long to need a longer timeout?
Not sure exactly in this particular case. The madvise06 test calls sync() and writes a "3" to /proc/sys/vm/drop_caches; if there's a lot of stuff in the cache, that seems to take longer than the default timeout of 30s on the Morello board. Subsequent calls are fast but then there's nothing further to write to storage or drop from cache.
hmm and you're using the sata ssd on the device? or nfsroot write?
Ah - actually using a 64GB USB drive to boot from and as root fs, not using the SSD at all. That would explain slow write speeds.
Oh ... don't do that! USB thumbdrives are mostly atrocious with writes. Reads - they do OK in the most part. If you must use a USB drive - buy a USB SSD - something proper. This is what I've got plugged into my Juno and my raspberry pi's - no way I rely on micro-sd for rootfs of anything heavily use. USB2/3 bandwidth will then be your limitation if you get a proper USB SSD. You won't get these insane stalls that can hang for 10's of seconds or even minutes. :)
Changing the test cases (timeout) to work around what I might say is "poor hardware choice for this kind of workload" I think is probably a bad idea. :)
I actually didn't know they had SSDs, our setup is a board farm so the shared USB drive allows for easy re-imaging.
Yeah. This is why I asked. :) They do have SSD's. /dev/sda should be the built in sata SSD drive. poke around at it, but I might advise in formatting it to something like ext4, make / world writable on that disk, mount it like /mnt/tmp and then store all your test logs/results on this drive somewhere. use it as a scratch drive for this stuff. :)
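A minimal sketch of that setup, assuming the SSD is used as the bare /dev/sda device with no partition table (adjust the device/partition and mount point to the actual layout):

$ mkfs.ext4 /dev/sda      # formats the whole disk (destroys any existing data)
$ mkdir -p /mnt/tmp
$ mount /dev/sda /mnt/tmp
$ chmod 1777 /mnt/tmp     # world-writable scratch area for logs/results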
30s does seem like a long time for this (?) but I don't know much about sync() or dropping caches to say. Slow writes to fs? Slow cache access?
In any case I've been finding it hard to replicate. Attempting to fill cache by reading the entire fs to /dev/null but no luck so far: find / -type f -exec cat '{}' >> /dev/null ; Any ideas?
well other than "you're using nfs or remote storage" ... or something very peculiar is going on in the io layer?
Okay, I can replicate it now. It is slow USB write speed that is the issue.
If there are lot of cached writes for some reason (e.g. some LTP tests write data to disk, other activity on the system), then calling sync() can be slow.
Yup. In fact it can be even worse. The kernel flushes the writes that are queued, but reads stall too - for possible many seconds or longer while these writes get flushed it. I've seen this again and again with USB thumbdrives and SD cards. They have poor firmware that stalls when writing and won't listen to any read IO requests while doing the write. thumbdrives and sd cards are ok for 99.9% read-only filesystems that you rarely modify and if you do - you don't care if they stall for a while.
I can replicate this by writing enough of /dev/urandom to a file, then running the madvise06 test:
$ head -c 1G /dev/urandom > file
$ ./runltp -f syscalls -s madvise06
[...]
madvise06.c:104: TINFO: dropping caches
Test timeouted, sending SIGKILL!
Kevin suggested if write speed is an issue, then we can keep the current USB fs/boot sequence unchanged and mount the SSD as temporary storage
exactly my suggestion above :)
for LTP tests. Temporary directory can be changed by passing -d /path/to/storage to runltp script. I think that's a good idea for a general improvement to CI, and should help speed things up (depending on how much disk I/O the tests do - TBC).
absolutely. right way to go IMHO.
For now, and to cover all bases for all users I think doubling the default timeout here is fine. If we see further issues we can look into it again.
now here comes one thing fun and un-morello related but related to testing in general. if i had a write take an unreasonably long amount of time ... i'd consider it a bug. a performance bug. it could be that some i/o queue stalls and just sits idle because of a bug/mistake and happens to flush out 10 seconds later because of some unrelated timer that wakes up and happens to then flush the queue. i think we'd all agree that if you wrote 4k to a disk and it took 24h to finally write that there is a bug to be fixed and the test case found it. :) so somewhere along the "increase timeout" path is a point where you have to decide if that is the right solution.
in this case, i'd just fix the test environment :)
Thanks, Zach
Thanks, Zach
On 8/31/22 13:42, Zachary Leaf wrote:
Some tests, e.g. madvise06 can timeout on slow systems for some operations:
...
madvise06.c:104: TINFO: dropping caches
Test timeouted, sending SIGKILL!
When running tests with runltp, export LTP_TIMEOUT_MUL to double the DEFAULT_TIMEOUT value, as set in tst_test.c.
Reported-by: Kevin Brodsky kevin.brodsky@arm.com
Signed-off-by: Zachary Leaf zachary.leaf@arm.com
 runltp | 2 ++
 1 file changed, 2 insertions(+)
diff --git a/runltp b/runltp
index 4447da156..32cf39dfb 100755
--- a/runltp
+++ b/runltp
@@ -79,6 +79,8 @@ setup()
 export LTPROOT=${PWD}
 export TMPBASE="/tmp"
 export PATH="${PATH}:${LTPROOT}/testcases/bin:${LTPROOT}/bin"
+# double the DEFAULT_TIMEOUT (tst_test.c) when running tests
+export LTP_TIMEOUT_MUL=2
 export LTP_DEV_FS_TYPE="ext2"
On 06/09/2022 12:02, Carsten Haitzler wrote:
On 9/6/22 11:42, Zachary Leaf wrote:
On 05/09/2022 15:00, Carsten Haitzler wrote:
On 9/2/22 17:33, Zachary Leaf wrote:
On 02/09/2022 16:14, Carsten Haitzler wrote:
Do you have some idea why it's taking so long to need a longer timeout?
Not sure exactly in this particular case. The madvise06 test calls sync() and writes a "3" to /proc/sys/vm/drop_caches; if there's a lot of stuff in the cache, that seems to take longer than the default timeout of 30s on the Morello board. Subsequent calls are fast but then there's nothing further to write to storage or drop from cache.
hmm and you're using the sata ssd on the device? or nfsroot write?
Ah - actually using a 64GB USB drive to boot from and as root fs, not using the SSD at all. That would explain slow write speeds.
Oh ... don't do that! USB thumbdrives are mostly atrocious with writes. Reads - they do OK in the most part. If you must use a USB drive - buy a USB SSD - something proper. This is what I've got plugged into my Juno and my raspberry pi's - no way I rely on micro-sd for rootfs of anything heavily use. USB2/3 bandwidth will then be your limitation if you get a proper USB SSD. You won't get these insane stalls that can hang for 10's of seconds or even minutes. :)
Changing the test cases (timeout) to work around what I might say is "poor hardware choice for this kind of workload" I think is probably a bad idea. :)
I actually didn't know they had SSDs, our setup is a board farm so the shared USB drive allows for easy re-imaging.
Yeah. This is why I asked. :) They do have SSD's. /dev/sda should be the built in sata SSD drive. poke around at it, but I might advise in formatting it to something like ext4, make / world writable on that disk, mount it like /mnt/tmp and then store all your test logs/results on this drive somewhere. use it as a scratch drive for this stuff. :)
Yep - /dev/sda is SSD. I've mounted as scratch disk for now but will probably try run everything from there in future. I'm not exactly re-imaging very often, just need to be able to boot into new kernels mostly so that would work for me.
30s does seem like a long time for this (?) but I don't know much about sync() or dropping caches to say. Slow writes to fs? Slow cache access?
In any case I've been finding it hard to replicate. Attempting to fill cache by reading the entire fs to /dev/null but no luck so far: find / -type f -exec cat '{}' >> /dev/null ; Any ideas?
well other than "you're using nfs or remote storage" ... or something very peculiar is going on in the io layer?
Okay, I can replicate it now. It is slow USB write speed that is the issue.
If there are lot of cached writes for some reason (e.g. some LTP tests write data to disk, other activity on the system), then calling sync() can be slow.
Yup. In fact it can be even worse. The kernel flushes the writes that are queued, but reads stall too - for possible many seconds or longer while these writes get flushed it. I've seen this again and again with USB thumbdrives and SD cards. They have poor firmware that stalls when writing and won't listen to any read IO requests while doing the write. thumbdrives and sd cards are ok for 99.9% read-only filesystems that you rarely modify and if you do - you don't care if they stall for a while.
I can replicate this by writing enough of /dev/urandom to a file, then running the madvise06 test:
$ head -c 1G /dev/urandom > file
$ ./runltp -f syscalls -s madvise06
[...]
madvise06.c:104: TINFO: dropping caches
Test timeouted, sending SIGKILL!
Kevin suggested if write speed is an issue, then we can keep the current USB fs/boot sequence unchanged and mount the SSD as temporary storage
exactly my suggestion above :)
for LTP tests. Temporary directory can be changed by passing -d /path/to/storage to runltp script. I think that's a good idea for a general improvement to CI, and should help speed things up (depending on how much disk I/O the tests do - TBC).
absolutely. right way to go IMHO.
For now, and to cover all bases for all users I think doubling the default timeout here is fine. If we see further issues we can look into it again.
now here comes one thing fun and un-morello related but related to testing in general. if i had a write take an unreasonably long amount of time ... i'd consider it a bug. a performance bug. it could be that some i/o queue stalls and just sits idle because of a bug/mistake and happens to flush out 10 seconds later because of some unrelated timer that wakes up and happens to then flush the queue. i think we'd all agree that if you wrote 4k to a disk and it took 24h to finally write that there is a bug to be fixed and the test case found it. :) so somewhere along the "increase timeout" path is a point where you have to decide if that is the right solution.
in this case, i'd just fix the test environment :)
Point taken. It is a band-aid. I'll try update the CI to use SSD as temp dir for LTP tests, so I don't mind dropping this patch from the series.
Thanks, Zach
Thanks, Zach
On 8/31/22 13:42, Zachary Leaf wrote:
Some tests, e.g. madvise06 can timeout on slow systems for some operations:
...
madvise06.c:104: TINFO: dropping caches
Test timeouted, sending SIGKILL!
When running tests with runltp, export LTP_TIMEOUT_MUL to double the DEFAULT_TIMEOUT value, as set in tst_test.c.
Reported-by: Kevin Brodsky kevin.brodsky@arm.com
Signed-off-by: Zachary Leaf zachary.leaf@arm.com
 runltp | 2 ++
 1 file changed, 2 insertions(+)
diff --git a/runltp b/runltp
index 4447da156..32cf39dfb 100755
--- a/runltp
+++ b/runltp
@@ -79,6 +79,8 @@ setup()
 export LTPROOT=${PWD}
 export TMPBASE="/tmp"
 export PATH="${PATH}:${LTPROOT}/testcases/bin:${LTPROOT}/bin"
+# double the DEFAULT_TIMEOUT (tst_test.c) when running tests
+export LTP_TIMEOUT_MUL=2
 export LTP_DEV_FS_TYPE="ext2"
On Tue, Sep 06, 2022 at 03:24:21PM +0100, Zachary Leaf wrote:
On 06/09/2022 12:02, Carsten Haitzler wrote:
On 9/6/22 11:42, Zachary Leaf wrote:
On 05/09/2022 15:00, Carsten Haitzler wrote:
On 9/2/22 17:33, Zachary Leaf wrote:
On 02/09/2022 16:14, Carsten Haitzler wrote:
Do you have some idea why it's taking so long to need a longer timeout?
Not sure exactly in this particular case. The madvise06 test calls sync() and writes a "3" to /proc/sys/vm/drop_caches; if there's a lot of stuff in the cache, that seems to take longer than the default timeout of 30s on the Morello board. Subsequent calls are fast but then there's nothing further to write to storage or drop from cache.
hmm and you're using the sata ssd on the device? or nfsroot write?
Ah - actually using a 64GB USB drive to boot from and as root fs, not using the SSD at all. That would explain slow write speeds.
Oh ... don't do that! USB thumbdrives are mostly atrocious with writes. Reads - they do OK in the most part. If you must use a USB drive - buy a USB SSD - something proper. This is what I've got plugged into my Juno and my raspberry pi's - no way I rely on micro-sd for rootfs of anything heavily use. USB2/3 bandwidth will then be your limitation if you get a proper USB SSD. You won't get these insane stalls that can hang for 10's of seconds or even minutes. :)
Changing the test cases (timeout) to work around what I might say is "poor hardware choice for this kind of workload" I think is probably a bad idea. :)
I actually didn't know they had SSDs, our setup is a board farm so the shared USB drive allows for easy re-imaging.
Yeah. This is why I asked. :) They do have SSD's. /dev/sda should be the built in sata SSD drive. poke around at it, but I might advise in formatting it to something like ext4, make / world writable on that disk, mount it like /mnt/tmp and then store all your test logs/results on this drive somewhere. use it as a scratch drive for this stuff. :)
Yep - /dev/sda is SSD. I've mounted as scratch disk for now but will probably try run everything from there in future. I'm not exactly re-imaging very often, just need to be able to boot into new kernels mostly so that would work for me.
30s does seem like a long time for this (?) but I don't know much about sync() or dropping caches to say. Slow writes to fs? Slow cache access?
In any case I've been finding it hard to replicate. Attempting to fill cache by reading the entire fs to /dev/null but no luck so far: find / -type f -exec cat '{}' >> /dev/null ; Any ideas?
well other than "you're using nfs or remote storage" ... or something very peculiar is going on in the io layer?
Okay, I can replicate it now. It is slow USB write speed that is the issue.
If there are lot of cached writes for some reason (e.g. some LTP tests write data to disk, other activity on the system), then calling sync() can be slow.
Yup. In fact it can be even worse. The kernel flushes the writes that are queued, but reads stall too - for possible many seconds or longer while these writes get flushed it. I've seen this again and again with USB thumbdrives and SD cards. They have poor firmware that stalls when writing and won't listen to any read IO requests while doing the write. thumbdrives and sd cards are ok for 99.9% read-only filesystems that you rarely modify and if you do - you don't care if they stall for a while.
I can replicate this by writing enough of /dev/urandom to a file, then running the madvise06 test:
$ head -c 1G /dev/urandom > file
$ ./runltp -f syscalls -s madvise06
[...]
madvise06.c:104: TINFO: dropping caches
Test timeouted, sending SIGKILL!
Kevin suggested if write speed is an issue, then we can keep the current USB fs/boot sequence unchanged and mount the SSD as temporary storage
exactly my suggestion above :)
for LTP tests. Temporary directory can be changed by passing -d /path/to/storage to runltp script. I think that's a good idea for a general improvement to CI, and should help speed things up (depending on how much disk I/O the tests do - TBC).
absolutely. right way to go IMHO.
For now, and to cover all bases for all users I think doubling the default timeout here is fine. If we see further issues we can look into it again.
now here comes one thing fun and un-morello related but related to testing in general. if i had a write take an unreasonably long amount of time ... i'd consider it a bug. a performance bug. it could be that some i/o queue stalls and just sits idle because of a bug/mistake and happens to flush out 10 seconds later because of some unrelated timer that wakes up and happens to then flush the queue. i think we'd all agree that if you wrote 4k to a disk and it took 24h to finally write that there is a bug to be fixed and the test case found it. :) so somewhere along the "increase timeout" path is a point where you have to decide if that is the right solution.
in this case, i'd just fix the test environment :)
Point taken. It is a band-aid. I'll try update the CI to use SSD as temp dir for LTP tests, so I don't mind dropping this patch from the series.
To be fair, LTP_TIMEOUT_MUL is one of the supported env variables (see: testcases/lib/tst_test.sh), so even if the timeout is to be extended, it can be done by specifying LTP_TIMEOUT_MUL as an env var for runltp, e.g.:
LTP_TIMEOUT_MUL=2 ./runltp -f syscalls -s madvise06
which gives:
Timeout per run is 0h 01m 00s
So there is no need for direct change in runltp itself.
--- BR B.
Thanks, Zach
Thanks, Zach
On 8/31/22 13:42, Zachary Leaf wrote:
Some tests, e.g. madvise06 can timeout on slow systems for some operations:
...
madvise06.c:104: TINFO: dropping caches
Test timeouted, sending SIGKILL!
When running tests with runltp, export LTP_TIMEOUT_MUL to double the DEFAULT_TIMEOUT value, as set in tst_test.c.
Reported-by: Kevin Brodsky kevin.brodsky@arm.com
Signed-off-by: Zachary Leaf zachary.leaf@arm.com
---
 runltp | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/runltp b/runltp
index 4447da156..32cf39dfb 100755
--- a/runltp
+++ b/runltp
@@ -79,6 +79,8 @@ setup()
 export LTPROOT=${PWD}
 export TMPBASE="/tmp"
 export PATH="${PATH}:${LTPROOT}/testcases/bin:${LTPROOT}/bin"
+# double the DEFAULT_TIMEOUT (tst_test.c) when running tests
+export LTP_TIMEOUT_MUL=2
 export LTP_DEV_FS_TYPE="ext2"
On 31/08/2022 14:42, Zachary Leaf wrote:
Hi,
Grouping up what should be some small and fairly uncontroversial patches to the transitional runtest lists and runltp script.
Available as review branch at: https://git.morello-project.org/zdleaf/morello-linux-test-project/-/commits/...
Thanks, Zach
Zachary Leaf (3):
  runtest: add shmat to extended PCuABI syscall list
  runtest: remove mmap1/2/3 tests from PCuABI list
  runltp: increase default timeout
These make sense to me, thanks for putting them together!
Kevin
 runltp                                | 2 ++
 runtest/morello_transitional          | 4 ----
 runtest/morello_transitional_extended | 4 ++++
 3 files changed, 6 insertions(+), 4 deletions(-)
Hi Zachary,
PATCH 1/3: runtest: add shmat to extended PCuABI syscall list has now been applied on next.
PATCH 2/3: runtest: remove mmap1/2/3 tests from PCuABI list has been applied on both: master and next (as this fixes CI issues)
PATCH 3/3: runltp: increase default timeout as discussed, shall be dropped.
Thanks a lot!
--- BR B.
On Wed, Aug 31, 2022 at 01:42:46PM +0100, Zachary Leaf wrote:
Hi,
Grouping up what should be some small and fairly uncontroversial patches to the transitional runtest lists and runltp script.
Available as review branch at: https://git.morello-project.org/zdleaf/morello-linux-test-project/-/commits/...
Thanks, Zach
Zachary Leaf (3):
  runtest: add shmat to extended PCuABI syscall list
  runtest: remove mmap1/2/3 tests from PCuABI list
  runltp: increase default timeout
 runltp                                | 2 ++
 runtest/morello_transitional          | 4 ----
 runtest/morello_transitional_extended | 4 ++++
 3 files changed, 6 insertions(+), 4 deletions(-)
-- 2.25.1