Hi,
Grouping up what should be some small and fairly uncontroversial patches to the transitional runtest lists and runltp script.
Available as review branch at: https://git.morello-project.org/zdleaf/morello-linux-test-project/-/commits/...
Thanks, Zach
Zachary Leaf (3):
  runtest: add shmat to extended PCuABI syscall list
  runtest: remove mmap1/2/3 tests from PCuABI list
  runltp: increase default timeout
 runltp                                | 2 ++
 runtest/morello_transitional          | 4 ----
 runtest/morello_transitional_extended | 4 ++++
 3 files changed, 6 insertions(+), 4 deletions(-)
Signed-off-by: Zachary Leaf zachary.leaf@arm.com
---
 runtest/morello_transitional_extended | 4 ++++
 1 file changed, 4 insertions(+)
diff --git a/runtest/morello_transitional_extended b/runtest/morello_transitional_extended
index a31f0884c..a7699ce5a 100644
--- a/runtest/morello_transitional_extended
+++ b/runtest/morello_transitional_extended
@@ -17,3 +17,7 @@ semget06 semget06
 semop01 semop01
 semop02 semop02
 semop03 semop03
+
+shmat01 shmat01
+shmat02 shmat02
+shmat03 shmat03
On 8/31/22 13:42, Zachary Leaf wrote:
Signed-off-by: Zachary Leaf zachary.leaf@arm.com
 runtest/morello_transitional_extended | 4 ++++
 1 file changed, 4 insertions(+)
diff --git a/runtest/morello_transitional_extended b/runtest/morello_transitional_extended
index a31f0884c..a7699ce5a 100644
--- a/runtest/morello_transitional_extended
+++ b/runtest/morello_transitional_extended
@@ -17,3 +17,7 @@ semget06 semget06
 semop01 semop01
 semop02 semop02
 semop03 semop03
+
+shmat01 shmat01
+shmat02 shmat02
+shmat03 shmat03
+1
Commit 35cda2b9c8b9 (build.sh: Allow specifying make targets) enabled building only specific test suites, e.g. only syscall tests.
For testing PCuABI, only the kernel/syscall set of tests is targeted.
Since the mmap1/mmap2/mmap3 tests are part of the mm (kernel/mem) tests, remove them from the morello_transitional list as they will no longer be built by default when testing PCuABI.
Reported-by: Kevin Brodsky kevin.brodsky@arm.com
Signed-off-by: Zachary Leaf zachary.leaf@arm.com
---
 runtest/morello_transitional | 4 ----
 1 file changed, 4 deletions(-)
diff --git a/runtest/morello_transitional b/runtest/morello_transitional
index ddef45787..568d00ca1 100644
--- a/runtest/morello_transitional
+++ b/runtest/morello_transitional
@@ -193,10 +193,6 @@ madvise10 madvise10
 
 mmap001 mmap001 -m 1
 
-mmap1 mmap1
-mmap2 mmap2 -a -p
-mmap3 mmap3 -p
-
 mmap01 mmap01
 mmap02 mmap02
 mmap03 mmap03
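As context for the suite split: with make targets supported in build.sh, a syscall-only build boils down to building just the kernel/syscalls subtree. A minimal sketch, assuming the standard top-level LTP configure step has already been run (the exact build.sh arguments may differ):

$ make autotools && ./configure        # standard LTP setup, if not already done
$ make -C testcases/kernel/syscalls    # build only the syscall tests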
On 8/31/22 13:42, Zachary Leaf wrote:
Commit 35cda2b9c8b9 (build.sh: Allow specifying make targets) enabled building only specific test suites, e.g. only syscall tests.
For testing PCuABI, only the kernel/syscall set of tests is targeted.
Since the mmap1/mmap2/mmap3 tests are part of the mm (kernel/mem) tests, remove them from the morello_transitional list as they will no longer be built by default when testing PCuABI.
Reported-by: Kevin Brodsky kevin.brodsky@arm.com
Signed-off-by: Zachary Leaf zachary.leaf@arm.com
 runtest/morello_transitional | 4 ----
 1 file changed, 4 deletions(-)
diff --git a/runtest/morello_transitional b/runtest/morello_transitional
index ddef45787..568d00ca1 100644
--- a/runtest/morello_transitional
+++ b/runtest/morello_transitional
@@ -193,10 +193,6 @@ madvise10 madvise10
 
 mmap001 mmap001 -m 1
 
-mmap1 mmap1
-mmap2 mmap2 -a -p
-mmap3 mmap3 -p
-
 mmap01 mmap01
 mmap02 mmap02
 mmap03 mmap03
+1
Now applied on both: master & next. Thanks!
---
BR
B.

On Wed, Aug 31, 2022 at 01:42:48PM +0100, Zachary Leaf wrote:
Commit 35cda2b9c8b9 (build.sh: Allow specifying make targets) enabled building only specific test suites, e.g. only syscall tests.
For testing PCuABI, only the kernel/syscall set of tests is targeted.
Since the mmap1/mmap2/mmap3 tests are part of the mm (kernel/mem) tests, remove them from the morello_transitional list as they will no longer be built by default when testing PCuABI.
Reported-by: Kevin Brodsky kevin.brodsky@arm.com
Signed-off-by: Zachary Leaf zachary.leaf@arm.com
 runtest/morello_transitional | 4 ----
 1 file changed, 4 deletions(-)
diff --git a/runtest/morello_transitional b/runtest/morello_transitional
index ddef45787..568d00ca1 100644
--- a/runtest/morello_transitional
+++ b/runtest/morello_transitional
@@ -193,10 +193,6 @@ madvise10 madvise10
 
 mmap001 mmap001 -m 1
 
-mmap1 mmap1
-mmap2 mmap2 -a -p
-mmap3 mmap3 -p
-
 mmap01 mmap01
 mmap02 mmap02
 mmap03 mmap03
--
2.25.1
Some tests, e.g. madvise06 can timeout on slow systems for some operations:
...
madvise06.c:104: TINFO: dropping caches
Test timeouted, sending SIGKILL!
When running tests with runltp, export LTP_TIMEOUT_MUL to double the DEFAULT_TIMEOUT value, as set in tst_test.c.
Reported-by: Kevin Brodsky kevin.brodsky@arm.com
Signed-off-by: Zachary Leaf zachary.leaf@arm.com
---
 runltp | 2 ++
 1 file changed, 2 insertions(+)
diff --git a/runltp b/runltp
index 4447da156..32cf39dfb 100755
--- a/runltp
+++ b/runltp
@@ -79,6 +79,8 @@ setup()
 export LTPROOT=${PWD}
 export TMPBASE="/tmp"
 export PATH="${PATH}:${LTPROOT}/testcases/bin:${LTPROOT}/bin"
+# double the DEFAULT_TIMEOUT (tst_test.c) when running tests
+export LTP_TIMEOUT_MUL=2
 export LTP_DEV_FS_TYPE="ext2"
Do you have some idea why it's taking so long to need a longer timeout?
On 8/31/22 13:42, Zachary Leaf wrote:
Some tests, e.g. madvise06 can timeout on slow systems for some operations:
...
madvise06.c:104: TINFO: dropping caches
Test timeouted, sending SIGKILL!
When running tests with runltp, export LTP_TIMEOUT_MUL to double the DEFAULT_TIMEOUT value, as set in tst_test.c.
Reported-by: Kevin Brodsky kevin.brodsky@arm.com
Signed-off-by: Zachary Leaf zachary.leaf@arm.com
 runltp | 2 ++
 1 file changed, 2 insertions(+)
diff --git a/runltp b/runltp
index 4447da156..32cf39dfb 100755
--- a/runltp
+++ b/runltp
@@ -79,6 +79,8 @@ setup()
 export LTPROOT=${PWD}
 export TMPBASE="/tmp"
 export PATH="${PATH}:${LTPROOT}/testcases/bin:${LTPROOT}/bin"
+# double the DEFAULT_TIMEOUT (tst_test.c) when running tests
+export LTP_TIMEOUT_MUL=2
 export LTP_DEV_FS_TYPE="ext2"
On 02/09/2022 16:14, Carsten Haitzler wrote:
Do you have some idea why it's taking so long to need a longer timeout?
Not sure exactly in this particular case. The madvise06 test calls sync() and writes a "3" to /proc/sys/vm/drop_caches; if there's a lot of stuff in the cache, that seems to take longer than the default timeout of 30s on the Morello board. Subsequent calls are fast but then there's nothing further to write to storage or drop from cache.
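For reference, the step that stalls boils down to the usual cache-drop sequence; from a shell (as root) it is roughly the following, a sketch of the generic mechanism rather than the exact madvise06 code, which goes through the LTP helpers:

$ sync                                 # flush dirty pages to storage first; a slow disk stalls here
$ echo 3 > /proc/sys/vm/drop_caches    # then ask the kernel to drop page cache, dentries and inodes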
30s does seem like a long time for this (?) but I don't know much about
sync() or dropping caches to say. Slow writes to fs? Slow cache access?
In any case I've been finding it hard to replicate. Attempting to fill the cache by reading the entire fs to /dev/null, but no luck so far:
$ find / -type f -exec cat '{}' >> /dev/null \;
Any ideas?
Thanks, Zach
On 8/31/22 13:42, Zachary Leaf wrote:
Some tests, e.g. madvise06 can timeout on slow systems for some operations:
...
madvise06.c:104: TINFO: dropping caches
Test timeouted, sending SIGKILL!
When running tests with runltp, export LTP_TIMEOUT_MUL to double the DEFAULT_TIMEOUT value, as set in tst_test.c.
Reported-by: Kevin Brodsky kevin.brodsky@arm.com
Signed-off-by: Zachary Leaf zachary.leaf@arm.com
 runltp | 2 ++
 1 file changed, 2 insertions(+)
diff --git a/runltp b/runltp
index 4447da156..32cf39dfb 100755
--- a/runltp
+++ b/runltp
@@ -79,6 +79,8 @@ setup()
 export LTPROOT=${PWD}
 export TMPBASE="/tmp"
 export PATH="${PATH}:${LTPROOT}/testcases/bin:${LTPROOT}/bin"
+# double the DEFAULT_TIMEOUT (tst_test.c) when running tests
+export LTP_TIMEOUT_MUL=2
 export LTP_DEV_FS_TYPE="ext2"
On 9/2/22 17:33, Zachary Leaf wrote:
On 02/09/2022 16:14, Carsten Haitzler wrote:
Do you have some idea why it's taking so long to need a longer timeout?
Not sure exactly in this particular case. The madvise06 test calls sync() and writes a "3" to /proc/sys/vm/drop_caches; if there's a lot of stuff in the cache, that seems to take longer than the default timeout of 30s on the Morello board. Subsequent calls are fast but then there's nothing further to write to storage or drop from cache.
hmm and you're using the sata ssd on the device? or nfsroot write?
30s does seem like a long time for this (?) but I don't know much about
sync() or dropping caches to say. Slow writes to fs? Slow cache access?
In any case I've been finding it hard to replicate. Attempting to fill cache by reading the entire fs to /dev/null but no luck so far: find / -type f -exec cat '{}' >> /dev/null ;
Any ideas?
well other than "you're using nfs or remote storage" ... or something very peculiar is going on in the io layer?
Thanks, Zach
On 8/31/22 13:42, Zachary Leaf wrote:
Some tests, e.g. madvise06 can timeout on slow systems for some operations:
...
madvise06.c:104: TINFO: dropping caches
Test timeouted, sending SIGKILL!
When running tests with runltp, export LTP_TIMEOUT_MUL to double the DEFAULT_TIMEOUT value, as set in tst_test.c.
Reported-by: Kevin Brodsky kevin.brodsky@arm.com
Signed-off-by: Zachary Leaf zachary.leaf@arm.com
 runltp | 2 ++
 1 file changed, 2 insertions(+)
diff --git a/runltp b/runltp
index 4447da156..32cf39dfb 100755
--- a/runltp
+++ b/runltp
@@ -79,6 +79,8 @@ setup()
 export LTPROOT=${PWD}
 export TMPBASE="/tmp"
 export PATH="${PATH}:${LTPROOT}/testcases/bin:${LTPROOT}/bin"
+# double the DEFAULT_TIMEOUT (tst_test.c) when running tests
+export LTP_TIMEOUT_MUL=2
 export LTP_DEV_FS_TYPE="ext2"
On 05/09/2022 15:00, Carsten Haitzler wrote:
On 9/2/22 17:33, Zachary Leaf wrote:
On 02/09/2022 16:14, Carsten Haitzler wrote:
Do you have some idea why it's taking so long to need a longer timeout?
Not sure exactly in this particular case. The madvise06 test calls sync() and writes a "3" to /proc/sys/vm/drop_caches; if there's a lot of stuff in the cache, that seems to take longer than the default timeout of 30s on the Morello board. Subsequent calls are fast but then there's nothing further to write to storage or drop from cache.
hmm and you're using the sata ssd on the device? or nfsroot write?
Ah - actually using a 64GB USB drive to boot from and as root fs, not using the SSD at all. That would explain slow write speeds.
I actually didn't know they had SSDs, our setup is a board farm so the shared USB drive allows for easy re-imaging.
30s does seem like a long time for this (?) but I don't know much about sync() or dropping caches to say. Slow writes to fs? Slow cache access?
In any case I've been finding it hard to replicate. Attempting to fill cache by reading the entire fs to /dev/null but no luck so far: find / -type f -exec cat '{}' >> /dev/null ; Any ideas?
well other than "you're using nfs or remote storage" ... or something very peculiar is going on in the io layer?
Okay, I can replicate it now. It is slow USB write speed that is the issue.
If there are a lot of cached writes for some reason (e.g. some LTP tests write data to disk, other activity on the system), then calling sync() can be slow.
I can replicate this by writing enough of /dev/urandom to a file, then running the madvise06 test:
$ head -c 1G /dev/urandom > file
$ ./runltp -f syscalls -s madvise06
[...]
madvise06.c:104: TINFO: dropping caches
Test timeouted, sending SIGKILL!
Kevin suggested if write speed is an issue, then we can keep the current USB fs/boot sequence unchanged and mount the SSD as temporary storage for LTP tests. Temporary directory can be changed by passing -d /path/to/storage to runltp script. I think that's a good idea for a general improvement to CI, and should help speed things up (depending on how much disk I/O the tests do - TBC).
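As a rough example, assuming the SSD ends up mounted somewhere like /mnt/ssd (the path is only a placeholder), that would look something like:

$ mkdir -p /mnt/ssd/ltp-tmp
$ ./runltp -f syscalls -d /mnt/ssd/ltp-tmp    # use the SSD for the tests' temporary files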
For now, and to cover all bases for all users I think doubling the default timeout here is fine. If we see further issues we can look into it again.
Thanks, Zach
Thanks, Zach
On 8/31/22 13:42, Zachary Leaf wrote:
Some tests, e.g. madvise06 can timeout on slow systems for some operations:
...
madvise06.c:104: TINFO: dropping caches
Test timeouted, sending SIGKILL!
When running tests with runltp, export LTP_TIMEOUT_MUL to double the DEFAULT_TIMEOUT value, as set in tst_test.c.
Reported-by: Kevin Brodsky kevin.brodsky@arm.com
Signed-off-by: Zachary Leaf zachary.leaf@arm.com
 runltp | 2 ++
 1 file changed, 2 insertions(+)
diff --git a/runltp b/runltp
index 4447da156..32cf39dfb 100755
--- a/runltp
+++ b/runltp
@@ -79,6 +79,8 @@ setup()
 export LTPROOT=${PWD}
 export TMPBASE="/tmp"
 export PATH="${PATH}:${LTPROOT}/testcases/bin:${LTPROOT}/bin"
+# double the DEFAULT_TIMEOUT (tst_test.c) when running tests
+export LTP_TIMEOUT_MUL=2
 export LTP_DEV_FS_TYPE="ext2"
On 9/6/22 11:42, Zachary Leaf wrote:
On 05/09/2022 15:00, Carsten Haitzler wrote:
On 9/2/22 17:33, Zachary Leaf wrote:
On 02/09/2022 16:14, Carsten Haitzler wrote:
Do you have some idea why it's taking so long to need a longer timeout?
Not sure exactly in this particular case. The madvise06 test calls sync() and writes a "3" to /proc/sys/vm/drop_caches; if there's a lot of stuff in the cache, that seems to take longer than the default timeout of 30s on the Morello board. Subsequent calls are fast but then there's nothing further to write to storage or drop from cache.
hmm and you're using the sata ssd on the device? or nfsroot write?
Ah - actually using a 64GB USB drive to boot from and as root fs, not using the SSD at all. That would explain slow write speeds.
Oh ... don't do that! USB thumbdrives are mostly atrocious with writes. Reads - they do OK in the most part. If you must use a USB drive - buy a USB SSD - something proper. This is what I've got plugged into my Juno and my raspberry pi's - no way I rely on micro-sd for rootfs of anything heavily use. USB2/3 bandwidth will then be your limitation if you get a proper USB SSD. You won't get these insane stalls that can hang for 10's of seconds or even minutes. :)
Changing the test cases (timeout) to work around what I might say is "poor hardware choice for this kind of workload" I think is probably a bad idea. :)
I actually didn't know they had SSDs, our setup is a board farm so the shared USB drive allows for easy re-imaging.
Yeah. This is why I asked. :) They do have SSD's. /dev/sda should be the built in sata SSD drive. poke around at it, but I might advise in formatting it to something like ext4, make / world writable on that disk, mount it like /mnt/tmp and then store all your test logs/results on this drive somewhere. use it as a scratch drive for this stuff. :)
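A minimal sketch of that setup, assuming the SSD is used as the bare /dev/sda device with no partition table (adjust the device/partition and mount point to the actual layout):

$ mkfs.ext4 /dev/sda      # formats the whole disk (destroys any existing data)
$ mkdir -p /mnt/tmp
$ mount /dev/sda /mnt/tmp
$ chmod 1777 /mnt/tmp     # world-writable scratch area for logs/results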
30s does seem like a long time for this (?) but I don't know much about sync() or dropping caches to say. Slow writes to fs? Slow cache access?
In any case I've been finding it hard to replicate. Attempting to fill cache by reading the entire fs to /dev/null but no luck so far: find / -type f -exec cat '{}' >> /dev/null ; Any ideas?
well other than "you're using nfs or remote storage" ... or something very peculiar is going on in the io layer?
Okay, I can replicate it now. It is slow USB write speed that is the issue.
If there are lot of cached writes for some reason (e.g. some LTP tests write data to disk, other activity on the system), then calling sync() can be slow.
Yup. In fact it can be even worse. The kernel flushes the writes that are queued, but reads stall too - for possible many seconds or longer while these writes get flushed it. I've seen this again and again with USB thumbdrives and SD cards. They have poor firmware that stalls when writing and won't listen to any read IO requests while doing the write. thumbdrives and sd cards are ok for 99.9% read-only filesystems that you rarely modify and if you do - you don't care if they stall for a while.
I can replicate this by writing enough of /dev/urandom to a file, then running the madvise06 test:
$ head -c 1G /dev/urandom > file
$ ./runltp -f syscalls -s madvise06
[...]
madvise06.c:104: TINFO: dropping caches
Test timeouted, sending SIGKILL!
Kevin suggested if write speed is an issue, then we can keep the current USB fs/boot sequence unchanged and mount the SSD as temporary storage
exactly my suggestion above :)
for LTP tests. Temporary directory can be changed by passing -d /path/to/storage to runltp script. I think that's a good idea for a general improvement to CI, and should help speed things up (depending on how much disk I/O the tests do - TBC).
absolutely. right way to go IMHO.
For now, and to cover all bases for all users I think doubling the default timeout here is fine. If we see further issues we can look into it again.
now here comes one thing fun and un-morello related but related to testing in general. if i had a write take an unreasonably long amount of time ... i'd consider it a bug. a performance bug. it could be that some i/o queue stalls and just sits idle because of a bug/mistake and happens to flush out 10 seconds later because of some unrelated timer that wakes up and happens to then flush the queue. i think we'd all agree that if you wrote 4k to a disk and it took 24h to finally write that there is a bug to be fixed and the test case found it. :) so somewhere along the "increase timeout" path is a point where you have to decide if that is the right solution.
in this case, i'd just fix the test environment :)
Thanks, Zach
Thanks, Zach
On 8/31/22 13:42, Zachary Leaf wrote:
Some tests, e.g. madvise06 can timeout on slow systems for some operations:
...
madvise06.c:104: TINFO: dropping caches
Test timeouted, sending SIGKILL!
When running tests with runltp, export LTP_TIMEOUT_MUL to double the DEFAULT_TIMEOUT value, as set in tst_test.c.
Reported-by: Kevin Brodsky kevin.brodsky@arm.com
Signed-off-by: Zachary Leaf zachary.leaf@arm.com
 runltp | 2 ++
 1 file changed, 2 insertions(+)
diff --git a/runltp b/runltp
index 4447da156..32cf39dfb 100755
--- a/runltp
+++ b/runltp
@@ -79,6 +79,8 @@ setup()
 export LTPROOT=${PWD}
 export TMPBASE="/tmp"
 export PATH="${PATH}:${LTPROOT}/testcases/bin:${LTPROOT}/bin"
+# double the DEFAULT_TIMEOUT (tst_test.c) when running tests
+export LTP_TIMEOUT_MUL=2
 export LTP_DEV_FS_TYPE="ext2"
On 06/09/2022 12:02, Carsten Haitzler wrote:
On 9/6/22 11:42, Zachary Leaf wrote:
On 05/09/2022 15:00, Carsten Haitzler wrote:
On 9/2/22 17:33, Zachary Leaf wrote:
On 02/09/2022 16:14, Carsten Haitzler wrote:
Do you have some idea why it's taking so long to need a longer timeout?
Not sure exactly in this particular case. The madvise06 test calls sync() and writes a "3" to /proc/sys/vm/drop_caches; if there's a lot of stuff in the cache, that seems to take longer than the default timeout of 30s on the Morello board. Subsequent calls are fast but then there's nothing further to write to storage or drop from cache.
hmm and you're using the sata ssd on the device? or nfsroot write?
Ah - actually using a 64GB USB drive to boot from and as root fs, not using the SSD at all. That would explain slow write speeds.
Oh ... don't do that! USB thumbdrives are mostly atrocious with writes. Reads - they do OK in the most part. If you must use a USB drive - buy a USB SSD - something proper. This is what I've got plugged into my Juno and my raspberry pi's - no way I rely on micro-sd for rootfs of anything heavily use. USB2/3 bandwidth will then be your limitation if you get a proper USB SSD. You won't get these insane stalls that can hang for 10's of seconds or even minutes. :)
Changing the test cases (timeout) to work around what I might say is "poor hardware choice for this kind of workload" I think is probably a bad idea. :)
I actually didn't know they had SSDs, our setup is a board farm so the shared USB drive allows for easy re-imaging.
Yeah. This is why I asked. :) They do have SSD's. /dev/sda should be the built in sata SSD drive. poke around at it, but I might advise in formatting it to something like ext4, make / world writable on that disk, mount it like /mnt/tmp and then store all your test logs/results on this drive somewhere. use it as a scratch drive for this stuff. :)
Yep - /dev/sda is SSD. I've mounted as scratch disk for now but will probably try run everything from there in future. I'm not exactly re-imaging very often, just need to be able to boot into new kernels mostly so that would work for me.
30s does seem like a long time for this (?) but I don't know much about sync() or dropping caches to say. Slow writes to fs? Slow cache access?
In any case I've been finding it hard to replicate. Attempting to fill cache by reading the entire fs to /dev/null but no luck so far: find / -type f -exec cat '{}' >> /dev/null ; Any ideas?
well other than "you're using nfs or remote storage" ... or something very peculiar is going on in the io layer?
Okay, I can replicate it now. It is slow USB write speed that is the issue.
If there are lot of cached writes for some reason (e.g. some LTP tests write data to disk, other activity on the system), then calling sync() can be slow.
Yup. In fact it can be even worse. The kernel flushes the writes that are queued, but reads stall too - for possible many seconds or longer while these writes get flushed it. I've seen this again and again with USB thumbdrives and SD cards. They have poor firmware that stalls when writing and won't listen to any read IO requests while doing the write. thumbdrives and sd cards are ok for 99.9% read-only filesystems that you rarely modify and if you do - you don't care if they stall for a while.
I can replicate this by writing enough of /dev/urandom to a file, then running the madvise06 test:
$ head -c 1G /dev/urandom > file
$ ./runltp -f syscalls -s madvise06
[...]
madvise06.c:104: TINFO: dropping caches
Test timeouted, sending SIGKILL!
Kevin suggested if write speed is an issue, then we can keep the current USB fs/boot sequence unchanged and mount the SSD as temporary storage
exactly my suggestion above :)
for LTP tests. Temporary directory can be changed by passing -d /path/to/storage to runltp script. I think that's a good idea for a general improvement to CI, and should help speed things up (depending on how much disk I/O the tests do - TBC).
absolutely. right way to go IMHO.
For now, and to cover all bases for all users I think doubling the default timeout here is fine. If we see further issues we can look into it again.
now here comes one thing fun and un-morello related but related to testing in general. if i had a write take an unreasonably long amount of time ... i'd consider it a bug. a performance bug. it could be that some i/o queue stalls and just sits idle because of a bug/mistake and happens to flush out 10 seconds later because of some unrelated timer that wakes up and happens to then flush the queue. i think we'd all agree that if you wrote 4k to a disk and it took 24h to finally write that there is a bug to be fixed and the test case found it. :) so somewhere along the "increase timeout" path is a point where you have to decide if that is the right solution.
in this case, i'd just fix the test environment :)
Point taken. It is a band-aid. I'll try update the CI to use SSD as temp dir for LTP tests, so I don't mind dropping this patch from the series.
Thanks, Zach
Thanks, Zach
On 8/31/22 13:42, Zachary Leaf wrote:
Some tests, e.g. madvise06 can timeout on slow systems for some operations:
...
madvise06.c:104: TINFO: dropping caches
Test timeouted, sending SIGKILL!
When running tests with runltp, export LTP_TIMEOUT_MUL to double the DEFAULT_TIMEOUT value, as set in tst_test.c.
Reported-by: Kevin Brodsky kevin.brodsky@arm.com
Signed-off-by: Zachary Leaf zachary.leaf@arm.com
 runltp | 2 ++
 1 file changed, 2 insertions(+)
diff --git a/runltp b/runltp
index 4447da156..32cf39dfb 100755
--- a/runltp
+++ b/runltp
@@ -79,6 +79,8 @@ setup()
 export LTPROOT=${PWD}
 export TMPBASE="/tmp"
 export PATH="${PATH}:${LTPROOT}/testcases/bin:${LTPROOT}/bin"
+# double the DEFAULT_TIMEOUT (tst_test.c) when running tests
+export LTP_TIMEOUT_MUL=2
 export LTP_DEV_FS_TYPE="ext2"
On Tue, Sep 06, 2022 at 03:24:21PM +0100, Zachary Leaf wrote:
On 06/09/2022 12:02, Carsten Haitzler wrote:
On 9/6/22 11:42, Zachary Leaf wrote:
On 05/09/2022 15:00, Carsten Haitzler wrote:
On 9/2/22 17:33, Zachary Leaf wrote:
On 02/09/2022 16:14, Carsten Haitzler wrote:
Do you have some idea why it's taking so long to need a longer timeout?
Not sure exactly in this particular case. The madvise06 test calls sync() and writes a "3" to /proc/sys/vm/drop_caches; if there's a lot of stuff in the cache, that seems to take longer than the default timeout of 30s on the Morello board. Subsequent calls are fast but then there's nothing further to write to storage or drop from cache.
hmm and you're using the sata ssd on the device? or nfsroot write?
Ah - actually using a 64GB USB drive to boot from and as root fs, not using the SSD at all. That would explain slow write speeds.
Oh ... don't do that! USB thumbdrives are mostly atrocious with writes. Reads - they do OK in the most part. If you must use a USB drive - buy a USB SSD - something proper. This is what I've got plugged into my Juno and my raspberry pi's - no way I rely on micro-sd for rootfs of anything heavily use. USB2/3 bandwidth will then be your limitation if you get a proper USB SSD. You won't get these insane stalls that can hang for 10's of seconds or even minutes. :)
Changing the test cases (timeout) to work around what I might say is "poor hardware choice for this kind of workload" I think is probably a bad idea. :)
I actually didn't know they had SSDs, our setup is a board farm so the shared USB drive allows for easy re-imaging.
Yeah. This is why I asked. :) They do have SSD's. /dev/sda should be the built in sata SSD drive. poke around at it, but I might advise in formatting it to something like ext4, make / world writable on that disk, mount it like /mnt/tmp and then store all your test logs/results on this drive somewhere. use it as a scratch drive for this stuff. :)
Yep - /dev/sda is SSD. I've mounted as scratch disk for now but will probably try run everything from there in future. I'm not exactly re-imaging very often, just need to be able to boot into new kernels mostly so that would work for me.
30s does seem like a long time for this (?) but I don't know much about sync() or dropping caches to say. Slow writes to fs? Slow cache access?
In any case I've been finding it hard to replicate. Attempting to fill cache by reading the entire fs to /dev/null but no luck so far: find / -type f -exec cat '{}' >> /dev/null ; Any ideas?
well other than "you're using nfs or remote storage" ... or something very peculiar is going on in the io layer?
Okay, I can replicate it now. It is slow USB write speed that is the issue.
If there are lot of cached writes for some reason (e.g. some LTP tests write data to disk, other activity on the system), then calling sync() can be slow.
Yup. In fact it can be even worse. The kernel flushes the writes that are queued, but reads stall too - for possible many seconds or longer while these writes get flushed it. I've seen this again and again with USB thumbdrives and SD cards. They have poor firmware that stalls when writing and won't listen to any read IO requests while doing the write. thumbdrives and sd cards are ok for 99.9% read-only filesystems that you rarely modify and if you do - you don't care if they stall for a while.
I can replicate this by writing enough of /dev/urandom to a file, then running the madvise06 test:
$ head -c 1G /dev/urandom > file
$ ./runltp -f syscalls -s madvise06
[...]
madvise06.c:104: TINFO: dropping caches
Test timeouted, sending SIGKILL!
Kevin suggested if write speed is an issue, then we can keep the current USB fs/boot sequence unchanged and mount the SSD as temporary storage
exactly my suggestion above :)
for LTP tests. Temporary directory can be changed by passing -d /path/to/storage to runltp script. I think that's a good idea for a general improvement to CI, and should help speed things up (depending on how much disk I/O the tests do - TBC).
absolutely. right way to go IMHO.
For now, and to cover all bases for all users I think doubling the default timeout here is fine. If we see further issues we can look into it again.
now here comes one thing fun and un-morello related but related to testing in general. if i had a write take an unreasonably long amount of time ... i'd consider it a bug. a performance bug. it could be that some i/o queue stalls and just sits idle because of a bug/mistake and happens to flush out 10 seconds later because of some unrelated timer that wakes up and happens to then flush the queue. i think we'd all agree that if you wrote 4k to a disk and it took 24h to finally write that there is a bug to be fixed and the test case found it. :) so somewhere along the "increase timeout" path is a point where you have to decide if that is the right solution.
in this case, i'd just fix the test environment :)
Point taken. It is a band-aid. I'll try update the CI to use SSD as temp dir for LTP tests, so I don't mind dropping this patch from the series.
To be fair, LTP_TIMEOUT_MUL is one of the supported env variables (see: testcases/lib/tst_test.sh), so even if the timeout is to be extended, it can be done by specifying LTP_TIMEOUT_MUL as an env var for runltp, e.g.:
LTP_TIMEOUT_MUL=2 ./runltp -f syscalls -s madvise06
which gives:
Timeout per run is 0h 01m 00s
So there is no need for direct change in runltp itself.
--- BR B.
Thanks, Zach
Thanks, Zach
On 8/31/22 13:42, Zachary Leaf wrote:
Some tests, e.g. madvise06 can timeout on slow systems for some operations:
...
madvise06.c:104: TINFO: dropping caches
Test timeouted, sending SIGKILL!
When running tests with runltp, export LTP_TIMEOUT_MUL to double the DEFAULT_TIMEOUT value, as set in tst_test.c.
Reported-by: Kevin Brodsky kevin.brodsky@arm.com
Signed-off-by: Zachary Leaf zachary.leaf@arm.com
---
 runltp | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/runltp b/runltp
index 4447da156..32cf39dfb 100755
--- a/runltp
+++ b/runltp
@@ -79,6 +79,8 @@ setup()
 export LTPROOT=${PWD}
 export TMPBASE="/tmp"
 export PATH="${PATH}:${LTPROOT}/testcases/bin:${LTPROOT}/bin"
+# double the DEFAULT_TIMEOUT (tst_test.c) when running tests
+export LTP_TIMEOUT_MUL=2
 export LTP_DEV_FS_TYPE="ext2"
On 31/08/2022 14:42, Zachary Leaf wrote:
Hi,
Grouping up what should be some small and fairly uncontroversial patches to the transitional runtest lists and runltp script.
Available as review branch at: https://git.morello-project.org/zdleaf/morello-linux-test-project/-/commits/...
Thanks, Zach
Zachary Leaf (3):
  runtest: add shmat to extended PCuABI syscall list
  runtest: remove mmap1/2/3 tests from PCuABI list
  runltp: increase default timeout
These make sense to me, thanks for putting them together!
Kevin
 runltp                                | 2 ++
 runtest/morello_transitional          | 4 ----
 runtest/morello_transitional_extended | 4 ++++
 3 files changed, 6 insertions(+), 4 deletions(-)
Hi Zachary,
PATCH 1/3: runtest: add shmat to extended PCuABI syscall list has now been applied on next.
PATCH 2/3: runtest: remove mmap1/2/3 tests from PCuABI list has been applied on both: master and next (as this fixes CI issues)
PATCH 3/3: runltp: increase default timeout as discussed, shall be dropped.
Thanks a lot!
--- BR B.
On Wed, Aug 31, 2022 at 01:42:46PM +0100, Zachary Leaf wrote:
Hi,
Grouping up what should be some small and fairly uncontroversial patches to the transitional runtest lists and runltp script.
Available as review branch at: https://git.morello-project.org/zdleaf/morello-linux-test-project/-/commits/...
Thanks, Zach
Zachary Leaf (3):
  runtest: add shmat to extended PCuABI syscall list
  runtest: remove mmap1/2/3 tests from PCuABI list
  runltp: increase default timeout
 runltp                                | 2 ++
 runtest/morello_transitional          | 4 ----
 runtest/morello_transitional_extended | 4 ++++
 3 files changed, 6 insertions(+), 4 deletions(-)
-- 2.25.1