On 05/09/2022 15:00, Carsten Haitzler wrote:
On 9/2/22 17:33, Zachary Leaf wrote:
On 02/09/2022 16:14, Carsten Haitzler wrote:
Do you have some idea why it's taking so long to need a longer timeout?
Not sure exactly in this particular case. The madvise06 test calls sync() and writes a "3" to /proc/sys/vm/drop_caches; if there's a lot of stuff in the cache, that seems to take longer than the default timeout of 30s on the Morello board. Subsequent calls are fast but then there's nothing further to write to storage or drop from cache.
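To see which of the two steps is the slow one, something like the following could be run on the board. This is only a rough sketch: `elapsed` is a made-up helper name, the resolution is whole seconds, and the drop_caches write is skipped unless the file is writable (normally root only).

```shell
#!/bin/sh
# Hypothetical helper: print how many whole seconds a command takes.
elapsed() {
    start=$(date +%s)
    "$@" >/dev/null          # discard the command's own stdout
    end=$(date +%s)
    echo $((end - start))
}

# Flush dirty pages to storage, as the test does before dropping caches.
echo "sync took $(elapsed sync)s"

# Writing "3" to drop_caches needs root, so skip it when not writable.
if [ -w /proc/sys/vm/drop_caches ]; then
    echo "drop_caches took $(elapsed sh -c 'echo 3 > /proc/sys/vm/drop_caches')s"
else
    echo "drop_caches: skipped (needs root)"
fi
```

If sync accounts for most of the time, that points at write-back to storage rather than at the cache drop itself.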
hmm and you're using the sata ssd on the device? or nfsroot write?
Ah - actually using a 64GB USB drive to boot from and as root fs, not using the SSD at all. That would explain slow write speeds.
I actually didn't know they had SSDs; our setup is a board farm, so the shared USB drive allows for easy re-imaging.
30s does seem like a long time for this (?) but I don't know enough about sync() or dropping caches to say. Slow writes to fs? Slow cache access?
In any case I've been finding it hard to replicate. I've tried to fill the cache by reading the entire fs to /dev/null, but no luck so far:

$ find / -type f -exec cat '{}' >> /dev/null \;

Any ideas?
well other than "you're using nfs or remote storage" ... or something very peculiar is going on in the io layer?
Okay, I can replicate it now. It is slow USB write speed that is the issue.
If there are a lot of cached writes for some reason (e.g. some LTP tests write data to disk, other activity on the system), then calling sync() can be slow.
I can replicate this by writing enough of /dev/urandom to a file then running madvise06 test:

$ head -c 1G /dev/urandom > file
$ ./runltp -f syscalls -s madvise06
[...]
madvise06.c:104: TINFO: dropping caches
Test timeouted, sending SIGKILL!
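To confirm that pending writes are what makes sync() slow, the Dirty counter in /proc/meminfo can be checked before and after. A minimal sketch, assuming a Linux /proc; `dirty_kb` is a made-up helper name and takes an optional file argument only so it can be pointed at a saved copy:

```shell
#!/bin/sh
# Print the amount of dirty (not yet written back) page cache in kB.
# $1: meminfo-style file to read, defaults to /proc/meminfo.
dirty_kb() {
    awk '/^Dirty:/ { print $2 }' "${1:-/proc/meminfo}"
}

echo "Dirty before sync: $(dirty_kb) kB"
sync
echo "Dirty after sync:  $(dirty_kb) kB"
```

Running this right after the head command above should show a large Dirty value that falls once sync returns.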
Kevin suggested that if write speed is an issue, we can keep the current USB fs/boot sequence unchanged and mount the SSD as temporary storage for LTP tests. The temporary directory can be changed by passing -d /path/to/storage to the runltp script. I think that's a good idea for a general improvement to CI, and should help speed things up (depending on how much disk I/O the tests do - TBC).
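A rough sketch of how a CI wrapper might pick the temporary directory. The mount point /mnt/ssd/ltp-tmp and the helper name `ltp_cmd` are assumptions for illustration, not part of our setup; the script only prints the command rather than running it.

```shell
#!/bin/sh
# Build the runltp command line, adding -d only when the directory is usable.
# /mnt/ssd/ltp-tmp is a hypothetical mount point, not something from this thread.
ltp_cmd() {
    tmpdir=${1:-/mnt/ssd/ltp-tmp}
    if [ -d "$tmpdir" ] && [ -w "$tmpdir" ]; then
        echo "./runltp -d $tmpdir -f syscalls -s madvise06"
    else
        # Fall back to runltp's default temporary directory.
        echo "./runltp -f syscalls -s madvise06"
    fi
}

# Dry run: print the command instead of executing it.
ltp_cmd "$@"
```

Falling back silently keeps boards without an SSD mounted working as they do today.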
For now, and to cover all bases for all users, I think doubling the default timeout here is fine. If we see further issues we can look into it again.
Thanks, Zach
On 8/31/22 13:42, Zachary Leaf wrote:
Some tests, e.g. madvise06, can time out on slow systems for some operations:
...
madvise06.c:104: TINFO: dropping caches
Test timeouted, sending SIGKILL!
When running tests with runltp, export LTP_TIMEOUT_MUL to double the DEFAULT_TIMEOUT value, as set in tst_test.c.
Reported-by: Kevin Brodsky <kevin.brodsky@arm.com>
Signed-off-by: Zachary Leaf <zachary.leaf@arm.com>
 runltp | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/runltp b/runltp
index 4447da156..32cf39dfb 100755
--- a/runltp
+++ b/runltp
@@ -79,6 +79,8 @@ setup()
 export LTPROOT=${PWD}
 export TMPBASE="/tmp"
 export PATH="${PATH}:${LTPROOT}/testcases/bin:${LTPROOT}/bin"
+# double the DEFAULT_TIMEOUT (tst_test.c) when running tests
+export LTP_TIMEOUT_MUL=2
 export LTP_DEV_FS_TYPE="ext2"
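For reference, the intended effect of the multiplier, as a small sketch. The 30s base is the figure quoted earlier in this thread, not read from tst_test.c, and `effective_timeout` is a made-up helper; the real calculation in LTP may include other terms, and this sketch handles integer multipliers only.

```shell
#!/bin/sh
# Assumed base timeout in seconds, taken from the 30s figure quoted in this thread.
DEFAULT_TIMEOUT=30

# Print the base timeout scaled by an integer multiplier ($1, default 1).
effective_timeout() {
    echo $(( DEFAULT_TIMEOUT * ${1:-1} ))
}

echo "LTP_TIMEOUT_MUL=${LTP_TIMEOUT_MUL:-1} -> $(effective_timeout "${LTP_TIMEOUT_MUL:-1}")s"
```

With the patch applied, the exported value of 2 would give 60s under this assumption.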