Hi,
We've recently had an issue with our LAVA instance (version 2019.05.post1),
where a long running LAVA job which had a large log file led to
instabilities when serving web content.
The large job was seemingly causing lava-server-gunicorn workers to use
more memory than was available, leading to workers crashing and then
restarting. As a result, all the workers spent most of their time processing
the large job, while other requests were only served once the workers
restarted. Consequently, web pages were served extremely slowly and
lavacli usage timed out (unless a larger timeout was set).
We had "LOG_SIZE_LIMIT": 3 set in our /etc/lava-server/settings.conf, and
the job page did show the message "This log file is too large to view",
but it seems that some requests were still attempting to process
some aspect of the job, causing these worker crashes. Are there any other
settings that need to be set in order to cope with long-running jobs
with large log files that might help with this situation?
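For reference, the relevant fragment of our /etc/lava-server/settings.conf (the file is JSON; 3 is the value we currently use):

```json
{
    "LOG_SIZE_LIMIT": 3
}
```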
Before we look into this any further, does anyone know if this is fixed
with a newer version of LAVA? Has anyone had any similar issues with their
instances?
Thanks,
Dean
Hi Lava users,
I have a problem with the LXC container that is created on the host machine: it does not get an IP address.
LAVA is installed in a VM running Debian stretch.
Has anyone experienced this problem before? Does anyone have any idea why the container does not get an IP?
Here are the logs:
lava-dispatcher, installed at version: 2019.09+stretch
start: 0 validate
Start time: 2019-09-24 12:29:39.318789+00:00 (UTC)
lxc, installed at version: 1:2.0.7-2+deb9u2
Validating that file:///usr/mgu22/mgu22-19w32.5-1-2-bmw-image-mgu22-sa8155.rootfs.ext4 exists
validate duration: 0.07
definition: lava
result: pass
case: validate
start: 1 lxc-deploy (timeout 00:05:00) [tlxc]
start: 1.1 lxc-create-action (timeout 00:05:00) [tlxc]
nice lxc-create -q -t debian -n lxc-hikey-test-13 -- --release stretch --mirror http://mirror.bytemark.co.uk/debian --packages systemd,systemd-sysv
Container created successfully
end: 1.1 lxc-create-action (duration 00:01:22) [tlxc]
level: 1.1
case: lxc-create-action
definition: lava
result: pass
namespace: tlxc
duration: 82.21
extra: ...
start: 1.2 lxc-create-udev-rule-action (timeout 00:03:38) [tlxc]
device info file '/var/lib/lava/dispatcher/tmp/13/lxc-create-udev-rule-action-y0_d19aq/device-info.yaml' created with:
[{'board_id': '7d4452a4'}]
udev rules file '/var/lib/lava/dispatcher/tmp/13/lxc-create-udev-rule-action-nj_8pzv0/100-lava-lxc-hikey-test-13.rules' created
ACTION=="add", ATTR{serial}=="7d4452a4", RUN+="/usr/share/lava-dispatcher/lava_lxc_device_add.py --lxc-name lxc-hikey-test-13 --device-node $name --job-id 13 --logging-url tcp://localhost:5555"
'/etc/udev/rules.d/100-lava-lxc-hikey-test-13.rules' symlinked to '/var/lib/lava/dispatcher/tmp/13/lxc-create-udev-rule-action-nj_8pzv0/100-lava-lxc-hikey-test-13.rules'
nice udevadm control --reload-rules
udev rules reloaded.
end: 1.2 lxc-create-udev-rule-action (duration 00:00:00) [tlxc]
start: 1.3 boot-lxc (timeout 00:03:38) [tlxc]
nice lxc-start -n lxc-hikey-test-13 -d
output:
Wait until 'lxc-hikey-test-13' state becomes RUNNING
nice lxc-info -sH -n lxc-hikey-test-13
output: RUNNING
output:
'lxc-hikey-test-13' state is RUNNING
Wait until 'lxc-hikey-test-13' gets an IP address
nice lxc-info -iH -n lxc-hikey-test-13
output:
nice lxc-info -iH -n lxc-hikey-test-13
output:
nice lxc-info -iH -n lxc-hikey-test-13
output:
nice lxc-info -iH -n lxc-hikey-test-13
output:
nice lxc-info -iH -n lxc-hikey-test-13
output:
nice lxc-info -iH -n lxc-hikey-test-13
output:
Here is my Lava job:
https://pastebin.com/kiLbnjAX
Hi All,
I am new to the LAVA framework.
I am trying to submit my first job, but I get an error: "Invalid definition:
extra keys not allowed @ data['job_timeout']"
I am attaching a screenshot and the file.
Can someone please help me solve this issue?
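(For what it's worth: judging by the error message and by a working job definition posted later in this thread, the schema has no top-level job_timeout key; timeouts belong under a timeouts block, e.g.:)

```yaml
# Sketch based on a working job elsewhere in this thread; the minute
# values here are placeholders.
timeouts:
  job:
    minutes: 15
  action:
    minutes: 5
```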
Thanks
Veera
Hello everyone,
I am trying to send some information containing whitespace via lava-send. Apparently this is not supported. Is this a bug or by design?
I tried the following command:
lava-send my-message my-variable="some string with whitespaces"
Which produces the following output:
<LAVA_SEND_DEBUG lava_multi_node_send preparing Tue Aug 27 15:02:41 CEST 2019>
<LAVA_SEND_DEBUG _lava_multi_node_send started Tue Aug 27 15:02:41 CEST 2019>
<LAVA_MULTI_NODE> <LAVA_SEND my-message my-variable=some>
<LAVA_SEND_DEBUG _lava_multi_node_send finished Tue Aug 27 15:02:41 CEST 2019>
<LAVA_SEND_DEBUG lava_multi_node_send finished Tue Aug 27 15:02:41 CEST 2019>
Received Multi_Node API <LAVA_SEND>
messageID: SEND-my-message
lava-multinode lava-send
1 key value pair(s) to be sent.
Handling signal <LAVA_SEND {"timeout": 360, "request": "lava_send", "messageID": "my-message", "message": {"my-variable": "some"}}>
So the string is obviously truncated at the first whitespace.
Is there any way to send a string containing whitespace to another node?
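One possible workaround (a sketch, not tested against lava-send itself): base64-encode the value so it contains no whitespace before sending, and decode it on the receiving node after lava-wait.

```shell
# Encode the value so it survives lava-send's word splitting.
msg="some string with whitespaces"
encoded=$(printf '%s' "$msg" | base64 | tr -d '\n')   # no spaces or newlines
# lava-send my-message my-variable="$encoded"
# ...on the receiving node, after lava-wait my-message:
decoded=$(printf '%s' "$encoded" | base64 -d)
printf '%s\n' "$decoded"
```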
Mit freundlichen Grüßen / Best regards
Tim Jaacks
DEVELOPMENT ENGINEER
Garz & Fricke GmbH
Tempowerkring 2
21079 Hamburg
Direct: +49 40 791 899 - 55
Fax: +49 40 791899 - 39
tim.jaacks(a)garz-fricke.com
www.garz-fricke.com
WE MAKE IT YOURS!
Registered office: D-21079 Hamburg
Register court: Amtsgericht Hamburg, HRB 60514
Managing directors: Matthias Fricke, Manfred Garz, Marc-Michael Braun
#
root@device:~# #
lava-test-shell: Wait for prompt ['root@device:~#'] (timeout 00:05:00)
#
Using /lava-133
export SHELL=/bin/bash
root@device:~# export SHELL=/bin/bash
export SHELL=/bin/bash
. /lava-133/environment
root@device:~# . /lava-133/environment
. /lava-133/environment
-sh: .: can't open '/lava-133/environment'
Will listen to feedbacks from 'tlxc' for 1 second
/lava-133/bin/lava-test-runner /lava-133/0
root@device:~# /lava-133/bin/lava-test-runner /lava-133/0
Test shell timeout: 10s (minimum of the action and connection timeout)
/lava-133/bin/lava-test-runner /lava-133/0
-sh: /lava-133/bin/lava-test-runner: not found
The device boots successfully and logs in, but I can't run any commands.
Why is LAVA running this command:
. /lava-133/environment
And how does it get downloaded to the device?
Ilya
Hi!
By default, at the beginning LAVA runs this command:
nice fastboot -s 261a1c5d reboot-bootloader
Is it possible to skip this step, i.e. not run this command?
Ilya
Hi Milosz,
Please take a look at the first message of this thread. There are two
different pieces of information about this error.
First, the COMMA error happens when I'm trying to access the URL (LAVA
WebUI) by manually inserting the job_id in a browser, not through the code.
Second, the previous code returns:
lavac.server.NoSuchJob: No such job: 68271.0
even though this job_id is being returned by the LAVA server.
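For reference, the list-vs-string failure mode can be reproduced without a LAVA server (a plain-Python sketch; the job ids are hypothetical):

```python
# A multinode submit_job returns a list of job ids (hypothetical values).
multinode_result = ["68271.0", "68271.1"]

# Passing the list itself onward serializes it, and the serialized form
# contains the "," that breaks the server's URL pattern:
wrong = str(multinode_result)
assert "," in wrong

# Correct: iterate over the list and pass each id string individually,
# e.g. to scheduler.job_details().
for job_id in multinode_result:
    assert "," not in job_id
    print(job_id)
```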
Thanks
On Mon, 2 Sep 2019 at 11:32, Milosz Wasilewski <
milosz.wasilewski(a)linaro.org> wrote:
> On Mon, 2 Sep 2019 at 10:26, Fabiano Ferronato <fabiferro(a)hotmail.com>
> wrote:
> >
> > Hi Milosz,
> >
> > I still couldn't update the server but, just to clarify, the mentioned
> job pasted in URL is 68271.0 as we can see in the error message:
> >
> > URL : http://lava.server.net/scheduler/job/68271.0
> >
> > > Trying to access LAVA WebUI using the jobid (68271.0)
> >
> > And then some process is translating that jobid into a comma " , " :
> >
> > > Reverse for 'lava.scheduler.job.detail' with arguments '('',)' not
> found. 1 pattern(s) tried: ['scheduler/job/(?P<pk>[0-9]+|[0-9]+\\.[0-9]+)$']
> >
> > Otherwise, a really not existent jobid in the URL, let's say 99999.9,
> results in "404 Not found".
> >
> > Here is the job submission code:
> >
> > try:
> > job_id = self._server.scheduler.submit_job(job_data)
> > if not isinstance(job_id, list):
> > job_id = [job_id]
> > return job_id
> >
> > And than the job_id is used to get job details:
> >
> > try:
> > return self._server.scheduler.job_details(job_id)
>
> scheduler.job_details expects a string:
> https://master.lavasoftware.org/api/help/#scheduler.job_details
>
> If I understand your code correctly you're passing a list to this
> function. Serialized list will contain "," character.
>
> milosz
>
> >
> > Best Regards,
> > Fabiano
> >
> > On Fri, 23 Aug 2019 at 16:25, Milosz Wasilewski <
> milosz.wasilewski(a)linaro.org> wrote:
> >>
> >> I don't think there was a problem with 2018.10 with this feature.
> >> Reading the error message I think you pasted "," character in the URL
> >> so the pattern didn't match. As you can see in the regex, "." is there
> >> and I don't recall any issues with multinode jobs then. Anyway, even
> >> if this is a bug it won't be fixed in 2018.10. When you migrate to
> >> latest version and you hit the same problem something can be done.
> >>
> >> If you can post full script you're using I can try on latest master
> >> and see what happens.
> >>
> >> milosz
> >>
> >> On Fri, 23 Aug 2019 at 13:10, Fabiano Ferronato <fabiferro(a)hotmail.com>
> wrote:
> >> >
> >> > Hi Milosz, thanks for your answer.
> >> >
> >> > Yes, it is a multinode job.
> >> > Is this a known bug in version 2018.10? I would need to install the new
> version and keep pipelines running until I get the error to answer you.
> >> >
> >> > Fabiano
> >> >
> >> > On Thu, 22 Aug 2019 at 18:58, Milosz Wasilewski <
> milosz.wasilewski(a)linaro.org> wrote:
> >> >>
> >> >> On Thu, 22 Aug 2019 at 17:30, Fabiano Ferronato <
> fabiferro(a)hotmail.com> wrote:
> >> >> >
> >> >> > Hi,
> >> >> > we have a LAVA test setup working for some time. Automated
> pipelines are running tests on different devices in parallel.
> >> >> > After updating to version 2018.10+stretch and changing to in-line
> job definitions we started to get some sporadic errors.
> >> >> >
> >> >> > The error message shows up after jobs are submitted and the return
> from the submission is then used to ask for server job details:
> >> >> >
> >> >> > res = lava_server.submit_job(lava_test_job_description)
> >> >> > for entry in res:
> >> >> > job_details = lavasrv.job_details(entry)
> >> >> > ...
> >> >> >
> >> >> > Resulting in the following error:
> >> >> >
> >> >> > lib/python3.5/site-packages/lavac/server.py", line 272, in
> job_details
> >> >> > raise get_server_error(error, job_id)
> >> >> > lavac.server.NoSuchJob: No such job: 68271.0
> >> >>
> >> >> are you submitting a multinode job? Does this also happen in more
> >> >> recent version of LAVA (like 2019.07)?
> >> >>
> >> >> milosz
> >> >>
> >> >> >
> >> >> >
> >> >> > Trying to access LAVA WebUI using the jobid (68271.0):
> >> >> >
> >> >> > 500 Internal Server Error
> >> >> > Reverse for 'lava.scheduler.job.detail' with arguments '('',)' not
> found. 1 pattern(s) tried: ['scheduler/job/(?P<pk>[0-9]+|[0-9]+\\.[0-9]+)$']
> >> >> >
> >> >> > Can you give me a hint about this error?
> >> >> >
> >> >> > Thanks!
> >> >> >
> >> >> > _______________________________________________
> >> >> > Lava-users mailing list
> >> >> > Lava-users(a)lists.lavasoftware.org
> >> >> > https://lists.lavasoftware.org/mailman/listinfo/lava-users
> >> >
>
Hi!
I'm using LAVA from Debian stretch.
I'm trying to flash an image via fastboot.
At that step the job just gets stuck until the 10-minute timeout:
nice fastboot -s '261a1c5d' flash persist /var/lib/lava/dispatcher/tmp/15/fastboot-deploy-7xj200yq/persist/persist.ext4
If I run the command from a terminal:
fastboot -s 261a1c5d flash persist /tmp/persist.ext4
the device is flashed in 3 seconds.
It looks like LAVA doesn't see my device.
Job desc : https://pastebin.com/ZNeS71Ev
Device desc: https://pastebin.com/rYRyEnjy
Job log : https://pastebin.com/fH0u3bUb
I know that fastboot works better with LXC. I also tried with LXC, and it got stuck at the same place.
Job log with lxc : https://pastebin.com/RnZLNfvN
Ilya
Hello,
I have a question about running a test that I have on my host machine inside a LAVA job.
Basically, I start LAVA and try to submit a job through Scheduler/Submit.
The job description looks like this:
device_type: qemu
job_name: qemu amd64 LTP

timeouts:
  job:
    minutes: 120
  action:
    minutes: 120
  connection:
    minutes: 120
priority: medium
visibility: public

metadata:
  source: https://ci.linaro.org/view/lava-ci/job/lava-debian-stable-amd64-vm/
  path: https://git.linaro.org/ci/job/configs.git/blob/HEAD:/lava-debian-stable-amd…
  build-readme: https://images.validation.linaro.org/snapshots.linaro.org/components/lava/s…
  build-console: https://ci.linaro.org/view/lava-ci/job/lava-debian-stable-amd64-vm/console
  build-log: http://images.validation.linaro.org/snapshots.linaro.org/components/lava/st…

# CONTEXT_BLOCK
context:
  arch: amd64

# ACTIONS_BLOCK
actions:
- deploy:
    timeout:
      minutes: 120
    to: tmpfs
    images:
      rootfs:
        image_arg: -drive format=raw,file={rootfs}
        url: https://images.validation.linaro.org/snapshots.linaro.org/components/lava/s…
        sha256sum: 4ab50cc69fc61faa9bf48edada8bc1a317247f77ced5a815f40e75cef1d62cc7
        compression: gz

# BOOT_BLOCK
- boot:
    method: qemu
    media: tmpfs
    timeout:
      minutes: 120
    prompts:
    - "root@debian:"
    auto_login:
      login_prompt: "login:"
      username: root

- test:
    timeout:
      minutes: 120
    definitions:
    - repository:
        metadata:
          format: Lava-Test Test Definition 1.0
          name: apache-server
          description: "server installation"
          os:
          - debian
          scope:
          - functional
        run:
          steps:
            # Here I would like to have my test, something like:
            - make
            - ./home/user/folder/mytest
      from: inline
      name: apache-server
      path: inline/apache-server.yaml
Could you maybe give me a short example of how to do that?
I tried using the "inline" keyword, but to no avail.
Best regards,
Emanuel-Vladut Magas
L4B Software, Iasi, Romania
E-mail: vladut.m(a)l4b-software.com