Apologies, as this is surely an FAQ, but I've spent quite a bit of time
troubleshooting and reading. This is very similar to Kevin's thread from
May[1], subject 'u-boot devices broken after 2018.4 upgrade, strange u-boot
interaction'. In that thread's case, the issue was that interrupt_char
was being set to "\n". My symptoms are the same, but interrupt_char is
set to " " or "d".
I'm running LAVA from the latest released containers (2018.11), and
trying to use a beaglebone-black with a more recent u-boot than exists
in validation.l.o. qemu works fine.
The problem seems to be that LAVA thinks there's a prompt when there
isn't, and so it sends commands too quickly. Here's example output from
the serial console (job link[2]):
U-Boot 2017.07 (Aug 31 2017 - 15:35:58 +0000)
CPU : AM335X-GP rev 2.1
I2C: ready
DRAM: 512 MiB
No match for driver 'omap_hsmmc'
No match for driver 'omap_hsmmc'
Some drivers were not found
MMC: OMAP SD/MMC: 0, OMAP SD/MMC: 1
Net: cpsw, usb_ether
Press SPACE to abort autoboot in 10 seconds
=>
=> setenv autoload no
=> setenv initrd_high 0xffffffff
=> setenv fdt_high 0xffffffff
=> dhcp
link up on port 0, speed 100, full duplex
BOOTP broadcast 1
BOOTP broadcast 2
BOOTP broadcast 3
DHCP client bound to address 10.100.0.55 (1006 ms)
=> 172.28.0.4
Unknown command '172.28.0.4' - try 'help'
=> tftp 0x82000000 57/tftp-deploy-t7xus3ey/kernel/vmlinuz
link up on port 0, speed 100, full duplex
*** ERROR: `serverip' not set
...
When I run u-boot manually, after I hit SPACE (or 'd'; both work), u-boot
*deletes* the character and then prints '=> ' (is that delete the root
cause?). When LAVA runs, it shows an extra '=>' and starts typing, as seen
above. dhcp takes a second or two, so the subsequent command starts
to get lost (in the log above we see a bare IP address, because the
'setenv serverip' part got lost).
If I set boot_character_delay to something like 1000, it works, because that
gives dhcp enough time to finish before the next character is typed, but it
obviously makes the job very slow, and it is still not reliable.
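For reference, a minimal sketch of where that delay seems to land in the
rendered device configuration (the key layout is my assumption from
inspecting my instance; the value is illustrative):

character_delays:
  boot: 1000   # milliseconds between characters typed at the u-boot prompt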
I'm out of ideas.. help?
P.S. Two interesting things I've learned recently:
1) boot_character_delay must be specified in the device_types file; it's
ignored when specified in the device file (surprising, as I see it
listed in some people's device files[3]).
2) If you install ser2net from sid, you can set max-connections and do
some _very handy_ voyeurism on the serial console while LAVA does its
thing (hat tip to Kevin Hilman for that one).
Thanks,
Dan
[1] https://lists.lavasoftware.org/pipermail/lava-users/2018-May/001064.html
[2] https://lava.therub.org/scheduler/job/57
[3] https://git.linaro.org/lava/lava-lab.git/tree/lkft.validation.linaro.org/ma…
--
Linaro - Kernel Validation
We have 100 test cases; we submit them to LAVA before leaving the office and expect to get all the results the next morning.
If everything is OK, we get 100 results in the morning and check every issue.
But if, for example, the 10th case hits an OOM, I want cases 11 through 100 to keep running during the night, so I want to reboot the device after the 10th case once I find that it has OOMed, and then continue with cases 11 to 100 after the reboot.
I know the OOM itself is not an automation issue, but if I cannot resume cases 11 to 100 during that night, I have to resubmit them the next morning after I am back in the office. The 100th case may also have a bug; I would like it to report a result that same morning so I could assign someone else to fix it quickly, instead of waiting until I am back in the office to remove the failing 10th case, resubmit, wait another 8 hours, and only then execute the 100th case. We do not want the process to be that inefficient - that is our aim.
------------------------------------------------------------------
From: lava-users-request <lava-users-request(a)lists.lavasoftware.org>
Sent: Friday, 25 January 2019 16:55
To: lava-users <lava-users(a)lists.lavasoftware.org>
Subject: Lava-users Digest, Vol 5, Issue 37
Send Lava-users mailing list submissions to
lava-users(a)lists.lavasoftware.org
To subscribe or unsubscribe via the World Wide Web, visit
https://lists.lavasoftware.org/mailman/listinfo/lava-users
or, via email, send a message with subject or body 'help' to
lava-users-request(a)lists.lavasoftware.org
You can reach the person managing the list at
lava-users-owner(a)lists.lavasoftware.org
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Lava-users digest..."
Today's Topics:
1. reboot during test (cnspring2002)
2. Re: AOSP multiple node job (Neil Williams)
3. Re: reboot during test (Neil Williams)
4. Re: AOSP multiple node job (Chase Qi)
----------------------------------------------------------------------
Message: 1
Date: Thu, 24 Jan 2019 20:11:39 +0800
From: cnspring2002 <cnspring2002(a)aliyun.com>
To: lava-users(a)lists.lavasoftware.org
Subject: [Lava-users] reboot during test
Message-ID: <C0C4B61B-7B5A-4DAB-B644-849E77F0119B(a)aliyun.com>
Content-Type: text/plain; charset=us-ascii
Dear all,
In the test stage, I have a case where, when it runs, one firmware hits an OOM. So I trigger the device to reboot, and then none of the remaining cases can run. I cannot define multiple boots in the job in advance because I cannot predict which case will cause the OOM. What do you suggest doing?
------------------------------
Message: 2
Date: Thu, 24 Jan 2019 15:06:56 +0000
From: Neil Williams <neil.williams(a)linaro.org>
To: Chase Qi <chase.qi(a)linaro.org>
Cc: lava-users(a)lists.lavasoftware.org
Subject: Re: [Lava-users] AOSP multiple node job
Message-ID:
<CAC6CAR3v_i-vOv4BVe56RHR4PahMihexpNa6v4w44UNb+_PVuw(a)mail.gmail.com>
Content-Type: text/plain; charset="UTF-8"
On Thu, 24 Jan 2019 at 11:41, Chase Qi <chase.qi(a)linaro.org> wrote:
>
> Hi,
>
> In most cases, we don't need multiple node job as we can control AOSP
> DUT from lxc via adb over USB. However, here is the use case.
>
> CTS/VTS tradefed-shell --shards option supports to split tests and run
> them on multiple devices in parallel. To leverage the feature in LAVA,
> we need multinode job, right?
If more than one device needs to have images deployed and booted
specifically for this test job, then yes. MultiNode is required. To be
sure that each device is at the same stage (as deploy and boot timings
can vary), the test job will need to wait for all test jobs to be
synchronised to the same point in each test job - synchronisation is
currently restricted to POSIX shells.
> And in multinode job, master-node lxc
> needs access to DUTs from salve nodes via adb over tcpip, right?
Not necessarily. From the LXC, the device can be controlled using USB.
There is no need for devices to have a direct connection to each other
just to use MultiNode. The shards implementation may require that
though.
> Karsten shared a job example here[1]. This probably is the most
> advanced usage of LAVA
All MultiNode is a complex usage of LAVA but VLANd used by the
networking teams is more complex than your use case.
>, and probably also not encouraged? To make it
> more clear, the connectivity should look like this.
There is a problem in this model: Every DUT will have its own LXC and
that device will be connected to the LXC using USB.
> master.lxc <----adb over usb----> master.dut
> master.lxc <----adb over tcpip ---> slave1.dut
> master.lxc <----adb over tcpip ---> slave2.dut
Do not separate the LXC from the DUT - the LXC and its DUT are a single node.
Master DUT has a master LXC.
Slave1 DUT has a Slave1 LXC
Slave2 DUT has a Slave2 LXC.
Depending on the boards in use, you may be able to configure each DUT,
including the master DUT, to have TCP/IP networking. That then allows
the processes running in the Master node to access the slave nodes.
(The following model is based on a theoretical device which doesn't
have the crippling USB OTG problem of the hikey - but the hikey can
work in this model if the IP addresses are determined statically and
therefore are available to each slave LXC.)
0: A program executing in the Master LXC which uses USB to send
commands to the master DUT which allow the Master LXC to retrieve the
IP address of the master DUT.
1: That program in the Master LXC then uses the MultiNode API
(lava-send) to declare that IP address to all the slave nodes. This is
equivalent to how existing jobs declare the IP address of the device
when using secondary connections.
2: Each slave node waits for the master-ip-addr message and sets that
value in a program executing in the slave LXC. The slave LXC is
connected to the slave DUT directly using USB so can use this to set
the master IP address, if that is required.
3: Each slave node now runs a program in each slave LXC to connect to
the slave DUT over USB and extract the slave DUT IP address
4: Each slave node then broadcasts that slave-<ID>-ip-addr message, so
the first slave sends slave-1-ip-addr containing an IP address, slave
2 sends slave-2-ip-addr containing a different IP address.
5: The master node is waiting for all of these messages to be sent and
retrieves the values in turn. This information is now available to a
program executing inside the master LXC. This program could use USB to
set these values in the master DUT, if that is required.
6: During this time, all the slave nodes are waiting for the master
node to broadcast another message saying that the work on the master
is complete.
7: Once the master sends the complete message, each slave node picks
up this message from the MultiNode API and the script executing in the
slave LXC then ends the Lava Test Definition and the slave test job
completes.
8: The master can then do some other stuff and then complete.
https://staging.validation.linaro.org/scheduler/job/246447/multinode_defini…
https://staging.validation.linaro.org/scheduler/job/246230/multinode_defini…
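To make steps 1 to 5 concrete, here is a rough, untested sketch of the inline
test definitions each node could run from its LXC. The message names are the
ones used above; the adb command used to read the DUT's IP address is only an
assumption and will depend on the Android build:

# master node (runs in the master LXC)
run:
  steps:
  - ipaddr=$(adb shell getprop dhcp.wlan0.ipaddress)   # assumption: how the master DUT IP is read
  - lava-send master-ip-addr ipaddr=$ipaddr
  - lava-wait slave-1-ip-addr
  - lava-wait slave-2-ip-addr
  - echo "drive the sharding from here"
  - lava-send master-complete
---
# each slave node (runs in its own slave LXC; slave 2 sends slave-2-ip-addr)
run:
  steps:
  - lava-wait master-ip-addr
  - ipaddr=$(adb shell getprop dhcp.wlan0.ipaddress)   # assumption
  - lava-send slave-1-ip-addr ipaddr=$ipaddr
  - lava-wait master-complete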
Don't obsess about the LXC either. With upcoming changes for docker
support, we could remove the presence of the LXC entirely. The LXC
with android devices only exists as a unit of isolation for the
benefit of the dispatcher. It has useful side effects but the reason
for the LXC to exist is to separate the fastboot operations from the
dispatcher operations.
For hikey and its broken USB OTG support:
0: Each slave test job turns off the USB OTG support once the slave
LXC has deployed all the test image files and determined that the
slave DUT has booted correctly. If not, use lava-test-raise.
1: Next, each slave LXC uses the IP address of its own slave DUT to
check connectivity. If this fails, use lava-test-raise.
2: Each slave LXC uses the MultiNode API to declare the IP address of
the slave DUT (because the slave node has determined that this IP is
working).
3: The master node is waiting for these messages and these are picked
up by the master LXC test definition.
4: The master LXC test definition issues commands to the master DUT -
now depending on how the sharding works, this could be over USB (turn
the USB OTG off later) or over TCP/IP (turn off the master USB OTG at
the start of this test definition).
5: The master DUT has enough information to drive the sharding across
the slave DUTs. The slave LXCs are waiting for the master to finish
the sharding. (lava-wait)
6: When the master LXC determines that the master DUT has finished the
sharding, then the master LXC sends a message to all the slave nodes
that the test is complete.
7: Each slave node picks up the completion message in the slave LXC
and the test definition finishes.
8: The master node can continue to do other tasks or can also complete
its test definition.
> ....
>
> I see two options for adb over tcpip.
>
> Option #1: WiFi. adb over wifi can be enabled easily by issuing adb
> cmds from lxc. I am not using it for two reasons.
Agreed, this doesn't need to rely on WiFi.
>
> * WiFi isn't reliable for long cts/vts test run.
> * In Cambridge lab, WiFi sub-network isn't accessible from lxc
> network. Because of security concerns, there is no plan to change
> that.
>
> Option #2: Wired Ethernet. On devices like hikey, we need to run
> 'pre-os-command' in boot action to power off OTG port so that USB
> Ethernet dongle works. Once OTG port is off, lxc has no access to the
> DUT, then test definition should be executed on DUT, right? I am also
> having the following problems to do this.
Before the OTG is switched, all data from the DUT needs to be
retrieved (and set) using the USB connection.
What information you need to set depends on how the sharding works.
The problem, as I see it, is that the slave DUTs have no way to
declare their IP address to the slave LXC once the OTG port is
switched. Therefore, you will need to put in a request for the boards
to have static IP addresses declared in the device dictionary. Then
the OTG can be switched and things become easier because the LXC knows
the IP address and can simply declare that to the MultiNode API so
that the master LXC can know which IP matches which node. There are
already a number of hikey devices with the static_ip device tag and
you can specify this device tag in your MultiNode test definition.
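For reference, a rough sketch of how that tag could be requested per role in
the MultiNode protocol block (the device-type name and counts are placeholders;
only the tag itself comes from the paragraph above):

protocols:
  lava-multinode:
    roles:
      master:
        device_type: hi6220-hikey
        count: 1
        tags:
        - static_ip
      slave:
        device_type: hi6220-hikey
        count: 2
        tags:
        - static_ip
    timeout:
      minutes: 10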
>
> * Without context overriding, overlay tarball will be applied to
> '/system' directory and test job reported "/system/bin/sh:
Why are you talking about /system ??? MultiNode only operates in a
POSIX shell - the POSIX shell is in the LXC and each DUT has a
dedicated LXC. In this use case, MultiNode API calls are only going to
be made from each LXC. The master LXC sends some information and then
receives information from test definitions running in each of the
slave LXCs.
The overlay is to be deployed to the LXC, not the DUT because this is
an Android system. What the android system does is determined either
by commands run inside the slave LXC to deploy files (before the OTG
switch) or commands run inside the master LXC (with knowledge of the
IP address from the MultiNode API) to execute commands on the DUT over
TCP/IP.
Use the LXC to deploy the files and boot the device, then to declare
information about each particular node. Once that is done, whatever
thing is controlling the test needs to just use TCP/IP to communicate
and use the MultiNode API to send messages and allow some nodes to
wait for other nodes whilst the test proceeds.
> /lava-247856/bin/lava-test-runner: not found"[2].
> * With the following job context, LAVA still runs
> '/lava-24/bin/lava-test-runner /lava-24/0' and it hangs there. It is
> tested in my local LAVA instance, test job definition and test log
> attached. Maybe my understanding on the context overriding is wrong, I
> thought LAVA should execute '/system/lava-24/bin/lava-test-runner
> /system/lava-24/0' instead. Any suggestions would be appreciated.
>
> context:
> lava_test_sh_cmd: '/system/bin/sh'
> lava_test_results_dir: '/system/lava-%s'
>
> I checked on the DUT directly, '/system/lava-%s' exist, but I cannot
> really run lava-test-runner. The shebang line seems problematic.
>
> --- hacking ---
> hikey:/system/lava-24/bin # ./lava-test-runner
> /system/bin/sh: ./lava-test-runner: No such file or directory
> hikey:/system/lava-24/bin # cat lava-test-runner
> #!/bin/bash
>
> #!/bin/sh
>
> ....
> # /system/bin/sh lava-test-runner
> lava-test-runner[18]: .: /lava/../bin/lava-common-functions: No such
> file or directory
> --- ends ---
>
> I had a discussion with Milosz. He proposed the third option which
> probably will be the most reliable one, but it is not supported in
> LAVA yet. Here is the idea. Milosz, feel free to explain more.
>
> **Option #3**: Add support for accessing to multiple DUTs in single node job.
>
> * Physically, we need the DUTs connected via USB cable to the same dispatcher.
I don't see that this solves anything and it adds a lot of unnecessary
lab configuration - entirely duplicating the point of having ethernet
connections to the boards. Assign static IP addresses to each board
and when the test job starts, each dedicated LXC can declare the
static information according to whichever board was assigned to
whichever node.
The DUTs only need to be visible to programs running on the master
node and that can be done by declaring static IP addresses using the
MultiNode API.
> * In single node job, LAVA needs to add the DUTs specified(somehow) or
> assigned randomly(lets say both device type and numbers defined) to
> the same lxc container. Test definitions can take over from here.
No - the LXC is used to issue commands to deploy test images to the
DUT. The LXC is a transparent part of the dispatcher, it is not just
for test definitions. The LXC cannot be used for multiple test jobs,
it is part of the one dispatcher.
>
> Is this can be done in LAVA? Can I require the feature? Any
> suggestions on the possible implementations?
>
>
> Thanks,
> Chase
>
> [1] https://review.linaro.org/#/c/qa/test-definitions/+/29417/4/automated/andro…
> [2] https://staging.validation.linaro.org/scheduler/job/247856#L1888
> _______________________________________________
> Lava-users mailing list
> Lava-users(a)lists.lavasoftware.org
> https://lists.lavasoftware.org/mailman/listinfo/lava-users
--
Neil Williams
=============
neil.williams(a)linaro.org
http://www.linux.codehelp.co.uk/
------------------------------
Message: 3
Date: Thu, 24 Jan 2019 15:09:58 +0000
From: Neil Williams <neil.williams(a)linaro.org>
To: cnspring2002 <cnspring2002(a)aliyun.com>
Cc: lava-users(a)lists.lavasoftware.org
Subject: Re: [Lava-users] reboot during test
Message-ID:
<CAC6CAR3fV+b1p5T2EVxUCPP7d=Erno-20MNVxPaZPVf3tbK3Yg(a)mail.gmail.com>
Content-Type: text/plain; charset="UTF-8"
On Thu, 24 Jan 2019 at 12:13, cnspring2002 <cnspring2002(a)aliyun.com> wrote:
>
> Dear all,
>
> In test stage, I have a case, when it run, one firmware OOM. So I trigger the device to reboot, Then all left case cannot run . I can not use multiple boot when define job in advance because I can not predict which case will make OOM, what you suggest doing?
The out of memory killer is a fatal device error. The test job is not
going to be able to continue because the failure mode is
unpredictable.
The cause of the OOM needs to be determined through standard triage,
not automation. (Although automation may help create a data matrix of
working and failing combinations and test operations.)
--
Neil Williams
=============
neil.williams(a)linaro.org
http://www.linux.codehelp.co.uk/
------------------------------
Message: 4
Date: Fri, 25 Jan 2019 15:45:56 +0800
From: Chase Qi <chase.qi(a)linaro.org>
To: Neil Williams <neil.williams(a)linaro.org>
Cc: lava-users(a)lists.lavasoftware.org
Subject: Re: [Lava-users] AOSP multiple node job
Message-ID:
<CADzYPRFJiX8qKt_NyHZCi0qs5iotx0wg0OMN9o7SOi84sYYTow(a)mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
Hi Neil,
Thanks a lot for your guidance. It is really good to see you back :)
On Thu, Jan 24, 2019 at 11:07 PM Neil Williams <neil.williams(a)linaro.org> wrote:
>
> On Thu, 24 Jan 2019 at 11:41, Chase Qi <chase.qi(a)linaro.org> wrote:
> >
> > Hi,
> >
> > In most cases, we don't need multiple node job as we can control AOSP
> > DUT from lxc via adb over USB. However, here is the use case.
> >
> > CTS/VTS tradefed-shell --shards option supports to split tests and run
> > them on multiple devices in parallel. To leverage the feature in LAVA,
> > we need multinode job, right?
>
> If more than one device needs to have images deployed and booted
> specifically for this test job, then yes. MultiNode is required. To be
> sure that each device is at the same stage (as deploy and boot timings
> can vary), the test job will need to wait for all test jobs to be
> synchronised to the same point in each test job - synchronisation is
> currently restricted to POSIX shells.
>
> > And in multinode job, master-node lxc
> > needs access to DUTs from salve nodes via adb over tcpip, right?
>
> Not necessarily. From the LXC, the device can be controlled using USB.
> There is no need for devices to have a direct connection to each other
> just to use MultiNode. The shards implementation may require that
> though.
CTS/VTS sharding splits a run into a given number of independent chunks,
to run on multiple devices that are connected to the same host. The host
will be the master lxc in our case.
>
> > Karsten shared a job example here[1]. This probably is the most
> > advanced usage of LAVA
>
> All MultiNode is a complex usage of LAVA but VLANd used by the
> networking teams is more complex than your use case.
>
> >, and probably also not encouraged? To make it
> > more clear, the connectivity should look like this.
>
> There is a problem in this model: Every DUT will have it's own LXC and
> that device will be connected to the LXC using USB.
>
> > master.lxc <----adb over usb----> master.dut
> > master.lxc <----adb over tcpip ---> slave1.dut
> > master.lxc <----adb over tcpip ---> slave2.dut
>
> Do not separate the LXC from the DUT - the LXC and it's DUT are a single node.
>
> Master DUT has a master LXC.
> Slave1 DUT has a Slave1 LXC
> Slave2 DUT has a Slave2 LXC.
>
> Depending on the boards in use, you may be able to configure each DUT,
> including the master DUT, to have TCP/IP networking. That then allows
> the processes running in the Master node to access the slave nodes.
>
Yes, that is what I am trying to do. The connectivity topology I wrote
above is the goal, not the initial state of the LAVA design. The master lxc
needs access to all the DUT nodes, either via USB or tcpip.
> (The following model is based on a theoretical device which doesn't
> have the crippling USB OTG problem of the hikey - but the hikey can
> work in this model if the IP addresses are determined statically and
> therefore are available to each slave LXC.)
>
> 0: A program executing in the Master LXC which uses USB to send
> commands to the master DUT which allow the Master LXC to retrieve the
> IP address of the master DUT.
>
> 1: That program in the Master LXC then uses the MultiNode API
> (lava-send) to declare that IP address to all the slave nodes. This is
> equivalent to how existing jobs declare the IP address of the device
> when using secondary connections.
>
> 2: Each slave node waits for the master-ip-addr message and sets that
> value in a program executing in the slave LXC. The slave LXC is
> connected to the slave DUT directly using USB so can use this to set
> the master IP address, if that is required.
>
> 3: Each slave node now runs a program in each slave LXC to connect to
> the slave DUT over USB and extract the slave DUT IP address
>
> 4: Each slave node then broadcasts that slave-<ID>-ip-addr message, so
> the first slave sends slave-1-ip-addr containing an IP address, slave
> 2 sends slave-2-ip-addr containing a different IP address.
>
> 5: The master node is waiting for all of these messages to be sent and
> retrieves the values in turn. This information is now available to a
> program executing inside the master LXC. This program could use USB to
> set these values in the master DUT, if that is required.
>
> 6: During this time, all the slave nodes are waiting for the master
> node to broadcast another message saying that the work on the master
> is complete.
>
> 7: Once the master sends the complete message, each slave node picks
> up this message from the MultiNode API and the script executing in the
> slave LXC then ends the Lava Test Definition and the slave test job
> completes.
>
> 8: The master can then do some other stuff and then complete.
>
> https://staging.validation.linaro.org/scheduler/job/246447/multinode_defini…
>
> https://staging.validation.linaro.org/scheduler/job/246230/multinode_defini…
>
> Don't obsess about the LXC either. With upcoming changes for docker
> support, we could remove the presence of the LXC entirely. The LXC
> with android devices only exists as a unit of isolation for the
> benefit of the dispatcher. It has useful side effects but the reason
> for the LXC to exist is to separate the fastboot operations from the
> dispatcher operations.
>
> For hikey and it's broken USB OTG support:
>
> 0: Each slave test job turns off the USB OTG support once the slave
> LXC has deployed all the test image files and determined that the
> slave DUT has booted correctly. If not, use lava-test-raise.
>
> 1: Next, each slave LXC uses the IP address of it's own slave DUT to
> check connectivity. If this fails, use lava-test-raise.
>
> 2: Each slave LXC uses the MultiNode API to declare the IP address of
> the slave DUT (because the slave node has determined that this IP is
> working).
>
> 3: The master node is waiting for these messages and these are picked
> up by the master LXC test definition.
>
> 4: The master LXC test definition issues commands to the master DUT -
> now depending on how the sharding works, this could be over USB (turn
> the USB OTG off later) or over TCP/IP (turn off the master USB OTG at
> the start of this test definition).
>
> 5: The master DUT has enough information to drive the sharding across
> the slave DUTs. The slave LXCs are waiting for the master to finish
> the sharding. (lava-wait)
>
> 6: When the master LXC determines that the master DUT has finished the
> sharding, then the master LXC sends a message to all the slave nodes
> that the test is complete.
>
> 7: Each slave node picks up the completion message in the slave LXC
> and the test definition finishes.
>
> 8: The master node can continue to do other tasks or can also complete
> it's test definition.
>
>
> > ....
> >
> > I see two options for adb over tcpip.
> >
> > Option #1: WiFi. adb over wifi can be enabled easily by issuing adb
> > cmds from lxc. I am not using it for two reasons.
>
> Agreed, this doesn't need to rely on WiFi.
>
> >
> > * WiFi isn't reliable for long cts/vts test run.
> > * In Cambridge lab, WiFi sub-network isn't accessible from lxc
> > network. Because of security concerns, there is no plan to change
> > that.
> >
> > Option #2: Wired Ethernet. On devices like hikey, we need to run
> > 'pre-os-command' in boot action to power off OTG port so that USB
> > Ethernet dongle works. Once OTG port is off, lxc has no access to the
> > DUT, then test definition should be executed on DUT, right? I am also
> > having the following problems to do this.
>
> Before the OTG is switched, all data from the DUT needs to be
> retrieved (and set) using the USB connection.
>
> What information you need to set depends on how the sharding works.
>
> The problem, as I see it, is that the slave DUTs have no way to
> declare their IP address to the slave LXC once the OTG port is
> switched. Therefore, you will need to put in a request for the boards
That is the problem I had, and that is why I was trying to run the test
definition on the Android DUT directly to enable adb over tcpip and
declare the IP address. As you mentioned below, that is the wrong direction.
> to have static IP addresses declared in the device dictionary. Then
> the OTG can be switched and things become easier because the LXC knows
> the IP address and can simply declare that to the MultiNode API so
> that the master LXC can know which IP matches which node. There are
> already a number of hikey devices with the static_ip device tag and
> you can specify this device tag in your MultiNode test definition.
Brilliant, and a brand new idea to me. I didn't realize the static-ip tag is
the solution. I have managed to enable and test adb over tcpip in this
way (in my local instance). I have attached my test job definition here
in case it is of any help to other LAVA users. The following definitions
are essential:
tags:
- static-ip

reboot_to_fastboot: false

- test:
    namespace: tlxc
    timeout:
      minutes: 10
    protocols:
      lava-lxc:
      - action: lava-test-shell
        request: pre-os-command
        timeout:
          minutes: 2
Thanks,
Chase
>
> >
> > * Without context overriding, overlay tarball will be applied to
> > '/system' directory and test job reported "/system/bin/sh:
>
> Why are you talking about /system ??? MultiNode only operates in a
> POSIX shell - the POSIX shell is in the LXC and each DUT has a
> dedicated LXC. In this use case, MultiNode API calls are only going to
> be made from each LXC. The master LXC sends some information and then
> receives information from test definitions running in each of the
> slave LXCs.
>
> The overlay is to be deployed to the LXC, not the DUT because this is
> an Android system. What the android system does is determined either
> by commands run inside the slave LXC to deploy files (before the OTG
> switch) or commands run inside the master LXC (with knowledge of the
> IP address from the MultiNode API) to execute commands on the DUT over
> TCP/IP.
>
> Use the LXC to deploy the files and boot the device, then to declare
> information about each particular node. Once that is done, whatever
> thing is controlling the test needs to just use TCP/IP to communicate
> and use the MultiNode API to send messages and allow some nodes to
> wait for other nodes whilst the test proceeds.
>
> > /lava-247856/bin/lava-test-runner: not found"[2].
> > * With the following job context, LAVA still runs
> > '/lava-24/bin/lava-test-runner /lava-24/0' and it hangs there. It is
> > tested in my local LAVA instance, test job definition and test log
> > attached. Maybe my understanding on the context overriding is wrong, I
> > thought LAVA should execute '/system/lava-24/bin/lava-test-runner
> > /system/lava-24/0' instead. Any suggestions would be appreciated.
> >
> > context:
> > lava_test_sh_cmd: '/system/bin/sh'
> > lava_test_results_dir: '/system/lava-%s'
> >
> > I checked on the DUT directly, '/system/lava-%s' exist, but I cannot
> > really run lava-test-runner. The shebang line seems problematic.
> >
> > --- hacking ---
> > hikey:/system/lava-24/bin # ./lava-test-runner
> > /system/bin/sh: ./lava-test-runner: No such file or directory
> > hikey:/system/lava-24/bin # cat lava-test-runner
> > #!/bin/bash
> >
> > #!/bin/sh
> >
> > ....
> > # /system/bin/sh lava-test-runner
> > lava-test-runner[18]: .: /lava/../bin/lava-common-functions: No such
> > file or directory
> > --- ends ---
> >
> > I had a discussion with Milosz. He proposed the third option which
> > probably will be the most reliable one, but it is not supported in
> > LAVA yet. Here is the idea. Milosz, feel free to explain more.
> >
> > **Option #3**: Add support for accessing to multiple DUTs in single node job.
> >
> > * Physically, we need the DUTs connected via USB cable to the same dispatcher.
>
> I don't see that this solves anything and it adds a lot of unnecessary
> lab configuration - entirely duplicating the point of having ethernet
> connections to the boards. Assign static IP addresses to each board
> and when the test job starts, each dedicated LXC can declare the
> static information according to whichever board was assigned to
> whichever node.
>
> The DUTs only need to be visible to programs running on the master
> node and that can be done by declaring static IP addresses using the
> MultiNode API.
>
> > * In single node job, LAVA needs to add the DUTs specified(somehow) or
> > assigned randomly(lets say both device type and numbers defined) to
> > the same lxc container. Test definitions can take over from here.
>
> No - the LXC is used to issue commands to deploy test images to the
> DUT. The LXC is a transparent part of the dispatcher, it is not just
> for test definitions. The LXC cannot be used for multiple test jobs,
> it is part of the one dispatcher.
>
> >
> > Is this can be done in LAVA? Can I require the feature? Any
> > suggestions on the possible implementations?
> >
> >
> > Thanks,
> > Chase
> >
> > [1] https://review.linaro.org/#/c/qa/test-definitions/+/29417/4/automated/andro…
> > [2] https://staging.validation.linaro.org/scheduler/job/247856#L1888
> > _______________________________________________
> > Lava-users mailing list
> > Lava-users(a)lists.lavasoftware.org
> > https://lists.lavasoftware.org/mailman/listinfo/lava-users
>
>
>
> --
>
> Neil Williams
> =============
> neil.williams(a)linaro.org
> http://www.linux.codehelp.co.uk/
Hello list,
Apologies if this question has been asked already. I have a test framework which spits out a junit file.
What’s the best way to import data from the junit file into LAVA?
Cheers
--
Diego Russo | Staff Software Engineer | Mbed Linux OS
ARM Ltd. CPC1, Capital Park, Cambridge Road, Fulbourn, CB21 5XE, United Kingdom
http://www.diegor.co.uk - https://os.mbed.com/linux-os/
Hello,
we need to perform tests that require reboots of the DUT between their executions. A few examples are checking rootfs upgrades or checking that configuration changes persist.
I have few questions:
* Does LAVA support those cases?
* If yes, does LAVA support multiple reboots?
* If yes, how can I write tests in order to run different sets of tests at any boot.
* Example: 1) do an upgrade 2) reboot the device 3) Check if the upgrade was successful
* How can I structure my pipeline?
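To make the question concrete, this is the shape of job I have in mind - only
a sketch, with the deploy/boot details and script names as placeholders:

actions:
- deploy:
    # initial image deployment (placeholder)
- boot:
    # first boot (placeholder)
- test:
    definitions:
    - from: inline
      name: do-upgrade
      path: inline/do-upgrade.yaml
      repository:
        metadata:
          format: Lava-Test Test Definition 1.0
          name: do-upgrade
          description: "perform the rootfs upgrade (placeholder)"
        run:
          steps:
          - ./run-upgrade.sh        # placeholder script
- boot:
    # reboot into the upgraded system (placeholder)
- test:
    definitions:
    - from: inline
      name: check-upgrade
      path: inline/check-upgrade.yaml
      repository:
        metadata:
          format: Lava-Test Test Definition 1.0
          name: check-upgrade
          description: "verify the upgrade succeeded (placeholder)"
        run:
          steps:
          - ./check-upgrade.sh      # placeholder script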
Thanks
--
Diego Russo | Staff Software Engineer | Mbed Linux OS
ARM Ltd. CPC1, Capital Park, Cambridge Road, Fulbourn, CB21 5XE, United Kingdom
http://www.diegor.co.uk - https://os.mbed.com/linux-os/
>
>
> When Lava start test or finish the test job, squad zmq SUB socket seems
> not receive log message from lava-server.
>
> It just print out '[2019-01-22 08:26:24 +0000] [DEBUG]
> nexell.lava.server2: connected to tcp://192.168.1.20:2222:5500'.
>
The url is really strange as two port numbers are specified. Should be
"tcp://192.168.1.20:5500"
Rgds
--
Rémi Duraffort
LAVA Team, Linaro
Dear Lava users,
I'm trying to create a job that performs the following steps :
Deploy to flasher our custom tool (OK)
Boot in bootloader mode - u-boot (OK)
Test in interactive mode (OK)
From interactive mode start kernel (OK)
Launch a test from kernel (KO)
The last part fails because I don't know where to check the kernel prompt. I cannot add a boot stage and declare an additional prompt because I don't want to reboot, that would reset the configuration done in interactive mode.
Do you have any piece of advice or example of job showing how to proceed to manage the kernel prompt & autologin in such a job?
Best regards,
Denis
Hello,
A request from some Lava users internally.
We have 3 boot stages, TF-A, U-boot, and kernel.
Is there a way, in a Lava job, to test that these components' versions are the expected ones?
That would mean, as far as I understand, not testing the embedded software itself, but the Lava job log...
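One idea we were considering, rather than grepping the job log, is a rough,
untested sketch using an interactive test at the U-Boot prompt - the exact
syntax and the expected string are assumptions on my side:

- test:
    interactive:
    - name: uboot-version-check
      prompts: ["=> "]
      script:
      - command: version
        name: uboot-version
        successes:
        - message: "U-Boot 2018"    # expected version string (placeholder)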
Best regards,
Denis
Two questions here:
1. For a master-only instance, how do we upgrade without data loss?
2. For a slave-only instance, how do we upgrade without data loss?
Please suggest, thanks.
Hello,
the lava-dispatcher package installs a file in /etc/modules-load.d/lava-modules.conf, containing the following lines:
install ipv6 modprobe ipv6
install brltty /bin/false
On my debian 9.3 system, this file leads to failures when the systemd-modules-load service starts:
Job for systemd-modules-load.service failed because the control process exited with error code.
See "systemctl status systemd-modules-load.service" and "journalctl -xe" for details.
The details in journalctl say:
Jan 07 15:08:30 a048 systemd[1]: Starting Load Kernel Modules...
-- Subject: Unit systemd-modules-load.service has begun start-up
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- Unit systemd-modules-load.service has begun starting up.
Jan 07 15:08:30 a048 systemd-modules-load[2088]: Failed to find module 'install ipv6 modprobe ipv6'
Jan 07 15:08:30 a048 systemd-modules-load[2088]: Failed to find module 'install brltty /bin/false'
Jan 07 15:08:30 a048 systemd[1]: systemd-modules-load.service: Main process exited, code=exited, status=1/FAILURE
Jan 07 15:08:30 a048 systemd[1]: Failed to start Load Kernel Modules.
Obviously the "install" syntax is supported only in /etc/modprobe.d:
https://manpages.debian.org/stretch/kmod/modprobe.d.5.en.html
But not in /etc/modules-load.d:
https://manpages.debian.org/stretch/systemd/modules-load.d.5.en.html
Is this a known issue?
Mit freundlichen Grüßen / Best regards
Tim Jaacks
DEVELOPMENT ENGINEER
Garz & Fricke GmbH
Tempowerkring 2
21079 Hamburg
Direct: +49 40 791 899 - 55
Fax: +49 40 791899 - 39
tim.jaacks(a)garz-fricke.com
www.garz-fricke.com
WE MAKE IT YOURS!
Sitz der Gesellschaft: D-21079 Hamburg
Registergericht: Amtsgericht Hamburg, HRB 60514
Geschäftsführer: Matthias Fricke, Manfred Garz, Marc-Michael Braun
Folks,
We need to run some tests which require a process off the DUT to be executed.
Examples are:
* port scan like process against the DUT
* cli tests which interact with the DUT
The workflow could be like:
* Deploy the DUT
* Boot the DUT
* Run a process off DUT (actual test)
* Collect test results
In our setup we assume there is always a network connection between the DUT and the LAVA dispatcher.
How can we achieve such a workflow with LAVA?
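One shape this could take, assuming MultiNode is the answer - a minimal sketch
with placeholder device types, where a second role runs the off-DUT process and
the MultiNode API (lava-send/lava-wait) is used to pass the DUT's IP address
across to it:

protocols:
  lava-multinode:
    roles:
      dut:
        device_type: my-device-type    # placeholder
        count: 1
      tester:
        device_type: qemu              # placeholder role that runs the off-DUT process
        count: 1
        context:
          arch: amd64
    timeout:
      minutes: 10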
Cheers
--
Diego Russo | Staff Software Engineer | Mbed Linux OS
ARM Ltd. CPC1, Capital Park, Cambridge Road, Fulbourn, CB21 5XE, United Kingdom
http://www.diegor.co.uk - https://os.mbed.com/linux-os/
Dear all,
I need to get job logs using lavacli. To do that, I use the following command: lavacli jobs logs <job_id>
I can perform this operation with a user that has a "superuser" profile.
We also have users with restricted permissions (they can only submit jobs using lavacli), and I would like them
to be able to get logs using lavacli.
So I have looked, in the LAVA Django administration page, at the user permissions that can be set, but I could not find
which option would allow a "basic" user to get job logs using lavacli.
Is it possible to allow a user with a "basic" profile to do that?
If yes, which user permission has to be set?
Best regards
Philippe Begnic
Just use the sample command "python zmq_client.py -j 357 --hostname tcp://127.0.0.1:5500 -t 1200"
Get the error:
Traceback (most recent call last):
File "zmq_client_1.py", line 155, in <module>
main()
File "zmq_client_1.py", line 139, in main
publisher = lookup_publisher(options.hostname, options.https)
File "zmq_client_1.py", line 109, in lookup_publisher
socket = server.scheduler.get_publisher_event_socket()
File "/usr/lib/python3.5/xmlrpc/client.py", line 1092, in __call__
return self.__send(self.__name, args)
File "/usr/lib/python3.5/xmlrpc/client.py", line 1432, in __request
verbose=self.__verbose
File "/usr/lib/python3.5/xmlrpc/client.py", line 1134, in request
return self.single_request(host, handler, request_body, verbose)
File "/usr/lib/python3.5/xmlrpc/client.py", line 1146, in single_request
http_conn = self.send_request(host, handler, request_body, verbose)
File "/usr/lib/python3.5/xmlrpc/client.py", line 1259, in send_request
self.send_content(connection, request_body)
File "/usr/lib/python3.5/xmlrpc/client.py", line 1289, in send_content
connection.endheaders(request_body)
File "/usr/lib/python3.5/http/client.py", line 1103, in endheaders
self._send_output(message_body)
File "/usr/lib/python3.5/http/client.py", line 934, in _send_output
self.send(msg)
File "/usr/lib/python3.5/http/client.py", line 877, in send
self.connect()
File "/usr/lib/python3.5/http/client.py", line 849, in connect
(self.host,self.port), self.timeout, self.source_address)
File "/usr/lib/python3.5/socket.py", line 694, in create_connection
for res in getaddrinfo(host, port, 0, SOCK_STREAM):
File "/usr/lib/python3.5/socket.py", line 733, in getaddrinfo
for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name or service not known
I wonder, for the following code, why http://tcp://127.0.0.1:5500/RPC2 ends up being passed to ServerProxy?
If I remove the http part and use just tcp://127.0.0.1:5500/RPC2, it says "OSError: unsupported XML-RPC protocol".
If I remove tcp and use just http://127.0.0.1:5500/RPC2, it hangs in get_publisher_event_socket.
I am using the 2018.11 version; please advise!
xmlrpc_url = "http://%s/RPC2" % (hostname)
if https:
    xmlrpc_url = "https://%s/RPC2" % (hostname)
server = xmlrpc.client.ServerProxy(xmlrpc_url)
try:
    socket = server.scheduler.get_publisher_event_socket()
I am trying to submit test jobs using SQUAD.
When SQUAD fetches the results from LAVA, the interval is too long:
it takes at least 1 hour. I want to reduce the fetch time as much as possible.
This is what the SQUAD team answered:
Alternatively you can turn on ZMQ notifications in LAVA and run squad
listener. This will cause test results to be fetched immediately after
the test job finishes in LAVA. Enabling ZMQ publisher:
https://master.lavasoftware.org/static/docs/v2/advanced-installation.html#c….
SQUAD listener is a separate process. It doesn't need any additional
settings on top of what you already have.
So I want to try restarting the lava-publisher service.
I am running lava-server and the dispatcher using the Linaro LAVA docker image.
In the docker container, there is no service named lava-publisher.
How can I manage this?
Thanks.
Hello,
Our LAVA deployment has both RPi3 B and B+ boards and we are interested only in the 32-bit version. The device type to use looks like:
https://git.lavasoftware.org/lava/lava/blob/master/lava_scheduler_app/tests…
Does this device type cover both B and B+? I mean, can we use it for the B+ as well?
If yes, what's the best way to differentiate them in LAVA? Creating a new device type for the B+ (which would be the same as the above) or using tags?
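If tags are the way to go, my understanding is that the job side would just
look something like this (the tag name is a placeholder that a lab admin would
need to assign to the B+ boards only):

device_type: rpi3-b-32    # whatever the 32-bit template above is called
tags:
- rpi3bplus               # placeholder tag set only on the B+ units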
Thanks
--
Diego Russo | Staff Software Engineer | Mbed Linux OS
ARM Ltd. CPC1, Capital Park, Cambridge Road, Fulbourn, CB21 5XE, United Kingdom
http://www.diegor.co.uk - https://os.mbed.com/linux-os/
Hi,
I've been using the lava-server docker image (hub.lavasoftware.org/lava/lava/lava-server:2018.10) for a couple of weeks and I had some problems making the data persistent for postgres (mapping the volume from host to container). Anyway, I decided to take postgres out of the startup by modifying the entrypoint.sh file:
if [[ ! -z $DJANGO_POSTGRES_SERVER ]]; then
    txt="s/LAVA_DB_SERVER=\"localhost\"/LAVA_DB_SERVER=\"$DJANGO_POSTGRES_SERVER\"/g"
    sed -i $txt /etc/lava-server/instance.conf
fi
if [[ ! -z $DJANGO_POSTGRES_PORT ]]; then
    txt="s/LAVA_DB_PORT=\"5432\"/LAVA_DB_PORT=\"$DJANGO_POSTGRES_PORT\"/g"
    sed -i $txt /etc/lava-server/instance.conf
fi
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
---------------------------- Commented out these lines -----------------------------
# Start all services
#echo "Starting postgresql"
#/etc/init.d/postgresql start
#echo "done"
#echo
#echo "Waiting for postgresql"
#wait_postgresql
#echo "[done]"
#echo
-------------------------------------------- ----------------------------------------------------
After that I created a new Dockerfile and built a new image:
+++++++++++++++++++++++++ Dockerfile +++++++++++++++++++++++
FROM hub.lavasoftware.org/lava/lava/lava-server:2018.10
COPY ./entrypoint.sh /root/
RUN chmod 755 /root/entrypoint.sh
ENTRYPOINT ["/root/entrypoint.sh"]
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> sudo docker build -t lava-server:mod .
Just to get all up and running I made a docker-compose.yml file to kick postgres and lava-server up
+++++++++++++++++++++++++++ docker-compose.yml +++++++++++++++++++++++++
version: '3'
services:
  postgres:
    image: postgres:11
    restart: always
    environment:
      POSTGRES_DB: lavaserver
      POSTGRES_USER: lavaserver
      POSTGRES_PASSWORD: d3e5d13fa15f
  lava-server:
    depends_on:
      - postgres
    image: lava-server:mod
    environment:
      DJANGO_POSTGRES_PORT: 5432
      DJANGO_POSTGRES_SERVER: postgres
    ports:
      - "80:80"
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
I still feel that there is too much going on inside that lava-server:mod image, since it has the following software running:
* Lavamaster
* Gunicorn
* Logger
* Publisher
What do you think, should I break it into smaller pieces? The pros would be that the services wouldn't die silently (they are currently started with '&'; docker could try to restart them), logs would be in their own log streams (docker logs CONTAINER), and the containers themselves would be more configurable via environment variables.
Br
Olli Väinölä
On Wed, 16 Jan 2019 at 17:33, Steve McIntyre <steve.mcintyre(a)linaro.org> wrote:
>
> Hi,
>
> In founding the LAVA Software Community Project, the team planned to
> open up LAVA development more. As already announced by Neil in
> September, we have already moved our infrastructure to a GitLab
> instance and LAVA developers and users can collaborate there. [1]
>
> The next step in our process is to also open our regular development
> design meetings to interested developers. The LAVA design meeting is
> where the team gets together to work out deep technical issues, and to
> agree on future development goals and ideas. We run these as a weekly
> video conference using Google Hangouts Meet [2], We now wish to
> welcome other interested developers to join us there too, to help us
> develop LAVA.
Steve, the only missing bit is which day and what time? :)
milosz
>
> Summaries of the meetings will be posted regularly to the lava-devel
> mailing list [3], and we encourage interested people to subscribe and
> discuss LAVA development there.
>
> [1] https://git.lavasoftware.org/
> [2] https://meet.google.com/qre-rgen-zwc
> [3] https://lists.lavasoftware.org/mailman/listinfo/lava-devel
>
> Cheers,
> --
> Steve McIntyre steve.mcintyre(a)linaro.org
> <http://www.linaro.org/> Linaro.org | Open source software for ARM SoCs
> _______________________________________________
> Lava-announce mailing list
> Lava-announce(a)lists.lavasoftware.org
> https://lists.lavasoftware.org/mailman/listinfo/lava-announce
Hi everyone,
I want to change the default path of MEDIA_ROOT and ARCHIVE_ROOT.
So in /etc/lava-server-settings.conf I wrote this :
"MEDIA_ROOT": "/data/lava/var/lib/lava-server/default/media",
"ARCHIVE_ROOT": "/data/lava/var/lib/lava-server/default/archive",
This seems to work, but the web UI doesn't display the job output, and when I
try to download the plain log, it returns an error saying it can't be found.
Could you tell me what I should do to make the web UI find the job outputs?
Best regards,
Axel
In the thread about Git Authentication a solution was proposed using git credentials and the fact that the dispatcher is running as root.
See: https://lists.lavasoftware.org/pipermail/lava-users/2018-December/001455.ht…
I've worked out that even though the dispatcher is running as root, the environment is purged based upon env.yaml that is sent over from the master.
I found that I had to add HOME=/root into env.yaml on the master for the git clone to pick up the files in the /root folder.
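For reference, the change amounts to something like this in /etc/lava-server/env.yaml
(a sketch; the 'overrides' key is how I understand the file is structured, adjust to
your own instance):

overrides:
  HOME: /root   # so git finds the credentials stored under /root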
Hope this helps anyone else trying to do this.
Pete
Dear Remi,
I ran those commands you suggested in lava shell and it printed as below :
Python 3.5.3 (default, Jan 19 2017, 14:11:04)
[GCC 6.3.0 20170118] on linux
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>> from django.contrib.sites.models import Site
>>> Site.objects.all()
<QuerySet [<Site: 192.168.100.103>]>
>>> Site.objects.count()
1
>>>
Dear all,
I encountered a login issue. I changed the domain name and display name to an IP address, but when I try to log in again now, I can't reach the login page, and it reports as below:
500 Internal Server Error: Site matching query does not exist.
Oops, something has gone wrong!
I set 'DEBUG = True' in settings.conf, and it reported the following:
DoesNotExist at /accounts/login/
Site matching query does not exist.
Request Method: GET
Request URL: http://127.0.0.1:8000/accounts/login/?next=/scheduler/job/42
Django Version: 1.11.14
Exception Type: DoesNotExist
Exception Value: Site matching query does not exist.
Exception Location: /usr/lib/python3/dist-packages/django/db/models/query.py in get, line 380
Python Executable: /usr/bin/python3
Python Version: 3.5.3
Python Path: ['/', '/usr/bin', '/usr/lib/python35.zip', '/usr/lib/python3.5', '/usr/lib/python3.5/plat-x86_64-linux-gnu', '/usr/lib/python3.5/lib-dynload', '/usr/local/lib/python3.5/dist-packages', '/usr/local/lib/python3.5/dist-packages/icsectl-0.2-py3.5.egg', '/usr/lib/python3/dist-packages']
So I just want to know: how can I set it back?
Yours , sincerely
Su Chuan
Hello LAVA mailing list,
I see RPI3 is supported by LAVA but unfortunately it doesn't fit Mbed Linux OS (MBL) case. Our build system produces a WIC image which can be written on the SD card directly (instructions here: https://os.mbed.com/docs/linux-os/v0.5/getting-started/writing-and-booting-…).
This is needed for two main reasons:
* MBL has its own booting flow
* MBL expects a partition layout (not only the rootfs) with multiple partitions in order to work.
What's the best way to automate the deployment on MBL on RPI3 via LAVA in order to run our tests?
Cheers
--
Diego Russo | Staff Software Engineer | Mbed Linux OS
ARM Ltd. CPC1, Capital Park, Cambridge Road, Fulbourn, CB21 5XE, United Kingdom
http://www.diegor.co.uk - https://os.mbed.com/linux-os/
https://fosdem.org/2019/
(No registration is necessary for FOSDEM - if you can get to Brussels,
you are welcome to just turn up on site. Avoid trying to park near the
site unless you get there very early, use public transport.)
https://fosdem.org/2019/practical/
https://fosdem.org/2019/stands/
If you haven't been to FOSDEM before, note that there will be a huge
number of people attending FOSDEM (well over 8,000 are expected), so
the only real chance of meeting up is to have a known meeting point.
Hopefully, you'll have a chance to get to some of the 700 sessions
which have been arranged over the 2 days.
Linaro will have a stand at FOSDEM 2019 in the AW building. The stand
will be on the ground floor, along the corridor from AW120 near
MicroPython and PINE64. Various Linaro and LAVA people will be around
the stand during the event, so this can act as a focal point for
anyone wanting to talk about Linaro and or LAVA during FOSDEM.
https://fosdem.org/2019/schedule/room/aw1120/
The AW building is near the car park, across from Janson and the H building.
Various Linaro and LAVA people have been routinely attending FOSDEM
for a number of years. If you are able to get there, we will be happy
to see you.
--
Neil Williams
=============
neil.williams(a)linaro.org
http://www.linux.codehelp.co.uk/
Hi lava team,
I want to use lava for running shell scripts on several devices which are connected to the lava server via ssh.
Is there a way to deploy the overlay to another directory than "/"?
Example:
The overlay is downloaded to /lava-123.tar.gz and extracted to /lava-123 by default.
What if my rootfs "/" is read-only and I want to download and extract the overlay to /tmp or /data, which are read-write mounted?
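For illustration, this is the kind of override I mean (borrowing the
lava_test_results_dir job context that has come up on this list; whether it is
the supported way to handle a read-only rootfs is exactly my question):

context:
  lava_test_results_dir: '/data/lava-%s'   # assumption: /data is a read-write mount on the DUT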
Best regards,
Christoph
Hi,
I have some test files stored in a private GIT repository. As I understand it there is no easy way to get the authentication passed through to the Test Definition, so I am looking for other options.
Is there a way that I can write the Test Definition to reference the files elsewhere?
I can bundle them into a tarball or zipfile and have them served via http without the need for authentication, but I cannot figure out how to describe this in the Test Definition.
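To make the idea concrete, this is the sort of thing I have in mind - an inline
test definition that just downloads and unpacks the tarball over plain http
(the URL and file names are placeholders):

- test:
    definitions:
    - from: inline
      name: fetch-test-files
      path: inline/fetch-test-files.yaml
      repository:
        metadata:
          format: Lava-Test Test Definition 1.0
          name: fetch-test-files
          description: "fetch test files from an unauthenticated http server"
        run:
          steps:
          - wget http://fileserver.example.com/test-files.tar.gz   # placeholder URL
          - tar -xzf test-files.tar.gz
          - ./test-files/run.sh                                     # placeholder entry point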
Thanks.
Pete
Hello,
What is the rationale for ignoring individual lava-test-case results when running health check jobs?
For example, the following job failed one case: https://validation.linaro.org/scheduler/job/1902316
A failing test case can be an indication of device malfunction (e.g. out of disk space, hw issues). Is it possible to force LAVA to fail a health check, and thus put the device in a "bad" state, if one of the test cases is not successful?
Thanks,
Andrei Narkevitch
Cypress Semiconductors
This message and any attachments may contain confidential information from Cypress or its subsidiaries. If it has been received in error, please advise the sender and immediately delete this message.
After setting up a master and a worker, I added the worker to the master and also added a device to the worker.
Then I submitted a job, but the web page just shows the job in the "Submitted" state.
The same configuration seems to work on my local machine (with master and worker on the same machine).
I cannot find the root cause, as the device-type the job needs has an idle device and its health is good.
For this kind of issue, is there any way to debug? I mean, is there any log where I can find why LAVA cannot dispatch the job within its 20-second poll?
Hello,
please keep the mailing-list in copy when replying.
thanks for your help . I will discuss two questions (1. Change site
> issue ; 2 secure boot mode ; ) in this reply.
> 1) Change site issue : We changed both 'display name' and
> 'display name' (example.com ) to IP address , and now we found that we
> couldn't login in , when we pressed login in link on home page , it
> reported as below:
>
>
> *500 Internal Server Error**Site matching query does not exist.*
>
> *Oops, something has gone wrong! *
>
That's really strange. Could you update /etc/lava-server/settings.conf to
set "DEBUG" to true, restart lava-server-gunicorn, refresh the page and
send back the stack trace.
Keep in mind that settings.conf is a json file, so check that the syntax is
valid json.
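For example, after the edit the file should still be one valid JSON object; the only change is the DEBUG key (any other keys you already have stay as they are):
    {
      "DEBUG": true
    }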
> 2) Secure boot mode: at present we have solved it by using the
> 'ramdisk' keyword and forcing it not to decompress the uTee file, but we
> don't know whether we will need this keyword for something else in the future, so maybe it's
> not a good solution. If we find a better solution after we review all the
> source code, we will submit it!
>
Right.
Rgds
--
Rémi Duraffort
Dear all,
I'm trying to boot my devices under test in secure boot mode, which means I have to download dtb, kernel and uTee files and set the NFS serverip to mount the rootfs in U-Boot, so there is an extra uTee file. I checked the keywords in the deploy steps and there was no extra one for this action; we only found 'image', 'ramdisk', 'dtb' and 'rootfs' in job_templates.py under the lava_server directory. So what should I do to add this action to the deploy?
Dear all,
I'm having trouble with the LAVA site-changing function (at http://10.193.101.30/admin/sites/site/1/change/ ; please substitute your own IP address here). I changed the default domain name to the IP address and set a notification email in my job as below:
notify:
  recipients:
  - to:
      method: email
      email: temp(a)163.com
  criteria:
    status: complete
  verbosity: verbose
and then the notification email I received contained garbled text, as below:
Job details and log file: http://???/scheduler/job/19
So I just don't know what's wrong!
Hi all,
At NXP, we would like the result app to report regressions/progressions
between 2 tests.
We run CTS/VTS job on Android 9, and as we add some features to the OS (DRM
for example), we want to make sure we get the expected behavior. For that,
we need to know exactly which test case results differ between the
two most recent jobs. CTS runs about 400,000-500,000 test cases, which is
too heavy to process manually.
For example, the results page could show two tables: one the same as it
is at the moment, and the other listing the test case results that
differ from the previous job.
What do you think about this? I think this could be useful to everyone.
Should I submit an issue to https://git.lavasoftware.org/lava/lava/ ?
Best regards,
Axel
I'm deploying LAVA using the docker container and I am looking for the
correct paths to mount as volumes so as not to lose data between cluster
rollovers (and to have backups).
If I change instance.conf to point to a database that lives outside
of the container and gets backed up, and include these paths in my
volume mounts, is that all I need to do? Or are there additional
paths/files that should be included?
https://master.lavasoftware.org/static/docs/v2/admin-backups.html
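Based on that page, my rough guess (not verified) is that, besides a dump of the postgres database, at least these paths need to be volumes so they survive a rollover:
    -v /etc/lava-server:/etc/lava-server \
    -v /var/lib/lava-server:/var/lib/lava-server \
    -v /var/lib/postgresql:/var/lib/postgresql \
but I would like confirmation of whether that list is complete.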
If anyone knows anything about this thanks!
Hi all,
I am running LTP tests through LAVA on a Yocto based system.
If I run a LTP test suite (like syscalls) by directly calling runltp, LAVA will display it as a single TC, with a single PASS/FAIL verdict.
I think the test definition here solves this problem: https://git.linaro.org/qa/test-definitions.git/tree/automated/linux/ltp/ltp…
But couldn't find a job template that makes use of it.
Could you please let me know:
* If this could run on Yocto?
* Where to find an example job that makes use of this?
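My rough guess at the test action, with the parameter names taken from that definition (not yet verified on Yocto), is something like:
    - test:
        timeout:
          minutes: 60
        definitions:
        - repository: https://git.linaro.org/qa/test-definitions.git
          from: git
          path: automated/linux/ltp/ltp.yaml
          name: ltp-syscalls
          parameters:
            TST_CMDFILES: syscalls
            SKIP_INSTALL: 'true'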
Thanks and regards,
Philippe
Hi, I have a problem installing the LAVA server and dispatcher using the
docker images that Linaro offers.
I installed both images (server and dispatcher) on my local PC.
When I submit a job, the job is listed on the LAVA server,
but it remains in the 'Submitted' status and does not change.
When I visit {local ip address:port number}/scheduler/device/qemu01 on the server,
I can see a message like the one below.
Does this mean that a health-check job has to be registered before
submitting a test job? If so, how do I do that?
I have looked for a way to figure out this problem, but I couldn't find one.
Although I tried disabling the health check on this device and forcing
its health to 'Good',
the health status soon changes from Good to Bad (Invalid device configuration).
Below is what I did for installing LAVA server and dispatcher.
- LAVA Server
1) Pull docker image and run.
$ docker pull lavasoftware/lava-server:2018.11
$ docker run -itd --name new_lava_server --cap-add=NET_ADMIN \
    -p 9099:80 -p 5557:5555 -p 5558:5556 -h new_lava_server \
    lavasoftware/lava-server:2018.11
2) Create superuser
Create id as admin, pw as admin.
$ lava-server manage createsuperuser
3) Create token
Create a token for the admin account on the server web UI.
4) Add device type and device
$ lava-server manage device-types add qemu
5) Add device dictionary
$ lava-server manage devices add --device-type qemu --worker new_lava_slave qemu01
- LAVA dispatcher
1) Pull docker image and run.
$ docker pull lavasoftware/lava-dispatcher:2018.11
$ docker run -it --name new_lava_slave \
    -v /boot:/boot -v /lib/modules:/lib/modules -v /home/lava-slave/LAVA-TEST:/opt/share \
    -v /dev/bus/usb:/dev/bus/usb -v ~/.ssh/id_rsa_lava.pub:/home/lava/.ssh/authorized_keys:ro \
    -v /sys/fs/cgroup:/sys/fs/cgroup \
    --device=/dev/ttyUSB0 \
    -p 2022:22 -p 5555:5555 -p 5556:5556 \
    -h new_lava_slave \
    --privileged \
    -e LAVA_SERVER_IP="192.168.1.44" \
    -e "LOGGER_URL=tcp://192.168.1.44:5557" \
    -e "MASTER_URL=tcp://192.168.1.44:5558" \
    -e "DISPATCHER_HOSTNAME=--hostname=new_lava_slave" \
    lavasoftware/lava-dispatcher:2018.11
2) Submit the job file
$ ./submityaml.py -p -k apikey.txt qemu01.yaml
Below is the submityaml.py python code.
apikey.txt contains the token created on the server.
#!/usr/bin/python
import argparse
import os.path
import sys
import time
import xmlrpclib

SLEEP = 5
__version__ = 0.5
LAVA_SERVER_IP = "192.168.1.44"

def is_valid_file(parser, arg, flag):
    if not os.path.exists(arg):
        parser.error("The file %s does not exist!" % arg)
    else:
        return open(arg, flag)  # return an open file handle

def setup_args_parser():
    """Setup the argument parsing.

    :return The parsed arguments.
    """
    description = "Submit job file"
    parser = argparse.ArgumentParser(version=__version__,
                                     description=description)
    parser.add_argument("yamlfile", help="specify target job file",
                        metavar="FILE",
                        type=lambda x: is_valid_file(parser, x, 'r'))
    parser.add_argument("-d", "--debug", action="store_true",
                        help="Display verbose debug details")
    parser.add_argument("-p", "--poll", action="store_true",
                        help="poll job status until job completes")
    parser.add_argument("-k", "--apikey", default="apikey.txt",
                        help="File containing the LAVA api key")
    parser.add_argument("--port", default="9099",
                        help="LAVA/Apache default port number")
    return parser.parse_args()

def loadConfiguration():
    global args
    args = setup_args_parser()

def loadJob(server_str):
    """loadJob - read the YAML job file for submission
    """
    return args.yamlfile.read()

def submitJob(yamlfile, server):
    """submitJob - XMLRPC call to submit the job file

    returns jobid of the submitted job
    """
    # When making the call to submit_job, you have to send a string
    jobid = server.scheduler.submit_job(yamlfile)
    return jobid

def monitorJob(jobid, server, server_str):
    """monitorJob - poll the scheduler until the job completes
    """
    if args.poll:
        sys.stdout.write("Job polling enabled\n")
        # wcount = number of times we loop while the job is waiting to start
        wcount = 0
        # count = number of times we loop while the job is running
        count = 0
        f = open("job_status.txt", "w+")
        while True:
            status = server.scheduler.job_status(jobid)
            if status['job_status'] == 'Complete':
                f.write("Complete\n")
                break
            elif status['job_status'] == 'Canceled':
                f.write("Canceled\n")
                print '\nJob Canceled'
                exit(0)
            elif status['job_status'] == 'Submitted':
                sys.stdout.write("Job waiting to run for % 2d seconds\n" % (wcount * SLEEP))
                sys.stdout.flush()
                wcount += 1
            elif status['job_status'] == 'Running':
                sys.stdout.write("Job Running for % 2d seconds\n" % (count * SLEEP))
                sys.stdout.flush()
                count += 1
            else:
                f.write("unknown status\n")
                print "unknown status"
                exit(0)
            time.sleep(SLEEP)
        print '\n\nJob Completed: ' + str(count * SLEEP) + ' s (' + str(wcount * SLEEP) + ' s in queue)'

def process():
    print "Submitting test job to LAVA server"
    loadConfiguration()
    user = "admin"
    with open(args.apikey) as f:
        line = f.readline()
        apikey = line.rstrip('\n')
    server_str = 'http://' + LAVA_SERVER_IP + ":" + args.port
    xmlrpc_str = 'http://' + user + ":" + apikey + "@" + LAVA_SERVER_IP + ":" + args.port + '/RPC2/'
    print server_str
    print xmlrpc_str
    server = xmlrpclib.ServerProxy(xmlrpc_str)
    server.system.listMethods()
    yamlfile = loadJob(server_str)
    jobid = submitJob(yamlfile, server)
    monitorJob(jobid, server, server_str)

if __name__ == '__main__':
    process()
The job file named qemu01.yaml is below.

# Your first LAVA JOB definition for an x86_64 QEMU
device_type: qemu
job_name: QEMU pipeline, first job
timeouts:
  job:
    minutes: 15
  action:
    minutes: 5
  connection:
    minutes: 2
priority: medium
visibility: public
# context allows specific values to be overridden or included
context:
  # tell the qemu template which architecture is being tested
  # the template uses that to ensure that qemu-system-x86_64 is executed.
  arch: amd64
metadata:
  # please change these fields when modifying this job for your own tests.
  docs-source: first-job
  docs-filename: qemu-pipeline-first-job.yaml
# ACTION_BLOCK
actions:
- deploy:
    timeout:
      minutes: 5
    to: tmpfs
    images:
      rootfs:
        image_arg: -drive format=raw,file={rootfs}
        url: https://images.validation.linaro.org/kvm/standard/stretch-2.img.gz
        compression: gz
# BOOT_BLOCK
- boot:
    timeout:
      minutes: 2
    method: qemu
    media: tmpfs
    prompts: ["root@debian:"]
    auto_login:
      login_prompt: "login:"
      username: root
- test:
    timeout:
      minutes: 5
    definitions:
    - repository: http://git.linaro.org/lava-team/lava-functional-tests.git
      from: git
      path: lava-test-shell/smoke-tests-basic.yaml
      name: smoke-tests
Hello,
I have noticed that sometimes, when running health checks, LAVA gets stuck during the HTTP download of the kernel and ramdisk.
For example, in [1] there seems to be a 3 minute timeout for the deploy images section, but LAVA didn't pick this up and was stuck there for 17 hours. After the job was cancelled and the device health was manually set back to Unknown, the health check succeeded (e.g. job 25 on the same LAVA instance).
I am running LAVA 2018.7.
[1] https://lava.ciplatform.org/scheduler/job/20
Thanks,
Patryk
Hello everyone,
I have written a backup script for my LAVA instance. While testing the restore process I stumbled upon issues. Are there any dependencies between the master and workers concerning backups? When the master crashes, but the worker does not, is it safe to restore the master only and keep the worker as it is? Or do I have to keep master and worker backups in sync and always restore both at the same time?
Restoring my master as described in the LAVA docs generally works. The web interface is back online, all the jobs and devices are in consistent states.
Restoring the worker is relatively easy, according to the docs. I installed the LAVA packages in their previous versions on a fresh (virtual) machine, restored /etc/lava-dispatcher/lava-slave and /etc/lava-coordinator/lava-coordinator.conf. The worker has status "online" in the LAVA web interface afterwards, so the communication seems to work.
However, starting a multinode job does not work. The job log says:
lava-dispatcher, installed at version: 2018.5.post1-2~bpo9+1
start: 0 validate
Start time: 2018-12-18 12:25:14.335215+00:00 (UTC)
This MultiNode test job contains top level actions, in order, of: deploy, boot, test, finalize
lxc, installed at version: 1:2.0.7-2+deb9u2
validate duration: 0.01
case: validate
case_id: 112
definition: lava
result: pass
Initialising group b6eb846d-689f-40c5-b193-8afce41883ee
Connecting to LAVA Coordinator on lava-server-vm:3079 timeout=90 seconds.
This comes out in a loop, until the job times out.
The lava-slave logfile says:
2018-12-18 12:27:15,114 INFO master => START(12)
2018-12-18 12:27:15,117 INFO [12] Starting job
[...]
2018-12-18 12:27:15,124 DEBUG [12] dispatch:
2018-12-18 12:27:15,124 DEBUG [12] env : {'overrides': {'LC_ALL': 'C.UTF-8', 'LANG': 'C', 'PATH': '/usr/local/bin:/usr/local/sbin:/bin:/usr/bin:/usr/sbin:/sbin'}, 'purge': True}
2018-12-18 12:27:15,124 DEBUG [12] env-dut :
2018-12-18 12:27:15,129 ERROR [EXIT] 'NoneType' object has no attribute 'send_start_ok'
2018-12-18 12:27:15,129 ERROR 'NoneType' object has no attribute 'send_start_ok'
It is the "job = jobs.create()" call in lava-slave's handle_start() routine which fails. Obviously there is a separate database on the worker (of which I did not know until now), which fails to be filled with values. Does this database have to be backup'ed and restored? What is the purpose of this database? Is there anything I need to know about it concerning backups?
Mit freundlichen Grüßen / Best regards
Tim Jaacks
DEVELOPMENT ENGINEER
Garz & Fricke GmbH
Tempowerkring 2
21079 Hamburg
Direct: +49 40 791 899 - 55
Fax: +49 40 791899 - 39
tim.jaacks(a)garz-fricke.com
www.garz-fricke.com
WE MAKE IT YOURS!
Sitz der Gesellschaft: D-21079 Hamburg
Registergericht: Amtsgericht Hamburg, HRB 60514
Geschäftsführer: Matthias Fricke, Manfred Garz, Marc-Michael Braun
Please make sure you include the mailing list in all replies so that
others know when a problem has been fixed (and how it was fixed)
On Tue, 18 Dec 2018 at 12:00, Chuan Su <lavanxp(a)126.com> wrote:
>
> According to your comments, we checked our setup and found that we use ser2net & telnet to communicate with the DUT; however, ser2net sets the default timeout parameter to 600 seconds. When the DUT runs a long-duration case (more than 600 seconds) without any log output, the connection is usually dropped by ser2net, and the telnet program prints 'Connection closed by foreign host'. Anyway, thanks for your help!
See https://git.linaro.org/lava/lava-lab.git/tree/shared/server-configs/ser2net…
The Linaro lab in Cambridge sets all the ser2net configs to have a zero timeout.
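As an illustration only (the port number and device path are examples), a classic ser2net.conf entry with the timeout field (the third field) set to zero looks like:
7001:telnet:0:/dev/ttyUSB0:115200 8DATABITS NONE 1STOPBIT banner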
> Sincerely,
> Chuan Su
>
>
>
>
>
> At 2018-12-18 15:59:00, "Neil Williams" <neil.williams(a)linaro.org> wrote:
> >On Tue, 18 Dec 2018 at 06:16, Chuan Su <lavanxp(a)126.com> wrote:
> >>
> >> Dear all,
> >> We are encountered with an issue that our job always exits halfway when running a long duration test case (around 20 minutes) which outputs nothing , and lava server reports an InfrastructureError error and prints as below :
> >> Connection closed by foreign host.Marking unfinished test run as failed
> >
> >Connection closed by foreign host means that the serial connection
> >failed at the DUT - this is not a problem in the LAVA test job, this
> >is an infrastructure failure at your end. The foreign host (the DUT)
> >closed the serial connection. There is nothing LAVA can do about that.
> >The serial connection to the DUT has simply failed.
> >
> >If the serial connection is USB, check for logs on the worker like
> >/var/log/messages and /var/log/syslog for events related to the serial
> >connection. Check that the DUT didn't simply kill the serial
> >connection - maybe the DUT went into some kind of suspend mode.
> >
> >> definition: lava
> >> result: fail
> >> case: 0_apache-servers1
> >> uuid: 597_1.4.2.4.1
> >> duration: 603.53
> >> lava_test_shell connection dropped.end: 3.1 lava-test-shell (duration 00:10:05) [ns_s1]
> >> namespace: ns_s1
> >> extra: ...
> >> definition: lava
> >> level: 3.1
> >> result: fail
> >> case: lava-test-shell
> >> duration: 604.55
> >> lava-test-retry failed: 1 of 1 attempts. 'lava_test_shell connection dropped.'lava_test_shell connection dropped.
> >>
> >> And we just test it with a very simple python script as below:
> >> #!/usr/bin/env python3
> >> import time
> >> print('Hello,world!')
> >> time.sleep(1200)
> >> print("Hello,Lava!")
> >> We can see 'Hello,world!' string outputs , but there's no more output of this program found on webUI!
> >> We just don't know what's wrong , so we have to mail to you for help!
> >> Sincerely,
> >> Chuan Su
> >>
> >>
> >>
> >>
> >> _______________________________________________
> >> Lava-users mailing list
> >> Lava-users(a)lists.lavasoftware.org
> >> https://lists.lavasoftware.org/mailman/listinfo/lava-users
> >
> >
> >
> >--
> >
> >Neil Williams
> >=============
> >neil.williams(a)linaro.org
> >http://www.linux.codehelp.co.uk/
>
>
>
>
--
Neil Williams
=============
neil.williams(a)linaro.org
http://www.linux.codehelp.co.uk/
https://git.lavasoftware.org/lava/lava/issues/179
If your Lava-Test Test Definition 1.0 YAML files explicitly use a
parse: block (like:
https://git.linaro.org/qa/test-definitions.git/tree/automated/linux/ltp/ltp…)
then this will remain supported in Definition 1.0.
If you use the monitors or interactive test actions, this does not
affect you at all.
If you rely on LAVA to create a TestCase based on a command in the
Lava-Test Test Definition just echoing "pass" or "fail", then this is
the Default Pattern and this change will directly affect those test
jobs.
The current Default Pattern and Fixup are lifted directly from V1
(https://git.lavasoftware.org/lava/lava/blob/master/lava_common/constants.py…):
# V1 compatibility
DEFAULT_V1_PATTERN = "(?P<test_case_id>.*-*)\\s+:\\s+(?P<result>(PASS|pass|FAIL|fail|SKIP|skip|UNKNOWN|unknown))"
DEFAULT_V1_FIXUP = {
    "PASS": "pass",
    "FAIL": "fail",
    "SKIP": "skip",
    "UNKNOWN": "unknown",
}
We've recently updated the documentation to drop mention of the
default pattern support for the following reasons:
* It has always been problematic to encode a Python regular expression
in YAML. Failures are difficult to debug and patterns are global for
the entire test operation.
* The move towards more portable test definitions puts the emphasis on
parsing the test output locally on the DUT using a customised parser.
This has further advantages:
* The pattern does not have to be mangled into YAML
* The pattern can be implemented by a language other than Python
* The pattern can change during the operation of the test shell,
e.g. a different pattern may be required for setup than for the test
itself.
We are now starting to plan for Lava-Test Test Definition 2.0 with an
emphasis on requiring portable test scripts and removing more of the
lava_test_shell Test Helper scripts. Full information on 2.0 will be
available early in 2019.
As a first step, the generally unhelpful Default Pattern and Default
Fixup dict are likely to be removed. If you need this support, the
pattern can be added to your Lava-Test Test Definition 1.0 YAML files.
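For example, a minimal Lava-Test Test Definition 1.0 file carrying the current default pattern and fixup explicitly would look something like this (a sketch only, with placeholder names):
metadata:
  format: Lava-Test Test Definition 1.0
  name: explicit-pattern-example
  description: "carry the old default pattern explicitly"
parse:
  pattern: '(?P<test_case_id>.*-*)\s+:\s+(?P<result>(PASS|pass|FAIL|fail|SKIP|skip|UNKNOWN|unknown))'
  fixupdict:
    PASS: pass
    FAIL: fail
    SKIP: skip
    UNKNOWN: unknown
run:
  steps:
    - echo "example-test-case: pass"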
In the next release, it is proposed that unless an explicit pattern is
specified in the Lava-Test Test Definition 1.0 YAML file, then no
pattern will be implemented. Processes which echo "pass" or "fail"
would be ignored and no TestCase would be created.
Let us know if there are any thoughts or problems on this proposal.
--
Neil Williams
=============
neil.williams(a)linaro.org
http://www.linux.codehelp.co.uk/
Dear all,
We have encountered an issue where our job always exits halfway when running a long-duration test case (around 20 minutes) which outputs nothing, and the LAVA server reports an InfrastructureError and prints the following:
Connection closed by foreign host.Marking unfinished test run as failed
definition: lava
result: fail
case: 0_apache-servers1
uuid: 597_1.4.2.4.1
duration: 603.53
lava_test_shell connection dropped.end: 3.1 lava-test-shell (duration 00:10:05) [ns_s1]
namespace: ns_s1
extra: ...
definition: lava
level: 3.1
result: fail
case: lava-test-shell
duration: 604.55
lava-test-retry failed: 1 of 1 attempts. 'lava_test_shell connection dropped.'lava_test_shell connection dropped.
We tested it with a very simple Python script, shown below:
#!/usr/bin/env python3
import time
print('Hello,world!')
time.sleep(1200)
print("Hello,Lava!")
We can see the 'Hello,world!' string in the output, but no further output from this program appears on the web UI!
We just don't know what's wrong, so we are mailing you for help!
Sincerely,
Chuan Su
Hi everyone,
Is it possible to handle git authentication in a test job?
I need LAVA to clone a repo that can't be made public,
and obviously the clone fails at the authentication step.
So is it possible to specify a password or a token?
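For example, would embedding a token in the clone URL be the expected way, something like:
    repository: https://oauth2:MY_TOKEN@gitlab.example.com/group/repo.git
(host and token being placeholders), or is there a cleaner mechanism?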
Best regards,
Axel
Dear all,
We found that when LAVA executes a script which outputs a long string (more than 30000 bytes) on a single line (only one line break), the LAVA web UI hangs, no more LAVA log output appears, and the devices under test (DUTs) stay powered until the LAVA job timeout triggers. However, after checking the whole log file, we found that the cases after the hanging case were in fact executed (new files were generated).
So the problem is that when LAVA encounters such cases, the web UI hangs and the DUTs may not be powered off when all the cases have completed!
best wishes,
Chuan Su
On Mon, 11 Dec 2018 at 11:30, Neil Williams <neil.williams at linaro.org> wrote:
> On Tue, 11 Dec 2018 at 11:28, Tim Jaacks <tim.jaacks(a)garz-fricke.com> wrote:
> >
> > Thanks, the CLI operations are very helpful for automating the process.
> > However, the docs say that all devices in "Reserved" state have to
> > have their "current job" cleared. I can use "lava-server manage devices details"
> > to check whether this field is actually set. There is no command to
> > modify it, though. Seems like using the Python API is the only way to
> > go here, right? The same applies to setting "Running" jobs to "Cancelled".
>
> https://git.lavasoftware.org/lava/lava/merge_requests/273
>
> This should get into the upcoming 2018.12 release.
Thank you very much for your quick help. The "lava-server manage jobs fail"
command takes care of clearing the "current job" field of the associated
device, do I understand that right?
Mit freundlichen Grüßen / Best regards
Tim Jaacks
Hi folks,
We at Fairphone have developed a variant of the Tradefed-runner in LAVA
test-definitions that is meant to run complete Tradefed test suites on
multiple devices by making use of the shards feature in Tradefed. The
runner is currently in “staging” state. We still want to share now what
we are using and developing to see if there are more people with
interest in it. Feedback on the general approach taken would also be
much appreciated.
On the higher level, our setup works as follows:
• Use MultiNode to allocate multiple devices for one test submission.
• One “master” runs the Tradefed shell, similarly as in the
existing runner.
• The master connects to the workers’ DUTs via adb TCP/IP. These
DUTs are transparently available to Tradefed just in the same way as
USB-attached devices.
• Workers ensure that their respective DUTs remain accessible to
the master, especially in case of WLAN disconnects, reboots, crashes, etc.
Major features of our runner:
• Support for Android CTS, GTS and STS.
• Test run split into “shards” in Tradefed to run tests in parallel
on multiple devices. This allows for a major speedup when running large
test suites.
• Tradefed retry: Rerun test suites until the failure count stabilizes.
• No adb root required.
• Based on the original Tradefed runner, having at least parts of
the common code moved to python libraries.
Current limitations:
• Test executions are not always stable. This needs further
investigation.
• Test executions produce more false positives than local test
runs. This needs further investigation but is at least partially due to
using adb TCP/IP instead of a local USB connection.
• Android VTS not implemented (would require only minor changes)
Our current changes have been pushed to the tradefed_shards_with_retry
topic on Gerrit[1]. Besides the two major changes to add MultiNode adb
support and then Tradefed support on top of that, a couple of smaller
changes that could be useful on their own have also been pushed.
We are looking forward to your feedback and to joint efforts in
automating and speeding up Tradefed test executions!
Best regards,
Karsten for the Fairphone Software Team
[1]
https://review.linaro.org/q/topic:%22tradefed_shards_with_retry%22+(status:…
On Mon, 10 Dec 2018 at 20:16, Neil Williams <neil.williams at linaro.org> wrote:
> Yes, there is a problem there - thanks for catching it. I think the
> bulk of the page dates from the last stages of the migration when V1
> data was still around. I'll look at an update of the page tomorrow.
> Step 7 is a sanity check that the install of the empty instance has
> gone well, Step 9 is to ensure that the newly restored database is put
> into maintenance as soon as possible to prevent any queued test jobs
> from attempting to start. The critical element of Step 9 is to ensure
> that the lava-master service is stopped.
>
> The emphasis of the section is on ensuring that the instance only
> serves a "Maintenance" page, e.g. the default Debian "It works!"
> apache page, to prevent access to the instance during the restore.
Thanks for pointing that out, Neil. I got the point, that the Apache
server has to serve a static site during the restore process.
> Accessing the UI would involve having an alternative way to serve the
> pages. If that can be arranged, just for admins, (e.g. by changing the
> external routing to the box or redirecting DNS temporarily) then the
> UI on the instance can be used with the change that the
> lava-server-gunicorn service does not need to be stopped (because
> access has been redirected). Other services would be stopped. However,
> this would involve a fair number of apache config changes, so is best
> left to those admins who have such config already on hand.
>
> The operations can be done from the command line and that's probably
> best for these docs.
>
> Step 7 can be replaced by:
>
> lava-server manage check --deploy
>
> Step 9 can be replaced by looping over:
>
> lava-server manage devices update --health MAINTENANCE --hostname ${HOSTNAME}
>
> or, if there are a lot of devices:
>
> lava-server manage maintenance --force
>
> (This maintenance helper has been fixed in master - soon to be 2018.12
> - so older versions would use the first command & loop.)
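For my own notes, I assume the loop over devices for older versions would simply be something like the following, with our real hostnames substituted:
    for HOSTNAME in device-01 device-02; do
        lava-server manage devices update --health MAINTENANCE --hostname "${HOSTNAME}"
    done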
Thanks, the CLI operations are very helpful for automating the process.
However, the docs say that all devices in "Reserved" state have to have
their "current job" cleared. I can use "lava-server manage devices details"
to check whether this field is actually set. There is no command to
modify it, though. Seems like using the Python API is the only way to go
here, right? The same applies to setting "Running" jobs to "Cancelled".
> I'll look at changing the page to use CLI operations for steps 7 and
> 9. Some labs can do the http redirect / routing method but the detail
> of that is probably not in scope for this page in the LAVA docs. I'll
> add a note that admins have that choice but leave it for those admins
> to implement.
Mit freundlichen Grüßen / Best regards
Tim Jaacks
Hello everyone,
I am trying to implement a backup and restore routine for our LAVA server, based on the documentation:
https://validation.linaro.org/static/docs/v2/admin-backups.html#restoring-a…
The creation of the backup is straight-forward. I have problems with the order of the proposed restore steps, though.
Step 6 is "Stop all LAVA services". However, afterwards in step 7 it says "Make sure that this instance actually works by browsing a few (empty) instance pages." This should obviously be done before, right?
The actual problem is that step 9 says "In the Django administration interface, take all devices which are not Retired into Offline". This cannot be an ordering issue, because the LAVA services actually must not be available during these modifications. How do I use the Django admin interface, while all LAVA services are stopped?
Mit freundlichen Grüßen / Best regards
Tim Jaacks