pabot & Run Process: Random failure and robot_stderr.out says "killed"


CHIDAMBARANATHAN RAMACHANDRAN

Feb 13, 2023, 9:38:03 PM
to robotframework-users
Team,

We are using pabot to parallelize at the test-case level. The command is below for reference:

pabot --pabotlib --testlevelsplit --processes ${PARALLEL_PROCESSES} \
    --verbose --loglevel DEBUG:INFO \
    --outputdir "${ROBOT_REPORTS_DIR}/${ROBOT_TEST_SUITE_REPORT_DIR}" \
    $([ -n "${include_tag}" ] && echo "${include_tag}") \
    $([ -n "${exclude_tag}" ] && echo "${exclude_tag}") \
    "${ROBOT_TESTS_DIR}/${ROBOT_TEST_SUITE_TO_RUN}"



In one of our higher-level keywords (used by one of the test cases), we use the "Run Process" keyword to download a file from a path and then use it internally.

Run Process    curl -k --header "X-JFrog-Art-Api:%{REPO_API_KEY}" --fail <path_to_file>/${archivefile} -o ${archivefile} && ls -lrth ${archivefile}   shell=True

The entire higher-level keyword looks like this:
   
    [Arguments]    ${username}    ${password}    ${archivefile}    ${app_id}=""    ${expected_status_code}=200
    Run Process    curl -k --header "X-JFrog-Art-Api:%{REPO_API_KEY}" --fail <path_to_file>/${archivefile} -o ${archivefile} && ls -lrth ${archivefile}   shell=True
    ${fileData}=    Get Binary File  ${archivefile}
    &{fileParts}=  Create Dictionary
    Set To Dictionary  ${fileParts}  image=${fileData}
    create app session    username=${username}    password=${password}
    POST On Session    appsession    ${app_endpoint}    files=${fileParts}    expected_status=${expected_status_code}


When this keyword is called, it sometimes fails at the "Run Process" step; the failure is random. The rest of the Robot report is green, except for the failed test case (in fact, that test case is missing from the report entirely).

In the "pabot_results" folder, the "robot_stderr.out" file for this particular test case contains:
"Killed"

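A quick way to find every pabot worker that ended this way is to scan the per-test stderr files. This is a minimal Python sketch, assuming only that each worker writes a file named robot_stderr.out somewhere under pabot_results (it searches recursively instead of assuming pabot's exact directory layout); the function name find_killed_runs is hypothetical. "Killed" is typically what a shell prints when a child process is terminated with SIGKILL, which is the signal the kernel OOM killer sends.

```python
from pathlib import Path


def find_killed_runs(results_dir):
    """Return the robot_stderr.out files under results_dir that contain
    the word "Killed" (hypothetical helper; layout-agnostic)."""
    hits = []
    # rglob searches all sub-directories, so no pabot layout is assumed.
    for stderr_file in sorted(Path(results_dir).rglob("robot_stderr.out")):
        text = stderr_file.read_text(errors="replace")
        if "Killed" in text:
            hits.append(stderr_file)
    return hits


if __name__ == "__main__":
    import sys

    root = sys.argv[1] if len(sys.argv) > 1 else "pabot_results"
    for path in find_killed_runs(root):
        print(path)
```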
Since we run the test automation in Jenkins, the Jenkins log shows the following:
[2023-02-13T13:14:55.989Z] ++ testExitCode=252
[2023-02-13T13:14:55.989Z] ++ [[ -n 252 ]]
[2023-02-13T13:14:55.989Z] ++ [[ 252 -ne 0 ]]
[2023-02-13T13:14:55.989Z] ++ echo -e 'Robot command failed with exit code 252'
[2023-02-13T13:14:55.989Z] ++ exit 252
[2023-02-13T13:14:55.989Z] Robot command failed with exit code 252
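The Robot Framework User Guide documents its return codes: 0 means all tests passed, 1-249 is the number of failed tests, 250 means 250 or more failures, 251 means help or version output was printed, 252 means invalid test data or command line options, 253 means execution was stopped by the user, and 255 is an unexpected internal error. The sketch below encodes that table in Python (describe_robot_rc is a hypothetical helper). We have not checked how pabot chooses its own exit code when one of its worker processes dies, so treat the mapping as a hint rather than a diagnosis.

```python
def describe_robot_rc(rc: int) -> str:
    """Map a Robot Framework return code to its documented meaning."""
    if rc == 0:
        return "all tests passed"
    if 1 <= rc <= 249:
        # Codes 1-249 are the number of failed tests.
        return f"{rc} test(s) failed"
    meanings = {
        250: "250 or more tests failed",
        251: "help or version information printed",
        252: "invalid test data or command line options",
        253: "test execution stopped by user",
        255: "unexpected internal error",
    }
    return meanings.get(rc, "unknown return code")


if __name__ == "__main__":
    print(describe_robot_rc(252))
```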


We are wondering what the issue could be (we suspect it has something to do with the combination of "Run Process" and pabot). It would be really helpful if someone could guide us in debugging this.

Thanks,
Nathan

CHIDAMBARANATHAN RAMACHANDRAN

Feb 14, 2023, 5:07:50 AM
Some more information:

Versions:
robotframework 4.0.3 
robotframework-pabot 2.0.0

We are running the Robot test cases inside a container (we built our own container image that includes Robot Framework and the other required libraries). This container is spun up in one stage of the Jenkins pipeline, and all the test cases are run inside it.

When "Killed" appears in "robot_stderr.out", the kernel log on the Jenkins machine that hosts the container shows the following:


Feb 14 10:39:51 node-10-210-174-200 kernel: [4666225.492762] robot invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
Feb 14 10:39:51 node-10-210-174-200 kernel: [4666225.492765] CPU: 2 PID: 2906382 Comm: robot Not tainted 5.4.0-135-generic #152-Ubuntu
Feb 14 10:39:51 node-10-210-174-200 kernel: [4666225.492766] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 12/12/2018
Feb 14 10:39:51 node-10-210-174-200 kernel: [4666225.492767] Call Trace:
Feb 14 10:39:51 node-10-210-174-200 kernel: [4666225.492776]  dump_stack+0x6d/0x8b
Feb 14 10:39:51 node-10-210-174-200 kernel: [4666225.492782]  dump_header+0x4f/0x1eb
Feb 14 10:39:51 node-10-210-174-200 kernel: [4666225.492784]  oom_kill_process.cold+0xb/0x10
Feb 14 10:39:51 node-10-210-174-200 kernel: [4666225.492788]  out_of_memory+0x1cf/0x500
Feb 14 10:39:51 node-10-210-174-200 kernel: [4666225.493057] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=28d71069fb462403900e1e5d7d0d36d7159c6653a67a31d525d2993c37fd0e39,mems_allowed=0,global_oom,task_memcg=/system.slice/netdata.service,task=netdata,pid=2499611,uid=112
Feb 14 10:39:51 node-10-210-174-200 kernel: [4666225.493152] Out of memory: Killed process 2499611 (netdata) total-vm:593480kB, anon-rss:111916kB, file-rss:0kB, shmem-rss:0kB, UID:112 pgtables:564kB oom_score_adj:1000
Feb 14 10:39:55 node-10-210-174-200 kernel: [4666229.989867] robot invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
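When logs like these are long, it helps to pull out only who triggered the OOM killer and which process it actually killed. In the excerpt above, robot is the process that invoked the OOM killer, while the victim shown is netdata (which had oom_score_adj 1000). This is a minimal Python sketch whose regular expressions are written against the exact line formats quoted above; other kernel versions may format these lines differently, and summarize_oom is a hypothetical name.

```python
import re

# "<name> invoked oom-killer": the task whose allocation triggered the OOM killer.
INVOKER_RE = re.compile(r"kernel: \[[^\]]+\] (?P<name>\S+) invoked oom-killer")
# "Out of memory: Killed process <pid> (<name>) ... anon-rss:<n>kB": the victim.
KILLED_RE = re.compile(
    r"Out of memory: Killed process (?P<pid>\d+) \((?P<name>[^)]+)\)"
    r".*?anon-rss:(?P<anon_rss_kb>\d+)kB"
)


def summarize_oom(lines):
    """Return (invokers, victims); victims are (pid, name, anon_rss_kb)."""
    invokers, victims = [], []
    for line in lines:
        m = INVOKER_RE.search(line)
        if m:
            invokers.append(m.group("name"))
        m = KILLED_RE.search(line)
        if m:
            victims.append(
                (int(m.group("pid")), m.group("name"), int(m.group("anon_rss_kb")))
            )
    return invokers, victims
```

On a live host the same function can be fed the output of `dmesg` or the kernel log file line by line.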

CHIDAMBARANATHAN RAMACHANDRAN

Feb 15, 2023, 12:21:33 AM
On further analysis, we found that:

The "Get Binary File" keyword used in the higher-level keyword, not "Run Process", is the cause of this failure:
    [Arguments]    ${username}    ${password}    ${archivefile}    ${app_id}=""    ${expected_status_code}=200
    Run Process    curl -k --header "X-JFrog-Art-Api:%{REPO_API_KEY}" --fail <path_to_file>/${archivefile} -o ${archivefile} && ls -lrth ${archivefile}   shell=True
    ${fileData}=    Get Binary File  ${archivefile}
    &{fileParts}=  Create Dictionary
    Set To Dictionary  ${fileParts}  image=${fileData}
    create app session    username=${username}    password=${password}
    POST On Session    appsession    ${app_endpoint}    files=${fileParts}    expected_status=${expected_status_code}

The "Get Binary File" keyword consumes memory in proportion to the size of the input file.

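The proportional growth is easy to see in plain Python. This minimal sketch uses tracemalloc to compare reading a file in one call (roughly what loading a whole file into a variable does) with reading it in fixed-size chunks. It is an illustration under that assumption, not Robot Framework's implementation of Get Binary File, and it does not explain the further amplification reported in this thread; it only shows that a whole-file read costs at least the file size, while a chunked read stays bounded by the chunk size. All function names are hypothetical.

```python
import tracemalloc


def peak_bytes(func, *args):
    """Run func(*args) and return the peak traced Python allocation in bytes."""
    tracemalloc.start()
    try:
        func(*args)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


def read_whole(path):
    # The entire content becomes a single bytes object.
    with open(path, "rb") as f:
        return len(f.read())


def read_chunked(path, chunk_size=1024 * 1024):
    # Only about one chunk (plus the previous one) is alive at any time.
    total = 0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            total += len(chunk)
    return total
```

With an 8 MB file, read_whole peaks at 8 MB or more, while read_chunked stays at a few megabytes regardless of file size.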
We use an input file of approximately 270 MB. The resulting memory consumption (about 12 GB) is shown in the attached screenshot (memory_size.jpg).

This becomes a problem when the same keyword is called in multiple test cases running in parallel under pabot, which leads to an "oom-kill" by the kernel.
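As a rough rule, the worst case is that every parallel test reaches its peak at the same moment, so the combined peak is about the number of pabot processes times the per-process peak. The sketch below turns that into an upper bound for the --processes value. The per-process figure would come from a measurement such as the ~12 GB reported in this thread; the available-memory and headroom values in the examples are hypothetical, since the host's memory size is not given, and max_safe_processes is a hypothetical helper.

```python
import math


def max_safe_processes(available_gb: float, peak_per_process_gb: float,
                       headroom_gb: float = 0.0) -> int:
    """Largest process count whose combined worst-case peak fits in
    available_gb after reserving headroom_gb for everything else."""
    if peak_per_process_gb <= 0:
        raise ValueError("peak_per_process_gb must be positive")
    usable = available_gb - headroom_gb
    return max(0, math.floor(usable / peak_per_process_gb))


if __name__ == "__main__":
    # Hypothetical host with 32 GB and ~12 GB peak per parallel test.
    print(max_safe_processes(32, 12))
```

This ignores memory used by other services on the host (netdata in the log above, for example), which is why the headroom parameter exists.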

A similar issue was reported, but it was closed: https://github.com/robotframework/robotframework/issues/3827

This is reproducible on a local laptop by passing a larger input file to the "Get Binary File" keyword. To trigger the OOM killer, invoke the same keyword (with the larger file) in parallel.
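The same effect can be approximated outside Robot Framework. This minimal Python sketch starts several Python child processes at once; each one reads the whole file in a single call (standing in for Get Binary File, which is an assumption of the sketch) and reports its own peak resident set size via resource.getrusage. ru_maxrss is in kilobytes on Linux and in bytes on macOS, and the resource module is Unix-only. run_parallel_readers is a hypothetical name. Be careful with large files and many processes: the kernel may pick an unrelated process as the OOM victim, as happened with netdata above.

```python
import subprocess
import sys

# Child program: read the whole file at once, then print
# "<bytes read> <peak RSS>" for this process.
CHILD = (
    "import resource, sys\n"
    "with open(sys.argv[1], 'rb') as f:\n"
    "    data = f.read()\n"
    "print(len(data), resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)\n"
)


def run_parallel_readers(path: str, processes: int):
    """Start `processes` readers concurrently; return (bytes_read, peak_rss) pairs."""
    children = [
        subprocess.Popen([sys.executable, "-c", CHILD, path],
                         stdout=subprocess.PIPE, text=True)
        for _ in range(processes)
    ]
    # Wait for every child before checking results, so none is left running.
    outputs = [child.communicate()[0] for child in children]
    results = []
    for child, out in zip(children, outputs):
        if child.returncode != 0:
            # A negative code means death by signal, e.g. -9 for SIGKILL.
            raise RuntimeError(f"reader exited with {child.returncode}")
        n_bytes, peak = out.split()
        results.append((int(n_bytes), int(peak)))
    return results
```

Watching `dmesg` while raising the file size and process count shows when the OOM killer starts to intervene.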

Could a new keyword be introduced to upload large files without reading them into memory? Or is there an existing keyword that already does this?
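One possible workaround, pending an answer about built-in keywords, is a small custom Python library that builds the multipart/form-data body lazily, reading the file in chunks while the request is being sent. This is a minimal sketch, not an existing Robot Framework or RequestsLibrary keyword; multipart_stream is a hypothetical name, and it assumes a single file field (the "image" field from the keyword above) and a filename without quote characters.

```python
import os
import uuid


def multipart_stream(field_name, path, chunk_size=1024 * 1024, boundary=None):
    """Build a multipart/form-data body for one file field without loading
    the file into memory. Returns (content_type, iterator of bytes chunks)."""
    boundary = boundary or uuid.uuid4().hex
    filename = os.path.basename(path)

    def body():
        # Part header for the single file field.
        yield (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{field_name}"; '
            f'filename="{filename}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        ).encode()
        # File content, read lazily one chunk at a time.
        with open(path, "rb") as f:
            while chunk := f.read(chunk_size):
                yield chunk
        # Closing boundary.
        yield f"\r\n--{boundary}--\r\n".encode()

    return f"multipart/form-data; boundary={boundary}", body()
```

The requests documentation states that passing a generator as the request body sends it with chunked transfer encoding, so something like `requests.post(url, data=body, headers={"Content-Type": content_type})` should stream the upload, provided the server accepts chunked requests. We have not verified whether such a generator survives being passed through POST On Session's `data` argument; calling requests directly from the same Python library avoids that question. The requests-toolbelt package's MultipartEncoder is another option for streaming multipart uploads.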
