[slurm-users] slurmrestd 24.05.1: crashes when GET on /slurm/v0.0.41/nodes : unsorted double linked list corrupted

Josef Dvořáček via slurm-users

Jul 24, 2024, 8:55:42 AM
to slurm...@schedmd.com
Is this failure familiar to anyone?

When I query the API endpoint "localhost:6820/slurm/v0.0.41/jobs", slurmrestd crashes with "malloc(): unsorted double linked list corrupted".

Is anyone using this API endpoint without it crashing?

I issue the GET using curl:

curl --header X-SLURM-USER-NAME:root --header X-SLURM-USER-TOKEN:$SLURM_JWT -G localhost:6820/slurm/v0.0.41/jobs


In comparison,

curl --header X-SLURM-USER-NAME:root --header X-SLURM-USER-TOKEN:$SLURM_JWT -G localhost:6820/slurm/v0.0.41/nodes

Works well.
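
To compare endpoints without reading the response bodies, something like the following could print only the HTTP status for each. This is a minimal sketch, not a tested tool: the function names `endpoint_url` and `probe_endpoint` and the `SLURM_USER` variable are made up for illustration, while the two headers and `SLURM_JWT` are the ones from the commands above. curl reports `000` when no HTTP response arrives, which is what a crashed slurmrestd would be expected to produce.

```shell
# Build the URL of a slurmrestd resource.
# Arguments: base (host:port), API version, resource name.
endpoint_url() {
    printf '%s/slurm/%s/%s\n' "$1" "$2" "$3"
}

# Print only the HTTP status code of a GET request.
# SLURM_USER is a hypothetical variable; it defaults to root as in the thread.
probe_endpoint() {
    curl --silent --output /dev/null --write-out '%{http_code}\n' \
        --header "X-SLURM-USER-NAME:${SLURM_USER:-root}" \
        --header "X-SLURM-USER-TOKEN:${SLURM_JWT}" \
        "$(endpoint_url "$@")"
}
```

For example, `probe_endpoint localhost:6820 v0.0.41 nodes` followed by the same call with `jobs` would show the two results side by side.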


josef




čec 24 14:37:55 slurmserver2.koios.lan slurmrestd[1502900]: debug:  _on_url: [[localhost]:52909] url path: /slurm/v0.0.41/jobs query: (null)
čec 24 14:37:55 slurmserver2.koios.lan slurmrestd[1502900]: operations_router: [[localhost]:52909] GET /slurm/v0.0.41/jobs
čec 24 14:37:55 slurmserver2.koios.lan slurmrestd[1502900]: slurmrestd: operations_router: [[localhost]:52909] GET /slurm/v0.0.41/jobs
čec 24 14:37:55 slurmserver2.koios.lan slurmrestd[1502900]: slurmrestd: rest_auth/jwt: slurm_rest_auth_p_authenticate: [[localhost]:52909] attempting user_name root token authentication pass through
čec 24 14:37:55 slurmserver2.koios.lan slurmrestd[1502900]: rest_auth/jwt: slurm_rest_auth_p_authenticate: [[localhost]:52909] attempting user_name root token authentication pass through
čec 24 14:37:55 slurmserver2.koios.lan slurmrestd[1502900]: debug5: _match_path_from_data: skip non-matching subdirectories: registered=1 requested=3
čec 24 14:37:55 slurmserver2.koios.lan slurmrestd[1502900]: debug5: _match_path_from_data: match failed for ["openapi.json"](0, GET) to ["slurm","v0.0.41","jobs"](0x7F9C64001CB0)
čec 24 14:37:55 slurmserver2.koios.lan slurmrestd[1502900]: debug5: _match_path_from_data: skip non-matching subdirectories: registered=1 requested=3
čec 24 14:37:55 slurmserver2.koios.lan slurmrestd[1502900]: debug5: _match_path_from_data: match failed for ["openapi.yaml"](1, GET) to ["slurm","v0.0.41","jobs"](0x7F9C64001CB0)
čec 24 14:37:55 slurmserver2.koios.lan slurmrestd[1502900]: debug5: _match_path_from_data: skip non-matching subdirectories: registered=1 requested=3
čec 24 14:37:55 slurmserver2.koios.lan slurmrestd[1502900]: debug5: _match_path_from_data: match failed for ["openapi"](2, GET) to ["slurm","v0.0.41","jobs"](0x7F9C64001CB0)
čec 24 14:37:55 slurmserver2.koios.lan slurmrestd[1502900]: debug5: _match_path_from_data: skip non-matching subdirectories: registered=2 requested=3
čec 24 14:37:55 slurmserver2.koios.lan slurmrestd[1502900]: debug5: _match_path_from_data: match failed for ["openapi","v3"](3, GET) to ["slurm","v0.0.41","jobs"](0x7F9C64001CB0)
čec 24 14:37:55 slurmserver2.koios.lan slurmrestd[1502900]: debug5: _match_path: string attempt match slurm to slurm: SUCCESS
čec 24 14:37:55 slurmserver2.koios.lan slurmrestd[1502900]: debug5: _match_path: string attempt match v0.0.41 to v0.0.41: SUCCESS
čec 24 14:37:55 slurmserver2.koios.lan slurmrestd[1502900]: debug5: _match_path: string attempt match shares to jobs: FAILURE
čec 24 14:37:55 slurmserver2.koios.lan slurmrestd[1502900]: debug5: _match_path_from_data: match failed shares
čec 24 14:37:55 slurmserver2.koios.lan slurmrestd[1502900]: debug5: _match_path_from_data: match failed for ["slurm","v0.0.41","shares"](4, GET) to ["slurm","v0.0.41","jobs"](0x7F9C64001CB0)
čec 24 14:37:55 slurmserver2.koios.lan slurmrestd[1502900]: debug5: _match_path: string attempt match slurm to slurm: SUCCESS
čec 24 14:37:55 slurmserver2.koios.lan slurmrestd[1502900]: debug5: _match_path: string attempt match v0.0.41 to v0.0.41: SUCCESS
čec 24 14:37:55 slurmserver2.koios.lan slurmrestd[1502900]: debug5: _match_path: string attempt match reconfigure to jobs: FAILURE
čec 24 14:37:55 slurmserver2.koios.lan slurmrestd[1502900]: debug5: _match_path_from_data: match failed reconfigure
čec 24 14:37:55 slurmserver2.koios.lan slurmrestd[1502900]: debug5: _match_path_from_data: match failed for ["slurm","v0.0.41","reconfigure"](5, GET) to ["slurm","v0.0.41","jobs"](0x7F9C64001CB0)
čec 24 14:37:55 slurmserver2.koios.lan slurmrestd[1502900]: debug5: _match_path: string attempt match slurm to slurm: SUCCESS
čec 24 14:37:55 slurmserver2.koios.lan slurmrestd[1502900]: debug5: _match_path: string attempt match v0.0.41 to v0.0.41: SUCCESS
čec 24 14:37:55 slurmserver2.koios.lan slurmrestd[1502900]: debug5: _match_path: string attempt match diag to jobs: FAILURE
čec 24 14:37:55 slurmserver2.koios.lan slurmrestd[1502900]: debug5: _match_path_from_data: match failed diag
čec 24 14:37:55 slurmserver2.koios.lan slurmrestd[1502900]: debug5: _match_path_from_data: match failed for ["slurm","v0.0.41","diag"](6, GET) to ["slurm","v0.0.41","jobs"](0x7F9C64001CB0)
čec 24 14:37:55 slurmserver2.koios.lan slurmrestd[1502900]: debug5: _match_path: string attempt match slurm to slurm: SUCCESS
čec 24 14:37:55 slurmserver2.koios.lan slurmrestd[1502900]: debug5: _match_path: string attempt match v0.0.41 to v0.0.41: SUCCESS
čec 24 14:37:55 slurmserver2.koios.lan slurmrestd[1502900]: debug5: _match_path: string attempt match ping to jobs: FAILURE
čec 24 14:37:55 slurmserver2.koios.lan slurmrestd[1502900]: debug5: _match_path_from_data: match failed ping
čec 24 14:37:55 slurmserver2.koios.lan slurmrestd[1502900]: debug5: _match_path_from_data: match failed for ["slurm","v0.0.41","ping"](7, GET) to ["slurm","v0.0.41","jobs"](0x7F9C64001CB0)
čec 24 14:37:55 slurmserver2.koios.lan slurmrestd[1502900]: debug5: _match_path: string attempt match slurm to slurm: SUCCESS
čec 24 14:37:55 slurmserver2.koios.lan slurmrestd[1502900]: debug5: _match_path: string attempt match v0.0.41 to v0.0.41: SUCCESS
čec 24 14:37:55 slurmserver2.koios.lan slurmrestd[1502900]: debug5: _match_path: string attempt match licenses to jobs: FAILURE
čec 24 14:37:55 slurmserver2.koios.lan slurmrestd[1502900]: debug5: _match_path_from_data: match failed licenses
čec 24 14:37:55 slurmserver2.koios.lan slurmrestd[1502900]: debug5: _match_path_from_data: match failed for ["slurm","v0.0.41","licenses"](8, GET) to ["slurm","v0.0.41","jobs"](0x7F9C64001CB0)
čec 24 14:37:55 slurmserver2.koios.lan slurmrestd[1502900]: debug5: _match_path_from_data: method skip for ["slurm","v0.0.41","job","submit"](9, GET != POST) to ["slurm","v0.0.41","jobs"](0x7F9C64001CB0)
čec 24 14:37:55 slurmserver2.koios.lan slurmrestd[1502900]: debug5: _match_path_from_data: match failed for ["slurm","v0.0.41","job","submit"](9, GET) to ["slurm","v0.0.41","jobs"](0x7F9C64001CB0)
čec 24 14:37:55 slurmserver2.koios.lan slurmrestd[1502900]: debug5: _match_path_from_data: method skip for ["slurm","v0.0.41","job","allocate"](10, GET != POST) to ["slurm","v0.0.41","jobs"](0x7F9C64001CB0)
čec 24 14:37:55 slurmserver2.koios.lan slurmrestd[1502900]: debug5: _match_path_from_data: match failed for ["slurm","v0.0.41","job","allocate"](10, GET) to ["slurm","v0.0.41","jobs"](0x7F9C64001CB0)
čec 24 14:37:55 slurmserver2.koios.lan slurmrestd[1502900]: debug5: _match_path: string attempt match slurm to slurm: SUCCESS
čec 24 14:37:55 slurmserver2.koios.lan slurmrestd[1502900]: debug5: _match_path: string attempt match v0.0.41 to v0.0.41: SUCCESS
čec 24 14:37:55 slurmserver2.koios.lan slurmrestd[1502900]: debug5: _match_path: string attempt match jobs to jobs: SUCCESS
čec 24 14:37:55 slurmserver2.koios.lan slurmrestd[1502900]: debug5: _match_path_from_data: match successful for ["slurm","v0.0.41","jobs"](11, GET) to ["slurm","v0.0.41","jobs"](0x7F9C64001CB0)
čec 24 14:37:55 slurmserver2.koios.lan slurmrestd[1502900]: debug5: operations_router: [[localhost]:52909] found callback handler: (0x0) callback_tag=0 path=/slurm/v0.0.41/jobs parser=data_parser/v0.0.41
čec 24 14:37:55 slurmserver2.koios.lan slurmrestd[1502900]: debug4: _resolve_mime: [[localhost]:52909] did not provide a known content type header. Assuming URL encoded.
čec 24 14:37:55 slurmserver2.koios.lan slurmrestd[1502900]: debug5: _parse_http_accept_entry: found */* with q=1.000000
čec 24 14:37:55 slurmserver2.koios.lan slurmrestd[1502900]: debug4: _resolve_mime: [[localhost]:52909] accepts */* with q=1.000000
čec 24 14:37:55 slurmserver2.koios.lan slurmrestd[1502900]: debug4: _resolve_mime: [[localhost]:52909] found accepts */*=application/json with q=1.000000
čec 24 14:37:55 slurmserver2.koios.lan slurmrestd[1502900]: debug3: _resolve_mime: [[localhost]:52909] mime read: application/x-www-form-urlencoded write: application/json
čec 24 14:37:55 slurmserver2.koios.lan slurmrestd[1502900]: debug3: _call_handler: [[localhost]:52909] BEGIN: calling ctxt handler: 0x7F9C9D294A36[0] for path: /slurm/v0.0.41/jobs
čec 24 14:37:55 slurmserver2.koios.lan slurmrestd[1502900]: debug:  wrap_openapi_ctxt_callback: [[localhost]:52909] GET using data_parser/v0.0.41
čec 24 14:37:55 slurmserver2.koios.lan slurmrestd[1502900]: debug4: xsignal: Swap signal PIPE[13] to 0x1 from 0x408376
čec 24 14:37:55 slurmserver2.koios.lan slurmrestd[1502900]: debug4: xsignal: Swap signal PIPE[13] to 0x408376 from 0x1
čec 24 14:37:55 slurmserver2.koios.lan slurmrestd[1502900]: debug:  accounting_storage/slurmdbd: _connect_dbd_conn: Sent PersistInit msg
čec 24 14:37:55 slurmserver2.koios.lan slurmrestd[1502900]: debug4: xsignal: Swap signal PIPE[13] to 0x1 from 0x408376
čec 24 14:37:55 slurmserver2.koios.lan slurmrestd[1502900]: debug4: xsignal: Swap signal PIPE[13] to 0x408376 from 0x1
čec 24 14:37:55 slurmserver2.koios.lan slurmrestd[1502900]: malloc(): unsorted double linked list corrupted
čec 24 14:37:55 slurmserver2.koios.lan systemd[1]: Started Process Core Dump (PID 1502951/UID 0).
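
Most of the log above is debug5 route-matching output; the line that matters is the glibc abort message just before the core dump. A small filter along these lines might help pull such lines out of a long journal. It is only a sketch: `heap_abort_lines` is a made-up name, and the pattern covers just a few glibc heap-check message prefixes, not every form they can take.

```shell
# Keep only lines that look like glibc heap-consistency aborts.
# Reads log text on stdin and writes the matching lines to stdout.
heap_abort_lines() {
    grep -E '(malloc|free)\(\): |double free or corruption'
}
```

It could be used as `journalctl -u slurmrestd | heap_abort_lines`, assuming the systemd unit is named `slurmrestd`.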

Josef Dvořáček via slurm-users

Jul 24, 2024, 9:24:16 AM
to slurm...@schedmd.com
OK, answering myself.
It seems that the endpoint /slurm/v0.0.39/jobs works well.
I'm not sure why, but I can live with that, so perhaps it will help someone else too.
Cheers,
josef

(this time via socket)

WORKS OK:

# curl -si --header X-SLURM-USER-NAME:root --header X-SLURM-USER-TOKEN:$SLURM_JWT --unix-socket /var/spool/slurm/restd/rest http://localhost:8080/slurm/v0.0.39/jobs | head
HTTP/1.1 200 OK
Content-Length: 579361
Content-Type: application/json

{
  "meta": {
    "plugin": {
      "type": "openapi\/v0.0.39",
      "name": "Slurm OpenAPI v0.0.39",
      "data_parser": "v0.0.39"
#

CRASHES slurmrestd (and no response):

# curl -si --header X-SLURM-USER-NAME:root --header X-SLURM-USER-TOKEN:$SLURM_JWT --unix-socket /var/spool/slurm/restd/rest http://localhost:8080/slurm/v0.0.41/jobs | head
#





Daniel Letai via slurm-users

Jul 24, 2024, 1:31:08 PM
to slurm...@lists.schedmd.com

This is a known issue, resolved in 24.05.2 by the patches labeled "Always allocate pointers despite skipping parsing".

For example:

https://github.com/SchedMD/slurm/commit/5b07b6bda407431215606b93e57d0a9b7f4c9b53


The same patch also applies to v0.0.40 and v0.0.42.
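
Until a site can upgrade, a wrapper script could choose the API version from the installed Slurm release. The sketch below assumes this thread's findings hold generally (v0.0.39 jobs works on 24.05.1, v0.0.41 jobs is fixed from 24.05.2 on) and is only meant for 24.05.x installations. `version_ge` and `jobs_api_version` are hypothetical helper names, and `sort -V` requires GNU coreutils.

```shell
# Succeed when version $1 is greater than or equal to version $2.
# Uses GNU sort's version ordering: the smaller of the two sorts first.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Print the API version to use for the jobs endpoint, given a Slurm release.
# Assumption from this thread: releases before 24.05.2 should avoid v0.0.41.
jobs_api_version() {
    if version_ge "$1" 24.05.2; then
        echo v0.0.41
    else
        echo v0.0.39
    fi
}
```

A call might look like `jobs_api_version "$(scontrol --version | awk '{print $2}')"`, assuming the version output has the form `slurm 24.05.1`.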

-- 
Regards,

Daniel Letai
+972 (0)505 870 456

Josef Dvořáček via slurm-users

Aug 1, 2024, 8:24:42 AM
to slurm...@lists.schedmd.com
I can confirm that after updating to the recently released 24.05.2, the API endpoint

GET /slurm/v0.0.41/jobs

now works well.

cheers

josef
