Monitor memory usage on a supercomputer

HUI WANG

Sep 29, 2020, 4:31:04 AM
to basilisk-fr
Dear All,

I am running a Basilisk simulation on a supercomputer, but at seemingly random times I receive the error below and the job is killed:

slurmstepd: error: Step 435379.0 exceeded memory limit (128295296 > 121241600), being killed

So I was wondering how I can check or record the total memory usage of a job over its entire run. I suppose it is possible to write a file, or use some command, to check how much memory a job has used at every step, i.e. the evolution of the job's memory usage over time. That way I could request the proper amount of resources from the cluster. Does Basilisk have this function?
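Independently of any Basilisk-specific facility, a process can query its own peak memory through the POSIX `getrusage()` call. The sketch below assumes Linux, where `ru_maxrss` is reported in kilobytes; the function names `peak_rss_kb` and `log_memory` are hypothetical and not part of Basilisk.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Peak resident set size (RSS) reached so far by the calling process.
   On Linux ru_maxrss is reported in kilobytes (on macOS it is in bytes).
   Returns -1 if getrusage() fails. */
long peak_rss_kb (void)
{
  struct rusage usage;
  if (getrusage (RUSAGE_SELF, &usage) != 0)
    return -1;
  return usage.ru_maxrss;
}

/* Append one "step peak_rss_kb" line to fp and flush immediately, so the
   log is up to date even if the scheduler kills the job a moment later.
   Returns 0 on success, -1 on a write error. */
int log_memory (FILE * fp, int step)
{
  if (fprintf (fp, "%d %ld\n", step, peak_rss_kb ()) < 0)
    return -1;
  return fflush (fp) == 0 ? 0 : -1;
}
```

Each MPI process only sees its own memory, so each rank would write to its own file, calling something like `log_memory (fp, i)` every few iterations. Note that `ru_maxrss` is a running peak and never decreases; on Linux the instantaneous value is available as `VmRSS` in `/proc/self/status`. Slurm's `sacct` can also report a `MaxRSS` value for job steps.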

Any help would be appreciated. Thanks.

Best
Hui

Stephane Zaleski

Sep 30, 2020, 2:54:37 AM
to basilisk-fr
Dear Hui,

  Here is a procedure that would be useful for all sorts of queries to this mailing list. Go to the Basilisk website. See the little "Search" button on the attached image? Type "memory" in the box and click the button. The first search result is what you want.

  Best regards

Stephane Z.





Message has been deleted

Stephane Zaleski

Sep 30, 2020, 7:18:46 AM
to basilisk-fr
I do not know. Look at the website page and the comments in the code in detail. Also, it is probably easy to write an awk script that will summarise the results of the various processors. Depending on the supercomputer you use, the operating system and the queuing system may also offer you memory monitoring tools.
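The summary step suggested above (an awk script over the per-processor results) can be sketched in C as a reduction over per-process peak memory values. This assumes each process has already logged its peak in kB, for instance with `getrusage()`; the function names `total_kb` and `max_kb` are hypothetical.

```c
/* Sum over processes: closer to what a per-node or per-job limit sees,
   assuming all n processes count against that same limit. */
long total_kb (const long * kb, int n)
{
  long sum = 0;
  for (int i = 0; i < n; i++)
    sum += kb[i];
  return sum;
}

/* Largest single process: a large gap between this and total_kb()/n
   indicates that memory is unevenly distributed across processes. */
long max_kb (const long * kb, int n)
{
  long max = 0;
  for (int i = 0; i < n; i++)
    if (kb[i] > max)
      max = kb[i];
  return max;
}
```

Summing per-process peaks gives an upper bound rather than an exact figure, since different processes may reach their peaks at different times.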

All of this is small beer. The really important issue is that sometimes basilisk requires more memory than one would expect. So far in my group we found no simple description, explanation or fix. So if you can provide any one of these three items everybody would probably be grateful.

SZ

Stephane Popinet

Aug 10, 2021, 5:56:30 AM
to basil...@googlegroups.com, Arthur Ghigo, Wachs, Anthony
Dear all,

I have just released a new version with a significantly improved memory
allocation scheme for tree grids. The improvement should be most
significant for high levels of refinement using a large number of
parallel processes.

The new version (obviously) passes the test suite but I would be
interested in getting (positive and/or negative) feedback on
speed/memory performance improvements/degradations for large computations.

I would also welcome reports of any unexpected behaviour or crashes, in
particular when the new version is used in combination with periodic
boundary conditions and OpenMP.

See here for details, as usual:

http://basilisk.fr/src/?history

enjoy,

Stephane

Arthur Ghigo

Aug 31, 2021, 1:07:26 PM
to Stephane Popinet, basil...@googlegroups.com, Wachs, Anthony
Hi Stephane and everyone,

I have just gotten around to testing the new version of Basilisk with improved memory allocation. I don't have any quantitative data yet but qualitatively it works really well!! Thank you!

I am studying the Stokes flow in the gap between a sphere and a wall, which requires both a large domain (L0/d=256) and a high maximum level of refinement (> 13) to put enough cells in the gap between the sphere and the wall. So this is in a way a worst case scenario for memory allocation.

With the previous version of Basilisk, there seemed to be a glass ceiling in 3D at a maximum level of refinement of 14, beyond which memory consumption would grow a lot and simulations would crash. I tried adding more procs but that did not solve the problem (maybe I didn't add enough). I don't know whether anyone else experienced the same thing. I think the high memory cost was not linked to the number of cells used (which was not that high) but rather to the inherent memory cost of the data structure needed to support 14 levels of refinement.
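For scale, a rough count assuming a uniform (non-adaptive) cubic grid: at refinement level $L$ there are $2^L$ cells per direction, so

$$N(L) = \left(2^{L}\right)^{3} = 2^{3L}, \qquad N(14) = 2^{42} \approx 4.4 \times 10^{12}, \qquad N(18) = 2^{54} \approx 1.8 \times 10^{16}.$$

Such levels are only reachable because an adaptive tree stores refined cells only where they are needed (here, in the gap). Any overhead that grows with the maximum level rather than with the number of allocated cells can therefore dominate long before the cell count itself does, which fits the explanation suggested above without proving it.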

With the new version of Basilisk, so far I have run all my tests on 1 node (48 procs), which gives me access to roughly 182GB of memory. I have used a maximum level of refinement from 13 to 18 for a domain L0/d=256 in 3D and so far all the simulations are running! This is a huge improvement compared to the previous version of Basilisk.
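As a rough per-process budget, assuming the roughly 182 GB of the node is shared evenly among the 48 processes:

$$\frac{182\ \text{GB}}{48} \approx 3.8\ \text{GB per process}.$$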

Arthur

--
Best,
Arthur Ghigo