Hi.
I am trying to get utilization and related information per SLURM partition. I got the list of nodes in a partition using pyslurm.partition().find_id("part_name")["nodes"], but I cannot feed that list to any method that returns the information I want. I was hoping to change show-cluster-util.py to filter based on cluster name, but could not get it to work.
Any help will be greatly appreciated.
Thanks
Hossein
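The per-partition tally asked about above could be sketched roughly as follows. This is a minimal illustration, not code from pyslurm or slurmtools. It assumes node records are available as a dict keyed by node name with "cpus" and "alloc_cpus" fields (an assumption about what pyslurm.node().get() returns), and that the partition's "nodes" value has already been expanded into individual node names. The function name cpu_utilization and the sample records are hypothetical.

```python
def cpu_utilization(node_names, node_records):
    """Sum CPU counts over the named nodes and report the allocated share.

    node_names   -- iterable of individual node names in one partition
    node_records -- dict: node name -> {"cpus": int, "alloc_cpus": int}
                    (field names are an assumption about pyslurm's output)
    """
    total = 0
    alloc = 0
    for name in node_names:
        rec = node_records.get(name)
        if rec is None:
            # Node listed in the partition but missing from the records.
            continue
        total += rec["cpus"]
        alloc += rec["alloc_cpus"]
    pct = 100.0 * alloc / total if total else 0.0
    return {"total_cpus": total, "alloc_cpus": alloc, "pct_alloc": pct}


# Hypothetical records standing in for real cluster data.
sample_nodes = {
    "n01": {"cpus": 16, "alloc_cpus": 16},
    "n02": {"cpus": 16, "alloc_cpus": 8},
    "n03": {"cpus": 32, "alloc_cpus": 0},
}

result = cpu_utilization(["n01", "n02"], sample_nodes)
print(result)
```

On a real cluster, the partition's "nodes" entry may be a compressed host list (for example a bracketed range), in which case it would need to be expanded into individual names before being passed in.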
Dear Giovanni,
Many thanks for these useful tools and prompt reply to the questions. I really appreciate all your time and effort and I hope I can contribute.
I really like show-cluster-util, but I want to be able to filter its results by partition name, e.g., show-cluster-util -p part_name. We have a few partitions (as we added new nodes, we grouped them into partitions), and I want to get usage statistics for each partition.
Maybe I can modify your salljobs script to include utilization info at the end.
Thanks again
Hossein
--
You received this message because you are subscribed to the Google Groups "pyslurm" group.
To unsubscribe from this group and stop receiving emails from it, send an email to
pyslurm+u...@googlegroups.com.
To post to this group, send email to pys...@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/pyslurm/177cf671-c9bf-45d9-88b2-17d46109fde5%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Wow!! That is great. Very nice. Thank you so much. I should read the code to understand how you did it : )
Two questions:
· Does Partition CPU % (Alloc + Unalloc) mean Allocated + Idle? Should this and Partition CPU % Unallocatable add up to 100%? For me they add up to 81%.
Total Allocated CPUs : 4586
Total Idle CPUs : 2131
Total Down CPUs : 2828
Total Unallocatable CPUs : 787
Total Eligible CPUs : 7504
Total Configured CPUs : 10332
Partition CPU % Unallocatable : 10%
Partition CPU % (Alloc + Unalloc) : 71%
· On one of the partitions I get the following assertion error:
Traceback (most recent call last):
  File "show-cluster-util", line 253, in <module>
    metrics = get_util(nodes)
  File "show-cluster-util", line 139, in get_util
    all_metrics["total_nodes_down"] == all_metrics["total_nodes_config"]
AssertionError
but sinfo -p part_name shows this:
PARTITION  AVAIL  TIMELIMIT  NODES  STATE
part_name  up     infinite       1  drng
part_name  up     infinite       6  resv
part_name  up     infinite      24  idle
Any idea?
Thanks again
Hossein
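For the CPU totals quoted in the first question above, the arithmetic can be checked directly. The sketch below uses only the numbers from the message. It shows one reading that is consistent with them: both percentages are taken over Total Eligible CPUs, and "Alloc + Unalloc" is Allocated plus Unallocatable rather than Allocated plus Idle. Under that reading the two figures overlap (Unallocatable is counted in both), so they would not be expected to sum to 100%. How the script actually computes them is an assumption here, not something confirmed in the thread.

```python
# Totals copied from the reported output.
allocated = 4586
idle = 2131
down = 2828
unallocatable = 787
eligible = 7504
configured = 10332

# The totals are internally consistent under this decomposition.
assert allocated + idle + unallocatable == eligible
assert eligible + down == configured

# Truncating integer division reproduces the reported 10% and 71%.
pct_unallocatable = 100 * unallocatable // eligible                     # 10
pct_alloc_plus_unalloc = 100 * (allocated + unallocatable) // eligible  # 71

# The remainder of the eligible CPUs is idle (about 28.4%).
pct_idle = 100.0 * idle / eligible

print(pct_unallocatable, pct_alloc_plus_unalloc, round(pct_idle, 1))
```

If this reading is right, the figure that complements "Alloc + Unalloc" to 100% is the idle share, not the unallocatable share.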
From: <pys...@googlegroups.com> on behalf of Giovanni <giovann...@gmail.com>
Date: Friday, December 16, 2016 at 2:43 PM
To: pyslurm <pys...@googlegroups.com>
Subject: Re: Getting utilization, etc. info for a partition
Hi Hossein,
Check out the "perpart" branch of slurmtools. I modified show-cluster-util to take an optional -p flag that specifies a single partition. Let me know whether that is helpful and what you were looking for. I'll then merge it into master.
Giovanni
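An optional -p flag of the kind described above might look roughly like this. It is a generic argparse sketch, not the code on the "perpart" branch. The helper names build_parser and select_nodes, the --partition long option, and the per-node "partitions" field are all hypothetical and chosen only for illustration.

```python
import argparse


def build_parser():
    """Command-line parser with an optional single-partition filter."""
    parser = argparse.ArgumentParser(prog="show-cluster-util")
    parser.add_argument(
        "-p", "--partition",
        default=None,
        help="restrict the report to one partition (default: whole cluster)",
    )
    return parser


def select_nodes(nodes, partition=None):
    """Return only the nodes that belong to `partition`.

    nodes -- dict: node name -> record with a "partitions" list
             (hypothetical field; real records may be shaped differently)
    With partition=None every node is kept.
    """
    if partition is None:
        return dict(nodes)
    return {
        name: rec
        for name, rec in nodes.items()
        if partition in rec.get("partitions", [])
    }


# Hypothetical node records for demonstration.
sample = {
    "n01": {"partitions": ["old"]},
    "n02": {"partitions": ["old", "new"]},
    "n03": {"partitions": ["new"]},
}

args = build_parser().parse_args(["-p", "new"])
selected = select_nodes(sample, args.partition)
print(sorted(selected))
```

Leaving the flag off keeps the cluster-wide behaviour, so existing invocations of the script would not need to change.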