NHC Status Update

Michael Jennings
Feb 1, 2023, 4:42:58 PM
to n...@lbl.gov
Hi folks! I hope you all are having a great 2023 so far.

Those of you who've been paying attention to the NHC project on GitHub
will already know much of what I'm about to say, but for those of you
who haven't, I hope this will come as great news! :-)

You'll probably remember that I was working with LANL legal
(technically the Feynman Center for Innovation, or FCI) to figure out
how my teammates and I could contribute to NHC as a public open source
project. Unfortunately, after months of
back-and-forth with them and lots of waiting, I ran into a major
roadblock.

I won't bore you with all the gory details (unless you really want to
hear them). The important part is that we got it figured out! As a
result, I recently updated NHC's license terms (in the LICENSE file)
to add copyright and licensing details for the US Government/DOE
through Triad National Security, LLC, the Management and Operating
(M&O) contractor[1] for LANL. These terms will apply to all
contributions made to NHC by LANL personnel going forward, myself
included.


(TL;DR) Thanks to the ASC Facilities, Operations, and User Support
(FOUS) program leadership in the ASC PO, as well as the HPC program
manager for the ASC FOUS project, we now have both funding and
approval to continue work on NHC! :-) :-)

Yay!


So now what? Well, as you can see on GitHub, I've already been
working to push up and merge in some of my more recent work. I'm also
going through the Issues and Pull Requests for NHC (many of which have
been open for an unforgivably long time, I'm sorry to say...) and
merging in those that I can. It's likely that some of the PRs I want
to include will need to be rebased and merged in by hand, but as the
situation is ultimately my fault, I will be taking responsibility for
doing that work...because now I can!

In addition to all that, as I've mentioned previously, I have roughly
40 new checks, contributed by fellow LANL HPC staff (one person in
particular: hi Graham!), that are waiting to be merged.
Beyond that, our new Crossroads[2] platform (currently comprising
Tycho, Razorback, and Rocinante), along with our current institutional
supercomputer Chicoma[3] and the upcoming NVIDIA Grace-based Venado[4],
will sport many fresh and novel hardware and software components that
will drive a whole new generation of NHC features! I expect the HPE
Cray EX platform to provide us with many exciting and exasperating
challenges in the next few years. ;-)


I also wanted to mention one new item in particular: I know that many
of you, particularly those with Cray XC-based systems, have
experienced the pain of how long it takes NHC to parse /proc/cpuinfo.
I'd like to draw your attention to https://github.com/mej/nhc/pull/121
and ask for your feedback. PR#121 implements my proposed fix for this
problem, both by completely changing the way /proc files are read and
by making the data source into a configurable variable.
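To make the two ideas concrete, here is a rough, hypothetical illustration in Python. It is not the code in PR#121 (NHC itself is written in bash). It simply reads the file in a single call instead of line by line, and takes the file path from a configurable setting. The environment variable name CPUINFO_PATH and the function name parse_cpuinfo are assumptions made up for this sketch.

```python
import os

# Hypothetical setting: the data source path can be overridden, e.g. to point
# at a cached copy of /proc/cpuinfo. The name CPUINFO_PATH is illustrative only.
CPUINFO_PATH = os.environ.get("CPUINFO_PATH", "/proc/cpuinfo")


def parse_cpuinfo(path=CPUINFO_PATH):
    """Read the whole file in one call, then split it into per-processor records.

    Records in /proc/cpuinfo are separated by blank lines; each line within a
    record has the form "key<whitespace>: value".
    """
    with open(path) as f:
        text = f.read()  # one read instead of a per-line loop

    processors = []
    for block in text.strip().split("\n\n"):
        record = {}
        for line in block.splitlines():
            # partition() splits on the first colon only, so values that
            # themselves contain colons are kept intact.
            key, sep, value = line.partition(":")
            if sep:
                record[key.strip()] = value.strip()
        if record:
            processors.append(record)
    return processors
```

Pointing the setting at a saved copy of the file also makes it easy to test parsing logic off-node.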

I welcome any feedback you may have. And if you are able to test it
out, especially on a system that's been impacted by this before, I'd
love to hear from you!

I know the 1.4.3 release of NHC was a long time coming, and many of
you have been coming up with your own checks, modules, fixes, or other
changes these past few years. If you would like to contribute any of
your work, the "merge window" for the 1.5 release (currently slated
for the June/July timeframe, depending on velocity) is wide open! And
I promise I won't let your contributions sit and gather dust... :-]

More to come soon!
Michael

[1] - https://www.directives.doe.gov/terms_definitions/m-o-contract
[2] - https://www.hpcwire.com/2022/10/22/los-alamos-installs-sapphire-rapids-based-tycho-first-phase-of-crossroads/
[3] - https://www.hpcwire.com/2022/03/17/los-alamos-chicoma-supercomputer-to-host-75-new-projects/
[4] - https://www.hpcwire.com/2022/05/30/nvidias-grace-superchips-to-debut-on-venado-supercomputer/

--
Michael E. Jennings (he/him) <m...@lanl.gov> https://hpc.lanl.gov/
HPC Platform Integration Engineer - Platforms Design Team - HPC Design Group
Ultra-Scale Research Center (USRC), 4200 W Jemez #301-25 +1 (505) 606-0605
Los Alamos National Laboratory, P.O. Box 1663, Los Alamos, NM 87545-0001