Total CPU/mesh is no longer reported in the .out file


Dave McGill

Dec 27, 2015, 10:10:06 AM
to FDS and Smokeview Discussions
Good morning All,

I just noticed that "Total CPU" is no longer reported, on a mesh-by-mesh basis, in the Run Time Diagnostics of the .out file. I found that quite useful for balancing the load between meshes. Can we get it back?

Thanks 

Dave

Randy McDermott

Dec 27, 2015, 1:46:02 PM
to fds...@googlegroups.com
Dave,

Look at the _cpu.csv file.
--
You received this message because you are subscribed to the Google Groups "FDS and Smokeview Discussions" group.
To unsubscribe from this group and stop receiving emails from it, send an email to fds-smv+u...@googlegroups.com.
To post to this group, send email to fds...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/fds-smv/0e00bea2-bff6-400f-a4f3-6d2081dbb75b%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



Dave McGill

Dec 27, 2015, 1:59:15 PM
to fds...@googlegroups.com

It's a Christmas miracle!

Thanks Randy  ☺

Dave McGill

Dec 28, 2015, 9:48:11 AM
to FDS and Smokeview Discussions
Hi Randy,

It appears that the _cpu.csv file is only generated at the completion of a simulation. For large simulations that run for many days, it is useful to look at the CPU utilization in the early days to determine whether there is a large imbalance in the time spent on each mesh and, if so, to rebalance the load. Would it be possible to add a feature to the DUMP namelist that would print out the _cpu.csv file at periodic intervals? (If so, I'll start an issue.)

Thanks

Dave

Kevin

Dec 28, 2015, 9:53:29 AM
to FDS and Smokeview Discussions
OK, start an issue. In the past few months, we have been given some time on a massive compute cluster at Oak Ridge National Labs. I have been running FDS calcs using thousands of meshes/cores. Many of the MPI exchanges of diagnostic info were starting to really add up and slow down the calcs. For just a dozen or so meshes, it is not noticeable, but for thousands it is. So I am now being very frugal with MPI calls. With the way things are going these days, I anticipate that within ten years these kinds of calcs I'm doing now are going to be common.

As a work-around, just run your case for 100 time steps or so and plot the CPU usage in the _cpu.csv file. Unless particles are involved, the CPU usage is not going to change radically over the longer run.
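
A minimal sketch of that check in Python, assuming the _cpu.csv file has a header row, one row per MPI process, an identifier column (called `Rank` here), and one column of CPU seconds per routine. The column names and the sample numbers are illustrative assumptions, not real FDS output; check the header of your own file, and if it already contains a total column, add that name to `exclude` so it is not counted twice.

```python
import csv
import io

def load_imbalance(csv_text, id_column="Rank", exclude=()):
    """Return (total CPU seconds per process, max/mean ratio of those totals)."""
    skipped = {id_column, *exclude}
    totals = {}
    for row in csv.DictReader(io.StringIO(csv_text), skipinitialspace=True):
        # Sum every routine column for this process, skipping non-routine columns.
        totals[row[id_column].strip()] = sum(
            float(value) for name, value in row.items() if name not in skipped
        )
    mean = sum(totals.values()) / len(totals)
    return totals, max(totals.values()) / mean

# Made-up numbers for illustration only (assumed layout, see above).
sample = """Rank,MAIN,VELO,MASS
0,1.0,6.0,3.0
1,1.0,12.0,7.0
"""

totals, ratio = load_imbalance(sample)
print(totals)
print(f"max/mean = {ratio:.2f}")
```

A ratio close to 1 suggests the processes are doing similar amounts of work; a ratio well above 1 points at the process (and therefore the meshes assigned to it) that is holding the others up.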

Dave McGill

Dec 28, 2015, 10:38:07 AM
to FDS and Smokeview Discussions
With that many processors, at what point does the overhead slow things down to the point where there is no speed gain? Is this a supercomputer, and not the sort of cluster we are used to dealing with?


Kevin

Dec 28, 2015, 10:52:54 AM
to FDS and Smokeview Discussions
Take a look at Section 3.2 in the latest FDS User's Guide. That shows the results of a strong scaling test we did on our cluster here at NIST. It shows that the efficiency drops to about 50% when using about 200 cores. That is, if you divide a single case into 200 meshes and run on 200 cores, you get a speedup of about 100. We found similar results on the Oak Ridge cluster. It's a different architecture, but I think the results are dictated by FDS, not the specific type of computer. I'm trying to figure out now at what point you completely bottom out -- that is, the run time actually increases with increased number of cores. For the types of cases I am running, where the original one mesh case has about 4 million cells, you bottom out when you subdivide to meshes of about 10x10x10. But I want to find a better way of expressing this.
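
The arithmetic behind those numbers can be written out as a short Python sketch. It uses the usual strong-scaling definitions (speedup = one-core run time / N-core run time, efficiency = speedup / N); the function names and the run times in the second call are hypothetical, chosen only to reproduce the 200-core figure quoted above.

```python
def parallel_efficiency(speedup, n_cores):
    """Fraction of ideal (linear) speedup actually achieved."""
    return speedup / n_cores

def strong_scaling(t_one_core, t_n_cores, n_cores):
    """Return (speedup, efficiency) for a fixed-size case run on n_cores."""
    speedup = t_one_core / t_n_cores
    return speedup, parallel_efficiency(speedup, n_cores)

# Figures from the post: a speedup of about 100 on 200 cores.
print(parallel_efficiency(100, 200))

# Hypothetical wall-clock times (seconds) that give the same result.
print(strong_scaling(2000.0, 20.0, 200))
```

The point where the case "bottoms out" is the core count beyond which the measured run time stops falling, so the speedup itself, not just the efficiency, starts to decrease.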

Sascha Gottfried

Jan 4, 2016, 4:15:06 AM
to FDS and Smokeview Discussions
Hi Dave,

thanks for spotting these changes. Kevin, thank you for explaining your motivation for removing these computations. I am a big fan of these values, since you can use them to compute metrics like the "parallel efficiency" mentioned in the FDS User's Guide. Dave is absolutely right regarding load balancing. As you said, in a common-case scenario the benefit of having these values in the Run Time Diagnostics should outweigh the communication overhead, since users can adopt a load balancing strategy while running their case multiple times. Long-time professional users and core developers are probably the most proficient at efficient domain decomposition. Less experienced users are likely to produce less efficient decompositions, at a cost in speed and money. Based on these assumptions, we have developed tools that assess the diagnostic values to fine-tune domain decomposition and estimate simulation run time. Since a common optimization strategy is "optimize for the common case", this change looks more like an optimization for an edge case. Kevin, what do you think?

As a side-note:
Omitting these diagnostic values from chid.out files will break Kristopher Overholt's tool "FDS Runtime Estimator".

Happy new year and best regards
Sascha

Dave McGill

Jan 4, 2016, 6:57:59 AM
to FDS and Smokeview Discussions
Good Morning All,

Another limitation to the new system is that when a job is stopped "gracefully" by adding a jobname.stop file, the _cpu.csv file is not written. 

Dave


Kevin

Jan 4, 2016, 9:15:24 AM
to FDS and Smokeview Discussions
Adding the _cpu.csv file after a graceful stop is easy. I'll do that.

But the other matter is not as easy. Here's the longer story on the CPU usage output --

When we originally set up the timing arrays for MPI runs, we indexed the arrays based on mesh number because at the time, we always mapped meshes to MPI processes in a 1 to 1 manner. Later, we added flexibility to put more than one mesh on an MPI process, and then we added OpenMP and MPI compatibility. Long story short, the timing stuff had to be overhauled. The result was to create timing arrays based on MPI process. That allows us to better assess ALL jobs, not just those that map 1 to 1. It also allowed us to look for "lost" time more easily. If you look in the _cpu.csv file, you'll see the time spent by each MPI process in the various major routines, VELO, MASS, etc. But when we started running really massive MPI jobs on the Oak Ridge supercomputer, we found that MAIN used an increasingly large amount of time as the number of MPI processes increased into the 1000s. The CPU time spent in MAIN (the main routine that controls overall program flow) is not mesh specific. It is MPI process specific. So keeping track of time spent per mesh is imprecise. It is much better to track time per MPI process, and the side benefit is not having to exchange this information process to process. The information is dumped to the _cpu.csv file by each MPI process individually at the end of the run.
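
To see how much of each process's time goes to MAIN rather than to the per-mesh solver routines, the rows of the _cpu.csv file can be turned into fractions. The Python sketch below assumes a header row with an identifier column (called `Rank` here) and one column of CPU seconds per routine, with no separate total column; those names and the sample numbers are illustrative assumptions, not real FDS output.

```python
import csv
import io

def routine_share(csv_text, routine="MAIN", id_column="Rank"):
    """Return, per MPI process, the fraction of its CPU time spent in `routine`."""
    shares = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Convert every routine column of this process to seconds.
        times = {name: float(value) for name, value in row.items() if name != id_column}
        shares[row[id_column]] = times[routine] / sum(times.values())
    return shares

# Made-up numbers for illustration only (assumed layout, see above).
sample = "Rank,MAIN,VELO,MASS\n0,2.0,5.0,3.0\n1,5.0,3.0,2.0\n"

for rank, share in routine_share(sample).items():
    print(f"process {rank}: {share:.0%} of CPU time in MAIN")
```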

I prefer the new output format because it allows us to produce the results of scaling studies in a way that is consistent with other codes. Parsing the .out file was becoming impractical. The new output can be read into a spreadsheet program and the results plotted in different ways. Examples are shown in the User's Guide.

BTW, Kris Overholt has left the field. If someone else wants to adopt his calculators, that would be great. The Runtime Estimator can still work by using the wall clock time stamp and total run time in the .out file.
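
One way such an estimate could work is a simple linear extrapolation, sketched below in Python. This is not the method used by the FDS Runtime Estimator; it assumes the wall-clock cost per simulated second stays constant, and it takes the elapsed wall-clock time, the simulated time reached so far, and the end time as plain numbers read from the .out file by whatever means you prefer. The function name and the example values are hypothetical.

```python
def estimate_remaining_wall_time(elapsed_wall_s, sim_time_s, t_end_s):
    """Linearly extrapolate the wall-clock seconds still needed to reach t_end_s."""
    if sim_time_s <= 0:
        raise ValueError("need a positive simulated time to extrapolate from")
    # Wall-clock seconds spent per simulated second so far.
    cost_per_sim_second = elapsed_wall_s / sim_time_s
    return cost_per_sim_second * max(t_end_s - sim_time_s, 0.0)

# Hypothetical case: 1 hour of wall-clock time to simulate 50 s of a 200 s run.
remaining = estimate_remaining_wall_time(3600.0, 50.0, 200.0)
print(f"about {remaining / 3600.0:.1f} h remaining")
```

The estimate is only as good as the constant-cost assumption; cases where the time step shrinks or particles are introduced later in the run will take longer than predicted.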

Dave McGill

Jan 4, 2016, 9:23:52 AM
to FDS and Smokeview Discussions
Kevin,

Is writing the _cpu.csv file periodically something that can be done easily?

Dave

Cian Davis

Jan 4, 2016, 9:41:22 AM
to fds...@googlegroups.com
At the same time as the restart file (i.e. using DT_RESTART) would seem to be the logical interval.

Regards,
Cian


Sascha Gottfried

Jan 5, 2016, 2:20:35 PM
to FDS and Smokeview Discussions
Kevin,

after reading your answer multiple times and comparing chid.out from versions 6.3.0 and 6.3.1, I am starting to understand what you mean and why you went this way for good reasons. I have not read any _cpu.csv file yet, but I understand that this is the new source for tasks like load balancing. I also noticed the new line in chid.out that provides information for run time estimation.

Changes like these to a widely adopted output format need to be documented in an official source, like the FDS User's Guide or your blog. Until today I was not aware of the changes in chid.out. I checked just now: the release notes only mention the new _cpu.csv file. This forum thread could be a good starting point for future documentation on this topic. Some developer advocacy would also have been essential, since larger changes were applied to the existing output format without any deprecation plan that would let consumers of that format (3rd party tools) migrate to the new one. This is an important technical aspect of any open-source project that wants to attract 3rd party developers and build a larger software ecosystem on top of FDS/SMV. OK, those are my thoughts for now; it looks like I need to understand the new method and output format.

BTW Happy new year!
Sascha

Jake O'Shannessy

Feb 9, 2016, 5:17:57 PM
to FDS and Smokeview Discussions
On this topic, is the general plan to move more of the .out functionality into more structured data files, as has been done in this case (i.e. CPU data)? I also rely quite heavily on parsing the entire .out file to get the current progress of a model and the various diagnostic info it provides, but it is quite difficult to parse a moving target, especially if you have to support multiple versions. Obviously this is not an issue for the CSV files, as they don't have these minor breaking changes. The .smv and .sf files have been quite easy to parse, but I guess .out files are more for human readability and don't have to be parsed by Smokeview.

Kevin

Feb 10, 2016, 8:33:02 AM
to FDS and Smokeview Discussions
I added the ability to write the _cpu.csv file periodically (every DT_CPU s). This is in the next minor release, which we're working on now.

Kevin

Feb 10, 2016, 8:45:44 AM
to FDS and Smokeview Discussions
I can work with you on reformatting the .out file if you'd like. The .out file has evolved over 20 years to suit the needs of the moment. For example, in order to do formal scaling studies of the MPI functionality, I needed to write the CPU times out in a way that was easier to parse. I never intended the .out file to be parsed, so I might continue to restructure it as things evolve.