Pressure Solvers


Randy McDermott

Oct 11, 2007, 8:34:46 AM
to FDS and Smokeview Discussions, kil...@featflow.de
Hi Susanne,

Just wanted to introduce myself and tell you a little about what I
plan to work on. As I see it, there are at least three different
approaches to this problem:

1. A modification of the current FDS approach, which utilizes a direct
solve for the pressure on each mesh (here I mean that each mesh is
assigned to its own processor), and then uses some approximations for
the patch boundary conditions that yield reasonable results. In the
end, an error is committed at the patch boundary, but I believe the
error will be small enough for our purposes and that we can
characterize the order of the error.

2. On a shared memory machine, simply recode the direct solver
(FISHPACK) using OpenMP. With the trend in CPUs being to increase
the number of processors on a single board, this will likely be an
optimal solution for many of our users who do not have large
distributed memory machines.

3. Solve the Poisson equation on a distributed memory machine using
MPI.

I am currently working on option #1. My understanding is that you are
currently working on option #3. This is great, because I believe in
the long run there will be different situations in which the various
methods listed above will be optimal.

My status on solving the problem using option #1 is that I have a
scheme working for incompressible flow that is stable and guarantees
mass conservation between meshes. I have designed my method in
Matlab. It remains for us to implement the scheme in FDS and to
characterize the order of the patch boundary errors.

Once this is done (which still may take some time), my plan is to go
to work on option #2. Hopefully, I will be able to find a code that
is already available. If you (or anyone) know(s) of one, please let
me know. Otherwise we will just have to modify the existing code or
more likely start from scratch (I am not a big fan of modifying other
people's codes -- by the time I figure out the code, I could have
written my own which I would understand much more thoroughly).

Once that is done, my interests will actually shift in a different
direction: adaptive mesh refinement. Here things may take a whole new
twist. I will likely pick a framework (like SAMRAI out of Lawrence,
Livermore) and begin building an AMR code from there. Of course, fast
global elliptic solvers will be an important piece of this code as
well. And I will be interested to discuss the options with you at
that point.

So, I think this illustrates that our work is really quite
complementary, and I think it is great that you are taking such an
interest in the global solver. I am sure you have much more expertise
in this area than I do.

Take care,
Randy

fds4hhpberlin

Oct 12, 2007, 3:59:44 AM
to FDS and Smokeview Discussions
Hello Randall,

many thanks for your comprehensive message!

I think that the optimization of the pressure solver is a very
important topic in order to improve the overall efficiency of FDS,
especially if you think of applications with millions of unknowns and
a very fine grid resolution. Nice to have somebody to exchange and
discuss some ideas now!

As I mentioned before in the thread on 'parallel scaling tests', message
38, it is not enough to only look at the parallel efficiency of an
algorithm. An algorithm with a parallel efficiency of only 60%-70% and
a good numerical efficiency may be superior to an algorithm with
nearly 100% and a poor numerical efficiency. The more complex the
problem is, the more you will need some kind of global data transfer,
particularly if you want to use a large number of subgrids.

If I understand it correctly, you will keep the former way of solving
the Poisson problem with local FFT solvers based on CRAYFISHPAK on a
MIMD machine. After the local solves, the global solution is achieved
by some kind of data exchange at inner boundaries. Does this kind of
postprocessing include global data transfer? As I saw, there already
exists an experimental code CORRECT_PRESSURE in MAIN_MPI. Are you just
working on improving this code? What is the basic idea behind it? How
can the mass conservation be guaranteed? You said that you have
already designed a corresponding code in Matlab? Are there any
articles describing this algorithm? I would be very interested in them!!

You are right: I am working on a strategy of your type #3. It is based
on coupling the local solves by a surrounding global method of
multigrid type, which may use local FFT solves or whatever else to
calculate the local solutions. At the moment I am implementing a
separate master solver. I have the hope that this master can also be
used in other parts of the code in order to guarantee a better
coupling of the single subgrids. But this is future work...

This kind of Poisson solver would also allow for local grid
refinements! You could use different grid resolutions on different
subgrids (macros), also including local adaptivity. If you use
adaptive mesh refinement, there is a big need for a very strong and
robust elliptic solver!! In the first step, I would like to begin
with some kind of 'macrowise' adaptivity. Do you already have specific
adaptivity concepts in mind?

I would be very happy to discuss all these topics in detail with you
in the future!!

So, have a nice time in Italy!


Susan

Kevin

Oct 12, 2007, 8:52:46 AM
to FDS and Smokeview Discussions
The 64 mesh "scaling_test" example is a case where the local FFT solve
alone works very well. Look at the flow vectors in Smokeview, and you
will find that it is very difficult to see inconsistencies in the
flow.

However, there are other scenarios/geometries where the current solver
does not work well, tunnels especially. In such cases, we need a
global solution. The first idea is to simply integrate the Poisson
equation over each mesh and solve this trivially small system of
linear equations to get a volume flux at each mesh boundary, and then
do another local FFT solve to ensure that the volume flux (int u dot
dS) is consistent mesh to mesh. This is VOLUME conservation. Mass
conservation could be ensured by sharing densities and species
concentrations appropriately.
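
To make that concrete, here is a rough sketch of the coarse problem for
a simple row of M meshes (written in Python rather than the Fortran in
pres.f90, and all names here are made up for illustration): the unknowns
are one averaged pressure value per mesh, the matrix is just a small
Laplacian over the mesh connectivity, and differences of the solution
across mesh interfaces give the consistent volume fluxes.

import numpy as np

def coarse_volume_flux(M, dx, rhs_integral):
    # M meshes in a row, each of width dx; rhs_integral[m] is the volume
    # integral of the Poisson right-hand side over mesh m (placeholder data).
    A = np.zeros((M, M))
    for m in range(M):
        if m > 0:                     # flux balance at the left interface
            A[m, m] += 1.0 / dx
            A[m, m - 1] -= 1.0 / dx
        if m < M - 1:                 # flux balance at the right interface
            A[m, m] += 1.0 / dx
            A[m, m + 1] -= 1.0 / dx
    # The all-Neumann problem is singular, so pin the first value to zero.
    A[0, :] = 0.0
    A[0, 0] = 1.0
    b = np.array(rhs_integral, dtype=float)
    b[0] = 0.0
    Hbar = np.linalg.solve(A, b)
    # Interface volume fluxes implied by the coarse solution; each mesh would
    # then impose these as boundary data for its second local FFT solve.
    return (Hbar[:-1] - Hbar[1:]) / dx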

I have coded this in the serial version of FDS (avoiding MPI details
for now). It is invoked by PRESSURE_CORRECTION, but this is not a
feature we recommend. It is unstable in most situations. It only works
for a very narrow range of conditions. Randy is currently testing
ideas in MatLab and we are going to transfer these ideas to FDS when
they are ready.

Our short term goal is NOT to use adaptive gridding. I believe this
would overly complicate FDS to the point where we could not maintain
it. If we improve the robustness of the current multi-mesh scheme,
then, as shown by Christian's 64 mesh scaling test, we will have a
very powerful solver.


fds4hhpberlin

Oct 15, 2007, 7:34:25 AM
to FDS and Smokeview Discussions

Indeed, the 64-mesh example works very well! In this case the local
FFT methods seem to be strong enough to solve the overall problem.
Nevertheless, the underlying problem is very easy, and I would suggest
using a more complex test scenario with fire and transversal
ventilation in order to avoid too much symmetry. Then, the progression
of temperature at selected measuring points could be compared for the
serial and parallel versions. At the moment I am designing a
corresponding data file, which I will propose to you shortly.

Besides, I will try to study your strategies to preserve volume and
mass conservation in the pressure solver. How do you store the data
concerning the coarse grid and where do you solve the corresponding
coarse grid problem in your pressure correction scheme? I suppose that
this is additionally done on one of the processors related to the
single submeshes, for example on submesh-processor 1? In this case,
you will probably have considerable waiting times on the other
processors?! Or is the coarse grid problem solved in a completely
distributed way over all processors, each processor only dealing with
a very small amount of data and using frequent communications?

I agree that the use of adaptive grid refinement strategies would
complicate the current development
of FDS substantially. If I understand it correctly, Randall considers
adaptive gridding as a long-term goal. But I could imagine some kind
of macro-wise grid adaptivity, where you may have very different local
grid resolutions on single submeshes, coupled by a strong global
solver.


fds4hhpberlin

Oct 18, 2007, 8:50:55 AM
to FDS and Smokeview Discussions

Hello,

in order to examine the quality of the pressure solver, I would like
to suggest a parallel benchmark test. For this reason, I have designed
a geometry file (see the file bench64.fds) which is based on the
'scaling test' example from the 'Parallel scaling tests' thread. As
before, it uses the regular 4x4x4 subdivision of the unit cube, but
now with an additional fire scenario, ventilation from the left and
an outflow on the right. I have inserted various measuring points,
nearly all positioned in the middle of the single subdomains
(intentionally not at the subdomain borders) for a detailed
comparison of the serial and parallel versions. To avoid too much
symmetry within the problem, the ventilation and the outflow are
positioned sideways.

I had the opportunity to run the parallel version for bench64.fds on
32 dual AMD Opteron DP 250, 2.4 GHz processors (which are single-core
processors!), as I did before for the original scaling_test example.
The serial version is still running, so I don't have comparative
data up to now. For the original scaling_test example I got completely
consistent execution times for all subdomains. But in this case, the
local execution times differ quite a lot. See the file
bench64_execution_times.out, where I have sorted the macros
in order of increasing execution time.

I would be very interested in the execution times on the JUMP in
Jülich. If everybody agrees to the bench64 configuration, I would like
to ask Christian Rogsch for a test run on the JUMP in Jülich.
Christian, would this be possible? Thanks a lot in advance!

What I want to point out with this example is the following: it doesn't
suffice to look only at the parallel efficiency of a problem. For
this more complex example it is clear from the problem that the
execution times can't be the same on all subdomains and that there are
more or less local waiting times depending on the computational
complexity on the different macros. This fact will always deteriorate
the overall parallel efficiency. It lies in the nature of problems
coming from CFD that you don't know beforehand how the problem will
evolve. And the center of computational load may change frequently
during the whole process, so that it is not always possible to adjust
the grid exactly before starting the calculation. One could think of
dynamic load balancing, which is a huge topic in itself.

Anyway, when rating a parallel algorithm, the consistency with the serial
version must be checked first! The subdivision into single macros
always breaks up the global physical connectivity. This is especially
the case for the Poisson problem, which is based on the diffusion
operator. You cannot avoid introducing strategies for
recoupling the local problems, which normally require more or less
global communication, as discussed before. In my opinion the parallel
efficiency should only be considered in a second step, once the
consistency with the serial algorithm is confirmed and the unavoidable
methods to guarantee the consistency are thoroughly inserted into the
code.



Christian Rogsch

Oct 18, 2007, 9:15:23 AM
to FDS and Smokeview Discussions
Hi Susanne,

I will test your new case on the JUMP in Jülich.
Have you also made some tests with 1, 2, 4, 8, 16, 32 meshes?

I will post the results...


fds4hhpberlin

Oct 18, 2007, 9:49:53 AM
to FDS and Smokeview Discussions
Hello Christian,

many thanks for your prompt reply! No, I didn't make the tests for a
smaller number of meshes, but I would be pleased to run all the different
tests. So, once the final form of the geometry file is fixed, I will
start the tests so that we can compare all the results.

Many thanks!

Susanne



Kevin

Oct 23, 2007, 7:40:36 AM
to FDS and Smokeview Discussions
Sorry, Susanne, I have been on the road for several weeks.

>
> Besides, I will try to study your strategies to preserve volume and
> mass conservation in the pressure solver. How do you store the data
> concerning the coarse grid and where do you solve the corresponding
> coarse grid problem in your pressure correction scheme? I suppose that
> this is additionally done on one of the processors related to the
> single submeshes, for example on submesh-processor 1? In this case,
> you will probably have considerable waiting times on the other
> processors?! Or is the coarse grid problem solved in a completely
> distributed way over all processors, each processor only dealing with
> a very small amount of data and using frequent communications?

No, my pressure correction scheme is very crude now. I use node 0 to
solve the system of linear equations that arises when you write the PDE
using just the meshes themselves as nodes. I just use a simple
Gauss-Jordan (GJ) algorithm to do it. Also, this only works in the serial version. The
linear equation is solved in main.f90, not main_mpi.f90. I have been
working with the serial version just to make things easier. I'll worry
about MPI issues later. First I need to make sure this works. The
finite differencing of the linear system is done in pres.f90, and the
control of the procedure is done from main.f90. But this is not well
documented yet. I need more time to go back and look at things. Randy
McD says he has a working algorithm in Matlab, but it's going to take
time to convert to FDS. This is what we plan to do in the next few
months.
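
For reference, the "simple GJ" routine amounts to something like the
following (a generic Python sketch with partial pivoting, not the actual
code in main.f90/pres.f90), which is perfectly adequate for a system
whose size is only the number of meshes:

import numpy as np

def gauss_jordan(A, b):
    # Reduce [A | b] to [I | x]; A is n x n, b has length n.
    A = np.array(A, dtype=float)
    b = np.array(b, dtype=float)
    n = len(b)
    for col in range(n):
        pivot = col + np.argmax(np.abs(A[col:, col]))   # partial pivoting
        A[[col, pivot]] = A[[pivot, col]]
        b[[col, pivot]] = b[[pivot, col]]
        b[col] /= A[col, col]
        A[col] /= A[col, col]
        for row in range(n):
            if row != col:
                b[row] -= A[row, col] * b[col]
                A[row] -= A[row, col] * A[col]
    return b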


>
> I agree that the use of adaptive grid refinement strategies would
> complicate the current development
> of FDS substantially. If I understand it correctly, Randall considers
> adaptive gridding as a long-term goal. But I could imagine some kind
> of macro-wise grid adaptivity, where you may have very different local
> grid resolutions on single submeshes, coupled by a strong global
> solver.
>

Yes, for now, the only adaptive gridding we might consider is in more
efficiently doing a global solve. I still do not want to abandon the
FFT based local solve.

fds4hhpberlin

Oct 23, 2007, 11:40:09 AM
to FDS and Smokeview Discussions
Hello Kevin,

nice to hear from you again! I will try to understand the different
steps of your pressure correction scheme. When will Randall come back?
Has he already started with the conversion of the Matlab code into
Fortran 90?

Have you already taken a look at the parallel benchmark geometry
bench64.fds? It could probably be better to use a small
modification of the parallel benchmark geometry bench64.fds (see
bench64_v2.fds). In order to avoid the burner crossing the submesh
boundaries, I placed the burner completely within a submesh
adjacent to the border of the room. Besides, I inserted some
additional vector slices. Please give me feedback concerning the new
geometry, so that Christian and I can start with our comparison
tests.

Have a nice day, Susan

Kevin

Oct 24, 2007, 7:51:09 AM
to FDS and Smokeview Discussions
I have been away from the office and have not had a chance to look at
bench64. The purpose of scaling_test was to test the MPI and computer
efficiency. We are currently using simple 2, 4 and 8 mesh
configurations to improve the pressure solver. I do not know if
bench64 will tell us anything new. The efficiency of the parallel
algorithm is somewhat independent of its accuracy. Even if the
velocity field is not consistent at mesh boundaries, the calc should
run "efficiently" in the sense that all CPUs will work together and
finish the job.

fds4hhpberlin

Oct 25, 2007, 5:18:00 AM
to FDS and Smokeview Discussions
I understand that your main focus was on checking the
communication structure and the correctness of the data transfer,
which was demonstrated by the scaling test. So, let's wait with the comparison
tests until your pressure correction scheme works. Then, it will be
unavoidable to test the consistency of the parallel version with the serial
one! The parallel efficiency can only be rated reliably once the
numerical correctness has been proven.

Randy McDermott

Oct 31, 2007, 10:41:28 AM
to FDS and Smokeview Discussions
Hi Susan,

Sorry. I am not as diligent at checking the discussion board as
Kevin. I am back and now working on other issues related to stability
of low speed variable density schemes.

You are correct that I envision AMR as a long-term activity.

Cheers,
Randy

fds4hhpberlin

Nov 1, 2007, 4:32:46 AM
to FDS and Smokeview Discussions
Hello Randy,

no problem! I hope you had a nice time in Italy ...

I am just working on the introduction of an additional master process
to FDS which will be responsible for the global coupling in the
pressure solver. Up to now, this master-slave structure is already
working, and I am currently implementing the corresponding coarse grid
structure. Concerning the ingoing boundary conditions, it would be very
nice to discuss the details with you. So, let's keep in touch, okay?

Have a nice time,


Susan, hhpberlin

Randy McDermott

Mar 31, 2008, 7:40:54 AM
to Susanne Kilian, Kevin McGrattan, fds...@googlegroups.com
Hi Susan,

Sorry to hear about your being ill. Hope the holiday somewhat made up for it!

I wanted to make a comment regarding any tests that you perform on the
new code. We should be careful not to judge the formulation based on
the tests alone. I am also interested in your analytical assessment
of the algorithm. Do you think it should be "embarrassingly parallel"
up to the point at which the coarse linear solver starts to slow down?
(I can't see this happening until many 1000s of meshes, which we will
not get to in practice for some time.) My guess is that it should
scale very well and that any poor performance we see may point to
other problems with the code (or architecture) which we are just now
in a position to diagnose.

I look forward to hearing from you.
Cheers,
Randy

On Mon, Mar 31, 2008 at 5:43 AM, Susanne Kilian <s.ki...@hhpberlin.de> wrote:
> Hello Randy,
>
> many thanks for your mail! And sorry that my answer comes so late (I was on a short holiday and then ill for some time).
>
> Naturally, I will read your new explanations concerning the "Multiple mesh considerations" and "Domain decomposition strategy", I am very interested in it. We intend to perform some test calculations in the near future and I will report on the results.
>
> What will be the main topics of your future research?
>
> Have a nice day
>
> Susan
>
>
>
>
> _____________________________________
> From: Randy McDermott [randy.m...@gmail.com]
> Sent: Wednesday, 26 March 2008 19:36
> To: Susanne Kilian
> Cc: Kevin McGrattan
> Subject: Re: Pressure Solvers
>
> Hi Susan,
>
> At long last I have at least partially written up the new domain
> decomposition strategy used in FDS 5.1.4. See the current FDS 5 Tech
> Guide under "Multiple Mesh Considerations (On-Going Research)" and the
> appendix "Domain Decomposition Strategy". I hope this helps answer
> your questions about how we are doing things in parallel. I would be
> appreciative of any feedback you have as I will soon be trying to put
> a paper together on this approach.
>
> It is clear that we have some scaling issues to deal with in
> practice. But my guess is that this is due to the legacy of the
> serial FDS code... in principle, the algorithm should be
> embarrassingly parallel. At the moment I think we are stopping to do
> mesh exchanges more often than will ultimately be required, but this
> may not be the biggest problem.
>
> Cheers,
> Randy



fds4hhpberlin

Mar 31, 2008, 8:24:58 AM
to FDS and Smokeview Discussions
Hi Randy,

no, I don't think that the algorithm has to be 'embarrassingly
parallel' at all! I even think that this isn't possible given the
underlying algorithmic structure. In my opinion, it is justifiable to
accept moderate performance losses if you think of the possible (huge)
enlargement of the treatable problem sizes.

So, I will read your new explanations on the multiple mesh
considerations soon and hope that this will be the starting point of a
new interesting discussion!

Many greetings,

Susan

fds4hhpberlin

Apr 1, 2008, 11:06:35 AM
to FDS and Smokeview Discussions
Hi, Randy,

I am a little bit confused about what exactly is meant by the
definition of "embarrassingly parallel". In the new Technical
Reference Guide the new algorithm is described as embarrassingly
parallel once the boundary conditions are defined. But this holds
true only for certain intervals within a single time step, from one mesh
exchange to the next and from one coarse grid solve to the next,
respectively. The coarse grid solve itself is decidedly "not
embarrassingly parallel". The original definition of this term says
that "there is no essential dependency (or communication) between
those parallel tasks" (Wikipedia). Whether "embarrassingly parallel" is an
attribute of single parts of an algorithm or of the whole algorithm, I
am not really sure.

The new algorithm is some kind of additive Schwarz method
(originally these use an overlap ...) with a coarse grid correction, by
which one can get rid of the troublesome dependency on the number of
subgrids. In my experience this method works very well if the
underlying grid structure is more or less equidistant (or not too
anisotropic), especially if the subgrid sizes don't differ too much.
Is there already experience in the FDS community with subgrids of very
different sizes or with different grid resolutions on adjacent subgrids?
Many years ago I worked a lot with Schwarz-like methods (additive,
multiplicative, hybrid, without and with coarse grid correction) and I
could often see a strong dependency on the degree of anisotropy of the
(coarse) grid used.
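
For reference, the textbook two-level preconditioner I am alluding to
can be written (in generic notation, not necessarily the exact FDS
formulation) as

$$ B^{-1} = R_0^T A_0^{-1} R_0 + \sum_{i=1}^{M} R_i^T A_i^{-1} R_i , \qquad A_i = R_i A R_i^T , $$

where $R_i$ restricts a global vector to subgrid (macro) $i$, the local
problems $A_i$ are solved independently on each mesh, and the $R_0, A_0$
term is the small coarse grid problem that restores the global coupling
and removes the dependency on the number of subgrids.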

Randy McDermott

Apr 7, 2008, 10:03:46 AM
to FDS and Smokeview Discussions, Susanne, Kevin McGrattan
Hi Susan,

Good to hear from you. Sorry I had not checked the discussion group.
You are correct that the coarse grid solve is not 'embarrassingly
parallel'. I mentioned this in my post (March 31) and perhaps I should
be more clear about this in the Tech Guide. But the coarse solve is M
x M, where M = number of meshes, and is symmetric positive-definite. So,
I would guess that CG would be optimal. At the moment the coarse
solve is not done in parallel. As I mention in the Tech Guide, we
simply use a direct LU. For the problem sizes we are considering (we
have only run up to 256 processes) the cost of this solve is trivial.
We will be very happy if we get to the point where we are running on
several thousands of processors and this starts to slow us down.
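
Just to illustrate how little code a CG option would take if we ever
need it (a generic Python sketch, not anything that exists in FDS
today):

import numpy as np

def cg(A, b, tol=1.0e-10, max_iter=1000):
    # Unpreconditioned conjugate gradients for a symmetric positive-definite A.
    x = np.zeros_like(b, dtype=float)
    r = b - A @ x
    p = r.copy()
    rs_old = r @ r
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rs_old / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x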

Regarding anisotropic (i.e. pancake-shaped) grids, these should not be
used in LES (large-eddy simulation) anyway and so I would not be too
concerned with that case. However, the direct FFT solver has no
problem with anisotropy, especially since our linear equation does not
contain cross derivatives (and of course we use a Cartesian grid in
FDS). The issue of having different fine grid spacing (what you
referred to as "subgrids of different sizes") does not cause us
problems. There is a simple averaging procedure that is used. We
will deal with this issue in more detail as we start to implement
adaptive mesh refinement (AMR). The "FDS community" is just now
getting use to the new constraints that Kevin has placed on adjacent
mesh resolutions (that each fine cell may only have an integer number
of adjoining fine cells on the other mesh). You mentioned "very
different sizes" for adjacent grid spacings... this should be avoided
anyway for the sake of the accuracy of the simulation (I would not use
a refinement ratio of more than 4:1, for example).

If you have seen a similar algorithm, I would appreciate it if you could
give me the reference. I have not seen anyone who has taken this
approach to Navier-Stokes hydrodynamics. Our approach is more than
just a parallel solution to the Poisson equation. I keep trying to
tell everyone that we have not developed a new "pressure solver"; we
are using the same solver as before. One of the keys to the method is
how we treat the force terms on the right hand side of the time
integration scheme and subsequently how we project the velocity field
to match the proper divergence constraint. This is all independent of
the coarse grid solve, which only enforces global volume conservation.

We are currently in the process of running scaling tests and have seen
fairly good results for weak scaling up to 128 processors.

I look forward to chatting more when you have had time to look at the
scheme in more detail.

Cheers,
Randy




fds4hhpberlin

Apr 7, 2008, 11:29:16 AM
to FDS and Smokeview Discussions
Hi, Randy,

thanks for your reply! I really believe that the solution of the small
coarse grid problem doesn't consume much time. Is it right that
you solve the coarse grid problem on processor 1 (or 0, depending on
the numbering strategy)? So, some kind of communication from all
processors to processor 1 and back the other way is necessary?! This
could be much more time consuming for large coarse grids than the
solution itself. But anyway, it seems to be unavoidable to me. I also
have good experience with LU as a coarse grid solver.

What I meant by very different grid sizes is 'different macro grid
sizes', namely the differences between the diameters of the single
subdomains (macro level), not the differences between the local fine
grid resolutions (micro level). I agree that there shouldn't be big
differences between adjacent macros concerning the fine grid
resolutions! What I experienced with the additive Schwarz solvers is
that they were rather sensitive with respect to submeshes of different
sizes (different macro sizes). But probably the new FDS pressure
algorithm deals better with this.

Is it correct that FFT can deal with deformed grids? Here, I don't
mean adaptively refined grids but grids where the nodes are shifted
in certain specified directions, so that the grid numbering is still
'linewise'. Is there a limit to the amount of grid deformation for which
FFT still works?

When do you expect to start with the implementation of AMR? Are there
already initial concepts planned?

Many greetings from Berlin

Susan

Randy McDermott

Apr 7, 2008, 11:58:37 AM
to Susanne Kilian, fds...@googlegroups.com, Kevin McGrattan
Hi Susan,

All good points. We actually perform the coarse solve redundantly on
each mesh! So, all information needs to be 'gathered' but not 'sent'
after the solve.
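
The pattern looks roughly like this (sketched here in Python with mpi4py
purely for illustration; FDS does the equivalent with Fortran MPI calls,
and build_local_coarse_row below is a made-up placeholder for the
per-mesh setup, assuming one mesh per rank):

import numpy as np
from mpi4py import MPI

def build_local_coarse_row(rank, nprocs):
    # Placeholder stand-in for assembling this mesh's row of the coarse system.
    row = np.zeros(nprocs)
    row[rank] = 2.0
    if rank > 0:
        row[rank - 1] = -1.0
    if rank < nprocs - 1:
        row[rank + 1] = -1.0
    return row, float(rank)

comm = MPI.COMM_WORLD
rank, nprocs = comm.Get_rank(), comm.Get_size()
my_row, my_rhs = build_local_coarse_row(rank, nprocs)
rows = comm.allgather(my_row)      # every rank receives every row...
rhs = comm.allgather(my_rhs)
H = np.linalg.solve(np.array(rows), np.array(rhs))   # ...and solves redundantly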

Regarding different coarse mesh sizes, yes, this is probably our
biggest issue and will become even more of an issue with AMR. We do
not do any real 'load balancing' at the moment. It is up to the user
to specify the meshes in a logical way so that one mesh is not
performing more work than the others. Getting around this may require
some major rethinking in how we do things.

Regarding mesh stretching, Kevin can correct me, but I think FDS
allows stretching in two of the three directions (in X and Z). There
is some history there that I do not know the details of. But
basically Kevin hacked into the FFT solver and added the stretching
factors.

The sooner I can get to AMR the better. But my guess is that I will
not get to think about it seriously until this fall. There are a few
more important issues to deal with. The other day Kevin and I played
around with embedding a mesh within another and things seemed to work,
but there is no information transferred from the fine mesh back to the
coarse mesh. So there is currently only a one-way coupling. For fun,
just try it some time... embed a fine grid inside a coarse grid and
see what happens. So, this is the beginning.

Ok, hope this answered your questions. Talk to you soon.
Randy

> ________________________________________
> From: Randy McDermott [randy.m...@gmail.com]
> Sent: Monday, 7 April 2008 16:28
> To: Susanne Kilian
> Cc: fds...@googlegroups.com; Kevin McGrattan
> Subject: Re: Embarrassingly parallel


>
> Hi Susan,
>
> Good to hear from you. Sorry I had not checked the discussion group.

> You are correct that the coarse solve is not embarrassingly parallel.
> I mentioned this in my post on March 31, but perhaps I should be more
> clear in the Tech Guide. The coarse solve is M x M, where M = number
> of meshes, and is symmetric positive-definite. So, I would guess that
> CG would be optimal. Right now we simply use a direct LU because we
> are only considering runs up to 256 processors, in which case even a
> direct coarse solve is trivial. If we ever get to the point where this
> part of the algorithm is slowing us down, we will be very happy! Most
> FDS users only use O(10) processors.
>
> Regarding anisotropy of the grid... anisotropic (i.e., pancake-shaped)
> grids should be avoided in large-eddy simulation. However, the direct
> FFT solver has no problem with such grids; the Poisson equation has no
> cross derivatives and the grid is a structured, Cartesian grid with
> little or no stretching. On adjacent meshes I would not use a
> refinement ratio of more than 4:1, and these mesh interfaces should be
> positioned away from regions where accuracy is important. Once we
> implement adaptive mesh refinement, this will be done automatically.
>
> Good luck with the Linux cluster. I look forward to chatting more
> once you have had a chance to study the scheme in detail.
>
> Cheers,
> Randy
>
> On Mon, Apr 7, 2008 at 7:55 AM, Susanne Kilian <s.ki...@hhpberlin.de> wrote:
> >
> > Hi, Randy,
> >
> > Sorry, I have forgotten to send my last discussion-group posting to your
> > email address as well. So, I am not sure that you already read it, and I
> > would like to attach it again.
> >
> > This week I will be concerned with the maintenance of our Linux cluster
> > (we have expanded ...), but I hope that I will find the time to read the
> > description of your new pressure solver next week.
> >
> > So, I am looking forward to new interesting discussions with you.
> >
> > Have a good time
> >
> > Susan

Kevin

Apr 7, 2008, 1:28:47 PM
to FDS and Smokeview Discussions
Randy is correct about the pressure solver. We use a slightly modified
form of the CRAYFISHPAK solvers. The default solver does FFT in the y
and z directions, and cyclic reduction in x. To stretch in a direction
other than x, we just "fool" the solver by inputting the RHS and
boundary conditions in a different order: y becomes x, for example. To
stretch in two directions, we use a solver appropriate for spherical
geometries, hacked to do what we want instead.
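
A rough picture of the relabeling trick (numpy pseudocode, not the
actual CRAYFISHPAK interface; solve_stretch_in_x stands in for the real
routine that only knows how to stretch its first coordinate):

import numpy as np

def solve_with_stretch_in_y(rhs, solve_stretch_in_x):
    # rhs is the 3-D right-hand-side array indexed (i, j, k).
    swapped = np.transpose(rhs, (1, 0, 2))   # relabel so y plays the role of x
    h = solve_stretch_in_x(swapped)          # boundary data would be permuted the same way
    return np.transpose(h, (1, 0, 2))        # back to the original (i, j, k) ordering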


fds4hhpberlin

Apr 8, 2008, 4:31:33 AM
to FDS and Smokeview Discussions
Hi,

concerning the redundant solution of the coarse grid problem: I see
that there is an advantage in that you don't have to broadcast the
results of a locally solved coarse grid problem to all other subgrids.
But on the other hand the initial gathering process is more time
consuming because each processor has to talk to every other one, doesn't
it? But with respect to the implementation it may indeed be easier.

I think attaining a fine load balancing is a very intractable
problem. It depends not only on the number of unknowns within a single
subdomain but also on the iteration times of the local solvers, which
depend strongly on the underlying problem and may change during the
progression of the program. Besides the (direct) local FFT methods,
are there any iterative solvers used on the single subdomains within
the other program components which may produce very different
iteration times?

Nice idea with the "stretched" FFT solver! Do you know how sensitive
this procedure is? Do you see differences in the accuracy if the grid
is stretched? Up to what amount can this stretching be done?

Many greetings

Susan



Kevin

Apr 8, 2008, 8:18:03 AM
to FDS and Smokeview Discussions
The cost of any transfer of info machine to machine in an MPI run lies
mainly in the fact that all processes have to wait for each other to
arrive at that point in the time step. On a fast network, the actual
transfer of data is a small cost. Now that I am a manager, I
appreciate how unproductive it is to force my group to all meet at a
given time and place to transfer amongst ourselves a small amount of
info. With this in mind, when we apply the PRESSURE_CORRECTION, we
know that each mesh must solve its own Poisson equation first, then we
need to exchange those results, then we need to set up the
coefficients for the coarse linear solve on each mesh, then we must
"gather" this info together to form the coarse solve matrix via
another exchange. These two exchanges happen very close to one another
in the time step, which is good because once you've stopped all the
processes, you want to exchange as much info as possible. In the
meeting analogy: if I do have to call a meeting and force everyone to
stop what they're doing, I want to cover as much as possible at once.

You are right about load balancing -- the calculation time is limited
by the largest mesh or the slowest processor. But this is unavoidable
with a transient calculation. As for iterative solvers, we have not
found any Poisson solvers that come even close to the speed and
accuracy of our solver. And now with the implementation of the
pressure correcting strategy, we do not view the local, fine Poisson
solve as the bottleneck. In fact, you will notice that the pressure
solving is still about 10% of the entire CPU cost. For a low speed
hydro code, this is a very good price!

We're still working on ways to optimize speed. I haven't announced it
yet, but I have set up the MPI version of FDS to allow one to assign
more than one mesh to a process. In fact, that is how the evacuation
routine works. In an evac calc, we solve a set of 2D potential flows
that provide the people with a default directional field. Originally
in an MPI calc, each of these fairly trivial calcs was assigned to its
own process. Now they are all combined. This capability will give us
some flexibility in assigning small meshes to the same process. In
addition, we want to see if we can, via something like OpenMP or
automatic compiler optimization, speed up the individual processes by
exploiting duo/quad core architectures.

As for stretched grids -- the accuracy is the same whether we stretch
or not. That is, we solve the linear system of equations arising from
the discretization of the Poisson equation to machine accuracy.
However, the "accuracy" of the actual fluid flow calc is only as good
as the longest dimension of the stretched grid cell, a point that many
people have chosen to ignore. In fact, the whole reason for going
multi-mesh was to lessen the need for stretched grids. I use this
feature less and less.

fds4hhpberlin

Apr 8, 2008, 9:02:30 AM
to FDS and Smokeview Discussions
Hi Kevin,

many thanks for your detailed answer.

Your description of the information flow in your working group is a
really nice example of a parallel application! Yes, if the
synchronization of the single processes has already been done in the
first exchange step, then the synchronization for the directly
following second exchange comes nearly automatically.

With my question concerning 'iterative solvers' I meant the other
parts of the program (not the direct pressure solver), such as the
combustion model or the radiation transport. Do you use iterative
solvers there which may produce very different iteration times on the
single submeshes? This could be another problem for a good load
balancing. But I agree: some inequalities within the load balancing
are unavoidable and can only be optimized.

It's a good idea to allow the assignment of more than one mesh to one
process. Will it be the same logic as assigning more than one
process to one processor? Do you want to use some kind of analogue
of the serial FDS routine for small collections of submeshes (combined
by a surrounding loop)?



Kevin

Apr 8, 2008, 9:51:14 AM
to FDS and Smokeview Discussions
FDS is explicit. We do not iterate the other parts of the program. We
just do a second order accurate predictor-corrector. Randy calls it a
projection scheme. I call it 2nd order Runge-Kutta. So, theoretically,
each mesh does the same amount of work if the number of cells is the
same.
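
Schematically (and glossing over the projection/pressure solve that is
applied at each stage), the update has the familiar second-order
predictor-corrector (Heun / RK2) form

$$ u^{*} = u^{n} + \Delta t \, F(u^{n}), \qquad u^{n+1} = \tfrac{1}{2}\big( u^{n} + u^{*} + \Delta t \, F(u^{*}) \big), $$

so the cost per time step is essentially two right-hand-side evaluations
plus the Poisson solves, with no inner iteration anywhere.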

When we assign multiple meshes to one process in an MPI calc, we use
the exact same data exchange mesh to mesh as we do for the serial
version of FDS running multiple meshes. That is, you can use the MPI
version of FDS, ask for only one process (mpiexec -n 1 ....), and it
will behave exactly like the serial version. If it were not for the
pain involved in installing the MPI libraries, we could get rid of
main.f90 and only compile main_mpi.f90. But then everyone would have
to install MPI, and many FDS users do not want to do this because it
is a bother.

To run the MPI version of FDS serially, add MPI_PROCESS=0 to each MESH
line. MPI numbers processes 0,1,2,..., and you are telling FDS to
assign all the meshes to process 0. Then invoke MPI with -n or -np 1,
meaning that you want 1 process to be launched. I just tried the
hallways.fds test case and it worked.
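
For example, a small two-mesh case set up to run as a single MPI
process would contain MESH lines something like these (the IJK and XB
values here are just placeholders):

&MESH IJK=32,32,32, XB=0.0,1.0,0.0,1.0,0.0,1.0, MPI_PROCESS=0 /
&MESH IJK=32,32,32, XB=1.0,2.0,0.0,1.0,0.0,1.0, MPI_PROCESS=0 /

and it would be launched with something like 'mpiexec -n 1 fds5_mpi
job.fds', with the executable name depending on your installation.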

fds4hhpberlin

Apr 9, 2008, 3:44:58 AM
to FDS and Smokeview Discussions
Thanks, this was the information I needed. I knew about the explicit
predictor-corrector scheme, but I wasn't sure that really all other
parts of the program are explicit as well. So, this makes the load
balancing a bit easier.

I will check the option with MPI_PROCESS=0, a nice mechanism! So, I
suppose that I can mix this option? For example: when I have 4 meshes
and want to assign 2 meshes to 1 process at a time, do I have to set
MPI_PROCESS=0 for meshes 1 and 2, and MPI_PROCESS=1 for meshes 3 and 4, and
then start the whole program with '... -np 2'?

Kevin

Apr 9, 2008, 8:17:24 AM
to FDS and Smokeview Discussions
One piece of information I forgot to tell you. The PROCESS-MESH
association must be such that the process number either stays the
same or increases as the mesh number increases, and it must start at 0.
For example

Mesh 1 on Process 0
Mesh 2 on Process 0
Mesh 3 on Process 1
Mesh 4 on Process 2

is OK, and you would use '... -np 3'. However,

Mesh 1 on Process 0
Mesh 2 on Process 1
Mesh 3 on Process 0
Mesh 4 on Process 2

is not OK. The reason is that I had to modify my MPI gathers and
reductions to account for the non-uniform "counting" and
"displacements". If you are familiar with MPI, I have to use
MPI_GATHERV instead of MPI_GATHER, for example. The V stands for
Vector, meaning I now have to send multiple array elements instead of
single ones because now a single process might need to accept
information for multiple meshes. In any case, to make the book-keeping
easier, I adopted the convention above. I suppose it could be
generalized, but I don't think it is really necessary, at least not
now.
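
The bookkeeping MPI_GATHERV needs is easy to picture (a small Python
illustration, not the FDS code; counts and displs are what get passed
to the gather):

mesh_to_process = [0, 0, 1, 2]   # the "OK" assignment above, meshes 1..4
values_per_mesh = 1              # say, one coarse-grid value per mesh

nprocs = max(mesh_to_process) + 1
counts = [mesh_to_process.count(p) * values_per_mesh for p in range(nprocs)]
displs = [sum(counts[:p]) for p in range(nprocs)]
print(counts, displs)            # [2, 1, 1] [0, 2, 3]

The rule that the process number never decreases is what guarantees
that each process owns a contiguous block of meshes, so a single count
and displacement per process is enough.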

By the way, it is somewhat confusing to number meshes 1,2,3,... and
processes 0,1,2,..., but I am a Fortran programmer and MPI is mainly
written for C code, with alternative calls for us old-timers. If MPI
sends out an error, it tells you (sometimes) the process number, which
should not be confused with the mesh number.