MPI & output to file

Antoon van Hooft

Jul 7, 2016, 1:17:30 AM
to basilisk-fr
Dear all,

I am currently using MPI parallelism to run Basilisk on the Dutch national supercomputer. This works fine apart from one of the outputs that I try to write to a file.
I use the following event to write some diagnostics to a file:
char namm[50];
event timeseries (t += 3)
{
  sprintf (namm, "./data/timeseries3dkleingrof.dat");

  if (t == 0) {
    FILE * fpp = fopen (namm, "w");
    fprintf (fpp, "t\t\ttwall\t\tspeed\t\tdt\t\ti\t\t#n\t\te\t\thst\t\tbufx\t\tdissipation\t\tkoldelrat\n");
    fclose (fpp);
  }
  FILE * fpp = fopen (namm, "a");
  for (int ll = 0; ll < 1; ll++) {
    fprintf (fpp, "%g\t\t%g\t\t%g\t\t%g\t\t%d\t\t%d\t\t%g\t\t%g\t%g\t\t%g\t\t%g\n",
             t, perf.t, perf.speed, dt, i, n(), energy(), hst(), bf(), diss(), kolrat());
    fclose (fpp);
  }
}

However, when I use mpicc and mpirun I get "np" times the lines with data (each displaying the same information). On the supercomputer I use the method suggested in the pulsed jet atomisation example (portable file, mpicc and srun); then I do not get the first line with text, and the output data itself becomes a mess. It appears every process is running this event by itself. Another output event like:

event logfile (t += 1; t <= temp)
{
  if (i == 0)
    fprintf (ferr,
             "t\tdt\tmgp.i\tmgpf.i\tmgu.i\tgrid\tperf.t\tperf.speed\n");
  fprintf (ferr, "%g\t%g\t%d\t%d\t%d\t%ld\t%g\t%g\n",
           t, dt, mgp.i, mgpf.i, mgu.i,
           grid->tn, perf.t, perf.speed);
}

does work correctly.

What am I doing wrong?

Thanks in advance,

Antoon

Stephane Popinet

Jul 7, 2016, 4:00:31 AM
to basil...@googlegroups.com
Hi Antoon,

> It apears every process is running this event by itself.

Yes, this is indeed what happens. A simple fix is to do:

event timeseries (t += 3)
{
  static FILE * fp = fopen ("./data/timeseries3dkleingrof.dat", "w");
  if (t == 0)
    fprintf (fp, "t\t\ttwall\t\tspeed\t\tdt\t\ti\t\t#n\t\te\t\thst"
             "\t\tbufx\t\tdissipation\t\tkoldelrat\n");
  fprintf (fp, "%g\t\t%g\t\t%g\t\t%g\t\t%d\t\t%d"
           "\t\t%g\t\t%g\t%g\t\t%g\t\t%g\n",
           t, perf.t, perf.speed, dt, i, n(),
           energy(), hst(), bf(), diss(), kolrat());
}

the important part is "static" which ensures that the file remains open
between calls (and also that the file is only written into by a single
process). Note also that you should use a better text editor which knows
how to indent C properly (e.g. emacs).

Note also that using a subdirectory (like "data/" here) is not
recommended since it makes your code not portable: to run it, you will
need to first "manually" make sure that a "data/" directory exists,
otherwise your code will crash. fopen() does not automatically create
the directory hierarchy for you.
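
One way to keep a subdirectory without the crash described above is to create it from the program before opening the file. The following is a minimal sketch in plain C, assuming a POSIX system where mkdir() from <sys/stat.h> is available; the helper name open_in_dir() is hypothetical and is not part of Basilisk.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper (not a Basilisk function): creates the single
   directory level `dir` if it does not exist yet, then opens
   `dir`/`name` with the given fopen() mode.
   Returns NULL if the directory cannot be created, the path is too
   long, or the file cannot be opened. */
FILE * open_in_dir (const char * dir, const char * name, const char * mode)
{
  /* mkdir() fails with EEXIST when the directory is already there,
     which is not an error for our purposes. */
  if (mkdir (dir, 0755) != 0 && errno != EEXIST)
    return NULL;

  char path[512];
  int len = snprintf (path, sizeof path, "%s/%s", dir, name);
  if (len < 0 || (size_t) len >= sizeof path)
    return NULL;
  return fopen (path, mode);
}
```

Accepting EEXIST also means that several MPI processes can call the helper at the same time: whichever process creates the directory first wins, and the others simply find it already present.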

> Another output event like:
> event logfile (t += 1; t <= temp)
> {
>   if (i == 0)
>     fprintf (ferr,
>              "t\tdt\tmgp.i\tmgpf.i\tmgu.i\tgrid\tperf.t\tperf.speed\n");
>   fprintf (ferr, "%g\t%g\t%d\t%d\t%d\t%ld\t%g\t%g\n",
>            t, dt, mgp.i, mgpf.i, mgu.i,
>            grid->tn, perf.t, perf.speed);
> }
>
> does work correctly.

Yes, that's because "ferr" (and "fout") are defined to be standard error
(resp. output i.e. "stderr" and "stdout") only on a single process.

cheers

Stephane

Antoon van Hooft

Jul 7, 2016, 8:19:59 AM
to basilisk-fr, pop...@basilisk.fr
Dear Stephane, thank you for your comment. It works now, although I can only access the output once the simulation has ended.
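
A likely reason for output only showing up at the end is that stdio buffers writes to regular files, so data can stay in memory until the buffer fills or the file is closed. The sketch below is plain C, not Basilisk code, and assumes that buffering is indeed the cause here; log_line() and its arguments are hypothetical names used only for illustration.

```c
#include <stdio.h>

/* Hypothetical stand-in for the body of an output event: writes one
   line to a log file that stays open between calls, and flushes it so
   that the line can be read (e.g. with `tail -f`) while the run
   continues. Returns 0 on success, a non-zero value on failure. */
int log_line (const char * fname, double t, double value)
{
  /* Standard C requires a constant initializer for a static variable,
     hence the explicit NULL check instead of `static ... = fopen()`. */
  static FILE * fp = NULL;
  if (fp == NULL && (fp = fopen (fname, "w")) == NULL)
    return -1;
  fprintf (fp, "%g\t%g\n", t, value);
  /* Hand the buffered data to the operating system now. */
  return fflush (fp);
}
```

In an event that uses a static file pointer, the equivalent change would be a single fflush() call on that pointer after the last fprintf().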

I have not been able to solve a similar problem in the following bit of code. This routine exports a slice of scalar data to a file. (The incorrect indentation is a result of copy-pasting from emacs into the browser.)

  sprintf (names, "./data/slicegrTt=%g.dat", t);
  static FILE * fpgrT = fopen (names, "w");
  sprintf (names, "./data/sliceTt=%g.dat", t);
  static FILE * fpT = fopen (names, "w");
  scalar grT[];
  double zp = Z0 + L0/2;

  for (double yp = Y0 + (L0/pow(2,MAXLEVEL+1)); yp < 3; yp += L0/(pow(2,MAXLEVEL)))
  {
    for (double xp = X0 + (L0/pow(2,MAXLEVEL+1)); xp < L0; xp += (L0/pow(2,MAXLEVEL)))
    {
      Point point = locate (xp, yp, zp);
      fprintf (fpgrT, "%g\t", grT[]);
      fprintf (fpT, "%g\t", T[]);
    }
    fprintf (fpgrT, "\n");
    fprintf (fpT, "\n");
  }
  fclose (fpgrT);
  fclose (fpT);

But I get an error when the inner loop variable (xp) exceeds a certain value: xp > L0/2. Furthermore, the code appears to crash on the "fprintf" command.
I get this error message:

[antoon-XPS-15-9550:10482] *** Process received signal ***
[antoon-XPS-15-9550:10482] Signal: Segmentation fault (11)
[antoon-XPS-15-9550:10482] Signal code: Address not mapped (1)
[antoon-XPS-15-9550:10482] Failing at address: (nil)
[antoon-XPS-15-9550:10482] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x113d0)[0x7f45f64c93d0]
[antoon-XPS-15-9550:10482] [ 1] ./mpitest[0x4e0b7f]
[antoon-XPS-15-9550:10482] [ 2] ./mpitest[0x419833]
[antoon-XPS-15-9550:10482] [ 3] ./mpitest[0x419ba4]
[antoon-XPS-15-9550:10482] [ 4] ./mpitest[0x4836fa]
[antoon-XPS-15-9550:10482] [ 5] ./mpitest[0x4d4798]
[antoon-XPS-15-9550:10482] [ 6] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7f45f610f83


Switching back to OpenMP parallelism, the "static" file pointers produce errors. This might be due to the fact that I am redefining them for different output files at different times.

What is the solution to this problem?

Stephane Popinet

Jul 7, 2016, 10:29:19 AM
to basilisk-fr
> I have not been able to solve a similar problem in the following bit of
> code, This routine exports a slice of scalar data into a file.

I believe this is because you do not understand what "static" means. In
the first code I sent you, the file is opened only once (when the event
is first executed i.e. at i = 0), and the value of the corresponding file
pointer (fp) is kept between calls to the event. This is what "static"
means: "keep this in static memory, not on the stack, where it would be
allocated when entering the function and deallocated when leaving it".
Note that this is standard C, not Basilisk stuff.

Now in the code below, you see that you declare fpgrT and fpT also as
"static", however they both depend on time "t" and so _must_ be opened
every time the event is called (not just once at the beginning): so they
clearly are not "static".

What happens next? You close both files at the end of the event. The
next time the event is called, it will attempt to use the same file
pointer (remember that because they are static their value is kept
between calls), however this file is closed, hence the "segmentation
fault" error.

So "static" clearly does not do what you need for this type of output.
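
The difference can be sketched in plain C, independently of Basilisk. Both function names and the file-name pattern below are hypothetical and only serve as illustration: count_calls() shows a static variable keeping its value between calls, while write_snapshot() shows the pattern that suits a time-dependent file name, an ordinary local pointer that is opened and closed within each call.

```c
#include <stdio.h>

/* `calls` lives in static memory: it is initialised once and keeps
   its value between calls, so the function returns 1, 2, 3, ... */
int count_calls (void)
{
  static int calls = 0;
  return ++calls;
}

/* Hypothetical per-snapshot output: the file name depends on `t`, so
   the file has to be opened (and closed) on every call. The pointer
   is an ordinary automatic variable and is never used after fclose().
   Returns 0 on success, a non-zero value on failure. */
int write_snapshot (double t, double value)
{
  char name[64];
  snprintf (name, sizeof name, "snapshot-t=%g.dat", t);
  FILE * fp = fopen (name, "w");
  if (fp == NULL)
    return -1;
  fprintf (fp, "%g\n", value);
  return fclose (fp);
}
```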

The problem you have here is that different processes hold the different
bits of data you need for "locate()" to work, i.e. "locate()" only
operates on the local process, not on all processes in parallel. So if
(xp,yp,zp) belongs to the subdomain allocated to process, say, 2, all
the other processes will not be able to locate it and point.level will
be set to -1 on these processes; the values you want to access (i.e.
T[] and grT[]) will thus be undefined.

Additionally, all the processes will try to write their data (undefined
or not) in the same file, more or less simultaneously, which will result
in a big mess. Also, you have no control over the order in which the
processes will write their data... Welcome to the world of asynchronous
parallel programming!

A simpler way of doing this is for example:

1- only output when point.level is not -1 (after locate).
2- output in separate files for each process (use sprintf and pid() for
the number of the process)
3- concatenate files (using a script after running the thing)
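
The three steps above can be sketched in plain C as follows. This is only an illustration under stated assumptions: `rank` stands for the value pid() would return and `level` for point.level after locate(), both passed in as arguments so that the sketch runs without MPI or Basilisk. The function names and the file-name pattern are hypothetical, and step 3 is done in C here rather than with a script.

```c
#include <stdio.h>

/* Steps 1 and 2: each process appends to its own file and skips the
   points it does not own. Returns 1 if a line was written, 0 if the
   point was skipped, -1 if the file could not be opened. */
int write_local_value (int rank, double t, int level,
                       double x, double y, double value)
{
  if (level < 0)   /* point not found on this process: write nothing */
    return 0;
  char name[64];
  snprintf (name, sizeof name, "slice-t=%g-%d.dat", t, rank);
  FILE * fp = fopen (name, "a");
  if (fp == NULL)
    return -1;
  fprintf (fp, "%g\t%g\t%g\n", x, y, value);
  fclose (fp);
  return 1;
}

/* Step 3: concatenate the files of `nranks` processes into `out`.
   Returns the number of lines copied, or -1 if `out` cannot be opened. */
int concatenate (double t, int nranks, const char * out)
{
  FILE * dst = fopen (out, "w");
  if (dst == NULL)
    return -1;
  int lines = 0;
  for (int rank = 0; rank < nranks; rank++) {
    char name[64], line[256];
    snprintf (name, sizeof name, "slice-t=%g-%d.dat", t, rank);
    FILE * src = fopen (name, "r");
    if (src == NULL)   /* this process owned no point of the slice */
      continue;
    while (fgets (line, sizeof line, src)) {
      fputs (line, dst);
      lines++;
    }
    fclose (src);
  }
  fclose (dst);
  return lines;
}
```

Writing the coordinates next to each value is a design choice of this sketch: the concatenated file then does not depend on which process owned which point, and it can be sorted afterwards if a regular layout is needed.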

Antoon van Hooft

Jul 11, 2016, 6:51:54 AM
to basilisk-fr, pop...@basilisk.fr
Hello Stephane,

Thank you for your answer. I tried to make it work, but in the end I was unable to follow your suggestion.

I have now solved it by copying and slightly modifying the "output_field()" routine:

  scalar grT[];
  foreach()
  {
    grT[] = pow(sq((T[1]-T[-1])/(2*Delta)) + sq((T[0,1]-T[0,-1])/(2*Delta)) + sq((T[0,0,1]-T[0,0,-1])/(2*Delta)), 0.5);
  }
  scalar * list = {T,grT};
  sprintf (names, "./data/verslicet=%g.dat", t);
  FILE * fpver = fopen (names, "w");
  int nn = (pow(2,MAXLEVEL));
  int len = list_len(list);
  double ** field = matrix_new (nn, nn, len*sizeof(double));
  double zp = L0/2.01;
  double stp = L0/nn;
  for (int i = 0; i < nn; i++)
  {
    double xp = stp*i + X0 + stp/2.;
    for (int j = 0; j < nn; j++)
    {
      double yp = stp*j + Y0 + stp/2.;
      Point point = locate (xp, yp, zp);
      int k = 0;
      for (scalar s in list)
        field[i][len*j + k++] = point.level >= 0 ? s[] : nodata;
    }
  }
  if (pid() == 0)
  { // master
@if _MPI
    MPI_Reduce (MPI_IN_PLACE, field[0], len*nn*nn, MPI_DOUBLE, MPI_MIN, 0,
                MPI_COMM_WORLD);
@endif
    int k = 0;
    for (scalar s in list)
    {
      for (int i = 0; i < nn; i++) {
        for (int j = 0; j < nn; j++) {
          fprintf (fpver, "%g\t", field[i][len*j + k]);
        }
        fputc ('\n', fpver);
      }
      fflush (fpver);
      k++;
    }
  }
@if _MPI
  else // slave
    MPI_Reduce (field[0], NULL, len*nn*nn, MPI_DOUBLE, MPI_MIN, 0,
                MPI_COMM_WORLD);
@endif
  matrix_free (field);
}

This appears to work.
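
The reason the MPI_MIN reduction recovers the right values can be shown without MPI. Each process fills its buffer with a large sentinel wherever locate() fails, and an element-wise minimum over all buffers then keeps the one real value per sample point. The sketch below simulates the reduction in plain C; NODATA is an assumed stand-in for Basilisk's nodata (whose actual value may differ), and reduce_min() is a hypothetical name.

```c
#include <stddef.h>

/* Assumption: a large sentinel playing the role of Basilisk's
   `nodata`; the value actually used by Basilisk may differ. */
#define NODATA 1e30

/* Element-wise minimum of `nranks` buffers of length `n`, i.e. what
   MPI_Reduce() with MPI_DOUBLE and MPI_MIN leaves in the buffer of
   the root process. `local[r]` stands for the `field[0]` buffer held
   by process r. */
void reduce_min (size_t nranks, size_t n, double ** local, double * result)
{
  for (size_t k = 0; k < n; k++) {
    result[k] = local[0][k];
    for (size_t r = 1; r < nranks; r++)
      if (local[r][k] < result[k])
        result[k] = local[r][k];
  }
}
```

This only gives the intended result if every physical value is smaller than the sentinel and if each sample point is found by a single process (or by processes that hold the same value for it). A point found by no process stays at the sentinel.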

