IPC and RAPL measuring in MPI - per process or total?

Ekaterina Tutlyaeva

Mar 23, 2017, 9:31:42 AM

to ptools-perfapi

Dear support,

Sorry for the newbie question, but I really can't figure it out.

If I have MPI-parallelized code and want to measure IPC with the high-level PAPI_ipc call, for example:
if (mpi_rank==0){ retval = PAPI_ipc( EventSet);}
my MPI-code
if (mpi_rank==0){retval = PAPI_ipc( EventSet, values);}

Will these values contain data measured for all processes in my MPI code between the two calls, or only for the MPI process with rank 0?
Is there a way to collect IPC for all MPI processes?

I have a similar question about RAPL events.
if (mpi_rank==0){ retval = PAPI_start( RaplEvents);}
my MPI-code
if (mpi_rank==0){retval = PAPI_stop( RaplEvents, values);}
In this case, will values contain data measured for all executed code, excluding the libraries it calls from LD_LIBRARY_PATH?

Best regards,
Ekaterina

Vince Weaver

Mar 23, 2017, 12:28:26 PM

to Ekaterina Tutlyaeva, ptools-perfapi

On Thu, 23 Mar 2017, Ekaterina Tutlyaeva wrote:

> If I have MPI-parallelized code and want to measure IPC with the
> high-level PAPI_ipc call, for example:
> if (mpi_rank==0){ retval = PAPI_ipc( EventSet);}
> my MPI-code
> if (mpi_rank==0){retval = PAPI_ipc( EventSet, values);}
>
> Will these values contain data measured for all processes in my MPI code
> between the two calls, or only for the MPI process with rank 0?
> Is there a way to collect IPC for all MPI processes?

With MPI you are possibly running across many machines in a cluster, so
the only way to get the full results you want is to gather the PAPI
results on each rank and at the end combine them.

Depending on what you're trying to measure, it might make more sense to
use the lower-level interface and measure PAPI_TOT_CYC and PAPI_TOT_INS
and calculate IPC manually.
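
A minimal sketch of the "combine at the end" step, assuming every rank has already read PAPI_TOT_INS and PAPI_TOT_CYC for itself and the pairs have been collected on one rank (for example with MPI_Gather). The `rank_counts` struct and both function names are hypothetical and not part of PAPI or MPI; only the arithmetic is shown.

```c
#include <assert.h>
#include <stddef.h>

/* One rank's readings of PAPI_TOT_INS and PAPI_TOT_CYC (hypothetical struct). */
typedef struct {
    long long instructions;
    long long cycles;
} rank_counts;

/* IPC of a single rank; returns 0.0 when no cycles were counted. */
static double rank_ipc(rank_counts c)
{
    return c.cycles > 0 ? (double)c.instructions / (double)c.cycles : 0.0;
}

/* Aggregate IPC over all ranks: total instructions / total cycles.
 * This weights each rank by its cycle count, unlike a plain mean
 * of the per-rank IPC values. */
static double aggregate_ipc(const rank_counts *counts, size_t nranks)
{
    assert(counts != NULL || nranks == 0);
    long long ins = 0, cyc = 0;
    for (size_t i = 0; i < nranks; i++) {
        ins += counts[i].instructions;
        cyc += counts[i].cycles;
    }
    return cyc > 0 ? (double)ins / (double)cyc : 0.0;
}
```

Summing the raw counts first and dividing once is a design choice: averaging per-rank IPC values instead would give a rank that was mostly idle the same weight as a busy one.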

> And similar question for RAPL events.
> if (mpi_rank==0){ retval = PAPI_start( RaplEvents);}
> my MPI-code
> if (mpi_rank==0){retval = PAPI_stop( RaplEvents, values);}
> In this case, will values contain data measured for all executed code,
> excluding the libraries it calls from LD_LIBRARY_PATH?

RAPL is more complex, as its counters are package-wide rather than per core.

Just like before you will need to measure in each rank and then add up.

However, if your cluster assigns multiple ranks per node, things can get
complex: if two ranks are running on the same processor package, you will
end up double counting the results. I don't know if there's an easy way
to deal with that issue, short of tracking the node/package values for
each rank and making sure there are no duplicates.
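
A rough sketch of the "track node/package and skip duplicates" idea, assuming every rank reports the host it ran on, the package index it read, and the energy value it saw. The `rapl_report` layout and `total_energy` are hypothetical names for illustration; how each rank learns its package index is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One rank's report (hypothetical layout): where it ran and what it read. */
typedef struct {
    const char *node;   /* host name, e.g. from MPI_Get_processor_name() */
    int package;        /* processor package (socket) index on that node */
    long long energy;   /* package energy reading, in the counter's own unit */
} rapl_report;

/* Sum energy over all reports, counting each (node, package) pair once.
 * The first report seen for a pair is used; later duplicates are skipped.
 * The pairwise comparison is O(n^2), which is fine for a sketch. */
static long long total_energy(const rapl_report *r, size_t n)
{
    assert(r != NULL || n == 0);
    long long total = 0;
    for (size_t i = 0; i < n; i++) {
        int seen = 0;
        for (size_t j = 0; j < i && !seen; j++)
            seen = r[j].package == r[i].package &&
                   strcmp(r[j].node, r[i].node) == 0;
        if (!seen)
            total += r[i].energy;
    }
    return total;
}
```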

Vince

Servat, Harald

Mar 23, 2017, 1:23:01 PM

to Vince Weaver, Ekaterina Tutlyaeva, ptools-perfapi

My comments are prefixed with HS>

HS > One alternative you could try is to use MPI_Get_processor_name() to identify the host on which each MPI rank runs and then create a communicator containing one (representative) MPI rank per host. The representative rank would be responsible for monitoring the consumption of the whole node (adding up the sockets if the node has more than one), and these values could be collected using the previously created communicator. I'm sure you can implement the same in multiple ways, though.
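
The election of one representative per host could look roughly like the sketch below. It assumes the host names of all ranks have already been gathered into one array (for instance with MPI_Allgather on fixed-size name buffers); `elect_representatives` is a hypothetical helper, not an MPI call, and it simply picks the lowest rank on each host.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* For each rank i, store in rep[i] the lowest rank that shares its host name.
 * names[i] is the host name reported by rank i. Rank i is the representative
 * of its host exactly when rep[i] == i. */
static void elect_representatives(const char *const *names, size_t nranks, int *rep)
{
    assert(names != NULL && rep != NULL);
    for (size_t i = 0; i < nranks; i++) {
        rep[i] = (int)i;
        for (size_t j = 0; j < i; j++) {
            if (strcmp(names[j], names[i]) == 0) {
                rep[i] = (int)j;   /* first match is the lowest rank on this host */
                break;
            }
        }
    }
}
```

In an MPI program, each rank could then call MPI_Comm_split with one color if `rep[rank] == rank` and MPI_UNDEFINED otherwise, so that only representatives end up in the new communicator. MPI-3 also offers MPI_Comm_split_type with MPI_COMM_TYPE_SHARED, which groups ranks by shared-memory node without comparing names.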

HS > Regarding the accounting of libraries, my understanding is that they will be counted. IIRC, the only thing you can tune here is whether to exclude kernel / hypervisor execution (see http://icl.cs.utk.edu/projects/papi/wiki/PAPI3:PAPI_set_domain.3 -- might be outdated). The PAPI team could probably shed more light on this.

Best,

--
You received this message because you are subscribed to the Google Groups "ptools-perfapi" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ptools-perfap...@icl.utk.edu.
To post to this group, send email to ptools-...@icl.utk.edu.
Visit this group at https://groups.google.com/a/icl.utk.edu/group/ptools-perfapi/.

-----------------------------------------------------------
Intel Corporation Iberia, S.A.
Registered Office: Torre Picasso, 25th Floor,
Plaza Pablo Ruiz Picasso, no. 1, 28020 Madrid

This message is intended exclusively for its addressee and may
contain privileged or confidential information. If you are not
the indicated addressee, you are hereby notified that reading,
using, disclosing and/or copying it without authorization is prohibited
under current legislation. If you have received this message in
error, please notify us immediately by this same means
and destroy it.

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.

Ekaterina Tutlyaeva

Mar 23, 2017, 3:44:17 PM

to ptools-...@icl.utk.edu

Dear Vince,

thank you for your answer!

> With MPI you are possibly running across many machines in a cluster, so
> the only way to get the full results you want is to gather the PAPI
> results on each rank and at the end combine them.

For now I work with only one machine (Intel Xeon Phi, KNL), so I have multiple MPI ranks on a single machine. Does that mean I shouldn't sum the results? Will the results read in MPI rank 0 cover the whole program?

> Depending on what you're trying to measure, it might make more sense to
> use the lower-level interface and measure PAPI_TOT_CYC and PAPI_TOT_INS
> and calculate IPC manually.

Thank you! I want to collect as many statistics as possible to study how energy consumption depends on instructions, cache behavior, and so on in my application.

There is an interesting link in the previous message:

> (see http://icl.cs.utk.edu/projects/papi/wiki/PAPI3:PAPI_set_domain.3 -- might be outdated). The PAPI team could probably shed more light on this.

Can I set, for example, the PAPI_DOM_KERNEL domain so that all PAPI events are counted in a general context?

Best regards,

Ekaterina

--

Best regards,

Ekaterina Tutlyaeva

Ekaterina Tutlyaeva

Mar 23, 2017, 3:47:37 PM

to Servat, Harald, Vince Weaver, ptools-perfapi

> One alternative you could try is to use MPI_Get_processor_name() to identify the host on which each MPI rank runs and then create a communicator containing one (representative) MPI rank per host.

Thank you very much for the idea!

For now I have a simpler case: a single node with multiple MPI ranks. I just have a newbie question: I want to be sure that with one MPI rank I am really monitoring the consumption of the whole node, not just that of the local process.

Multiple nodes are the next step; thank you for the solution!

Best regards,

Ekaterina

Servat, Harald

Mar 24, 2017, 3:56:23 AM

to Ekaterina Tutlyaeva, Vince Weaver, ptools-perfapi

Hello,

As Vince pointed out, the RAPL (energy) counters are per package (socket), not per core. This means that reading a counter within a package returns the same value regardless of which core requests it, and that value is the aggregate for the whole package (socket).

So you have to discover how many sockets you have in your system (you can probably accomplish that by querying the available RAPL counters in PAPI), and then each representative MPI rank can report the consumption. It is up to you whether to report the data per socket or to accumulate the values from the different sockets into a single one. That depends on your use case.
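
Both reporting styles, per socket and accumulated, can come out of one small helper, sketched below. It assumes the representative rank already holds one energy reading per package and that those readings are in nanojoules (the unit PAPI's RAPL energy events are generally documented to use; check the event description on your system). The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Convert per-package energy readings (assumed to be in nanojoules) to joules,
 * writing one value per socket into joules[] and returning their sum. */
static double package_energy_joules(const long long *nanojoules, size_t npackages,
                                    double *joules)
{
    assert(nanojoules != NULL && joules != NULL);
    double total = 0.0;
    for (size_t p = 0; p < npackages; p++) {
        joules[p] = (double)nanojoules[p] / 1e9;
        total += joules[p];
    }
    return total;
}

/* Average power in watts over an interval; 0.0 if the interval is not positive. */
static double average_watts(double joules, double seconds)
{
    return seconds > 0.0 ? joules / seconds : 0.0;
}
```

The caller keeps the `joules[]` array when a per-socket breakdown is wanted and uses only the return value when a single node-level figure is enough.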

Best,

Ekaterina Tutlyaeva

Mar 24, 2017, 4:08:01 AM

to Servat, Harald, Vince Weaver, ptools-perfapi

Thank you!

I finally got it: I have a KNL CPU with only one socket (package), so the energy values are the same for all processes.

But what about other (non-RAPL) events, such as PAPI_TOT_INS, PAPI_TOT_CYC, PAPI_L1_DCM, etc.? Is it true that reading these counters also returns the same value regardless of the core or MPI process?

Best regards,

Ekaterina

Servat, Harald

Mar 24, 2017, 8:52:07 AM

to Ekaterina Tutlyaeva, Vince Weaver, ptools-perfapi

No, the regular performance counters are per core, and the OS is responsible for keeping track of them per process. That means each (MPI) process, and in fact each thread if there are any, will have its own PAPI_TOT_INS, PAPI_L1_DCM, and so on.
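
Because every rank has its own counter values, they can be compared across ranks rather than only summed. The sketch below assumes the PAPI_L1_DCM and PAPI_TOT_INS readings of all ranks have been gathered into two arrays; `l1_mpki` and `worst_rank` are hypothetical helpers that compute L1 data-cache misses per 1000 instructions and find the rank with the highest rate.

```c
#include <assert.h>
#include <stddef.h>

/* L1 data-cache misses per 1000 instructions for one rank; 0.0 if no instructions. */
static double l1_mpki(long long l1_dcm, long long tot_ins)
{
    return tot_ins > 0 ? 1000.0 * (double)l1_dcm / (double)tot_ins : 0.0;
}

/* Index of the rank with the highest miss rate; ties go to the lowest rank. */
static size_t worst_rank(const long long *l1_dcm, const long long *tot_ins,
                         size_t nranks)
{
    assert(nranks > 0 && l1_dcm != NULL && tot_ins != NULL);
    size_t worst = 0;
    for (size_t i = 1; i < nranks; i++)
        if (l1_mpki(l1_dcm[i], tot_ins[i]) > l1_mpki(l1_dcm[worst], tot_ins[worst]))
            worst = i;
    return worst;
}
```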

Ekaterina Tutlyaeva

Mar 27, 2017, 2:36:18 AM

to Servat, Harald, Vince Weaver, ptools-perfapi

> No, the regular performance counters are per core, and the OS is responsible for keeping track of them per process. That means each (MPI) process, and in fact each thread if there are any, will have its own PAPI_TOT_INS, PAPI_L1_DCM, and so on.

Thank you! Now it is finally absolutely clear to me.

Best regards
