[Patch] Improved ENERGY group for overflow issue on SandyBridge and IvyBridge


Yunqi Zhang

Mar 31, 2013, 3:45:53 AM
to likwid-d...@googlegroups.com
Hi,

This is a really awesome and helpful project. But I have to say I've been struggling with the MSR overflow in the energy consumption profiling for a while, so I'm trying to fix the problem.

The patch is a simple check: if the value of the register at the end of the run is smaller than at the beginning, it adds 0xFFFFFFFF (the maximum uint32_t value). Though this is not a perfect fix, it reduces the likelihood of an overflow error by about 50%.

Hope it helps. Thanks!

Yunqi
fixed_energy_overflow.patch

Sonal Aggarwal

Sep 28, 2014, 9:41:57 AM
to likwid-d...@googlegroups.com
How do we use this patch with an existing LIKWID installation (stable version)?

Thomas Röhl

Sep 29, 2014, 5:03:01 AM
to likwid-d...@googlegroups.com
Hi Guys,

Thanks for your contribution, but this is too simplistic an approach to handling overflows in the RAPL counters. Since the RAPL counter registers are read-only, you cannot zero them at the beginning of a measurement phase. Therefore you cannot assume that an overflow simply requires an increment by 0xFFFFFFFF. Instead, the value after one overflow of the 32-bit counter has to be (0xFFFFFFFF - start) + stop + 1, where the extra 1 accounts for the step from 0xFFFFFFFF to 0x0.

An example of your approach:
Start value: 0xFFFFFFFF - 0x1
End value: 0x1

The real value would be 3, but with your approach you get 0xFFFFFFFF+2, a tremendously wrong value. Although this is a highly contrived example, something like this can happen.
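
The wrap-around arithmetic for a single overflow can be sketched as follows. This is only an illustration, assuming the energy counter is 32 bits wide; the helper name `rapl_delta` is made up for this example and is not part of LIKWID.

```python
COUNTER_BITS = 32  # assumption: the RAPL energy counter is 32 bits wide


def rapl_delta(start, stop, bits=COUNTER_BITS):
    """Counter ticks between two raw readings, assuming at most one wrap.

    Modular subtraction handles both cases: without a wrap it is simply
    stop - start, with one wrap it adds 2**bits to the negative difference.
    """
    return (stop - start) % (1 << bits)


if __name__ == "__main__":
    # The example from this message: start just below the maximum, stop at 0x1.
    print(rapl_delta(0xFFFFFFFF - 0x1, 0x1))
    # No wrap: plain difference.
    print(rapl_delta(100, 250))
```

This only holds if at most one wrap happened between the two reads. Two or more wraps cannot be detected from the start and stop values alone.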

The only reliable method is to start a thread that reads the RAPL counters frequently enough to detect every overflow. We talked about this problem in the LIKWID team, but so far there is no conclusion on how to handle it. Currently we do not use extra threads in LIKWID because none of the hardware performance counters except the RAPL ones need them. Since RAPL counters are socket-specific, only one thread is needed per socket. However, every time that thread reads the RAPL counters, the CPU it runs on is delayed in the application compared to the others and its cache gets trashed, so this CPU will run your application "slower" than the other CPUs on the socket. The delay may be small, but in the presence of barriers and synchronous communication it will affect all other CPUs and ultimately the application runtime.
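
A periodic reader of this kind could look roughly like the sketch below. It is not LIKWID code: `EnergyAccumulator`, `poll`, and the `read_counter` callable are hypothetical names, and the counter is assumed to be 32 bits wide and to wrap at most once between two consecutive reads.

```python
import threading

COUNTER_BITS = 32  # assumption: 32-bit energy counter


class EnergyAccumulator:
    """Sums wrap-corrected differences between consecutive raw readings."""

    def __init__(self, first_raw):
        self.last = first_raw
        self.total = 0

    def update(self, raw):
        # Modular subtraction is correct as long as the counter wrapped
        # at most once since the previous reading.
        self.total += (raw - self.last) % (1 << COUNTER_BITS)
        self.last = raw
        return self.total


def poll(read_counter, acc, stop_event, interval_s):
    """Feed acc with read_counter() every interval_s seconds until stopped."""
    while not stop_event.wait(interval_s):
        acc.update(read_counter())
    acc.update(read_counter())  # one final reading at the end of the run


if __name__ == "__main__":
    # Simulated raw readings that wrap once between the 2nd and 3rd value.
    acc = EnergyAccumulator(0xFFFFFF00)
    for raw in (0xFFFFFFF0, 0x00000010, 0x00000100):
        acc.update(raw)
    print(acc.total)
```

In a real measurement, `poll` would run in one `threading.Thread` per socket and `stop_event` would be set when the application finishes. The interval has to be short enough that the counter cannot wrap twice between two reads.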

The current development version (trunk in SVN) uses an approach similar to your supplied patch, but it is not in a usable state yet. I will check whether a slightly adapted patch fits into the v3.1 branch, because we want to publish a bugfix release in the near future. If you would like to write the threaded approach, we could think about integrating it as an optional feature for everyone who requires fully reliable RAPL results.

Greetings,
Thomas

Yunqi Zhang

Sep 29, 2014, 9:37:56 AM
to likwid-d...@googlegroups.com
Yes, Thomas is correct. It would be great if LIKWID could implement an asynchronous thread to read the RAPL counters periodically.

What I ended up doing later on was to read the RAPL counters directly and periodically, which seems to be a good solution if you want a more reliable measurement. And here is a good example:

Yunqi

Sonal Aggarwal

Oct 1, 2014, 12:14:52 PM
to likwid-d...@googlegroups.com
Hi Guys,

Thanks for the response, it helped clear up the confusion.

Zhang, the program rapl-read.c reads the RAPL counters, but as Thomas mentioned, are you running it as a separate thread per socket?

Thanks

Yunqi Zhang

Oct 1, 2014, 12:44:39 PM
to likwid-d...@googlegroups.com
Hi Sonal,

To my understanding, each socket has its own RAPL registers, which is why you need multiple threads to read the registers of different sockets.

In rapl-read.c, there is a parameter "-c", which lets you specify which core you want to monitor. You can leverage that to monitor the socket you want.

Suppose you have a 2-socket machine with cores 0-3 on the first socket and cores 4-7 on the second socket:

first socket: 0, 1, 2, 3
second socket: 4, 5, 6, 7

Correct me if I'm wrong, but I think you can run one rapl-read with "-c 0" to measure socket 0 and another rapl-read with "-c 4" to measure socket 1. The measurement you get from "-c 0" should be the same as from "-c 1", "-c 2", and "-c 3" (any core on the same socket).
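
For reference, reading the package energy counter through one core of a socket might look like the sketch below. It assumes Linux with the `msr` kernel module loaded and sufficient privileges to read `/dev/cpu/<N>/msr`; the register addresses are the ones Intel documents for MSR_RAPL_POWER_UNIT and MSR_PKG_ENERGY_STATUS. The function names are illustrative and are not taken from rapl-read.c.

```python
import os
import struct

MSR_RAPL_POWER_UNIT = 0x606    # holds the energy unit in bits 12:8
MSR_PKG_ENERGY_STATUS = 0x611  # package energy counter in bits 31:0


def read_msr(cpu, reg):
    """Read one 64-bit MSR through the Linux msr driver (usually needs root)."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        return struct.unpack("<Q", os.pread(fd, 8, reg))[0]
    finally:
        os.close(fd)


def energy_unit_joules(power_unit_raw):
    """Joules per counter tick: 1 / 2**ESU, with ESU taken from bits 12:8."""
    return 0.5 ** ((power_unit_raw >> 8) & 0x1F)


def package_energy_raw(cpu):
    """Raw 32-bit package energy counter of the socket that owns this core."""
    return read_msr(cpu, MSR_PKG_ENERGY_STATUS) & 0xFFFFFFFF


def ticks_to_joules(ticks, power_unit_raw):
    """Convert a wrap-corrected tick difference to joules."""
    return ticks * energy_unit_joules(power_unit_raw)
```

With the core numbering from the example above, `package_energy_raw(0)` would read the first socket and `package_energy_raw(4)` the second one. Differences between two raw readings still need wrap-around handling before they are converted with `ticks_to_joules`.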

Yunqi

Sonal Aggarwal

Oct 1, 2014, 12:52:48 PM
to likwid-d...@googlegroups.com
Yes, I think that is right.

A very trivial question: suppose I have a two-socket machine and an application running on it whose energy consumption I want to obtain. I start two threads (rapl-read.c), one per socket, each dumping the energy values to a file from time to time.

How do I know how much energy my application has consumed?

I'm sorry if it sounds like a stupid question.

Sonal Aggarwal

Thomas Röhl

Oct 2, 2014, 7:03:57 AM
to likwid-d...@googlegroups.com
Hi,

I forgot to mention that you can already use LIKWID with a separate thread for RAPL.

You can use the Timeline mode of likwid-perfctr:
./likwid-perfctr -C <CPU on socket 0>,<CPU on socket 1> -g PWR_PKG_ENERGY:PWR0 -t <interval with suffix s or ms> <executable>

This will start one extra thread that reads the RAPL counters on both sockets periodically and prints the values directly to the console. The main thread executes the <executable> given on the command line. This method is rarely used, so it may not work flawlessly.
The output lines should look like:
PWR_PKG_ENERGY <current timestamp> <result of socket 0> <result of socket 1>

To get the energy consumed over the whole run, check the results line by line: if a value is smaller than the one in the previous line, an overflow happened. If overflows occurred, take the values just before each overflow and sum them up together with the final value. I think you will miss some joules, because the overflow does not happen directly after a read, but for a first insight it should work.
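
The summation described above can be sketched as a small post-processing script. It assumes the output lines are whitespace-separated exactly as shown (event name, timestamp, one value per socket) and that each value is cumulative since the start of the measurement; the real timeline output may differ. `total_energy` is a made-up helper name.

```python
def total_energy(lines, event="PWR_PKG_ENERGY"):
    """Per-socket totals from timeline lines, treating a drop as one overflow."""
    totals = None
    prev = None
    for line in lines:
        fields = line.split()
        if not fields or fields[0] != event:
            continue  # skip unrelated output, e.g. from the application
        values = [float(v) for v in fields[2:]]  # fields[1] is the timestamp
        if prev is None:
            totals = [0.0] * len(values)
        else:
            for i, (before, now) in enumerate(zip(prev, values)):
                if now < before:
                    # Overflow: keep the last value seen before the wrap.
                    totals[i] += before
        prev = values
    if prev is None:
        return []
    # Add the final reading of each socket to the sum of pre-overflow values.
    return [t + v for t, v in zip(totals, prev)]


if __name__ == "__main__":
    sample = [
        "PWR_PKG_ENERGY 1.0 10 20",
        "PWR_PKG_ENERGY 2.0 30 40",
        "PWR_PKG_ENERGY 3.0 5 60",   # socket 0 dropped from 30 to 5
        "PWR_PKG_ENERGY 4.0 15 80",
    ]
    print(total_energy(sample))
```

The energy consumed between the last read before an overflow and the overflow itself is lost with this method, so a shorter interval gives a smaller error.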

Greetings,
Thomas