SThorm Power Budget Control - Mechanisms & Policy

Patrick Bellasi

Apr 8, 2013, 6:36:15 AM

to bosp-...@googlegroups.com

The BarbequeRTRM SThorm branch has been pushed with an interesting new addition: the very first implementation of the mechanisms required to support a Power Budgeting control policy.

This support is based on an extension of the SThorm PIL in order to:

1. collect power traces from the fabric

2. filter power readings to get a smoothed average of the fabric power consumption

3. tune the power budget availability in order to match the system configured maximum allowed power consumption.
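
As an illustration of step 2, a smoothed average can be obtained with an exponential moving average. This is only a sketch: the thread does not say which filter the PIL extension actually uses, and the class name `PowerFilter`, the `alpha` parameter, and the mW unit are assumptions made for the example.

```cpp
#include <cassert>

// Exponential moving average over raw power readings (illustrative only;
// the real filter used by the PIL extension is not described in the thread).
class PowerFilter {
public:
    // alpha in (0, 1]: weight given to the newest sample.
    explicit PowerFilter(double alpha) : alpha_(alpha) {
        assert(alpha > 0.0 && alpha <= 1.0);
    }

    // Feed one raw reading (assumed to be in mW) and return the smoothed value.
    double Update(double sample_mw) {
        if (!primed_) {
            avg_mw_ = sample_mw;   // the first sample seeds the average
            primed_ = true;
        } else {
            avg_mw_ += alpha_ * (sample_mw - avg_mw_);
        }
        return avg_mw_;
    }

    double Average() const { return avg_mw_; }

private:
    double alpha_;
    double avg_mw_ = 0.0;
    bool primed_ = false;
};
```

A smaller `alpha` reacts more slowly to spikes, which may suit a policy that runs with a period of seconds.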

This extension exploits the recently introduced command interface to allow:

1. the tuning of the power budget control policy

2. the setting of the configured system power budget

This interface could also be exploited to feed power readings on platforms that do not support real HW monitors, which is convenient for setting up and running testing scenarios like the one shown in this video:

http://www.youtube.com/watch?v=oaa6I1IVA8w

... as usual, best viewed in FullHD resolution ;-)

Enjoy,

Patrick

--
#include <best/regards.h>

Patrick Bellasi
Post-Doc at Politecnico di Milano
http://home.dei.polimi.it/bellasi

Giuseppe Massari

Apr 8, 2013, 6:50:39 AM

to bosp-...@googlegroups.com

Cool!

And this is just the beginning...


--
Giuseppe Massari


Germain HAUGOU

Apr 8, 2013, 8:27:43 AM

to bosp-...@googlegroups.com, Patrick Bellasi (derkling@gmail.com)

Hi Patrick,

I’ve just sent you an SDK which reports the power consumption in the PIL. For that I extended the PIL with these new fields:

typedef struct PlatformDescriptor {
    // Total energy in nJ consumed since the bootstrap
    uint64_t totalEnergy;
    // Last time in us the power values were updated
    int64_t timestamp;
    // The fabric side will set this field to 1 when a new value is available.
    // It should be set to 0 by the host side when the value has been read.
    int32_t fabricUpdate;
    // The host side should set this field to 1 once it is ready to read
    // a new value. The fabric will never write a new value while this field
    // is 0. Once the power value is updated, the fabric sets this field
    // to 0.
    int32_t hostReady;
    // ... (other PIL fields not shown)
} PlatformDescriptor;

Is it close to what you have already implemented? In any case, if it is too far off, I can change the interface. Did you push your modified PIL somewhere?

Here is the test sample code I’ve used to validate this extension:

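int64_t lastTimestamp = 0;
uint64_t lastEnergy = 0;
int i;

for (i = 0; i < 10; i++) {
    // Tell the fabric that the host is ready for a new value
    page->pdesc.hostReady = 1;
    // Busy-wait until the fabric signals an update, then acknowledge it
    while (!page->pdesc.fabricUpdate);
    page->pdesc.fabricUpdate = 0;
    // totalEnergy is cumulative, so power uses the energy delta (nJ/us = mW)
    printf("Read new value energy=%llunJ time %lldus power=%fmW\n",
           (unsigned long long)page->pdesc.totalEnergy,
           (long long)(page->pdesc.timestamp - lastTimestamp),
           (float)(page->pdesc.totalEnergy - lastEnergy) / (page->pdesc.timestamp - lastTimestamp));
    lastEnergy = page->pdesc.totalEnergy;
    lastTimestamp = page->pdesc.timestamp;
}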

Normally this protocol should make sure that you don’t miss any value update and also that we don’t have any race condition between the host and the fabric sides. If you see any potential issue, let me know.
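
The handshake described above can also be written as a non-blocking poll instead of a busy-wait. The following is a minimal single-threaded sketch under stated assumptions: `PowerFields`, `Sample`, `HostArm`, `HostTryRead` and `FabricPublish` are names invented for the example, and `FabricPublish` only imitates what the fabric runtime is described as doing. On real shared memory, platform-specific barriers or cache maintenance would probably be needed as well; they are not shown.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors only the four fields listed in the message above.
struct PowerFields {
    volatile uint64_t totalEnergy;   // nJ since bootstrap
    volatile int64_t  timestamp;     // us
    volatile int32_t  fabricUpdate;  // fabric sets to 1, host clears
    volatile int32_t  hostReady;     // host sets to 1, fabric clears
};

struct Sample {
    uint64_t energy_nj;
    int64_t  time_us;
};

// Host side: allow the fabric to publish exactly one new value.
inline void HostArm(PowerFields& p) { p.hostReady = 1; }

// Host side: returns true and fills 'out' if a fresh value is available.
// The flag is cleared only after both fields have been copied.
inline bool HostTryRead(PowerFields& p, Sample& out) {
    if (!p.fabricUpdate) return false;
    out.energy_nj = p.totalEnergy;
    out.time_us   = p.timestamp;
    p.fabricUpdate = 0;
    return true;
}

// Hypothetical stand-in for the fabric runtime, for testing only:
// it writes a value only while hostReady is 1, then clears hostReady.
inline bool FabricPublish(PowerFields& p, uint64_t energy_nj, int64_t time_us) {
    if (!p.hostReady) return false;
    assert(time_us >= p.timestamp);  // timestamps are assumed monotonic
    p.totalEnergy  = energy_nj;
    p.timestamp    = time_us;
    p.hostReady    = 0;
    p.fabricUpdate = 1;              // raised last, after the data is in place
    return true;
}
```

Because the fabric refuses to write while `hostReady` is 0, the host never copies the two 64-bit fields while they are being updated, which is the property the protocol is meant to provide.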

In case this interface is fine, could you try it on your side before Jens tries it on SVC?

Ciao,

Germain

--

Giuseppe Massari

Apr 8, 2013, 8:58:17 AM

to bosp-...@googlegroups.com, Patrick Bellasi (derkling@gmail.com)


Do we really need fabricUpdate and hostReady?

I mean... for instance, from the BBQ side, we could read a sample of power consumption every 100ms, while from the STHORM side the value is updated every 10ms. Therefore we would have 10 updates from STHORM in the "BBQ period" of 100ms.

My question is... do we expect non-negligible variations in these 10 values?

If the answer is YES: OK, it is worth considering the synchronization protocol above.
Conversely, if the answer is NO, from the BBQ side it could be sufficient just to check that the value written by the STHORM run-time has been updated (timestamp changed?), and then read the power consumption value.

The other point is... we expect to execute the power management policy with a granularity of seconds. Considering also the previous question, is this good enough for an effective power management of STHORM?

Patrick Bellasi

Apr 8, 2013, 9:23:05 AM

to Giuseppe Massari, bosp-...@googlegroups.com

On Mon, Apr 8, 2013 at 2:58 PM, Giuseppe Massari <joe.ma...@gmail.com> wrote:

2013/4/8 Germain HAUGOU <germain...@st.com>

Hi Patrick,

Hi Germain!

Do we really need fabricUpdate and hostReady?

I mean... for instance, from the BBQ side, we could read a sample of power consumption every 100ms, while from the STHORM side the value is updated every 10ms. Therefore we would have 10 updates from STHORM in the "BBQ period" of 100ms.

My question is... do we expect non-negligible variations in these 10 values?

If the answer is YES: OK, it is worth considering the synchronization protocol above.
Conversely, if the answer is NO, from the BBQ side it could be sufficient just to check that the value written by the STHORM run-time has been updated (timestamp changed?), and then read the power consumption value.

The other point is... we expect to execute the power management policy with a granularity of seconds. Considering also the previous question, is this good enough for an effective power management of STHORM?

I agree with the considerations by Giuseppe.

For this very first control policy I would suggest keeping things as simple as possible; thus, just checking the timestamp should be enough from the BBQ side.

However, I can see just one strict need for the sync protocol you are suggesting:

since both the totalEnergy and timestamp values are 64 bits wide... there could be a chance of overlapping write/read operations, i.e. BBQ reading a value while it is being updated by the fabric side, thus resulting in a wrong read.

If this is the case, then I suppose that the sync protocol proposed by Germain is mandatory; by the way, it is also a quite simple and lightweight protocol.

Moreover, right now the PIL extension is based on POWER reading, i.e. we expect to read a [Watt] value and not an energy value in [Joule]. Energy is for sure much more interesting, but the control policy right now is Power based, not Energy based.

Could you report the Power value as well?... this is what we will use right now, and only later will we investigate how to exploit the Energy metric.

Ciao Patrick

Patrick Bellasi

Apr 9, 2013, 10:12:16 AM

to Germain Haugou, bosp-...@googlegroups.com

Here is the mail you missed; I'm posting it on the BOSP ML to keep track of your comments.

Germain HAUGOU

Apr 9, 2013, 11:18:32 AM

to Patrick Bellasi, bosp-...@googlegroups.com

Hi Patrick,

indeed I’ve put this protocol to avoid 2 issues:

- The 64 bits types. BBQ could read 32bits of the old value and 32 bits of the new

- I can’t atomically update the power and the timestamp, thus it is possible to have a race condition on this and BBQ could for example read the previous value or miss a value.

Thus I think it is safer to keep this protocol, it will avoid strange behavior.

For what concerns the power vs energy, note that as you have the timestamp, you can convert from power to energy considering the timestamp of the previous measure.
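
The conversion between the two metrics only needs two consecutive samples. A small sketch, assuming the units stated earlier in the thread (energy in nJ, timestamps in us); the function names are invented for the example:

```cpp
#include <cassert>
#include <cstdint>

// Average power in mW over the interval between two samples.
// nJ / us = (1e-9 J) / (1e-6 s) = 1e-3 W = mW, so no scaling is needed.
inline double AveragePowerMw(uint64_t prev_nj, int64_t prev_us,
                             uint64_t curr_nj, int64_t curr_us) {
    assert(curr_us > prev_us && curr_nj >= prev_nj);
    return static_cast<double>(curr_nj - prev_nj) /
           static_cast<double>(curr_us - prev_us);
}

// Inverse direction: energy in nJ consumed at 'power_mw' during 'dt_us'.
inline double EnergyNj(double power_mw, int64_t dt_us) {
    return power_mw * static_cast<double>(dt_us);
}
```

For example, 5000 nJ consumed over 10000 us corresponds to an average of 0.5 mW.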

Ciao,

Germain

Patrick Bellasi

Apr 9, 2013, 11:48:13 AM

to Germain HAUGOU, bosp-...@googlegroups.com

On Tue, Apr 9, 2013 at 5:18 PM, Germain HAUGOU <germain...@st.com> wrote:

Hi Patrick,

indeed I’ve put this protocol to avoid 2 issues:
- The 64 bits types. BBQ could read 32bits of the old value and 32 bits of the new

I can see just a single entry "totalEnergy" in the struct you use.

What do you mean by old and new value?

- I can’t atomically update the power and the timestamp, thus it is possible to have a race condition on this and BBQ could for example read the previous value or miss a value.

Right, but you could update the timestamp only after the new energy value has been written, while BBQ first checks the timestamp and reads the energy value only after a modification has been noticed.

That way there should be no race conditions, at most just the risk of losing some samples...
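
A sketch of this timestamp-gated scheme, using `std::atomic` purely to express the intended ordering (energy first, timestamp last). All names are hypothetical, and a real host/fabric shared page would not necessarily be accessible through C++ atomics. Note that the scheme does not prevent the reader from pairing an already-seen timestamp with an energy value that the writer has started to replace; that is the case the flag-based handshake is meant to exclude.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

struct SharedReading {
    std::atomic<uint64_t> energy_nj{0};
    std::atomic<int64_t>  time_us{0};
};

// Writer: store the energy first and the timestamp last, so that a reader
// observing the new timestamp also observes the matching energy value.
inline void Publish(SharedReading& r, uint64_t energy_nj, int64_t time_us) {
    // A repeated timestamp would be invisible to the reader below.
    assert(time_us != r.time_us.load(std::memory_order_relaxed));
    r.energy_nj.store(energy_nj, std::memory_order_relaxed);
    r.time_us.store(time_us, std::memory_order_release);
}

// Reader: only reads the energy after noticing a timestamp change.
class GatedReader {
public:
    bool Poll(const SharedReading& r, uint64_t& energy_nj) {
        const int64_t t = r.time_us.load(std::memory_order_acquire);
        if (t == last_us_) return false;      // nothing new since last poll
        energy_nj = r.energy_nj.load(std::memory_order_relaxed);
        last_us_ = t;
        return true;
    }

private:
    int64_t last_us_ = 0;
};
```

If the writer publishes twice between two polls, the reader simply sees the latest value; the intermediate sample is lost, as anticipated in the message.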

Thus I think it is safer to keep this protocol, it will avoid strange behavior.

However, just to have a fast convergence for the use-case, let's go for the solution you suggested.

We will implement the readings from the new PIL as soon as possible.

For what concerns the power vs energy, note that as you have the timestamp, you can convert from power to energy considering the timestamp of the previous measure.

Ok, sure. Indeed having an energy reading is for sure more accurate than reading a spot Power.

By the way, with Edoardo we are testing the fabric_quota assignment code more thoroughly.

It seems there are some problems: probably the OCL code is running in just one of the 4 clusters, even if 4 WGs are enqueued at the same time...

Moreover, checking the PIL data at run-time, it seems that they are not properly configured; we are still investigating whether it's a BBQ-side or fabric-side issue. We will report back to you as soon as we have some more clues.

Ciao,

Germain

Ciao Patrick

Germain HAUGOU

Apr 9, 2013, 1:01:43 PM

to Patrick Bellasi, bosp-...@googlegroups.com

From: Patrick Bellasi [mailto:derk...@gmail.com]
Sent: Tuesday, April 09, 2013 5:48 PM
To: Germain HAUGOU
Cc: bosp-...@googlegroups.com
Subject: Re: SThorm Power Budget Control - Mechanisms & Policy

On Tue, Apr 9, 2013 at 5:18 PM, Germain HAUGOU <germain...@st.com> wrote:

Hi Patrick,

indeed I’ve put this protocol to avoid 2 issues:

- The 64 bits types. BBQ could read 32bits of the old value and 32 bits of the new

I can see just a single entry "totalEnergy" in the struct you use.

What do you mean by old and new value?

I mean the fabric will first issue a 32-bit write to the LSB half and then a second 32-bit write to the MSB half. Between these 2 writes, BBQ could read the 32-bit LSB half of the new value which is being written and the 32-bit MSB half of the old value being replaced. I must agree the probability is really low!
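
To make the torn read concrete, the value a reader would see between the two 32-bit writes can be computed directly. This is an illustrative helper (the name `TornRead` is made up), assuming the LSB-then-MSB write order described above. A tear is only harmful when the upper 32 bits differ between the old and new value, which for a counter in nJ happens once every 2^32 nJ (about 4.29 J) and for a timestamp in us once every 2^32 us (about 71.6 minutes), consistent with the low probability mentioned.

```cpp
#include <cstdint>

// Value seen by a reader that samples a 64-bit field after the writer
// has stored the low 32 bits of new_v but not yet the high 32 bits.
constexpr uint64_t TornRead(uint64_t old_v, uint64_t new_v) {
    return (old_v & 0xFFFFFFFF00000000ULL) | (new_v & 0x00000000FFFFFFFFULL);
}

// When the upper halves match, the mixed value equals the new one.
static_assert(TornRead(0x5ULL, 0x9ULL) == 0x9ULL, "harmless tear");

// Across a 32-bit carry the reader sees a value far from both.
static_assert(TornRead(0x00000000FFFFFFFFULL, 0x0000000100000000ULL) == 0ULL,
              "tear across the carry");
```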

- I can’t atomically update the power and the timestamp, thus it is possible to have a race condition on this and BBQ could for example read the previous value or miss a value.

Right, but you could update the timestamp only after the new energy value has been written, while BBQ first checks the timestamp and reads the energy value only after a modification has been noticed.

That way there should be no race conditions, at most just the risk of losing some samples...

This is what I mean, due to the non-atomic update, you can miss the previous or the current power value.

Thus I think it is safer to keep this protocol, it will avoid strange behavior.

However, just to have a fast convergence for the use-case, lets go for the solution you suggested.

We will implement the readings from the new PIL as soon as possible.

For what concerns the power vs energy, note that as you have the timestamp, you can convert from power to energy considering the timestamp of the previous measure.

Ok, sure. Indeed having an energy reading is for sure more accurate than reading a spot Power.

By the way, with Edoardo we are testing better the fabric_quota assignment code.

It seems there are some problems, probably the OCL code is running in just one of the 4 cluster, even if 4 WG are enqueued at the same time...

I remember that when we had a look, multiview was issuing some commands with only 1 work-group even when configured with 4 work-groups. Are you sure this is not the same issue?

Moreover, checking the PIL data at run-time it seems that they are not properly configured, we are still investigating if it's a BBQ-side or fabric-side issue. We will report it back to you as soon as we have some more clue.

Ciao,
Germain

Ciao Patrick

Patrick Bellasi

Apr 9, 2013, 1:31:22 PM

to Germain HAUGOU, bosp-...@googlegroups.com

On Tue, Apr 9, 2013 at 7:01 PM, Germain HAUGOU <germain...@st.com> wrote:

By the way, with Edoardo we are testing better the fabric_quota assignment code.

It seems there are some problems, probably the OCL code is running in just one of the 4 cluster, even if 4 WG are enqueued at the same time...

I remember that when we had a look, multiview was issuing some commands with only 1 work-group even when configured with 4 work-groups. Are you sure this is not the same issue?

We will continue tomorrow the investigation.

So far what we have verified is that _on the board_:

1. BBQ sets up a constraint for a 100% fabric quota

2. MV (by code) pushes 4 WG per frame

However, the frame processing time is more or less the same as in the case with a 25% fabric quota, while MV (by code) still pushes 4 WGs per frame.

We will update tomorrow during the telco.

Ciao Patrick

Germain HAUGOU

Apr 9, 2013, 2:14:07 PM

to Patrick Bellasi, bosp-...@googlegroups.com

Could you send me a binary package containing MV + BBQ that I can use to reproduce the issue?

Ciao,

Germain


Patrick Bellasi

Apr 9, 2013, 2:27:22 PM

to Germain HAUGOU, bosp-...@googlegroups.com

On Tue, Apr 9, 2013 at 8:14 PM, Germain HAUGOU <germain...@st.com> wrote:

Could you send me a binary package containing MV + BBQ that I can use to reproduce the issue ?

Tomorrow morning I'll build that package with Edoardo and send you a download link.

Ciao,
Germain

Ciao Patrick

Edoardo Paone

Apr 10, 2013, 6:23:45 AM

to bosp-...@googlegroups.com, Germain HAUGOU

Hi Germain,

I'm using the STHORM SDK build 145, the latest I received from you.

I built a package MV + BBQ that you can use to reproduce the issue related to the fabric quota with MV:

http://dl.dropbox.com/u/56609774/update.zip

Use this tool to deploy it on the board:
https://www.dropbox.com/s/vfi44qqqo48j9b2/apply_update.sh

Then, with a running STHORM board already connected through ADB, you could deploy BBQ by just issuing this command:
$ ./apply_update.sh /path/to/update.zip

Then, the following commands to start BBQ:

1) p12run --platform=sthorm --sthorm-steps="stop run"

2) adb shell /data/bosp/sbin/barbeque

Two versions of MV are deployed on the board, which I use to refer to as version A and version B:

A) /data/bosp/mview/mview1/

B) /data/bosp/mview/mview2/

The command to run MV is the same for version A and B, except for the path:

A) adb shell LD_PRELOAD=/data/bosp/lib/bbque/libbbque_rtlib.so OCL_KERNELS_PATH=/data/bosp/mview/mview1/ busybox time /data/bosp/mview/mview1/mview_ahead --max_hypo_value=18 --bbq_awm_id=3 --num_cycles=20

B) adb shell LD_PRELOAD=/data/bosp/lib/bbque/libbbque_rtlib.so OCL_KERNELS_PATH=/data/bosp/mview/mview2/ busybox time /data/bosp/mview/mview2/mview_ahead --max_hypo_value=18 --bbq_awm_id=3 --num_cycles=20

As you can see, I added the 'busybox time' command to profile the overall execution time (20 cycles).

You can change the AWM with the command line parameter 'bbq_awm_id'. There are four possible AWMs:

--bbq_awm_id=0 for 25% fabric quota

--bbq_awm_id=1 for 50% fabric quota

--bbq_awm_id=2 for 75% fabric quota

--bbq_awm_id=3 for 100% fabric quota

If you want to run two instances, then add the parameter 'base_dir' with the instance ID, e.g.:

A0) adb shell LD_PRELOAD=/data/bosp/lib/bbque/libbbque_rtlib.so OCL_KERNELS_PATH=/data/bosp/mview/mview1/ busybox time /data/bosp/mview/mview1/mview_ahead --max_hypo_value=18 --bbq_awm_id=1 --num_cycles=20 --base_dir=0

A1) adb shell LD_PRELOAD=/data/bosp/lib/bbque/libbbque_rtlib.so OCL_KERNELS_PATH=/data/bosp/mview/mview1/ busybox time /data/bosp/mview/mview1/mview_ahead --max_hypo_value=18 --bbq_awm_id=1 --num_cycles=20 --base_dir=1

Note that, in the example above, I assign AWM 1 (50% fabric quota) to each instance. In this way, the two instances should be able to run "in parallel", each with half of the fabric quota.

I made some measurements on both version A and B, in all 4 AWMs and also with two instances running in parallel. See the attached file mv_sthorm.txt

As you can see, the execution time does not change when I change the AWM but maybe this is correct if the application runs alone on the fabric. The strange thing is that the execution time when there are two instances running should be at least double the time of a single instance with 100% fabric quota. Is that right?

-Edoardo

mv_sthorm.txt

Patrick Bellasi

Apr 10, 2013, 7:03:18 AM

to bosp-...@googlegroups.com, Germain HAUGOU

On Wed, Apr 10, 2013 at 12:23 PM, Edoardo Paone <pa...@elet.polimi.it> wrote:

Hi Germain,

Hi Germain!

While you run tests on your side, could you provide us with the OpenCL application you used to test the fabric_quota assignment, so that we can run some tests locally as well?

Thanks Patrick

Germain HAUGOU

Apr 10, 2013, 7:06:12 AM

to bosp-...@googlegroups.com

Hi Patrick,

I used the face detect kernel which can be launched from tests/vision/Face-Detect.

However to test the quotas, I’ve directly made patches in the runtime to put faked quotas, so at the end it will be difficult for you to replicate that.

Bye,

Germain


Patrick Bellasi

Apr 10, 2013, 7:18:41 AM

to bosp-...@googlegroups.com, Germain Haugou

On Wed, Apr 10, 2013 at 1:06 PM, Germain HAUGOU <germain...@st.com> wrote:

Hi Patrick,

I used the face detect kernel which can be launched from tests/vision/Face-Detect.

However to test the quotas, I’ve directly made patches in the runtime to put faked quotas, so at the end it will be difficult for you to replicate that.

Good to know!

Edoardo will take a look at this code with the goal of possibly integrating it with BBQ and having another test application to stress the integration. Maybe it could also become an interesting BBQ regression test for your build system.

If the code turns out to be too complex to integrate with BBQ in a few minutes... maybe the ImageDifference_Collaborative sample is another candidate for integration, what do you think? Is this example able to properly stress all four clusters in parallel?

Let's sync this afternoon on this point...

Bye,
Germain

Ciao Patrick

Edoardo Paone

Apr 10, 2013, 11:37:52 AM

to bosp-...@googlegroups.com, Germain Haugou

Hi Germain,

I'm trying to integrate the FaceDetect application with BBQ for ARM-xp70.

These libraries are required by the application:

libp12Convert_OCL.so

libp12DetectMultiScale_OCL.so

libSTHORM_hal.so

I found the first two in the SDK, in modules/vision/host/arm/

but I cannot find libSTHORM_hal.so

I think this library is needed to run FaceDetect. Can you provide it to me?

Thanks,

Edoardo

Germain HAUGOU

Apr 10, 2013, 11:46:04 AM

to bosp-...@googlegroups.com

Hi Edoardo,

in fact this is a compilation issue. When we compile face detection, we pass an invalid flag pointing to a non-existent library. In the end, on our side the linker does not complain and the execution runs fine. We saw the issue this week because we tried to run it through gdb, which complained about that. This issue is now fixed; I will send you an SDK soon. On your side, is it blocking the linking phase or the execution?

Germain


Edoardo Paone

Apr 10, 2013, 11:52:05 AM

to bosp-...@googlegroups.com, Germain Haugou

Only the execution phase is blocking, I get this error:

link_image[1921]: 1149 could not load needed library '/data/bosp/facedetect/libp12DetectMultiScale_OCL.so' for '/data/bosp/facedetect/detectTest_ahead' (link_image[1936]: 1149 could not load needed library 'libp12Convert_OCL.so' for 'libp12DetectMultiScale_OCL.so' (link_image[1936]: 1149 could not load needed library 'libSTHORM_hal.so' for 'libp12Convert_OCL.so' (load_library[1091]: Library 'libSTHORM_hal.so' not found)))CANNOT LINK EXECUTABLE

Edoardo

Germain HAUGOU

Apr 10, 2013, 11:55:56 AM

to Edoardo Paone, bosp-...@googlegroups.com

This is very strange; on our side there was no issue. This library has not existed for a long time. You can just take any library and copy it under the same name; this should work.

Germain

Germain HAUGOU

Apr 10, 2013, 12:03:33 PM

to Patrick Bellasi, bosp-...@googlegroups.com

Hi Patrick,

I think the OpenCL examples are not good candidates because they don’t stress the architecture much. I hope you will be able to integrate face detection because with it we can configure the number of clusters, frames, etc., so it is more interesting. Moreover, it seems we’ll release the source code of this application at the end of April.

Ciao,

Germain


Giuseppe Massari

Apr 10, 2013, 1:44:10 PM

to bosp-...@googlegroups.com, Patrick Bellasi

Hi Germain,

Basically I've completed the integration of the power management part. Currently, I'm using the SDK version 145.

Does this version of the SDK already fill the data structure with the energy data?


Germain HAUGOU

Apr 10, 2013, 2:09:18 PM

to bosp-...@googlegroups.com, Patrick Bellasi

Hi Giuseppe,

indeed it already fills in the energy data. The only thing to do is to add the option --power to p12run and use the protocol I’ve sent. The runtime will write a value only if the field hostReady is set to 1.

Bye,

Germain

Germain HAUGOU

Apr 10, 2013, 3:56:20 PM

to Edoardo Paone, bosp-...@googlegroups.com

Hi Edoardo,

thanks a lot for the instructions, I was able to debug the issue.

The application is indeed executing 4 work-groups for 50% of the time and 1 work-group for the other 50%, thus we should see the impact of the quotas.

The issue is that I forgot the OpenCL runtime needs to be informed of the execution ID. In the previous face detection integration we did some time ago, there was this code for registering it :

FaceDetectEXC::FaceDetectEXC(char *argv[],
        std::string const & name,
        std::string const & recipe,
        RTLIB_Services_t * rtlib) :
    BbqueEXC("fd", "facedetection", rtlib) {

    char *execIdStr = (char *)malloc(256);
    snprintf(execIdStr, 256, "%d", GetUid());
    setenv("P2012_EXC_ID", execIdStr, 1);

    picture_path = argv[1];
    accuracy     = atoi(argv[2]);
    face_min     = atoi(argv[3]);
    face_max     = atoi(argv[4]);

    width      = 512;
    height     = 512;
    image_type = 1;

    size = width*height;
    data = (unsigned char *)malloc(size*3);
}

In your case you should add the following lines before you start using the OpenCL runtime:

char *execIdStr = (char *)malloc(256);
snprintf(execIdStr, 256, "%d", GetUid());
setenv("P2012_EXC_ID", execIdStr, 1);

This is used by the OpenCL runtime to know which constraint corresponds to which process. Could you integrate that in your code and send me the binaries again?
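
A self-contained variant of the same registration step, as a sketch only. It uses a small stack buffer instead of `malloc`, which is safe because POSIX `setenv` copies the string it is given. The helper name `ExportExcId` is invented for the example, and `exc_uid` stands in for the value returned by `GetUid()`.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Publish the execution-context ID through the environment so that the
// OpenCL runtime can read it at initialization. Returns true on success.
inline bool ExportExcId(int exc_uid) {
    char buf[16];
    const int n = std::snprintf(buf, sizeof(buf), "%d", exc_uid);
    // A 32-bit int needs at most 11 characters plus the terminator.
    assert(n > 0 && n < static_cast<int>(sizeof(buf)));
    (void)n;
    // POSIX setenv copies 'buf', so the stack buffer may go out of scope.
    return setenv("P2012_EXC_ID", buf, 1) == 0;
}
```

The call has to happen before the first OpenCL API function is used by the process.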

Ciao,

Germain


Patrick Bellasi

Apr 11, 2013, 4:29:45 AM

to bosp-...@googlegroups.com, Edoardo Paone

On Wed, Apr 10, 2013 at 9:56 PM, Germain HAUGOU <germain...@st.com> wrote:

Hi Germain!

The issue is that I forgot the OpenCL runtime needs to be informed of the execution ID. In the previous face detection integration we did some time ago, there was this code for registering it :

FaceDetectEXC::FaceDetectEXC(char *argv[],
        std::string const & name,
        std::string const & recipe,
        RTLIB_Services_t * rtlib) :
    BbqueEXC("fd", "facedetection", rtlib) {

    char *execIdStr = (char *)malloc(256);
    snprintf(execIdStr, 256, "%d", GetUid());
    setenv("P2012_EXC_ID", execIdStr, 1);

    picture_path = argv[1];
    accuracy     = atoi(argv[2]);
    face_min     = atoi(argv[3]);
    face_max     = atoi(argv[4]);

    width      = 512;
    height     = 512;
    image_type = 1;

    size = width*height;
    data = (unsigned char *)malloc(size*3);
}

In your case you should add the following lines before you start using the OpenCL runtime:

char *execIdStr = (char *)malloc(256);
snprintf(execIdStr, 256, "%d", GetUid());
setenv("P2012_EXC_ID", execIdStr, 1);

This is used by the OpenCL runtime to know which constraint corresponds to which process. Could you integrate that in your code and send me the binaries again?

I was not aware of that requirement either.

Once I've added the GetUid API to the RTLib, I would expect that this setup could be somehow masked by the OpenCL library.

I'm wondering if it could make sense to have an API extension, I could envision two possibilities:

1. within your:

<sthorm_sdk>/modules/OpenCL/host/<platform>/include/oclUtil.h

which would make it easy to set up the run-time manager.

For example, using a new dedicated function like:

clInitRunTimeManager(uint32_t excId);

2. by extending the standard OpenCL API call:

cl_context clCreateContextFromType (cl_context_properties *properties,
        cl_device_type device_type,
        void (*pfn_notify) (const char *errinfo,
                const void *private_info,
                size_t cb,
                void *user_data),
        void *user_data,
        cl_int *errcode_ret)

and adding, in your implementation, the support for a new cl_context_properties property, for example

CL_CONTEXT_EXC_ID

which should be set to the value returned by GetUid()

Another solution, possibly, is to move the code you suggested directly into the RTLib.

Indeed, I could set this ENV variable when the RTLib is compiled for the SThorm platform.

What do you think?

Ciao Patrick

Germain HAUGOU

Apr 11, 2013, 4:36:55 AM

to bosp-...@googlegroups.com, Edoardo Paone

Hi Patrick,

I think it is best that the RTLib sets this environment variable; this will avoid OpenCL extensions, which are always painful in the end.

Indeed, looking into my old mails I recovered this one sent in November:

Hi Patrick, Giuseppe,

I have good news, I was able to make the connection between BBQ and the OpenCL runtime. For that I managed to get the UID and sends it to the fabric side, that now takes the corresponding constraint from the shared page. However I have a few problems that we should discuss:

- On Posix mode, the platform has no timing, thus the connection is only functional, the constraints are always respected.

- On ISS mode, I get some errors about timeout at the very beginning.

For the first point, there is nothing to do; this configuration will be used only for testing that we are able to run the applications. BBQ can still allocate resources but this has no impact in the end on the scheduling.

For the second point, can I forward you the new SDK so that you can reproduce the problem and have a look? I would like to have more details about the error to understand if I can do something on the platform or runtime side.

Ciao,

Germain

Indeed I never sent you the sample code and we stopped working on that afterwards, probably because I switched to the porting on the eval board at that moment.

Ciao,

Germain


Patrick Bellasi

Apr 11, 2013, 4:59:59 AM

to bosp-...@googlegroups.com, Germain Haugou, Edoardo Paone

On Thu, Apr 11, 2013 at 10:36 AM, Germain HAUGOU <germain...@st.com> wrote:

Hi Patrick,

I think it is best that the RTLib sets this environment variable; this will avoid OpenCL extensions, which are always painful in the end.

Ok Germain!

I'll add that code to the RTLib, in the end it is also the most transparent solution from the application developer standpoint.

Just to know, and in order to better identify the proper place in the code to set this variable: when do you need the value on the OCL library side? I expect that you use that value each time you have to inject commands into the OpenCL queue; is that correct?

Indeed I have never sent you the sample code and we have stopped working on that after, probably because I switched to the porting on the eval board at this moment.

No problem on that point... right now we are tackling integration and it's the proper time to line things up and make them work ;-)

I can also anticipate that Edoardo has already verified that performance differs when changing the AWM if this env variable is correctly configured... thus finally we are on the right path for an awesome MView demo as well! ;-)

Ciao,

Germain

Ciao Patrick

Germain HAUGOU

Apr 11, 2013, 12:57:30 PM

to Patrick Bellasi, bosp-...@googlegroups.com, Edoardo Paone

Hi Patrick,

I need this environment variable when the first OpenCL API function is called because I read it only once during initialization. Thus we need to carefully choose where to set it. Do you have a way to make sure no OCL function is called before you set it?

Ciao,

Germain


Patrick Bellasi

Apr 11, 2013, 1:41:46 PM

to Germain HAUGOU, bosp-...@googlegroups.com, Edoardo Paone

On Thu, Apr 11, 2013 at 6:57 PM, Germain HAUGOU <germain...@st.com> wrote:

Hi Patrick,

Hi Germain!

I need this environment variable when the first OpenCL API function is called because I read it only once during initialization. Thus we need to carefully choose where to set it. Do you have a way to make sure no OCL function is called before you set it?

I would place the initialization in the onSetup pre-notifier, which is an internal RTLib method called just before calling the user-defined onSetup method.

This does not completely guarantee that there are no OpenCL calls before it, since a programmer technically could place code in the BbqueEXC constructor.

However, according to the RTLib guidelines, it is suggested to use the constructor just for the initialization of locals, while all the code, including the resource acquisition code, should be placed into onSetup.

I still have to investigate a technical solution to enforce this usage of the RTLib API... however, for the time being, and especially for the purposes of the use-case definition, I would suggest going for that solution; at least it fits good programmers and it is also forward compatible.
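
The pre-notifier idea can be sketched as a non-virtual entry point that exports the variable and only then calls the user-defined hook. This is an illustration, not the RTLib code: `ExcBase`, `Setup`, `PreSetup` and `DemoExc` are hypothetical names, and only `onSetup`, `GetUid` and `P2012_EXC_ID` come from the thread.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Simplified stand-in for the RTLib base class (hypothetical names).
class ExcBase {
public:
    explicit ExcBase(int uid) : uid_(uid) {}
    virtual ~ExcBase() = default;

    // Non-virtual entry point: runs the pre-notifier, then the user code.
    bool Setup() {
        if (!PreSetup()) return false;
        return onSetup();
    }

protected:
    virtual bool onSetup() = 0;   // user-defined hook
    int GetUid() const { return uid_; }

private:
    // Pre-notifier: export the ID before any user setup code can run.
    bool PreSetup() {
        char buf[16];
        const int n = std::snprintf(buf, sizeof(buf), "%d", uid_);
        assert(n > 0 && n < static_cast<int>(sizeof(buf)));
        (void)n;
        return setenv("P2012_EXC_ID", buf, 1) == 0;
    }

    int uid_;
};

// Example application: by the time onSetup runs, the variable is set,
// so OpenCL calls placed here would find it.
class DemoExc : public ExcBase {
public:
    using ExcBase::ExcBase;
    bool saw_id = false;

protected:
    bool onSetup() override {
        const char* v = std::getenv("P2012_EXC_ID");
        saw_id = (v != nullptr && std::atoi(v) == GetUid());
        return saw_id;
    }
};
```

As noted in the message, this ordering holds only if the application makes no OpenCL calls in its constructor, since the constructor runs before `Setup()`.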