task offer

Ain Uljas

May 2, 2008, 12:27:32 PM

to ulno...@googlegroups.com

1) Get /opt/ibm/usr/bin/fdprpro -V running. It fails with some mystical library
errors. (It's a PPU/SPU program optimizer/profiler.) It seems to be very
useful. Documentation is in /opt/ibm/usr/share/doc.
2) Write or find a working example of stopping spu[i] and resuming it later
where it left off [to do func1 or func2 as requested].
3) Build a sensible C++ interface to my PPU program (in /cg_ps3).
4) Investigate ALF and/or DACS and port my code in /cg_ps3 to it/them.

This is the list of things I would do after sorting out internal
issues (which won't happen in the near future).
I tried 1) and failed :)

Ain

Ulrich Norbisrath

May 4, 2008, 3:47:06 PM

to ulno...@googlegroups.com

Hi all.
Speaking of communication, would anybody like to comment on this?

Thanks for your info, Ain.

Cheers,
Ulrich

--
title+name: Dr. Ulrich Norbisrath
web: http://ulno.net; address+phone+fax: http://ulno.net/contact
google:unorb...@gmail.com; icq:46786247
mailto:u...@ulno.net

Toomas Laasik

May 4, 2008, 5:31:00 PM

to ulno...@googlegroups.com, toomas...@ut.ee

Hi,
This is my second try at posting to the list. The first time, I got a message
from googlegroups saying that I don't have permission to post.

It seems someone has broken cg_cpp; I can't get it to compile. Since the code
seems to have been improved, I don't want to write tests for previous
revisions. So whoever made the last commit, please check the code.

One error is:
CgVectorLoader.h: No such file or directory

and if I comment this include out, I get:
/root/cg/cg_cpp/Debug/../main.cc:8: multiple definition of `main'
./cg_c.o:/root/cg/cg_cpp/Debug/../cg_c.cpp:115: first defined here

Question to Ain: do you suggest that we move our development from cg_cpp to cg_ps3 and try to rewrite the C++ code for the algorithm? We can probably reuse some of what Vladimir and Lauri have made in cg_cpp, but it seems quite a heavy task. In my opinion we lack design and hack too much. If we want to work as a team, I suggest we agree on a design as soon as possible and have some sort of implementation plan.

About the tasks that Ain proposed: they are sensible. I would add one more:
5) Try using Cell's own BLAS implementation (which is natively usable in C). Even though it doesn't support sparse matrices, we only need to implement the product of a sparse matrix and a dense vector ourselves. For all the other operations we can use the goodies provided by Cell's BLAS library. Since I can't really write tests for code that won't compile, and the other branch is pretty incomplete (and not in C++), I will take an initial look at that.
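The one kernel BLAS would leave to us is y = A * x with a sparse A and a dense x. A minimal scalar sketch in plain C, assuming the matrix is stored in compressed sparse row (CSR) form; the csr_matrix struct and csr_spmv function are hypothetical names, not the layout actually used in cg_cpp or cg_ps3:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical CSR (compressed sparse row) layout; not the cg_cpp/cg_ps3 format. */
typedef struct {
    size_t rows;
    const size_t *row_ptr; /* rows + 1 entries; row i owns nonzeros [row_ptr[i], row_ptr[i+1]) */
    const size_t *col_idx; /* column index of each stored nonzero */
    const double *val;     /* value of each stored nonzero */
} csr_matrix;

/* y = A * x, where A is sparse (CSR) and x, y are dense. */
void csr_spmv(const csr_matrix *a, const double *x, double *y)
{
    assert(a != NULL && x != NULL && y != NULL);
    for (size_t i = 0; i < a->rows; ++i) {
        double sum = 0.0;
        for (size_t k = a->row_ptr[i]; k < a->row_ptr[i + 1]; ++k)
            sum += a->val[k] * x[a->col_idx[k]];
        y[i] = sum;
    }
}
```

For the 3x3 matrix [[4, 0, 1], [0, 2, 0], [1, 0, 3]] (row_ptr {0, 2, 3, 5}, col_idx {0, 2, 1, 0, 2}) and x = (1, 2, 3), csr_spmv gives y = (7, 4, 10). An SPU version would still need DMA and vectorization on top; this only pins down the data layout and the arithmetic.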

One thing I miss more and more is a place where we could keep project-specific docs, from compiling instructions to results. I personally prefer DokuWiki, mostly because it looks clean and keeps a history of all changes to pages (like svn). Alternatively, we could use a Google labs page for that. What do you think?

PS. I installed an svn client on PS3 01 for convenience.

Toomas

Lauri Tulmin

May 5, 2008, 3:37:54 AM

to ulno...@googlegroups.com

CgVectorLoader.h is my bad; just remove the include. The two main definitions are there because Oleg added a template sample. Rename the second main to main2 or something and it should be OK. About BLAS: are you sure the Cell BLAS implementation is not just optimized for the SPU? By that I mean: does it just contain operations for using BLAS inside your own SPU program, or does it really split the work between SPUs?

Lauri Tulmin

May 5, 2008, 4:34:30 AM

to ulno...@googlegroups.com

1) I had a look at it but wasn't able to fix it. I should try on the Fedora machine, but I have forgotten its address.

Toomas Laasik

May 5, 2008, 10:14:04 AM

to ulno...@googlegroups.com

Take a look at this spec:
http://www-01.ibm.com/chips/techlib/techlib.nsf/techdocs/F6DF42E93A55E57400257353006480B2?Open&S_TACT=105AGX16&S_CMP=LP

It says there that the functions that benefit from distributing the
workload over the SPUs are implemented that way, while the functions where
the communication cost is too high are implemented using only the PPU. So if
we were to implement our own functions using the same or a similar design,
we couldn't get much better performance than IBM's BLAS library. The only
design I can see at the moment that could do better is to keep all the data
permanently distributed in SPU local storage instead of main memory.
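Keeping the data resident in the SPUs means deciding up front which part of the matrix each SPU owns. A rough sketch of that bookkeeping in plain C, assuming CSR storage, contiguous row blocks, 6 SPUs (the number Linux can use on the PS3) and a hypothetical on-SPU size of 8-byte values with 4-byte indices; partition_rows and block_bytes are illustrative names, and no libspe2 calls are shown:

```c
#include <assert.h>
#include <stddef.h>

#define NUM_SPUS 6 /* SPUs available to Linux on the PS3 */

/* Split CSR rows [0, rows) into NUM_SPUS contiguous blocks holding roughly
   equal numbers of nonzeros. Block s covers rows [first[s], first[s + 1]);
   blocks may be empty when there are fewer rows than SPUs. */
void partition_rows(const size_t *row_ptr, size_t rows, size_t first[NUM_SPUS + 1])
{
    assert(row_ptr[0] == 0);
    size_t nnz = row_ptr[rows];
    size_t r = 0;
    first[0] = 0;
    for (int s = 1; s < NUM_SPUS; ++s) {
        size_t target = nnz * (size_t)s / NUM_SPUS; /* nonzeros owned by blocks 0..s-1 */
        while (r < rows && row_ptr[r] < target)
            ++r;
        first[s] = r;
    }
    first[NUM_SPUS] = rows;
}

/* Bytes block s needs in local storage, assuming 8-byte values, 4-byte
   column indices and row pointers, the whole x vector and its slice of y.
   This must stay below 256 kB minus the size of the SPU program. */
size_t block_bytes(const size_t *row_ptr, const size_t first[NUM_SPUS + 1],
                   int s, size_t cols)
{
    size_t r0 = first[s], r1 = first[s + 1];
    size_t nnz = row_ptr[r1] - row_ptr[r0];
    return nnz * (8 + 4) + (r1 - r0 + 1) * 4 + cols * 8 + (r1 - r0) * 8;
}
```

For a 12x12 matrix with two nonzeros per row, every SPU gets two rows and each block needs 172 bytes. Balancing by nonzeros rather than by rows is a choice made here so that SPUs finish at roughly the same time on irregular matrices.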

Toomas

Ain Uljas

May 5, 2008, 4:20:39 PM

to ulno...@googlegroups.com

Hi, everyone!

To slightly ruin Ulrich's evil intentions of having others find bugs in my
code (and if not ... !!!), I'll post some well-known findings.

1) The maximum size of data sent to an SPU is currently 16 kB (this limits
the size of the sparse matrix and of the vector).
2) Data sent to an SPU at once must be a multiple of 16 bytes (so, an even
number of elements in both the matrix and the vector).
3) The size of the local storage in an SPU is 256 kB. The matrix, the vector,
the answer vector and the SPU program itself must all fit in it at the same time.
4) Align your data before calling the "distribute1/2" function in the PPU
code (the interface should be changed, I know).

Example (4-4): Dynamic reservation (16-byte alignment) using the
*posix_memalign()* function

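(PPE program)

#define _XOPEN_SOURCE 600 /* makes <stdlib.h> declare posix_memalign() (POSIX.1-2001/XSI) */
#include <stdlib.h>

int main(void)
{
    void *buffer;

    /* 1024 bytes starting on a 16-byte boundary; returns 0 on success */
    int ret = posix_memalign(&buffer, 16, 1024);
    if (ret != 0)
        return 1;

    /* ... fill buffer and pass it to distribute1/2 ... */
    free(buffer);
    return 0;
}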

These limitations, except 4), are about to disappear (for you) in the future,
but it's good to be aware of them and/or to have some tests for them beforehand.
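Such tests could start from a few PPU-side checks run before anything reaches distribute1/2. A minimal sketch, assuming the limits are exactly as listed above (at most 16 kB per transfer, sizes in multiples of 16 bytes, 16-byte aligned addresses, 256 kB of local storage); transfer_ok, transfer_count and fits_local_store are hypothetical helpers, not part of the cg_ps3 interface:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_TRANSFER_BYTES (16 * 1024) /* limit 1): 16 kB per transfer */
#define TRANSFER_ALIGN 16              /* limits 2) and 4): 16-byte granularity */
#define LOCAL_STORE_BYTES (256 * 1024) /* limit 3): SPU local storage */

/* Nonzero if size bytes at buf can go to an SPU in a single transfer. */
int transfer_ok(const void *buf, size_t size)
{
    return (uintptr_t)buf % TRANSFER_ALIGN == 0
        && size > 0
        && size % TRANSFER_ALIGN == 0
        && size <= MAX_TRANSFER_BYTES;
}

/* Transfers needed to move size bytes in chunks of at most 16 kB.
   Every chunk stays a multiple of 16 bytes if size is one. */
size_t transfer_count(size_t size)
{
    assert(size % TRANSFER_ALIGN == 0);
    return (size + MAX_TRANSFER_BYTES - 1) / MAX_TRANSFER_BYTES;
}

/* Nonzero if matrix, vector, answer vector and SPU program fit together. */
int fits_local_store(size_t matrix, size_t vector, size_t answer, size_t program)
{
    return matrix + vector + answer + program <= LOCAL_STORE_BYTES;
}
```

A buffer from posix_memalign(&buffer, 16, n) with n a multiple of 16 and at most 16 kB passes transfer_ok; anything larger has to be split, and transfer_count says into how many pieces.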

>
> Question to Ain: do you suggest that we move our development from cg_cpp to cg_ps3 and try to rewrite the C++ code for the algorithm? We can probably reuse some of what Vladimir and Lauri have made in cg_cpp, but it seems quite a heavy task. In my opinion we lack design and hack too much. If we want to work as a team, I suggest we agree on a design as soon as possible and have some sort of implementation plan.
>

1) I suggest, and prefer, integrating my code into yours.
2) I am currently using two different kinds of data representation, with
slightly different implementations of (sparse) matrix-vector
multiplication (spu/spu2). The second is probably faster, but just in
case, a test (or tests) proving that, ASAP, would be nice.
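Such a test has to show two things: that both variants compute the same result, and which one is faster. A minimal harness in plain C, assuming both spu and spu2 can be wrapped behind one function-pointer type; spmv_fn, dense_spmv, max_abs_diff and time_spmv are hypothetical names, and dense_spmv is only a plain reference to compare against, not either of the existing implementations:

```c
#define _POSIX_C_SOURCE 199309L /* for clock_gettime() */
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Common signature the spu and spu2 paths could be wrapped behind (hypothetical). */
typedef void (*spmv_fn)(const void *matrix, const double *x, double *y, size_t n);

/* Dense reference: matrix points to an n*n row-major array of doubles. */
void dense_spmv(const void *matrix, const double *x, double *y, size_t n)
{
    const double *a = matrix;
    for (size_t i = 0; i < n; ++i) {
        double sum = 0.0;
        for (size_t j = 0; j < n; ++j)
            sum += a[i * n + j] * x[j];
        y[i] = sum;
    }
}

/* Largest absolute difference between two result vectors of length n. */
double max_abs_diff(const double *a, const double *b, size_t n)
{
    double worst = 0.0;
    for (size_t i = 0; i < n; ++i) {
        double d = a[i] > b[i] ? a[i] - b[i] : b[i] - a[i];
        if (d > worst)
            worst = d;
    }
    return worst;
}

/* Wall-clock seconds for `runs` calls of f. Wall time is used instead of
   clock() because work done on the SPUs may not count as PPU processor time. */
double time_spmv(spmv_fn f, const void *matrix, const double *x, double *y,
                 size_t n, int runs)
{
    struct timespec t0, t1;
    assert(runs > 0);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int r = 0; r < runs; ++r)
        f(matrix, x, y, n);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (double)(t1.tv_sec - t0.tv_sec) + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

Running both variants on the same input, checking max_abs_diff against the reference with a small tolerance, and only then comparing the times from time_spmv, keeps a fast but wrong variant from winning.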

>
> 5) Try using Cell's own BLAS implementation (which is natively usable in C). Even though it doesn't support sparse matrices, we only need to implement the product of a sparse matrix and a dense
>

Something for me again. But as Lauri said, it's better to get one
function working first.

Finally, once again, a very informative page about PPU/SPU programming (see
the last link listed). Chapter 4 is pretty new :)
http://www.kernel.org/pub/linux/kernel/people/geoff/cell/CELL-Linux-CL_20080201-ADDON/doc/