This is the list of things I would do after completing internal
issues (won't happen in the near future).
Tried 1), I failed :)
Ain
Thanks for your info, Ain.
Cheers,
Ulrich
--
title+name: Dr. Ulrich Norbisrath
web: http://ulno.net; address+phone+fax: http://ulno.net/contact
google:unorb...@gmail.com; icq:46786247
mailto:u...@ulno.net
It seems someone has broken cg_cpp; I can't get it to compile. Since the
code seems to be under active improvement, I don't want to write tests for
previous revisions. So whoever made the last commit, please check the code.
One error is:
CgVectorLoader.h: No such file or directory
and if I comment this include out, I get:
/root/cg/cg_cpp/Debug/../main.cc:8: multiple definition of `main'
./cg_c.o:/root/cg/cg_cpp/Debug/../cg_c.cpp:115: first defined here
Question to Ain: do you suggest that we move our development from cg_cpp to cg_ps3 and try to rewrite the C++ code for the algorithm? We can probably reuse some of what Vladimir and Lauri have made in cg_cpp, but it seems like quite a heavy task. In my opinion we lack design and hack too much. I suggest that if we want to work as a team, we agree on a design as soon as possible and have some sort of implementation plan.
The tasks that Ain proposed are sensible. I would add one more:
5) Try using Cell's own BLAS implementation (which is natively usable from C). Even though it doesn't support sparse matrices, we only need to implement the product of a sparse matrix and a dense vector ourselves. For all the other operations we can use the goodies provided by Cell's BLAS library. Since I can't really write tests for code that won't compile, and the other branch is pretty incomplete (and not in C++), I will take an initial look into this.
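To make the one missing piece concrete, here is a minimal sketch of a sparse matrix times dense vector product in plain C. It assumes a CSR (compressed sparse row) layout; the thread does not say which sparse format the project uses, so the csr_matrix struct and the csr_spmv name are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical CSR container (an assumption, not the project's format). */
typedef struct {
    size_t rows;            /* number of matrix rows */
    const size_t *row_ptr;  /* rows + 1 entries; row i spans [row_ptr[i], row_ptr[i+1]) */
    const size_t *col_idx;  /* column index of each stored value */
    const double *values;   /* the stored nonzero values */
} csr_matrix;

/* Computes y = A * x for a CSR matrix A and dense vectors x and y. */
void csr_spmv(const csr_matrix *a, const double *x, double *y)
{
    assert(a != NULL && x != NULL && y != NULL);
    for (size_t i = 0; i < a->rows; ++i) {
        double sum = 0.0;
        for (size_t k = a->row_ptr[i]; k < a->row_ptr[i + 1]; ++k)
            sum += a->values[k] * x[a->col_idx[k]];
        y[i] = sum;
    }
}
```

For the 3x3 matrix with rows (1 0 2), (0 3 0), (4 0 5) and x = (1, 2, 3), this yields y = (7, 6, 19). Everything else in the algorithm (dot products, axpy and so on) could then go through the dense BLAS routines.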
One thing I am starting to miss more and more is a place where we could keep project-specific docs, from compiling instructions to results. I personally prefer DokuWiki, mostly because it looks clean and keeps a history of all the changes to pages (like svn). Alternatively we could use a Google labs page for that. What do you think?
PS. I installed an svn client on PS3 01 for convenience.
Toomas
It is said there that the functions that can benefit from distributing
the workload over the SPUs are implemented that way. The functions where the
communication cost is too high are implemented using only the PPU. So if we
were to implement our own functions using the same or a similar design, we
couldn't get much better performance than IBM's BLAS library. The only
design I can see at the moment from which we could benefit more is to keep
all the data always distributed in SPU local storage and not in main memory.
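As a rough illustration of what "always distributed" could mean, the sketch below splits the matrix rows into contiguous, near-equal blocks, one per SPU, so that each SPU could keep its own block resident. The row_range type and block_partition function are hypothetical names, and a contiguous block split is only one possible choice.

```c
#include <assert.h>
#include <stddef.h>

/* Half-open range of rows [first, first + count) owned by one worker. */
typedef struct {
    size_t first;
    size_t count;
} row_range;

/* Rows owned by worker `rank` when `rows` rows are split over `workers`
   workers in contiguous blocks whose sizes differ by at most one. */
row_range block_partition(size_t rows, size_t workers, size_t rank)
{
    assert(workers > 0 && rank < workers);
    size_t base = rows / workers;
    size_t extra = rows % workers;  /* the first `extra` workers get one more row */
    row_range r;
    r.count = base + (rank < extra ? 1 : 0);
    r.first = rank * base + (rank < extra ? rank : extra);
    return r;
}
```

With 10 rows and 6 workers, the first four workers get 2 rows each and the last two get 1 row each. Whether a block actually fits into local storage still has to be checked separately.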
Toomas
To slightly ruin Ulrich's evil intentions about others finding bugs in my
code (and if not ... !!!), I'll post some well-known findings.
1) The maximum size sent to an SPU is currently 16 kB (the size of the
sparse matrix or the vector).
2) Data sent to an SPU at once must be a multiple of 16 bytes (so, an even
number of elements in both matrix and vector).
3) The size of local storage in an SPU is 256 kB. The matrix, vector, answer
vector and SPU program itself must fit in it at the same time.
4) Align your data before calling the "distribute1/2" functions in the PPU
code (-- The interface should be changed, I know).
Example (4-4): Dynamic reservation (16-byte alignment) using the
*posix_memalign()* function
(PPE program)
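#define _XOPEN_SOURCE 600 // feature-test macro: makes <stdlib.h> declare posix_memalign()
#include <stdlib.h>
void *buffer = NULL; // posix_memalign() expects a void **
int ret = posix_memalign(&buffer, 16, 1024); // returns 0 on success; release with free(buffer)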
These limitations, except 4), are about to disappear (for you) in the future,
but it's a good thing to be aware of them and/or have some tests beforehand.
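For tests written beforehand, the four limits above can be turned into a few small checks. This is only a sketch: the constants are the numbers quoted in this mail, all function names are made up, and the local storage check ignores stack and other runtime overhead.

```c
#define _XOPEN_SOURCE 600  /* makes <stdlib.h> declare posix_memalign() */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

enum {
    DMA_ALIGN     = 16,          /* transfer size must be a multiple of 16 bytes */
    DMA_MAX_BYTES = 16 * 1024,   /* largest single transfer mentioned above */
    LOCAL_STORE   = 256 * 1024   /* SPU local storage size */
};

/* Rounds a byte count up to the next multiple of 16. */
size_t pad_to_dma(size_t bytes)
{
    return (bytes + DMA_ALIGN - 1) / DMA_ALIGN * DMA_ALIGN;
}

/* Number of transfers of at most 16 kB needed to move `bytes` bytes. */
size_t dma_chunks(size_t bytes)
{
    return (pad_to_dma(bytes) + DMA_MAX_BYTES - 1) / DMA_MAX_BYTES;
}

/* Nonzero if matrix, vector, result and program sizes (in bytes) fit into
   local storage at the same time. */
int fits_local_store(size_t matrix, size_t vector, size_t result, size_t program)
{
    size_t total = pad_to_dma(matrix) + pad_to_dma(vector)
                 + pad_to_dma(result) + program;
    return total <= (size_t)LOCAL_STORE;
}

/* Allocates a 16-byte aligned buffer whose size is padded to a multiple of
   16 bytes. Returns NULL on failure; release the buffer with free(). */
void *alloc_dma_buffer(size_t bytes)
{
    void *p = NULL;
    assert(bytes > 0);
    if (posix_memalign(&p, DMA_ALIGN, pad_to_dma(bytes)) != 0)
        return NULL;
    return p;
}
```

For example, a 1000-byte vector would be allocated as 1008 bytes and moved in one transfer, while a 20000-byte matrix would need two transfers under the current 16 kB limit.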
>
> Question to Ain: do you suggest that we move our development from cg_cpp to cg_ps3 and try to rewrite the C++ code for the algorithm? We can probably reuse some of what Vladimir and Lauri have made in cg_cpp, but it seems like quite a heavy task. In my opinion we lack design and hack too much. I suggest that if we want to work as a team, we agree on a design as soon as possible and have some sort of implementation plan.
>
1) I suggest, and would prefer, that my code be integrated into yours.
2) I am currently using two different kinds of data representation, with
slightly different implementations of (sparse) matrix-vector
multiplication (spu/spu2). The second is probably faster, but just in
case, a test (or tests) to prove that ASAP would be nice.
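Such a test could first check that both variants give the same answer and then time them. The sketch below does this for any two kernels with the same signature. The two kernels shown are hypothetical stand-ins (a tridiagonal [-1 2 -1] operator written two ways), not the actual spu/spu2 code, and clock() measures processor time of the calling process, so a wall-clock timer may be more appropriate once the work runs on the SPUs.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Common signature for the kernels under test: y = A * x, n entries each. */
typedef void (*kernel_fn)(const double *x, double *y, size_t n);

/* Stand-in kernel 1 (hypothetical): boundary checks inside the loop. */
void kernel_branchy(const double *x, double *y, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        double s = 2.0 * x[i];
        if (i > 0)     s -= x[i - 1];
        if (i + 1 < n) s -= x[i + 1];
        y[i] = s;
    }
}

/* Stand-in kernel 2 (hypothetical): same operator, boundary rows peeled off. */
void kernel_peeled(const double *x, double *y, size_t n)
{
    if (n == 0) return;
    if (n == 1) { y[0] = 2.0 * x[0]; return; }
    y[0] = 2.0 * x[0] - x[1];
    for (size_t i = 1; i + 1 < n; ++i)
        y[i] = 2.0 * x[i] - x[i - 1] - x[i + 1];
    y[n - 1] = 2.0 * x[n - 1] - x[n - 2];
}

/* Largest absolute difference between two result vectors. */
double max_abs_diff(const double *a, const double *b, size_t n)
{
    double worst = 0.0;
    for (size_t i = 0; i < n; ++i) {
        double d = a[i] - b[i];
        if (d < 0.0) d = -d;
        if (d > worst) worst = d;
    }
    return worst;
}

/* Processor time in seconds spent on `reps` runs of kernel f. */
double time_kernel(kernel_fn f, const double *x, double *y, size_t n, int reps)
{
    assert(reps > 0);
    clock_t start = clock();
    for (int r = 0; r < reps; ++r)
        f(x, y, n);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

A test would run time_kernel() on both variants with the same input, require max_abs_diff() of the two results to stay below a small tolerance, and only then compare the timings.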
>
> 5) Try using Cell's own BLAS implementation (which is natively usable from C). Even though it doesn't support sparse matrices, we only need to implement the product of a sparse matrix and a dense
>
Something for me again. But as Lauri said, it's better to get one
function working first.
Finally, once again, a very informative page about PPU/SPU programming (see
the last link listed). Chapter 4 is pretty new :)
http://www.kernel.org/pub/linux/kernel/people/geoff/cell/CELL-Linux-CL_20080201-ADDON/doc/
Hopefully this will be helpful.
Cheers!
Ain