I used the following command to install NumPy with SSE3 enabled:
numpy-1.5.1rc1-win32-superpack-python3.1.exe /arch sse3
How can I tell whether NumPy is actually running with SSE?
I have a Java program that processes 600M rows from SQL Server. It takes 7 hours to complete, and about 4 of those hours are spent on the CPU. I am wondering whether I could port the Java code to NumPy and cut those 4 hours to 2 hours or less by enabling SSE3. Any comments?
> I used the following command to install NumPy with SSE3 enabled:
> numpy-1.5.1rc1-win32-superpack-python3.1.exe /arch sse3
>
> How can I tell whether NumPy is actually running with SSE?
As far as I know, the only operations that use SSE/SSE2/SSE3 are BLAS operations. Elementwise addition, multiplication, and the like are not implemented to take advantage of vectorized machine instructions, at least not yet, unless the C compiler is optimizing aggressively and unrolling loops, which I rather doubt.
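Since BLAS is where SSE would matter, one rough check is to look at how your NumPy build was configured and linked. The snippet below is only a sketch: numpy.show_config() is a real NumPy function, but the helper name numpy_build_report is made up for illustration, and the contents of the report vary between NumPy versions. It tells you how NumPy was built, not whether any particular loop actually executes SSE instructions.

```python
import contextlib
import io

import numpy as np


def numpy_build_report():
    """Return the text that numpy.show_config() prints (hypothetical helper)."""
    buffer = io.StringIO()
    # show_config() prints to stdout, so capture it into a string.
    with contextlib.redirect_stdout(buffer):
        np.show_config()
    return buffer.getvalue()


report = numpy_build_report()
print(report)  # look for the BLAS/LAPACK entries in this output
```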
> I have a Java program that processes 600M rows from SQL Server. It takes 7 hours to complete, and about 4 of those hours are spent on the CPU. I am wondering whether I could port the Java code to NumPy and cut those 4 hours to 2 hours or less by enabling SSE3. Any comments?
It's not clear that crunching data from an SQL database would be any faster with NumPy. It really depends on the specifics of your problem.
David
_______________________________________________
NumPy-Discussion mailing list
NumPy-Di...@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
> Days 1, 2, and 3 have non-promoted sales, day 4 has promoted sales, and days 5, 6, and 7 have non-promoted sales; the output for days 1 to 7 is all non-promoted sales. During processing we might need to sum all the data for days 1 to 7. Is this what you called "elementwise addition, multiplication", which can't be SIMDed in numpy?
Really, the only things that can be SIMDed with SSE/SSE2/SSE3 are matrix-matrix or matrix-vector multiplies, i.e. operations that involve calls to the BLAS. NumPy will perform the summations you mention with efficient loops at the C level, but without using SSE. I don't know how much of a speed boost this will provide over Java, as the JVM is pretty heavily optimized.
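To make the summation concrete, here is a minimal sketch of summing sales over 7-day windows in NumPy. The numbers and the names daily_sales and weekly_totals are invented for illustration; your real data layout will differ. The point is only that the loop over days runs in C inside sum() rather than in Python.

```python
import numpy as np

# Hypothetical daily sales for two weeks (14 days); the values are made up.
# Day 4 of each week stands in for a promoted-sale day.
daily_sales = np.array([10, 12, 11, 30, 9, 10, 13,
                        11, 10, 12, 28, 10, 11, 12], dtype=np.float64)

# View the data as (weeks, 7) and sum along each row.
# The per-day loop is executed in compiled C code inside NumPy.
weekly_totals = daily_sales.reshape(-1, 7).sum(axis=1)

print(weekly_totals)
```

Whether this beats a well-tuned Java loop would have to be measured on your actual workload.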
The whole point of the superpack installer is to install the most
optimized build possible for your machine. So you should not use the arch
flag (it is meant for people who want to explicitly install something
other than the optimal build).
As for your program, it depends too much on what you are doing. Keep
in mind that Java with the appropriate JVM is pretty fast.
cheers,
Thanks, David. The Java program takes 3 hours to read the data; once the data is in memory, it takes 4 hours to run the calculations on it.
The data is sales data that contains both promoted and non-promoted sales, and the program needs to predict the non-promoted sales. So the input is a series of promoted and non-promoted sales, and the output is a series of non-promoted sales. For example, days 1, 2, and 3 have non-promoted sales, day 4 has promoted sales, and days 5, 6, and 7 have non-promoted sales; the output for days 1 to 7 is all non-promoted sales.
During processing we might need to sum all the data for days 1 to 7. Is this what you called "elementwise addition, multiplication", which can't be SIMDed in numpy?