Hi Xianyi,

Thanks very much for the DLLs. The code now runs without any problem.

However, there is one more thing I really need your help with. I previously used OpenBLAS on Mac OS with LLVM, and I remember it being very fast, at least much faster than Intel MKL. I could also turn on parallel execution, meaning I could use all the cores to run the code, so I was very happy with the library.

Recently I had to move back to Windows, where I have to use Visual Studio 2013 (C++). I have two problems.

The first is that I cannot turn on parallel execution. My environment is Windows 7 64-bit + Visual Studio 2013 + Armadillo + OpenBLAS, and my code is built as a 64-bit release build. I use the OpenBLAS binary version for Windows, downloaded from your website. I have turned on all the parallel settings in Visual Studio. I tried adding OPENBLAS_NUM_THREADS as a system environment variable. I tried adding OPENBLAS_NUM_THREADS in Visual Studio. I tried adding #include "cblas.h" and calling openblas_set_num_threads(8) at runtime. But my CPU usage is always below 15%, so the parallel execution is obviously not working. Does this mean that the binary version on your website doesn't support parallel execution and I have to compile the library myself, or did I do something wrong? Could you kindly help me?

The second is that the speed is slow under Windows, at least much slower than under LLVM on Mac OS. Maybe that is because parallel execution is not turned on. I attach the code I used below. Is this normal?

The CPU in my new workstation is an Intel i7 4900MQ (2.8GHz), which is a very powerful CPU. The funny thing is that I ran the same code on my other, small laptop, whose CPU is only an Intel i5-2520M (2.5GHz). On the small laptop I use Intel C++ as the compiler together with Intel MKL. With the same code below, the running time is THE SAME! How is this possible? My impression was that OpenBLAS should be faster than Intel MKL. I am confused. Maybe you have a better idea.

Thank you very much for your help. My work is machine learning, so your library is very important to me.

Best wishes,
Ying

#include "stdafx.h"
#include <stdio.h>
#include <stdlib.h>
#include <conio.h>
#include <iostream>
#include <time.h>
#include "armadillo"
#include "cblas.h"
using namespace arma;
using namespace std;

int main()
{
    // Request 8 OpenBLAS threads at runtime.
    openblas_set_num_threads(8);
    //goto_set_num_threads(4);

    // Print the build configuration and the threading model of the library.
    cout << openblas_get_config() << " " << openblas_get_parallel() << endl;

    //int m = 2000, p = 200, n = 1000;
    int m = 100, p = 10, n = 100;
    double alpha = 1.0, beta = 0.0;

    // A is m x p, filled with 1, 2, 3, ... in row-major order.
    mat A = zeros<mat>(m, p);
    for (int i = 0; i < m; i++)
    {
        for (int j = 0; j < p; j++)
        {
            A(i, j) = (double)(i*p + j + 1);
        }
    }

    // B is p x n, filled with -1, -2, -3, ... in row-major order.
    mat B = zeros<mat>(p, n);
    for (int i = 0; i < p; i++)
    {
        for (int j = 0; j < n; j++)
        {
            B(i, j) = -(i*n + j + 1);
        }
    }

    mat C = zeros<mat>(m, n);

    // Time the loop with both Armadillo's wall clock and clock().
    arma::wall_clock timer;
    timer.tic();
    clock_t start = clock();
    for (int i = 0; i < 400000; i++)
    {
        C = alpha*(A*B) + beta*C;
    }
    double diff = (clock() - start) / (double)CLOCKS_PER_SEC;
    std::cout << diff << std::endl;
    printf("\n Computations completed.\n\n");
    cout << "took " << timer.toc() << " seconds" << endl;

    getch();  // wait for a key press so the console window stays open
    return 0;
}
OpenBLAS-v0.2.12-Win64-int32.zip
Hi Xianyi,

I already tried 1, 2, 4, 8, and 16 threads, but it makes no difference. This is why I suspect either the directly downloaded binary version or VC++. This is quite annoying. I could try MinGW, but I would prefer not to change the compiler. Any thoughts? Thanks very much.

Best wishes,
Ying
That is the binary package from http://www.openblas.net/, and OpenBLAS-v0.2.12-Win64-int32.zip is the one I am trying now. Sorry I didn't make that clear. Thanks, Xianyi.

Ying
Hi Xianyi,

I am attaching the Armadillo configuration file.

There are some new findings. I finally set up MinGW and MSYS, compiled OpenBLAS from source on my workstation, and compared it with VC++ 12.

Without OpenBLAS, Armadillo only:
  VC   37.62s
  GCC  22.28s
** So GCC can be faster than VC++. Without OpenBLAS, it is Armadillo that does the matrix computation.

With OpenBLAS and Armadillo:
  VC   16.50s
  GCC  7.5s
** I can see that OpenBLAS improves the speed greatly. At least I finally have good speed now.

HOWEVER, why can I still not run the code in parallel? I tried everything I had tried before, and the CPU usage is always under 20%. If I want to use multiple cores with OpenBLAS, how can I do it under Windows? OpenBLAS is compiled with NO_AFFINITY. Should I set NO_AFFINITY=0 and use OpenMP?

Thanks,
Ying