Hi all,
I use element-wise operations heavily in deep learning, so it's important to run them in parallel.
1. How do you guys solve this problem? Perhaps one option is to write my own parallel versions of these operations using threads.
2. Why do you think that these operations are not implemented in BLAS/OpenBLAS?
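To make question 1 concrete, here is roughly the kind of hand-rolled threaded version I have in mind. It is only a minimal sketch in C++ with std::thread: parallel_multiply is a made-up name (nothing like it comes from BLAS/OpenBLAS), it handles just element-wise multiplication of two float vectors, and it makes no attempt at vectorization or thread reuse.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper (not a BLAS/OpenBLAS routine): computes
// out[i] = a[i] * b[i], splitting the index range into contiguous
// chunks and giving each chunk to its own thread.
std::vector<float> parallel_multiply(
    const std::vector<float>& a,
    const std::vector<float>& b,
    unsigned num_threads = std::thread::hardware_concurrency()) {
    assert(a.size() == b.size());
    const std::size_t n = a.size();
    std::vector<float> out(n);

    // hardware_concurrency() is allowed to return 0 when it cannot tell.
    if (num_threads == 0) num_threads = 1;

    // Ceiling division so that every element falls into some chunk.
    const std::size_t chunk = (n + num_threads - 1) / num_threads;

    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min(n, t * chunk);
        const std::size_t end = std::min(n, begin + chunk);
        if (begin == end) break;  // no work left for the remaining threads
        // Each thread writes a disjoint slice of out, so no locking is needed.
        workers.emplace_back([&a, &b, &out, begin, end] {
            for (std::size_t i = begin; i < end; ++i) out[i] = a[i] * b[i];
        });
    }
    for (auto& w : workers) w.join();
    return out;
}
```

Calling parallel_multiply(a, b) uses one thread per hardware thread by default (build with -pthread on GCC/Clang). Spawning threads on every call is probably too expensive for small arrays, which is part of why I am asking how others handle this.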
Thanks.