Using Native BLAS and F2J in the Same Program


nate...@curalate.com

Dec 13, 2017, 5:33:42 PM
to Scala Breeze
I have a project where we (1) compute a very large matrix dot product and then (2) perform a large loop of smaller vector operations.  We would ideally like to use a native BLAS implementation for (1) and F2J for (2), as my hunch is that for the small operations a native call is actually more expensive.  My motivation comes partially from https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala#L299-L300 and partially from observing that system CPU usage is somewhat high during processing.  Is there currently any support for this?  I couldn't see a way to instantiate two different underlying implementations.
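For context, the kind of dispatch I'm imagining looks roughly like this. All the names and the cutoff here are hypothetical (this is not Breeze's API), and the native side is stubbed with a plain loop so the sketch is self-contained; in a real build it would delegate to a native BLAS over JNI:

```scala
// Hypothetical sketch only -- not Breeze's API. Two interchangeable
// backends behind one trait, chosen per call site by operand size.
trait DotBlas {
  def ddot(n: Int, x: Array[Double], y: Array[Double]): Double
}

// Plain JVM loop, standing in for what F2J-compiled BLAS boils down to.
object F2jLikeBlas extends DotBlas {
  def ddot(n: Int, x: Array[Double], y: Array[Double]): Double = {
    var acc = 0.0
    var i = 0
    while (i < n) { acc += x(i) * y(i); i += 1 }
    acc
  }
}

// In a real build this would delegate to a native BLAS via JNI
// (e.g. netlib's native implementation); stubbed with the same loop here.
object NativeStubBlas extends DotBlas {
  def ddot(n: Int, x: Array[Double], y: Array[Double]): Double =
    F2jLikeBlas.ddot(n, x, y)
}

// Large operands go to the "native" backend (JNI overhead amortized over
// more work), small ones stay on the JVM. The 256 cutoff is made up.
def dot(x: Array[Double], y: Array[Double], threshold: Int = 256): Double = {
  val impl: DotBlas = if (x.length >= threshold) NativeStubBlas else F2jLikeBlas
  impl.ddot(x.length, x, y)
}
```

The real cutoff would of course have to come from profiling, which is exactly my question.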

Thanks,
Nathaniel

David Hall

Dec 13, 2017, 5:40:44 PM
to scala-...@googlegroups.com
There's not, but I actually think I did this specific tuning for you:


I profiled them fairly carefully to determine the tradeoff points, but maybe there's been some drift or I messed up.

F2J BLAS is almost never what you want in practice. The loops it generates don't seem to reliably let the JVM eliminate array bounds checks (or something along those lines), which kills performance.
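To illustrate the shape of the problem (a sketch, not actual F2J output): F2J translates Fortran's 1-based, stride-and-offset indexing more or less literally, whereas the canonical JVM loop bound is the pattern HotSpot's range-check elimination is designed to recognize. Whether the checks actually get dropped depends on the JIT, so treat this only as showing the two loop shapes:

```scala
// F2J-ish daxpy (y := a*x + y): 1-based counter with stride arithmetic,
// mimicking mechanically translated Fortran. The computed index
// expressions make it harder for the JIT to prove accesses in-bounds.
def daxpyF2jStyle(n: Int, a: Double, x: Array[Double], incx: Int,
                  y: Array[Double], incy: Int): Unit = {
  var i = 1
  while (i <= n) {
    y((i - 1) * incy) += a * x((i - 1) * incx)
    i += 1
  }
}

// Idiomatic JVM loop: the `i < x.length` bound is the form HotSpot can
// most easily prove safe, allowing it to hoist or drop the bounds checks.
def daxpyIdiomatic(a: Double, x: Array[Double], y: Array[Double]): Unit = {
  var i = 0
  while (i < x.length) {
    y(i) += a * x(i)
    i += 1
  }
}
```

Both compute the same thing; only the loop shape differs.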

-- David

--
You received this message because you are subscribed to the Google Groups "Scala Breeze" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scala-breeze+unsubscribe@googlegroups.com.
To post to this group, send email to scala-...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/scala-breeze/37858bab-70e2-4704-be42-5c8e43655f88%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

David Hall

Dec 13, 2017, 5:52:28 PM
to scala-...@googlegroups.com
(Also, in the code you linked, it seems like you'd be better off using matrices?)
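Concretely, the idea is that many row-by-row ddot calls can be folded into a single matrix product. A toy sketch (naive flat row-major product, purely to show the equivalence; a native BLAS gemm would do the same computation with blocking and vectorization):

```scala
// Computes C = U * V^T where U is m x k and V is n x k, both stored
// flat in row-major order. C(i, j) equals the dot product of row i of U
// with row j of V -- i.e. exactly the per-pair ddot calls, but expressed
// as one matrix-matrix product that a native gemm could take whole.
def gemmUVt(u: Array[Double], v: Array[Double],
            m: Int, n: Int, k: Int): Array[Double] = {
  val c = new Array[Double](m * n)
  var i = 0
  while (i < m) {
    var j = 0
    while (j < n) {
      var s = 0.0
      var p = 0
      while (p < k) { s += u(i * k + p) * v(j * k + p); p += 1 }
      c(i * n + j) = s
      j += 1
    }
    i += 1
  }
  c
}
```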

nate...@curalate.com

Dec 13, 2017, 6:23:21 PM
to Scala Breeze
Ah, interesting. I wonder why Spark uses F2J and specifically cites it as faster.  In any case, thanks for the info and the quick reply!


David Hall

Dec 13, 2017, 6:24:04 PM
to scala-...@googlegroups.com
Maybe the code is older than my profiling? I did that in mid-2015, I think.


nate...@curalate.com

Dec 13, 2017, 6:25:20 PM
to Scala Breeze
The factors are distributed across nodes within Spark, so using matrices isn't possible there. If that weren't the case, I would certainly agree :)

