SVD save_vectors write time


Marc Tomlinson

Oct 18, 2013, 9:48:14 AM
to graph...@googlegroups.com
Hello,

We are currently exploring the GraphLab SVD algorithm for dimensionality reduction. We are working on a 150,000 x 1.4 million matrix with about 21 million non-zero entries, on a single machine with 12 hyperthreaded cores and ncpus set to 10. The SVD runs fairly quickly, about 20 minutes per iteration; however, the system writes out the U and V vectors very slowly: 12+ hours for 120 singular values. Is this expected behavior?

svd g.mmx --ncpus=6 --nsv=350 --nv=370 --max_iter=10 --save_vectors=true --rows=150254 --cols=1489971 --ortho_repeats=3 --tol=1e-04

Thanks,

Marc

Danny Bickson

Oct 18, 2013, 9:53:19 AM
to graph...@googlegroups.com
Hi Marc, 
This is definitely not expected behavior.
Can you check whether you have some kind of disk access error or other problem when writing to disk? Given the problem size, it should not take more than a couple of minutes.
One possible cause is working in a /tmp/ directory that is mounted in memory: when you run out of memory, the system starts swapping and becomes very slow.

Best, 


--
You received this message because you are subscribed to the Google Groups "GraphLab API" group.
To unsubscribe from this group and stop receiving emails from it, send an email to graphlabapi...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Marc Tomlinson

Oct 19, 2013, 9:10:41 AM
to graph...@googlegroups.com
I looked at it a little more, and it appears to be finalizing the graph every time it writes out a new singular value. Most of the write time goes into the rather significant re-allocation and filling of the graph's memory. Does this actually need to be done? It doesn't seem like the graph is changing anymore at that point.

Thanks,

Marc 

Danny Bickson

Oct 19, 2013, 12:27:07 PM
to graph...@googlegroups.com
Hi Marc,
We will look into that. The graph is supposed to be finalized already, so the finalize() code should be skipped;
see: https://github.com/graphlab-code/graphlab/blob/master/src/graphlab/graph/distributed_graph.hpp#L697-L700

Thanks,


Danny Bickson
Co-Founder
GraphLab Inc.



Danny Bickson

Oct 19, 2013, 1:20:49 PM
to graph...@googlegroups.com
Hi Marc,
Thanks for pinpointing this performance problem. As a workaround, I have disabled dynamic graphs, which are an experimental new feature of GraphLab. See the change here: https://github.com/graphlab-code/graphlab/commit/b1fe391a11413ffb4546a52544ebb2ba1cf18621
Once we fix this issue, we will re-enable this feature.

Please take the latest from github and recompile. Let us know if it works for you.

Thanks!

Danny Bickson
Co-Founder
GraphLab Inc.


Danny Bickson

Oct 21, 2013, 2:27:11 PM
to graph...@googlegroups.com, Haijie Gu
Hi Marc, 
Jay has kindly pushed a fix for the finalization issue. Please pull the latest from github, recompile, and let us know if the fix works for you.

Thanks!

Danny Bickson
Co-Founder
GraphLab Inc.

Marc Tomlinson

Oct 21, 2013, 5:00:25 PM
to graph...@googlegroups.com
Thanks, Jay and Danny. Runtimes are down to 5 minutes for the first 20 singular values of a 300,000 x 5 million matrix. These seem much more reasonable.



