itemcf slower


Clive Cox

Jan 7, 2014, 10:40:17 AM
to graphchi...@googlegroups.com
Hi,

 I'm trying out itemcf from the collaborative filtering toolkit with the latest build, and in an informal test it seems to be twice as slow as the previous version I had running, which is from a Jan 2013 checkout of the project.

 I see that itemcf now needs --nshards=1, which it didn't before. Could this be the reason? Or is it using fewer threads?

 Any ideas?

 As an example, here are log lines from both builds on the same dataset. The previous build has:
INFO:     itemcf.cpp(update:340):    1976.43)   760000000 pairs compared     371577 written.

while the latest build, at a similar time point, has:

INFO:     itemcf.cpp(update:397):    1965.34)   330000000 pairs compared     104568 written.
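
To put a number on the gap, the two log lines can be turned into a rough throughput figure. This is only a minimal sketch: it assumes the first number on each line is the elapsed time (taken here to be seconds), and the struct and function names are made up for illustration, not part of GraphChi.

```cpp
#include <cassert>

// One progress sample transcribed by hand from an itemcf log line.
// Assumption: the leading number in the log line is elapsed time in seconds.
struct ProgressSample {
    double elapsed_seconds;
    double pairs_compared;
    double written;
};

// Average number of pairs compared per second up to this sample.
double pairs_per_second(const ProgressSample& s) {
    assert(s.elapsed_seconds > 0.0);
    return s.pairs_compared / s.elapsed_seconds;
}

// Ratio of the old build's throughput to the new build's throughput.
// A value above 1 means the new build is slower.
double slowdown(const ProgressSample& old_run, const ProgressSample& new_run) {
    return pairs_per_second(old_run) / pairs_per_second(new_run);
}
```

With the old sample {1976.43, 760000000, 371577} and the new sample {1965.34, 330000000, 104568}, slowdown comes out at roughly 2.3, which matches the "twice as slow" impression.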

 Thanks,

 Clive

Danny Bickson

Jan 8, 2014, 1:20:20 AM
to graphchi-discuss
Hi Clive, 
This is strange. We were not aware of this performance hit. 

1) Can you verify that the macro
#define GRAPHCHI_DISABLE_COMPRESSION
is defined in your latest version? This can easily explain a 2x difference in speed.

2) There was one change in the code: the top K results are now computed and sorted first, and
only then saved to disk. (Previously, we would save all results to disk, and you later had to sort
them and find the top K yourself.) This change consumes more memory and may slow down the run.

3) --nshards=1 should increase performance since it forces the full graph to be loaded into memory 
and thus saves a lot of disk reads.
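
To illustrate points 1 and 2, here is a rough sketch. It is not the actual itemcf code: ScoredItem, top_k and kCompressionDisabled are hypothetical names used only for this example. The first part reports at compile time whether GRAPHCHI_DISABLE_COMPRESSION was defined for the build, and the second part shows the general "collect, sort, keep top K, then write" pattern, assuming a higher score means a better match.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Point 1: true only if the build defined GRAPHCHI_DISABLE_COMPRESSION.
// (kCompressionDisabled is a hypothetical helper, not a GraphChi symbol.)
#ifdef GRAPHCHI_DISABLE_COMPRESSION
constexpr bool kCompressionDisabled = true;
#else
constexpr bool kCompressionDisabled = false;
#endif

// Point 2: hypothetical record for one candidate similar item.
struct ScoredItem {
    unsigned item_id;
    double score;
};

// Keep only the k highest-scoring candidates, sorted best first.
// All candidates sit in memory before anything is written out, which is
// where the extra memory use comes from.
std::vector<ScoredItem> top_k(std::vector<ScoredItem> candidates, std::size_t k) {
    const std::size_t keep = std::min(k, candidates.size());
    std::partial_sort(
        candidates.begin(),
        candidates.begin() + static_cast<std::ptrdiff_t>(keep),
        candidates.end(),
        [](const ScoredItem& a, const ScoredItem& b) { return a.score > b.score; });
    candidates.resize(keep);
    return candidates;
}
```

The older behaviour would correspond to writing every candidate straight to disk and running the sort as a separate step afterwards.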

Best,

Danny Bickson
Co-Founder
GraphLab Inc.


Aapo Kyrola

Jan 8, 2014, 2:07:48 AM
to graphchi...@googlegroups.com
 

However, if the application requires more memory than you have, your computer will "swap" and it can run much slower.

Aapo Kyrola
Ph.D. student, http://www.cs.cmu.edu/~akyrola
GraphChi: Big Data - small machine: http://graphchi.org
twitter: @kyrpov
