LibFM on large data


fzl.l...@gmail.com

May 3, 2017, 6:12:12 AM
to libFM - Factorization Machines
Hi,

I have recommendation data with users, items, and 5,000 contextual variables (5,002 attributes in total), and about 23,000 records.

I am trying to run libFM on this data using the optimization method SGDA.

For that, I split my data into train, test, and validation sets. Then I converted the libFM files into binary files and ran the libFM command line with the SGDA method for 100 iterations.

The problem is that it takes too much time: about two hours per iteration, so I need about 200 hours to run SGDA for 100 iterations, and still more time on top of that to tune the parameters!

Is there a way to speed up the running time?
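For reference, a run like the one described would typically use a command line along these lines. The file names and hyperparameter values here are placeholders, not the poster's actual ones; the flag names follow the libFM manual:

```shell
# Sketch of an SGDA run; SGDA adapts the regularization values on a
# separate validation set, hence the -validation argument.
./libFM -task r \
        -train train.libfm -test test.libfm -validation validation.libfm \
        -method sgda -learn_rate 0.01 \
        -dim '1,1,8' -init_stdev 0.1 -iter 100
```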

Thank you!

Thierry Silbermann

May 3, 2017, 10:57:18 PM
to libFM - Factorization Machines, fzl.l...@gmail.com
Could you paste the first 3 lines of your train dataset here?
How sparse is your data?
I used libFM with 10 million rows and thousands of sparse features,
and I could run a model for 100 iterations in less than a day.

What command line are you using inside the terminal ?

fzl.l...@gmail.com

May 4, 2017, 6:14:57 AM
to libFM - Factorization Machines, fzl.l...@gmail.com
Thank you for your interest.

While writing an answer with the information you requested, I realized that my data was incorrectly converted to the libFM format.

My data is highly sparse, but the generated libFM file is very dense.
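To illustrate the difference: in libFM's SVMlight-style text format, each line is the target y followed only by the nonzero features as index:value pairs, so a sparse dataset should yield short lines. A minimal conversion sketch with awk (the sample row and the column convention, target first then features, are made up for illustration):

```shell
# Dense CSV row: first field is the target y, remaining fields are
# feature values; emit only the nonzero features as "index:value"
# (0-based feature indices, hence i-2).
echo "3.5,0,0,1,0,2" | awk -F, '{
  line = $1;
  for (i = 2; i <= NF; i++)
    if ($i != 0) line = line " " (i-2) ":" $i;
  print line;
}'
# prints: 3.5 2:1 4:2
```

If the converted file instead contains an entry for every one of the 5,002 attributes on each line, including the zeros, that would explain both the file size and the slow iterations.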

I will come back to you as soon as I have studied this problem.

Thanks again

wenming xu

Jul 28, 2017, 6:41:52 AM
to libFM - Factorization Machines, fzl.l...@gmail.com
Hi, thanks for sharing. You mentioned that you trained on more than 10 million rows with thousands of sparse features.
Have you ever tried converting the train data into binary using the 'convert' command? It just hangs for hours when my dataset is larger than 4 GB.
My server configuration should be good enough: 32 cores, 64 GB of memory, and terabytes of disk.
By the way, how large is your train dataset? Any suggestions? Thanks a ton!
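For context, the binary format in question is produced with the convert and transpose tools that ship with libFM, along these lines (file names are placeholders; the option names follow the libFM manual):

```shell
# Convert the text file into a binary design matrix (.x) and targets (.y).
./convert --ifile train.libfm --ofilex train.x --ofiley train.y
# Some solvers also expect the transposed design matrix (.xt).
./transpose --ifile train.x --ofile train.xt
```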



On Thursday, May 4, 2017 at 10:57:18 AM UTC+8, Thierry Silbermann wrote: