results of PMF in collaborative filtering toolkit


James Jensen

May 30, 2014, 8:53:53 PM
to graphchi...@googlegroups.com
I'm confused about the results of running the Bayesian probabilistic matrix factorization method. Other methods such as ALS output two files, [name of input file]_U.mm and [name of input file]_V.mm. When I run PMF, neither of these is created, nor is any other file that seems like it's supposed to take their place.

If I add the "--pmf_additional_output=1" option, I get a U.mm and V.mm file for every sampling iteration of the algorithm. I suppose I could average the products of all of these to get the final reconstructed matrix. But I expected this to be done automatically, with "--pmf_additional_output=1" being only for looking under the hood. If that is the case, where is the normal output?

I'm running it like this:

pmf --training=data.mtx --minval=0 --maxval=1 --D=20 --max_iter=100 --quiet=1 --pmf_burn_in=5



This is with the latest version pulled from GitHub.

I apologize if I'm overlooking something painfully obvious here.

Danny Bickson

May 30, 2014, 11:35:38 PM
to graphchi-discuss
Hi James, 
You assume correctly: you need to average the products for each user and item pair (omitting the first ones to allow for the burn-in period).
If there are M users and N items, there are M*N*t products to average (where t is the number of iterations you run minus the burn-in period).
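
A minimal sketch of that averaging, assuming the per-iteration U and V factors have already been loaded into memory as nested lists (U is M x D, V is N x D) and that the prediction for a single sample is the plain dot product of a user row and an item row. The function name `average_predictions` is hypothetical and not part of the toolkit; how the files are read, and any clipping to --minval/--maxval, is deliberately left out.

```python
def average_predictions(samples):
    """Average the per-sample predictions U * V^T over all given samples.

    samples: iterable of (U, V) pairs taken after burn-in, where U is an
    M x D nested list of user factors and V is an N x D nested list of
    item factors. Returns an M x N nested list of averaged predictions.
    (Hypothetical helper for illustration; not a GraphChi function.)
    """
    total = None
    count = 0
    for U, V in samples:
        # Prediction for (user i, item j) in this sample: dot(U[i], V[j]).
        pred = [[sum(a * b for a, b in zip(u, v)) for v in V] for u in U]
        if total is None:
            total = pred
        else:
            # Accumulate this sample's predictions into the running sum.
            for i, row in enumerate(pred):
                for j, value in enumerate(row):
                    total[i][j] += value
        count += 1
    if count == 0:
        raise ValueError("no samples to average")
    return [[value / count for value in row] for row in total]
```

Note that this averages the products sample by sample. Averaging the U matrices and the V matrices separately and multiplying the two averages is in general a different quantity.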

Are you looking for the top K products, or for products over a test set? For a test set, you can use the --test=filename command line argument.
If you are looking for the top K products, you will need to compute them yourself.
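
One possible way to do that, assuming "top K" means the K highest averaged predictions for a given user and that one row of the averaged M x N matrix is available as a list of scores. The name `top_k_items` and the optional `seen` argument (items the user has already rated, to be skipped) are hypothetical and only for illustration.

```python
import heapq


def top_k_items(avg_row, k, seen=()):
    """Return the indices of the k highest-scoring items for one user.

    avg_row: averaged predictions for this user, one score per item.
    seen: item indices to exclude, e.g. items already rated (assumption).
    (Hypothetical helper for illustration; not a GraphChi function.)
    """
    excluded = set(seen)
    candidates = (
        (score, j) for j, score in enumerate(avg_row) if j not in excluded
    )
    # nlargest returns (score, index) pairs sorted from highest score down.
    return [j for _, j in heapq.nlargest(k, candidates)]
```

Calling it once per row of the averaged matrix gives a ranked list for every user.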

Let us know if this is clearer.


  Danny Bickson
Co-Founder
US phone: 206-691-8266
Israeli phone: 073-7312889
 




James Jensen

May 31, 2014, 7:37:47 PM
to graphchi...@googlegroups.com
Thanks, Danny. That makes sense. And I'm glad you reminded me about the --test=filename option. I'm not using that now but I think I will later on.

Just to clarify: I notice that if I run 100 iterations with 5 burn-in iterations, it outputs U.mm and V.mm files indexed 0-94 only. Am I right that they are numbered in reverse order, and that 99-95 were from the burn-in? In other words, are the samples from the burn-in period omitted automatically?

Best,

James