weights + sparsity + nonnegativity?


James Jensen

May 1, 2014, 7:09:51 PM
to graphchi...@googlegroups.com
Hi,

I'm working on a one-class collaborative filtering problem. I've used GraphChi's weighted ALS and sparse ALS. I'm wondering if there are plans to include a method that both incorporates weights and factorizes to sparse user and/or item matrices.

For my application it would also be nice if nonnegativity were a possibility. GraphChi's collaborative filtering toolkit has NMF, but not sparse NMF or weighted sparse NMF. I found this paper on weighted sparse NMF. Since NMF can be done by non-negativity-constrained ALS, this doesn't look too difficult mathematically, so I'm even tempted to try to write my own implementation. But it would not have the benefit of the optimizations and efficiency of the methods in GraphChi.

Ideally, there would be a general-purpose ALS method that one could invoke with different parameters to make it non-negative, weighted, sparse, etc.
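For concreteness, the per-user update such a general-purpose ALS might perform can be sketched roughly as follows. This is only a toy NumPy illustration, not GraphChi code; handling nonnegativity by projection and sparsity by soft-thresholding is a crude stand-in for a proper constrained solver:

```python
import numpy as np

def weighted_nn_sparse_als_step(R, W, X, Y, lam=0.1, l1=0.01):
    """One half-sweep of ALS: update every row of X with Y held fixed.

    R : (m, n) observed matrix
    W : (m, n) per-entry confidence weights
    X : (m, k) user factors (updated in place)
    Y : (n, k) item factors
    lam : L2 regularization; l1 : soft-threshold level for sparsity.
    """
    k = Y.shape[1]
    for u in range(R.shape[0]):
        # Weighted ridge regression: (Y^T W_u Y + lam I) x = Y^T W_u r_u
        Wu = np.diag(W[u])
        A = Y.T @ Wu @ Y + lam * np.eye(k)
        b = Y.T @ (W[u] * R[u])
        x = np.linalg.solve(A, b)
        # Sparsity via soft-thresholding (ISTA-style shrinkage)
        x = np.sign(x) * np.maximum(np.abs(x) - l1, 0.0)
        # Nonnegativity via projection onto the nonnegative orthant
        X[u] = np.maximum(x, 0.0)
    return X
```

A full factorization would alternate this update between X and Y until convergence; the projection/shrinkage combination here is the simplest possible scheme, not necessarily the one a production implementation would use.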

Alternatively, am I overlooking a way to achieve this sort of thing with what is already in GraphChi? For weights and sparsity, the closest I could come, borrowing an idea from Pan et al. (2008), was to run an ensemble of sparse ALS analyses, in each of which the weights give the probability of flipping each entry to the opposite value (1 to 0, 0 to 1), and then average the results. But this wouldn't yield quite the desired result, because averaging destroys the sparsity: if an entry in the item matrix is nonzero even once in 1000 runs, the average will be nonzero, whereas an implementation that could do the analysis in a single pass (i.e. without the ensemble) would presumably leave such an entry at zero. I suppose I could treat the zero/nonzero status of each entry in each factorization as a vote for whether to call the entry zero or nonzero, apply some threshold percentage of votes, and average the values of the entries voted nonzero... but that doesn't strike me as a principled way of doing it. And it still lacks the non-negativity.
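The vote-then-average aggregation described above could be sketched like this (a hypothetical illustration of the idea; the `threshold` value and the rule of averaging only over runs where an entry was nonzero are assumptions):

```python
import numpy as np

def vote_and_average(factor_runs, threshold=0.5):
    """Aggregate factor matrices from an ensemble of sparse factorizations.

    factor_runs : list of (n, k) arrays, one per ensemble run.
    An entry is kept nonzero only if it was nonzero in at least
    `threshold` of the runs; surviving entries are averaged over the
    runs in which they were nonzero, all others are set to zero.
    """
    stack = np.stack(factor_runs)                 # (runs, n, k)
    nonzero = stack != 0.0
    votes = nonzero.mean(axis=0)                  # fraction of runs nonzero
    counts = np.maximum(nonzero.sum(axis=0), 1)   # avoid divide-by-zero
    avg = stack.sum(axis=0) / counts              # mean over nonzero runs
    return np.where(votes >= threshold, avg, 0.0)
```

This preserves sparsity in the aggregate, but as noted it is a heuristic rather than a principled estimator, and it does nothing for nonnegativity.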

Any suggestions?

Thanks for your help,

James

Danny Bickson

May 2, 2014, 10:59:38 AM
to graphchi-discuss
Hi James, 
In my experience, one-class problems are not solved well by matrix factorization methods. If you have additional features, I suggest taking a look at GraphChi's gensgd.
Sparse MF methods are typically used when you want to cluster the results, i.e. to find similar users or items.

I tried looking at the paper you sent but the link is broken. Please resend.

  Danny Bickson
Co-Founder
US phone: 206-691-8266
Israeli phone: 073-7312889
 




James Jensen

May 5, 2014, 7:54:25 PM
to graphchi...@googlegroups.com
Hi, Danny,

Thanks for the quick reply.

I didn't know that one-class problems aren't solved well with matrix factorization; I am relatively new to collaborative filtering methods. I'm glad you told me. What classes of methods are more successful for one-class problems?

I have seen your blog posts about gensgd, and it looks promising. I do have additional features I could include. I'd like to understand more about how the method works. Could you point me to any additional info on it?

The reason I hoped to use the sparse MF methods is that they facilitate extracting meaning from the model itself, rather than it being only a black box that gives good predictions. As you know, with sparse methods a limited number of samples and/or features is associated with each latent factor, and so one can get further in associating the latent factors with some known entity/process/etc. I was looking at doing something similar to the cancer example toward the end of the paper on sparse latent semantic analysis that is referenced in connection with GraphChi's sparse ALS implementation.

I don't want to take your time with too much information about my particular problem, but here is some, in case it is relevant:

I'm working with mutation data. My prediction question is: given a mutation profile from a tumor, what other mutations would such a tumor be most likely to acquire? Or, perhaps more importantly, which mutations would it be least likely to acquire (some of these may be potential therapeutic targets)? On the model-interpretation side, the latent factors could be seen as cellular pathways or processes; one could use the sparse list of genes associated with each factor to try to relate it to known pathways, and use the sparse list of samples to see in which subsets of patients each factor is operative.

My reason for wanting a weighted method: the data is binary (0=not mutated, 1=mutated). Not all mutations are of equal functional significance. Some are "drivers" to which the model should be fit, and some are "passengers" to which it shouldn't, with a whole spectrum in between. And conversely, non-mutations carry varying degrees of information; not all of them should be treated simply as missing values, but not all should be fit to as explicit zero values either. There are decent methods for assigning a likelihood of a mutation being a driver, and I thought perhaps I could use these as weights a la weighted ALS. Perhaps the features that are used to identify drivers could be fed into gensgd directly, rather than performing a separate weighting step and a factorization step... I'm not sure.
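The weighting scheme sketched above might look something like the following. This is a hypothetical mapping, not an established method: driver likelihoods become confidences on the 1-entries, and `w_zero` is an assumed small background confidence so that non-mutations count as weak evidence rather than as missing values or hard zeros:

```python
import numpy as np

def mutation_weights(M, driver_prob, w_zero=0.1):
    """Build a confidence-weight matrix for binary mutation data.

    M           : (samples, genes) 0/1 mutation matrix
    driver_prob : (samples, genes) estimated P(driver) for each mutation
    w_zero      : uniform low confidence assigned to non-mutated entries
    Returns a weight matrix suitable for weighted ALS: likely drivers
    get high confidence, likely passengers low, and absent mutations
    a small nonzero confidence.
    """
    return np.where(M == 1, driver_prob, w_zero)
```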

Thanks again for your help,

James

P.S. Here is the weighted sparse NMF paper I mentioned, though on second reading I am less confident that I should bring it to your attention.

Danny Bickson

May 6, 2014, 2:43:15 AM
to graphchi-discuss
Hi James, 
Thanks for the detailed reply. Some comments below.

  Danny Bickson
Co-Founder
US phone: 206-691-8266
Israeli phone: 073-7312889
 



On Tue, May 6, 2014 at 2:54 AM, James Jensen <jdje...@eng.ucsd.edu> wrote:
> Hi, Danny,
>
> Thanks for the quick reply.
>
> I didn't know that one-class problems aren't solved well with matrix factorization; I am relatively new to collaborative filtering methods. I'm glad you told me. What classes of methods are more successful for one-class problems?
This is, of course, my personal take; others may disagree.

> I have seen your blog posts about gensgd, and it looks promising. I do have additional features I could include. I'd like to understand more about how the method works. Could you point me to any additional info on it?
The method is described in the following paper: http://www.inf.uni-konstanz.de/~rendle/pdf/Rendle2010FM.pdf
Factorization Machines by Steffen Rendle. In GraphChi we implement a subset of the described technique.
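In Rendle's model, a prediction is y(x) = w0 + sum_i w_i x_i + sum_{i<j} <v_i, v_j> x_i x_j, and the paper shows the pairwise term can be computed in O(k*n) rather than O(k*n^2). A minimal sketch of that prediction rule (illustrative only, not GraphChi's implementation):

```python
import numpy as np

def fm_predict(x, w0, w, V):
    """Factorization Machine prediction (Rendle 2010).

    Uses the identity
      sum_{i<j} <v_i, v_j> x_i x_j
        = 0.5 * sum_f [ (sum_i v_{if} x_i)^2 - sum_i v_{if}^2 x_i^2 ]
    so the pairwise term costs O(k*n).

    x : (n,) feature vector, w0 : bias, w : (n,) linear weights,
    V : (n, k) factor matrix (one k-vector per feature).
    """
    linear = w0 + w @ x
    s = V.T @ x                    # (k,) sums of v_{if} x_i
    s2 = (V ** 2).T @ (x ** 2)     # (k,) sums of v_{if}^2 x_i^2
    return float(linear + 0.5 * np.sum(s * s - s2))
```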
 

> The reason I hoped to use the sparse MF methods is that they facilitate extracting meaning from the model itself, rather than it being only a black box that gives good predictions. As you know, with sparse methods a limited number of samples and/or features is associated with each latent factor, and so one can get further in associating the latent factors with some known entity/process/etc. I was looking at doing something similar to the cancer example toward the end of the paper on sparse latent semantic analysis that is referenced in connection with GraphChi's sparse ALS implementation.
My initial reaction is that it is very hard to know in advance which method will work best; I suggest trying some of the methods we have implemented and seeing which one gives the best output on your data.

> I don't want to take your time with too much info about my particular problem, but here is some if it is relevant:
>
> I'm working with mutation data. My prediction question is, given a mutation profile from a tumor, what other mutations would such a tumor be most likely to acquire? Or perhaps more important, what mutations would they be least likely to acquire (some of these may be potential therapeutic targets). On the model-interpretation side, the latent factors could be seen as cellular pathways or processes; one could use the sparse list of genes associated with each to try to relate it to known pathways, and use the sparse list of samples to see in which subsets of patients each factor is operative.
>
> My reason for wanting a weighted method: the data is binary (0=not mutated, 1=mutated). Not all mutations are of equal functional significance. Some are "drivers" to which the model should be fit, and some are "passengers" to which it shouldn't, with a whole spectrum in between. And conversely, non-mutations carry varying degrees of information; not all of them should be treated simply as missing values, but not all should be fit to as explicit zero values either. There are decent methods for assigning a likelihood of a mutation being a driver, and I thought perhaps I could use these as weights a la weighted ALS. Perhaps the features that are used to identify drivers could be fed into gensgd directly, rather than performing a separate weighting step and a factorization step... I'm not sure.
If you like, send me a personal email and we can set up a phone call to discuss your problem and offer some advice.