What's the right dense FM data format (example)


Andres R

Nov 13, 2019, 7:59:56 AM
to libFM - Factorization Machines
My question: is the attached dataset a correct example of a dense matrix for use in factorization machines?

The columns represent the following: Users and Movies are simply indicator (dummy) variables. Other movies viewed marks the other movies the user viewed; these are normalized by dividing the indicator variable (1) by the total number of movies the user viewed. Day of month is the day within the month, so it ranges from 1 to 31. Last movie viewed shows the last film the user viewed before time t. All of this is standard, as in the Rendle (2010) paper (https://www.csie.ntu.edu.tw/~b97053/paper/Rendle2010FM.pdf).
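To make the layout concrete, here is a minimal sketch of how one dense row could be assembled from the blocks described above. The user names, movie codes, and the dense_row helper are hypothetical, and the sketch assumes the "other movies viewed" block covers every movie the user viewed, so that the block sums to 1.

```python
# Hypothetical toy data: the movies each user has viewed.
views = {
    "alice": ["TI", "NH", "SW"],
    "bob": ["SW", "ST"],
}
users = sorted(views)                                        # ['alice', 'bob']
movies = sorted({m for seen in views.values() for m in seen})  # ['NH', 'ST', 'SW', 'TI']


def dense_row(user, movie, day_of_month, last_movie):
    """Build one dense feature row for a single (user, movie) event."""
    # One indicator column per user and per movie.
    user_block = [1.0 if u == user else 0.0 for u in users]
    movie_block = [1.0 if m == movie else 0.0 for m in movies]
    # Indicator divided by the number of movies the user viewed
    # (assumption: the block includes all of the user's movies).
    seen = views[user]
    other_block = [1.0 / len(seen) if m in seen else 0.0 for m in movies]
    # One indicator column per possible "last movie viewed".
    last_block = [1.0 if m == last_movie else 0.0 for m in movies]
    return user_block + movie_block + other_block + [float(day_of_month)] + last_block


row = dense_row("alice", "NH", day_of_month=13, last_movie="TI")
```

With 2 users and 4 movies this gives 2 + 4 + 4 + 1 + 4 = 15 columns per row, most of them zero.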

Where I get confused is when it comes to adding other information about the movies and users. For example, I want to add the number of actresses and actors per movie. Since it is not a category, can I just add it as a column and standardize it to zero mean and unit variance (N(0,1))? Alternatively, I could turn it into dummy variables, because it has finite cardinality. What's best practice in this case?
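The two encodings being weighed here can be sketched side by side. The cast sizes below are made-up values and the variable names are hypothetical. Note that standardizing only rescales the column to zero mean and unit variance; it does not make the values normally distributed.

```python
from statistics import mean, pstdev

# Hypothetical cast sizes (actresses + actors) per movie.
cast_size = {"TI": 12, "NH": 8, "SW": 20, "ST": 8}

# Option 1: a single real-valued column, standardized (z-score).
mu = mean(cast_size.values())
sigma = pstdev(cast_size.values())
cast_z = {m: (n - mu) / sigma for m, n in cast_size.items()}

# Option 2: one indicator column per distinct cast size.
levels = sorted(set(cast_size.values()))
cast_onehot = {
    m: [1.0 if n == lvl else 0.0 for lvl in levels]
    for m, n in cast_size.items()
}
```

Option 1 adds one column in which the ordering of the counts is preserved; option 2 adds one column per distinct count and treats each count as an unrelated category.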

Further, I want to add a Movie genre variable. I could treat it as a categorical variable and apply pandas get_dummies to it in Python. I haven't done this in the attached dataset, but it seems the most natural choice. What's best practice?
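For reference, a minimal sketch of the get_dummies step, assuming pandas is available. The toy DataFrame and its column names are hypothetical and are not taken from the attached file.

```python
import pandas as pd

# Hypothetical toy data; column names are assumptions for illustration.
df = pd.DataFrame({
    "user": ["A", "A", "B"],
    "movie": ["TI", "NH", "SW"],
    "genre": ["drama", "comedy", "action"],
})

# One indicator column per genre, stored as floats like the rest of the matrix.
genre_dummies = pd.get_dummies(df["genre"], prefix="genre", dtype=float)

# Replace the raw genre column with its indicator columns.
dense = pd.concat([df.drop(columns="genre"), genre_dummies], axis=1)
```

Each row then has exactly one non-zero genre column, in the same way as the user and movie indicator blocks.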

Lastly, I have a user-specific variable that measures the number of minutes a user spent watching a certain genre over the last 7 days (Number of minutes user watched genre in last 7 days). I would again standardize this variable (N(0,1)). It is not a category, as it has high cardinality - can I just keep it as a separate column?

Thanks
Factorization Machine Data.csv