My question: is the attached datasheet a correct example of a dense matrix for use in factorization machines?
The columns represent the following:
Users and Movies are simply indicator (dummy) variables.
Other movies viewed are the other movies the user viewed, normalized by dividing the indicator variable (1) by the total number of movies the user viewed.
Day of month represents the day of the month, so it ranges from 1 to 31.
Last movie viewed shows the last film the user viewed at time t. All of this is standard, as in the Rendle (2010) paper (https://www.csie.ntu.edu.tw/~b97053/paper/Rendle2010FM.pdf).
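To make the layout concrete, here is a minimal sketch of how I understand one row being assembled. The vocabularies, the helper name build_row, and the example values are made up for illustration and are not taken from the attached dataset. Counting the current movie among the viewed movies is also my assumption.

```python
import numpy as np

# Hypothetical toy vocabularies, for illustration only.
users = ["A", "B", "C"]
movies = ["TI", "NH", "SW", "ST"]

def build_row(user, movie, viewed, day_of_month, last_movie):
    """Concatenate the five feature blocks for one (user, movie) event."""
    u = np.zeros(len(users))
    u[users.index(user)] = 1.0            # one-hot user indicator

    m = np.zeros(len(movies))
    m[movies.index(movie)] = 1.0          # one-hot movie indicator

    other = np.zeros(len(movies))
    for v in viewed:                      # each viewed movie gets 1 / (number viewed)
        other[movies.index(v)] = 1.0 / len(viewed)

    day = np.array([float(day_of_month)])  # raw day of month, 1..31

    last = np.zeros(len(movies))
    if last_movie is not None:            # stays all zeros if there is no previous movie
        last[movies.index(last_movie)] = 1.0

    return np.concatenate([u, m, other, day, last])

row = build_row("A", "NH", viewed=["TI", "NH", "SW"], day_of_month=14, last_movie="TI")
```

With these toy vocabularies each row has 3 + 4 + 4 + 1 + 4 = 16 columns, and the "other movies viewed" block sums to 1.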
Where I get confused is when it comes to adding other information about the movies and users. For example, I want to add the number of actresses and actors per movie. Since it is not a category, can I just add the column and then standardize it to N(0,1)? Alternatively, I could turn it into dummy variables, because it has finite cardinality.
What's best practice in this case?
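The two options I am weighing would look roughly like this. This is only a sketch: the cast sizes are invented, and the variable names n_cast, z and one_hot are placeholders.

```python
import numpy as np

# Hypothetical cast sizes per movie, for illustration only.
n_cast = np.array([3.0, 5.0, 5.0, 12.0])

# Option 1: a single real-valued column, standardized to zero mean and
# unit variance (np.std uses the population standard deviation by default).
z = (n_cast - n_cast.mean()) / n_cast.std()

# Option 2: one indicator column per distinct value.
values = np.unique(n_cast)  # sorted distinct values
one_hot = (n_cast[:, None] == values[None, :]).astype(float)
```

Option 1 adds one column regardless of how many distinct values occur, while Option 2 adds one column per distinct value and treats 5 and 12 as unrelated levels.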
Further, I want to add a movie genre variable: I could make this a categorical variable and then apply get_dummies to it (Python). I haven't done this in the attached dataset, but it seems the most natural choice. What's best practice?
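For reference, this is roughly what I have in mind for the genre column, using pandas. The movie and genre values and the column names are invented for illustration.

```python
import pandas as pd

# Hypothetical movies and genre labels, for illustration only.
df = pd.DataFrame({
    "movie": ["TI", "NH", "SW"],
    "genre": ["Romance", "Comedy", "SciFi"],
})

# One indicator column per genre, stored as floats so the columns can sit
# next to the other real-valued features.
genre_dummies = pd.get_dummies(df["genre"], prefix="genre", dtype=float)

# Append the indicator columns to the existing frame.
X = pd.concat([df[["movie"]], genre_dummies], axis=1)
```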
Lastly, I have a user-specific variable that measures the number of minutes a user spent watching a certain genre over the last 7 days (Number of minutes user watched genre in last 7 days). I would again standardize this variable to N(0,1). It is not a category, as it has high cardinality. Can I just keep it as a separate column?
Thanks