Some noob-level questions here regarding the TensorFlow Recommenders (TFRS) framework. Any insight appreciated.
- How can one specify the number of recommendations to generate per user? The default appears to be 10; where can this be overridden?
- How does the caller retrieve the generated recommendations along with their respective scores/ratings?
_, titles = index(tf.constant(["42"]))
print(f"Recommendations for user 42: {titles[0, :3]}")
This retrieves the movie titles in the tutorial, but not the rating values.
- The generation of the embedding values, e.g. for user IDs: must these be contiguous integers, or can I reuse my own ID values? E.g. I have two users, user 1 with ID=123 and user 2 with ID=998. Can I use 123 and 998 directly, or must I map these IDs to 1 and 2 first? (I assume it's the latter, but please clarify.)
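My working assumption on the ID question is that a Keras lookup layer (StringLookup/IntegerLookup) does the raw-ID-to-contiguous-index mapping for you, so 123 and 998 can be fed in as-is. Sketch:

```python
import tensorflow as tf

# Raw, non-contiguous user IDs go through an IntegerLookup, which maps them
# to contiguous indices (index 0 is reserved for out-of-vocabulary values).
lookup = tf.keras.layers.IntegerLookup(vocabulary=[123, 998])
embedding = tf.keras.layers.Embedding(
    input_dim=lookup.vocabulary_size(), output_dim=32)

indices = lookup(tf.constant([123, 998]))  # contiguous indices
vectors = embedding(indices)               # shape (2, 32)
```

Is this the intended pattern, or is there a preferred TFRS-specific way?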
- Is there a way to instruct the recommender code not to include items from the user's history in the generated recommendations? The idea is to avoid having to do my own post-filtering.
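In case there is no built-in flag, my fallback plan would be to over-fetch (say k plus the history length) and post-filter. Roughly like this (plain-Python sketch; the function and its names are mine):

```python
def filter_seen(scores, titles, seen, k):
    """Keep the top-k (score, title) pairs whose title the user hasn't seen.

    scores/titles: parallel sequences from the retrieval index, best first.
    seen: the user's history, as a set of titles.
    """
    kept = [(s, t) for s, t in zip(scores, titles) if t not in seen]
    return kept[:k]

top = filter_seen([0.9, 0.8, 0.7, 0.6], ["a", "b", "c", "d"], {"b"}, 2)
# -> [(0.9, "a"), (0.7, "c")]
```

Is there something in the library that makes this unnecessary?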
- Is there a way to instruct the recommender code not to include multiples of the same item in the generated recommendations? A sentence in the tutorial led me to believe that duplicates might be present (please clarify).
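If duplicates can indeed show up, I assume a small stable dedupe pass over the results would cover it (again a plain-Python sketch):

```python
def dedupe(scores, titles):
    """Drop repeated titles, keeping the first (highest-scored) occurrence."""
    seen, kept = set(), []
    for s, t in zip(scores, titles):
        if t not in seen:
            seen.add(t)
            kept.append((s, t))
    return kept

result = dedupe([0.9, 0.8, 0.8], ["a", "a", "b"])
# -> [(0.9, "a"), (0.8, "b")]
```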
- During featurization/tokenization, have folks worked out the I18N aspects? Say I have text features in English and Spanish: are both an English tokenizer and a Spanish tokenizer available? How well would they work with short strings? Is there a language identifier to wire in?
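For context, what I've found so far: core Keras ships a TextVectorization layer that lower-cases, strips punctuation, and splits on whitespace. That is crude but language-agnostic, and as far as I can tell there is no built-in language identifier:

```python
import tensorflow as tf

# Whitespace tokenization handles English and Spanish text alike, but it is
# not language-aware (no stemming, no language detection).
vectorizer = tf.keras.layers.TextVectorization(max_tokens=10_000)
vectorizer.adapt(["great movie", "película estupenda"])
tokens = vectorizer(tf.constant(["great película"]))  # shape (1, 2)
```

Is there a recommended language-aware tokenizer to plug in instead?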
- What is a 'good' range of values for the top-100 accuracy? Should it be close to 1? How close?
- How to bring RMSE down? When I run some of the examples, the RMSE tends to be > 1. Is there an optimal number of features to use, perhaps? Any suggestions for tuning the hyperparameters to keep the RMSE below 1?
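On RMSE, the knobs I'd personally expect to matter (my assumptions, not official guidance) are the embedding dimension, regularization, the learning rate, and the depth of the rating head, e.g.:

```python
import tensorflow as tf

# Hypothetical tuning knobs for a ranking model's rating head.
embedding_dim = 64  # larger can lower RMSE, at the cost of overfitting
user_embedding = tf.keras.layers.Embedding(
    input_dim=1_000, output_dim=embedding_dim,
    embeddings_regularizer=tf.keras.regularizers.l2(1e-5))
rating_head = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1),  # predicted rating
])
optimizer = tf.keras.optimizers.Adagrad(learning_rate=0.05)

prediction = rating_head(user_embedding(tf.constant([7])))  # shape (1, 1)
```

Are any of these known to dominate in practice?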
- How to scale/distribute the processing? If I have several million users, several million items, 3-4 user features, and 5-7 item features, what would be some approaches to scaling this? I'm looking at this writeup: https://towardsdatascience.com/scaling-up-with-distributed-tensorflow-on-spark-afc3655d8f95. Are there any examples of coding up a TFRS recommender that could run on Spark, or some other way to distribute? TensorFlowOnSpark seems like one way to go...
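Before reaching for Spark, one option I'm also weighing is TensorFlow's own tf.distribute strategies (MirroredStrategy on one multi-GPU host, MultiWorkerMirroredStrategy or ParameterServerStrategy across hosts). A minimal sketch with a dummy model standing in for the TFRS model:

```python
import tensorflow as tf

# Any Keras/TFRS model built inside strategy.scope() gets distributed
# variables; on a single CPU this degrades gracefully to one replica.
strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
    model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
    model.compile(optimizer="sgd", loss="mse")

print("replicas:", strategy.num_replicas_in_sync)
```

Would this be expected to hold up at the several-million-user scale, or is Spark-side data parallelism still needed for the input pipeline?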
To view this discussion on the web visit
https://groups.google.com/a/tensorflow.org/d/msgid/discuss/cf05c7ed-fb5b-4828-a9f9-c59a702c2e33n%40tensorflow.org.