Some noob-level questions here regarding the TensorFlow Recommenders (TFRS) framework. Any insight appreciated.
- How can one specify the number of recommendations to generate per user? The default appears to be 10; where can this be overridden?
- How does the caller retrieve the generated recommendations along with their respective scores/ratings?
_, titles = index(tf.constant(["42"]))
print(f"Recommendations for user 42: {titles[0, :3]}")
This retrieves the movie titles in the tutorial, but not the rating values.
- The generation of the embedding values, e.g. for user IDs: must these be contiguous integers, or can I reuse my own ID values? E.g. I have two users, user 1 with ID=123 and user 2 with ID=998. Can I use 123 and 998 directly, or must I map these IDs to 1 and 2 first? (I assume it's the latter, but please clarify.)
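My working assumption on the ID question is that a Keras lookup layer (StringLookup/IntegerLookup) does the raw-ID-to-contiguous-index mapping for you, so 123 and 998 can be fed in as-is. Sketch:

```python
import tensorflow as tf

# Raw, non-contiguous user IDs go through an IntegerLookup, which maps them
# to contiguous indices (index 0 is reserved for out-of-vocabulary values).
lookup = tf.keras.layers.IntegerLookup(vocabulary=[123, 998])
embedding = tf.keras.layers.Embedding(
    input_dim=lookup.vocabulary_size(), output_dim=32)

indices = lookup(tf.constant([123, 998]))  # contiguous indices
vectors = embedding(indices)               # shape (2, 32)
```

Is this the intended pattern, or is there a preferred TFRS-specific way?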
- Is there a way to instruct the recommender code not to include items from the user's history in the generated recommendations? The idea is to avoid having to do my own post-filtering.
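In case there is no built-in flag, my fallback plan would be to over-fetch (say k plus the history length) and post-filter. Roughly like this (plain-Python sketch; the function and its names are mine):

```python
def filter_seen(scores, titles, seen, k):
    """Keep the top-k (score, title) pairs whose title the user hasn't seen.

    scores/titles: parallel sequences from the retrieval index, best first.
    seen: the user's history, as a set of titles.
    """
    kept = [(s, t) for s, t in zip(scores, titles) if t not in seen]
    return kept[:k]

top = filter_seen([0.9, 0.8, 0.7, 0.6], ["a", "b", "c", "d"], {"b"}, 2)
# -> [(0.9, "a"), (0.7, "c")]
```

Is there something in the library that makes this unnecessary?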
- Is there a way to instruct the recommender code not to include multiples of the same item in the generated recommendations? A sentence in the tutorial led me to believe that duplicates might be present (please clarify).
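If duplicates can indeed show up, I assume a small stable dedupe pass over the results would cover it (again a plain-Python sketch):

```python
def dedupe(scores, titles):
    """Drop repeated titles, keeping the first (highest-scored) occurrence."""
    seen, kept = set(), []
    for s, t in zip(scores, titles):
        if t not in seen:
            seen.add(t)
            kept.append((s, t))
    return kept

result = dedupe([0.9, 0.8, 0.8], ["a", "a", "b"])
# -> [(0.9, "a"), (0.8, "b")]
```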
- During featurization/tokenization, have folks worked out the I18N aspects? Say I have text features in English and Spanish: are both an English tokenizer and a Spanish tokenizer available? How well would they work with short strings? Is there a language identifier to wire in?
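For context, what I've found so far: core Keras ships a TextVectorization layer that lower-cases, strips punctuation, and splits on whitespace. That is crude but language-agnostic, and as far as I can tell there is no built-in language identifier:

```python
import tensorflow as tf

# Whitespace tokenization handles English and Spanish text alike, but it is
# not language-aware (no stemming, no language detection).
vectorizer = tf.keras.layers.TextVectorization(max_tokens=10_000)
vectorizer.adapt(["great movie", "película estupenda"])
tokens = vectorizer(tf.constant(["great película"]))  # shape (1, 2)
```

Is there a recommended language-aware tokenizer to plug in instead?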
- What is a 'good' range of values for the top-100 accuracy? Should it be close to 1? How close?
- How to bring RMSE down? When I run some of the examples, the RMSE tends to be > 1. Is there an optimal number of features to use, perhaps? Any suggestions for tuning the hyperparameters to keep the RMSE below 1?
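On RMSE, the knobs I'd personally expect to matter (my assumptions, not official guidance) are the embedding dimension, regularization, the learning rate, and the depth of the rating head, e.g.:

```python
import tensorflow as tf

# Hypothetical tuning knobs for a ranking model's rating head.
embedding_dim = 64  # larger can lower RMSE, at the cost of overfitting
user_embedding = tf.keras.layers.Embedding(
    input_dim=1_000, output_dim=embedding_dim,
    embeddings_regularizer=tf.keras.regularizers.l2(1e-5))
rating_head = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1),  # predicted rating
])
optimizer = tf.keras.optimizers.Adagrad(learning_rate=0.05)

prediction = rating_head(user_embedding(tf.constant([7])))  # shape (1, 1)
```

Are any of these known to dominate in practice?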
- How to scale/distribute the processing? If I have several million users, several million items, 3-4 user features, and 5-7 item features, what would be some approaches to scaling this? I'm looking at this writeup: https://towardsdatascience.com/scaling-up-with-distributed-tensorflow-on-spark-afc3655d8f95. Are there any examples of coding up a TFRS recommender that could run on Spark, or some other way to distribute? TensorFlowOnSpark seems like one way to go...
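Before reaching for Spark, one option I'm also weighing is TensorFlow's own tf.distribute strategies (MirroredStrategy on one multi-GPU host, MultiWorkerMirroredStrategy or ParameterServerStrategy across hosts). A minimal sketch with a dummy model standing in for the TFRS model:

```python
import tensorflow as tf

# Any Keras/TFRS model built inside strategy.scope() gets distributed
# variables; on a single CPU this degrades gracefully to one replica.
strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
    model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
    model.compile(optimizer="sgd", loss="mse")

print("replicas:", strategy.num_replicas_in_sync)
```

Would this be expected to hold up at the several-million-user scale, or is Spark-side data parallelism still needed for the input pipeline?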
To view this discussion on the web visit
https://groups.google.com/a/tensorflow.org/d/msgid/discuss/cf05c7ed-fb5b-4828-a9f9-c59a702c2e33n%40tensorflow.org.