Universal Sentence Encoder and non-English languages


Philippe Da Silva

May 12, 2021, 11:52:46 AM
to Discuss
Hi,

I'm totally new to TensorFlow and ML in general, but I'm digging into them for a small prototype project I'd like to work on.
The basic idea is to give email users a way to classify their new emails into the four quadrants of the Eisenhower matrix (Urgent and Important, Urgent but Not Important, Not Urgent but Important, and Not Urgent and Not Important).

Based on the literature I've managed to read so far, I believe I should be able to achieve this using TensorFlow.js and the Universal Sentence Encoder (USE).
However, it's unclear to me whether USE can be used for languages other than English. I couldn't find anything stating that this is a constraint, but I saw there are additional modules, such as the Multilingual Universal Sentence Encoder, which seem to address cross-language use cases...

I'd appreciate it if someone familiar with these could take some time to explain whether I'm on the right track or not, and why, and possibly point me to additional literature.

Thanks in advance,

Philippe

Eduardo A. Flores Verduzco

May 13, 2021, 10:35:03 AM
to Discuss, philippe...@indiefreaks.com
Hi Philippe! I've used the multilingual versions of USE for semantic similarity tasks across English and Spanish with pretty good success. An important part would be fine-tuning for your specific task, which, I guess, would be classification. There are several multilingual USE versions, but you may want to start with https://tfhub.dev/google/universal-sentence-encoder-multilingual/3. I've found that, in general, it is fast and gives reliable results for classification and STS (semantic textual similarity) tasks. For classification, I would recommend looking at Neural Network Language Model (NNLM) examples and then substituting NNLM with USEM (multilingual USE) for multilanguage tasks. I did a bit of benchmarking some time ago using NNLM and USEM. I haven't updated it with USE, but I can say that USE outperforms NNLM for that task. You can check it here: https://github.com/eduardofv/lang_model_eval

Hope this helps!
EF

Eduardo A. Flores Verduzco

May 13, 2021, 10:36:42 AM
to Discuss, Eduardo A. Flores Verduzco, philippe...@indiefreaks.com
Sorry, I forgot to add this link to an NNLM classification example: https://www.tensorflow.org/hub/tutorials/tf2_text_classification
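
For anyone who wants to try the classification idea before fine-tuning anything, here is a minimal sketch of one simple alternative: a nearest-centroid classifier over sentence embeddings. This is not the approach from the linked tutorial, only a quick baseline. The `toy_embed` function is a hypothetical stand-in that just counts a few keywords so the snippet runs without any model download; the labels and example emails are made up for illustration.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]
Embedder = Callable[[List[str]], List[Vector]]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def centroids(embed: Embedder, examples: Dict[str, List[str]]) -> Dict[str, List[float]]:
    """Average the embeddings of the labeled examples for each class."""
    result = {}
    for label, texts in examples.items():
        vectors = embed(texts)
        dim = len(vectors[0])
        result[label] = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    return result


def classify(embed: Embedder, class_centroids: Dict[str, List[float]], text: str) -> str:
    """Return the label whose centroid is most similar to the text's embedding."""
    vector = embed([text])[0]
    return max(class_centroids, key=lambda label: cosine(vector, class_centroids[label]))


def toy_embed(texts: List[str]) -> List[List[float]]:
    """Hypothetical stand-in for a real sentence encoder: counts a few keywords."""
    keywords = ["urgent", "urgente", "deadline", "newsletter", "boletín"]
    return [[float(t.lower().count(k)) for k in keywords] for t in texts]


# Made-up labeled examples mixing English and Spanish.
examples = {
    "urgent": ["Urgent: deadline today", "Respuesta urgente necesaria"],
    "not_urgent": ["Monthly newsletter", "Boletín semanal"],
}
cents = centroids(toy_embed, examples)
print(classify(toy_embed, cents, "Es urgente, por favor"))  # prints "urgent"
```

To use a real encoder, `toy_embed` would be replaced by a function that returns the model's embeddings as lists of floats. In Python, my understanding is that the multilingual USE model at the URL above can be loaded with `tensorflow_hub.load` once `tensorflow_text` has been imported, but treat that as an assumption and check the model page. The same scheme could use the four Eisenhower quadrants as labels, or two separate urgent/important classifiers.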

Philippe Da Silva

May 14, 2021, 4:57:32 AM
to Discuss, edua...@gmail.com, Philippe Da Silva
Thanks, Eduardo, for the hints. I started digging in and found some great starting points.

Best regards,

Philippe

Eduardo A. Flores Verduzco

May 14, 2021, 10:14:50 AM
to Philippe Da Silva, Discuss
Great! Feel free to ask if you need help!