Tokenization + Char Ngrams

Hossam Amer

Oct 3, 2023, 6:43:16 PM
to fastText library
(1) What subword tokenization is applied to English input in fastText?
I understand that you might refer me to Europarl. However, I cannot find where this preprocessing script is located.
(2) How can I find out the value of n used for the character n-grams in the pre-trained model?
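To make question (2) concrete, here is a minimal sketch of what I understand by character n-grams, using the `<` and `>` word-boundary markers described in the fastText paper. The function name `char_ngrams` is hypothetical and this is a simplified illustration, not the library's actual code. The parameters `minn` and `maxn` mirror the names of fastText's `-minn` and `-maxn` training options, and their values for the pre-trained model are what I am trying to find.

```python
def char_ngrams(word, minn, maxn):
    """Return the character n-grams of `word` for every n in [minn, maxn].

    Simplified sketch: the word is wrapped in '<' and '>' boundary
    markers, then every substring of length n is collected.
    """
    wrapped = f"<{word}>"
    grams = []
    for n in range(minn, maxn + 1):
        # Slide a window of width n across the wrapped word.
        for i in range(len(wrapped) - n + 1):
            grams.append(wrapped[i:i + n])
    return grams


# With n fixed at 3, "where" yields five trigrams.
print(char_ngrams("where", 3, 3))
# ['<wh', 'whe', 'her', 'ere', 're>']
```

So with n = 3 the word "where" would be represented by these five n-grams (plus the whole word), and a different n would change which subwords the model knows about.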