fastText aligned word vectors for translating homographs


Kingstar

Mar 25, 2020, 7:49:21 AM
to fastText library

A homograph is a word that shares the same written form as another word but has a different meaning, like "right" in the sentences below:

  • Success is about making the right decisions.
  • Turn right after the traffic light.

In the first case, the English word "right" translates to Swedish as "rätt"; in the second, as "höger". Choosing the correct translation requires looking at the context (the surrounding words).

Question 1. Can fastText aligned word embeddings help translate such homographs, or more generally words with several possible translations, into another language?
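To make Question 1 concrete, here is a minimal sketch of one way the idea could be tried: average the vectors of the surrounding English words and pick the Swedish candidate closest to that average in the shared (aligned) space. This is only an assumption about how aligned vectors might be used, not something fastText provides out of the box. The function names `cosine` and `pick_translation` are hypothetical, and the 2-dimensional vectors are made-up toy values standing in for real aligned English and Swedish vectors.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def pick_translation(context_words, candidates, src_vectors, tgt_vectors):
    """Return the candidate whose target-language vector is closest to the
    mean of the source-language context vectors, or None if no context word is known."""
    known = [src_vectors[w] for w in context_words if w in src_vectors]
    if not known:
        return None
    dim = len(known[0])
    mean = [sum(vec[i] for vec in known) / len(known) for i in range(dim)]
    return max(candidates, key=lambda c: cosine(mean, tgt_vectors[c]))

# Toy stand-ins for aligned vectors (made-up values, NOT real fastText data).
en = {
    "making": [0.9, 0.2], "decisions": [1.0, 0.1],
    "turn": [0.1, 1.0], "traffic": [0.2, 0.9], "light": [0.3, 0.8],
}
sv = {"rätt": [1.0, 0.0], "höger": [0.0, 1.0]}

sentence_1 = ["success", "is", "about", "making", "the", "decisions"]
sentence_2 = ["turn", "after", "the", "traffic", "light"]

print(pick_translation(sentence_1, ["rätt", "höger"], en, sv))
print(pick_translation(sentence_2, ["rätt", "höger"], en, sv))
```

With real data, `en` and `sv` would be filled from wiki.en.align.vec and wiki.sv.align.vec. Whether a plain average of context vectors separates the two senses well enough in practice is exactly what I am unsure about.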

Question 2. I loaded both the English pre-trained vectors and the English aligned vectors. Both were trained on Wikipedia articles, and I noticed that the distances between pairs of words were roughly preserved, but the sizes of the two files (wiki.en.vec vs wiki.en.align.vec) differ noticeably (1GB). Would it make sense to use only the aligned version? What information, if any, is not captured by the aligned file?
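For reference, this is roughly the kind of comparison behind Question 2, as a minimal sketch. It assumes the text .vec layout (a header line with the word count and dimension, then one word and its values per line). The helper names `load_vec` and `cosine` are hypothetical, and the two tiny in-memory "files" are made-up toy data in which the second is a rotated copy of the first with one word missing. A smaller vocabulary is only one possible reason for a size difference; I have not verified that it is the actual one.

```python
import io
import math

def load_vec(lines, limit=None):
    """Parse the text .vec layout: header 'count dim', then 'word v1 ... vd' per line."""
    it = iter(lines)
    _count, dim = map(int, next(it).split())
    vectors = {}
    for i, line in enumerate(it):
        if limit is not None and i >= limit:
            break
        parts = line.rstrip().split(" ")
        if len(parts) < dim + 1:
            continue  # skip malformed lines
        # Take the last `dim` fields as numbers so tokens containing spaces still parse.
        word = " ".join(parts[:-dim])
        vectors[word] = [float(x) for x in parts[-dim:]]
    return vectors

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

# Toy data (NOT real fastText vectors): `aligned` is `plain` rotated by 90 degrees,
# with the word "correct" left out.
plain = load_vec(io.StringIO("3 2\nright 1.0 0.0\nleft 0.8 0.6\ncorrect 0.0 1.0\n"))
aligned = load_vec(io.StringIO("2 2\nright 0.0 1.0\nleft -0.6 0.8\n"))

# A rotation changes the coordinates but not the angle between two words.
print(cosine(plain["right"], plain["left"]))
print(cosine(aligned["right"], aligned["left"]))

# Words present in one file but missing from the other.
print(sorted(set(plain) - set(aligned)))
```

On the real files, the same functions can be fed an open file object, for example `load_vec(open("wiki.en.vec", encoding="utf-8"), limit=200000)`, where `limit` keeps memory use manageable.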
