word2vec, similar words to orthogonal vectors


Rostyslav

Jul 8, 2021, 5:31:55 PM
to Gensim
Hello,

I have a question regarding the word2vec. As far as I understand, word2vec maps words to vectors, such that similar words are heading in similar direction in the vector space. 
What I would like to do is exactly the opposite, namely the similar words should be as orthogonal as possible in the vector space. Is it possible to implement? If yes, how?

Thank you!

Gordon Mohr

Jul 8, 2021, 11:34:20 PM
to Gensim
Yes, but it may not be as meaningful as you're hoping - vectors that are either 'orthogonal' to, or even 'opposite' from, a given vector may not necessarily match human intuitions of what those terms should mean. 

For strict orthogonality, you could supply a `topn` of `None` to `.most_similar()` - which then returns the raw similarities to *all* other words, in the vector set's internal (slot) ordering (rather than in descending similarity with words attached). The values closest to `0.0` will indicate the positions of the words with the vectors most orthogonal to the provided word (or raw vector). 

(Or equivalently, a `topn` equal to the total number of words returns normal ranked results for all words. Re-sort these not by descending similarity but by ascending absolute-value of similarity, and the top of those results will indicate the same 'most orthogonal' words as the previous method.)

Similarly, since `.most_similar()` can take a raw vector, such as the negation of any actual vector, you could get the 'most-similar-to-the-opposite-direction' with something like:

    neg_vec = -vec_model[target_word]  # negate the word's vector
    most_opposite = vec_model.most_similar(positive=[neg_vec])

But note:

* 'opposite' in coordinate space is unlikely to indicate 'opposite' in human intuition - indeed, strict antonyms along certain dimensions of meaning tend to be positioned quite similarly overall, as they refer to the same domains-of-use and share many neighboring words

* usual ways of training word-vectors, such as with negative-sampling and more than 1 negative-sample per positive example, tend to make the 'cloud' of word-vectors a bit lopsided with respect to the origin point. See the mention of the 'All But The Top' paper in this prior message – https://groups.google.com/g/gensim/c/o8cDWyihuKc/m/hruB7QLHHwAJ – for more discussion of this effect. 

- Gordon

Rostyslav

Jul 9, 2021, 11:20:06 AM
to Gensim
Gordon, thank you for the thorough answer! However, I think I did not express myself very well, since I am aiming for a slightly different result. Let me explain:
when training the model

    model = Word2Vec(walks, vector_size=args.dimensions, window=args.window_size,
         min_count=0, sg=1, workers=args.workers, epochs=args.iter)

I get the embeddings, which I can access via 

    model.wv.save_word2vec_format(args.output)

These embeddings have the following interpretation: if the words (given by walks, in my case) are similar, then they are embedded closer together. But I would like to do exactly the opposite: the more similar the words are, the farther apart they should be placed in the embedded space. 

Is this possible? 

Thank you very much!

Rostyslav

Rostyslav

Jul 12, 2021, 3:11:04 AM
to Gensim
I would like to update the thread, because I am afraid that the additional question I asked on the 9th of July was not seen. 
Thank you!

Rostyslav


Gordon Mohr

Jul 12, 2021, 8:30:09 AM
to Gensim
I can't quite yet imagine how that would be interesting, so it's hard to think about potential ways to achieve it. 

In particular, the usual 'similarity as nearness' has some level of transitivity - if A is a lot like B, and B is a lot like C, then A is going to be at least somewhat like C. The very process of word2vec training essentially achieves compression - representing a large number of words in a smaller number of dimensions - by nudging words with similar neighbors nearer each other, so that the same shared model weights can participate, incrementally, in many words' predictions. 

If instead words that are 'similar' (shared neighbor words) are to be maximally *different*, in coordinates, I'm not sure that maps to any trainable/characterizable space. 

On the other hand, there's a naive way to change the *similarity* calculation: just flip the sign on the standard similarity-calc results. Now, a prior similarity of `0.9` (very similar) becomes `-0.9` (almost maximally dissimilar). If that sort of flip doesn't meet your need, a better description of what you're trying to achieve, with a different coordinate system, might help me think of things that might be appropriate.

- Gordon

Rostyslav

Jul 12, 2021, 9:03:40 AM
to Gensim
Generally speaking, I want to use word2vec to solve the graph coloring problem: I want to have the fewest conflicting nodes (connected nodes that belong to the same class).

In general, I want to do representation learning (to learn features of the nodes) such that I can use those features to classify connected nodes into (hopefully) different classes.
The approach is taken from the node2vec paper and in short can be described as follows: some node A is characterized by the nodes that are visited during a random walk starting from node A. This random walk gives me a sequence of nodes. I feed this sequence to word2vec, and as a result I get an embedding of node A. This process is repeated for all of the nodes in the graph. 

In the obtained embeddings of the graph, similar nodes are placed close together. And here is the problem: the nodes are more likely to be placed close together if they are connected in the graph, which in turn implies that if I try to classify them, they are very likely to belong to the same class. And this is the opposite of what I am aiming for. 

Therefore, I had the idea of embedding dissimilar nodes close together. This would mean that during classification, dissimilar nodes (nodes that are less likely to be connected in the graph) obtain the same class. 

So do you know if I can still achieve it via word2vec? Or should I use some different approach?

Thank you!

Rostyslav


Gordon Mohr

Jul 12, 2021, 2:09:52 PM
to Gensim
I'm familiar with the node2vec idea of creating pseudotexts, where nodes are like words, from random walks of the graph. 

But, I'm not really up on graph-coloring algorithms. Still, I suspect that if what you truly need done is coloring a graph, some off-the-shelf algorithm will do really well. And I have no intuition that the fuzzier idea of 'neighborhoods', which could be bootstrapped from random-walks or from word2vec trained on random-walks, would significantly help those algorithms. 

I'd not expect to push forward the frontier of graph-coloring possibilities based on fairly-computationally-expensive word2vec-like plotting of nodes. Nor have I noticed any papers/projects similarly trying to reverse the usual property of word2vec, wherein used-alike words are coordinates-close. 

- Gordon