Quasi-orthogonal, non-orthogonal, and similar


Adam Vandervorst

Sep 4, 2023, 6:46:12 PM
to VSACommunity
There are different binary relations two hypervectors can have, the most commonly discussed being quasi-orthogonal. This is quite long to spell out, and in code, you're more likely to be interested in non-quasi-orthogonal. Another relation hypervectors can have is being almost the same, barring some amount of noise. Note that the former is measured relative to the noise floor, and the latter is measured relative to the vector itself.

Using Ross Gayler's view of comparing two hypervectors: one is located at the north pole, the south pole is its inverse, and the equator is where you expect the other one to be if it's independent. In ASCII art, with some suggested terms:
self                                ~self
north pole       equator       south pole
|                   |                   |
                 |=====|                    orthogonal-ish, difference normal
|==============|         |==============|   related, difference special
|==|                                        similar, difference is a little noise

This diagram could be extended to include "anti-similar" or a one-sided version of related (e.g., more ones than usual, in the boolean setting).
In practice, these relational properties are parametrized by a threshold, which I like to specify in standard deviations of certainty (e.g. 6). What terminology do you use and prefer? Do you use a special metric like mutual information or Jaccard for this? I just use Hamming distance, and calculate the Z-score of the results.
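
As a concrete reading of "Hamming distance plus Z-score", here is a minimal sketch. It assumes dense binary hypervectors with independent, unbiased bits, so that under independence the distance follows Binomial(d, 1/2) with mean d/2 and standard deviation sqrt(d)/2. The names random_hv, hamming_z and is_related, and the packing of bits into a Python int, are illustrative choices rather than anything specified in this thread.

```python
import math
import random

def random_hv(d, rng):
    """Dense binary hypervector, packed into a Python int with d random bits."""
    return rng.getrandbits(d)

def hamming(a, b):
    """Number of differing bits between two bit-packed hypervectors."""
    return bin(a ^ b).count("1")

def hamming_z(a, b, d):
    """Z-score of the Hamming distance against the independence null.

    Assumed null: distance ~ Binomial(d, 1/2), mean d/2, std sqrt(d)/2.
    z near 0 is the equator, large negative z is towards the north pole (self),
    large positive z is towards the south pole (~self).
    """
    return (hamming(a, b) - d / 2) / (math.sqrt(d) / 2)

def is_related(a, b, d, threshold=6.0):
    """True if the pair is NOT quasi-orthogonal at the given number of sigmas."""
    return abs(hamming_z(a, b, d)) > threshold

rng = random.Random(0)
d = 10_000
x = random_hv(d, rng)
y = random_hv(d, rng)

# Flip exactly 500 of the d bits of x to get a noisy copy.
flip_mask = sum(1 << i for i in rng.sample(range(d), 500))
noisy_x = x ^ flip_mask

print("x vs y      :", hamming_z(x, y, d))
print("x vs noisy_x:", hamming_z(x, noisy_x, d))
print("x vs x      :", hamming_z(x, x, d))
```

The sign of the Z-score distinguishes the two poles, so a one-sided version of "related" is just a test on hamming_z without the abs.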

Best, Adam

Ross Gayler

Sep 15, 2023, 8:02:19 AM
to Adam Vandervorst, VSACommunity
Hi Adam,

Re terminology: I'll avoid that question, having already caused enough chaos by introducing the name Vector Symbolic Architectures in 2003.

More generally, I agree with your comment (in another email?) that this is a fruitful cause of confusion for beginners to the field.
At the risk of being defeatist, I think this is the natural state of the world and unavoidable. I was recently reading some introductory maths text which went to some lengths to point out that different mathematicians used different terminology to refer to the same thing and the same terminology to refer to different things. So perhaps internal consistency within a single document is the best we can reasonably hope for.

I am actually more concerned by the tendency of people new to the field to pick up some specific (but relatively arbitrary) design choices from the first paper they read and treat that choice as though it is the only possibility.
I would be much happier if people took a more nuanced approach to the field and recognised that so many choices of specific details are context dependent and/or arbitrary.

Re the threshold parameter: Standard deviations are a good way to specify the threshold, although it's arguable that a tail probability is more basic.
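
To make the link between the two ways of stating a threshold concrete, here is a small sketch that converts between standard deviations and tail probabilities. It assumes the normal approximation holds; the function names are illustrative.

```python
import math
from statistics import NormalDist

def upper_tail(z):
    """P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def two_sided_tail(z):
    """P(|Z| > |z|) for a standard normal Z."""
    return math.erfc(abs(z) / math.sqrt(2))

def z_for_upper_tail(p):
    """Threshold z such that P(Z > z) = p, using symmetry of the normal."""
    return -NormalDist().inv_cdf(p)

for z in (3.0, 6.0):
    print(f"z={z}: one-sided {upper_tail(z):.3e}, two-sided {two_sided_tail(z):.3e}")
print("z for a one-in-a-million upper tail:", z_for_upper_tail(1e-6))
```

A 6-sigma threshold corresponds to a one-sided tail probability of roughly one in a billion, which is a more direct statement of the false-alarm rate being accepted.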

More generally, metrics should be chosen to be consistent with how they will be used. If you are using a clean-up memory, then generally the projections of the input vector onto each of the clean-up memory items causally determine the output of the clean-up memory, in which case any metric that's isomorphic to the dot product (e.g. the cosine similarity) will be appropriate. (The Hamming distance for binary vectors is isomorphic to the dot product.)
A metric like mutual information would be appropriate if somehow the mutual information played a causal role in the dynamics of the VSA circuit.
Otherwise, you can use any metric that suits your purposes if it's only used in the analysis - you just have to argue for why that specific metric suits your analytic purposes.
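
The parenthetical remark about Hamming distance and the dot product can be checked directly. This is a minimal sketch, assuming the usual mapping of bits {0, 1} to bipolar values {+1, -1}; under that mapping the dot product is d - 2 * hamming, an order-reversing affine function of the Hamming distance. The helper names are illustrative.

```python
import random

def to_bipolar(bits):
    """Map bit 0 to +1 and bit 1 to -1."""
    return [1 - 2 * b for b in bits]

def hamming(a, b):
    """Number of positions where two bit lists differ."""
    return sum(x != y for x, y in zip(a, b))

def dot(a, b):
    """Plain dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))

rng = random.Random(1)
d = 1_000
a = [rng.randint(0, 1) for _ in range(d)]
b = [rng.randint(0, 1) for _ in range(d)]

h = hamming(a, b)
dp = dot(to_bipolar(a), to_bipolar(b))
# Agreeing positions contribute +1 and differing positions -1, so dp = (d - h) - h.
print(h, dp, d - 2 * h)
```

Dividing by d gives the cosine similarity, 1 - 2h/d, so ranking clean-up memory items by smallest Hamming distance or by largest dot product selects the same item.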

Getting back to your point about standard deviations of Hamming distance, that's appropriate near the equator because I assume you are setting up your system so that the projection of randomly selected vectors onto the self (north pole) vector is relevant to the dynamics of the system. The distribution of dot products with randomly selected vectors is well approximated by a normal distribution, so the standard deviation is a good summary. However, you are probably more interested in the probability corresponding to the threshold because this will be relevant to your practical use case. This also shows that the choice of threshold is essentially arbitrary. You want to discourage newcomers to the field from assuming that there is some fixed threshold that's appropriate for all use cases.

Choosing a threshold implies the existence of a decision: you are going to do something differently depending on which side of the threshold you are.
The obvious decision corresponding to the equatorial threshold is to decide whether some vector has been constructed to contain the pole vector (by bundling) or whether its construction does not involve bundling in the pole vector.
Given that the similarity of a constructed vector (containing the pole vector) to the pole vector can be made arbitrarily small by weighted bundling, there is *no* threshold below which a vector is guaranteed not to contain the pole vector.
So we can take a "decision making under uncertainty" approach based on the likelihood of the vector containing the pole vector versus not containing the pole vector as a function of threshold.
Ideally, you would know the distribution of vectors (as constructed by the dynamics of your specific VSA circuit) that do not contain the pole vector. In practice, we use the distribution of hypervectors selected at random as an adequate approximation to the distribution we should really be using.
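
The point about weighted bundling can be illustrated numerically. This is a rough sketch under stated assumptions: bipolar random hypervectors, bundling as a real-valued weighted sum w * pole + other with no thresholding back to bipolar, and cosine similarity as the measure. None of these choices come from the thread; they are just one convenient setting. For nearly orthogonal pole and other, the cosine is approximately w / sqrt(w^2 + 1), so it sinks into the noise floor (about 1 / sqrt(d)) as w shrinks, even though the pole vector is still in the bundle.

```python
import math
import random

def random_bipolar(d, rng):
    """Random hypervector with entries drawn uniformly from {-1, +1}."""
    return [rng.choice((-1, 1)) for _ in range(d)]

def cosine(a, b):
    """Cosine similarity of two equal-length real vectors."""
    dp = sum(x * y for x, y in zip(a, b))
    return dp / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def weighted_bundle(pole, other, w):
    """Real-valued bundle w * pole + other (assumed form of weighted bundling)."""
    return [w * p + o for p, o in zip(pole, other)]

rng = random.Random(0)
d = 2_000
pole = random_bipolar(d, rng)
other = random_bipolar(d, rng)

sims = {w: cosine(weighted_bundle(pole, other, w), pole) for w in (1.0, 0.1, 0.01)}
noise_floor = 1 / math.sqrt(d)  # approximate std of cosine between independent vectors

for w, s in sims.items():
    approx = w / math.sqrt(w * w + 1)
    print(f"w={w}: cosine={s:+.3f} (approx. {approx:+.3f}), noise floor ~{noise_floor:.3f}")
```

At small weights the measured similarity is indistinguishable from that of an unrelated random vector, which is why the decision has to be framed in terms of likelihoods rather than guarantees.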

Also, I would treat the behaviour around the poles differently because you couldn't treat vectors near a specific point as being selected at random, so the distribution could be very different.
One plausible approach would be if you had a noisy implementation of your system. In this case the similarity threshold is used to decide whether you want to treat some specific vector near the pole as being a noisy realisation of the pole vector versus the realisation of some vector constructed to contain (among other things) a contribution from the pole vector.

So, similarity thresholds are perfectly reasonable to use but I would always want to see a justification for why the concept of threshold and its specific value are relevant to the current specific use case.

Cheers,
Ross
