Reducing bundling noise

Dan Ulery

Nov 4, 2024, 3:00:11 PM
to VSACommunity
Hi There,

I'm pretty new to VSA, and I'm very much an implementation-type person rather than a theory-type person, so hopefully I can explain the issues I'm having, and hopefully I've come to the right place.

I have a system set up that uses what I guess would be called ternary vectors. I was using bipolar vectors, but it seemed like setting a bunch of components to 0 at random helped. I believe this is what's referred to as a sparse vector?

For smaller problems, I'm getting great results. Most of what I need to do is bundling (addition) and I guess what you'd call negation (e.g. [-1, 1, 0] becomes [1, -1, 0]). As problem size increases, so does the number of bundling operations. Unfortunately, this is leading to poor results.
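
To make sure I'm describing the operations clearly, here's a rough sketch of what I mean by bundling and negation on ternary vectors (the dimensionality, sparsity value, and helper names are just placeholders I picked for illustration):

```python
import numpy as np

D = 10_000        # dimensionality (placeholder value)
SPARSITY = 0.5    # fraction of zero components in a seed vector (placeholder)

rng = np.random.default_rng(0)

def random_ternary(d=D, sparsity=SPARSITY):
    """Random seed vector with components in {-1, 0, +1}."""
    v = rng.choice([-1, 1], size=d)
    v[rng.random(d) < sparsity] = 0
    return v

def bundle(vectors):
    """Bundling: elementwise sum of the item vectors."""
    return np.sum(vectors, axis=0)

def negate(v):
    """Negation: flip the sign of every component, e.g. [-1, 1, 0] -> [1, -1, 0]."""
    return -v

a, b, c = (random_ternary() for _ in range(3))
s = bundle([a, b, negate(c)])
```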

Some things I've tried:
  • Increase dimensionality
  • Change "sparsity" of the seed vectors
  • Perform normalization after each bundling operation
  • Perform normalization only on the final vector
  • Perform no normalization
  • Normalizations tested (rough sketch after this list):
    • Sign (e.g. 134 becomes 1, -583 becomes -1, 0 stays 0)
    • Percentiles (values in highest X% become 1, values in lowest X% become -1, everything else becomes 0)
  • Multiple copies of seed vectors, resulting in multiple output vectors that are then combined with what I think is called ensemble voting.
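Here's roughly what the two normalizations I tested look like (the percentile threshold is just an example value):

```python
import numpy as np

def normalize_sign(v):
    """Sign normalization: positives -> 1, negatives -> -1, zeros stay 0
    (e.g. 134 -> 1, -583 -> -1)."""
    return np.sign(v)

def normalize_percentile(v, pct=10):
    """Percentile normalization: values in the highest pct% -> 1, values in
    the lowest pct% -> -1, everything else -> 0. pct=10 is just an example."""
    hi = np.percentile(v, 100 - pct)
    lo = np.percentile(v, pct)
    out = np.zeros_like(v)
    out[v >= hi] = 1
    out[v <= lo] = -1
    return out
```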
I've referred to Hyperdimensional Computing: An Algebra for Computing with Vectors, which mentions holding off on normalization as long as possible. Given that I'm actually able to go without normalizing until the end, I don't see how I could do much better.

Some ideas I've had in the back of my mind:
  • It seems like different "runs" against the same problem result in some runs having decent results and other runs having not-so-great results. I'm wondering if that is a sign random seed vectors aren't the answer. I thought about using a Genetic Algorithm to evolve seed vectors that work well with a given set of problems, though I'm unsure if the seed vectors are going to be generic enough to work on new problems.
  • Some way to read the vector at certain points (perhaps every 1000th bundling operation) and re-map it somehow. No real idea how I'd go about this though.
It'd be nice to get some ideas going, or at least to be pointed in the direction of a paper I could read, though admittedly I get lost a lot reading academic papers.

Thanks,
Dan Ulery

Ross Gayler

Nov 18, 2024, 7:34:14 PM
to VSACommunity
Hi Dan,

I think it's worth taking a step back to understand what you're trying to achieve here.
I am assuming that you are storing values in the bundle in order to later query that bundle to find out whether specific values are contained in it.
I am also assuming that the items you are bundling are intentionally quasi-orthogonal (because that's the assumption behind the usual arguments about the capacity of bundling).
You are obviously trying to store a great many values in the bundle, but there is no information on how those values are generated (e.g. whether they are generated one at a time via some procedurally controlled process).

If you are testing for presence/absence of hypervector values from a *very* large set of values I would suggest that bundling is not the way to go.

Bundling, quite literally, is calculating the average of the items being bundled. If the items are related (non-orthogonal), the effect of bundling is to emphasize what the items have in common and suppress what varies between items. If the items are unrelated, i.e. (quasi-)orthogonal, then the average converges to zero as the number of items bundled increases (because bundling loses information). VSA systems, like bipolar, that don't allow a zero value inject noise where they would otherwise have gone to zero. So it's really a miracle of high dimensionality that enough information survives bundling of quasi-orthogonal values to allow you to bundle some reasonable number of items and test for inclusion.
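
If it helps to see that information loss concretely, here is a rough simulation sketch (bipolar vectors and an arbitrary dimensionality, purely for illustration): the similarity of any one stored item to the bundle shrinks roughly like 1/sqrt(n) as the number of bundled quasi-orthogonal items n grows.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 10_000  # arbitrary dimensionality for illustration

def cosine(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

probe = rng.choice([-1, 1], size=D)          # one item we will later query for
for n in (10, 100, 1000):
    others = rng.choice([-1, 1], size=(n - 1, D))
    bundled = probe + others.sum(axis=0)     # bundle = sum of n quasi-orthogonal items
    print(n, round(cosine(probe, bundled), 3))
# The similarity of the probe to the bundle falls off roughly as 1/sqrt(n),
# so with enough items the signal drowns in the noise from the other items.
```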

I tend to think of bundles as being the equivalent of working memory - somewhere to keep a modest number of items that are currently in use.
If you want to test for membership in a (potentially very large) set then that sounds more like a clean-up memory.
There are many ways of implementing a clean-up memory, differing widely in "biological plausibility" (if you care about that sort of thing). However, they will all use more memory than bundling. In bundling you are trying to stuff an arbitrary number of hypervectors into a single hypervector, which is why it loses information.
The conceptually simplest clean-up memory is to keep a list of all the hypervectors you are storing as items and then calculate the dot product of the query hypervector with each of the stored hypervectors to see which (if any) of the stored hypervectors is sufficiently similar to the query hypervector.  Whether that implementation meets your implementation constraints I can't tell.
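
As a rough sketch of that simplest implementation (the class and method names here are just placeholders, and the match threshold would be application-specific):

```python
import numpy as np

class CleanupMemory:
    """Simplest clean-up memory: keep every stored hypervector in a list
    and compare a query against all of them by dot product."""

    def __init__(self):
        self.items = []

    def store(self, v):
        self.items.append(np.asarray(v))

    def query(self, q, threshold):
        """Return the index of the best-matching stored item, or None if
        nothing is sufficiently similar to the query."""
        if not self.items:
            return None
        scores = np.stack(self.items) @ np.asarray(q)
        best = int(np.argmax(scores))
        return best if scores[best] >= threshold else None
```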

I hope that helps.

Ross Gayler

