Hi Dan,
I think it's worth taking a step back to understand what you're trying to achieve here.
I am assuming that you are storing values in the bundle in order to later
query that bundle to find out whether specific values are contained in it.
I am also assuming that the items you are bundling are intentionally
quasi-orthogonal (because that's the assumption behind the usual arguments
about the capacity of bundling). You are obviously trying to store a great
many values in the bundle, but there is no information on how those values
are generated (e.g. whether they are generated one at a time by some
procedurally controlled process).
If you are testing for the presence/absence of hypervector values from a
*very* large set of values, I would suggest that bundling is not the way
to go.
Bundling, quite literally, is calculating the average of the items being
bundled. If the items are related (non-orthogonal), the effect of bundling
is to emphasize what the items have in common and suppress what varies
between items. If the items are unrelated ((quasi-)orthogonal), then the
average converges to zero as the number of items bundled increases
(because bundling loses information). VSA systems that don't allow a zero
value, such as bipolar VSAs, inject noise wherever the result would
otherwise have gone to zero. So it's really a miracle of high
dimensionality that enough information survives bundling of
quasi-orthogonal values to allow you to bundle some reasonable number of
items and test for inclusion.
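
To make the capacity argument concrete, here is a minimal sketch, assuming bipolar (+1/-1) hypervectors and the common "sum, then take the sign" form of bundling (the sign of the sum is the sign of the average). A tied component cannot be 0 in a bipolar VSA, so it is set to a random +1/-1, which is the noise injection described above. The names `random_bipolar`, `bundle_bipolar`, `cosine` and `member_similarity`, and the dimensionality of 10,000, are illustrative choices, not from any particular VSA library.

```python
import random

DIM = 10_000  # illustrative dimensionality (an assumption; any large value behaves similarly)


def random_bipolar(rng, dim=DIM):
    """Random bipolar hypervector: each component is independently -1 or +1."""
    return [rng.choice((-1, 1)) for _ in range(dim)]


def bundle_bipolar(vectors, rng):
    """Bundle by summing componentwise (the sign of the sum equals the sign of
    the average) and keeping only the sign. A bipolar vector cannot hold 0, so
    a tied component is set to a random +1/-1, i.e. noise is injected."""
    result = []
    for column in zip(*vectors):
        total = sum(column)
        if total > 0:
            result.append(1)
        elif total < 0:
            result.append(-1)
        else:
            result.append(rng.choice((-1, 1)))
    return result


def cosine(a, b):
    """Cosine similarity; for bipolar vectors each norm is sqrt(dim)."""
    return sum(x * y for x, y in zip(a, b)) / len(a)


def member_similarity(n_items, seed=0):
    """Bundle n_items random vectors; return the mean similarity of the bundle
    to its members and its similarity to one fresh (non-member) vector."""
    rng = random.Random(seed)
    items = [random_bipolar(rng) for _ in range(n_items)]
    bundle = bundle_bipolar(items, rng)
    non_member = random_bipolar(rng)
    mean_member = sum(cosine(bundle, v) for v in items) / n_items
    return mean_member, cosine(bundle, non_member)


for n in (2, 10, 50, 100):
    member, other = member_similarity(n)
    print(f"{n:4d} items: member {member:+.3f}, non-member {other:+.3f}")
```

Run as-is, the similarity of the bundle to its members should shrink steadily as more items are bundled, while the similarity to a non-member stays near zero (its spread is roughly 1/sqrt(DIM)). The inclusion test works only while the first number stays clearly above that noise floor, which is why a bundle holds only a "reasonable number" of quasi-orthogonal items. The even item counts are deliberate: they produce ties, so the random tie-breaking is actually exercised.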
I tend to think of bundles as being the equivalent of working memory:
somewhere to keep a modest number of items that are currently in use.
If you want to test for membership in a (potentially very large) set, then
that sounds more like a clean-up memory.
There are many ways of implementing a clean-up memory, differing widely in
"biological plausibility" (if you care about that sort of thing). However,
they will all use more memory than bundling. In bundling you are trying to
stuff an arbitrary number of hypervectors into a single hypervector, which
is why it loses information.
The conceptually simplest clean-up memory is to keep a list of all the
hypervectors you are storing as items, then calculate the dot product of
the query hypervector with each of the stored hypervectors to see which
(if any) of the stored hypervectors is sufficiently similar to the query
hypervector. I can't tell whether that implementation meets your
constraints.
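
A minimal sketch of that list-plus-dot-product clean-up memory, assuming bipolar hypervectors (so the dot product divided by the dimensionality is the cosine similarity) and a fixed similarity threshold for deciding "sufficiently similar". `CleanupMemory`, `add_noise` and the 0.3 threshold are hypothetical names and values chosen for illustration; a real system would pick the threshold from the dimensionality and the number of stored items.

```python
import random


def random_bipolar(rng, dim):
    """Random bipolar hypervector: each component is independently -1 or +1."""
    return [rng.choice((-1, 1)) for _ in range(dim)]


def add_noise(vector, flip_fraction, rng):
    """Return a corrupted copy with roughly flip_fraction of components negated."""
    return [-x if rng.random() < flip_fraction else x for x in vector]


class CleanupMemory:
    """Hypothetical minimal clean-up (item) memory: keep every stored hypervector
    and compare a query against each one by a linear scan."""

    def __init__(self, threshold=0.3):
        self.threshold = threshold  # illustrative cut-off, not a standard value
        self.items = []  # list of (label, hypervector) pairs

    def add(self, label, vector):
        self.items.append((label, vector))

    def query(self, vector):
        """Return (label, similarity) of the best match if it reaches the
        threshold, otherwise None (query judged not to be in the set)."""
        best_label, best_sim = None, float("-inf")
        for label, stored in self.items:
            # Normalised dot product: equals cosine similarity for bipolar vectors.
            sim = sum(x * y for x, y in zip(vector, stored)) / len(vector)
            if sim > best_sim:
                best_label, best_sim = label, sim
        if best_sim >= self.threshold:
            return best_label, best_sim
        return None


rng = random.Random(1)
DIM = 2_000
memory = CleanupMemory(threshold=0.3)
stored = {}
for i in range(500):
    label = f"item{i}"
    stored[label] = random_bipolar(rng, DIM)
    memory.add(label, stored[label])

noisy = add_noise(stored["item42"], 0.2, rng)
print(memory.query(noisy))
print(memory.query(random_bipolar(rng, DIM)))
```

With about 20% of its components flipped, the noisy query's expected similarity to `item42` is 1 - 2(0.2) = 0.6, well above the threshold, so the first call should report `item42`; a fresh random vector scores near zero against every stored item and should return `None`. The cost is the one noted above: the memory holds all 500 hypervectors rather than a single bundle, and each query takes one dot product per stored item.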
I hope that helps.
Ross Gayler