Queries with total precision > 62 bits


Arnoldo Muller

Nov 17, 2008, 6:01:42 AM
to Uzaygezen
Hello Aioaneid and Mdakin:

I am the project owner of OBSearch, a distributed similarity search
engine. I want to use Uzaygezen to match objects in metric spaces. I am
currently matching trees and other heavy objects, so I need many
dimensions (around 32 dimensions of shorts). The how-to states that the
total precision cannot exceed 62 bits for queries. I would like to run
queries of arbitrary precision.
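
For concreteness, here is a minimal sketch of the arithmetic behind that limit. The class and method names are illustrative and not part of Uzaygezen; the only figures taken from above are the 62-bit limit and 32 dimensions of 16-bit shorts.

```java
import java.util.Arrays;

public class PrecisionCheck {
    // Limit quoted from the how-to; the constant name is hypothetical.
    static final int MAX_QUERY_BITS = 62;

    // Total precision is the sum of the bits used by each dimension.
    static int totalPrecision(int[] bitsPerDimension) {
        return Arrays.stream(bitsPerDimension).sum();
    }

    public static void main(String[] args) {
        int[] bits = new int[32];   // 32 dimensions
        Arrays.fill(bits, 16);      // each one a 16-bit short
        int total = totalPrecision(bits);
        System.out.println(total + " bits, fits: " + (total <= MAX_QUERY_BITS));
    }
}
```

With these numbers the total is 512 bits, far beyond what a single long can index.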

I was browsing BacktrackingQueryBuilder, which uses FilteredIndexRange.
It seems that internally BacktrackingQueryBuilder uses
Pow2LengthBitSetRange for its computations and at some point converts
the results to LongRange. SimpleRegionInspector seems to do the same
thing.

If I change BacktrackingQueryBuilder, SimpleRegionInspector, and
FilteredIndexRange to use Pow2LengthBitSetRange instead of LongRange,
would that do the trick (along with updating the unit tests)? Are you
guys planning to support arbitrary precision queries?

Thanks,

Arnoldo Muller

Daniel

Nov 17, 2008, 2:21:28 PM
to Uzaygezen
Hi Arnoldo,

The main reason for the 62-bit limit is that with high dimensionality
and/or precision the performance of the simple query mechanism employed
here drops significantly. There are no plans to remove the restriction.
The Hilbert index itself can still be calculated to arbitrary precision
and used, say, in a Hilbert tree, but that is not covered here.

Thanks,
Daniel

Arnoldo Muller

Nov 18, 2008, 2:21:52 AM
to Uzaygezen
Hello Daniel,

Thank you for the info. Do you know if the Z-curve (UB-tree)
would perform better in high dimensions?

Arnoldo

Daniel

Nov 18, 2008, 4:12:45 PM
to Uzaygezen
Arnoldo,

The Z-curve should be easier to work with, while the Hilbert curve
achieves better clustering. I don't really know how well a Z-curve
based data structure would perform for this kind of dimensionality.
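
To illustrate why the Z-curve is easier to work with, here is a minimal sketch of a Z-order index built by plain bit interleaving into a BigInteger, so the total precision is not bound by a 64-bit long. It is not Uzaygezen code: the class name, the method name, and the bit ordering (dimension 0 takes the least significant bit of each group) are assumptions made for the example, and coordinates are taken as non-negative ints of at most 32 bits.

```java
import java.math.BigInteger;

public class ZOrderSketch {
    // Interleaves the low bitsPerDim bits of every coordinate into one index.
    // Bit `bit` of dimension `d` lands at position bit * dims + d (an assumed
    // convention; other orderings give an equally valid Z-curve).
    static BigInteger interleave(int[] coords, int bitsPerDim) {
        int dims = coords.length;
        BigInteger index = BigInteger.ZERO;
        for (int bit = 0; bit < bitsPerDim; bit++) {
            for (int d = 0; d < dims; d++) {
                if (((coords[d] >>> bit) & 1) != 0) {
                    index = index.setBit(bit * dims + d);
                }
            }
        }
        return index;
    }

    public static void main(String[] args) {
        // 32 dimensions of 16-bit values, as in the use case discussed above.
        int[] point = new int[32];
        for (int d = 0; d < point.length; d++) {
            point[d] = 0xFFFF;
        }
        BigInteger z = interleave(point, 16);
        System.out.println("index bit length: " + z.bitLength());
    }
}
```

This only computes the index. Range queries over such an index still need a way to skip the parts of the curve outside the query region, and that is where the cost in high dimensions shows up.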

Daniel

Arnoldo Muller

Nov 19, 2008, 7:21:39 AM
to Uzaygezen
Thank you Daniel,

I will play with the Z-curve or the U-curve. Hopefully they will be
better than the Pyramid Technique or iDistance.

Thanks,
:)

Arnoldo