Hi Amy,
The scan function was correct for the version of YCSB we developed it
against. The YCSB benchmark we used for the SIGCOMM submission
generates keys from continuous sequence of integers. The search [x,
x+N) is guaranteed to return the N items after item x. You can see this
for yourself in the CoreWorkload.java and the *Generator.java files from
the last version of YCSB avialable at the time of publication[1].
Searches in HyperDex are performed in the most restrictive subspace, and
the predicates are used to compute which subspace that is. For example,
a range search on strings cannot be used client-side to narrow down a
search (because of hashing), but a range search on a numeric value will
be used.
The current HyperDex implementation does exactly what you describe with
the iterator. The implementation we used for SIGCOMM requested more
items concurrently, much like the current implementation of sorted
search does. The current sorted search will do one round trip to each
server, because the search results are inherently limited in number;
whereas, an unsorted search is unbounded and the iterator model made
more sense.
If you simply use a GreaterEqual predicate, it will return more than
half the key space roughly 50% of the time, and as you noticed, will
perform poorly. If you're looking to play around with changing scan, an
incremental improvement would be to use sorted search and limit it to N
results. I would still recommend an upper bound on the search for
efficiency.
-Robert
[1]
https://github.com/brianfrankcooper/YCSB/tree/5813dbcaaf0218184086f2f319ea7e36baf52575