So it sounds like what you really want is true JOIN operators (joining
across two or more queries), and that is exactly what we are starting
to work on next. It would be great if you're interested in helping to
define exactly what fast JOIN operators are needed first.
Yours,
Chris
>
> Thanks Chris.
> What you describe would be nice to have. However, I have a much simpler
> problem in mind. Suppose we have a server that connects to an NLMSA
> database to retrieve the results for a requesting client. Imagine the
> server returns an enormous result set. If we first go in and fetch the
> whole result set from the database before serving, then we are likely
> to time out on the connection with the requesting client. If instead we
> get an iterator to the result set, we can start serving the client
> immediately with a much smaller chance of a timeout.
It sounds like you're talking about an XMLRPC client-server
connection. The use of an iterator in this context is a little funny,
because XMLRPC is not designed for the sort of persistent connection
that an iterator would require. It could be done, but we'd have to
define some very clear rules for letting the server purge old
iterators that clients have not properly released. More problematic
is the issue of multiple concurrent queries. Currently the NLMSA
server completes a query before beginning another. In your
"persistent iterator" model, the server has to hold incomplete
iterators open while continuing to process new queries. The internals
of intervaldb are in fact iterator-based, so it could do that, but it
certainly would add complexity to the server code.
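
To make the bookkeeping concrete, here is a minimal sketch of the
"persistent iterator" model in plain Python. It is only an illustration
under stated assumptions: IteratorRegistry, its methods, and the idle
timeout are hypothetical names, not part of pygr or the NLMSA server.
The server hands out an integer handle, the client fetches results in
batches, and any iterator left untouched for too long is purged.

```python
import itertools
import time


class IteratorRegistry:
    """Hypothetical holder for open result iterators, keyed by handle."""

    def __init__(self, max_idle_seconds=60.0, clock=time.monotonic):
        self.max_idle_seconds = max_idle_seconds
        self.clock = clock  # injectable so the purge rule can be tested
        self._next_handle = itertools.count(1)
        self._open = {}  # handle -> (iterator, time of last access)

    def register(self, results):
        """Store an iterator over `results` and return its handle."""
        handle = next(self._next_handle)
        self._open[handle] = (iter(results), self.clock())
        return handle

    def fetch(self, handle, batch_size):
        """Return up to `batch_size` more results; an empty list means done."""
        self.purge()
        if handle not in self._open:
            raise KeyError("unknown or expired iterator handle: %r" % handle)
        iterator, _ = self._open[handle]
        batch = list(itertools.islice(iterator, batch_size))
        if batch:
            # Refresh the last-access time so an active client is not purged.
            self._open[handle] = (iterator, self.clock())
        else:
            # Exhausted: release the iterator right away.
            del self._open[handle]
        return batch

    def purge(self):
        """Drop iterators idle longer than the limit; return how many."""
        now = self.clock()
        stale = [h for h, (_, last) in self._open.items()
                 if now - last > self.max_idle_seconds]
        for handle in stale:
            del self._open[handle]
        return len(stale)
```

Even this toy version has to decide what an abandoned iterator is and
what a client sees when its handle has expired, which is the kind of
rule-making described above.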
It would be a LOT easier to do a straight LIMIT query just like SQL
does (see below). I suspect the above complexities may be why SQL
provides only a LIMIT mechanism, and no "progressive results" iterator
like you are asking for.
>
>
> Here's how this helps in my case. Suppose we are doing an exploratory
> analysis of the data where you need to quickly check if an idea would
> work. You need to run a query against the database and in order to
> confirm the idea works you only need a small portion of the result
> set. In other words, what I need is a LIMIT modifier on a query. Having
> iterator access makes limiting trivial. Hence my question.
We could implement a LIMIT clause very easily, right away, both for
local and XMLRPC queries. That seems like a great idea.
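
As a rough sketch of what LIMIT amounts to when the results come from
an iterator (limit_results and fake_query are hypothetical names used
only for illustration, not pygr functions): the query stops being
evaluated as soon as the requested number of results has been
produced, and for XMLRPC the truncated list can be sent back as one
ordinary, complete response.

```python
import itertools


def limit_results(results, limit=None):
    """Return an iterator over at most `limit` items; None means no limit."""
    if limit is not None and limit < 0:
        raise ValueError("limit must be non-negative")
    return itertools.islice(results, limit)


def fake_query(n, produced):
    """Stand-in for a lazy query; records each result it actually computes."""
    for i in range(n):
        produced.append(i)
        yield ("hit", i)


produced = []
first_ten = list(limit_results(fake_query(300000, produced), limit=10))
# Only the first ten results were ever computed; the rest were never touched.
```

No server-side state survives the call, so none of the purge rules
needed for persistent iterators come into play.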
-- Chris
On Mar 12, 2009, at 6:59 PM, Alexander Alekseyenko wrote:
>
> a few hundred thousand.
-- Chris