This is something we are looking at for Firefox.next, so the sooner you
give feedback, the better. :)
Cheers,
Shawn
I think we should make async API as similar to the synchronous one as
possible. What's the reason behind basing it on HTML5 SQL? Developers
are already accustomed to existing mozStorage APIs, so we shouldn't
deviate from them without a good reason. Learning curve aside, it would
create a big inconsistency. Let's take a look at the differences:
* AsyncTransaction is a completely different animal than a traditional
transaction.
* mozStorage APIs revolve around statement objects. There's no explicit
statement here.
* Parameters are bound in a different way
* Results are read in a different way
By the way, AsyncTransaction.execute() reads as if the transaction is
executed, not the statement. C.f. mozIStorageStatement.execute().
Another concern is the callback. Let's make it a single function instead
of an object implementing given methods. In the latter case you've got
to implement QI and the code that handles the statement is more
scattered. If you use many async statements, you usually make a
<code>switch</code> and forward the callbacks to separate dedicated
functions, anyway.
Here is the Shawn's example, adapted to use my idea of the API. Please
note that I have no knowledge of the internals of sqlite and mozStorage,
so I don't know how feasible this is.
// conn is a mozIStorageConnection
let sql = "SELECT * FROM moz_downloads WHERE state = ?1";
let stmt = conn.createStatement(sql);
stmt.bindInt32Parameter(0, Ci.nsIDownloadManager.DOWNLOAD_FINISHED);
stmt.executeAsync(callback);
function callback(aStatement, aResult, aError) {
if (aError)
// update UI accordingly
for (let row in aResult) {
// process data
// should be able to access data in a row by name or argument
number
}
}
Alternatively, the callback could be defined in the style of XMLHttpRequest.
stmt.onResultsReady = onResultsReady;
stmt.onError = onError;
stmt.executeAsync();
function onResultsRead(aStatement, aResults) {
}
function onError(aStatement, aError) {
}
- Adam
I am extensively using mozIStorage interfaces in an accounting xul
application, that is approaching production readiness. Shawn, thanks for
this opportunity to present my point.
Async calls are a great idea!
Comments on the proposed interface:
1. Is there any technical reasons that require AsyncTransaction?
It seems to be redundant otherwise. The best practice is use sql
parameters, which are to be bound after the statement is already
prepared and its syntax is verified. The counter-proposal is to add
'executeAsync()' to 'mozIStorageStatement'.
2. Separate method to obtain ResultSet is a right thing. We can probably
also change 'mozIStorageStatement' synchronous calls API in the same
way. Currently, the first call of 'executeStep()' has a special EXECUTE
meaning, while all subsequent ones actually mean FETCH ROW. Separating
this two meanings may be a good idea.
Instead of passing ResultSet proxy to a callback, let's:
drop 'boolean executeStep();',
drop 'void reset();',
drop 'readonly attribute long state;'
add 'readonly attribute mozIStorageValueArray resultSet;'
to 'mozIStorageStatement', and stop subclassing 'mozIStorageValueArray'
there. In turn 'mozIStorageValueArray' will receive
'readonly attribute long state;',
'void fetchNext();', and
'void fetchNextAsync([optional] in Callback aCallback);'
So 'mozIStorageValueArray' (aka cursor) is separated from
'mozIStorageStatement'. The former can later be extended to provide
support for random access cursors. 'reset()' goes away entirely. It can
be invoked internally in 'execute()', 'executeAsync()' and 'finalize()'.
As a result, mozIStorage* will become more SQL-ish and friendlier to
other databases like MySQL or PostgreSQL. There is a legitimate question
"why do we care about other databases?". Here is one hypothetical
example: since we already keep a hefty part of user profile (history and
bookmarks) in a relational database, we can move the rest of valuable
data -- mainly passwords and extensions (yes! extensions) -- there as
well. So the profile will contain a database (or databases) and cached
data. This creates an opportunity for big organizations to use a central
database to store and manage extensions, web settings and browsing data
of their users. And if such a user logins simultaneously on several
workstations, no browsing data will be lost.
3. Finally, can we directly use nsIWebProgressListener2 as a callback
class in both 'executeAsync()' and 'fetchNextAsync()'?
This can be the first step of many to provide full database progress
feedback using progress bar. Long places operations (the main client of
storage) will clearly benefit from this feature.
4. And woooooohooooooo, as 3.0 RC1 has finally been tagged for building!
--
Sergey Yanovich
> * AsyncTransaction is a completely different animal than a traditional
> transaction.
Yes, and no. Each statement has an implicit transaction wrapped around
it - the whole statement either succeeds or fails. With that said, I'm
open to a better name on this.
> * mozStorage APIs revolve around statement objects. There's no explicit
> statement here.
I'm not sure this is a bad thing, but it is different.
> * Parameters are bound in a different way
It's my opinion that our current approach to binding parameters is
clunky and error-prone. I'd love to fix that, and that's what I'm
trying to do here.
> * Results are read in a different way
Same as above. Note that since this is targeting Firefox.next, which is
supposed to come out some time in the fall, we don't want to change
existing APIs. So we can add to our current API, but changing it should
be avoided if possible.
> By the way, AsyncTransaction.execute() reads as if the transaction is
> executed, not the statement. C.f. mozIStorageStatement.execute().
Implicit transaction again. Although, I do believe this could be done
without the AsyncTransaction object which would probably ease this
concern of yours (I had a reason for doing it with AsyncTransaction
because I shared your line of thinking initially. Something made me
change my mind, but I can't recall what that was currently. When I
figure it out again, I'll post).
> Another concern is the callback. Let's make it a single function instead
> of an object implementing given methods. In the latter case you've got
> to implement QI and the code that handles the statement is more
> scattered. If you use many async statements, you usually make a
> <code>switch</code> and forward the callbacks to separate dedicated
> functions, anyway.
Letting one method handle both errors and handling of an actual result
is a form of Logical Cohesion which is generally frowned upon (yeah, we
do it in the mozilla codebase in various places). Functional Cohesion
is preferred. I recommend reading Code Complete 2nd Ed., Chapter 7 for
more details (I imagine wikipedia has something on it too).
Additionally, JS consumers don't have to implement the QI code, and they
don't even need to provide an error or result handler if they don't want
to. XPConnect, as I understand it, will just throw
NS_ERROR_NOT_IMPLEMENTED if the back-end code tries to call it. The
back-end should be able to handle that gracefully.
> // conn is a mozIStorageConnection
> let sql = "SELECT * FROM moz_downloads WHERE state = ?1";
> let stmt = conn.createStatement(sql);
> stmt.bindInt32Parameter(0, Ci.nsIDownloadManager.DOWNLOAD_FINISHED);
> stmt.executeAsync(callback);
The problem with this approach is that preparing the statement (the
createStatement call) is often the most expensive operation for SELECT
statements. With the current API we can do that work on the background
thread[pool].
> Alternatively, the callback could be defined in the style of
> XMLHttpRequest.
The only reason why I don't like this approach is because it seems like
a lot more lines of code to accomplish the same thing. With that said,
I'm indifferent about it.
Thanks for the great feedback. Let me know if this addresses your concerns.
Cheers,
Shawn
> It seems to be redundant otherwise. The best practice is use sql
> parameters, which are to be bound after the statement is already
> prepared and its syntax is verified. The counter-proposal is to add
> 'executeAsync()' to 'mozIStorageStatement'.
I think that this might be the way to go, but I don't think we should
have them prepare it first (see previous post about that in reply to Adam).
> 2. Separate method to obtain ResultSet is a right thing. We can probably
> also change 'mozIStorageStatement' synchronous calls API in the same
> way. Currently, the first call of 'executeStep()' has a special EXECUTE
> meaning, while all subsequent ones actually mean FETCH ROW. Separating
> this two meanings may be a good idea.
I'd love to, but we don't want to have that much API churn for Firefox
3.1. For Firefox 4 we can look at that.
> 3. Finally, can we directly use nsIWebProgressListener2 as a callback
> class in both 'executeAsync()' and 'fetchNextAsync()'?
I'm not sure I understand how that is useful, so you'd have to elaborate
on that.
Cheers,
Shawn
(1) Dropped the AsyncTransaction interface, and moved its executeAsync
method onto mozIStorageConnection. There was no reason for the
interface to exist, and it's one less thing consumers have to worry about.
(2) Added a SQLError interface that will be passed back to the callback
for errors. There is, however, no definition as I'm not 100% sure what
we want to include when we notify that there was an error.
(3) Updated the sample code to go along with the new interface proposal.
Cheers,
Shawn
[1]
http://developer.mozilla.org/en/docs/User_talk:Comrade693#Async_Storage_Statements
Overall: It would probably be useful to explicitly state, near the top
of the wiki page, exactly which Mozilla version this is targeting -
sounds like Firefox.next / 1.9.nn?
Shawn Wilsher wrote:
> Adam Kowalczyk wrote:
>> // conn is a mozIStorageConnection
>> let sql = "SELECT * FROM moz_downloads WHERE state = ?1";
>> let stmt = conn.createStatement(sql);
>> stmt.bindInt32Parameter(0, Ci.nsIDownloadManager.DOWNLOAD_FINISHED);
>> stmt.executeAsync(callback);
> The problem with this approach is that preparing the statement (the
> createStatement call) is often the most expensive operation for SELECT
> statements. With the current API we can do that work on the background
> thread[pool].
On the other hand, this also means it's not possible to re-use statements.
I'm not quite sure how that second argument (the one with
the things to bind) can be implemented, given that XPIDL doesn't, as far
as I know, have method overloading. It seems to take either a JS Array,
Object, or a C++ Parameters thing that is to be determined. Would we
still get type-checking on the C++ side? (Hey, if you somehow manage to
get that, all the better.)
Of course, I know about nil about API design ;)
(Off-topic note-to-self: trigger + mozIStorageFunction = win)
--
Mook
While strictly speaking true, there are ways to get around this, esp. if all
your calls are from JS into C++. You can use nsIVariant, or declare no args in
the IDL and read them directly using XPConnect APIs.
-Boris
Sure. I mistook Firefox.next as Firefox 4.
>> 3. Finally, can we directly use nsIWebProgressListener2 as a callback
>> class in both 'executeAsync()' and 'fetchNextAsync()'?
> I'm not sure I understand how that is useful, so you'd have to elaborate
> on that.
I had in mind whole-sale database access like when History -> Show All
History is invoked. Our current Library window doesn't contain progress
bar, because it would be useless, since the routine that populates the
main tree is synchronous. But things will change, once async call
becomes possible. We can then provide substantially better user
experience by doing async loading and showing progress in a progress bar.
However, in the context of a dot release, introducing this new
functionality to the mozIStorage* no longer seems a good idea.
A) I started to argue with the necessity of changing connection
interface, but then came around this:
> The problem with this approach is that preparing the statement (the createStatement call) is often the most expensive operation for SELECT statements.
This is a great argument in favor of creating a statement
asynchronously, but it is also an argument in favor of storing the
produced statement for re-use.
This may have been Shawn's motivation to introduce AsyncTransaction.
Nevertheless, it must be an async operation. A possible name could be
createStatementAsync(). Another important method of the statement is
finalize(), which is the only reliable way to free the database from JS.
B) In the view of createStatementAsync(), Parameters callback receives
additional task of storing the statement.
C) Finally, ResultSet behavior needs some clarification. From how it is
used in the example, it may be inferred, that getNextResult() is
synchronous. That means, all data needs to be fetched and saved locally
before handleResult() is invoked. A number of questions arise:
* When working the large amounts of data (like browsing history), this
may result is excessive memory consumption.
* Although the data is fetched row by row, nothing will be displayed,
until the last row is fetched.
* If there are errors in the end of a large list, nothing will be
displayed at all.
Proposed solution is to have API for both paths of execution:
1. mozIStorageConnection.executeSimpleAsync(in AUTF8String aSQL,
[optional] in Parameters aParameters, [optional] in Callback aCallback);
2. mozIStorageConnection.createStatementAsync(in in AUTF8String aSQL,
[optional] in Parameters aParameters), then
mozIStorageStatement.executeAsync(in Callback aCallback), then
mozIStorageValueArray.fetchNextAsync(in Callback aCallback) for each
row.
D) Callback needs to have separate methods handleAllResults/handleResult
respectively to handle 1. and 2.
--
Sergey Yanovich
This is true for very light-weight SELECT statements, to be clear, and
it only matters if you're running them many times.
> This is a great argument in favor of creating a statement
> asynchronously,
No -- it can't really block, and the additional complexity of using it
will be a huge developer cost for no gain.
> but it is also an argument in favor of storing the
> produced statement for re-use.
Yes.
:)
Mike
>> The problem with this approach is that preparing the statement (the
>> createStatement call) is often the most expensive operation for SELECT
>> statements. With the current API we can do that work on the
>> background thread[pool].
> On the other hand, this also means it's not possible to re-use statements.
This is true, but I think I have a solution for that. I need to think
about it a bit more though, so that'll come in a later post.
> I'm not quite sure how that second argument (the one with
> the things to bind) can be implemented, given that XPIDL doesn't, as far
> as I know, have method overloading. It seems to take either a JS Array,
> Object, or a C++ Parameters thing that is to be determined. Would we
> still get type-checking on the C++ side? (Hey, if you somehow manage to
> get that, all the better.)
I've updated the wiki (revision 71419 [1]) to better detail this. I
think it's quite possible to do all this and still get type-checking.
Cheers,
Shawn
With that said, I think my previous statement wasn't very complete.
With large, fragmented database files, stepping through the result can
also be expensive since it has to keep reading from various places on
the disk.
Additionally, while preparing the statement is work, it's not blocking
the UI, so I'm not sure we should be worrying about saving that work.
Perhaps I'm wrong on this though...
> B) In the view of createStatementAsync(), Parameters callback receives
> additional task of storing the statement.
That violates Functional Cohesion, which I'd prefer to not do.
> C) Finally, ResultSet behavior needs some clarification. From how it is
> used in the example, it may be inferred, that getNextResult() is
> synchronous. That means, all data needs to be fetched and saved locally
> before handleResult() is invoked. A number of questions arise:
>
> * When working the large amounts of data (like browsing history), this
> may result is excessive memory consumption.
>
> * Although the data is fetched row by row, nothing will be displayed,
> until the last row is fetched.
>
> * If there are errors in the end of a large list, nothing will be
> displayed at all.
I'm not sure I understand this point specifically.
> Proposed solution is to have API for both paths of execution:
> 1. mozIStorageConnection.executeSimpleAsync(in AUTF8String aSQL,
> [optional] in Parameters aParameters, [optional] in Callback aCallback);
>
> 2. mozIStorageConnection.createStatementAsync(in in AUTF8String aSQL,
> [optional] in Parameters aParameters), then
> mozIStorageStatement.executeAsync(in Callback aCallback), then
> mozIStorageValueArray.fetchNextAsync(in Callback aCallback) for each
> row.
Based on my previous comments in this reply (and the one I'm about to
make), I don't think that's needed.
> D) Callback needs to have separate methods handleAllResults/handleResult
> respectively to handle 1. and 2.
I do not believe that this is necessary. handleResult can handle one or
more rows at a time, and it can be called more than once. I've updated
the wiki page accordingly (revision 71426 [1])
This may seem odd, but at least one other API that I know of does that -
Threading Building Blocks with it's Body and Range concepts. A Body may
have its function operator called more than once on different ranges,
which is essentially what we are doing here.
Let me know if this doesn't clarify things enough.
Cheers,
Shawn
[1]
http://developer.mozilla.org/en/docs/index.php?title=User_talk:Comrade693&oldid=71426
I'm jumping into the middle of the thread here, but would the following work?
Store both the original string and a prepared statement. Have an unsigned int
counter initialized to 0.
An attempt to execute the statement does the following:
1) atomically increment the counter
2a) If the counter value is now 1, go ahead and execute
the prepared statement, then atomically decrement the counter.
2b) Otherwise, atomically decrement the counter, create a one-off
statement from the string, and execute it.
This maintains the invariant that any time we're executing the prepared
statement the counter is nonzero, so that no one else can enter the prepared
statement execution. In the common case, the only overhead over blindly
executing the prepared statement is that we have to do the increment/decrement
and one branch. If two execution attempts race, one will hit the fast path,
while the other hits the slow path. At least if the branch between 2a and 2b is
done on the return value of PR_AtomicIncrement.
> Additionally, while preparing the statement is work, it's not blocking
> the UI, so I'm not sure we should be worrying about saving that work.
> Perhaps I'm wrong on this though...
Are all statement preparations happening on the same thread? How many
statements do we expect to be in-flight at once? Depending on the answer to the
former, it seems to me that if there is the potential for lots of statement
preparation to be happening, we'll either be blocking the statement preparation
thread (causing UI lag in whatever operation is showing a progress bar to the
user while waiting for the query to complete) or spawn off lots of statement
preparation threads (and some will end up blocking on lack of execution units,
though that might become less of a problem as processors have more and more cores).
But in any case, if it's cheap enough it might be worth optimizing for something
other than just UI responsiveness here (e.g. power consumption? ;) ).
-Boris
Yeah, now I seem to have more questions, than before :)
1. What's the common use case(s) for async calls? Should the resulting
API be universal or optimized for common usage?
2. When is mozIStorageStatement.finalize() expected to be called, if at all?
3. How many times is handleResult() expected to be called, when the
query yields 10, 100, 10000 rows? Is ResultSet.getNextResult() synchronous?
The answers will let me be more specific with my comments.
Thanks,
--
Sergey Yanovich
> 2. When is mozIStorageStatement.finalize() expected to be called, if at
> all?
You don't have to ever call it. Internally, it's called in the
statements destructor so when your js var is garbage collected it is
taken care of. However, it was added because you couldn't move/delete
the database file if a connection was still open, and a connection stays
open as long as there is a statement that isn't finalized. Seamonkey
specifically needed this.
> 3. How many times is handleResult() expected to be called, when the
> query yields 10, 100, 10000 rows? Is ResultSet.getNextResult() synchronous?
I think defining an amount would be a poor idea. The back-end could
chunk it in time, or it could chunk it some other way. I think that
decision should be decided at implementation time and be hidden from the
interface.
As a result, I don't think ResultSet::getNextResult is synchronous in
terms of database access. I'm envisioning the back-end to dump the data
into an object that is passed back to the callback. If that's not what
you meant by synchronous, you'll have to elaborate more :)
Cheers,
Shawn
(1) executeAsync has been moved onto mozIStorageStatement. You can now
use a prepared statement that has parameters already bound with this API.
(2) As a result of (1), the fancy binding of parameters magic has been
removed. We might look at that after this is done for Firefox 3.1, but
it will likely make Firefox 4.
(3) Code samples updated accordingly.
Questions and feedback welcome :)
Cheers,
Shawn
Either way was good to me, so there was no concern :) Nevertheless, I
like the made choice more than the other.
>> 2. When is mozIStorageStatement.finalize() expected to be called, if
>> at all?
> You don't have to ever call it. Internally, it's called in the
> statements destructor so when your js var is garbage collected it is
> taken care of. However, it was added because you couldn't move/delete
> the database file if a connection was still open, and a connection stays
> open as long as there is a statement that isn't finalized. Seamonkey
> specifically needed this.
After the decision to keep creating statements old way, this concern is
void. Statement object will remain to be normally accessible from js.
But this answers brings us back to the question 1. What we are trying to
solve? Please correct me, but I've read in bugzilla, that lack of
asynchronous API in storage makes browsing experience really ugly, if
the profile resides on a NFS mount. If this case falls into the range of
issues being addressed, we could probably reconsider this decision.
>> 3. How many times is handleResult() expected to be called, when the
>> query yields 10, 100, 10000 rows? Is ResultSet.getNextResult()
>> synchronous?
> I think defining an amount would be a poor idea. The back-end could
> chunk it in time, or it could chunk it some other way. I think that
> decision should be decided at implementation time and be hidden from the
> interface.
>
> As a result, I don't think ResultSet::getNextResult is synchronous in
> terms of database access. I'm envisioning the back-end to dump the data
> into an object that is passed back to the callback. If that's not what
> you meant by synchronous, you'll have to elaborate more :)
I understand the word "synchronous" when used with a method as follows.
If the method returns after the job is done, it is synchronous. If the
job continues to run in a different thread/process, when the method
returns, it is asynchronous.
It is now crystal clear from your answer, that the result set is deemed
to be created before handelResult() is invoked, so
ResultSet::GetNextResult() is a synchronous method (which fetches
products of asynchronous database access :)).
Now I can rephrase my earlier concerns on how this approach can handle
large result sets, like browsing history:
* When working the large amounts of data (like browsing history), this
may result is excessive memory consumption and allocation overhead. If
the caller plans to store all the results in its own storage, the
storage will be allocated 2 times: in the back-end, and by the caller.
* Although the data is fetched row by row, nothing will be displayed,
until the last row is fetched.
* If there are errors in the end of a large list, nothing will be
displayed at all. F.e., if there is network failure on the row #9999 of
10000, the back-end will invoke handleError(), and all fetched data will
be lost. The data can be recovered in handleError() as a workaround, but
this will violate Cohesion, right?
With all that said, we are also going to have an accessible statement,
which is a mozIStorageValueArray instance. So ResultSet is completely
redundant in this case, because the result set can be normally iterated
with mozIStorageStatement.executeStep(). Correct?
My proposal is that we add a executeStepAsync(). When the caller knows,
that the result set may be huge, it will use this and expect to be
called once for each row.
To enforce Functional Cohesion, Callback will need 2 separate methods
for this two async calls. They may be handleResultSet (for all) and
handleResult (for one). Both should receive mozIStorageStatement as a
parameter.
Thanks,
--
Sergey Yanovich
> Now I can rephrase my earlier concerns on how this approach can handle
> large result sets, like browsing history:
>
> * When working the large amounts of data (like browsing history), this
> may result is excessive memory consumption and allocation overhead. If
> the caller plans to store all the results in its own storage, the
> storage will be allocated 2 times: in the back-end, and by the caller.
This already happens with most other database API's, so I don't think
it's a big deal. Also, I don't expect most situations to be handling
large chunks of data. However, if this does turn out to be a problem,
we can always have the background task providing the data "slow down" as
it were, and wait for existing data to be processed before generating more.
> * Although the data is fetched row by row, nothing will be displayed,
> until the last row is fetched.
I'm not sure how you've come to that conclusion, but I don't believe it
to be true.
> * If there are errors in the end of a large list, nothing will be
> displayed at all. F.e., if there is network failure on the row #9999 of
> 10000, the back-end will invoke handleError(), and all fetched data will
> be lost. The data can be recovered in handleError() as a workaround, but
> this will violate Cohesion, right?
I don't see why we couldn't give the caller the data we have and then
invoke handleError. Also, many calls could have been made to
handleResult before we even hit that error.
> With all that said, we are also going to have an accessible statement,
> which is a mozIStorageValueArray instance. So ResultSet is completely
> redundant in this case, because the result set can be normally iterated
> with mozIStorageStatement.executeStep(). Correct?
We won't be exposing this though. Internally, in the background we'll
be calling it to generate ResultSet objects, but this shouldn't be
exposed to handleResult. The statement would already be consumed when
generating the ResultSet object. I *really* don't see the benefit of
exposing a mozIStorageStatement object to handleResult.
Cheers,
Shawn
My understanding was that handleResult() is called once on success, and
handleError() is called once on failure. If this is not correct, then
there is a new problem. I need to know when everything is fetched, cause
I will eventually need to free statement/close connection.
>> * If there are errors in the end of a large list, nothing will be
>> displayed at all. F.e., if there is network failure on the row #9999
>> of 10000, the back-end will invoke handleError(), and all fetched data
>> will be lost. The data can be recovered in handleError() as a
>> workaround, but this will violate Cohesion, right?
> I don't see why we couldn't give the caller the data we have and then
> invoke handleError. Also, many calls could have been made to
> handleResult before we even hit that error.
I thought that callback is invoked just once. Otherwise it is not clear,
when the processing is complete.
>> With all that said, we are also going to have an accessible statement,
>> which is a mozIStorageValueArray instance. So ResultSet is completely
>> redundant in this case, because the result set can be normally
>> iterated with mozIStorageStatement.executeStep(). Correct?
> We won't be exposing this though. Internally, in the background we'll
> be calling it to generate ResultSet objects, but this shouldn't be
> exposed to handleResult. The statement would already be consumed when
> generating the ResultSet object. I *really* don't see the benefit of
> exposing a mozIStorageStatement object to handleResult.
Well, the statement is available to the client, whether we want it or
not. In addition, executeStep() is the current approach to enumerating
the query results. In other words, mozIStorageStatement is a complete
logical equivalent to the proposed ResultSet, and executeStep() is a
complete logical equivalent to the proposed getNextResult(). More than
that, executeStep() is synchronous just as getNextResult().
So we are back to question:
> 3. How many times is handleResult() expected to be called, when the query yields 10, 100, 10000 rows?
Now I see, that I didn't interpreted your answer correctly. My major
problem here is that I need to know when everything is fetched. What's
wrong with calling the callback for each fetched row? That's approach
for processing large result sets. For small result sets, we can have a
different function, which will return when everything is already
fetched. I propose to name them executeStepAsync() and executeAsync()
respectively, similar to how their synchronous analogs are called.
--
Sergey Yanovich
The callback can be called multiple times, with whatever the "natural"
batching order is from the database. Early implementations might well
use 1 as the batch size.
We could send a flag with handleResult to indicate that it's the last
result set, or use .handleComplete() call. I prefer the latter,
because it also lets us signal multiple errors using a similar model,
but it's not a strong preference.
Mike
If you really need to know when it's all done, we could add another
method that looks something like void handleCompletion();
> Well, the statement is available to the client, whether we want it or
> not. In addition, executeStep() is the current approach to enumerating
> the query results. In other words, mozIStorageStatement is a complete
> logical equivalent to the proposed ResultSet, and executeStep() is a
> complete logical equivalent to the proposed getNextResult(). More than
> that, executeStep() is synchronous just as getNextResult().
How is the statement available to the client? They don't get it in the
async callback, and the api doesn't state that the stmt that
executeAsync is called on is the one that's actually used. In fact, in
terms of implementation we can't use the same statement - we'll have to
clone it and transfer the bindings.
The difference between executeStep and getNextResult is that the former
has to read from disk, while the latter reads from memory.
> So we are back to question:
>> 3. How many times is handleResult() expected to be called, when the
>> query yields 10, 100, 10000 rows?
>
> Now I see, that I didn't interpreted your answer correctly. My major
> problem here is that I need to know when everything is fetched. What's
> wrong with calling the callback for each fetched row? That's approach
> for processing large result sets. For small result sets, we can have a
> different function, which will return when everything is already
> fetched. I propose to name them executeStepAsync() and executeAsync()
> respectively, similar to how their synchronous analogs are called.
I don't see how getting a callback for each row would solve your problem
of knowing when everything is done. Getting a callback for each row
will perform much slower than just running the call synchronously since
you have lots and lots of event proxying. Not only that, we'd still
have to copy the results over into memory for each row, otherwise the
background thread is blocked waiting for the data to be consumed on the
calling thread before it can step through the results again.
Additionally, having one API for "large" data sets and another for
"small" data sets is just asking for problems. If we can make one API
that satisfies both cases, we don't leave a footgun lying around for
developers to come across and play with. I am not yet convinced that
what is in the wiki now will perform poorly with "large" data sets.
I've gone ahead and clarified the wording for executeAsync (revision
71470 [1]):
void executeAsync([optional] in Callback aCallback);
Executes a prepared mozIStorageStatement and calls the callback (if
provided) when a result is returned. This statement is then reset, and
can be reused synchronously or asynchronously again immediately.
Cheers,
Shawn
On May 12, 11:50 am, Shawn Wilsher <sdwi...@mozilla.com> wrote:
> I'm looking for feedback on the proposed API (with sample code) for an
> asynchronous API for mozStorage. What I've got so far can be found here:http://developer.mozilla.org/en/docs/User_talk:Comrade693#Async_Stora...
void handleCompletion();
Called when there are now more results to be obtained. This is called
only once.
Cheers,
Shawn
[1]
http://developer.mozilla.org/en/docs/index.php?title=User_talk:Comrade693&oldid=71471#Callback
Cheers,
Shawn
> void executeAsync([optional] in Callback aCallback);
> Executes a prepared mozIStorageStatement and calls the callback (if
> provided) when a result is returned. This statement is then reset, and
> can be reused synchronously or asynchronously again immediately.
Mike Shaver wrote:
> The callback can be called multiple times, with whatever the "natural"
> batching order is from the database. Early implementations might well
> use 1 as the batch size.
Callback looks great now. Thanks for cooperative approach to the things.
Natural batching order is exactly the way I wished to see, because it's
the fastest.
The last remaining point is asynchronous statement creation. The
database validates the statement, and it needs to read metadata from
disks for that. If the database is over NFS mount, and the metadata is
not cached in memory, it may well be a lengthy operation, blocking UI.
Synchronous API has mozIStorageConenction::executeSimple(), which
doesn't expose the statement. We may have
void executeSimpleAsync(in AUTF8Strings aSql, [optional] in Callback
aCallback);
with a similar behavior.
Cheers,
--
Sergey Yanovich
Metadata is eagerly cached in memory, and IMO the additional API
complexity is not worth it. Do you find that your app is unreasonably
blocked in statement preparation? If an NFS mount goes away anything
can freeze, since we treat file: loading as synchronous (for
performance reasons as well as portability), so defending against it
in this one case with additional API baggage doesn't feel necessary to
me.
I'm an open-minded sort of guy, though -- do any other APIs provide an
async single-statement preparation API? Let's see which, if so, and
why.
Mike
Most of my application database access calls use parameters, so this is
not an immediate concern. However, I will be implementing mozIStorage*
binding for MySQL. Synchronous statement preparation on a remote server
will certainly be a problem, but I will handle that in my code.
> If an NFS mount goes away anything
> can freeze, since we treat file: loading as synchronous (for
> performance reasons as well as portability), so defending against it
> in this one case with additional API baggage doesn't feel necessary to
> me.
Correct, NFS normally takes care of making file operations asynchronous,
but sqlite is using fsync(), which is the cause of unresponsiveness. So
sqlite is the only place, where we need to handle this issue.
The first time, I've heard about async API was related to NFS profiles,
so I'm just trying to ensure this particular issue is addressed. If
that's considered a non-blocker, I am just fine with everything else :)
Grepping the tree for 'ecuteSimple' shows, it is heavily used to
create/drop tables/indexes. Index operations may take some time on large
tables, so switching them to async API will improve UI responsiveness,
and maybe Ts also.
Other than that, I am fine with the latest version. Thanks for taking
MHO into consideration.
Cheers,
--
Sergey Yanovich
Abstract Accounting Ltd.
http://aasii.org/
If you have a long-running statement that requires some sort of
UI-feedback (eg a progress meter), it'd be nice to know when to stop
that progress meter without having to poll.
Also, how does one abort a long-running statement?
Dan
> If you have a long-running statement that requires some sort of
> UI-feedback (eg a progress meter), it'd be nice to know when to stop
> that progress meter without having to poll.
Yeah, handleCompletion() should handle this case.
> Also, how does one abort a long-running statement?
I missed this point :). What about making handleResult() and
handleError() return boolean: true - go on, false (or error) - stop here?
--
Sergey Yanovich
Cheers,
Shawn
So we have a queue, and the current call returns false. Can backend just
release the rest of the queue with callbacks?
--
Sergey Yanovich
I wanted to say "without invoking callbacks". :)
Cheers,
Shawn
It was said in a earlier post, that callbacks will be called in the
natural batching order of the underlying database. AFAICT unless
nsICancelable is going to kill the background working thread with brute
force system calls, the cancellation will be happening at the same
points, where callbacks are invoked.
In other words, the proposed nsICancelable functionality may be
implemented by the client in the Callback, without complicating the API.
But either way will work.
Cheers,
--
Sergey Yanovich
As a result, we have two options:
(1) Keep track of what is bound internally. For statements that are
executed synchronously, this won't be much use, but we'll need it for
the async case. This could consume a lot of memory if someone is
binding a large blob too.
(2) Pass in an "object" of bound parameters like earlier proposals of
the API suggested.
I'm OK with either solution here, but I wanted to get the opinion of
other folks.
Cheers,
Shawn
[1] http://www.mail-archive.com/sqlite...@sqlite.org/msg33987.html
It appears that this is no longer neccessary. One of my assumptions
about thread safety turned out to be incorrect for sqlite.
Cheers,
Shawn
Cheers,
Shawn
1) How much information is going to be provided to
storageIStatementCallback.handleError()? The SQL error code and the
corresponding error string, for sure. If that's all, then perhaps they
could be passed as parameters and we could get rid of storageIError. On
the other hand, having an object which you can throw may also be useful.
2) The established nomenclature in existing interfaces uses "row", not
"tuple". See mozIStorageStatementRow, mozIStorageStatementWrapper.row.
"Row" is also used in descriptions of other attributes and methods.
3) Which methods of the callback are used by non-SELECT statements?
handleError() and handleCompletion(), I assume?
4) I am not convinced of the value of separating the iterator
(storageIResultSet) and a result (storageITuple). The iterator could
easily provide the results, too. I know this approach causes horrible
overloading of the statement object in case of synchronous statements.
However, storageIResultSet doesn't share all the other functionality of
a statement, it only handles the results. Separating things further
seems like going from one extreme to the other.
The practical consequence for the consumers is a little less concise
code, because of an additional variable. Also, you have to choose
between assigning that variable inside |while|, which is considered a
bad practice by some people, and calling getNextTuple() in two different
places. [1]
Ideally, handleResult() would take a JS array of storageITuple's and
storageIResultSet wouldn't be needed.
I would go further and only return nsresult, at least for the beginning.
String manipulations are expensive, especially GC string manipulations
[1].
> 2) The established nomenclature in existing interfaces uses "row", not
> "tuple". See mozIStorageStatementRow, mozIStorageStatementWrapper.row.
> "Row" is also used in descriptions of other attributes and methods.
++here. And do we really need 'storageITutle'? Maybe we can just extend
mozIStorageValueArray to have those nice two methods?
> 4) I am not convinced of the value of separating the iterator
> (storageIResultSet) and a result (storageITuple). The iterator could
> easily provide the results, too. I know this approach causes horrible
> overloading of the statement object in case of synchronous statements.
> However, storageIResultSet doesn't share all the other functionality of
> a statement, it only handles the results. Separating things further
> seems like going from one extreme to the other.
>
> The practical consequence for the consumers is a little less concise
> code, because of an additional variable. Also, you have to choose
> between assigning that variable inside |while|, which is considered a
> bad practice by some people, and calling getNextTuple() in two different
> places. [1]
I had the same concerns, and was even proposing to use ExecuteStep()
here. But ExecuteStep() is begging to be replaced, and we are discussing
the new API. So we can just have the things right the first time.
> Ideally, handleResult() would take a JS array of storageITuple's and
> storageIResultSet wouldn't be needed.
We cannot communicate JS array via XPCOM to native callers, can we?
--
Sergey Yanovich
> ++here. And do we really need 'storageITutle'? Maybe we can just extend
> mozIStorageValueArray to have those nice two methods?
Can't - see Bug 435994 Comment 5 [1] for the detailed story as to why.
> We cannot communicate JS array via XPCOM to native callers, can we?
Not that I'm aware of.
Cheers,
Shawn
Error-case performance isn't something I think we need to be too
concerned with; if the error checking is in your hot path, you have
problems that the API isn't going to help you with.
Mike
> 2) The established nomenclature in existing interfaces uses "row", not
> "tuple". See mozIStorageStatementRow, mozIStorageStatementWrapper.row.
> "Row" is also used in descriptions of other attributes and methods.
It seems like vlad would prefer row over tuple as well. I'm fairly
indifferent about this.
> 3) Which methods of the callback are used by non-SELECT statements?
> handleError() and handleCompletion(), I assume?
That is likely the case.
> 4) I am not convinced of the value of separating the iterator
> (storageIResultSet) and a result (storageITuple). The iterator could
> easily provide the results, too. I know this approach causes horrible
> overloading of the statement object in case of synchronous statements.
> However, storageIResultSet doesn't share all the other functionality of
> a statement, it only handles the results. Separating things further
> seems like going from one extreme to the other.
I really don't like the idea of overloading an objects meaning like
that. I know we do that already with mozIStorageStatement, but it
doesn't mean we should continue to do that if it can be avoided.
> The practical consequence for the consumers is a little less concise
> code, because of an additional variable. Also, you have to choose
> between assigning that variable inside |while|, which is considered a
> bad practice by some people, and calling getNextTuple() in two different
> places. [1]
Not sure I'm buying this argument. What's wrong with:
handleResult: function(aResultSet)
{
for (let tuple = aResultSet.getNextTuple(); tuple;
tuple = aResultSet.getNextTuple()) {
// Stuff
}
}
or, even cleaner:
handleResult: function(aResultSet)
{
let tuple;
while ((tuple = aResultSet.getNextTuple()) {
// Stuff
}
}
Like any good rule, there are always exceptions to it, and in this case,
I don't see why assigning in while is bad.
> Ideally, handleResult() would take a JS array of storageITuple's and
> storageIResultSet wouldn't be needed.
That wouldn't work well for native code.
Cheers,
Shawn
We can, and in cases where the win is big enough should, specialize
for different consumer languages.
But an array of interface pointers isn't too hard to work with from
native code, IIRC, as long as it's an in-param.
Mike
> But an array of interface pointers isn't too hard to work with from
> native code, IIRC, as long as it's an in-param.
That involves passing the data and the size? Does that turn into a JS
array in JS land, or no?
Cheers,
Shawn
In our case, specializing for JS in mozStorage will add
mozStorage-over-JS dependency, right? Is that worth it?
--
Sergey Yanovich
It's already built after JS and XPConnect, and I don't think it's a
design goal at all to have parts of toolkit work without JS and
XPConnect present.
Mike