I'd like some more thoughts on how best to do completion APIs.
A concrete example: We will emit an event when we need to locate a
certificate for a server. We want to allow a user of the library to
go off and do IO to fetch it, and then tell us when they selected the
certificate.
In traditional Node.js style, this would look something like:
tls.on("want-certificate", function(done) {
    doSomethingAsync(tls.sni_name(), function(err, cert) {
        if (err) {
            return done(err);   // propagate the lookup failure
        }
        done(null, cert);       // success: hand the certificate back
    });
});
If done is called with a non-null first argument, that argument
indicates an error.
So, how best to do this in C?
I see two approaches.
The first is that for each event type, we have a corresponding function
that completes it, for example:
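/* Handler for SELENE_EVENT_WANT_CERTIFICATE. */
static void find_certificate(selene_t *s, void *baton)
{
    selene_cert_t *cert = find_cert_by_name(s, baton);
    /* Completing the event lets the state machine continue. */
    selene_have_certificate(s, SELENE_SUCCESS, cert);
}
static void init(selene_t *s)
{
    selene_handler_set(s, SELENE_EVENT_WANT_CERTIFICATE, find_certificate, NULL);
}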
So, once selene_have_certificate is called, it would kick off the
state machine to continue processing.
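To make the first approach concrete, here is a minimal self-contained sketch. It assumes a toy context type, and every `demo_*` name is hypothetical, standing in for `selene_handler_set()` and `selene_have_certificate()`; it is not Selene's actual API. The handler can either complete immediately or stash the context and complete after its own IO finishes; either way the application only has to keep the context pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy types; these are not Selene's real declarations. */
typedef enum { DEMO_WANT_CERT, DEMO_HAVE_CERT } demo_state_t;

typedef struct demo_ctx demo_ctx_t;
typedef void (*demo_want_cert_cb)(demo_ctx_t *ctx, void *baton);

struct demo_ctx {
    demo_state_t state;
    const char *cert;              /* set by the completion function */
    demo_want_cert_cb want_cert;   /* registered event handler */
    void *want_cert_baton;
};

/* Plays the role of selene_handler_set(..., SELENE_EVENT_WANT_CERTIFICATE, ...). */
static void demo_handler_set(demo_ctx_t *ctx, demo_want_cert_cb cb, void *baton)
{
    ctx->want_cert = cb;
    ctx->want_cert_baton = baton;
}

/* Plays the role of selene_have_certificate(): the one completion call for this event. */
static void demo_have_certificate(demo_ctx_t *ctx, const char *cert)
{
    assert(ctx->state == DEMO_WANT_CERT);
    ctx->cert = cert;
    ctx->state = DEMO_HAVE_CERT;   /* a real machine would continue from here */
}

/* The library reaches the point where it needs a certificate. */
static void demo_emit_want_cert(demo_ctx_t *ctx)
{
    ctx->state = DEMO_WANT_CERT;
    ctx->want_cert(ctx, ctx->want_cert_baton);
}

/* Handler that already knows the answer (the baton is the certificate). */
static void find_cert_now(demo_ctx_t *ctx, void *baton)
{
    demo_have_certificate(ctx, (const char *)baton);
}

/* Handler that defers: it remembers the context and completes after its own IO. */
static void find_cert_later(demo_ctx_t *ctx, void *baton)
{
    *(demo_ctx_t **)baton = ctx;
}
```

Because `demo_have_certificate()` here only records the result and changes state, it is safe to call from inside the handler as well as later.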
I tend to think this is error-prone in some ways. An alternative would
be to have find_certificate be passed the exact function to call:
typedef void (selene_completion_cb)(selene_t *ctxt,
                                    void *baton,
                                    selene_error_t *err);
static void find_certificate(selene_t *s, void *baton,
                             selene_completion_cb *completion,
                             void *completion_baton)
{
    selene_cert_t *cert = find_cert_by_name(s, baton);
    selene_set_certificate(s, cert);
    /* Report the outcome; SELENE_SUCCESS signals no error. */
    completion(s, completion_baton, SELENE_SUCCESS);
}
Passing the completion function is more analogous to the done function
in JavaScript, and it seems to enable better reporting of an error
state. However, it feels like it would burden users of the library
with keeping this function pointer and the baton around in their
own structs.
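A rough sketch of what the second approach asks of the application, using hypothetical `demo_*` types in place of `selene_t` and `selene_error_t` (illustrative only, not Selene's API). The handler cannot finish yet, so it has to copy the context, the completion pointer and the completion baton into a struct of its own until the IO completes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for selene_t and selene_error_t. */
typedef struct demo_ctx { int done; int failed; const char *cert; } demo_ctx_t;
typedef struct demo_error { const char *msg; } demo_error_t;

/* Same shape as the proposed selene_completion_cb typedef. */
typedef void (demo_completion_cb)(demo_ctx_t *ctx, void *baton, demo_error_t *err);

/* Library side: the only function it would ever pass as `completion`. */
static void demo_cert_completion(demo_ctx_t *ctx, void *baton, demo_error_t *err)
{
    (void)baton;
    ctx->done = 1;
    ctx->failed = (err != NULL);
}

/* Application side: state that must outlive the handler call. */
struct pending_lookup {
    demo_ctx_t *ctx;
    demo_completion_cb *completion;
    void *completion_baton;
};

/* Handler: cannot answer yet, so it copies everything into the app's struct. */
static void find_certificate(demo_ctx_t *ctx, void *baton,
                             demo_completion_cb *completion,
                             void *completion_baton)
{
    struct pending_lookup *p = baton;
    p->ctx = ctx;
    p->completion = completion;
    p->completion_baton = completion_baton;
}

/* Later, when the application's own IO finishes. */
static void lookup_finished(struct pending_lookup *p, const char *cert,
                            demo_error_t *err)
{
    p->ctx->cert = cert;   /* stands in for selene_set_certificate() */
    p->completion(p->ctx, p->completion_baton, err);
}
```

In this sketch `p.completion` always ends up pointing at `demo_cert_completion`; the library has no other function to hand out.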
We still need a 'set' function for each want event type, if we don't
want to just cast things to void* everywhere. One alternative would be
to change the completion callback to a signature like:
completion(s, completion_baton, SELENE_SUCCESS, (void*)cert);
I'm not really happy with any of these approaches right now.
Thoughts?
Thanks,
Paul
Careful here. When the callback is invoked, the cert might be ready
(as in your example above). The callback is already within the
selene_start() state machine processing, and if you "continue" from
within selene_have_certificate() then you're re-entering that loop.
I suspect you do not want a re-entrant state machine processor.
If you did, recognize that find_certificate() above wouldn't even get
a chance to exit.
It seems that you'd just hang the cert off the selene context,
signaling "readiness", and when you dropped back to the state machine,
it could then move from PENDING_CERTIFICATE to whatever the next state
would be. (making that name up, of course) ... or heck, the
selene_have_certificate() could advance the state itself, but just not
loop.
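One way to follow this advice, sketched with hypothetical `demo_*` names and made-up states (`ST_PENDING_CERT` and friends are illustrative, not Selene's): the completion function hangs the cert off the context and tries to advance, but a `running` flag turns a nested call into a no-op, so the outer loop picks up the change when the handler returns. The same function therefore works whether the handler completes synchronously or much later.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical states; the names are invented for illustration. */
typedef enum {
    ST_START,         /* nothing done yet */
    ST_PENDING_CERT,  /* waiting for the app to supply a certificate */
    ST_SEND_HELLO,    /* would queue handshake bytes for the app to send */
    ST_DONE
} demo_state_t;

typedef struct demo_ctx demo_ctx_t;
typedef void (*demo_want_cert_cb)(demo_ctx_t *ctx, void *baton);

struct demo_ctx {
    demo_state_t state;
    int running;        /* nonzero while demo_run() is on the stack */
    int nested_calls;   /* re-entry attempts that were turned into no-ops */
    const char *cert;   /* the certificate "hung off" the context */
    demo_want_cert_cb want_cert;
    void *want_cert_baton;
};

static void demo_run(demo_ctx_t *ctx)
{
    int stop = 0;

    if (ctx->running) {
        /* Called from inside a handler: the outer loop will see the new
         * state once the handler returns, so do not re-enter. */
        ctx->nested_calls++;
        return;
    }
    ctx->running = 1;
    while (!stop) {
        switch (ctx->state) {
        case ST_START:
            ctx->state = ST_PENDING_CERT;
            ctx->want_cert(ctx, ctx->want_cert_baton); /* may complete synchronously */
            break;
        case ST_PENDING_CERT:
            if (ctx->cert == NULL)
                stop = 1;                 /* wait for demo_have_certificate() */
            else
                ctx->state = ST_SEND_HELLO;
            break;
        case ST_SEND_HELLO:
            ctx->state = ST_DONE;
            break;
        case ST_DONE:
            stop = 1;
            break;
        }
    }
    ctx->running = 0;
}

/* Completion: record readiness, then advance unless already inside demo_run(). */
static void demo_have_certificate(demo_ctx_t *ctx, const char *cert)
{
    ctx->cert = cert;
    demo_run(ctx);
}

static void cert_now(demo_ctx_t *ctx, void *baton)
{
    demo_have_certificate(ctx, (const char *)baton);
}

static void cert_later(demo_ctx_t *ctx, void *baton)
{
    *(demo_ctx_t **)baton = ctx;   /* complete after the app's own IO */
}
```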
> I tend to think this is error prone in some ways, an alternative would
> be to have find_certificate be passed an exact function to call:
>
> typedef void (selene_completion_cb)(selene_t *ctxt,
> void *baton,
> selene_error_t *err);
This seems a bit silly. The function will always be the same thing,
and there is no reason to make it anonymous.
>...
> Passing the completion function is more analogous to the done function
> in javascript, and it seems to enable better reporting of an error
> state. It feels like it would create a burden on users of the library
> however to keep this function pointer and the baton around in their
> own structs.
You've only anonymized the function. It has no greater potential for
error reporting than the first approach.
>
> We still need a 'set' function for each want event type, if we don't
> want to just cast things to void*s everywhere. That would be one
> alternative, having the completion callback change to a signature
> like:
> completion(s, completion_baton, SELENE_SUCCESS, (void*)cert);
I would recommend the N callback setters, each with its own rigid
prototype. It is easy boilerplate code to write, you only need to
write it once, and it comes with little maintenance overhead. But you
get stronger type checks, and it is easier to search for callers of a
specific event type.
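A sketch of one such setter plus its typed prototype and typed completion function, assuming hypothetical `demo_*` names; a real library would repeat the trio for each event type. A handler with the wrong signature then draws a compiler diagnostic instead of slipping through a `void *` cast.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy types; these are not Selene's real declarations. */
typedef struct demo_ctx demo_ctx_t;
typedef struct demo_cert { const char *subject; } demo_cert_t;

/* Rigid prototype for the want-certificate event. */
typedef void (*demo_want_cert_cb)(demo_ctx_t *ctx, const char *sni_name, void *baton);

struct demo_ctx {
    demo_want_cert_cb want_cert;
    void *want_cert_baton;
    const demo_cert_t *cert;
};

/* Setter for this event only; it accepts nothing but a demo_want_cert_cb. */
static void demo_set_want_cert_handler(demo_ctx_t *ctx, demo_want_cert_cb cb, void *baton)
{
    ctx->want_cert = cb;
    ctx->want_cert_baton = baton;
}

/* Typed completion for this event: no void* cast needed for the result. */
static void demo_have_certificate(demo_ctx_t *ctx, const demo_cert_t *cert)
{
    ctx->cert = cert;
}

/* Library side: emit the event with its typed arguments. */
static void demo_emit_want_cert(demo_ctx_t *ctx, const char *sni_name)
{
    ctx->want_cert(ctx, sni_name, ctx->want_cert_baton);
}

/* Application handler: the baton is a toy one-entry "certificate store". */
static void find_certificate(demo_ctx_t *ctx, const char *sni_name, void *baton)
{
    const demo_cert_t *store = baton;
    (void)sni_name;   /* a real lookup would key on the SNI name */
    demo_have_certificate(ctx, store);
}
```

Grepping for `demo_set_want_cert_handler` then finds every caller interested in this one event.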
>...
It seems that you need to clarify the async completion's operation
with respect to selene_start(). For example, if find_certificate() is
going to take a minute to complete, will selene_start() exit because
it has nothing to do? Then the application is expected to call it
again, once the app has completed certain things (like calling
selene_have_certificate()). IOW, I'm wondering how similar
selene_start() is to serf_context_run() or pc_channel_run_events()?
If selene_start() does NOT exit, then what is it doing to block? And
if it blocks, then are you assuming multi-threaded operation?
Naturally, if locating the certificate requires (say) some UI, then
the application may well choose to be multi-threaded, but if
selene_start() does not exit, then the work of callbacks would almost
seem to require multi-threading for any kind of asynchronous
operation. An alternative is that selene_start() exits, the app works
on the stuff callbacks have flagged, and when ready, it jumps back
into selene_start().
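The "exit and jump back in" alternative could look roughly like this: a hypothetical `demo_start()` advances as far as it can and returns a status naming what it is waiting for, and the application does that work before calling again. The names and statuses are illustrative assumptions, not Selene's API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical statuses and context; not Selene's API. */
typedef enum { DEMO_NEED_CERT, DEMO_DONE } demo_status_t;

typedef struct {
    const char *cert;   /* filled in by the application between calls */
    int entries;        /* how many times demo_start() was entered */
} demo_ctx_t;

/* Advance as far as possible, then return what the machine is waiting for. */
static demo_status_t demo_start(demo_ctx_t *ctx)
{
    ctx->entries++;
    if (ctx->cert == NULL)
        return DEMO_NEED_CERT;   /* nothing more to do until the app acts */
    return DEMO_DONE;
}

/* Hypothetical application lookup standing in for real IO. */
static const char *app_fetch_cert(void)
{
    return "cert";
}

/* The application drives: call in, do the flagged work, call in again. */
static int app_drive(demo_ctx_t *ctx)
{
    demo_status_t st;

    while ((st = demo_start(ctx)) != DEMO_DONE) {
        if (st == DEMO_NEED_CERT)
            ctx->cert = app_fetch_cert();
    }
    return ctx->entries;
}
```

Here no callback runs inside the machine at all; the cost is that the application must call back in after every piece of work it finishes.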
Now... with all that said, I would ask: why does certificate location
need to be asynchronous in the first place? If the selene state
machine is blocked, pending that cert, then why does it need to be
asynchronous?
Let's say that the actual cert location *is* asynchronous... well,
find_certificate() could launch that, and then block pending its
completion, and then return the cert to selene. Again: does selene
have something to do which mandates the callback operate
asynchronously?
I'm not familiar enough with the underlying operations to know the
details here. Just working through the mechanics and flows. If selene
*does* have work to do while that cert is being fetched, that seems to
imply multiple states within the machine that are being processed. One
that continues to function, and one state blocked waiting for that
cert. If multiple states *are* inside of selene, then any kind of
asynchronous callback may need a baton to indicate which state is
associated with the completion. (this would be akin to a machine
processing N sockets, and the completion indicating <this> socket is
ready to advance). You were concerned about apps managing a
completion_cb/baton pair, so I call this out because (depending on
selene's internals w.r.t. states) you may already need apps to do
that.
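If several operations can be pending inside one context, the completion needs to say which one finished. A minimal sketch, assuming the library hands out a small integer token per pending operation; this is a hypothetical design with invented `demo_*` names, not something Selene is known to do.

```c
#include <assert.h>

#define DEMO_MAX_PENDING 4

/* Hypothetical: a context with several operations in flight at once. */
typedef struct {
    int pending[DEMO_MAX_PENDING];   /* 1 while operation i awaits the app */
    int completed;                   /* how many operations have finished */
} demo_ctx_t;

/* Library side: start an operation and return the token the app must keep. */
static int demo_begin_op(demo_ctx_t *ctx)
{
    for (int i = 0; i < DEMO_MAX_PENDING; i++) {
        if (!ctx->pending[i]) {
            ctx->pending[i] = 1;
            return i;
        }
    }
    return -1;   /* no free slot */
}

/* The app hands the token back; only that operation advances. */
static int demo_complete_op(demo_ctx_t *ctx, int op)
{
    if (op < 0 || op >= DEMO_MAX_PENDING || !ctx->pending[op])
        return -1;   /* unknown or already-completed token */
    ctx->pending[op] = 0;
    ctx->completed++;
    return 0;
}
```

The token plays the same role as a completion baton: the application has to store it alongside its own request state either way.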
Cheers,
-g
My wording was confusing here: Selene itself is just a state
machine; it doesn't incorporate any IO loop.
This means calling back into Selene may advance the state machine, but
the user still will need to do IO operations themselves.
>> I tend to think this is error prone in some ways, an alternative would
>> be to have find_certificate be passed an exact function to call:
>>
>> typedef void (selene_completion_cb)(selene_t *ctxt,
>> void *baton,
>> selene_error_t *err);
>
> This seems a bit silly. The function will always be the same thing,
> and there is no reason to make it anonymous.
Agreed, after more thought.
As I mentioned on IRC, Selene makes no system or socket calls itself;
it's purely a state machine with notifications that say 'hey, you should
send this data somewhere'. My own confusion has been about whether
it's okay to re-enter the state machine from a callback saying we
finished an operation on behalf of Selene, or whether it would make the
most sense in the long run to have a selene_run()-type method that
would try to advance the state machine. Having a selene_run() method
seems somewhat like the difference between edge- and level-triggering
in epoll. I'm concerned that in high-concurrency servers it would mean
calling a method on every connection without any real need, which has
led me to want to make things as edge-triggered as possible and have
the completion methods re-enter the state machine.
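To put numbers on the edge/level concern, here is a toy model with hypothetical `demo_*` names: a level-triggered pass calls into every connection's machine after any work finishes, while an edge-triggered completion call advances only the connection it belongs to.

```c
#include <assert.h>

#define DEMO_NCONN 1000

/* Hypothetical per-connection machine: just enough state to count work. */
typedef struct {
    int ready;      /* the app finished something this connection wanted */
    int advanced;   /* how many times the machine actually moved forward */
} demo_conn_t;

static long demo_run_calls;   /* total calls into any connection's machine */

/* Try to advance one connection; a no-op unless something is ready. */
static void demo_run(demo_conn_t *c)
{
    demo_run_calls++;
    if (c->ready) {
        c->ready = 0;
        c->advanced++;
    }
}

/* Level-triggered style: after some work finishes, poke every connection. */
static void level_triggered_pass(demo_conn_t *conns, int n)
{
    for (int i = 0; i < n; i++)
        demo_run(&conns[i]);
}

/* Edge-triggered style: the completion call advances exactly its connection. */
static void demo_complete(demo_conn_t *c)
{
    c->ready = 1;
    demo_run(c);
}
```

With 1000 connections and one of them ready, the level-triggered pass makes 1000 calls while the edge-triggered completion makes one.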
The certificate example is one of many future APIs for which I wanted
to follow the same pattern.
After more thought about how this would be used in the real world, I like
the first proposal: a public API method that is used to signal
completion of finding a certificate, and once that function is
called, Selene will continue processing down the state machine.
No worries!
>...
> My wording was confusing here: Selene itself is just a state
> machine; it doesn't incorporate any IO loop.
Right. I was just unclear on how the external interactions fit in with
the selene "loop". But you've clarified for me that a "loop" doesn't
exist, and that's helped frame my thinking.
>...
> As I mentioned on IRC, Selene makes no system or socket calls itself,
> it's purely a state machine with notifications that 'hey you should
> send this data somewhere'. My own confusion has been about whether
> it's okay to re-enter the state machine from a callback saying we
> finished doing an operation on behalf of Selene, or if it made the
> most sense in the long run to have a selene_run() type method, that
> would try to advance the state machine. Having a selene_run method
> somewhat seems like the difference between Edge and Level triggering
> in epoll, and I'm concerned with going with a method that in high
> concurrency servers would mean calling a method on every connection
> without any real need, which has led me to wanting to make it as edge
> triggered as possible, and make the completion methods re-enter the
> state machine.
Gotcha. For the C10k problem, when you're not The Loop, then that
approach does seem to make sense.
> The certificate example is one of many future APIs for which I wanted
> to follow the same pattern.
>
> After more thought of how this would be used in the real world, I like
> the first proposal, that is having a public API method that is used to
> signal completion of finding a certificate, and once that function
> is called, Selene will continue processing down the state machine.
I might suggest that the docstrings for the functions which manipulate
the state machine be clarified or grouped in some way: via the
namespace, a simple line in the docstring, a Doxygen group, ...
whatever. As a user, I'd like to know those functions which will "run"
the machine, as opposed to those which simply modify settings (for
example).
While selene functions are generally "zero execution time" because
they won't block, the callbacks may have a cost. So if I call
selene_foo(), I'd like to know that it may call back to a heavier
function in my application. You're also designing for zero-time across
your callbacks, so I think that should be made explicit in those
callbacks' docco.
Cheers,
-g