Hi Alexander,
(For context, Alexander's working on the RBTools API work, started originally as a UCOSP project in 2010 and being hopefully completed for GSoC this year. See the proposal at
http://reviews.reviewboard.org/r/2458/diff/)
This design looks like a good start and I like that server-side introspection plays a part in this. I have some questions and comments about the design though, before much work goes into it. A lot of this is based on working with developing async APIs for other software. The e-mail is long, but don't let that scare you :) Just a braindump of thoughts with some code samples.
1) Method names for getters
In your example, you access "review_requests()." This is a bit odd for a function name. It'd be better to rename it to "get_review_requests". In fact, I'd much prefer any function that's going to fetch information be prefixed with "get_".
However, "review_requests" makes perfect sense as an attribute name. Below I talk a bit about a proposal for handling both async and sync methods. I think if you had an attribute wrapper for each get_* that fetches lists of resources, that would be awesome. So, "resource.get_review_requests()" and "resource.review_requests". It'll make more sense later I hope.
2) fire()
I think it is important to have something that executes the request and takes a callback for the result. I'm not sure of the current design of having another object with a method that must be called, and want to get your opinions on my counter-proposal.
Rather than certain functions returning a Request object, which the caller then must call fire() on, why don't the methods themselves take the callback and immediately execute the query? This would look like:
def on_get_root_done(root):
root.get_review_requests(on_get_review_requests_done)
factory.get_root(on_get_root_done)
Note that the callback is the parameter to the getter, instead of on a fire() method provided by an object returned by the getter. It's more clear and natural, as you know that function will fetch the results, rather than fetching something that isn't a result but is instead a proxy to the results.
3) Creating and saving resources.
Looking at the create() call, I sort of want a better way to prepare a request before sending to the server. Sort of like the attribute-based updates you show. Being able to create an instance of a review request, set the data there, and then push to the server would be nice.
I think what I'm thinking is something different from the create/update you have right now, which takes parameters and does a push. Instead, maybe you'd do:
def on_get_review_requests_done(review_requests):
review_request = review_requests.create()
review_request.summary = "..."
review_request.repository = repository_id
review_request.save(on_review_request_saved)
So create() wouldn't immediately create, but rather would create a new object that could be saved. save() would be the new replacement for the current create().fire() and update().fire().
This mirrors Django's own way of handling objects, for the most part.
Updating would work similarly. Set some attributes, then do save().
The advantage here is that the caller doesn't have to distinguish between creating new objects and updating them. A function may take a review request and populate it and save it, without having to know if we've already created this object or if it's an existing object.
I think it'd be perfectly fine for create() to actually be able to take some default parameters to make the initial setup easier. So the above could be:
review_request = review_requests.create(summary="...", repository=repository_id)
review_request.save()
Implementation would be straight-forward. It would just do something like:
def create(self, **kwargs):
resource = Resource(...)
resource.__dict__.update(**kwargs)
return resource
4) Pagination of resources.
One thing that'll be important is iterating over objects in a list properly without blocking. They're paginated in the payload, instead of being one large list of results. This means if you do:
for review_request in review_requests:
...
It would either not be able to return all results (only the first set fetched), or it would synchronously have to fetch new results.
The solution I believe we need to investigate is to use Python coroutines with generators. These, basically, will allow iteration or yielding of values coming from asynchronous queries. Remember that our goal is to not stall a UI mainloop by blocking the main/only thread, so as long as we can achieve that, we're in good shape.
I'm no expert in coroutines, but there are a couple things to look at:
http://www.python.org/dev/peps/pep-0342/ - The introduction of coroutines into Python (sadly, Python 2.4 does not include it, but 2.5+ does)
http://code.activestate.com/recipes/577129-run-asynchronous-tasks-using-coroutines/ - Some examples of different coroutines
This is all something to keep in mind when developing. However, I think it's reasonable that the first pass at this API just blocks and fetches the extra pages of resources synchronously for now. Just as long as we make a note of it (a big comment would be good, talking about what needs to be done).
Potentially, we could add a method that calls a callback for each item in the list as well. Not as pythonic, but would allow for truly asynchronous iteration across large numbers of results.
5) Synchronous vs. Asynchronous
I'm not entirely sure if this is touched upon in your examples or not, but I think it's important we allow for both sync and async usage. The reason being that complex apps (GUI apps largely) will need to be async (so as not to block the UI event loop), yet simple scripts (of which I expect there will be many) don't need to worry about blocking.
So the design should ideally account for both.
The functions that take a callback could be made to be smart and could handle both. Basic logic:
def get_something(self, done_func=None, force_async=False):
# Create the initially empty resource that will be filled in upon fetching.
resource = Resource()
if done_func or force_async:
# Handle this asynchronously and later call done_func with the result.
else:
# Handle this synchronously.
return resource
The caller would immediately have a resource object (which for async calls would be unloaded by default -- we could have a flag indicating whether it's loaded). If the caller provided a done slot, it'd run in async mode, calling the done slot when done. Otherwise, it'd be a synchronous call.
That way, both of these work:
def on_review_request_saved():
print "Saved!"
review_request.save(on_review_request_saved)
And:
review_request.save()
print "Saved!"
Note force_async. If there was some way to be notified on that returned object when it was loaded, then a caller could potentially load the resource without having to provide a done function.
Why is this useful? Well, you may have some wrapping code doing the complex work of fetching some resources, but it doesn't care when they're loaded. The caller cares, but not the wrapper. Maybe there's a debug console in a graphical app that's showing the resources as they load. The debug console could be passed these resources, listen itself for when they load, and display something. No need for the wrapper to be involved, aside from returning the resource. All it would have to do is pass force_async=True. Without that, it'd need to set up a potentially empty function to pass in order to fetch asynchronously.
(Hope that made sense -- It's late and I may be rambling.)
6) Getting individual objects
You asked how we should get a particular resource directly. I think there are two ways.
The first way is to have a get() function on a resource list (like review_requests above) that takes an ID and fetches it. Should be straight-forward. It assumes you have the list, and the ID.
The second is URI templates. I think what you have could work, but there are two changes I would make. "get_" prefix, and requiring keyword args for the parameters (in order to know what to fill in where). So:
factory.get_screenshot(on_get_screenshot_done, review_request_id=review_request_id, screenshot_id=screenshot_id)
Now that said, URI templates only get us to certain items. You can't fetch *anything* from here. But it might be enough.
7) Singleton resources
Something you'll come across and have to deal with are special kinds of resources. There are singleton resources (like review_requests/draft) which are not a list and are not ID-based. They either exist at that location or they don't, and they're one object rather than a list of objects. If this is flexible, it should be able to handle it fine, but that's something to be aware of.
8) "Factory"
While technically ResourceFactory is indeed a factory, I think it's a lot more clear if we use the name "ReviewBoardServer." It reads a lot nicer. server.get_root(), etc. In fact, I think it could even have the root pre-loaded and built-in, so instead of the extra step of getting the root, you could do: server.get_review_requests().
Thoughts?
I'd love to hear thoughts from other people on this list, too.
Christian
--
Christian Hammond -
chi...@chipx86.comReview Board -
http://www.reviewboard.org
VMware, Inc. -
http://www.vmware.com