Google Protocol Buffers...

Glenn H Tarbox, PhD

Jul 8, 2008, 12:46:59 PM

to sage-f...@googlegroups.com, sage-d...@googlegroups.com

All,

I've done quite a bit of reading and checking on this since yesterday
and my conclusion is we should embrace Google Protocol Buffers (PB),
perhaps in general, but certainly for the time being with the OpenTick
and IB integration efforts.

There are 3 types of interactions with the server.

1. Conventional RPC with request/response. Essentially, the call
returns "immediately" from the server's perspective. The client
can block or implement asynchronous callbacks.
2. RPC with finite response - when the server itself needs to wait
for data from the IB application (because it needs to call IB
over the WAN), the RPC should return an id used to associate
callback-based responses with the request. The client
will need to implement an API to be called back with the data.
Sometimes it's a single callback, other times it's more involved
(e.g. search responses, portfolio updates, etc.).
3. RPC with infinite response - infinite might be a bit larger than
I mean, but here I'm talking about things like market data where
we're registering a general callback for the data we're
"subscribing" to.

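A minimal sketch of the three styles, in Python and with no networking at all. ToyServer and every method name on it are made up for illustration; nothing here is IB's, OpenTick's or PB's actual API.

```python
import itertools


class ToyServer:
    """Hypothetical in-process stand-in for the server (no sockets, no PB)."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._pending = {}      # request id -> callback (style 2)
        self._subscribers = {}  # symbol -> list of callbacks (style 3)

    # Style 1: conventional request/response, answered immediately.
    def ping(self):
        return "pong"

    # Style 2: return an id now, deliver a finite number of callbacks later.
    def request_search(self, query, callback):
        req_id = next(self._ids)
        self._pending[req_id] = callback
        return req_id

    def deliver(self, req_id, payload, last=False):
        # Called when data for an earlier request finally arrives.
        self._pending[req_id](req_id, payload)
        if last:
            del self._pending[req_id]

    # Style 3: open-ended subscription, e.g. market data.
    def subscribe(self, symbol, callback):
        self._subscribers.setdefault(symbol, []).append(callback)

    def publish(self, symbol, price):
        for callback in self._subscribers.get(symbol, []):
            callback(symbol, price)
```

The only real difference between styles 2 and 3 in this sketch is that a style 2 request is forgotten after its last response, while a subscription stays registered.
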
IB seems more straightforward so we'll start there. For the most part,
I think we can just map our API to IB's Java API for EWrapper and
EClientSocket. Support objects like Contract, ContractLeg etc. can
likely be modeled using data structures. Convenience methods we provide
for specific languages to help construct these objects are a separate
issue.

The OpenTick architecture is a bit more complicated in that we have a
lot of options and cython. The good news is that the api is less
complicated than IB's because we're just grabbing market data, not
executing trades or getting related information.

With OpenTick, nothing has changed... yet. We should wrap the API and
get the data into cython / python. However, with OpenTick, performance
is critical (particularly if we ever start doing things with Xasax). At
that stage, we might want to extend the lower layers to provide a PB
transport to remote clients. This would mean pure C/C++ from the source
to the final consumer. For now, this isn't critical, but if we start
really suckin' from the pipe, it could make a huge difference.

We should give one look-see at the Java api for OpenTick although I
doubt it makes sense given C++ and cython... but we should look to make
sure.

-glenn

--
Glenn H. Tarbox, PhD || 206-494-0819 || gl...@tarbox.org
"Don't worry about people stealing your ideas. If your ideas are any
good you'll have to ram them down peoples throats" -- Howard Aiken

Chris Swierczewski

Jul 8, 2008, 1:19:18 PM

to sage-f...@googlegroups.com

Concerning wrapping the opentick APIs, methinks I can get started on
that today. I'll start with finance.Stock as a framework and move from
there. Should be quite natural using cython.

I read a bit about those Google APIs. Not sure where exactly they
would be implemented into the opentick situation.

--
Chris Swierczewski
cswi...@gmail.com
mobile: 253 2233721

On Jul 8, 2008, at 9:46 AM, "Glenn H Tarbox, PhD" <gl...@tarbox.org>
wrote:

Glenn H Tarbox, PhD

Jul 8, 2008, 1:28:07 PM

to sage-f...@googlegroups.com

On Tue, 2008-07-08 at 10:19 -0700, Chris Swierczewski wrote:
> Concerning wrapping the opentick APIs, methinks I can get started on
> that today. I'll start with finance.Stock as a framework and move from
> there. Should be quite natural using cython.
>
> I read a bit about those Google APIs. Not sure where exactly they
> would be implemented into the opentick situation.

That'll come later. All I was saying is that there may be clients which
need to suck from the pipe at a high rate. Since we can generate the
transport fairly simply into C++, we might want to extend the server
(later) to support no-python routing... meaning in from OpenTick C++
directly out to PB using generated C++...

So, for OpenTick, it's do nothing... just something to keep in mind.

IB should probably go VFR direct to PB... it's gonna make things easier
and perform better than XML-RPC.

-glenn

Brett Nakashima

Jul 8, 2008, 4:48:48 PM

to sage-f...@googlegroups.com

I like this approach for IB and I think I'm going to start doing it this
way. The problem however, is that EWrapper is not a class, it's an
interface, so I think we'll have to write our own sage-specific
implementing class but I think this is what we've been talking about all
along. I also don't think we have to send the support classes, we can
just use the PB data structures. I'm going to be looking into doing
some of this today as well as reading over a lot of the Google PB RPC API.

-Brett

Glenn H Tarbox, PhD

Jul 8, 2008, 5:57:16 PM

to sage-f...@googlegroups.com

On Tue, 2008-07-08 at 13:48 -0700, Brett Nakashima wrote:
> I like this approach for IB and I think I'm going to start doing it this
> way. The problem however, is that EWrapper is not a class, it's an
> interface, so I think we'll have to write our own sage-specific
> implementing class but I think this is what we've been talking about all
> along.

Right. The interface needs to be implemented as an RPC to the actual
client... another case of the inverted client-server model... in this
case, for callbacks, the client has the server API for cases 1) and 2)
below.

For EClientSocket it's the inverse... but this
is all entirely consistent with the current IB API programming strategy.
We're simply using the "over the wire" protocol to do our language
boundary mapping as well.

The only logic outside the conventional API is to support multiple
clients. The nailup typically involves making a call to the "root"
object inside the server you're writing and passing in the return server
reference as a parameter. This approach is superior to "inferring" the
existence of a particular instance on the client from the address,port
alone. How we decide to actually implement this is open but the general
strategy is the (host,port,objectid) distributed object remote reference
scheme (same as CORBA, foolscap etc)
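
As a rough sketch of that (host, port, objectid) scheme in Python: RemoteRef and Registry are names made up for illustration, and the addresses are placeholders. None of this is IB's API, PB, or existing sage code.

```python
from typing import NamedTuple


class RemoteRef(NamedTuple):
    """Hypothetical remote reference: where the client listens, plus which object."""
    host: str
    port: int
    object_id: int


class Registry:
    """Hypothetical server-side table of the clients that have nailed up."""

    def __init__(self):
        self._proxies = {}

    def nail_up(self, ref, proxy):
        # The client passes its own reference explicitly, so the server
        # never has to infer it from the connection's (address, port).
        self._proxies[ref] = proxy

    def proxy_for(self, ref):
        return self._proxies[ref]


registry = Registry()
a = RemoteRef("10.0.0.5", 9000, 1)
b = RemoteRef("10.0.0.5", 9000, 2)  # same host and port, different object
registry.nail_up(a, "proxy for client object 1")
registry.nail_up(b, "proxy for client object 2")
```

Two objects behind the same address and port stay distinct, which is exactly what inferring from (address, port) alone can't give you.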

So, when a client registers for something which is going to get a
callback, the server should return, as a return value, an integer or
some other id which the client (with its server object) can use to match
to which callback this incoming "response" is related.

Your code needs to maintain, for each callback from IB, a reference to
the client which gets the information. My recommendation is to simply
increment an integer with each new request... then the value of the
integer uniquely identifies the remote object which is to get the
response.

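    import itertools

    class root(object):

        def __init__(self, ibSocketClient):
            self.remoteRefs = {}           # request id -> client's remote reference
            self.ids = itertools.count(1)  # one new integer per request
            self.IBSocketClient = ibSocketClient
            ...

        def doSomething(self, remoteRef, *args, **kargs):
            reqId = next(self.ids)
            self.remoteRefs[reqId] = remoteRef
            self.IBSocketClient.doSomething(reqId, *args, **kargs)
            return reqId                   # the client matches callbacks on this

    class brettIbProxy(EWrapper):  # this would be Java... whatever the syntax is.. I forget

        def __init__(self, remoteRefs):
            self.remoteRefs = remoteRefs   # the same dict root fills in

        def doSomethingCallback(self, reqId, *args):
            rref = self.remoteRefs[reqId]
            rref.doSomethingCallbackNamedSomething(reqId, *args)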

It gets tricky if we want to handle garbage collection etc... but we
don't need to worry about that up front. Let's minimize the work on the Java
side for now and we'll fix it after we've done it wrong a few times.

> I also don't think we have to send the support classes, we can
> just use the PB data structures. I'm going to be looking into doing
> some of this today as well as reading over a lot of the Google PB RPC API.

right... my point was that the support classes need to be implemented as
a library for each language supported (should we choose to do so).
We'll probably do something for python... but that's another task
entirely....

Glenn H Tarbox, PhD

Jul 8, 2008, 8:54:58 PM

to Sage Finance

Chris,

Once you figure out the fundamentals of getting OpenTick and cython
integrated, you and I should chat about how to handle the event loop.
The easiest thing is to use twisted and nail up the necessary file
descriptors to invoke your code from clients, opentick or a timer (the
latter likely just for general status checks etc).

I can write a simple wrapper for you when you get to that point.

-glenn

Chris Swierczewski

Jul 8, 2008, 11:47:54 PM

to sage-f...@googlegroups.com

Glenn,

> Once you figure out the fundamentals of getting OpenTick and cython
> integrated, you and I should chat about how to handle the event loop.
> The easiest thing is to use twisted and nail up the necessary file
> descriptors to invoke your code from clients, opentick or a timer (the
> latter likely just for general status checks etc).
>
> I can write a simple wrapper for you when you get to that point.

I was contemplating that on my way home today. Some guidance would be
nice. I'll begin to review some twisted documentation when that time
comes so I can keep up with you.

--
Chris

Glenn H Tarbox, PhD

Jul 9, 2008, 12:29:10 AM

to sage-f...@googlegroups.com

So, here's the thing. What you almost certainly have is code which
expects to block on a pipe during a read. The natural tendency will be
to spawn a thread. While that might be the way we go, it may not be
optimal because the callback will emerge into python in a thread which
isn't the main thread...

There are a lot of ways to deal with this. We can implement the
callback to queue data for pickup by the main thread... this is easy to
do either using standard synchronized Python queues... or, if we use a
"native" thread, using POSIX synchronization mechanisms. The twisted
way is to do a reactor.callFromThread(...) with the payload. This is
thread safe, and when the reactor gets the thread back it'll "do the
right thing" in the main thread.
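
A minimal sketch of the queue handoff, using only the standard library. The reader function and the hard-coded message list stand in for whatever thread and pipe the OpenTick code really uses; both are assumptions for illustration.

```python
import queue
import threading


def reader(messages, out):
    """Stand-in for the blocking reader thread; `messages` replaces the pipe."""
    for msg in messages:
        out.put(msg)  # queue.Queue does its own locking
    out.put(None)     # sentinel: nothing more to read


def drain(inbox, handle):
    """Called from the main thread: hand over queued messages until the sentinel."""
    while True:
        msg = inbox.get()
        if msg is None:
            break
        handle(msg)


inbox = queue.Queue()
worker = threading.Thread(target=reader, args=(["tick1", "tick2"], inbox))
worker.start()

seen = []
drain(inbox, seen.append)  # handlers run in the draining thread, not the reader
worker.join()
```

The reader thread never touches application state; it only puts onto the queue, so the handlers always run in whichever thread calls drain.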

But....

What we'd really like to do is hand blocks of bytes into something which
does as much as it can. If it's got a full message, we want it to hand
it back to us or execute a callback which, since it's the main thread,
would be ok. If it needs more bytes (partial message), it just returns
having done nothing.

Of course, the code then needs to pick up where it left off with
additional data... unless the code is written that way, this might get
ugly. If it's written, for example, as some kind of recursive descent
parser (unlikely, but it might be using the stack / program counter as
the "implied" state machine) we'll need to think about it some.
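
A sketch of the "hand it bytes, get back whatever is complete" idea. The wire format here (a 4-byte big-endian length followed by the payload) is an assumption chosen to keep the example short; it is not OpenTick's actual message format, and Framer is a made-up name.

```python
import struct


class Framer:
    """Incremental parser for an assumed length-prefixed format."""

    HEADER = struct.Struct(">I")  # 4-byte big-endian payload length

    def __init__(self):
        self._buf = bytearray()   # all the state carried between calls

    def feed(self, data):
        """Accept any number of bytes; return the complete messages, if any."""
        self._buf.extend(data)
        messages = []
        while len(self._buf) >= self.HEADER.size:
            (length,) = self.HEADER.unpack_from(self._buf)
            end = self.HEADER.size + length
            if len(self._buf) < end:
                break             # partial message: wait for more bytes
            messages.append(bytes(self._buf[self.HEADER.size:end]))
            del self._buf[:end]
        return messages


def frame(payload):
    """Build one wire message in the assumed format."""
    return Framer.HEADER.pack(len(payload)) + payload
```

Because the buffer holds all the state, nothing lives on the stack between calls, and feed can be driven from the main thread with whatever bytes happen to have arrived.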

I should take a look at the code myself and see if this is easily
implemented.

There is another consideration... if written properly, it's not the worst
thing in the world to use an actual "native" machine thread to suck from
the pipe and handle messages as they come in. Here we'll get multiple
cores working for us, and as the code is C/C++, and if we don't screw
it up, all we need to do is make sure that the handoff into the upper
levels is done correctly.... this is more than possible... but care is
necessary. Methinks that this code is sufficiently segmented that it
might not be a problem.

ok, I'm gonna download the latest version now... gotta look...

-glenn

There would be a fancier method using generators but that's a construct
not available in C/C++.
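
For what the generator approach looks like on the Python side, here is a small sketch. It assumes newline-delimited messages purely for illustration (not OpenTick's format), and message_parser is a made-up name.

```python
def message_parser():
    """Generator: send() byte chunks in, get back a list of complete lines."""
    buf = b""
    complete = []
    while True:
        chunk = yield complete
        buf += chunk
        # Everything before the last newline is complete; the tail is kept.
        *complete, buf = buf.split(b"\n")


parser = message_parser()
next(parser)                          # prime the generator
first = parser.send(b"AAPL 1")        # no newline yet, nothing complete
second = parser.send(b"70\nIBM")      # finishes the first line
third = parser.send(b" 120\n")        # finishes the second line
```

The generator's suspended frame is the state machine: it picks up at the yield with buf intact each time more bytes are sent in.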

>
> --
> Chris

Glenn H Tarbox, PhD

Jul 9, 2008, 12:51:04 AM

to sage-f...@googlegroups.com

ok, belay all that... clearly there's a lot more going on in the code...

So, looks like we wrap OTClient... let it do whatever thread madness it
wants, and synchronize the callbacks. Seems that synchronizing the
callbacks would take the form of reactor.callFromThread calls.

the next step would be to take the message format and rewrite the
code... I doubt it would be that hard but it would be silly to take that
on now.

The only possible problem, and I doubt that this is gonna be a problem,
is if there's any blocking on the client side... I doubt there is...

-glenn
