ibPy behavior for streaming real-time market data


waiwai

Jun 27, 2008, 5:56:05 AM
to IbPy Discuss
Hi all,

I would like to understand ibPy behavior for the following scenario:

- reqMktData to watch 1 stock (e.g. stock1) in real-time
- register the callback function (generic_handler) through
ibconnection object
- the generic_handler takes 10 secs to process

Let say IB delivers the market data of stock1 every 1 sec, so that a
time series will look like this, d1, d2 ... d10 ... d11 ...
After receiving d1, the generic_handler is executed. What is the next
data that generic_handler will received from ibPy? d2 or d11 or d12?

Best Regards,

waiwai

Glenn H Tarbox, PhD

Jun 27, 2008, 2:42:53 PM
to ibpy-d...@googlegroups.com
On Fri, 2008-06-27 at 02:56 -0700, waiwai wrote:
> Hi all,
>
> I would like to understand ibPy behavior for the following scenario:
>
> - reqMktData to watch 1 stock (e.g. stock1) in real-time
> - register the callback function (generic_handler) through
> ibconnection object
> - the generic_handler takes 10 secs to process

Any delay is due to the time it takes IB to nail the stream up on their
end. IbPy is just sucking on the pipe waiting for data.


>
> Let say IB delivers the market data of stock1 every 1 sec, so that a
> time series will look like this, d1, d2 ... d10 ... d11 ...
> After receiving d1, the generic_handler is executed. What is the next
> data that generic_handler will received from ibPy? d2 or d11 or d12?

Ticks are ticks as defined by IB. Which means they're intended to be
the "current state" of the market from a human viewer perspective.
Using a computer means you can do stuff faster / better, but the data
quality is intended for displays not high-rate algorithmic trading.

So, to answer your question, the latest tick is whatever you have most
recently received. You can assume that you receive ticks in correct
temporal order... which you do, BTW, because it's a TCP link. What IB is
actually sending is entirely another matter, but I'm pretty sure that
the order is temporally correct.

What is rolled into the numbers you get, however, is another story
altogether.

For example, if you aggregate the "last size" values, the total won't
equal the volume updates you also get. This is because you're not getting
true market "tick" data (real tick data requires big pipes and generally a
machine at the exchange PoP).

-glenn

>
> Best Regards,
>
> waiwai
> >
--
Glenn H. Tarbox, PhD || 206-494-0819 || gl...@tarbox.org
"Don't worry about people stealing your ideas. If your ideas are any
good you'll have to ram them down peoples throats" -- Howard Aiken

grahamc001uk

Jun 27, 2008, 3:10:26 PM
to ibpy-d...@googlegroups.com
On Fri, 2008-06-27 at 02:56 -0700, waiwai wrote:
> Hi all,
>
> I would like to understand ibPy behavior for the following scenario:
>
> - reqMktData to watch 1 stock (e.g. stock1) in real-time
> - register the callback function (generic_handler) through
> ibconnection object
> - the generic_handler takes 10 secs to process
>
> Let say IB delivers the market data of stock1 every 1 sec, so that a
> time series will look like this, d1, d2 ... d10 ... d11 ...
> After receiving d1, the generic_handler is executed. What is the next
> data that generic_handler will received from ibPy? d2 or d11 or d12?

Simplest solution is to use a Python Queue object as a FIFO buffer - put the quote data into the queue in the callback function and take it out of the queue in a separate thread. This way you can take as long as you want to process the data without holding up the processing of other callbacks.

Only takes a few lines of Python code to create a Queue object and a thread.


vfu...@gmail.com

Jun 28, 2008, 12:18:06 PM
to IbPy Discuss
"Only takes a few lines of Python code to create a Queue object and a
thread."

Would you have any favored links for those still climbing up the
curve?

TIA,

Vince Fulco

waiwai

Jun 28, 2008, 1:00:15 PM
to IbPy Discuss
Hi all,

Thanks for the clarification. So what I understood so far is that IBPy
does not use threads. IBPy waits for the completion of the callback
function before it retrieves the next market data from IB. So in the
above scenario, I will get d11 in the next callback function call. Am
I correct? This would guarantee that I always get the current data
from IB at the current time.

Best Regards,

-waiwai

grahamc001uk

Jun 28, 2008, 4:06:17 PM
to ibpy-d...@googlegroups.com
> "Only takes a few lines of Python code to create a Queue object and a
> thread."
>
> Would you have any favored links for those still climbing up the
> curve?
>
> TIA,
>
> Vince Fulco

Only the Python docs for Queue etc. http://docs.python.org/lib/module-Queue.html

Here's a stripped-down version of some code I was trying during the week, although, it being a Saturday, I can't actually run it at the moment.

It's a trivial example to display some forex quotes. It runs forever; type control-break or similar to stop it. This example doesn't actually need the extra thread, but you get the idea of how it would help if there were some time-consuming processing:

from ib.ext.Contract import Contract
from ib.opt import ibConnection, message
from Queue import Queue, Empty
from time import time, sleep
from thread import start_new_thread

symbols = ["AUDUSD","EURUSD","GBPUSD","USDCAD","USDCHF","USDJPY","GBPJPY","EURJPY"]
q = [None] * len(symbols)   # one queue per symbol, indexed by request id

def writer_thread_func(symbol_index):
    # Consumer: runs in its own thread and drains the queue for one symbol.
    symbol = symbols[symbol_index]

    while True:
        try:
            # Block for up to 15 seconds waiting for the next quote.
            (tick_time, price) = q[symbol_index].get(True, 15)
            print symbol, tick_time, price
            # Other processing with the data that might take some time.
            # ...
        except Empty:
            print "no data for 15 seconds"
            # Other processing for when 15 seconds elapsed without any data.
            # ...

def my_callback(msg):
    # Producer: called for every incoming message; only enqueue here.
    if isinstance(msg, message.TickPrice):
        q[msg.tickerId].put_nowait((time(), msg.price))
    # elif ...:
    #     other message types etc.


def make_contract(symbol, sectype, exchange, currency, expiry, strike, right):
    cont = Contract()
    cont.m_symbol = symbol
    cont.m_secType = sectype
    cont.m_exchange = exchange
    cont.m_currency = currency
    cont.m_expiry = expiry
    cont.m_strike = strike
    cont.m_right = right
    return cont

con = ibConnection('127.0.0.1', 7496, 1)
con.registerAll(my_callback)
con.connect()

req_id = 0
for symbol in symbols:
    # The request id doubles as the index into q, so the queue must exist
    # before the market data request is sent.
    q[req_id] = Queue(0)
    start_new_thread(writer_thread_func, (req_id,))
    symbol_contract = make_contract(symbol[0:3], 'CASH', 'IDEALPRO', symbol[3:6], '', 0.0, '')
    con.reqMktData(req_id, symbol_contract, '', False)
    req_id += 1

# Keep the main thread alive so the listener and worker threads keep running.
# (The original sleep(-1) is not portable; a negative sleep is an error on
# some platforms.)
while True:
    sleep(1)
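
For readers who want to try the queue idea without a TWS connection, here is a minimal stand-alone sketch using only the standard library. It is written for Python 3 (`queue`, `threading`) rather than the Python 2 modules used above, and everything in it (`run_feed_simulation`, the d1...d5 labels, the timings) is made up for illustration; it does not touch IbPy. All it shows is that a FIFO queue hands a slow consumer d2 after d1, not d11, assuming nothing upstream drops ticks.

```python
import queue
import threading
import time


def run_feed_simulation(n_ticks=5, tick_interval=0.01, handler_delay=0.03):
    """Feed a slow consumer through a FIFO queue and return the order it saw."""
    ticks = queue.Queue()   # unbounded FIFO buffer between the two threads
    processed = []          # order in which the slow consumer handled ticks
    done = object()         # sentinel telling the consumer to stop

    def listener():
        # Stands in for the callback side: enqueue only, no slow work here.
        for i in range(1, n_ticks + 1):
            ticks.put("d%d" % i)
            time.sleep(tick_interval)
        ticks.put(done)

    def consumer():
        # Stands in for the slow handler: slower than the feed on purpose.
        while True:
            item = ticks.get()
            if item is done:
                break
            time.sleep(handler_delay)
            processed.append(item)

    threads = [threading.Thread(target=listener),
               threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return processed


if __name__ == "__main__":
    print(run_feed_simulation())   # ['d1', 'd2', 'd3', 'd4', 'd5']
```

The consumer falls behind the feed, but it still sees every tick in order. If only the most recent value matters, the consumer would have to drain the queue and keep the last item itself.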

merlinson

Jun 28, 2008, 9:36:03 PM
to IbPy Discuss
Hi guys,

I wrote a Message back in September about an experiment I did with a
second thread doing most of the work. It was called "Python is fast
enough for trading". The first thread was for watching TWS and
stuffing the quotes into a queue. The second thread took out the
quotes and did everything else including Black-Scholes option
calculations every tick. I couldn't get the queue above about zero
length even with a thousand quotes per second dumped into it from a
calculated random quote generator, the speed limit of my timer.
( Now that I think about it, I could have dumped multiple quotes at a
time into it) The slow part (cpu intensive) was writing data to the
screen.

I'm just back from R simulation land again. I found a promising
algorithm that has looked good on historical data so it's ready for
some real-time testing. ( Famous last words, lol) I expect to be
around more than every 6 months or so and look forward to joining in
the discussions again. I'm going to need a lot of help.

Best Regards

On Jun 28, 2:06 pm, grahamc001uk <grahamc00...@yahoo.co.uk> wrote:
> > "Only takes a few lines of Python code to create a Queue object and a
> > thread."
>
> > Would you have any favored links for those still climbing up the
> > curve?
>
> > TIA,
>
> > Vince Fulco
>
> Only the Python docs for Queue etc. http://docs.python.org/lib/module-Queue.html

Glenn H Tarbox, PhD

Jun 29, 2008, 2:26:34 PM
to ibpy-d...@googlegroups.com
On Fri, 2008-06-27 at 19:10 +0000, grahamc001uk wrote:
> On Fri, 2008-06-27 at 02:56 -0700, waiwai wrote:
> > Hi all,
> >
> > I would like to understand ibPy behavior for the following scenario:
> >
> > - reqMktData to watch 1 stock (e.g. stock1) in real-time
> > - register the callback function (generic_handler) through
> > ibconnection object
> > - the generic_handler takes 10 secs to process

I would like to be clear here. What takes 10 seconds? Does it take 10
seconds for IB to return a request to generic? Or does it take 10 seconds
for you to handle the generic callback? If the latter, is it because
you're blocking on another resource (database) or because you're compute
bound? If so, you're doing something fundamentally wrong.

> >
> > Let say IB delivers the market data of stock1 every 1 sec, so that a
> > time series will look like this, d1, d2 ... d10 ... d11 ...
> > After receiving d1, the generic_handler is executed. What is the next
> > data that generic_handler will received from ibPy? d2 or d11 or d12?
>
> Simplest solution is to use a Python Queue object as a FIFO buffer -
> put the quote data into the queue in the callback function and take it
> out of the queue in a separate thread. This way you can take as long
> as you want to process the data without holding up the processing of
> other callbacks.

Generally speaking, threads should be a last resort option. Other than
special cases, you can use asynchronous event handling instead, which is
much more powerful and far less likely to introduce non-deterministic
bugs.

So, there are 2 approaches. First, is to use the twisted ibpy
integration which is discussed in this group.
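
As a rough illustration of the single-threaded, event-driven style described here, the sketch below uses the standard-library `asyncio` (Python 3) as a stand-in. It is not Twisted and not the Twisted IbPy integration, and the names `feed`, `handle`, and `run_event_loop_demo` are invented for the example. The point is only that the feed and the handler share one thread and take turns at each `await`, so no locks are needed.

```python
import asyncio


async def feed(ticks, n_ticks):
    # Pretend data source: pushes d1..dN, then a None sentinel.
    for i in range(1, n_ticks + 1):
        await ticks.put("d%d" % i)
        await asyncio.sleep(0.001)
    await ticks.put(None)


async def handle(ticks, seen):
    # Handler: each await yields to the event loop instead of blocking it.
    while True:
        item = await ticks.get()
        if item is None:
            break
        await asyncio.sleep(0.005)   # stand-in for slow, non-blocking I/O
        seen.append(item)


async def main(n_ticks):
    ticks = asyncio.Queue()
    seen = []
    # Both coroutines run interleaved on the same thread.
    await asyncio.gather(feed(ticks, n_ticks), handle(ticks, seen))
    return seen


def run_event_loop_demo(n_ticks=4):
    return asyncio.run(main(n_ticks))


if __name__ == "__main__":
    print(run_event_loop_demo())   # ['d1', 'd2', 'd3', 'd4']
```

This only helps when the slow part is waiting on I/O. A compute-bound handler would still stall the loop, which is where the multi-process approach comes in.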


However, I recently came across PyProcessing... which has been accepted
via a PEP (PEP 371) for inclusion in Python 2.6 as the multiprocessing
module

http://pyprocessing.berlios.de/

http://pypi.python.org/pypi/processing

It's amazing and, more importantly, can take advantage of multiple
cores. Python, while it has the "thread" programming abstraction, only
runs Python code in one thread at a time because of the GIL.

The real magic for this application is that you could use PyProcessing
queues and avoid all the synchronization / locking / bug-introducing
nastiness of threads. And use multiple cores if you're compute bound.

-glenn

>
> Only takes a few lines of Python code to create a Queue object and a thread.

Glenn H Tarbox, PhD

Jun 29, 2008, 2:30:35 PM
to ibpy-d...@googlegroups.com
On Sat, 2008-06-28 at 10:00 -0700, waiwai wrote:
> Hi all,
>
> Thanks for the clarification. So what I understood so far is that IBPy
> does not use thread. IBPy waits for the completion of the callback
> function before it retrieves another market data from IB. So in the
> above scenario, I will get d11 for the next callback function call. Am
> I correct? This guarantees that I will always get the current data
> from IB at the current time.

In general, top-posting is bad. If you're going to respond to a thread,
respond in-line. trying to follow this discussion is far more difficult
than it should be.

But, to answer the above... no.

IbPy spawns a thread to listen on the connection from IB. The main
thread is available for your code.

Of course, this is all unfortunate. By calling back to your code from
another thread, you need to worry about all the general synchronization
issues...
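
A minimal sketch of the kind of synchronization meant here, assuming one listener thread that updates shared state while the main thread reads it. The `LatestQuotes` class and `run_lock_demo` function are hypothetical names for this example, the code is Python 3, and it does not use IbPy.

```python
import threading


class LatestQuotes:
    """Shared state written by a listener thread and read by the main thread."""

    def __init__(self):
        self._lock = threading.Lock()
        self._last = {}     # ticker id -> most recent price
        self._count = 0     # total number of updates seen

    def on_tick(self, ticker_id, price):
        # Called from the listener thread; both fields change together.
        with self._lock:
            self._last[ticker_id] = price
            self._count += 1

    def snapshot(self):
        # Called from the main thread; returns a consistent copy.
        with self._lock:
            return dict(self._last), self._count


def run_lock_demo(n_updates=1000):
    quotes = LatestQuotes()

    def listener():
        # Stand-in for the listener thread delivering ticks for two tickers.
        for i in range(n_updates):
            quotes.on_tick(i % 2, 100.0 + i)

    t = threading.Thread(target=listener)
    t.start()
    while t.is_alive():
        quotes.snapshot()   # main thread reads while the listener writes
    t.join()
    return quotes.snapshot()


if __name__ == "__main__":
    print(run_lock_demo())   # ({0: 1098.0, 1: 1099.0}, 1000)
```

Keeping the locked sections this small is what makes the approach manageable; anything slow should happen outside the `with` block.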

The twisted integration uses a single thread for everything.


>
> Best Regards,
>
> -waiwai
>
>
> On Jun 28, 3:10 am, grahamc001uk <grahamc00...@yahoo.co.uk> wrote:
> > On Fri, 2008-06-27 at 02:56 -0700, waiwai wrote:
> > > Hi all,
> >
> > > I would like to understand ibPy behavior for the following scenario:
> >
> > > - reqMktData to watch 1 stock (e.g. stock1) in real-time
> > > - register the callback function (generic_handler) through
> > > ibconnection object
> > > - the generic_handler takes 10 secs to process
> >
> > > Let say IB delivers the market data of stock1 every 1 sec, so that a
> > > time series will look like this, d1, d2 ... d10 ... d11 ...
> > > After receiving d1, the generic_handler is executed. What is the next
> > > data that generic_handler will received from ibPy? d2 or d11 or d12?
> >
> > Simplest solution is to use a Python Queue object as a FIFO buffer - put the quote data into the queue in the callback function and take it out of the queue in a separate thread. This way you can take as long as you want to process the data without holding up the processing of other callbacks.
> >
> > Only takes a few lines of Python code to create a Queue object and a thread.
> >
> >

waiwai

Jun 29, 2008, 11:58:02 PM
to IbPy Discuss
Hi Glenn,

Thanks for your advice and clarification. It is getting clearer now.
Responses to your questions are written inline below.

On Jun 30, 2:26 am, "Glenn H Tarbox, PhD" <gl...@tarbox.org> wrote:
> On Fri, 2008-06-27 at 19:10 +0000, grahamc001uk wrote:
> > On Fri, 2008-06-27 at 02:56 -0700, waiwai wrote:
> > > Hi all,
>
> > > I would like to understand ibPy behavior for the following scenario:
>
> > > - reqMktData to watch 1 stock (e.g. stock1) in real-time
> > > - register the callback function (generic_handler) through
> > > ibconnection object
> > > - the generic_handler takes 10 secs to process
>
> I would like to be clear here.  What takes 10 seconds?  it takes 10
> seconds for IB to return a request to generic?  It takes 10 seconds for
> you to handle the generic callback?  If so, is it because you're
> blocking on another resource (database) or because you're compute bound?
> If so, you're doing something fundamentally wrong.
>

It takes 10 seconds to handle the generic callback. However, this is
just an example. What I would really like to understand is which tick
the generic_handler will process next.
My real concern is that, if IBPy spawns a new thread for every tick
and my generic_handler takes a long time to process, then I need to make
sure that the generic_handler is thread-safe.

-waiwai

waiwai

Jun 30, 2008, 12:07:14 AM
to IbPy Discuss
On Jun 30, 2:30 am, "Glenn H Tarbox, PhD" <gl...@tarbox.org> wrote:
> On Sat, 2008-06-28 at 10:00 -0700, waiwai wrote:
> > Hi all,
>
> > Thanks for the clarification. So what I understood so far is that IBPy
> > does not use thread. IBPy waits for the completion of the callback
> > function before it retrieves another market data from IB. So in the
> > above scenario, I will get d11 for the next callback function call. Am
> > I correct? This guarantees that I will always get the current data
> > from IB at the current time.
>
> In general, top-posting is bad.  If you're going to respond to a thread,
> respond in-line.  trying to follow this discussion is far more difficult
> than it should be.
>
> But, to answer the above... no.
>
> Ibpy spawns a thread to listen on the connection from IB.  the main
> thread is available for your code.

Hi Glenn,

Does it mean there are 2 threads used in IBPy? 1 thread to listen on
the connection from IB and 1 (and only 1) thread used to call the
generic_handler? If there is more than 1 thread executing the
generic_handler and some shared variables are used in
generic_handler, then I think I'm screwed.

Regards,

-waiwai

Glenn H Tarbox, PhD

Jun 30, 2008, 2:46:56 AM
to ibpy-d...@googlegroups.com
On Sun, 2008-06-29 at 21:07 -0700, waiwai wrote:
> On Jun 30, 2:30 am, "Glenn H Tarbox, PhD" <gl...@tarbox.org> wrote:
> > On Sat, 2008-06-28 at 10:00 -0700, waiwai wrote:
> > > Hi all,
> >
> > > Thanks for the clarification. So what I understood so far is that IBPy
> > > does not use thread. IBPy waits for the completion of the callback
> > > function before it retrieves another market data from IB. So in the
> > > above scenario, I will get d11 for the next callback function call. Am
> > > I correct? This guarantees that I will always get the current data
> > > from IB at the current time.
> >
> > In general, top-posting is bad. If you're going to respond to a thread,
> > respond in-line. trying to follow this discussion is far more difficult
> > than it should be.
> >
> > But, to answer the above... no.
> >
> > Ibpy spawns a thread to listen on the connection from IB. the main
> > thread is available for your code.
>
> Hi Glenn,
>
> Does it mean there are 2 threads used in IBPy? 1 thread to listen on
> the connection from IB and 1 (and only 1) thread is used to call the
> generic_handler?

the first thread is the main thread. Internally, IbPy spawns a
"listener" thread when you establish a connection with the IB Java
client (the term client can get heavily overloaded here)

> If there are more than 1 thread to execute the
> generc_handler and if there are some shared variables used in
> generic_handler then I think I'm screwed.

so, the way it works is, everything is asynchronous (other than 1 little
place during initialization... which made it tricky for me but is of no
consequence to you.)

The listener thread waits for stuff from IB, parses it into full blown
callbacks (meaning understanding the protocol, unwinding the strings
into objects and making the right call into the "well defined"
interface). In java this means subclassing. In python, you have more
options

The main thread (or could be another thread depending on how u manage
things... but it should be the "same" thread that does the initial
nailup with IB) is responsible for outgoing calls... which return
immediately cuz all they do is throw bits out the wire...

the listening thread is exactly the opposite.

The key to the protocol is a "half sync, half async" pattern... meaning,
you send asynchronously, maintain information regarding what you expect
back, and match things up when you get asynchronously called back.

which, btw, is the entire point of the numbering scheme in the
protocol.. each message received is associated with a request from the
client so you can hook the process together... and not block sucking on
a pipe... unfortunately, while threads seem natural for this,
synchronization can be a real headache unless you are careful. so,
locks etc become important

The core difference is that there are 2 fundamental types of
communications... some are "streams" which continuously call back...
others are responses to a request. Which is which is unambiguous...
getting portfolio info is request/response (albeit with an unknown
number of callbacks)... tick data is straight callback associated with a
market data request.
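
The "send, remember what you expect back, match it up on the callback" idea can be sketched with a plain dictionary keyed by request id. This is an illustration only: `RequestTracker` and its methods are hypothetical names, not part of IbPy or the IB API, and the prices are made-up values. It is Python 3 and single-threaded; if `dispatch` were called from a listener thread, the dictionary would need a lock around it.

```python
import itertools


class RequestTracker:
    """Maps request ids to whatever should handle the matching callbacks."""

    def __init__(self):
        self._ids = itertools.count(1)   # source of unique request ids
        self._pending = {}               # request id -> (description, handler)

    def register(self, description, handler):
        # Call before sending a request; the returned id goes out with it.
        req_id = next(self._ids)
        self._pending[req_id] = (description, handler)
        return req_id

    def dispatch(self, req_id, payload):
        # Call for each incoming message; returns False for unknown ids.
        entry = self._pending.get(req_id)
        if entry is None:
            return False
        description, handler = entry
        handler(description, payload)
        return True

    def finish(self, req_id):
        # Request/response exchanges end; streams stay until cancelled.
        self._pending.pop(req_id, None)


def run_tracker_demo():
    tracker = RequestTracker()
    received = []

    def on_tick(description, payload):
        received.append((description, payload))

    eur = tracker.register("EURUSD ticks", on_tick)
    gbp = tracker.register("GBPUSD ticks", on_tick)
    tracker.dispatch(gbp, 1.99)          # messages may arrive in any order
    tracker.dispatch(eur, 1.57)
    tracker.finish(gbp)
    unknown = tracker.dispatch(gbp, 2.00)   # id no longer registered
    return received, unknown
```

A stream such as tick data keeps its entry until it is cancelled, while a request/response exchange would call `finish` once its last callback has arrived.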

BTW, all this is why twisted was invented. Using threads like this is
unnecessary and dangerous... especially as part of a larger system.
Threads have their place but few realize the pain associated with the
type of non-deterministic bugs which are almost impossible to track down
with threads.

IMHO, if you're doing anything even halfway complicated, spending the
time to understand the points underlying the twisted architecture is
time well spent.

this is an older but clean and easy article discussing the point of
asynchronous programming and twisted

http://www.linuxjournal.com/article/7871
