Handling a StreamReader data buffer when calling .read() with no arguments

Wellington Cordeiro

Jul 30, 2015, 4:39:51 PM

to python-tulip

I'm currently working on a project where we need to implement a protocol on top of TCP. We're attempting to use the streams API, and we can successfully connect to a server and send messages over the socket. What's causing trouble is receiving messages back: since we don't know the size of the response ahead of time, we call read() with no arguments. We get the first response just fine, but the next response is chopped up.

For example:

We send a message like
8=FIX.4.3\x019=63\x0135=5\x0149=DEMO.ZION2_P.FIX\x0156=ABFX\x0134=4\x0152=20150730-18:42:07.013\x0110=130\x01

to the server, and it responds with something similar in format with no trouble. However, when we send a second message, the response is chopped into just the "8", and then after we send a third message the other chunk of the second response comes through like

=FIX.4.3\x019=63\x0135=5\x0149=DEMO.ZION2_P.FIX\x0156=ABFX\x0134=4\x0152=20150730-18:42:07.013\x0110=130\x01

I can add some code examples if needed, but I think I just need help understanding how to read() continuously, since the documentation examples only read single lines or read to EOF.

Luciano Ramalho

Jul 31, 2015, 12:56:24 AM

to Wellington Cordeiro, python-tulip

It seems to me you can't use .read() with no arguments to read data
that is not line-oriented and is not the whole transmission either.
You must use .read(N), where N is a number of bytes. Then you parse
what you get and decide on a suitable value of N for the next read.
Rinse and repeat.

It's late and I may be writing nonsense...

Best,

Luciano

--
Luciano Ramalho
| Author of Fluent Python (O'Reilly, 2015)
| http://shop.oreilly.com/product/0636920032519.do
| Teacher at: http://python.pro.br
| Twitter: @ramalhoorg

Victor Stinner

Jul 31, 2015, 11:02:27 AM

to Luciano Ramalho, Wellington Cordeiro, python-tulip

2015-07-31 6:56 GMT+02:00 Luciano Ramalho <luc...@ramalho.org>:
> It seems to me you can't use .read() with no arguments to read data
> that is not line-oriented and is not the whole transmission either.
> You must use .read(N), where N is a number of bytes. Then you parse
> what you get and decide on a suitable value of N for the next read.
> Rinse and repeat.

Exactly.

Victor

Guido van Rossum

Jul 31, 2015, 12:12:29 PM

to Victor Stinner, Luciano Ramalho, Wellington Cordeiro, python-tulip

Perhaps better to use readexactly(N), which raises IncompleteReadError (a subclass of EOFError) instead of returning fewer than N bytes if it hits EOF early.
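
A minimal sketch of that behaviour, written with async/await rather than the yield from style used elsewhere in this thread. The helper name read_fixed and the in-memory StreamReader fed by hand are assumptions for illustration only; in real code the reader would come from asyncio.open_connection().

```python
import asyncio


async def read_fixed(reader, n):
    """Return exactly n bytes, or None if the stream ends first."""
    try:
        return await reader.readexactly(n)
    except asyncio.IncompleteReadError as exc:
        # exc.partial holds whatever arrived before EOF.
        print("short read, got only:", exc.partial)
        return None


async def demo():
    reader = asyncio.StreamReader()
    reader.feed_data(b"abcde")  # simulate 5 bytes arriving
    reader.feed_eof()           # then the peer closes the connection
    first = await read_fixed(reader, 3)   # enough data: 3 bytes
    second = await read_fixed(reader, 3)  # only 2 bytes left before EOF
    return first, second


print(asyncio.run(demo()))
```

The second call cannot be satisfied, so the exception path runs instead of silently handing back a short chunk.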

--

--Guido van Rossum (python.org/~guido)

Wellington Cordeiro

Jul 31, 2015, 12:40:28 PM

to python-tulip, victor....@gmail.com, luc...@ramalho.org, willy...@gmail.com, gu...@python.org

I'm not sure I entirely follow, though. If I don't know the size of the response ahead of time, how will readexactly(N) help me?

Gustavo Carneiro

Jul 31, 2015, 12:58:56 PM

to Wellington Cordeiro, python-tulip, Victor Stinner, luc...@ramalho.org, Guido van Rossum

I think your problem is a lack of understanding of how binary protocols work. It is not an asyncio question (you would have the same issues reading from plain old sockets).

Basically, your protocol should allow you to understand how big a message is just by reading the first few bytes of the message. Another approach is for the protocol to have markers telling you where each message ends.

Looking at your example, it sounds like you have the second approach. It is less efficient, but it can be done. Basically, read char by char until you find \x01, e.g.:

data = []
while True:
    c = yield from stream.readexactly(1)
    if c == b'\x01':
        break
    data.append(c)  # keep every byte that comes before the delimiter
message = b''.join(data)
# process this message

If you are designing a protocol, rather than just parse it, I advise you to follow a Type/Length/Value structure, as it is easier to read and extend, see https://en.wikipedia.org/wiki/Type-length-value
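
As a rough sketch of why a Type/Length/Value layout is easy to read from a stream: the header has a fixed size, so two readexactly() calls recover each message. The header layout here (1-byte type, 4-byte big-endian length) and the names HEADER, encode_tlv and read_tlv are assumptions made up for this example, not part of any particular protocol.

```python
import asyncio
import struct

# Assumed layout: 1-byte type followed by a 4-byte big-endian length.
HEADER = struct.Struct("!BI")


def encode_tlv(msg_type, value):
    """Prefix the value with its type and length."""
    return HEADER.pack(msg_type, len(value)) + value


async def read_tlv(reader):
    """Read one complete TLV message from the stream."""
    header = await reader.readexactly(HEADER.size)
    msg_type, length = HEADER.unpack(header)
    value = await reader.readexactly(length)  # length tells us where it ends
    return msg_type, value


async def demo():
    reader = asyncio.StreamReader()
    # Two messages back to back, as they might arrive on a socket.
    reader.feed_data(encode_tlv(1, b"hello") + encode_tlv(2, b""))
    reader.feed_eof()
    return [await read_tlv(reader), await read_tlv(reader)]


print(asyncio.run(demo()))
```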

--

Gustavo J. A. M. Carneiro

Gambit Research
"The universe is always one step beyond logic." -- Frank Herbert

Guido van Rossum

Jul 31, 2015, 12:59:40 PM

to Wellington Cordeiro, python-tulip, Victor Stinner, Luciano Ramalho

Oh, that's usually part of the protocol. How would you tell that you've got the end of the message if you read a sequence of messages from a file? (Note: rhetorical question -- this is meant for you to think about the problem you are having so you can solve it yourself.)

Note that .read(N) reads at least one but at most N bytes, blocking at most once, so maybe you can do something with that.
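
One way to use that, sketched under assumptions: keep a buffer, append whatever .read(N) returns, and only hand out a message once its end has been seen. The "|" terminator and the name read_frames are placeholders for illustration; a real protocol supplies its own rule for where a message ends.

```python
import asyncio


async def read_frames(reader, terminator, chunk_size=4096):
    """Yield terminator-delimited frames, however the bytes were chunked."""
    buffer = bytearray()
    while True:
        chunk = await reader.read(chunk_size)  # returns b"" only at EOF
        if not chunk:
            break
        buffer += chunk
        while True:
            end = buffer.find(terminator)
            if end < 0:
                break  # incomplete frame: wait for more data
            yield bytes(buffer[:end])
            del buffer[:end + len(terminator)]


async def demo():
    reader = asyncio.StreamReader()
    reader.feed_data(b"8=first|8=second|")
    reader.feed_eof()
    # A tiny chunk_size forces frames to span several read() calls.
    return [frame async for frame in read_frames(reader, b"|", chunk_size=3)]


print(asyncio.run(demo()))
```

A partial frame simply stays in the buffer until the rest of it arrives, so a response split across reads is reassembled rather than delivered in pieces.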

Wellington Cordeiro

Jul 31, 2015, 1:16:03 PM

to python-tulip, willy...@gmail.com, victor....@gmail.com, luc...@ramalho.org, gu...@python.org

The protocol doesn't give you a direct length, it gives you a few hints though. A message looks like

8=FIX.4.3\x019=63\x0135=5\x0149=DEMO.ZION2_P.FIX\x0156=ABFX\x0134=4\x0152=20150730-18:42:07.013\x0110=130\x01

So we've got a 'tag=value' format with '\x01' as the delimiter. The first tag/value chunk indicates the protocol version; the second gives the character count from its end (so from the 3 of 35=5) up to and including the delimiter before the 10=130\x01. The 10=130 at the end is a checksum of the whole message, up to but not including itself.

I calculate those with these methods.
https://bpaste.net/show/31f328cdc145

So there is some data that indicates a length of transmission, I'm just not sure how to use that.
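
A sketch of how that length hint could drive the reads, not a definitive implementation: read the first two fields up to their '\x01' delimiters, take the count from the 9= field, read exactly that many bytes, then read the trailing 10= field. The name read_fix_message is made up for this example, readuntil() needs Python 3.5.2 or later, and checksum verification is deliberately left out.

```python
import asyncio

SOH = b"\x01"
MESSAGE = (b"8=FIX.4.3\x019=63\x0135=5\x0149=DEMO.ZION2_P.FIX\x0156=ABFX"
           b"\x0134=4\x0152=20150730-18:42:07.013\x0110=130\x01")


async def read_fix_message(reader):
    """Read one whole message using the count in the 9= field."""
    begin = await reader.readuntil(SOH)         # e.g. b"8=FIX.4.3\x01"
    length_field = await reader.readuntil(SOH)  # e.g. b"9=63\x01"
    tag, _, value = length_field[:-1].partition(b"=")
    if tag != b"9":
        raise ValueError("expected tag 9, got %r" % length_field)
    body = await reader.readexactly(int(value))  # ends with a delimiter
    trailer = await reader.readuntil(SOH)        # e.g. b"10=130\x01"
    if not trailer.startswith(b"10="):
        raise ValueError("expected tag 10, got %r" % trailer)
    return begin + length_field + body + trailer


async def demo():
    reader = asyncio.StreamReader()
    stream = MESSAGE + MESSAGE

    async def feed():
        # Deliver the bytes in awkward 7-byte pieces, like a real socket might.
        for i in range(0, len(stream), 7):
            reader.feed_data(stream[i:i + 7])
            await asyncio.sleep(0)
        reader.feed_eof()

    feeder = asyncio.create_task(feed())
    messages = [await read_fix_message(reader), await read_fix_message(reader)]
    await feeder
    return messages


print(asyncio.run(demo()))
```

Because each read waits for a delimiter or an exact byte count, it does not matter where the network splits the data.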

Wellington Cordeiro

Jul 31, 2015, 2:11:27 PM

to python-tulip, willy...@gmail.com, victor....@gmail.com, luc...@ramalho.org, gu...@python.org

I've figured out a solution. Thanks for the help, guys; it put me on the right path, especially your suggestions, Guido and Gustavo.
