[Boost-users] [asio] skipping data in tcp stream

Roman Shmelev

Mar 29, 2009, 6:28:40 PM

to boost...@lists.boost.org

Hi!
What we have:
a TCP connection from which we read messages.
Each message = header (fixed length) + body (variable length).
The header contains the length of the body and a CRC to check that the
header is not corrupted.

Reading is done in two async steps: first the header is read, then
memory for the body is allocated and the body is read.
The questions are:
1) If the header is corrupted (the header CRC says the body length
could be wrong), then as I understand it we must skip all the body data
and read the next message.
How do I skip it? async_read_until seemed to be the solution, but the
manual says that it may read surplus data into the streambuf. That data
would be hard to deal with, because I am reading directly from the
socket into allocated memory (as described above).
2) Can data in a TCP stream actually be corrupted? I guess yes, so
should the header contain the body length plus CRCs for both the body
and the header? Is that too much overhead?
3) Maybe I am making some global design mistakes?

Thank you!
_______________________________________________
Boost-users mailing list
Boost...@lists.boost.org
http://lists.boost.org/mailman/listinfo.cgi/boost-users

Rudolf Leitgeb

Mar 30, 2009, 3:34:32 AM

to boost...@lists.boost.org

> 1) If the header is corrupted (the header CRC says the body length
> could be wrong), then as I understand it we must skip all the body data
> and read the next message.
> How do I skip it? async_read_until seemed to be the solution, but the
> manual says that it may read surplus data into the streambuf. That data
> would be hard to deal with, because I am reading directly from the
> socket into allocated memory (as described above).

How do you know the number of bytes you have to skip? If you determine
that your header is corrupted, the length info can't be trusted any
more.

You'd have to employ some framing method for your packets so you can
reliably detect the start of the next packet. In that case you
shouldn't skip a given number of bytes, but rather skip until a new
frame start is detected.
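
To illustrate the resynchronisation idea, here is a minimal C++ sketch
that scans already-received bytes for the next plausible header. The
frame layout (a two-byte marker, a two-byte length and a one-byte XOR
check standing in for a real header CRC) and the names
header_looks_valid and find_frame_start are assumptions made up for
this example; they are not part of Boost.Asio or of the protocol
discussed in this thread.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical header layout, assumed for this sketch only:
//   bytes 0-1: marker 0xA5 0x5A
//   bytes 2-3: body length, big-endian
//   byte  4  : XOR of bytes 0-3 (a stand-in for a real header CRC)
constexpr std::size_t kHeaderSize = 5;
constexpr std::size_t kNotFound = static_cast<std::size_t>(-1);

// True if the five bytes at `h` carry the marker and a matching check byte.
bool header_looks_valid(const std::uint8_t* h) {
    if (h[0] != 0xA5 || h[1] != 0x5A) return false;
    return static_cast<std::uint8_t>(h[0] ^ h[1] ^ h[2] ^ h[3]) == h[4];
}

// Returns the offset of the first plausible header at or after `from`,
// or kNotFound if the buffer does not yet contain one.
std::size_t find_frame_start(const std::vector<std::uint8_t>& buf,
                             std::size_t from = 0) {
    for (std::size_t i = from; i + kHeaderSize <= buf.size(); ++i) {
        if (header_looks_valid(&buf[i])) return i;
    }
    return kNotFound;
}
```

Everything before the returned offset would be discarded. A marker plus
check byte can still occur by chance inside a body, so a receiver would
normally also verify the body checksum before accepting the frame.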

> 2) Can data in a TCP stream actually be corrupted? I guess yes, so
> should the header contain the body length plus CRCs for both the body
> and the header? Is that too much overhead?

It's theoretically possible, but very unlikely. Note that TCP/IP
employs checksums itself and should retransmit any packet which is
detected as faulty. IIRC it's only a 16-bit checksum, so if you throw
completely mangled data at it, roughly one packet in 65536 would make
it through the checksum (statistically, of course). Note that you'd
need a very unreliable medium for that to be of concern, something
which would make normal communication next to impossible.
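
For reference, the 16-bit checksum in question is the ones' complement
sum described in RFC 1071. The sketch below is a simplified C++ version
over a plain byte buffer; it leaves out the pseudo-header that TCP
includes in the real computation, and the function name
internet_checksum is chosen here for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Ones' complement sum of 16-bit big-endian words (RFC 1071 style).
// A trailing odd byte is treated as if padded with a zero byte.
std::uint16_t internet_checksum(const std::vector<std::uint8_t>& data) {
    std::uint64_t sum = 0;
    for (std::size_t i = 0; i + 1 < data.size(); i += 2) {
        sum += (static_cast<std::uint32_t>(data[i]) << 8) | data[i + 1];
    }
    if (data.size() % 2 != 0) {
        sum += static_cast<std::uint32_t>(data.back()) << 8;
    }
    while (sum >> 16) {                      // fold the carries back in
        sum = (sum & 0xFFFF) + (sum >> 16);
    }
    return static_cast<std::uint16_t>(~sum);
}
```

For the byte sequence 00 01 f2 03 f4 f5 f6 f7 used as the worked example
in RFC 1071 this yields 0x220d. Because the checksum is a plain sum,
swapping two 16-bit words in the data leaves it unchanged, so certain
structured corruptions pass undetected regardless of the 1-in-65536
estimate for random garbage.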

> 3) Maybe I am making some global design mistakes?

Before you delve too deeply into checksum protection for your data,
you should check what TCP/IP already has to offer. Do you have an
analysis of your expected error pattern (bit errors, dropped bytes,
erased bits, burst errors)? What's the acceptable error rate of
the data you transfer? What's the error rate of your TCP/IP channel?

Roman Shmelev

Mar 30, 2009, 6:39:17 AM

to boost...@lists.boost.org

> How do you know the number of bytes you have to skip? If you determine
> that your header is corrupted, the length info can't be trusted any more.
>
> You'd have to employ some framing method for your packets so you can
> reliably detect the start of the next packet. In that case you
> shouldn't skip a given number of bytes, but rather skip until a new
> frame start is detected.

I thought about it and had an idea: I will read data until a byte
with value 0 is detected.
After that I will try to read a header. If that header is corrupted
(we are probably in the middle of a message body), I will try again
and again until a valid header is read.

> It's theoretically possible, but very unlikely. Note that TCP/IP
> employs checksums itself and should retransmit any packet which is
> detected as faulty. IIRC it's only a 16-bit checksum, so if you throw
> completely mangled data at it, roughly one packet in 65536 would make
> it through the checksum (statistically, of course). Note that you'd
> need a very unreliable medium for that to be of concern, something
> which would make normal communication next to impossible.

> Before you delve too deeply into checksum protection for your data,
> you should check what TCP/IP already has to offer. Do you have an
> analysis of your expected error pattern (bit errors, dropped bytes,
> erased bits, burst errors)? What's the acceptable error rate of
> the data you transfer? What's the error rate of your TCP/IP channel?

I guess the connections will vary a lot: standard wired connections,
GPRS, 3G, Wi-Fi.
The aim is to provide maximum reliability at minimal cost. I try to
count every byte, so I am also wondering whether I need to implement
my own additional checks for packet corruption.

I'm also thinking about using UDP. As I understand it, I would get
lower reliability, but I would not need to skip any data, because each
message is delivered separately.

Nevertheless, Rudolf, thank you very much :)

One more question: can boost::asio::async_read complete without
filling the provided buffer completely? I guess only in case of an
error, which would be reported in the boost::system::error_code
parameter.

Rudolf Leitgeb

Mar 30, 2009, 7:00:14 AM

to boost...@lists.boost.org

> I thought about it and had an idea: I will read data until a byte
> with value 0 is detected.
> After that I will try to read a header. If that header is corrupted
> (we are probably in the middle of a message body), I will try again
> and again until a valid header is read.

This sounds like a protocol with message framing like HDLC.
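
As a rough sketch of what HDLC-like framing looks like in practice, the
C++ below uses the byte-stuffing convention of asynchronous HDLC as
also used by PPP (RFC 1662): 0x7E delimits frames, 0x7D escapes, and an
escaped byte is XORed with 0x20. The function names stuff_frame and
unstuff_frames are made up for this example, and the sketch leaves out
the frame check sequence that a real implementation would append.

```cpp
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

constexpr std::uint8_t kFlag = 0x7E;    // frame delimiter
constexpr std::uint8_t kEscape = 0x7D;  // escape marker
constexpr std::uint8_t kXor = 0x20;     // applied to escaped bytes

// Wraps a payload in flags, escaping any flag or escape byte inside it.
Bytes stuff_frame(const Bytes& payload) {
    Bytes out;
    out.push_back(kFlag);
    for (std::uint8_t b : payload) {
        if (b == kFlag || b == kEscape) {
            out.push_back(kEscape);
            out.push_back(static_cast<std::uint8_t>(b ^ kXor));
        } else {
            out.push_back(b);
        }
    }
    out.push_back(kFlag);
    return out;
}

// Extracts the payloads of all complete, non-empty frames in `stream`.
// Bytes seen before the first flag are treated as garbage and skipped.
std::vector<Bytes> unstuff_frames(const Bytes& stream) {
    std::vector<Bytes> frames;
    Bytes current;
    bool in_frame = false;
    bool escaped = false;
    for (std::uint8_t b : stream) {
        if (b == kFlag) {
            if (in_frame && !current.empty()) frames.push_back(current);
            current.clear();
            in_frame = true;
            escaped = false;
        } else if (!in_frame) {
            continue;
        } else if (escaped) {
            current.push_back(static_cast<std::uint8_t>(b ^ kXor));
            escaped = false;
        } else if (b == kEscape) {
            escaped = true;
        } else {
            current.push_back(b);
        }
    }
    return frames;
}
```

The point of the stuffing is that the delimiter can never appear inside
a frame, so a receiver that lands in the middle of a message only has
to wait for the next 0x7E. A bare zero byte used as a delimiter has no
such guarantee unless the payload is encoded to exclude zeros.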

> I guess the connections will vary a lot: standard wired connections,
> GPRS, 3G, Wi-Fi.

But note that apart from TCP/IP capabilities some of these
transfer media have their own error detection or correction
schemes in their underlying layers.

> The aim is to provide maximum reliability at minimal cost. I try to
> count every byte, so I am also wondering whether I need to implement
> my own additional checks for packet corruption.

Do you have any practical tests which show excessive error rates
for one of these channels (apart from dropped packets)?

If these are indeed your channels, I would expect very low error
rates. Chances are a decent checksum over your whole packet would
do the job. If a packet turns out corrupt, you may as well drop the
connection and start from scratch. If it's imperative that you
maintain the connection, packet framing might be inevitable. If
you expect rare bit errors, a simple FEC scheme might be the solution,
since it avoids a lot of protocol overhead for data retransmits.
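
One way to read "a decent checksum over your whole packet" is a CRC-32
appended to each message. The C++ sketch below uses the common
reflected polynomial 0xEDB88320 (the one used by zlib and Ethernet) in
its slow bit-by-bit form. The trailer layout (four big-endian bytes at
the end of the packet) and the names crc32_bitwise, seal_packet and
packet_is_intact are assumptions for this example only.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Bit-by-bit CRC-32 with the reflected polynomial 0xEDB88320.
std::uint32_t crc32_bitwise(const std::uint8_t* data, std::size_t len) {
    std::uint32_t crc = 0xFFFFFFFFu;
    for (std::size_t i = 0; i < len; ++i) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; ++bit) {
            // Mask is all ones when the low bit is set, zero otherwise.
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}

// Appends the CRC of `body` as four big-endian bytes (assumed layout).
std::vector<std::uint8_t> seal_packet(std::vector<std::uint8_t> body) {
    const std::uint32_t crc = crc32_bitwise(body.data(), body.size());
    for (int shift = 24; shift >= 0; shift -= 8) {
        body.push_back(static_cast<std::uint8_t>(crc >> shift));
    }
    return body;
}

// Recomputes the CRC over everything but the trailer and compares.
bool packet_is_intact(const std::vector<std::uint8_t>& packet) {
    if (packet.size() < 4) return false;
    const std::size_t n = packet.size() - 4;
    std::uint32_t stored = 0;
    for (std::size_t i = 0; i < 4; ++i) {
        stored = (stored << 8) | packet[n + i];
    }
    return crc32_bitwise(packet.data(), n) == stored;
}
```

With this in place the receiver could follow the simple policy
suggested above: when packet_is_intact returns false, close the
connection and reconnect rather than trying to resynchronise.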

> I'm also thinking about using UDP. As I understand it, I would get
> lower reliability, but I would not need to skip any data, because each
> message is delivered separately.

UDP introduces a number of additional hurdles which TCP/IP handles
for you: correct ordering of packets, handling of dropped packets,
simple bit error detection. Be sure you understand these implications
before you drop TCP/IP.

Igor R

Mar 30, 2009, 7:22:16 AM

to boost...@lists.boost.org

> One more question: can boost::asio::async_read complete without
> filling the provided buffer completely? I guess only in case of an
> error, which would be reported in the boost::system::error_code
> parameter.

async_read has 4 overloads:
http://www.boost.org/doc/libs/1_38_0/doc/html/boost_asio/reference/async_read.html
Two of them take a completion condition parameter:
http://www.boost.org/doc/libs/1_38_0/doc/html/boost_asio/reference/async_read/overload2.html
http://www.boost.org/doc/libs/1_38_0/doc/html/boost_asio/reference/async_read/overload4.html
