Try changing the last line to:
while(readBytes != 0);
_______________________________________________
Boost-users mailing list
Boost...@lists.boost.org
http://lists.boost.org/mailman/listinfo.cgi/boost-users
It would probably help to understand that TCP has no concept of a
"message". Anything you write to a socket is appended to a stream of
*bytes*. Several subsystems on both the sending and receiving computers
will have the option of splitting and combining adjacent buffers with no
regard for how big each individual write was. read_some has the
option of reading any size up to and including the buffer size, but a
read of less than that size does not mean "end of message". It could
also mean "network congestion", "cable unplugged", or "Windows just felt
lazy".
On further thought, I think I see the problem (and I apologize for the
bad recommendation in my last email). Your sender somehow needs to
communicate the message size or flag the end of the message. A partial
list of options includes:
* Begin the message with a field specifying its total length. The
receiving loop must read this length and then count bytes until it has
the whole message, keeping in mind that each read_some can read any
number of bytes.
* Begin each message with a message id, where each message id has a
known length. Once you calculate the length, count bytes as I described
above.
* End the message with a terminator. You could set up a line-oriented
protocol where a newline terminates the read. With some thought, you
might think of some other terminating byte or string appropriate to your
protocol.
In all cases, unless your final read_some has a carefully controlled
size, remember that your buffer may contain the beginning of the next
message, or even multiple complete messages. If so, it is your
responsibility to retain this until you are ready to process the next
message.
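As a rough illustration of the terminator option and of retaining leftover bytes, here is a minimal sketch in plain C++. The LineFramer class and its member names are made up for this example and are not part of Boost.Asio; it assumes a newline is the terminator and that you hand it whatever each read_some happened to return.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of Boost.Asio): accumulates raw bytes from
// successive reads and hands back complete newline-terminated messages.
class LineFramer {
public:
    // Append the bytes from one read, then extract every complete line that
    // is now available. Bytes after the last '\n' stay buffered.
    std::vector<std::string> feed(const char* data, std::size_t len) {
        assert(data != nullptr || len == 0);
        pending_.append(data, len);
        std::vector<std::string> lines;
        std::size_t start = 0;
        for (;;) {
            std::size_t nl = pending_.find('\n', start);
            if (nl == std::string::npos) break;
            lines.push_back(pending_.substr(start, nl - start));
            start = nl + 1;
        }
        pending_.erase(0, start);  // keep only the unfinished tail
        return lines;
    }

    // The beginning of the next message, if any has arrived yet.
    const std::string& pending() const { return pending_; }

private:
    std::string pending_;
};
```

One call to feed() may return no messages, one message, or several, which matches the point above that a single read can end anywhere in the stream.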
This is exactly what "read_some" does -- it reads SOME data. It may
read even 1 byte.
If you know exactly how many bytes you expect to get, use the read() free
function with the appropriate completion condition:
http://www.boost.org/doc/libs/1_46_1/doc/html/boost_asio/reference/read.html
"The function call will block until one or more bytes of data has been
read successfully, or until an error occurs."
"Remarks:
The read_some operation may not read all of the requested number of
bytes. Consider using the read function if you need to ensure that the
requested amount of data is read before the blocking operation
completes."
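To make the difference concrete, here is a small model in plain C++ of the loop that a "read exactly N bytes" operation has to perform on top of a read_some-style call. This is only a sketch of the idea, not Asio's implementation: ReadSome, read_exactly and make_chunked_source are hypothetical names, and the chunked source is a toy stand-in for a socket that delivers data in arbitrary pieces.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>
#include <string>

// Shape of a read_some-style operation in this sketch: copies up to `max`
// bytes into `dst` and returns how many it copied (0 means the stream ended).
using ReadSome = std::function<std::size_t(char* dst, std::size_t max)>;

// Hypothetical helper: keeps calling read_some until exactly `n` bytes have
// arrived or the stream ends. Returns the number of bytes actually read.
std::size_t read_exactly(const ReadSome& read_some, char* dst, std::size_t n) {
    assert(dst != nullptr || n == 0);
    std::size_t total = 0;
    while (total < n) {
        std::size_t got = read_some(dst + total, n - total);
        if (got == 0) break;  // end of stream in this toy model
        total += got;
    }
    return total;
}

// Toy source for experimentation: returns at most `chunk` bytes per call,
// mimicking a socket that hands over data in arbitrary pieces.
ReadSome make_chunked_source(std::string data, std::size_t chunk) {
    std::size_t pos = 0;
    return [data, pos, chunk](char* dst, std::size_t max) mutable -> std::size_t {
        std::size_t n = std::min({chunk, max, data.size() - pos});
        std::memcpy(dst, data.data() + pos, n);
        pos += n;
        return n;
    };
}
```

With a source that yields 3 bytes at a time, a single call returns 3 bytes even if you asked for 8, while read_exactly keeps going until the requested count is reached.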
> I sent 8004 bytes to server through TCP connection and successfully read it like this:
> But sometimes this code reads just 3752 (precisely) bytes and returns. After that it handles another async_read and reads 4525 bytes (which in sum gives 8004 bytes).
Well, since your buffer is only 128 bytes long, I hope that you are getting that amount or less from each read_some call. But in general, what you are seeing is normal TCP behavior.
Brad
--
Brad Howes
Calling Team - Skype Prague
Skype: br.howes
No, it really doesn't. All you get is a stream of bytes,
reliably delivered, in sequence.
So if you write 3,732 bytes onto a socket, there is *NO WAY*
for the reader to tell that you did that. The reader might
get 1 read of 3732 bytes, or 3732 reads of 1 byte, or anything
in between.
The reader could even read *more* than 3732 bytes in one read,
if you wrote more than once.
If you think you're going to get repeatable, identical matching
pairs of reads and writes out of just a TCP socket (that is,
without imposing your own protocol on top of TCP), you are
in for endless hours/days/months of frustration.
TCP does not have a concept of an empty write.
> TCP contains info about message length - it is duplication
> to prefix all messages with its length.
Where did you read that? TCP has NO concept of "messages". As such, it
has no concept of "message length". It is only a stream of bytes. The
sending computer can easily combine the buffers from two consecutive
write calls into a single packet, or split the buffer from a single
write call into multiple packets, or both. In either case, ALL
information about the size of the original write call(s), the number of
write calls, and anything else that you hope will provide a clue about
"messages" will be lost. Likewise, the receiving computer can and will
freely combine and split packets into whatever buffers it sees fit, with
similar effects on any "message boundaries". The only thing that will
remain is the sequence of bytes.
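A toy model may help make this concrete. The sketch below is plain C++ and is not a real socket: ToyByteStream is a made-up class that only keeps the one thing TCP promises to keep, the ordered sequence of bytes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Toy in-memory model of one direction of a TCP connection (not a real
// socket): write() appends bytes, read() removes up to `max` bytes from the
// front. Nothing about the size or number of write() calls is recorded.
class ToyByteStream {
public:
    void write(const std::string& bytes) { queued_ += bytes; }

    std::string read(std::size_t max) {
        assert(max > 0);  // a zero-byte read would tell the caller nothing
        std::size_t n = std::min(max, queued_.size());
        std::string out = queued_.substr(0, n);
        queued_.erase(0, n);
        return out;
    }

private:
    std::string queued_;
};
```

Writing "HELLO" and then "WORLD" and reading with a 128-byte buffer yields "HELLOWORLD" in one read; writing "ABCDEF" once and reading 4 bytes at a time yields "ABCD" and then "EF". In both cases the reader cannot tell how the writes were grouped.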
Do not try to search for TCP options to change this behavior. The
closest you can come is options that will *usually* keep the message
boundaries. This means that your program will *usually* not crash.
If you wish to preserve message boundaries, then you MUST provide your
own message framing, just as you would if writing to a file. If you
prefix each message with its length, then you can use the read function
to ensure you get the whole message in one call, as you will know the
message length. This will also be effective at ensuring you don't have
the beginning of the next message at the end of your buffer.
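For what it's worth, a length prefix can be sketched in a few lines of plain C++. The wire format here (a 4-byte big-endian length followed by the payload) and the names encode_frame and try_decode_frame are assumptions made for this example only; a real protocol would pick its own header layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Assumed wire format for this sketch: a 4-byte big-endian length followed
// by that many payload bytes.
std::string encode_frame(const std::string& payload) {
    assert(payload.size() <= 0xFFFFFFFFu);
    std::uint32_t n = static_cast<std::uint32_t>(payload.size());
    std::string out;
    out.push_back(static_cast<char>((n >> 24) & 0xFF));
    out.push_back(static_cast<char>((n >> 16) & 0xFF));
    out.push_back(static_cast<char>((n >> 8) & 0xFF));
    out.push_back(static_cast<char>(n & 0xFF));
    out += payload;
    return out;
}

// If `stream` starts with one complete frame, moves its payload into
// `payload`, removes the frame from `stream`, and returns true. Otherwise
// leaves `stream` untouched and returns false (more bytes are needed).
bool try_decode_frame(std::string& stream, std::string& payload) {
    if (stream.size() < 4) return false;  // header not complete yet
    std::uint32_t n = 0;
    for (int i = 0; i < 4; ++i)
        n = (n << 8) | static_cast<unsigned char>(stream[i]);
    if (stream.size() - 4 < n) return false;  // body not complete yet
    payload = stream.substr(4, n);
    stream.erase(0, 4 + static_cast<std::size_t>(n));
    return true;
}
```

The same two steps map onto two read calls on a socket: one for the 4 header bytes, then one for exactly the number of bytes the header announces.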
Alternatively, we could say that TCP, in fact, does have a well-defined
concept of messages: they are all exactly one byte long.
> Several subsystems on both the sending and receiving computers
... and sometimes boxes in the middle ...
> will have the option of splitting and combining adjacent buffers with no
> consideration to how big each individual write was.
I've done a little protocol stuff with ASIO now and I must say it's a
lot of fun and I can't go back to doing it any other way.
> On further thought, I think I see the problem (and I apologize for the
> bad recommendation in my last email). Your sender somehow needs to
> communicate the message size or flag the end of the message. A partial
> list of options includes:
The pattern I encounter over and over again (often at multiple levels in
a protocol) is:
class protocol_layer_context
{
    vector<uint8> buffer;

    void on_received_data(vector<uint8> & rx_bytes)
    {
        // append the newly received bytes to whatever is already buffered
        buffer.insert(buffer.end(), rx_bytes.begin(), rx_bytes.end());

        // perhaps a virtual override.
        size_t msg_len = this->parse_len_from_start_of_buffer();

        // a message is complete once at least msg_len bytes are buffered
        if (buffer.size() >= msg_len)
        {
            vector<uint8> msg_buf =
                consume_data_from_front_of_buffer(buffer, msg_len);

            // perhaps a virtual override
            this->process_complete_message(msg_buf);
        }

        // post another ASIO read request
        this->request_more_data();
    }
    ...
But there are some important issues with this naive pseudocode:
1. It can result in recopying the data a bunch of times for every
protocol layer, killing performance.
2. It's susceptible to a denial-of-service (DoS). A bad guy can trick
you into allocating all your memory.
3. Sometimes the length of a message is stated at the beginning of the
message, sometimes it isn't known until the end.
4. No processing happens on the message until it's completely read, but
some protocols really need the receiving endpoint to process it
incrementally.
5. Error handling
6. Optimal threading
7. Etc.
We find bugs in exactly this logic all the darn time. Often the data
being received is untrusted and possibly malicious. Real-world protocol
implementations will commonly crash under fragmentation fuzzing,
sometimes resulting in exploitable security holes.
In a sense, this is the general refactoring problem of
'incrementalizing' a parsing function by moving all its state from stack
variables into a longer-lived context object.
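As one possible shape for such a context object, here is a minimal sketch in plain C++. It assumes a frame layout of a 4-byte big-endian length followed by the payload, and FrameParser with its members is a hypothetical name for this example. The only state is what used to be stack variables (how much of the header has been seen, the declared length, the partial body), plus a cap on the declared length as a crude guard against the memory-exhaustion issue in item 2.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical incremental parser for frames of the form
// [4-byte big-endian length][payload]. All parsing state lives in the
// object, so it can be fed one arbitrary fragment at a time.
class FrameParser {
public:
    explicit FrameParser(std::uint32_t max_len) : max_len_(max_len) {}

    // Consumes `len` bytes and appends each completed payload to `out`.
    // Returns false (and stays failed) once a declared length exceeds the cap.
    bool feed(const char* data, std::size_t len, std::vector<std::string>& out) {
        assert(data != nullptr || len == 0);
        for (std::size_t i = 0; i < len && !failed_; ++i) {
            unsigned char b = static_cast<unsigned char>(data[i]);
            if (header_bytes_ < 4) {
                declared_ = (declared_ << 8) | b;
                if (++header_bytes_ == 4) {
                    if (declared_ > max_len_) { failed_ = true; break; }
                    body_.reserve(declared_);  // bounded by max_len_
                    if (declared_ == 0) finish(out);
                }
            } else {
                body_.push_back(static_cast<char>(b));
                if (body_.size() == declared_) finish(out);
            }
        }
        return !failed_;
    }

private:
    // Emit the completed payload and reset the state for the next frame.
    void finish(std::vector<std::string>& out) {
        out.push_back(body_);
        body_.clear();
        header_bytes_ = 0;
        declared_ = 0;
    }

    std::uint32_t max_len_;
    std::uint32_t declared_ = 0;
    int header_bytes_ = 0;
    bool failed_ = false;
    std::string body_;
};
```

Because the parser advances one byte at a time, it gives the same result whether the data arrives in one read or in one-byte fragments. It still copies every payload byte, so it does nothing for the recopying issue in item 1.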
We've seen it done successfully with coroutines, but that's not a
commonly accepted solution because, frankly, the native C/C++ runtimes
have not yet given coroutines the love (i.e., portability and
performance guarantees) they really deserved.
If someone figured out how to leverage generic techniques to handle just
the unidirectional message delimiting problem in a bulletproof way I
think it would make a really great boost library.
- Marsh
I read, in the rationale part of the documentation for ASIO, the following:
"Basis for further abstraction. The library should permit the development of
other libraries that provide higher levels of abstraction. For example,
implementations of commonly used protocols such as HTTP."
It seems like such an obvious thing to do: to write a class library that
contains classes that use the TCP capabilities of boost::asio to
automagically take data read from the socket and do whatever is needed. For
example, one might want to construct a series of http requests from the data
coming in on port 443, and be able to relate the addressing data in the
application layer to that in the TCP layer, and use that comparison to
determine whether to forward the request to server A or server B. One
reason for doing so would be my own edification (and that of anyone else
interested in learning) about how the different OSI layers work.
Another would be for security purposes (e.g. to know whether or not an
authorized user's session has been hijacked).
It seems to me to be an obvious thing to do, but my question to you is "Do
you know of anyone who has done it?" (in some kind of open source project)
If not, do you know of resources available online where I could learn how to
do it? I am finding it hard to find resources that are useful: I have well
developed C++ skills, e.g. to write custom IO stream classes, but need some
guidance on how to proceed with the 'further abstraction' the docs mention,
and what the recommended best practices are specific to (high performance,
secure) networking program development.
You said, " I've done a little protocol stuff with ASIO now and I must say
it's a lot of fun and I can't go back to doing it any other way." How did
you get started on it? Did you use any documentation other than the asio
docs? Do you know of any documents (ideally online) that show how you could
use this stuff to thwart the major kinds of attacks that can be made on a
web server?
Thanks
Ted
TCP will not alter the byte stream, also meaning it will not drop bytes.
If the receive buffer fills, then it will tell the other machine it is
sending data too fast and needs to slow down. It will also have the
other machine resend the data that couldn't fit in the receive buffer.
Your programs (on both ends) will not need to address this issue; the
operating system will handle it for you.
That said, experimenting with the receive buffer size may improve
performance, but will have no effect on correctness. Don't assume more
is better. If you make the buffers too big, you'll just increase
overhead and latency.
>
> It seems to me to be an obvious thing to do, but my question to you is "Do
> you know of anyone who has done it?" (in some kind of open source project)
What about: http://cpp-netlib.github.com/
Apparently it will be submitted for review someday:
http://comments.gmane.org/gmane.comp.lib.boost.user/67431
Jerry
Looks interesting. This page in particular looks like it's getting close
to what I was talking about:
http://cpp-netlib.github.com/0.9.0/message.html
I realize the project is new and the docs may not be complete, but every
other page seems to be about its HTTP implementation. Even the generic
basic_message class presumes a headers/body structure.
HTTP is often thought of as a half-duplex message/response protocol
because it is (mostly) stateless and originally closed the connection after
every response.
I was interested more in a general facility for a common low-level
protocol buffering pattern.
> Apparently it will be submitted for review someday:
> http://comments.gmane.org/gmane.comp.lib.boost.user/67431
Cool.
- Marsh
> http://cpp-netlib.github.com/0.9.0/message.html
>
> I realize the project is new and the docs may not be complete, but every
> other page seems to be about its HTTP implementation. Even the generic
> basic_message class presumes a headers/body structure.
>
What I haven't found yet is a way to compare the IP info in the TCP
packets with the IP info in the HTTP headers. That is in particular. More
generally, I am looking for an online resource for learning network
programming in general and security-related network programming in
particular.
Cheers
Ted
> Related to this, I wonder if there are any class libraries that facilitate
> processing these byte streams.
Have you looked at boost::serialization? There is an example in boost::asio on how to use them together.
Brad
Sometimes a proxy will add something, but usually there aren't any IP
addresses in HTTP headers.
> That is in particular. More
> generally, I am looking for an online resource for learning network
> programming in general and security-related network programming in
> particular.
That's interesting. There are resources about secure programming, and
securing networks, but I don't see much new stuff about basic network
programming. They are probably casualties of the trend to make all
communications run over HTTP(s).
I don't recall ever seeing a book or online resource saying "here's how
to accept data from the network and process it in the most scalable and
secure way using C or C++".
On the crypto side of things I recommend:
> http://www.amazon.com/Cryptography-Engineering-Principles-Practical-Applications/dp/0470474246
I tweeted your question:
https://twitter.com/marshray/status/68810041234432000
Got this recommendation, though it doesn't seem to be closely related to
network programming. Perhaps we'll get more.
> http://www.amazon.com/Memory-Programming-Concept-Frantisek-Franek/dp/0521520436
- Marsh