
Preventing Denial of Service Attack In IPC Serialization


Le Chaud Lapin

May 27, 2007, 12:42:56 AM
There are some problems that seem to have no good solution, and since
this is one of them, I decided to ask here rather than think too hard
about it myself. :)

I have a framework where I send strings between two nodes on a
network, serializing the strings through a Socket object:

Socket socket;
string s;

socket << s;

The obvious implementation of serializing a string is to have the
source first send the count of characters in the string, then the
characters themselves. The target will allocate a buffer to hold
"count" characters, then fill in the buffer with the actual characters
as they arrive from the source.
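
For concreteness, the receiving end of this scheme might look like the
following sketch (not the actual framework code; Socket and its
receive_bytes() member are assumed stand-ins):

// Hypothetical sketch of the naive receiver; receive_bytes(p, n) is
// assumed to read exactly n bytes off the wire.
void receive_string(Socket &socket, char *&buffer, unsigned int &count)
{
    socket.receive_bytes(&count, sizeof(count)); // count comes off the wire...
    buffer = new char[count];                    // ...and drives the allocation
    socket.receive_bytes(buffer, count);
}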

An attacker can wreak havoc with this model by injecting bogus packets
into the network to arrive at the target and present a "count" as a
very large number, say, 100,000,000. The target will unwittingly
invoke:

char *buffer = new char[100000000];

The attempt to allocate will either succeed or fail. If it succeeds,
100MB of virtual memory will be lost, which is, in a sense, worse than
if it fails.

I do have security mechanisms in my framework that eliminate this
problem, but there are scenarios where the user of my framework will
deliberately and necessarily choose not to enable the security
feature.

What then can I do to stop this problem?

I considered placing an artificial limit on allocation of memory for a
string or any other free-store-consuming object.
I also considered placing the entire thread that would invoke operator
new() under a kind of free-store limit, so that any attempt to breach
that limit would result in an exception being thrown. Neither of these
solutions feels right.

My gut feeling is that I will eventually discover that no solution
feels right, but thought I would ask before giving up.

Any ideas?

-Le Chaud Lapin-


--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

Maciej Sobczak

May 27, 2007, 1:26:46 PM
On 27 May, 06:42, Le Chaud Lapin <jaibudu...@gmail.com> wrote:

> The obvious implementation of serializing a string is to have the
> source first send the count of characters in the string, then the
> characters themselves. The target will allocate a buffer to hold
> "count" characters, then fill in the buffer with the actual characters
> as they arrive from the source.
>
> An attacker can wreak havoc with this model by injecting bogus packets
> into the network to arrive at the target and present a "count" as a
> very large number

[...]

> What then can I do to stop this problem?

Use a hard limit on this count field and reject messages that do not
comply. Document this hard limit as part of the communication
protocol.
For those who want to take the responsibility on themselves, allow the
limit to be modified by parameterizing the communication library (macro
definitions, configuration files, constructor parameters, etc.).
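
A sketch of what such a check might look like at the point where the
count is read (the limit name, exception type, and plumbing here are
illustrative, not part of any particular library):

// Illustrative sketch: max_string_count comes from whatever
// configuration mechanism the library exposes (e.g. a default of 4096).
unsigned int count;
socket.receive_bytes(&count, sizeof(count));
if (count > max_string_count)
    throw ProtocolViolation("string count exceeds configured limit");
char *buffer = new char[count]; // now bounded by the documented limit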

The following library is an example of using this strategy:

http://www.msobczak.com/prog/yami


In general, don't think that you should allow everybody to do
everything - there is absolutely no need to do so. Just set up your
rules and reject everything that looks strange. If there is a genuine
need for sending longer packets, users can either reconfigure the
library by using the limit parameter (and then it's *their* business
if they get into DOS) or they can send long content by chopping it
into smaller parts.
Another solution might be to use normal (short) messages to negotiate
opening a new dedicated and temporary channel for long content. This
provides a nice hook for authentication and permission checks as well.

--
Maciej Sobczak
http://www.msobczak.com/

Branimir Maksimovic

May 27, 2007, 1:27:06 PM

Don't allocate the whole buffer immediately; realloc in chunks
as the packets arrive. Let them send all 100 MB ;)
You can also request an ack from the client
for every chunk sent, say every 512 bytes.
Since the originating IP will probably be spoofed, the protocol will
break on the first ack.
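
A rough sketch of that idea (receive_some(), send_ack(), and the chunk
size are all illustrative, not a real API): storage grows only as
verified data actually arrives, so a spoofed sender pays per byte.

// Illustrative sketch (needs <string>): grow storage only as data
// arrives, acking each chunk; a sender with a spoofed address never
// sees the acks, so the exchange dies after the first chunk.
std::string s;
unsigned int claimed;                      // advertised total, not trusted
socket.receive_bytes(&claimed, sizeof(claimed));
char chunk[512];
while (s.size() < claimed) {
    std::size_t n = socket.receive_some(chunk, sizeof(chunk));
    s.append(chunk, n);
    socket.send_ack();
}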

Greetings, Branimir.

John Moeller

May 27, 2007, 9:24:58 PM

I absolutely agree. Another thing you may want to consider is to get
away from the idea of a "secure mode" and offer several "tunable"
parameters. If the user wants to mess with the hard limit, they can do
so, at their own risk, but can leave the other parameters at their
defaults that help secure your protocol.

--

John Moeller
fish...@gmail.com

Le Chaud Lapin

May 28, 2007, 9:34:02 AM
On May 27, 8:24 pm, John Moeller <fishc...@gmail.com> wrote:

> Maciej Sobczak wrote:
> > Use a hard limit on this count field and reject messages that do not
> > comply. Document this hard limit as part of the communication
> > protocol.
> > For those who want to take the responsibility on themselves, allow the
> > limit to be modified by parameterizing the communication library (macro
> > definitions, configuration files, constructor parameters, etc.).

So you are saying for all 190 C++ classes that I have that are
serializable, I should find a way to specify hard limits on the size
of what is being serialized, including not only strings, but a family
of containers that includes at least 30 containers? Do I specify a
maximum number of elements that can be serialized to/from the
container?


> > The following library is an example of using this strategy:
>
> >http://www.msobczak.com/prog/yami
>
> > In general, don't think that you should allow everybody to do
> > everything - there is absolutely no need to do so. Just set up your
> > rules and reject everything that looks strange. If there is a genuine
> > need for sending longer packets, users can either reconfigure the
> > library by using the limit parameter (and then it's *their* business
> > if they get into DOS) or they can send long content by chopping it
> > into smaller parts.

Let's say that I specify the hard limit for class String<> to be 4096
characters, an arbitrary but reasonable value. Let's say also that I
specify the number of elements in a List<> to be 65,536 elements,
again, an arbitrary but reasonable value. Calculating the maximum
amount of memory that can be consumed by a DoS attacker, we get
2^16*2^12 = 256 MB. So an attacker, using the system in a "safe"
mode, could easily break the model. It should be intuitively obvious
that it is impossible to have both generalized plurality and defense
against this type of attack simultaneously. Even with
parameterization of how much memory can be allocated, there is still
the question of where (and how) the user of my library should specify
what limits should be set. It should also be intuitively apparent
that there comes a point where, if the user of the library is so busy
putting checks in the code to limit this type of attack, the ease-of-
use is destroyed. And again, whatever values chosen would be
arbitrary, and because objects are hierarchical, with unpredictable
level of nesting, the whole thing would quickly turn into a monstrous
mess.

I would be very curious to know what Boost Serialization does in this
situation, if anyone knows.

> I absolutely agree. Another thing you may want to consider is to get
> away from the idea of a "secure mode" and offer several "tunable"
> parameters. If the user wants to mess with the hard limit, they can do
> so, at their own risk, but can leave the other parameters at their
> defaults that help secure your protocol.

The security mechanisms were not created for this problem. They were
created for the generalized problem of proving security in the
Internet (in a research sense). It is only by fortune that, if the
nodes are connected over a secure channel, the deliberate attacks are
no longer possible.

-Le Chaud Lapin-


--

Maciej Sobczak

May 28, 2007, 8:27:33 PM
On 28 May, 15:34, Le Chaud Lapin <jaibudu...@gmail.com> wrote:

> > > Use a hard limit on this count field and reject messages that do not
> > > comply. Document this hard limit as part of the communication
> > > protocol.
> > > For those who want to take the responsibility on themselves, allow the
> > > limit to be modified by parameterizing the communication library (macro
> > > definitions, configuration files, constructor parameters, etc.).
>
> So you are saying for all 190 C++ classes that I have that are
> serializable, I should find a way to specify hard limits on the size
> of what is being serialized, including not only strings, but a family
> of containers that includes at least 30 containers? Do I specify a
> maximum number of elements that can be serialized to/from the
> container?

In this case you can avoid DOS attacks by using message headers with
authentication information and setting up a "circle of trust" in the
system.
In other words, you need to be able to tell whether the message is
valid or bogus *before* you come to the point where you dynamically
allocate buffers.
Alternatively, you can play tricks with both approaches - put a hard
limit (even a very small one) on messages that do not authenticate
("guests") and "no limit" on messages from trusted sources.

--
Maciej Sobczak
http://www.msobczak.com/

Sergey P. Derevyago

May 30, 2007, 9:28:14 AM
Le Chaud Lapin wrote:
> My gut feeling is that I will eventually discover that no solution
> feels right, but thought I would ask before giving up.
>
IMHO you have to use some kind of digital signature. Corrupted
sequences will also be filtered out.
--
With all respect, Sergey. http://ders.stml.net/
mailto : ders at skeptik.net

Zeljko Vrba

May 30, 2007, 4:48:35 PM
On 2007-05-27, Le Chaud Lapin <jaibu...@gmail.com> wrote:
> very large number, say, 100,000,000. The target will unwittingly
> invoke:
>
> char *buffer = new char[100000000];
>
> The attempt to allocate will either succeed or fail. If it succeeds,
> 100MB of virtual memory will be lost, which is, in a sense, worse than
> if it fails.
>
-cut-

>
> What then can I do to stop this problem?
>

Leave the decision to the user. Make the user implement a function
with a signature like

void *allocate(int type, int nitems, int size);

where type is the object type being allocated (list, simple struct, ...),
nitems is the number of child items (if it applies to the given type), and
size is the number of bytes to be allocated in this call. So the user
can collect statistics on each individual allocation type if he cares
about memory usage (and has the opportunity to say at some point "enough!"
by returning a NULL pointer), or (if he's lazy) he can just implement it like

void *allocate(int /*type*/, int /*nitems*/, int size)
{
    return malloc(size); // from <cstdlib>
}
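
A caller who does care could enforce a running quota inside the hook,
for instance (a sketch only; the quota value and the bookkeeping are
illustrative, and thread safety is ignored for brevity):

#include <cstddef>
#include <cstdlib>

// Illustrative sketch: refuse further allocation once a running total
// exceeds a quota. 64 MB is an arbitrary example value.
static std::size_t total_allocated = 0;

void *allocate(int /*type*/, int /*nitems*/, int size)
{
    if (total_allocated + static_cast<std::size_t>(size) > 64u * 1024 * 1024)
        return 0; // "enough!"
    total_allocated += static_cast<std::size_t>(size);
    return std::malloc(size);
}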

Or, as others have suggested, cryptography. You don't need a DSIG; I
believe an HMAC would be sufficient.

Le Chaud Lapin

May 31, 2007, 3:57:23 AM
On May 30, 8:28 am, "Sergey P. Derevyago" <non-exist...@iobox.com>
wrote:

> Le Chaud Lapin wrote:
> > My gut feeling is that I will eventually discover that no solution
> > feels right, but thought I would ask before giving up.
>
> IMHO you have to use some kind of digital signature. Corrupted
> sequences will also be filtered out.

We have no problems with our secure links, which we already have.
With such links, there is nothing a perpetrator can do to alter
or inject bogus packets into the communication stream to trick the
recipient of the packet into doing a massive new[], because the
security mechanisms, which include digital signatures, will cause the
packet to be dropped.

The problem is when the link is insecure. It ruins the entire
serialization framework. Note that ruin happens not just for strings,
but for any situation where there is a vector of elements, and the
source of an object is about to convey to the target the size of that
vector before serializing the individual elements of that vector.

Note again that this is a framework here, not a specific application,
so I cannot, for example, in the context of each serializable class,
specify an arbitrary limit on the number of elements involved,
because it would be, well...arbitrary. This is true especially if the
class contains a vector template, as the size of each element in the
vector would not be known, so even if some arbitrary limit were set
for the size of the array, say 65,536, if each element of the array is
an object with multiple members, it is conceivable that one of those
members would be an array itself. This problem presents itself
recursively, so that, if N is the limit on the number of elements allowed
to be serialized for a vector V, then with L levels of nesting, there would be
an exponential explosion in memory space required for new[] against a
Foo vector[N], equal to N^L, so that even for L=4 and N=65,536, N^L is
2^64, and we're back where we started.

But aside from the details, it should be intuitively apparent that
trying to put these artificial limits ruins the regularity of the
entire model, which again, is a framework and not a specific
application. As we all know, arbitrariness is a red-flag in good
design principles.

Consider defining the serialization function for a List<>:

Socket s;

List<Foo> l;

s << l;

One would not be able to specify a limit on the count of l without
knowing how much space Foo will take up. Big Foo, small limit. Small
Foo, large limit. Foo itself could contain members that contain
List<>, and so on, recursively.

The more I think about this problem, the more I am beginning to
believe that it is better to leave the classes themselves alone and
focus on the memory management itself. At least the regularity would
be preserved.

In that case, there are two possible "solutions", one that will not
work, the other that might:

The solution of putting a limit on the "archive" object (Socket in
this case) won't work because that will be meaningless for a long-
duration application that was meant to acquire and release terabytes
of memory throughout its natural life.

That leaves memory allocation against the thread itself. At any given
instant, on a server machine with 4GB of RAM and 500 client connections,
if one server thread is hogging 2.5GB for itself, there is probably a
breach in progress. In that case, the memory allocation should fail
with an exception, the server thread will hard-abort, the evil client
connection will be broken, and the only entity unhappy at that point
will be the evil client.

Unfortunately, memory allocation quotas on most OS's, if I am not
mistaken, are applied on a per-process, not per-thread, basis.

-Le Chaud Lapin-

Nevin :-] Liber

May 31, 2007, 9:21:47 AM
In article <1180546963....@w5g2000hsg.googlegroups.com>,

Le Chaud Lapin <jaibu...@gmail.com> wrote:

> The problem is when the link is insecure. It ruins the entire
> serialization framework.

This problem has nothing to do with serialization per se (or even C++,
for that matter). You have input from an untrusted source. You have to
validate the heck out of it before you use it. Period.

--
Nevin ":-)" Liber <mailto:ne...@eviloverlord.com> 773 961-1620

Sergey P. Derevyago

May 31, 2007, 2:56:48 PM
Le Chaud Lapin wrote:
> > IMHO you have to use some kind of digital signature. Corrupted
> > sequences will also be filtered out.
> >
> We have no problems with our secure links, which we already have.
> With such links, there is nothing that a perpetrator can do to alter
> or inject bogus packets into the communication stream to trick the
> recipient of the packet in doing a massive new [], because the
> security mechanisms, which includes digital signatures, will cause
> packet to be dropped.
>
> The problem is when the link is insecure.
>
Also the problem is when the link is unstable and therefore can deliver
corrupted Count members.
The signature will address both of these issues.

--
With all respect, Sergey. http://ders.stml.net/
mailto : ders at skeptik.net

--

jlin...@hotmail.com

May 31, 2007, 3:18:04 PM

These lines are your problem:


Socket s(...);
s << l;


You're conflating serialization with transmission, and on the other
end, deserialization with reception. You need to serialize your data:

std::vector<char> data;
serialize(l, data);

, and then send it

s.send(data);

And on the other end, receive the data:

std::vector<char> data;
s.receive(data);

, and then deserialize it:

deserialize(l, data);

That gives you control over how much data moves in and out of your
application.

In the Socket::receive() function you put a limit on the number of
bytes you are willing to read off the network, and if you're still
suspicious, you allocate memory in smaller chunks and realloc as the
data comes in.
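
One plausible shape for such a receive(), with the byte limit applied
before any large allocation (read_exactly() and the exception are
assumptions, not a real Socket API):

#include <stdexcept>
#include <vector>

// Illustrative sketch: the length prefix is checked against a limit
// before any buffer is allocated; read_exactly(p, n) is assumed to
// read exactly n bytes or throw.
void receive(Socket &s, std::vector<char> &data, std::size_t max_bytes)
{
    unsigned int len = 0;
    s.read_exactly(&len, sizeof(len));
    if (len > max_bytes)
        throw std::runtime_error("packet length exceeds limit");
    data.resize(len); // bounded by max_bytes
    if (len > 0)
        s.read_exactly(&data[0], len);
}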

Jarl.

Le Chaud Lapin

May 31, 2007, 7:55:37 PM
On May 31, 8:21 am, "Nevin :-] Liber" <n...@eviloverlord.com> wrote:
> This problem has nothing to do with serialization per se (or even C++,
> for that matter). You have input from an untrusted source. You have to
> validate the heck out of it before you use it. Period.

Actually, it does. If you have ever created a serialization
framework, you'd probably know that validation is not possible.

I am surprised at some of the responses here to be honest. It is
intuitively obvious to me that "validation" is not possible, and the
grab-and-realloc method faces the same issues - when is too much
memory too much? And given that class objects can be nested, per-
object limitations on allocated memory are completely arbitrary. This
should be evident from a container of objects.


map<int, map<int, map<int, map<int, Foo> > > > valid_object;

Socket s;

s << valid_object;

This is a reasonable piece of code in terms of the model it implies.
One can imagine how serialization for map<> might be implemented. If
Foo contains a string that is say, 800 bytes long, that is a
reasonable value for some strings. If the count of elements in a
map<> is 3500, that is a reasonable value for some map<>'s. But I can
take this structure and easily make its overall memory consumption on
the order of Gigabytes.

Then what? Should the internal map<>'s be artificially-intelligent
and say, "Uh oh...I detected that I am inside of something big and bad
going on..." Obviously they cannot. This will blow up at the server.

Furthermore, there is another problem that is insurmountable, I
think. It is highly reasonable that one legitimate client might
induce the server to allocate, say, 20 megabytes, on behalf of the
client. That 20 megabytes does not have to be allocated in one array-
chunk. It could be distributed over, say, the last 2000 objects
created by the server on behalf of the client.

I am going to repeat myself here because I can feel that, when I write
the last sentence, there are some reading this who are thinking, "Just
put limits on what's done."

I cannot do that. :) This is a serialization framework. I must be
able to provide a means of serializing an object, and then it's hands-off. If
I parameterize the allocation size, the size becomes arbitrary. And
given the nested map<>'s up above, it should be evident that, if I
make the per-array size too small, I deny legitimate clients. If I
make it too large, the malicious client can successfully attack. If I
choose something "reasonable", a malicious client can still attack.

And to reiterate, I have a secure mode of operation where this issue is
not a problem.

The problem is when the link is insecure. And there are cases where
it is a legitimate necessity that the link be insecure.

I am beginning to think that a poor "solution" might be for the kernel
of the OS to allow per-thread quotas on allocated pages.

For those who keep saying, "Put a limit..", I encourage you to write
C++ code to show how you would serialize a List<> template, and tell me
what limits you would use.

-Le Chaud Lapin-


--

Le Chaud Lapin

May 31, 2007, 7:56:27 PM
On May 31, 1:56 pm, "Sergey P. Derevyago" <non-exist...@iobox.com>
wrote:
>

> Also the problem is when the link is unstable and therefore can deliver
> corrupted Count members.
> The signature will address both of these issues.
> --

Signatures are a form of security.

> > The problem is when the link is insecure.

What does one do when the link is insecure? (Asked in the spirit of
exploration.)

-Le Chaud Lapin-

Branimir Maksimovic

Jun 1, 2007, 1:36:17 PM
On Jun 1, 1:55 am, Le Chaud Lapin <jaibudu...@gmail.com> wrote:
> On May 31, 8:21 am, "Nevin :-] Liber" <n...@eviloverlord.com> wrote:
>
> > This problem has nothing to do with serialization per se (or even C++,
> > for that matter). You have input from an untrusted source. You have to
> > validate the heck out of it before you use it. Period.
>
> Actually, it does. If you have ever created a serialization
> framework, you'd probably know that validation is not possible.
>
> I am surprised at some of the responses here to be honest. It is
> intuitively obvious to me that "validation" is not possible, and the
> grab-and-realloc method faces the same issues - when is too much
> memory too much?

The realloc method will prevent an attacker from allocating too much
memory in the server by injecting packets (if somehow they can get past
the router). Since these packets will break the protocol and the attacker
cannot establish a connection, I can't see the issue here.
So I assume that you are talking about legitimate, connected
clients that are trying to DoS the server, or that your application
accepts connections from any source and is not hidden
behind a router.
If that is the case, you can't prevent DoSing without
imposing allocation limits, just as you can't prevent the
users of any library from allocating all available memory.

>
>
.......


>
> The problem is when the link is insecure. And there are cases where
> it is a legitimate necessity that the link be insecure.

In other words you have to allow connections from any source?

>
> I am beginning to think that a poor "solution" might be for the kernel
> of the OS to allow per-thread quotas on allocated pages.

Or you can write a per-thread memory allocator.

>
> For those who keep saying, "Put a limit..", I encourage you to write C+
> + code to show how you would serialize a List<> template, and tell me
> what limits you would use.

There is always the limit of available RAM.
This is not about serialization nor security, but a problem of
memory allocation: whether you will allow all available
memory to be used or not. You can limit that by writing
custom allocators; you don't even have to limit per thread,
but can limit, say, per request. Just use an allocator per request
that limits the memory available to the request
to some reasonably large value.

Greetings, Branimir.

Lourens Veen

Jun 1, 2007, 1:34:14 PM
Le Chaud Lapin wrote:
>
> And to reiterate, I have a secure mode of operation where this issue
> is not a problem.
>
> The problem is when the link is insecure. And there are cases where
> it is a legitimate necessity that the link be insecure.

So, basically you're saying that:

- You want to avoid unauthorised clients inducing the server to
allocate lots of resources, which would constitute a DoS attack.

- You want to let authorised clients induce the server to allocate
lots of resources without impediment.

- You can't authenticate clients to differentiate between the two
cases.

I suggest magic.

Lourens

Nominal Pro

Jun 1, 2007, 1:37:52 PM
On May 31, 6:55 pm, Le Chaud Lapin <jaibudu...@gmail.com> wrote:
> On May 31, 8:21 am, "Nevin :-] Liber" <n...@eviloverlord.com> wrote:
>
> > This problem has nothing to do with serialization per se (or even C++,
> > for that matter). You have input from an untrusted source. You have to
> > validate the heck out of it before you use it. Period.
>
> Actually, it does. If you have ever created a serialization
> framework, you'd probably know that validation is not possible.
>

Both of you are correct. The issue is not one of "validation" (if by
"validation" you mean sanity checks of the serialized data), but of
who is sending you that data (validating the sender, not the data
stream). Even IF the serialized data was correctly formatted and not
blatantly out of range, if it didn't come from the proper source, you
still have an avenue of attack. It might not result in a "denial of
service" attack, but the consequences of random data injection can be
equally bad.


> It is
> intuitively obvious to me that "validation" is not possible, and the
> grab-and-realloc method faces the same issues - when is too much
> memory too much?


The answer to that question lies somewhere outside of your code. It
might be safe to serialize 1Gb of objects to a server, but not to a
PDA.


> The problem is when the link is insecure. And there are cases where
> it is a legitimate necessity that the link be insecure.
>
> I am beginning to think that a poor "solution" might be for the kernel
> of the OS to allow per-thread quotas on allocated pages.


If it's insecure, then that's your answer: it's insecure. That means
injection attacks are possible, whether it's an attempt to force your
deserialization code to malloc too much, or something more subtle,
like bogus objects. Per-thread quotas on allocated pages is just an
attempt to move your heuristic sanity checks down into the OS. Those
sanity checks are not a substitute for validating your source and
preventing injection attacks. Use SSL tunneling or something similar.

Le Chaud Lapin

Jun 1, 2007, 7:13:58 PM
On Jun 1, 12:37 pm, Nominal Pro <majorsc...@gmail.com> wrote:
> If it's insecure, then that's your answer: it's insecure. That means
> injection attacks are possible, whether it's an attempt to force your
> deserialization code to malloc too much, or something more subtle,
> like bogus objects. Per-thread quotas on allocated pages is just an
> attempt to move your heuristic sanity checks down into the OS. Those
> sanity checks are not a substitute for validating your source and
> preventing injection attacks. Use SSL tunneling or something similar.

Nice response, and I agree.

This leads us to a simple conclusion, one I was somewhat sure of when I
wrote the OP but am now certain of: one cannot have his cake and
eat it. Generalized serialization frameworks, the kind that many C++
programmers write, fail in the face of insecure IPC channels.

Being a researcher in computer networking, this is very troubling to
me. It means that the most wonderful feature of serialization, the
obviation of microscopic attention to marshalling of data across the
channel, fails completely. On an insecure channel, every single
element must be range-checked, etc.

This means that if one wants to avoid DoS attacks, whether through
excessive memory allocation or simply causing the server to choke on bad data,
one really should not use serialization at all over an insecure
channel.

-Le Chaud Lapin-

Le Chaud Lapin

Jun 1, 2007, 7:17:09 PM
On Jun 1, 12:36 pm, Branimir Maksimovic <b...@hotmail.com> wrote:
> > The problem is when the link is insecure. And there are cases where
> > it is a legitimate necessity that the link be insecure.
>
> In other words you have to allow connections from any source?

Yes, that's what I keep saying. I have an IPC channel that has both a
secure mode and an un-secure mode. The secure mode provides rock-
solid security, in both directions. The un-secure mode provides
nothing. There are situations (just as exist in the Internet today)
where the un-secure mode is a necessary mode, but still provides some
value. It is the un-secure mode where there is a problem. My
contention is that using serialization on a socket that has not-yet-
been-secured is a bad idea, which is extremely unfortunate, IMO, as it
forces one to revert to picking apart every single vector whose size
is dynamic and potentially unlimited.

> > I am beginning to think that a poor "solution" might be for the kernel
> > of the OS to allow per-thread quotas on allocated pages.
>
> Or you can write per thread memory allocator.

Yes, but then that would ruin the serialization framework. I am too
lazy to prove this here, but think about how you would serialize an
object under Boost or MFC serialization or any other serialization,
and it should become clear very quickly that the code would become
intractable by providing a specialized memory allocator for every
serialized object, in addition to knowing just how much each object
should consume.

If this is not clear, think about it some more. :)

> There is always limit of available ram.
> This is not about serialization nor security, but problem of
> memory allocation. Whether you will allow all available
> memory to be used or not. You can limit that by writing
> custom allocators, don;t even have to limit per thread
> but say per request. Just use allocator per request
> that will limit available memory for request
> to some reasonably large value.

See above. Per request will render the serialization framework
intractable.

Best,

-Le Chaud Lapin-

Le Chaud Lapin

Jun 1, 2007, 7:13:00 PM
On Jun 1, 12:34 pm, Lourens Veen <lour...@rainbowdesert.net> wrote:
> Le Chaud Lapin wrote:
>
> > And to reiterate, I have a secure mode of operation where this issue
> > is not a problem.
>
> > The problem is when the link is insecure. And there are cases where
> > it is a legitimate necessity that the link be insecure.
>
> So, basically you're saying that:
>
> - You want to avoid unauthorised clients inducing the server to
> allocate lots of resources, which would constitute a DoS attack.
>
> - You want to let authorised clients induce the server to allocate
> lots of resources without impediment.
>
> - You can't authenticate clients to differentiate between the two
> cases.
>
> I suggest magic.

This is a most beautiful response. :) This is *exactly* what I have
been trying to say.

It is evident to me that, with no authentication, you cannot have
your cake and eat it. What you wrote above is inevitable.

What this means is that, any serialization framework, not just mine,
that claims that, "you can use it against sockets just as well as
files", is actually being somewhat dishonest. Again, I am curious to
know how Boost handles serialization of strings. What happens if I
want to serialize a 10,000-character string over a socket using
Boost's archive method?

Why is this important?

It means that all the applications on the Internet that use
unprotected serialization of the kind provided by Boost, etc., are
vulnerable to DoS attack.


All one has to do is super-saturate the server with bogus resource
consumption (memory allocation), and linger.

The most important observation, which I keep repeating, is that it
should also be evident that anything short of a secure (authenticated)
connection won't work. It will result in quick and massive
degradation of the framework itself. For example, someone might
propose that the IP address of the server be checked, and if it makes
too many connections within a specified period, limit its memory
allocation. Or whatever.

It should be obvious that:

1. You are back to the original problem, which is "How much is too
much?"
2. There are legitimate cases for multiple connections.

One cannot have his cake and eat it without authentication.

If I were an evil person, I'd go hunting around the Internet finding
servers that use serialization against general-public links and do
naughty things to them. ;)

-Le Chaud Lapin

Branimir Maksimovic

Jun 2, 2007, 6:20:05 AM
On Jun 2, 1:17 am, Le Chaud Lapin <jaibudu...@gmail.com> wrote:
> On Jun 1, 12:36 pm, Branimir Maksimovic <b...@hotmail.com> wrote:
>
>
> > > I am beginning to think that a poor "solution" might be for the kernel
> > > of the OS to allow per-thread quotas on allocated pages.
>
> > Or you can write a per-thread memory allocator.
>
> Yes, but then that would ruin the serialization framework. I am too
> lazy to prove this here, but think about how you would serialize an
> object under Boost or MFC serialization or any other serialization,
> and it should become clear very quickly that the code would become
> intractable by providing a specialized memory allocator for every
> serialized object, in addition to knowing just how much each object
> should consume.

I am not talking about anything I haven't already done.
I have implemented serialization in the way described in a PDF document
(I think it was the first that presented serialization
with separate readers/writers, so that serialization
is completely transparent).
Since it is transparent, I use a streambuf to send packets
via sockets and to receive them at the other end with an internal
buffer protocol.
But the allocator is even more transparent than that.
It uses thread-specific storage to implement
per-thread allocation and can easily be limited
to some maximum memory. When one thread allocates
and another frees, it simply switches blocks.
Since it replaces the default global new, I cannot see an issue here.

>
> If this is not clear, think about it some more. :)

It is not clear, since a per-thread allocator
(nor the way of writing to and reading from sockets, for that matter)
doesn't have anything to do with serialization.
If a single request requires all available RAM,
that means the server will be DoSed by legitimate
clients sooner or later, by bugs or who knows what.

Greetings, Branimir.

Le Chaud Lapin

Jun 3, 2007, 6:44:27 AM
{ Please confine responses to standard C++ or libraries of general
interest such as Boost. Thanks, -mod }

On Jun 2, 5:20 am, Branimir Maksimovic <b...@hotmail.com> wrote:
> On Jun 2, 1:17 am, Le Chaud Lapin <jaibudu...@gmail.com> wrote:
> But the allocator is even more transparent than that.
> It uses thread-specific storage to implement
> per-thread allocation and can easily be limited
> to some maximum memory. When one thread allocates
> and another frees, it simply switches blocks.
> Since it replaces the default global new, I cannot see an issue here.

That's interesting. How hard was it to make the per-thread
allocator?

Just curious.

-Le Chaud Lapin-

co...@mailvault.com

Jun 3, 2007, 3:53:50 PM
On Jun 1, 5:13 pm, Le Chaud Lapin <jaibudu...@gmail.com> wrote:
> On Jun 1, 12:34 pm, Lourens Veen <lour...@rainbowdesert.net> wrote:
>
> Again, I am curious to
> know how Boost handles serialization of strings. What happens if I
> want to serialize a 10,000-character string over a socket using
> Boost's archive method?
>
> Why is this important?
>
> It means that all the applications on the Internet that use
> unprotected serialization of the kind provided by Boost, etc., are
> vulnerable to DoS attack.
>

What is low class IMO is criticizing other attempts when
you have not published anything. I think the Boost library
has some weaknesses, but one nice thing about it is you can
use it. Do you plan to make available what you have been
describing?

Brian Wood
Ebenezer Enterprises

Branimir Maksimovic

Jun 4, 2007, 6:52:53 AM
On Jun 3, 12:44 pm, Le Chaud Lapin <jaibudu...@gmail.com> wrote:
> { Please confine responses to standard C++ or libraries of general
> interest such as Boost. Thanks, -mod }
>
> On Jun 2, 5:20 am, Branimir Maksimovic <b...@hotmail.com> wrote:
>
> > On Jun 2, 1:17 am, Le Chaud Lapin <jaibudu...@gmail.com> wrote:
> > But the allocator is even more transparent than that.
> > It uses thread-specific storage to implement
> > per-thread allocation and can easily be limited
> > to some maximum memory. When one thread allocates
> > and another frees, it simply switches blocks.
> > Since it replaces the default global new, I cannot see an issue here.
>
> That's interesting. How hard was it to make the per thread
> allocator?

There is some work, but it is not that hard.
Just implement a conventional allocator, then make it thread-specific.
Construct one either on thread creation or on
first alloc, and destruct it on thread exit.
Keep a global map/vector of pairs of thread ids and their
allocators for block transferring.
Transfer is pretty straightforward: first look up the
map for the allocator to whom the block belongs; if the thread
is not there, take ownership of the block or return it
to the global allocator.
Each allocator is lock-free, except when transferring
cached blocks or allocating/freeing blocks from the
global allocator.
You can limit allocation by bookkeeping, on alloc/free
operations, how much memory is allocated, since
each thread-specific allocator has state.
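
The quota bookkeeping described here can be sketched in a few lines
(this is not Branimir's actual allocator: the block caching and
cross-thread transfer are omitted, the quota is an arbitrary example,
and C++11 thread_local stands in for the TLS APIs 2007 code would use):

#include <cstddef>
#include <cstdlib>
#include <new>

static thread_local std::size_t thread_bytes = 0;
static const std::size_t thread_quota = 256u * 1024 * 1024; // example value

void *operator new(std::size_t n)
{
    if (thread_bytes + n > thread_quota)
        throw std::bad_alloc(); // per-thread limit breached
    void *p = std::malloc(n);
    if (!p)
        throw std::bad_alloc();
    thread_bytes += n;
    return p;
}

void operator delete(void *p) noexcept
{
    std::free(p); // crediting frees back to the quota is omitted here
}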

Greetings, Branimir.

jlin...@hotmail.com

Jun 4, 2007, 7:16:25 AM
{ I'm sorry, but I see no C++-related content here. If there is, please
just repost with explanation. -mod }

For the mods: just like the rest of this thread, this post concerns
the application of generalized C++ serialization frameworks, like the
one in Boost, to IPC applications.

>
> The most important observation, which I keep repeating, is that it
> should also be evident that anything beyond a secure (authenticated)
> connection won't work.


You're making a mountain out of a molehill here, Mr Rabbit :)

IPC systems commonly use a concept of a message, with a header and a
payload. Among other things, the header would contain the length of
the payload. When the server receives a message from a client, it
reads the header, and checks the payload length against a preset
limit. After that, it proceeds with deserialization of the payload,
and because it already knows the length of the payload, e.g. 4196 bytes,
it knows that it should not accept e.g. a single-byte string claiming a
length of 5000, or a double-byte string claiming a length of 2500,
etc.

Now if your serialization code blindly allocates buffers of arbitrary
size, then you obviously have a problem in your serialization code.
You need to improve it to be aware of the payload length of the
current message being processed. I'm curious as to why you think that
is a big deal?

Jarl.

Le Chaud Lapin

Jun 4, 2007, 7:18:05 AM
On Jun 3, 2:53 pm, c...@mailvault.com wrote:
> What is low class IMO is criticizing other attempts when
> you have not published anything. I think the Boost library
> has some weaknesses, but one nice thing about it is you can
> use it. Do you plan to make available what you have been
> describing?

I never intended to denigrate Boost. I tried to point out that the
problem would manifest with any serialization framework, and that the
programmer should be aware of this.

I imagine a situation where Programmer B sees Programmer A using
serialization for, say, File I/O, and thinks, "Hmmm...I could do the
same thing for my Socket class as he is doing for his File class", and
proceeds to use the serialization library in a non-secure mode.
Naturally, when the problem that I described [DoS by
resource exhaustion] manifests, the serialization framework is not to
be blamed.

The fundamental issue is that, as Lourens Veen so succinctly pointed
out, when you use serialization in non-secure mode, you simply cannot
have your cake and eat it too. So if I berate Boost, then I berate
all serialization frameworks, including my own, that claim to be
useful in non-secure generalized IPC over some type of Socket class.
This is very unfortunate, but I think it is important for
programmers to be aware of it, no matter how disappointing it is. It
is certainly very disappointing for me.

As for my work, I am on the final stretch, struggling through some
hairy maths. Should be at least a few months before things start
popping out for general consumption and criticism.

-Le Chaud Lapin-

Le Chaud Lapin

Jun 4, 2007, 5:35:45 PM
On Jun 4, 6:16 am, jlind...@hotmail.com wrote:
> You're making a mountain out of a molehill here, Mr Rabbit :)
>
> IPC systems commonly use a concept of a message, with a header and a
> payload. Among other things, the header would contain the length of
> the payload. When the server receives a message from a client, it
> reads the header, and checks the payload length against a preset
> limit. After that, it proceeds with deserialization of the payload,
> and because it already knows the length of the payload, e.g. 4196 bytes,
> it knows that it should not accept e.g. a single-byte string claiming a
> length of 5000, or a double-byte string claiming a length of 2500,
> etc.

That's not really the problem. You are thinking of a single packet
representing information.

Let's have a more concrete example. Let's say a C++ programmer has
the task of defining serialization for a String class. This
programmer is told that the type used to represent the length of the
string is 'unsigned int'. We all know that, on many computers,
'unsigned int' is 32 bits, so that's 4,294,967,296 states, for maximum
length of around 4.3 billion characters. Now technically, a program
running on a single computer would be able to allocate strings of this
size and use them with no problem, but 4.3 billion is quite a lot, so
let's reduce the amount that we might "typically" use in a program to 1
million characters. Still extremely long, yes, but not so long as to
be inconceivable, at least in some context.

In the end, once the entire string is serialized from one machine to
another, the string at the target node *will* have 1 million
characters allocated to it, no matter what intermediate steps were
used to encode the length, the number of characters in a particular
packet, etc. For those that keep saying "just use realloc" - please
reconsider. Realloc does not get rid of the problem. The problem is
that, as Lourens Veen pointed out, at the end of the day, there will be
a 1 million-byte string at the target computer, or there will not be.

Now, if you nit-pick at every single serialization that involves
memory consumption on the target machine at a primitive-by-primitive
level, you _might_ be able to finally come up with a...ahem...solution
that does not look like two cats fighting.

But that is not the point of serialization. The point of
serialization has been...

"If you have a strings, and you want to serialize it from Node A to
Node B...write.."

Socket << s;

> Now if your serialization code blindly allocates buffers of arbitrary
> size, then you obviously have a problem in your serialization code.
> You need to improve it to be aware of the payload length of the
> current message being processed. I'm curious as to why you think that
> is a big deal?

Someone who writes the serialization function for a string will do
exactly that. They will blindly allocate little tiny 16-byte buffers
one by one until the entire 1-megabyte string is sent, using realloc
as some have suggested (...as if that would help).

If this is not clear, I suggest you think about how you would write
the serialization function that is to later be used by 100,000
programmers, and ask yourself if you will not have the problem
described here. Then, after that, imagine doing the same thing for
200+ basic C++ classes, and think about what the code will look like.

It will look like nothing, because there will not be any code. You
will not know what limits to place on length of string, number of
elements in a list<>, number of elements in matrix<>, number of
elements in a nonce<>...

-Le Chaud Lapin-

jlin...@hotmail.com

Jun 5, 2007, 11:36:30 AM

>
> That's not really the problem. You are thinking of a single packet
> representing information.
>

It is because I'm using the concept of a "single packet", that the
problems you describe simply don't materialize.

If you insist on streaming your objects across a network without an
enveloping packet structure, then you will indeed have insurmountable
problems. But it is the lack of a packet structure that is your
problem, not the serialization framework.


> In the end, once the entire string is serialized from one machine to
> another, the string at the target node *will* have 1 million
> characters allocated to it, no matter what intermediate steps were
> used to encode the length, the number of characters in a particular
> packet, etc.

If you have a string on one machine that you want to send, that you
know is so big that it may well exceed the maximum packet size of your
network peer, then you need to send it in chunks, and extend your
network protocols to handle that. I.e. send a number of smaller
strings to the server, and ask it to assemble them. Once again, that
has nothing to do with the serialization framework.


>For those that keep saying "just use realloc" - please
> reconsider. Realloc does not get rid of the problem. The problem is
> that, as Lourens Veen pointed out, at the end of the day, there will be
> a 1 million-byte string at the target computer, or there will not be.


Realloc actually _does_ get rid of the problem. If I know that the
packet I'm deserializing is 5000 bytes, then I can safely deserialize
e.g. a vector<T>, in a loop, until I've got all the elements, or there
was an error reading one of the objects (perhaps because there is no
more remaining data in the packet). In fact, if I have a lower bound
on the serialized size of T, I can use that to determine an upper
bound on how many T's could be remaining in the packet, and I can
vector::reserve() up to that amount and skip the realloc.
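
That reserve() trick might read as follows (a sketch: bytes_remaining(),
the exception type, and the per-element lower bound are assumptions
about the packet interface, not Boost or any other real API):

// Sketch: with at least min_serialized_T bytes per element on the
// wire, the payload that actually arrived caps the element count.
std::vector<T> vec;
unsigned int count = 0;
packet >> count;
const std::size_t min_serialized_T = 4; // assumed lower bound for T
if (count > packet.bytes_remaining() / min_serialized_T)
    throw PacketError("claimed count cannot fit in the payload");
vec.reserve(count); // safe: bounded by the packet's real size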

>
> But that is not the point of serialization. The point of
> serialization has been...
>
> "If you have a strings, and you want to serialize it from Node A to
> Node B...write.."
>
> Socket << s;
>

Once again, this is the root of your problems. Serialization is not
about sending from A to B, it is about converting a C++ object into a
sequence of bytes. Sending the serialized bytes across the network is
an orthogonal issue.

> Someone who writes the serialization function for a string will do
> exactly that. They will blindly allocate little tiny 16-byte buffers
> one by one until the entire 1-megabyte string is sent, using realloc
> as some have suggested (...as if that would help).
>

Once again, you are mixing serialization with sending.


> It will look like nothing, because there will not be any code. You
> will not know what limits to place on length of string, number of
> elements in a list<>, number of elements in matrix<>, number of
> elements in a nonce<>...


You're right, it does indeed look like nothing. That's because I have
a packet structure:

std::vector<T> vec;
packet >> count;
while (vec.size() < count)
{
    T t;
    packet >> t;
    vec.push_back(t);
}
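
The key property of "packet" here is that it is a definite-EOF stream:
extraction throws once the fixed, already-received buffer is exhausted.
A minimal sketch of such a class (illustrative only, not Jarl's actual
code; endianness is ignored for brevity):

#include <cstring>
#include <stdexcept>
#include <vector>

// Minimal definite-EOF stream over an already-received buffer.
// Extraction can never consume more than the buffer holds.
class Packet {
    std::vector<char> buf_;
    std::size_t pos_;
public:
    explicit Packet(const std::vector<char> &data) : buf_(data), pos_(0) {}
    std::size_t bytes_remaining() const { return buf_.size() - pos_; }
    Packet &operator>>(unsigned int &v) {
        if (bytes_remaining() < sizeof(v))
            throw std::runtime_error("no more data in packet");
        std::memcpy(&v, &buf_[pos_], sizeof(v));
        pos_ += sizeof(v);
        return *this;
    }
};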


Regards,
Jarl Lindrud.

Geert-Jan Giezeman

Jun 5, 2007, 11:50:52 AM
Le Chaud Lapin wrote:

> In the end, once the entire string is serialized from one machine to
> another, the string at the target node *will* have 1 million
> characters allocated to it, no matter what intermediate steps were
> used to encode the length, the number of characters in a particular
> packet, etc. For those that keep saying "just use realloc" - please
> reconsider. Realloc does not get rid of the problem. The problem is
> that, as Lourens Veen pointed out, at the end of the day, there will be
> a 1 million-byte string at the target computer, or there will not be.
>
> Now, if you nit-pick at every single serialization that involves
> memory consumption on the target machine at a primitive-by-primitive
> level, you _might_ be able to finally come up with a...ahem...solution
> that does not look like two cats fighting.
>
> But that is not the point of serialization. The point of
> serialization has been...
>
> "If you have a strings, and you want to serialize it from Node A to
> Node B...write.."
>
> Socket << s;

Would it not be an idea to limit the number of bytes that may be sent
through the socket? Instead of getting bytes from a socket directly,
have some filter that counts how many bytes have passed through and that
throws an exception if that count is larger than a user-configurable number.

class MaxNFilter : public SocketInterface {
public:
    MaxNFilter(size_t bytesLimit, SocketInterface &socket);
    // Pass on calls to socket, but throw 'TooManyBytes' if more than
    // bytesLimit were read.
    ...
};
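
Usage might then look like this (a sketch based on the interface above;
the 1 MB figure and the variable names are arbitrary examples):

// Sketch: wrap the raw socket so that deserializing one message can
// consume at most 1 MB, whatever counts the sender advertises.
MaxNFilter limited(1024 * 1024, rawSocket);
try {
    limited >> s; // deserialize through the counting filter
} catch (const TooManyBytes &) {
    // Drop the connection; the advertised counts were not honest.
}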

Geert-Jan Giezeman

Le Chaud Lapin

Jun 5, 2007, 3:46:42 PM
On Jun 5, 10:36 am, jlind...@hotmail.com wrote:
> You're right, it does indeed look like nothing. That's because I have
> a packet structure:
>
> std::vector<T> vec;
> packet >> count;
> while (vec.size() < count)
> {
>     T t;
>     packet >> t;
>     vec.push_back(t);
> }

Do you still use the code above if count == 1024x1024x1024?

-Le Chaud Lapin-

Le Chaud Lapin

Jun 5, 2007, 10:36:18 PM
On Jun 5, 10:50 am, Geert-Jan Giezeman <g...@cs.uu.nl> wrote:
> class MaxNFilter : public SocketInterface {
> public:
>     MaxNFilter(size_t bytesLimit, SocketInterface &socket);
>     // Pass on calls to socket, but throw 'TooManyBytes' if more than
>     // bytesLimit were read.
>     ...
> };

That's an interesting idea and doable in my framework. I considered
it, then cast it aside in a split second as I often do with ideas when
they seem suboptimal. In retrospect, it is looking increasingly like
the only one that will keep me sane.

The key is what it means when bytesLimit is exceeded. Naturally,
resets will have to happen for the bytesLimit. Sending a few 100MB
files will ruin a bytesLimit that is defined and unchanged for the
lifetime of the socket.

Since I control the design of my Socket class, if I followed this
technique, I would integrate MaxNFilter directly into the socket
class.

It is certainly an idea worth exploring.

-Le Chaud Lapin-

jlin...@hotmail.com

Jun 5, 2007, 10:35:28 PM
> > std::vector<T> vec;
> > packet >> count;
> > while (vec.size() < count)
> > {
> >     T t;
> >     packet >> t;
> >     vec.push_back(t);
> > }
>
> Do you still use the code above if count == 1024x1024x1024?
>

Absolutely. You'll get a no-more-data exception long before you get
that far, or else the packet actually is long enough to contain all
those elements, in which case we read them all in.

Jarl.

Le Chaud Lapin

Jun 6, 2007, 10:19:37 AM
On Jun 5, 9:35 pm, jlind...@hotmail.com wrote:
> > > std::vector<T> vec;
> > > packet >> count;
> > > while (vec.size() < count)
> > > {
> > >     T t;
> > >     packet >> t;
> > >     vec.push_back(t);
> > > }
>
> > Do you still use the code above if count == 1024x1024x1024?
>
> Absolutely. You'll get a no-more-data exception long before you get
> that far, or else the packet actually is long enough to contain all
> those elements, in which case we read them all in.

Hmm... what do you mean by a "no-more-data" exception?

Also, how does one send count=1024x1024 elements in your scheme?

-Le Chaud Lapin-

jlin...@hotmail.com

Jun 7, 2007, 5:05:53 AM

>
> Hmm..what do you mean by "no-more-data" exception?
>

The line "packet >> t" will throw an exception once the data in the
packet has been consumed. Instead of deserializing from a socket, we
deserialize from a stream based on a fixed size buffer.

> Also, how does one send count=1024x1024 elements in your scheme?
>

In exactly the same way. Serialize a count and then serialize each
element in the container.

If we're talking about a single-byte string with 1024*1024 characters,
the resulting data packet will be a little bit larger than 1Mb. The
sender doesn't worry about that, but if the receiver has its maximum
packet size set to e.g. 1 Mb, then the packet, and the TCP connection,
will be discarded as soon as the packet length field is read. What the
client has to do then, is to resend the string, in several sub-1Mb
packets, and inform the receiver that the strings are to be assembled.

If the receiver has its maximum packet size set to e.g. 1.5Mb, then
the string will go right through in one transmission.
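
Sender-side, that chunking could be as simple as the following sketch
(begin_assembly/append_chunk are hypothetical protocol messages, and
make_packet is an assumed helper, not part of any real framework):

// Sketch: split an oversized string into sub-limit packets that the
// receiver reassembles under its own policy.
void send_large_string(Socket &s, const std::string &str,
                       std::size_t max_chunk)
{
    s.send(make_packet("begin_assembly", str.size()));
    for (std::size_t off = 0; off < str.size(); off += max_chunk)
        s.send(make_packet("append_chunk", str.substr(off, max_chunk)));
}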

Jarl.

Le Chaud Lapin

Jun 7, 2007, 4:57:18 PM
On Jun 7, 4:05 am, jlind...@hotmail.com wrote:
> If we're talking about a single-byte string with 1024*1024 characters,
> the resulting data packet will be a little bit larger than 1Mb. The
> sender doesn't worry about that, but if the receiver has its maximum
> packet size set to e.g. 1 Mb, then the packet, and the TCP connection,
> will be discarded as soon as the packet length field is read. What the
> client has to do then, is to resend the string, in several sub-1Mb
> packets, and inform the receiver that the strings are to be assembled.

Let us use the sub-1Mb method, so that the receiver is informed that
the strings are to be assembled. Let us also increase the size up a
notch so that it is a single-byte string of size 512x1024x1024, about
500 MB.

Does the receiver ultimately accept this 500MB, even using the sub-1Mb
method?

-Le Chaud Lapin-

jlin...@hotmail.com

Jun 8, 2007, 12:15:58 AM
> Let us use the sub-1Mb method, so that the receiver is informed that
> the strings are to be assembled. Let us also increase the size up a
> notch so that it is a single-byte string of size 512x1024x1024, about
> 500 MB.
>
> Does the receiver ultimately accept this 500MB, even using the sub-1Mb
> method?
>

Sure. The higher-level chunking protocol would implement integrity/
authentication mechanisms to put an appropriate limit on DOS
vulnerabilities. I.e. as the substrings come in, we apply some
application specific logic to determine whether we consider the
situation reasonable or not. That logic has nothing to do with the
serialization of the substrings.

As I've repeated many times now, the whole issue here has nothing to
do with your serialization code. All your serialization code has to do
is make sure it doesn't blindly allocate memory of arbitrary size
(which it should obviously never do anyway), and you, as the
serialization framework user, just have to make sure you separate
deserialization from network reception, by applying a packet concept.

Jarl.

Le Chaud Lapin

Jun 8, 2007, 3:43:43 PM
On Jun 7, 11:15 pm, jlind...@hotmail.com wrote:
>
> > Does the receiver ultimately accept this 500MB, even using the sub-1Mb
> > method?
>
> Sure. The higher-level chunking protocol would implement integrity/
> authentication mechanisms to put an appropriate limit on DOS
> vulnerabilities. I.e. as the substrings come in, we apply some
> application specific logic to determine whether we consider the
> situation reasonable or not. That logic has nothing to do with the
> serialization of the substrings.

Yes it does.

Let us get specific. How would you define serialization code for a
String class? If you prefer a different class, choose whatever. I
have 102 classes I just counted in my project for which I have defined
serialization code, so there is a reasonable chance that if you choose
something common, there will be overlap.

Also, the words "integrity" and "authentication" look suspect to me.
I was clear, in at least 3 of my posts, including the original post,
that the very essence of my thesis is only applicable when there is no
security available. I was quite clear in stating that, if security is
allowed, then there is no issue.

> As I've repeated many times now, the whole issue here has nothing to
> do with your serialization code. All your serialization code has to do
> is make sure it doesn't blindly allocate memory of arbitrary size
> (which it should obviously never do anyway), and you, as the
> serialization framework user, just have to make sure you separate
> deserialization from network reception, by applying a packet concept.
>

Still not seeing what you see. How about some code.

-Le Chaud Lapin-

jlin...@hotmail.com

Jun 10, 2007, 5:15:19 AM

>
> Let us get specific. How would you define serialization code for a
> String class?

As you wish. I have already shown you deserialization code for
std::vector. Here is totally analogous code for deserialization of
std::string .

std::string s;
packet >> count;
Buffer buffer(512);
while (s.length() < count)
{
    packet >> buffer;
    s += buffer.cpp_str();
}

If you think there is a DOS problem here, or with the deserialization
code of std::vector, you are welcome to point it out.

>
> Also, the words "integrity" and "authentication" look suspect to me.
> I was clear, in at least 3 of my posts, including the original post,
> that the very essence of my thesis is only applicable when there is no
> security available. I was quite clear in stating that, if security is
> allowed, then there is no issue.

I was talking of things like checksums and acknowledgements. But fair
enough, let's disregard the chunking protocol. The senders will only
be able to send you messages that fit within the packet limit, say 1
Mb.

But please note, as I have shown you above: the serialization code of
std::string is totally agnostic to either chunking protocols or
message size limits. Those are application specific issues.


> Still not seeing what you see. How about some code.

It all boils down to the following point, which I keep on repeating
and you keep on studiously ignoring:

* You need to decouple deserialization from network reception.

I can't make it any clearer: if you put your serialized data in a
length-prefixed packet before sending it, and read the entire packet
before deserializing, your problems are going to disappear. If you
deserialize data straight out of a socket, with no idea how much data
the sender actually put on the wire, then you are indeed hosed. As you
point out yourself, your serialization code will be impossible to
write. But don't blame it on the serialization framework; blame it on
your lack of a packet based design.
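
For concreteness, that reception step might look roughly like this (a
sketch only; recv_exact() is a hypothetical helper that loops on recv()
until exactly the requested number of bytes has arrived, a 32-bit
length prefix is assumed, and the 1 Mb cap is just an example):

std::vector<char> receive_packet(int sock)
{
    const unsigned int max_packet = 1024 * 1024;   // e.g. a 1 Mb cap

    unsigned int len = 0;
    recv_exact(sock, &len, sizeof(len));           // read the length prefix
    len = ntohl(len);

    if (len > max_packet)
        throw std::runtime_error("length prefix exceeds packet limit");

    std::vector<char> packet(len);
    if (len > 0)
        recv_exact(sock, &packet[0], len);         // read the whole packet

    return packet;   // a definite-EOF buffer: deserialize from this,
                     // never directly from the socket
}

Note that an oversized length prefix is rejected before a single byte
of payload is read, and before any payload-sized allocation is made.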

Here's another way of putting it:

You cannot safely use generic deserialization code on data coming out
of an indefinite-EOF stream, such as a socket based stream. OTOH, as
I've shown you, you most certainly can use generic deserialization
code on definite-EOF streams, such as a stream based on a buffer of
data that has already been read, in its entirety, from a socket. Your
problem lies not in the deserialization code, but in the utilization
of an indefinite-EOF stream, as represented by your "Socket" object.

I am puzzled that you don't see the difference.

Jarl Lindrud.

Le Chaud Lapin

unread,
Jun 10, 2007, 3:27:40 PM6/10/07
to
On Jun 10, 4:15 am, jlind...@hotmail.com wrote:
> > Let us get specific. How would you define serialization code for a
> > String class?
>
> As you wish. I have already shown you deserialization code for
> std::vector. Here is totally analogous code for deserialization of
> std::string .
>
> std::string s;
> packet >> count;
> Buffer buffer(512);
> while (s.length() < count)
> {
>     packet >> buffer;
>     s += buffer.cpp_str();
> }
>
> If you think there is a DOS problem here, or with the deserialization
> code of std::vector, you are welcome to point it out.

The sender needs only to keep sending until the receiver is
saturated.

> But please note, as I have shown you above: the serialization code of
> std::string is totally agnostic to either chunking protocols or
> message size limits. Those are application specific issues.

The agnostic nature is the problem.

The serialization code for std::string would be the same code used if
the string is embedded in another object. Since the code is agnostic,
and would be used as is, it would be very easy for the sender to DoS-
attack the receiver:

A Gb/s Ethernet link can deliver 128MB/s or more. If 1024-byte chunks
are used (since Ethernet carries a maximum payload of 1500 bytes) and
count is set to 128MB x 16, that would be enough to lock up the
available virtual memory on many machines on the Internet. If the DoS
attack is coming from an injected virus, the receiver would
eventually choke if the code is used unmodified.

So the programmer, without security, has to make a choice:

1. Use serialization and hope no one knows.
2. Avoid serialization and revert to incremental parameter checking,
to at least mitigate the memory allocation problem.

-Le Chaud Lapin-

jlin...@hotmail.com

unread,
Jun 11, 2007, 11:18:37 AM6/11/07
to
>
> > std::string s;
> > packet >> count;
> > Buffer buffer(512);
> > while (s.length() < count)
> > {
> >     packet >> buffer;
> >     s += buffer.cpp_str();
> > }
>
> > If you think there is a DOS problem here, or with the deserialization
> > code of std::vector, you are welcome to point it out.
>
> The sender needs only to keep sending until the receiver is
> saturated.
>

LOL. I am deserializing from a _packet_ ! A packet of fixed length,
completely unlike the socket that you are deserializing from. I am
guaranteed a successful reception or an EOF exception, without ever
reading more than e.g. 1 Mb from the client. The only DOS
vulnerability in sight is if my _application_ is reading an unlimited
number of strings, for reasons of its own. But that has nothing, I
repeat _nothing_, to do with the deserialization code of individual
strings. Do you not see that?

Do you really not see the difference between "socket >> s" and "packet
>> s" ?


>
> A Gb/s Ethernet link can deliver 128MB/s or more. If 1024-byte chunks
> are used (since Ethernet carries a maximum payload of 1500 bytes) and
> count is set to 128MB x 16, that would be enough to lock up the
> available virtual memory on many machines on the Internet. If the DoS
> attack is coming from an injected virus, the receiver would
> eventually choke if the code is used unmodified.

What is it you don't understand about a length-prefixed packet? Why
would you have your receiver automatically read in all the data the
attacker is sending? And what does this have to do with the
deserialization code?

I've lost track of how many times I've repeated the following point to
you:

* You need to decouple deserialization from network reception.

Could you please address it! Please.


>
> So the programmer, without security, has to make a choice:
>
> 1. Use serialization and hope no one knows.
> 2. Avoid serialization and revert to incremental parameter checking,
> and mitigate the memory allocation problem at least.
>

You are speaking here only of your own framework. It is totally untrue
in general.

Jarl Lindrud.

Sebastian Redl

unread,
Jun 11, 2007, 11:24:58 AM6/11/07
to
On Sun, 10 Jun 2007, Le Chaud Lapin wrote:

> A Gb/s Ethernet link can deliver 128MB/s or more. If 1024-byte chunks
> are used (since Ethernet carries a maximum payload of 1500 bytes) and
> count is set to 128MB x 16, that would be enough to lock up the
> available virtual memory on many machines on the Internet. If the DoS
> attack is coming from an injected virus, the receiver would
> eventually choke if the code is used unmodified.

Unless your serialization code compresses data (say, by using RLE on
arrays), there is a direct correlation between the size of an object
(including all its subobjects) and its serialized form.

If the networking code requires that the size of the serialized data be
sent first, it can make a decision to accept or reject the data without
ever allocating memory for it.

But, and this is something we keep trying to tell you, this is _completely
independent of the serialization code_. It's strictly the decision of the
networking code to set limits on how much data an untrusted connection can
send and how many untrusted connections are accepted at any single time.
(And in turn, the networking code should let the user configure these
parameters, because the values depend on the application and available
resources.)

Sebastian Redl

Le Chaud Lapin

unread,
Jun 11, 2007, 5:41:36 PM6/11/07
to
On Jun 11, 10:24 am, Sebastian Redl <e0226...@stud3.tuwien.ac.at>
wrote:

> But, and this is something we keep trying to tell you, this is _completely
> independent of the serialization code_. It's strictly the decision of the
> networking code to set limits on how much data an untrusted connection can
> send and how many untrusted connections are accepted at any single time.
> (And in turn, the networking code should let the user configure these
> parameters, because the values depend on the application and available
> resources.)

This does not make sense in the context of the problem that I have
presented.

You write "at any single time...", but I am not talking about per-
packet sends. Yes, in my original post, I used an example were
operator new () would be applied to a just-received scalar to allocate
a buffer to be read in. I only used this to avoid the (somewhat
weaker) problem of blind building of state at the receiver by
direction of the sender.

I am still waiting for someone to show me how they would "limit" data
by the resources. Again, I am not talking about packets. I am
talking about C++ objects that are to be serialized, objects of
arbitrary complexity.

-Le Chaud Lapin-

Le Chaud Lapin

unread,
Jun 11, 2007, 5:43:04 PM6/11/07
to
On Jun 11, 10:18 am, jlind...@hotmail.com wrote:
> LOL. I am deserializing from a _packet_ ! A packet of fixed length,
> completely unlike the socket that you are deserializing from. I am
> guaranteed a successful reception or an EOF exception, without ever
> reading more than e.g. 1 Mb from the client. The only DOS
> vulnerability in sight is if my _application_ is reading an unlimited
> number of strings, for reasons of its own. But that has nothing, I
> repeat _nothing_, to do with the deserialization code of individual
> strings. Do you not see that?

Why are you doing that? I mentioned that I was deserializing from a
socket, not a packet.

> Do you really not see the difference between "socket >> s" and "packet
>
> >> s" ?

Yes I do. I was referring to deserialization from a socket.

> What is it you don't understand about a length-prefixed packet? Why
> would you have your receiver automatically read in all the data the
> attacker is sending? And what does this have to do with the
> deserialization code?

In most serialization frameworks, when a programmer defines the
serialization code for a class, that code is written independently of
the "Archive" class that is being serialized to/from.

No matter what is done with a packet, it is conceivable to serialize a
1MB string object to/from a Socket Archive. There would be contexts
where this is legitimate, and contexts where it is not. In contexts
where 1MB would be legitimate, where the sender is a friend,
serialization is ideal because it frees the user of the serialization
code from the tedium of fixed-size arrays. As soon as the sender
becomes a foe, DoS becomes a real issue. That same code would not be
usable as written. It would have to be replaced with code that uses
fixed-size arrays, and checks would have to be made. So the
serialization code, which would normally have been universally
applicable, becomes not.

> I've lost track of how many times I've repeated the following point to
> you:
>
> * You need to decouple deserialization from network reception.
>
> Could you please address it! Please.

I empathize with your frustration. ;)

Let's say I have a class Archive which is a base class to which things
can be serialized to/from

class Archive {} ;

Then I have a class File that derives from Archive:

class File : protected Archive {} ;

I overload operator << for String and Foo

Archive &operator << (Archive &, const String &);
Archive &operator << (Archive &, const Foo &);

Are you implying that if I define a class Socket

class Socket : public Archive {} ;

...that, in general, I can use the same operator << for File, Archive,
and Socket, without modification?

That is the problem that I brought forth.

-Le Chaud Lapin-

jlin...@hotmail.com

unread,
Jun 12, 2007, 2:51:02 AM6/12/07
to

>
> Why are you doing that? I mentioned that I was deserializing from a
> socket, not a packet.

But _why_ do you deserialize out of a socket, when it has
insurmountable problems like these associated with it?

And even in secure mode, the problems remain. How do you handle a
misguided but well-meaning sender who starts sending you a 1GB string?
Does your receiver just stand there and say "This connection is
secured, so I will swallow every byte that comes down" ?

>
> No matter what is done with a packet, it is conceivable to serialize a
> 1MB string object to/from a Socket Archive. There would be contexts
> where this is legitimate, and contexts where it is not.

So despite the insurmountable problems you've found, you still find it
conceivable to serialize to/from a socket archive? In what context
would it possibly be legitimate?

>
> Are you implying that if I define a class Socket
>
> class Socket : public Archive {}
>
> ...that, in general, I can use the same operator << for File, Archive,
> and Socket, without modification?
>

What makes you think you can entwine serialization with storage/
network transmission/etc., in the first place? Serialization is
conversion into a sequence of bytes, nothing more, nothing less.


> That is the problem that I brought forth.
>

Well, the problem you brought forth was stated like this:

"I means that, for all the applications on the Internet that uses
unprotected serialization of the kind provided by Boost,/etc...they
are all vulnerable to DoS attack."

, and is patently untrue. It only applies to frameworks that naively
conflate deserialization and network reception.

Jarl Lindrud.

kouznetso...@gmail.com

unread,
Jun 12, 2007, 2:50:41 AM6/12/07
to
Le Chaud Lapin wrote:
> No matter what is done with a packet, it is conceivable to serialize a
> 1MB string object to/from a Socket Archive. There would be contexts
> where this is legitimate, and contexts where it is not. In contexts
> where 1MB would be legitimate, where the sender is a friend,
> serialization is ideal because it frees the user of the serialization
> code from the tedium of fixed-size arrays. As soon as the sender
> becomes a foe, DoS becomes a real issue. That same code would not be
> usable as written. It would have to be replaced with code that uses
> fixed-size arrays, and checks would have to be made. So the
> serialization code, which would normally have been universally
> applicable, becomes not.

You keep saying that everything is fine with authenticated good
clients and bad with non-authenticated malicious ones. Why do you
think that control of scarce resources is only applicable when you
need protection from attacks? What if your good authenticated users
consume resources extensively? Say, one good user serializes a map of
maps of strings with a total size of 2GB; then your second good user
won't be able to allocate a significantly smaller amount of data. Your
first good client may be greedy, or may simply request serialization of
extra data by mistake.

thanks,
Vlad

Joe

unread,
Jun 12, 2007, 12:29:31 PM6/12/07
to
>
> Let's say I have a class Archive which is a base class to which things
> can be serialized to/from
>
> class Archive {} ;
>
> Then I have a class File that derives from Archive:
>
> class File : protected Archive {} ;
>
> I overload operator << for String and Foo
>
> Archive &operator << (Archive &, const String &);
> Archive &operator << (Archive &, const Foo &);
>
> Are you implying that if I define a class Socket
>
> class Socket : public Archive {}
>
> ...that, in general, I can use the same operator << for File, Archive,
> and Socket, without modification?
>
> That is the problem that I brought forth.
>
> -Le Chaud Lapin-
>
>

You seem to want to use the same syntax to serialize to everything.
Although I do not understand all the issues that have been brought up
about the coupling of the serialization and transmission concepts, you
could do something similar to the following to "have your cake and eat
it too":


class Socket : public Archive {};

template<typename T> Socket& operator<<(Socket& socket, const T& t){

    ostringstream oss;              // note: "ostringstream oss();" would
                                    // declare a function, not an object
    oss << t;

    socket.send(oss.str().size());
    socket.send(oss.str());

    return socket;
}

template<typename T> Socket& operator>>(Socket& socket, T& t){

    string sBuf;
    size_t n;

    socket.get(&n);
    sBuf.resize(n);
    socket.get(&sBuf);

    istringstream iss(sBuf);

    iss >> t;

    return socket;
}

[note - not quite sure about your class hierarchy - the above functions
may need to be member functions - and virtual]


You could use the same syntax. There is ample opportunity to do some
error checking in both the operator<< and >> functions. You could also
get your download string in chunks as well (not shown).

This seems to separate serialization and transmission while allowing
you to use the same syntax. Yes/No?

Does this not satisfy everybody's requirements?


Joe

Le Chaud Lapin

unread,
Jun 12, 2007, 6:35:40 PM6/12/07
to
On Jun 12, 1:51 am, jlind...@hotmail.com wrote:
> But _why_ do you deserialize out of a socket, when it has
> insurmountable problems like these associated with it?

Because several designers of serialization frameworks (including
myself until recently) either stated explicitly or implicitly that it
was a good idea. I have refrained from identifying the other, well-
known, serialization frameworks that suggest that it is a good idea to
use serialization code that was meant for, say, File, against a
Socket.

> And even in secure mode, the problems remain. How do you handle a
> misguided but wellmeaning sender who starts sending you a 1Gb string?
> Does your receiver just stand there and say "This connection is
> secured, so I will swallow every byte that comes down" ?

When I write my software, I make a distinction between the warm-bodied
human being that we call "the user" and "the client program" that the
user wrote. In addition, "security" has multiple meanings. There are
some connections that provide only privacy (encryption); that type of
connection will not help here. There are other connections that provide
authenticity of the transmission units (packets), and in that case,
once the connection is kick-started, we can then proceed to discuss
the following fact:

*If* the connection is secure, meaning the server is certain of the
authenticity of packets received from the client, and if the server end
of the protocol by which the connection between client and server is
brought to a secure state abstains from invoking operator new()
against potentially large, arbitrary values that would cause excessive
memory allocation (1GB) on the server end, then once the connection
has entered the secure state, the server and client can relax and
continue. Note that I have been portraying the client as the
perpetrator and the server as the victim, but this distinction is
arbitrary - it could very well be vice-versa.

Now you might say, "But there are situations where DoS could still
happen where the server has published the specification for the
serialization sequence, and clients connect 'securely', but the set of
clients allowed to connect is large and unknown at the time the server
begins executing."

To this I would say you are right. I had intended to mention this
problem, and a third serious problem on this issue, in a forthcoming
post.

> So despite the insurmountable problems you've found, you still find it
> conceivable to serialize to/from a socket archive? In what context
> would it possibly be legitimate?

No, I guess I do not. As I mentioned, my gut feeling is that there is
no simple solution to this "problem". However, note that I am not the
only programmer who has created a serialization framework and
suggested, either implicitly or explicitly, that it is a good idea to
use the same code against a Socket as one has written to work against
a File. My point in writing this post is to at least make other
programmers aware that, if they want to use vanilla serialization code
against an un-secured or semi-secured socket, they are in trouble. I
know of at least one major financial company, with $1 Trillion (US) in
assets, that does this routinely for all their servers. If I wanted
to, I could probably write a program that would systematically crash
the machines one by one, given a block of IP addresses to start with,
assuming of course I managed to get past the firewalls.

IMO, the case where it is OK to use the same serialization code against
a Socket that has been written for, say, File, is when:

1. The serialization code on the client end matches that at the server
end, either by using a library or because the engineers at each end
were meticulous in getting the protocol correct.
2. The channel is secure in the sense of mutual certainty of
authenticity of the client and server, and the bootstrap procedure in
getting to a state of mutual certainty of authenticity strictly
abstains from using operator new() or anything that could result in
resource starvation.

If anyone says that #1 is unrealistic when the client programmer and
the server programmer are disjoint, then I would say that is the same
issue as putting bad code on the market. For
instance, I have a device driver that I thought was pretty much bug-
free until I ran it under a virtual machine in Windows. It blue-
screens the virtual machine, each time, every time, though testing had
been done thoroughly on real machines. Who would have thought? So we
have to fix this, but once it is fixed, it is fixed.

> What makes you think you can entwine serialization with storage/
> network transmission/etc. , in the first place? Serialization is
> conversion into a sequence of bytes, nothing more, nothing less.

Microsoft made me think that:
http://msdn2.microsoft.com/en-us/library/dya2hz72(VS.80).aspx
Though I have never used their serialization framework, I believe that
CFile is derived from CArchive, and so on...

> Well, the problem you brought forth was stated like this:
>
> "I means that, for all the applications on the Internet that uses
> unprotected serialization of the kind provided by Boost,/etc...they
> are all vulnerable to DoS attack."
>
> , and is patently untrue. It only applies to frameworks that naively
> conflate deserialization and network reception.

This statement is contradictory. I said that if a user does X, Y will
happen. And you're saying, "that's not true", it will only happen
when the user does X.

-Le Chaud Lapin-

Le Chaud Lapin

unread,
Jun 12, 2007, 6:37:32 PM6/12/07
to
On Jun 12, 11:29 am, "Joe" <j...@junk.com> wrote:
> You seem to want to use the same syntax to serialize to everything.
> Although I do not understand all the issues that have been brought up
> about the coupling of the serialization and transmission concepts, you
> could do something similar to the following to "have your cake and eat
> it too":
>
> class Socket:public Archive{};
>
> template<typename T> Socket& operator<<(Socket& socket, const T& t){
>
> ostringstream oss();
> oss << T;
>
> socket.send(oss.str().size());
> socket.send(oss.str());
>
> return s;
>
> };
>
> template<typename T> Socket& operator>>(Socket& socket, T& t){
>
> string sBuf;
> size_t n;
>
> socket.get(&n);
> sBuf.resize(n);
> socket.get(&sBuf);
>
> istringstream iss(sBuf);
>
> iss >> T;
>
> return s;
>
> }

The problem is "n". If an attacker at the other end of the connection
defines n to be, for example, 2^32, then you have a problem.

>
> You could use the same syntax. There is ample opportunity to some error
> checking in both operator<< and >> functions. you could also get your
> download string in chucks as well (not shown).

Chunks are nice, but breaking-into-chunks has been taken for
granted from the beginning. The maximum size of an Ethernet payload is
1500 bytes, so naturally, no one is sending 1MB packets. The problem
is the reassembly phase of the object at the receiver end. Without
security, the receiver is left vulnerable, knowing that, at any moment,
it can be induced to consume massive amounts of memory from the free
store.

Note that any attempt to mitigate the problem by putting "checks" in
the serialization code to somewhat control the amount of memory
allocated will not work. The issue becomes the word "somewhat". What
is "somewhat?" 1KB? 16KB? 1MB? It's like a doctor being told to prep a
drug for a medically-induced coma for an incoming patient without being
told anything about the patient. Any preconceived dosage (limit on
memory allocation) would be "unreasonable", not to mention that the
serialization code would become very ugly, very fast.

> This seem to separate serialization and transmission while allowing you to
> use the same syntax. Yes/No ?

Yes, but the fundamental problem still persists.

-Le Chaud Lapin-

Le Chaud Lapin

unread,
Jun 12, 2007, 6:35:54 PM6/12/07
to
On Jun 12, 1:50 am, kouznetsov.vladi...@gmail.com wrote:
> Le Chaud Lapin wrote:
> > No matter what is done with a packet, it is conceivable to serialize a

> You keep saying that everything is fine with authenticated good
> clients and bad with non-authenticated malicious ones. Why do you
> think that control of scarce resources is only applicable when you
> need protection from attacks? What if your good authenticated users
> consume resources extensively? Say, one good user serializes a map of
> maps of strings with a total size of 2GB; then your second good user
> won't be able to allocate a significantly smaller amount of data. Your
> first good client may be greedy, or may simply request serialization of
> extra data by mistake.

As I just pointed out in another post, I make a distinction between
a human being and the software that the human being wrote. In that
case, there are several situations where the problem I mentioned
in the OP does not exist:

1. There is bi-directional certainty of authenticity of client and
server.
2. The same serialization code is used at both ends of client/server
pipe, either because that code came from the same library (codebase),
or because the client programmer and the server programmer were both
meticulous in getting the protocol right.

If these two conditions are true, then there is nothing to worry
about. The "user" at the client end is not going to do anything. He
might be on a beach in Tahiti. If the "software" that he wrote has
been engineered correctly, then his software will not cause trouble
either. If it has not been engineered correctly, then there might be
much trouble, as would be the case, say, if he placed into the hands of
millions of users a device driver that blue-screens under a peculiar
(but common) set of circumstances.

However, the situation can occur if the programmer at the client end
does not use the same bit-for-bit copy of the code as the programmer at
the server end. There would be a potential for mismatch. In that
case, there would still be security, still be good intention, but DoS
would still occur. This was a problem I had intended to mention after
we all agreed that the un-secure pipe was a real issue.

However, because this last problem can occur does not necessarily mean
that the secure-pipe, well-engineered-code model is illegitimate.

-Le Chaud Lapin-

jlin...@hotmail.com

unread,
Jun 13, 2007, 9:49:22 AM6/13/07
to
>
> Because several designers of serialization frameworks (including
> myself until recently) either stated explicitly or implicitly that it
> was a good idea. I have refrained from identifying the other, well-
> known, serialization frameworks that suggest that it is a good idea to
> use serialization code that was meant for, say, File, against a
> Socket.
>

Well, MFC is not a shining beacon of software design, is it :)

>
> IMO, the case where it is OK to use the same serialization code against
> a Socket that has been written for, say, File, is when:
>
> 1. The serialization code on the client end matches that at the server
> end, either by using a library or because the engineers at each end
> were meticulous in getting the protocol correct.
> 2. The channel is secure in the sense of mutual certainty of
> authenticity of the client and server, and the bootstrap procedure in
> getting to a state of mutual certainty of authenticity strictly
> abstains from using operator new() or anything that could result in
> resource starvation.
>

What you are saying is that your code is safe, but your network
protocol isn't.

Equivalently, what you are saying is that your server is safe as long
as you have absolute control over the software every single client is
running. And that you somehow have 100% confidence that there are
absolutely no bugs in your software.

I'm sorry, but if we are talking of production servers, as opposed to
toy systems, that is simply not acceptable. Unless you attack the
problem at its root, by separating deserialization from reception, all
you do is move the problem around. You are papering over a serious
design flaw, instead of addressing it directly.

Jarl Lindrud.

Nevin :-] Liber

unread,
Jun 13, 2007, 9:51:57 AM6/13/07
to
In article <1181575350.8...@m36g2000hse.googlegroups.com>,

Le Chaud Lapin <jaibu...@gmail.com> wrote:

> On Jun 11, 10:18 am, jlind...@hotmail.com wrote:
> > LOL. I am deserializing from a _packet_ ! A packet of fixed length,
> > completely unlike the socket that you are deserializing from. I am
> > guaranteed a successful reception or an EOF exception, without ever
> > reading more than e.g. 1 Mb from the client. The only DOS
> > vulnerability in sight is if my _application_ is reading an unlimited
> > number of strings, for reasons of its own. But that has nothing, I
> > repeat _nothing_, to do with the deserialization code of individual
> > strings. Do you not see that?
>
> Why are you doing that? I mentioned that I was deserializing from a
> socket, not a packet.

For the sake of argument, let's talk about sending a non-simple
structure, such as a vector<string>.

Even if you determine that a request for too much memory is a DoS
attack, how exactly do you reject the message?

What if it is a different DoS attack, such as a bad count of elements
(either in a given string and/or in the vector itself)?

W/o framing, checksums, etc., you are pretty much hosed, whether or not
you use serialization. How do you plan on syncing up with the next
message?

And if you add framing and checksums, you are talking about packets, not
just raw sockets...


(Also, could you please steer the discussion back towards C++?)

--
Nevin ":-)" Liber <mailto:ne...@eviloverlord.com> 773 961-1620

Le Chaud Lapin

unread,
Jun 13, 2007, 12:45:11 PM6/13/07
to
On Jun 13, 8:51 am, "Nevin :-] Liber" <n...@eviloverlord.com> wrote:
> For the sake of argument, let's talk about about sending a non-simple
> structure, such as a vector<string>.

Ok, so in this case, we would probably serialize a vector template by
sending the count of elements in the vector first, followed by the
serialization of each individual element.

At the source end of the connection:

vector<string> v1;
Socket socket;
socket << v1.size();
// Now export "count" strings.
for (unsigned int i = 0; i < v1.size(); ++i)
    socket << v1[i];

At the target end of the connection:
// Code to make enough vector space for "count" strings
// Code to import "count" strings.

A serialization constructor for vector<> is probably the best way to do
this, but this is the general idea.
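
Spelled out, the target end might look something like this (a sketch;
Socket's operator>> for scalars and strings is assumed, as elsewhere in
this thread):

vector<string> v2;
unsigned int count = 0;
socket >> count;        // "count" is entirely under the sender's control
v2.reserve(count);      // the allocation at issue: nothing guarantees
                        // that "count" reflects data actually on the wire
for (unsigned int i = 0; i < count; ++i)
{
    string s;
    socket >> s;        // each string carries its own length prefix
    v2.push_back(s);
}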

> Even if you determine that a request for too much memory is a DoS
> attack, how exactly do you reject the message?

Well...that's just it. If one takes generic pre-written serialization
code for class File, and tries to use it against Socket, it will not be
known that there is a DoS attack. And whether there is an attack or
not, excessive memory allocation, accidental or intentional, will be
indeterminate. There will be no point in the code where just before
invocation of operator new() one will be able to say...
5MB!!!!...that's too much...something must be wrong.

> What if it is a different DoS attack, such as a bad count of elements
> (either in a given string and/or in the vector itself)?

Correct observation. Here checking of the imported data makes sense.
Notice the distinction between making space for too much data using
operator new(), and checking the data after it has been received. In
your vector<string> example above, the vector might contain the names
of people that are suspected of terrorist activity and should be
placed under surveillance. But if one of the names is George W. Bush,
then someone probably made a gross mistake or is pulling a prank, so
the object containing the vector would throw and exception due to
faulty data. It is important to note that such checking would
normally occur after space has already been allocated for the entire
vector<string>.

> W/o framing, checksums, etc., you are pretty much hosed, whether or not
> you use serialization. How do you plan on syncing up with the next
> message?

One could take a C++ object and implant it on the wire using any of a
number of formats. FYI, in my own scheme, if I have to send an
object, I serialize its elements one-by-one to a buffer. I follow
this principle recursively until the field of an object is a vector or
a scalar, at which point it is trivial to write the elements with
appropriate counts of total elements, elements in this fragment of the
vector, etc. It should be evident that the format on the wire does not
change the problem.

> And if you add framing and checksums, you are talking about packets, not
> just raw sockets...

I am puzzled why some think that the intermediate state of a
serialized object in transit has any effect on the problem outlined in
my original post. No matter what the intermediate state, no matter if
it is sent over packets that allow only 16 bytes total at a time, the
problem still persists. In the end, the source is sending a
vector<string>. The target is receiving a vector<string>. The source
determines how many elements are in the vector<>. And the target,
without any arbitrary thresholds, imports each string into the vector
one-by-one. If the transmission rate of the link is 1 bit every 15
years, that does not matter. After a long time, if the source
declares that there are 1,000,000 strings, each of length 1,000, then
1GB of data will be sent. This will happen no matter what the format
of the data is on the wire.

-Le Chaud Lapin-



I V

unread,
Jun 15, 2007, 9:40:30 PM6/15/07
to
On Mon, 11 Jun 2007 15:41:36 -0600, Le Chaud Lapin wrote:
> I am still waiting for someone to show me how they would "limit" data
> by the resources. Again, I am not talking about packets. I am
> talking about C++ objects that are to be serialized, objects of
> arbitrary complexity.

But the _objects being serialized_ don't limit how much data is read; the
objects doing the reading impose those limits. So you'd do something like:

bool read_connection_data(socket s, std::vector<std::string>& data)
{
    reader r(s, max_size); // where the author chooses max_size
                           // to be appropriate for the context
    try
    {
        r >> data;
        return true;
    }
    catch( const reader::size_error& e )
    {
        std::cerr << "The client sent too much data";
    }

    return false;
}

And the basic functions of the reader class (those that read in the
built-in types) check that the size hasn't been exceeded, and throw
reader::size_error if it has.

What's the problem with this approach?

Le Chaud Lapin

unread,
Jun 16, 2007, 9:59:55 AM6/16/07
to
On Jun 15, 8:40 pm, I V <ivle...@gmail.com> wrote:
> But the _objects being serialized_ don't limit how much data is read, the
> objects doing the reading impose those limits. So you'ld do something like:
>
> bool read_connection_data(socket s, std::vector<std::string>& data)
> {
> reader r(s, max_size) // where the author chooses max_size
> // to be appropriate for the context
>
> try
> {
> r >> data;
> return true;
> }
> catch( reader::size_error e)
> {
> std:: cerr << "The client sent too much data";
> }
>
> return false;
>
> }
>
> And the basic functions of the reader class (those that read in the
> built-in types) check that the size hasn't been exceeded, and throw
> reader::size_error if it has.
>
> What's the problem with this approach?

First note that the statement "r >> data;" is highly vague. Here you
have a "reader" object that is importing into a non-trivial data
structure. Where is the code that defines how this occurs?

But in any case, I think the spirit of what you are suggesting is the
same as what several others have suggested: that you can somehow supply
a limit to the socket from which data is extracted, and if an object
attempts to extract from that socket data whose size exceeds the
specified limit, an exception is thrown. If this is what you are
suggesting, the problem will still persist.

The key here is "std::vector<std::string>& data". Look closely at
it. It is a vector of strings. In general, there are two things you
do not know:

1. The number of strings in the vector (its count).
2. The length of each individual string in the vector.

Now surely, serialization code that effects "r >> data" will probably
first ascertain how many strings are to be put into the vector
(total), and also the size of each string as each is read in.

Both of these operations, building the vector and building each
string, present an opportunity for DoS.

No matter what intermediate trickery is used to incrementally build
the vector, in the end, it is not inconceivable that the source of the
serialization at the other end of the pipe will want the vector to
have 1,000,000 elements. Nor is it inconceivable that the source of
the serialization at the other end of the pipe will want the average
length of each string to be 1,000 characters. So this presents an
opportunity for 1GB of memory to be consumed.

Now, to answer a question that probably popped in your brain before
you finished reading the last paragraph:

"Le Chaud Lapin is not understanding what I am saying. Certainly he
must see that the 'control' of how much data being read in is not part
of the serialization code of the object, that it is in fact the
limiting operation is being provided externally."

The problem is that, in general, the writer of a class that is to be
placed in a library must write the serialization code for each class
of that library, then turn his head, long before the user of that
library employs the library in an actual application.

So the statement: "std::vector<std::string>& data" could very well be
only one statement that helps to implement a larger serialization
sequence, perhaps as part of a big object.

Now you might say, "That's fine, the big object will know how big to
set the limit..."

While this is true, a remaining problem becomes evident when
one realizes that there are situations where 10,000,000 bytes would be
just as reasonable a limit as 1,000 bytes. This would occur when one
is about to set the serialization limit, for example of
std::list<string>, not as part of serializing a larger-scoped object,
but "in an outer naked scope" as you have presented above.

At that point, the only question that remains is:

"Is it possible for the malicious attacker to induce consumption of
10,000,000 bytes 'legitimately' in rapid-succession?" Without showing
proof, the answer is 'Yes', unfortunately.

However, I will say this: Your solution is the best so far. I had
thought about this before as a solution. Though I deplore
arbitrariness in software engineering, it is not so unreasonable that
a class might know in advance what a "reasonable maximum size of
itself" should be:

struct Employee
{
    unsigned short int age;
    string first_name;
    string last_name;
    float annual_salary;
    // etc.
} ;

The serialization code for this would know reasonable limits on the
size of an employee. Assuming that floats are 32 bits, shorts are 16
bits, and each of the strings is up to 50 bytes on average,
128 bytes would be a reasonable limit.

So we would tell the socket to throw an exception, as you suggested,
when 128 bytes is exceeded for serializing the employee. It might be
intuitively apparent that, because objects can be arbitrarily nested,
a stack-based model implemented inside the Socket is appropriate,
where the elements on the stack are limits that the outer scope of
serialization places as each nested object is to be serialized.

Socket & operator >> (Socket &s, Employee &e)
{
    s.push_limit (128); // e had better be 128 bytes or less
    s >> e.first_name;
    s >> e.last_name;
    s >> e.age;
    s >> e.annual_salary;
    s.pop_limit();
    return s;
}

If 128 bytes is exceeded, as you pointed out, an exception will be
thrown. If it is not, some number of bytes, N, will have been
read. Then, when pop_limit() is executed, N is subtracted from the
stack element that was top-most before the 128 was pushed. That way,
if an outer class had set a limit, and Employee is an inner class,
then the limit will be honored for the outer class too. If the stack
is empty, then there is no limit set.
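
One way to realize that stack (a sketch; charge() and LimitExceeded are
illustrative names, and here every raw read is charged against every
limit currently on the stack, which has the same net effect as
subtracting N when pop_limit() runs):

class Socket : public Archive
{
    std::vector<unsigned int> limits;  // remaining budgets, innermost last
public:
    void push_limit(unsigned int n) { limits.push_back(n); }
    void pop_limit() { limits.pop_back(); }

    // Called by every raw read of n bytes.
    void charge(unsigned int n)
    {
        for (std::size_t i = 0; i < limits.size(); ++i)
        {
            if (n > limits[i])
                throw LimitExceeded(); // some enclosing limit breached
            limits[i] -= n;
        }
    }
    // ... read operations, each calling charge() ...
};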

That last sentence above - if the stack is empty, there is no limit
set - is the culprit with this scheme. The moment one is faced with
the challenge of trying to set a reasonable limit, when "reasonable"
is a significant fraction of the available RAM on a typical computer,
it falls apart.

Nevertheless, it is good to see someone else thought of this model. If
it were not for this last issue, it would probably have been the
solution I would have chosen (my alternative is no solution).

But again, arbitrariness is typically a serious no-no in software
engineering, so it would make me very nervous to do this.

-Le Chaud Lapin-

Keith H Duggar

unread,
Jun 16, 2007, 5:49:17 PM6/16/07
to
Branimir Maksimovic wrote:
> Le Chaud Lapin wrote:
> > What then can I do to stop this problem?
>
> Don't allocate whole buffer immediately but realloc in chunks
> as packets arrive. Let them send all 100 mb ;) Also you can
> request ack from client for every chunk sent, say I like every
> 512 bytes. Since originating ip will probably be spoofed,
> protocol will break on first ack.

That is one of the simplest and best solutions, as it's easy to
implement and forces the DoS attacker to tie up their bandwidth
as well. I also like adding moving-average bandwidth limits.
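
A rough sketch of that receive loop (read() and send_ack() are
illustrative names for the underlying socket calls, and the 512-byte
chunk size is from the suggestion above):

std::string s;
unsigned long claimed = 0;
socket >> claimed;             // never allocate 'claimed' bytes up front

char chunk[512];
while (s.size() < claimed)
{
    std::size_t want = claimed - s.size();
    if (want > sizeof(chunk))
        want = sizeof(chunk);

    socket.read(chunk, want);  // blocking read of exactly 'want' bytes
    s.append(chunk, want);
    socket.send_ack(s.size()); // a sender with a spoofed source address
                               // never sees this, so the exchange dies here
}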

Keith

Sebastian Redl

unread,
Jun 16, 2007, 5:58:10 PM6/16/07
to

On Sat, 16 Jun 2007, Le Chaud Lapin wrote:

> "Le Chaud Lapin is not understanding what I am saying. Certainly he
> must see that the 'control' of how much data being read in is not part
> of the serialization code of the object, that it is in fact the
> limiting operation is being provided externally."
>
> The problem is that, in general, the writer of a class that is to be
> placed in a library must write the serialization code for each class
> of that library, then turn his head, long before the user of that
> library employs the library in an actually application.
>
> So the statement: "std::vector<std::string>& data" could very well be
> only one statement that helps to implement a larger, serialization
> sequence, perhaps as part of a big object.
>
> Now you might say, "That's fine, the big object will know how big to
> set the limit..."

No, you've got it wrong again! The limit has nothing at all to do with
the object being serialized. The limit has nothing to do with
serialization code. The limit has nearly nothing to do with what the
data sent is used for afterwards.
The limit is set simply in anticipation of the data sent causing
allocation of server resources proportional to the amount of data being
sent.
Whether this is because objects are deserialized, because the data is
directly written into a buffer without being interpreted, the data is
saved as a file on the hard disk of the server or the data is forwarded
through another socket to some other place, with limited bandwidth and/or
transfer volume on that connection (that, too, is a resource that needs to
be managed) doesn't matter. It could even simply be processing time
spent. All that matters is that some external contact causes the server
to use resources.

Therefore, the server must limit the amount of resources it will allocate
for any given external contact to an amount that is unlikely to prevent
the server from normal operation. The server can decide to grant
authenticated contacts ("secure connections") a larger amount of
resources, but that doesn't change the underlying principle.

If your server says, "Every external contact may only send 1 MB of data,
and I will not accept more than 100 connections at a time," if further
the usage of the data is such that 1 MB of data sent results in the
allocation of about 1 MB of RAM (for example, but in no way limited to,
deserializing this data into an arbitrarily complex tree of objects),
and if finally this allocation is transient, i.e. the memory is
freed when the connection ends, then by definition external contacts can
never cause allocation of more than 100 MB of RAM.

For you, the developer of the serialization library, all that is left is
to add a statement of caution to the documentation of the library, worded
something like this:

"Since deserialization allocates memory necessary to hold the deserialized
objects, you should never deserialize directly data coming from an
untrusted source, such as a network connection. Doing so can lead to
vulnerability to denial of service attacks, either by a malicious attacker
or by benign but malfunctioning software at the other end of the
connection. The data from the source should therefore at least be checked
for length and data that exceeds an application-defined size limit should
be rejected."

> But again, arbitrariness is typically a serious no-no in software
> engineering, so it would make me very nervous to do this.

Welcome to the real world. Every bit of server software has a
configuration option for the number of simultaneous network connections
it accepts. What is that but an arbitrary decision?

The important thing about such decisions is that they are configurable
without recompiling the application, through a reasonably simple
interface.
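
For instance, the whole interface might be no more than this (a
sketch; the names and defaults are illustrative):

struct ServerLimits
{
    std::size_t max_message_bytes;  // e.g. 1 MB
    unsigned int max_connections;   // e.g. 100
};

// Read once at startup from a configuration file; changing the limits
// means editing the file, not recompiling the server.
ServerLimits load_limits(const std::string &config_path);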

Sebastian Redl

Jeff Koftinoff

unread,
Jun 17, 2007, 8:17:48 AM6/17/07
to
On Jun 16, 6:59 am, Le Chaud Lapin <jaibudu...@gmail.com> wrote:
>
> Socket & operator >> (Socket &s, Employee &e)
> {
> s.push_limit (128); // e had better be 128 bytes or less
> s >> first_name;
> s >> last_name;
> s >> age;
> s >> annual_salary;
> s.pop_limit();
> return s;
>
> }
>
> If 128 bytes is exceeded, as you pointed out, an exception will be
> thrown. If it is not, the some number of bytes, N, will have been
> read. Then, when pop_limit() is executed, N is subtracted from the
> stack element that was top-most before the 128 was pushed. That way,
> if an outer-class had set a limit, and Employee is an inner-class,
> then the limit will be honored for the outer class too. If the stack
> is empty, then there is no limit set.
>

I like your approach here; however, it would be better to do:

Socket & operator >> (Socket &s, Employee &e)
{
    SerializeLimitFrame limit( s, 128 ); // e had better be 128 bytes or less

    s >> e.first_name;
    s >> e.last_name;
    s >> e.age;
    s >> e.annual_salary;

    return s;
}

where the constructor of SerializeLimitFrame does s.push_limit() and the
destructor does s.pop_limit()...
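
i.e. something along these lines (a sketch built on the push_limit()/
pop_limit() interface above):

struct SerializeLimitFrame
{
    SerializeLimitFrame( Socket &s, unsigned int n )
        : s_( s )
    {
        s_.push_limit( n );
    }

    ~SerializeLimitFrame()
    {
        s_.pop_limit();   // runs even if deserialization throws,
                          // so the limit stack stays balanced
    }

private:
    Socket &s_;
};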

--jeffk++

Le Chaud Lapin

unread,
Jun 17, 2007, 8:20:34 AM6/17/07
to
On Jun 16, 4:58 pm, Sebastian Redl <e0226...@stud3.tuwien.ac.at>
wrote:

> For you, the developer of the serialization library, all that is left is
> to add a statement of caution to the documentation of the library, worded
> something like this:
>
> "Since deserialization allocates memory necessary to hold the deserialized
> objects, you should never deserialize directly data coming from an
> untrusted source, such as a network connection. Doing so can lead to
> vulnerability to denial of service attacks, either by a malicious attacker
> or by benign but malfunctioning software at the other end of the
> connection. The data from the source should therefore at least be checked
> for length and data that exceeds an application-defined size limit should
> be rejected."

Ok, so let us address an issue that I have been raising: "How much is
too much?"

I stated in a previous post that I have a working system that uses
serialization where the target of the serialized data might be asked
to deserialize 1,000 bytes or 1,000,000 bytes. Note that the
1,000,000-byte situation is entirely legitimate - There really is a
case, in the normal course of operation, where 1,000,000 bytes is
reasonable.

The problem is that any limit that you pick is arbitrary.

Furthermore, you have not specified how the serialization framework
would be engineered so that there is a "limit" on the amount of data
transferred. A bit of sample code would go a long way. I would like
to see where these limits are placed in the serialization framework so
that the framework does not become unnecessarily cluttered.

-Le Chaud Lapin-

I V

unread,
Jun 17, 2007, 8:22:01 AM6/17/07
to
On Sat, 16 Jun 2007 07:59:55 -0600, Le Chaud Lapin wrote:
> First note that the statement "r >> data;" is highly vague. Here you
> have a "reader" object that is importing into a non-trivial data
> structure. Where is the code that defines how this occurs?

I would envision something like:

template<typename T>
deserialize_stream& operator>> (deserialize_stream& s,
                                std::vector<T>& result)
{
    result.clear();

    int num_items;

    s >> num_items;

    for( int i = 0; i < num_items; ++i )
    {
        T item;

        s >> item;
        result.push_back(item);
    }

    return s;
}

With a roughly similar operator>> for std::string
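
Something like this, say (a sketch; reading character by character
means every byte extracted is charged against the reader's running
limit):

deserialize_stream& operator>> (deserialize_stream& s,
                                std::string& result)
{
    result.clear();

    int num_chars;

    s >> num_chars;

    for( int i = 0; i < num_chars; ++i )
    {
        char c;

        s >> c;
        result.push_back(c);
    }

    return s;
}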

> The problem is that, in general, the writer of a class that is to be
> placed in a library must write the serialization code for each class
> of that library, then turn his head, long before the user of that
> library employs the library in an actual application.
>
> So the statement: "std::vector<std::string>& data" could very well be
> only one statement that helps to implement a larger, serialization
> sequence, perhaps as part of a big object.
>
> Now you might say, "That's fine, the big object will know how big to
> set the limit..."

Well, what I was thinking is that it's not the big object that sets the
limit, but the application. The point being, that the serialization code
should never create a socket reader class, but only use what it is given.
That way, the application can impose limits at a level of granularity that
makes sense; for instance, it could create one reader object per
connection, and thereby impose a limit on the amount of data read per
connection. Combined with a limit on the number of connections active at
any one time, this would limit the amount of memory required for
serialized data at any one time.

It would still be possible for user-provided deserialization functions to
cause problems, though, as with, say, a deserialization function for a
vector that will happily call "reserve" for however many objects it is
asked for. In the operator>> above, for example, it would be faster to
call reserve at the beginning of the function, rather than allocating
incrementally in calls to push_back, but obviously that would allow for
DoS attacks. OTOH, the code for deserializing standard library classes
would be provided by the library writer, so the application author would
get a measure of safety as long as the complex objects they used were
themselves built out of standard library components.
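
One possible compromise for the reserve issue just mentioned (a
sketch; the 1024 cap is illustrative): inside the vector operator>>
above, cap the up-front reservation, and let push_back grow the vector
past the cap only as matching data actually arrives:

int to_reserve = num_items < 1024 ? num_items : 1024;
result.reserve(to_reserve);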

Sebastian Redl

unread,
Jun 17, 2007, 5:59:38 PM6/17/07
to

On Sun, 17 Jun 2007, Le Chaud Lapin wrote:

> Ok, so let us address an issue that I have been raising: "How much is
> too much?"
>
> I stated in a previous post that I have a working system that uses
> serialization where the target of the serialized data might be asked
> to deserialize 1,000 bytes or 1,000,000 bytes. Note that the
> 1,000,000-byte situation is entirely legitimate - There really is a
> case, in the normal course of operation, where 1,000,000 bytes is
> reasonable.
>
> The problem is that any limit that you pick is arbitrary.

Of course it is arbitrary. I even said so in my own post. I added that,
because it is arbitrary, it must be user-configurable.

If 1,000,000 bytes is reasonable, then you will have to allow it. If your
server has only 10 MB of RAM to spare, then you will just have to limit
the number of simultaneous connections accordingly.

The point is, arbitrary limits in software are an unfortunate necessity in
a world where hardware is also arbitrarily limited. Given the usual
situation of client software, these arbitrary limits are so far out that
they are basically invisible (std::vector has a max_size() of 4 billion?
Whoop-dee-doo, who cares?), but in server software, this is not so. The
hardware limits very quickly become visible, and because software that
hits the hardware limits usually behaves ... unpleasantly, limits must be
added to the software that are designed to be hit (with more predictable
and benign results) just before the hardware limits are hit.

(On a side note, even in some client software, hardware limits are
visible. Take games, for example. Sure, the designers would love it if
they could throw several billion vertices at the graphics card and push
the screen resolution into hundreds of DPIs. But the hardware ain't gonna
handle it, not yet. These are "arbitrary" limits.)

> Furthermore, you have not specified how the serialization framework
> would be engineered so that there is a "limit" on the amount of data
> transferred.

Yes, I have. There is no such engineering. The limit is placed outside the
serialization framework.

Although, another poster has made a point that I have to concede. There
actually is one requirement on the serialization framework if you want to
avoid passing the limit along everywhere, and it's this: deserialization
code should never allocate memory if it can't prove that it has data to
fill it.

This sounds like a weird requirement. The problem is, basically, this:
Let's take a struct:

struct RatherBigStruct
{
    char fat[1024];
};

Now let's take a vector of this struct.

typedef std::vector<RatherBigStruct> FatVector;


Suppose further that a vector is serialized by first sending the number of
elements, and then sending the elements in sequence. My protection scheme
is in place, and it is set up so that it first receives all data before
passing it on. The limit is set to 500k.

Now, Legitimate comes along and sends a vector.

FatVector fat1(100);
Socket << fat1;

The server reads it:

FatVector incoming;
ProtectedSocket >> incoming;

The data is 100k large. ProtectedSocket accepts it all, finds that it
doesn't exceed the limit, and lets deserialization proceed. All is fine.

Now L33tH4xx0r comes along and tries this:

FatVector fat2(1000);
Socket << fat2;

The server reads it:

FatVector incoming;
ProtectedSocket >> incoming;

The data is 1M large. ProtectedSocket accepts 500k, then notices that the
limit was exceeded. It closes the connection, releases the 500k, and
nothing further happens.


OK, but here's the hole in the system. ReallyEvil now comes along and does
this:

Socket << "FatVectorClassId" << 1000;

The server reads it:

FatVector incoming;
ProtectedSocket >> incoming;

The data is perhaps a hundred bytes large. ProtectedSocket reads it, finds
that it doesn't exceed the limit, and lets deserialization proceed.

And now it depends. The class ID is fine, so the deserialization code for
FatVector is called.

This function can cause DoS:

void unserialize(FatVector &target, Archive &data)
{
    size_t size;
    data >> size;
    target.resize(size); // Oops, 1M allocated.
    for(RatherBigStruct &s : target) {
        data >> s;
    }
}

This one can't (unless error handling is broken):

void unserialize(FatVector &target, Archive &data)
{
    target.clear();

    size_t size;
    data >> size;
    while(size--) {
        RatherBigStruct t;
        data >> t; // Throws EOF - no data actually available.
        target.push_back(t);
    }
}


See what I mean?

Anyway, the obvious downside is that the second function will have to
reallocate several times.

There _are_ ways around this, but they actually do require re-engineering
of the serialization framework. One method I can come up with quickly, one
that works together with the external protection that's already in place,
would be to just pass the amount of remaining data along with all
operations. The unserialize operation would then look like this (but
beware of integer overflow):

void unserialize(FatVector &target, Archive &data)
{
    target.clear();

    size_t size;
    data >> size;
    if(size * sizeof(RatherBigStruct) > data.remaining()) {
        // Nonsensical allocation.
        throw Corrupt;
    }
    target.resize(size);
    // ...
}


Hmm ... in conclusion, I will have to admit that you're right about one
thing: deserializing from untrusted sources can be dangerous, even if you
can place an upper limit on the amount of data sent.

Sebastian Redl

Nominal Pro

unread,
Jun 19, 2007, 4:55:16 AM6/19/07
to
On Jun 13, 11:45 am, Le Chaud Lapin <jaibudu...@gmail.com> wrote:
> Well...that's just it. If one takes generic pre-written serialization
> code for class File, and tries to use it against Socket, it will not be
> known that there is a DoS attack. And whether there is an attack or
> not, excessive memory allocation, accidental or intentional, will be
> indeterminate. There will be no point in the code where just before
> invocation of operator new() one will be able to say...
> 5MB!!!!...that's too much...something must be wrong.

True, you can't really predict if you're going to run out of memory
before it happens.

Those who keep saying the solution is to "check n before calling
malloc(n)" are forgetting that checking n alone still leaves your
library open to a different avenue of attack. Specifically, what if
an attacker forced your application to deserialize 1 million strings
that are each 1000 bytes long? That amounts to asking your application
to malloc(1000) 1 million times. That's harder to recognize as a DoS
attack than, say, trying to deserialize a single string that is
1 billion bytes in size.

A variation of this attack is the example of deserializing
vector<string>. You can't determine from the size of the vector how
much actual memory the vector and all of its strings will consume
until you've deserialized the whole thing. Why? Because string
requires a variable amount of memory. Deserializing a vector<string>
of 1,000,000 strings might be fine if the strings were all 10 bytes
long, but if they were all 32,000 bytes long, you're in trouble.
Trying to place limits on the count of a vector isn't a very good
solution.

The way I would suggest handling all of these cases in C++ is to use a
specialized operator new() for objects created through serialization.
This operator new() would allocate from a limited pool of memory. If
you exhaust this memory pool, it throws std::bad_alloc, and let the
user of your library deal with it, but at least you won't exhaust all
the memory available to your application (which might cripple or crash
it), and it handles the case where an attacker forces you to
deserialize too many smaller-sized objects.

Note that your specialized operator new() implementation doesn't have
to draw memory from any special pool; it only needs to keep track of
the new() and delete() calls that have been made, and if the allocated
memory consumed by deserialized objects exceeds the allowed limit,
simply have your operator new() throw std::bad_alloc, as prescribed by
the C++ Standard. This limit is going to depend on the runtime
environment.
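
Something along these lines, to sketch the idea (the class name and the
16MB quota here are placeholders, and a real version would also need to
be thread-safe):

#include <cstddef>
#include <new>

// Objects created through deserialization inherit from this class, so
// that their allocations are charged against a fixed quota.
class Deserialized
{
public:
    static void *operator new (std::size_t n)
    {
        if (n > quota - used)          // never overflows, since used <= quota
            throw std::bad_alloc();
        void *p = ::operator new(n);
        used += n;
        return p;
    }

    static void operator delete (void *p, std::size_t n)
    {
        used -= n;
        ::operator delete(p);
    }

private:
    static std::size_t used;
    static const std::size_t quota;
};

std::size_t Deserialized::used = 0;
const std::size_t Deserialized::quota = 16 * 1024 * 1024;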

jlin...@hotmail.com

unread,
Jun 19, 2007, 7:36:37 AM6/19/07
to
>
> A variation of this attack is the example of deserializing
> vector<string>. You can't determine from the size of the vector how
> much actual memory the vector and all of its strings will consume
> until you've deserialized the whole thing. Why? Because string
> requires a variable amount of memory. Deserializing a vector<string>
> of 1,000,000 strings might be fine if the strings were all 10 bytes
> long, but if they were all 32,000 bytes long, you're in trouble.
> Trying to place limits on the count of a vector isn't a very good
> solution.

And just how would an attacker be able to do that, if the receiver
only accepts messages whose total length is less than some
predetermined value?

I have given detailed descriptions in earlier posts, apparently to no
effect, on how one would deserialize both std::vector and std::string.
The problems discussed in this thread are 100% due to the misguided
use of constructs like "socket >> s". If programmers use such flawed
constructs, of course they will have problems. The problems will be of
such a difficult nature that they might even be tempted to start a
thread here on clcm about them. And then invent complex "solutions".

As soon as you scrap the deserialization-out-of-a-socket idiom, all
the "problems" discussed in this thread just vanish.

Jarl Lindrud.

Le Chaud Lapin

unread,
Jun 19, 2007, 11:48:30 AM6/19/07
to
On Jun 19, 6:36 am, jlind...@hotmail.com wrote:
> > A variation of this attack is the example of deserializing
> > vector<string>. You can't determine from the size of the vector how
> > much actual memory the vector and all of its strings will consume
> > until you've deserialized the whole thing. Why? Because string
> > requires a variable amount of memory. Deserializing a vector<string>
> > of 1,000,000 strings might be fine if the strings were all 10 bytes
> > long, but if they were all 32,000 bytes long, you're in trouble.
> > Trying to place limits on the count of a vector isn't a very good
> > solution.
>
> And just how would an attacker be able to do that, if the receiver
> only accepts messages whose total length is less than some
> predetermined value?

Accepting a "message" whose length is of some predetermined value
would be ruinous to the whole serialization model. I must admit, I do
not (yet) see what you see.

Also, I re-read one of your earlier posts to try to understand what
you mean:

I wrote:
> > Also, how does one send count=1024x1024 elements in your scheme?
>

You wrote:

> In exactly the same way. Serialize a count and then serialize each
> element in the container.
>
> If we're talking about a single-byte string with 1024*1024 characters,
> the resulting data packet will be a little bit larger than 1Mb. The
> sender doesn't worry about that, but if the receiver has its maximum
> packet size set to e.g. 1 Mb, then the packet, and the TCP connection,
> will be discarded as soon as the packet length field is read. What the
> client has to do then, is to resend the string, in several sub-1Mb
> packets, and inform the receiver that the strings are to be assembled.

I do not think this is a solution. Breaking the string into sub-1MB
"packets" will not solve the problem. In the end, the receiver will
still have allocated 1MB of data, which might turn out to be a DoS
attack. Furthermore, the allocate/deallocate/allocate/deallocate
method that you use for std::string and std::vector<string>
serialization probably takes a heavy toll on the memory allocator. In
my original post, I showed how the receiver could be tricked into
invoking operator new() on say, a bogus unsigned int sent by the
sender. But I should have shown that the problem is more general,
meaning that the goal is to prevent the sender from inducing the
receiver to "eventually" allocate too much memory, in this case, 1MB.

> I have given detailed descriptions in earlier posts, apparently to no
> effect, on how one would deserialize both std::vector and std::string.
> The problems discussed in this thread are 100% due to the misguided
> use of constructs like "socket >> s". If programmers use such flawed
> constructs, of course they will have problems. The problems will be of
> such a difficult nature that they might even be tempted to start a
> thread here on clcm about them. And then invent complex "solutions".
>
> As soon as you scrap the deserialization-out-of-a-socket idiom, all
> the "problems" discussed in this thread just vanish.

First, surely you will admit that there are many programmers who think
that they can take their serialization code, write it once for a base
class (let's call it Archive), and then use it later against any class
that derives from Archive, including a Socket.

Also, there are some subtleties with your only-accept-less-than-1MB
scheme that I did not want to mention since this is new territory for
some of us. It involves, again, the serialization framework itself.

If we are to use any serialization at all against a socket, then the
code has to be "clean". The serialization code must be encapsulated
in a library. One cannot go back and twiddle with it after it exists
as a binary.

That said, it is not clear to me how you would define where one 1MB
message begins and the other ends. 1MB? What is that? Is it a TCP
segment? It is certainly not a UDP payload or Ethernet frame. The
latter is limited to 1500 bytes, and the former must be even smaller.

Furthermore, serialized data is boundary agnostic. Let us assume that
your 1MB buffer is 1024x1024 bytes. Then:

Socket s;
int i; // assume that sizeof(int) == 4, and char is an 8-bit byte
s >> i; // now we have taken 4 bytes from the 1MB buffer.

char c;
for (i = 0; i < 1024*1024 - 17; ++i)
    s >> c;

// Now we have 13 bytes left:

std::list<string> names;
s >> names; // Oooops...tried to read the "size" of one of the
            // strings in "names", failed.

Because a "weird" number of bytes were left, building one of the
strings in names failed. There was an underflow from insufficient
data. What do we do now IIUC, you stated that the way to check that
there is a DoS attack is when the buffer underflows.? How do we
distinguish between DoS attack and simply underflow?

-Le Chaud Lapin-

Nevin :-] Liber

unread,
Jun 20, 2007, 1:52:30 AM6/20/07
to
In article <1182261888.9...@w5g2000hsg.googlegroups.com>,

Le Chaud Lapin <jaibu...@gmail.com> wrote:

> First, surely you will admit that there are many programmers who think
> that they can take their serialization code, write it once for a base
> class (let's call it Archive), and then use it later against any class
> that derives from Archive, including a Socket.

I don't know if there are many programmers who would advocate this. If
this thread is any indication, you seem to be the only one.

Having Socket derive from Archive doesn't seem like a prudent design
choice, as it is hard to envision how a Socket is-an Archive.

> If we are to use any serialization at all against a socket, then the
> code has to be "clean". The serialization code must be encapsulated
> in a library. One cannot go back and twiddle with it after it exists
> as a binary.

I don't see any reason why it has to be in a library. You might have to
version the format of the data stream, and you might not be able to
change the format for a given version of the data stream, you might even
have to version the objects in the data stream, but that says nothing at
all about the actual code that is used to (de)serialize the data.

> It is certainly not a UDP payload or Ethernet frame. The
> latter is limited to 1500 bytes, and the former must be even smaller.

Not true. There is a maximum of 65507 bytes of user data in a UDP
datagram (this is straight out of the Stevens TCP/IP Illustrated book),
although many implementations provide less than this maximum.

> Furthermore, serialized data is boundary agnostic.

Maybe in your implementation. That certainly isn't true in mine.

> Because a "weird" number of bytes were left, building one of the
> strings in names failed. There was an underflow from insufficient
> data. What do we do now

I know what I do. If underflow occurs (not enough data to deserialize),
I throw an exception. If overflow occurs (more user data in the packet
than consumed), I throw an exception. If string.resize(size) cannot get
enough memory to satisfy size, it throws.

But I don't attempt to deserialize raw data from a socket. And I have
packet level stuff that catches the exception, cleans up, logs an error,
and throws away the packet. You refuse to packetize, add framing, data
lengths or checksums, so I have no idea what you do. It's your design;
how do you recover?

Heck, even without serialization just using fixed size messages (greater
than 1 byte), all it takes is one missing / extra byte and you'll never
recover. Talk about a denial of service attack...

> IIUC, you stated that the way to detect a
> DoS attack is that the buffer underflows. How do we
> distinguish between a DoS attack and a simple underflow?

In general, you can't, because this is way too low a level to deduce
the intentions of the entity sending you the data.

--
Nevin ":-)" Liber <mailto:ne...@eviloverlord.com> 773 961-1620

[ See http://www.gotw.ca/resources/clcm.htm for info about ]

jlin...@hotmail.com

unread,
Jun 20, 2007, 11:35:03 AM6/20/07
to
>
> Accepting a "message" whose length is of some predetermined value
> would be ruinous to the whole serialization model. I must admit, I do
> not (yet) see what you see.
>

The point is that the message concept is external to the
(de)serialization code. A message would contain header information and
a body. The header specifies the length of the body, and it is the
body that is deserialized into C++ objects.

Here is how it works. When the sender wants to send some C++ objects
across the wire, it serializes them into a buffer. It then takes the
length of the buffer, and embeds it as part of the header information.
The header and the buffer are then written to a socket, in their
entirety.

The receiver begins by reading the header information from the socket,
and examines the length field to see if it falls within acceptable,
application-defined, limits. If it doesn't, the connection is
unceremoniously closed, perhaps after sending a short informative
reply back to the sender. If the length field is ok, however, the
receiver proceeds to read in the entire body of the message into a
buffer. If it succeeds to do that, it proceeds to deserialize the
contents of the buffer, into bona fide C++ objects.

So as you see, the maximum message size is totally independent of the
serialization code. The application can set that limit as it pleases,
and the serialization code knows nothing about it. The only thing the
serialization code needs to be careful about, is to avoid allocation
of memory based on untrusted data such as a serialized count field. As
I've shown with std::vector and std::string, that is no problem.
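
To make this concrete, the receiving end amounts to something like the
following (a POSIX sketch; the 4-byte length prefix, the helper names
and the 1Mb limit are just example choices):

#include <stdint.h>
#include <stdexcept>
#include <vector>
#include <sys/types.h>
#include <sys/socket.h>
#include <arpa/inet.h>

static const uint32_t MAX_BODY = 1024 * 1024;  // application-defined limit

// Read exactly n bytes from the socket, or throw.
static void read_fully(int fd, char *buf, size_t n)
{
    while (n > 0) {
        ssize_t got = recv(fd, buf, n, 0);
        if (got <= 0)
            throw std::runtime_error("connection closed mid-message");
        buf += got;
        n -= static_cast<size_t>(got);
    }
}

// Return a complete message body. The length is checked against the
// limit *before* any allocation takes place.
std::vector<char> receive_message(int fd)
{
    uint32_t len;
    read_fully(fd, reinterpret_cast<char *>(&len), sizeof len);
    len = ntohl(len);                  // length prefix in network order
    if (len > MAX_BODY)
        throw std::runtime_error("message length exceeds limit");
    std::vector<char> body(len);
    if (len > 0)
        read_fully(fd, &body[0], len);
    return body;   // hand this buffer to the deserialization code
}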

>
> I do not think this is a solution. Breaking the string into sub-1MB
> "packets" will not solve the problem. In the end, there will receiver
> will have still allocated 1MB data, which might turn out to be a DoS
> attack. Furthermore, the allocate/deallocate/allocate/deallocate
> method that you use for std::string and std::vector<string>
> serialization probably takes a heavy toll on the memory allocator. In
> my original post, I showed how the receiver could be tricked into
> invoking operator new() on say, a bogus unsigned int sent by the
> sender. But I should have shown that the problem is more general,
> meaning that the goal is to prevent the sender from inducing the
> receiver to "eventually" allocate too much memory, in this case, 1MB.

If the sender is in a situation where it needs to send a message to
the receiver, which it knows will exceed the message limit, then it
has two options:

1) Don't do it at all

, or

2) Use an application-specific protocol (i.e. a sequence of messages
between sender and receiver, messages defined as above), whose end
result will be the transferral of all the data to the receiver, in
messages, all of whose lengths are in accordance with the receiver's
message limit. That protocol is free to implement any kind of DOS
protection it can think of, or none if it prefers that. But that
protocol has absolutely nothing to do with the serialization code. I
really do mean absolutely nothing.

Here's an example. Let's say the sender needs to be able to send really
long std::string's. You could have the sender request a handle from
the receiver, and then send the string in several substrings, each
time accompanied by the handle. The receiver uses the handle to
identify previously sent substrings, joins them together, and
eventually the whole string will have been transferred to the
receiver. All this takes place in the _application_, far removed from
the serialization code.
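
A rough sketch of the receiver-side bookkeeping for such a protocol
(the class, the names and the per-handle cap are all invented for
illustration; a real server would also cap the number of open handles):

#include <map>
#include <stdexcept>
#include <string>

class StringAssembler
{
public:
    StringAssembler() : next_(0) {}

    // The sender asks for a handle before the first substring.
    int open() { return next_++; }

    // Each sub-limit message appends one substring.
    void append(int handle, const std::string &part)
    {
        std::string &s = parts_[handle];
        if (s.size() + part.size() > 4 * 1024 * 1024)   // per-handle cap
            throw std::runtime_error("string exceeds per-handle cap");
        s += part;
    }

    // The sender signals completion; the assembled string is handed back.
    std::string close(int handle)
    {
        std::string s = parts_[handle];
        parts_.erase(handle);
        return s;
    }

private:
    int next_;
    std::map<int, std::string> parts_;
};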

Now before you jump in and say "Aha! DOS vulnerability!", just
remember that the protocol above has absolutely nothing to do with the
serialization code. The DOS vulnerability is the same as that of _any_
network server, regardless of whether it uses C++ serialization
frameworks or not. The DOS vulnerability can be addressed in any of a
number of ways, without touching the serialization code _at all_ . Not
even a single line of it.


>
> First, surely you will admit that there are many programmers who think
> that they can take their serialization code, write it once for a base
> class (let's call it Archive), and then use it later against any class
> that derives from Archive, including a Socket.

There are also many programmers who think it's ok to sprinkle explicit
deletes all through their code. So what?


> That said, it is not clear to me how you would define where one 1MB
> message begins and the other ends.

I'm not following you here. A message begins wherever the application
wants it to begin. It isn't until the application actually has a
complete message in its hands, that it invokes the deserialization
code on the body of the message.

Jarl Lindrud.

Le Chaud Lapin

unread,
Jun 21, 2007, 1:44:39 AM6/21/07
to
On Jun 20, 10:35 am, jlind...@hotmail.com wrote:

I can at least say that, thanks to your clear explanation of your
point of view, I am now sure that you were thinking what I thought you
were thinking, whereas before, I was not 100% sure. I still think,
however, that it does not address the fundamental issues.

> If the sender is in a situation where it needs to send a message to

[snipped]

> Here's an example. Lets say the sender needs to be able to send really
> long std::string's . You could have the sender request a handle from
> the receiver, and then send the string in several substrings, each
> time accompanied by the handle. The receiver uses the handle to
> identify previously sent substrings, joins them together, and
> eventually the whole string will have been transferred to the
> receiver. All this takes place in the _application_, far removed from
> the serialization code.

Excellent example. Let us say that, in a non-adversarial situation, the
typical length of a string is 64 bytes. Let us say that in the
adversarial situation, the length of a string jumps to 5MB.

> Now before you jump in and say "Aha! DOS vulnerability!", just
> remember that the protocol above has absolutely nothing to do with the
> serialization code. The DOS vulnerability is the same as that of _any_
> network server, regardless of whether it uses C++ serialization
> frameworks or not. The DOS vulnerability can be addressed in any of a
> number of ways, without touching the serialization code _at all_ . Not
> even a single line of it.

That is the whole point of my original post, to point out that blind
use of serialization incites the vulnerability. Also, I state again:
there are many, many programmers who not only want, but expect, to be
able to use serialization code written for a File "archive" against a
Socket archive.

At the very least...

...the importance of this thread is to let them know that they should
not, because right now, that is exactly what they are doing. So if
anything, this thread exposes a behavior that might be best avoided.
However, personally, I am not entirely convinced that one cannot have
his cake and eat it too, after discussions with Jeff Koftinoff.

>
> > First, surely you will admit that there are many programmers who think
> > that they can take their serialization code, write it once for a base
> > class (let's call it Archive), and then use it later against any class
> > that derives from Archive, including a Socket.
>
> There are also many programmers who think it's ok to sprinkle explicit
> deletes all through their code. So what?

We should tell them that they should not use serialization code
written for an "archive" class for a "socket". A good place to start
is to tell the author(s) of Boost::asio, for example. All novice
programmers who look up to Boost programmers might then take note.

> > That said, it is not clear to me how you would define where one 1MB
> > message begins and the other ends.
>
> I'm not following you here. A message begins wherever the application
> wants it to begin. It isn't until the application actually has a
> complete message in its hands, that it invokes the deserialization
> code on the body of the message.

So I guess that is your solution - Do not deserialize from an
archive. Or, if you do, do not do it the way you might do it from a
locally stored database, for example?

-Le Chaud Lapin-

jlin...@hotmail.com

unread,
Jun 21, 2007, 11:18:01 AM6/21/07
to
>
> I can at least say that, thanks to your clear explanation of your
> point of view, I am now sure that you were thinking what I thought you
> were thinking, whereas before, I was not 100% sure. I still think,
> however, that it does not address the fundamental issues.
>

The fundamental issue _is_ decoupling of deserialization and
reception.

>
> That is the whole point of my original post, to point out that blind
> use of serialization incites the vulnerability.

It's not the use of serialization that is your problem. It is your
coupling of serialization with transmission.

>
> ...the importance of this thread is to let them know that they should
> not, because right now, that is exactly what they are doing. So if
> anything, this thread exposes a behavior that might be best avoided.
> However, personally, I am not entirely convinced that one cannot have
> his cake and eat it too, after discussions with Jeff Koftinoff.
>
>

All you are doing is patching the consequences of a fundamental design
error.

>
> We should tell them that they should not use serialization code
> written for an "archive" class for a "socket". A good place to start
> is to tell the author(s) of Boost::asio, for example. All novice
> programmers who look up to Boost programmers might then take note.
>

?

Have you read the Boost.Asio docs at all? It took me less than 2
minutes to ascertain that the examples that use boost.serialization
use it in exactly the way that I've described. Which is totally
different from what you've been describing.

>
> So I guess that is your solution - Do not deserialize from an
> archive. Or, if you do, do not do it the way you might do it from a
> locally stored database, for example?

Don't confuse serialization with storage/transmission/etc. That is
really all there is to it. Funny, though, I could have sworn I've said
that before.

Jarl Lindrud.

co...@mailvault.com

unread,
Jun 21, 2007, 11:17:37 AM6/21/07
to
On Jun 20, 9:35 am, jlind...@hotmail.com wrote:
> > That said, it is not clear to me how you would define where one 1MB
> > message begins and the other ends.
>
> I'm not following you here. A message begins wherever the application
> wants it to begin. It isn't until the application actually has a
> complete message in its hands, that it invokes the deserialization
> code on the body of the message.
>

I suggest you keep in mind that "complete" may be hard to
nail down.

http://groups.google.com/group/comp.lang.c++.moderated/browse_thread/thread/d5d3f9733ed00fe3/ed7ad13050b9c705

Also if I understand it correctly your approach would
require 2 times the hefty amount of memory that has been discussed.

Brian Wood
Ebenezer Enterprises

jlin...@hotmail.com

unread,
Jun 21, 2007, 10:22:27 PM6/21/07
to
{ Please try to keep on topic for clc++m in follow-ups. -mod }

>
> I suggest you keep in mind that complete may be hard to
> nail down.

Without following the link (it seems broken), what could be ambiguous
about "complete"? If the sender says that the message is 5000 bytes
long, I will read 5000 bytes from the connection, and by definition
the data I've received is a complete message. If I don't get all 5000
bytes, by definition the message is incomplete. What am I missing?

> Also if I understand it correctly your approach would
> require 2 times the hefty amount of memory that has been discussed.

Correct. I hope you are also aware that with the rabbit's approach, the
receiver can at a moment's notice be induced to allocate far more than
just 2 times the amount.

Jarl Lindrud.

Jeff Koftinoff

unread,
Jun 21, 2007, 10:34:12 PM6/21/07
to
On Jun 21, 8:18 am, jlind...@hotmail.com wrote:
>
> Have you read the Boost.Asio docs at all? It took me less than 2
> minutes to ascertain that the examples that use boost.serialization
> use it in exactly the way that I've described. Which is totally
> different from what you've been describing.
>

Hi Jarl. Your comment confuses me...

I am looking at asio examples:

See the definition for the serialization code for the 'stocks' class
at:

http://asio.sourceforge.net/asio-0.3.7/doc/examples/a00224.html

and the code which receives a vector<stock> from the socket in
handle_connect() at:

http://asio.sourceforge.net/asio-0.3.7/doc/examples/a00210.html#5b5468393f5122efc423bccc7bcd5458

I do not understand why this code in this example is not susceptible
to a DoS attack due to a compromised server.

At what point does the boost serialization code limit the size of the
vector of stocks_ or the size of the stocks.code or stocks.name
strings?

Also, may I suggest that reading from a file is typically as much a
potential for DoS as reading from a socket. A few years back, Microsoft
Windows had a deserialization bug in their BMP image file format
reading code - The BMP file could be emailed to you, or can be read
from a cached copy instead of a live socket.

--jeffk++
je...@jdkoftinoff.com
www.jdkoftinoff.com

jlin...@hotmail.com

unread,
Jun 22, 2007, 11:52:06 AM6/22/07
to
>
> I do not understand why this code in this example is not susceptible
> to a DoS attack due to a compromised server.
>

Look in handle_read_data(), near the bottom of this page:
http://tinyurl.com/2lde9f

The complete contents of a message are copied into a buffer (in this
case a std::string), and then the contents of the string are
deserialized into a C++ object.

By putting a limit on how many bytes you are willing to allow in a
message, you can limit the DOS vulnerability, without touching the
serialization code of the "stock" object, and without touching the
underlying serialization code in Boost.Serialization.


> At what point does the boost serialization code limit the size of the
> vector of stocks_ or the size of the stocks.code or stocks.name
> strings?

It doesn't, and it shouldn't.

>
> also may I suggest that reading from a file is typically a potential
> for DoS just as reading from a socket. A few years back, Microsoft
> Windows had a deserialization bug in their BMP image file format
> reading code - The BMP file could be emailed to you, or can be read
> from a cached copy instead of a live socket.

Definitely. That's why deserialization code should never allocate
memory based on a count field that it has read in. It should only
allocate memory for the particular object it is deserializing, and
rely on reallocation instead, if it e.g. is filling a std::vector<> .

Jarl.

Jeff Koftinoff

unread,
Jun 22, 2007, 4:27:35 PM6/22/07
to
On Jun 22, 8:52 am, jlind...@hotmail.com wrote:
> > I do not understand why this code in this example is not susceptible
> > to a DoS attack due to a compromised server.
>
> Look in handle_read_data(), near the bottom of this page:
> http://tinyurl.com/2lde9f
>
> The complete contents of a message are copied into a buffer (in this
> case a std::string), and then the contents of the string are
> deserialized into a C++ object.
>

Ok, I see that the header is read, then there is a vector<char> which
is resized based on information in the header:

inbound_data_.resize(inbound_data_size);

When that data read is complete, then handle_read_data() is called
which copies all the data to a string and then deserializes the string
into the object.

Are you saying that just relying on resize(inbound_data_size) to throw
std::bad_alloc is sufficient to avoid a DoS?

The problem is not in handle_read_data() - it is in
handle_read_header()

<snip>

>
> Definitely. That's why deserialization code should never allocate
> memory based on a count field that it has read in.

This is exactly what handle_read_header does:

is >> std::hex >> inbound_data_size;
inbound_data_.resize(inbound_data_size);

Regards,

Jeff Koftinoff

je...@jdkoftinoff.com
www.jdkoftinoff.com

co...@mailvault.com

unread,
Jun 22, 2007, 9:06:47 PM6/22/07
to
On Jun 21, 8:22 pm, jlind...@hotmail.com wrote:
> > I suggest you keep in mind that complete may be hard to
> > nail down.
>
> Without following the link (it seems broken), what could be ambiguous
> about "complete"? If the sender says that the message is 5000 bytes
> long, I will read 5000 bytes from the connection, and by definition
> the data I've received is a complete message. If I don't get all 5000
> bytes, by definition the message is incomplete. What am I missing?
>

Previously you wrote, "It isn't until the application actually has a
complete message in its hands, that it invokes the deserialization
code on the body of the message." Say 5000 is the size of a derived
instance
and that after receiving 90% of the data an error
occurs. You can't safely deserialize the desired
object, but you might be able to build a base instance.
Throwing away everything because of the error would be
wasting work done by the parties involved when
something valuable may already be available.

> Correct. I hope you are also aware that with the rabbit's approach, the
> receiver can at a moment's notice be induced to allocate far more than
> just 2 times the amount.
>

OK, but do you agree that the message length can be
tampered with the same as other values? By not
deserializing anything until all of the data has
been received you offer some safety, but the software
could still be fooled into allocating large amounts
of memory.

Brian Wood
www.webebenezer.net

jlin...@hotmail.com

unread,
Jun 23, 2007, 2:14:48 AM6/23/07
to

>
> Previously you wrote, "It isn't until the application
> actually has a complete message in its hands, that it
> invokes the deserialization code on the body of the
> message." Say 5000 is the size of a derived instance
> and that after receiving 90% of the data an error
> occurs. You can't safely deserialize the desired
> object, but you might be able to build a base instance.
> Throwing away everything because of the error would be
> wasting work done by the parties involved when
> something valuable may already be available.
>

This has nothing to do with the serialization framework.

If your application chooses, for application-specific reasons, to
attempt to deserialize a base instance from the message fragment, more
power to it. There is certainly no ambiguity in saying that the
message was incomplete. The handling of an incomplete message is the
application's business, and doesn't concern the serialization
framework in any way, shape or form.

>
> OK, but do you agree that the message length can be
> tampered with the same as other values? By not
> deserializing anything until all of the data has
> been received you offer some safety, but the software
> could still be fooled into allocating large amounts
> of memory.
>

Of course it can be tampered with! That is why the _application_ must
impose a limit. But as I have repeated, ad nauseum: that has nothing
to do with serialization framework. Absolutely nothing. Zero, nil,
squat.

Jarl.

jlin...@hotmail.com

unread,
Jun 23, 2007, 2:12:04 AM6/23/07
to

>
> When that data read is complete, then handle_read_data() is called
> which copies all the data to a string and then deserializes the string
> into the object.
>
> Are you saying that just relying on resize(inbound_data_size) to throw
> std::bad_alloc is sufficient to avoid a DoS?
>

No. What I am saying is that the application can put a limit on
inbound_data_size, and that will take care of the DOS vulnerabilities.
inbound_data_size has absolutely nothing to do with the serialization
framework.

>


> > Definitely. That's why deserialization code should never allocate
> > memory based on a count field that it has read in.
>
> This is exactly what handle_read_header does:
>
> is >> std::hex >> inbound_data_size;
> inbound_data_.resize(inbound_data_size);
>

That's not part of the serialization framework! It is part of the
application code. As it stands, it is a DOS vulnerability. That
vulnerability can be eliminated without touching even a *single* line
of the serialization framework, simply by limiting the value of
inbound_data_size .
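
In code, the fix is a single check before the resize (max_inbound_size
being whatever value the deployer configures; in the real example one
would report the error through the handler rather than throw):

is >> std::hex >> inbound_data_size;
if (!is || inbound_data_size > max_inbound_size)
    throw std::length_error("inbound message exceeds limit");
inbound_data_.resize(inbound_data_size);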

As this example shows, the OP's claim that C++ serialization
frameworks cannot be used safely in network communication is nothing
but a myth. This whole thread has been about a non-issue, from
beginning to end.

Jarl.

Le Chaud Lapin

unread,
Jun 23, 2007, 6:40:40 PM6/23/07
to
On Jun 23, 1:12 am, jlind...@hotmail.com wrote:
> > This is exactly what handle_read_header does:
>
> > is >> std::hex >> inbound_data_size;
> > inbound_data_.resize(inbound_data_size);
>
> That's not part of the serialization framework! It is part of the
> application code. As it stands, it is a DOS vulnerability. That
> vulnerability can be eliminated without touching even a *single* line
> of the serialization framework, simply by limiting the value of
> inbound_data_size .

What value should be chosen as a limit on inbound_data_size?

-Le Chaud Lapin-

Nevin :-] Liber

unread,
Jun 24, 2007, 12:42:34 AM6/24/07
to
In article <1182621228....@g4g2000hsf.googlegroups.com>,

Le Chaud Lapin <jaibu...@gmail.com> wrote:

> What value should be chosen as a limit on inbound_data_size?

1. How can we possibly answer that question for *your* application?

2. How is this question or its answer related to C++?

--
Nevin ":-)" Liber <mailto:ne...@eviloverlord.com> 773 961-1620

[ See http://www.gotw.ca/resources/clcm.htm for info about ]

jlin...@hotmail.com

unread,
Jun 24, 2007, 12:42:54 AM6/24/07
to
> > That's not part of the serialization framework! It is part of the
> > application code. As it stands, it is a DOS vulnerability. That
> > vulnerability can be eliminated without touching even a *single* line
> > of the serialization framework, simply by limiting the value of
> > inbound_data_size .
>
> What value should be chosen as a limit on inbound_data_size?
>

Don't ask me. Ask the person who will be deploying the application.
They will make an arbitrary decision based on such criteria as what
kind of a network the application is running on, how many clients are
anticipated, what amounts of data a typical client needs to transfer,
etc.

That person will not be the least bit interested in what kind of C++
serialization framework your application happens to use.

Jarl.

Le Chaud Lapin

unread,
Jun 24, 2007, 3:30:51 PM6/24/07
to
On Jun 23, 11:42 pm, jlind...@hotmail.com wrote:
> > > That's not part of the serialization framework! It is part of the
> > > application code. As it stands, it is a DOS vulnerability. That
> > > vulnerability can be eliminated without touching even a *single* line
> > > of the serialization framework, simply by limiting the value of
> > > inbound_data_size .
>
> > What value should be chosen as a limit on inbound_data_size?
>
> Don't ask me. Ask the person who will be deploying the application.
> They will make an arbitrary decision based on such criteria as what
> kind of a network the application is running on, how many clients are
> anticipated, what amounts of data a typical client needs to transfer,
> etc.
>
> That person will not be the least bit interested in what kind of C++
> serialization framework your application happens to use.

As I said before, whatever value "you" pick is probably the wrong
one. In any case, the 1MB value that was chosen before is probably
inappropriate.

It does not matter. After a bit of thinking over the past few days, I
have found a solution that works well enough. It does not require the
allocation of (large, arbitrary) buffers that you propose, which would
still result in DoS on some machines, like the PDA's we plan to use.

-Le Chaud Lapin-

jlin...@hotmail.com

unread,
Jun 25, 2007, 4:58:57 AM6/25/07
to
>
> > Don't ask me. Ask the person who will be deploying the application.
> > They will make an arbitrary decision based on such criteria as what
> > kind of a network the application is running on, how many clients are
> > anticipated, what amounts of data a typical client needs to transfer,
> > etc.
>
> > That person will not be the least bit interested in what kind of C++
> > serialization framework your application happens to use.
>
> As I said before, whatever value "you" pick is probably the wrong
> one. In any case, the 1MB value that was chosen before is probably
> inappropriate.
>

Why do you think that everyone must use a 1Mb buffer?

Why would "I" pick a maximum buffer size for you?

Do you understand that the maximum buffer size can be varied, even at
runtime?

Do you understand that this has nothing to do with C++ serialization
frameworks?

> It does not matter. After a bit of thinking over the past few days, I
> have found a solution that works well enough. It does not require the
> allocation of (large, arbitrary) buffers that you propose, which would
> still result in DoS on some machines, like the PDA's we plan to use.
>

Perhaps you could enlighten the rest of us as to your solution, since
it "works well enough" .

Jarl.

I V

unread,
Jun 25, 2007, 4:54:42 AM6/25/07
to
On Sun, 24 Jun 2007 13:30:51 -0600, Le Chaud Lapin wrote:
> On Jun 23, 11:42 pm, jlind...@hotmail.com wrote:
>> That person will not be the least bit interested in what kind of C++
>> serialization framework your application happens to use.
>
> As I said before, whatever value "you" pick is probably the wrong
> one. In any case, the 1MB value that was chosen before is probably
> inappropriate.

It depends who "you" are. The author of the serialization framework will
probably pick the wrong value; that's why jlindrud has been explaining a
method where the decision is made by the application author, who is the
most likely person to be able to pick the right value.

Diego Martins

unread,
Jun 25, 2007, 3:23:53 PM6/25/07
to
On Jun 25, 5:58 am, jlind...@hotmail.com wrote:

> > LCL wrote:
> > It does not matter. After a bit of thinking over the past few days, I
> > have found a solution that works well enough. It does not require the
> > allocation of (large, arbitrary) buffers that you propose, which would
> > still result in DoS on some machines, like the PDA's we plan to use.
>
> Perhaps you could enlighten the rest of us as to your solution, since
> it "works well enough" .
>
> Jarl.

I agree with Jarl.
Don't be selfish. This thread has gone on long enough without a
conclusion. Share your solution with us, please.

Diego

Le Chaud Lapin

unread,
Jun 25, 2007, 3:24:11 PM6/25/07
to
On Jun 25, 3:58 am, jlind...@hotmail.com wrote:

> Why do you think that everyone must use a 1Mb buffer?

You are the one who picked 1MB.

> Why would "I" pick a maximum buffer size for you?

In your buffer scheme, a size has to be picked.

> Do you understand that the maximum buffer size can be varied, even at
> runtime?

Yes, and as I and others have stated, once you start "picking the
maximum buffer sizes", you still have an issue. If the sender
specifies the buffer size, it could force the receiver to allocate a
large buffer.

> Do you understand that this has nothing to do with C++ serialization
> frameworks?

I disagree with this. As I mentioned in one of the posts, the problem
where the sender sends a large unsigned int to trick the receiver to
invoke operator new() on that large unsigned int is the most obvious
of several problems.

// At sender
Socket s;
s << SET_BUFFER_SIZE;
s << ~0UL;

// At receiver
Socket s;
unsigned long int buffer_size;
s >> buffer_size;
char *p = new char[buffer_size];

Your scheme called for allocation of a buffer against which
deserialization would then be performed. You said that the buffer size
should be limited. I asked what the limit was, and you said that the
limit should be specified by the "application programmer". The problem
with your scheme, no matter who chooses this limit, is that it will
have to be "big enough". You picked 1MB as an example. That is too
much for the PDA's on which our code will run. It should be evident
that any value chosen will be inappropriate...unless...

the values chosen are interwoven with the serialization code itself.

> Perhaps you could enlighten the rest of us as to your solution, since
> it "works well enough" .

My solution involves augmenting the "Archives" of the serialization
framework with a stack<unsigned long int>:

struct Archive
{
    stack<unsigned long int> limits;
    void push_limit (unsigned long int limit);
    unsigned long int pop_limit ();
};


The premise of my solution is that, in many cases, the serialized
object itself knows best how big it would be under "reasonable"
circumstances. For example, I have an object that consists of four
smaller objects: 2 64-bit structs, and 2 list<string>:

struct Foo
{
    DA rda; // 64-bits;
    DA cda; // 64-bits;
    TA rta; // list<string>;
    TA cta; // list<string>;
};

A reasonable size of a Foo is 8 bytes, plus 8 bytes, plus...whatever,
so long as it is not huge. The 16 bytes are easily calculated. The
"whatever, so long as it is not huge" is the critical part. I know
the nature of a Foo, and I know that, normally, list<string> should be
allowed to be "as big as it needs to be."

However, in this particular context, unlimited is not appropriate. I
know that each of these lists should not consume much more than 2KB
each. So 2KB * 2 = 4KB, plus the 16 bytes, ...but since we are
estimating anyway, 4KB should be sufficient for the entire Foo
structure.

In that case, just before a Foo is about to serialize itself from a
Socket, it declares its own limit:

Socket & operator >> (Socket &s, Foo &foo)
{
    s.push_limit (4*1024);
    s >> foo.rda;
    s >> foo.cda;
    s >> foo.rta;
    s >> foo.cta;
    s.pop_limit();

    return s;
}

The Foo structure will build itself by serializing from the socket,
drawing down the limit that it specified in push_limit() piece by
piece. If the limit is completely depleted before Foo is fully
constructed, an exception is thrown. If no exception is thrown, the
Foo was successfully read from the socket. At that point, the entry
beneath the top of the stack is reduced by the amount that was
consumed from the current top of stack, and the current top is
popped. If the stack ever becomes empty, we are back in the original
situation: no limit is in effect, which is a legitimate case in some
circumstances.
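
For the curious, the bookkeeping behind push_limit()/pop_limit()
amounts to something like this (a sketch of the idea, not my actual
code; the stack entries are widened to pairs so that pop_limit() knows
how much was consumed, and Limit_Exceeded is just a name):

#include <stack>
#include <utility>

struct Limit_Exceeded {};

struct Archive
{
    // Each entry is (remaining, original).
    std::stack<std::pair<unsigned long, unsigned long> > limits;

    void push_limit (unsigned long n)
    {
        limits.push(std::make_pair(n, n));
    }

    // Every primitive read calls this with the byte count it consumed.
    void charge (unsigned long n)
    {
        if (limits.empty())
            return;                      // no limit in effect
        if (n > limits.top().first)
            throw Limit_Exceeded();      // budget depleted mid-object
        limits.top().first -= n;
    }

    void pop_limit ()
    {
        unsigned long consumed = limits.top().second - limits.top().first;
        limits.pop();
        if (!limits.empty())
            charge(consumed);            // the enclosing object pays for it
    }
};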

There are some points to note about this scheme:

1. The serialization framework itself determines limits because only
it knows what the limits should be.
2. The "application programmer" is most relieved of the burden/tedium
of choosing "maximum buffer sizes"
3. There are no "maximum buffers". If anything, there is only the,
say, 1500-byte Ethernet payload.

#3 is important, especially on a PDA with only 64MB of RAM.

However, there are some flaws with this scheme that might be apparent
to anyone who has ever developed a large-scale serialization
framework. Naturally, it is optimal in many cases that an object be
serialized from an archive by construction only, not by assign-after-
construct. Some objects have heavy-weight default-construction, and
if one uses this scheme to deserialize, say, a 1-million-element
list<Heavyweight_Class_With_Massive_Constructor>, the performance
penalty will be interesting indeed.

There are other problems, which I do not care to mention, but the
solution works "well enough".

-Le Chaud Lapin-

Sebastian Redl

unread,
Jun 26, 2007, 3:34:46 AM6/26/07
to
On Fri, 22 Jun 2007 jlin...@hotmail.com wrote:

> Definitely. That's why deserialization code should never allocate
> memory based on a count field that it has read in. It should only
> allocate memory for the particular object it is deserializing, and
> rely on reallocation instead, if it e.g. is filling a std::vector<> .

A rule which B.Ser doesn't follow, unfortunately, and doesn't mention in
its docs. The deserialization code for std::vector calls reserve() with
the length field that is read in.

Sebastian Redl

jlin...@hotmail.com

unread,
Jun 26, 2007, 11:13:31 AM6/26/07
to

>
> > Why do you think that everyone must use a 1Mb buffer?
>
> You are the one who picked 1MB.
>
> > Why would "I" pick a maximum buffer size for you?
>
> In your buffer scheme, a size has to be picked.
>

Either you are being facetious, or you really haven't understood a
single thing of what I've been telling you.

> > Do you understand that the maximum buffer size can be varied, even at
> > runtime?
>
> Yes, and as I and others have stated, once you start "picking the
> maximum buffer sizes", you still have an issue. If the sender
> specifies the buffer size, it could force the receiver to allocate a
> large buffer.

Who are the "others"? You keep on quoting them. Are they also
disciples of your couple-serialization-with-transmission approach?

> appropriate. I
> know that each of these lists should not consume much more than 2KB
> each. So 2KB * 2 = 4KB, plus the 16 bytes, ...but since we are
> estimating anyway, 4KB should be sufficient for the entire Foo
> structure.

LOL. You criticize me for requiring a single estimate to be made, yet
here you are, happily estimating away, not just a single value, but a
value for every single type of object you ever intend to serialize...

You haven't written a serialization framework. You've written a
serialize-send-receive-deserialize chunk of code, whose source code
must be modified to fit in with any particular application.
Congratulations.

The rest of us will stick to general purpose serialization frameworks.

>
> 1. The serialization framework itself determines limits because only
> it knows what the limits should be.
> 2. The "application programmer" is most relieved of the burden/tedium
> of choosing "maximum buffer sizes"
> 3. There are no "maximum buffers". If anything, there is only the,
> say, 1500-byte Ethernet payload.
>

These "points" are so far off base that I don't even know where to
begin. You are theorizing on subjects you obviously have no practical
experience of.

Jarl.

jlin...@hotmail.com

unread,
Jun 26, 2007, 11:19:13 AM6/26/07
to

>
> A rule which B.Ser doesn't follow, unfortunately, and doesn't mention in
> its docs. The deserialization code for std::vector calls reserve() with
> the length field that is read in.
>

Yes, and that's a bug, if you ask me.

Jarl.

Jeff Koftinoff

unread,
Jun 26, 2007, 2:51:25 PM6/26/07
to
On Jun 26, 8:19 am, jlind...@hotmail.com wrote:
> > A rule which B.Ser doesn't follow, unfortunately, and doesn't mention in
> > its docs. The deserialization code for std::vector calls reserve() with
> > the length field that is read in.
>
> Yes, and that's a bug, if you ask me.
>
> Jarl.

Yes, it is a bug.

So how would you fix the Boost serialization library?

Obviously, the maximum size cannot be hardcoded into the deserialize
function for std::vector...

I don't think it can be 'fixed' - not without changing it
significantly so that the application code which tries to call the
deserialize function for a std::vector can communicate what the
maximum size for that specific vector can be.

Mr. Lapin's solution (modified to be exception-safe) is definitely one
way to achieve this so that your high level application code can make
the decisions and the low level library deserialization code can
enforce them at the points where necessary.

There is a reason why the old "C" functions 'gets()' and 'sprintf()'
are deprecated.

--jeffk++
www.jdkoftinoff.com

Sebastian Redl

unread,
Jun 26, 2007, 6:27:52 PM6/26/07
to

On Tue, 26 Jun 2007, Jeff Koftinoff wrote:

> On Jun 26, 8:19 am, jlind...@hotmail.com wrote:
> > > A rule which B.Ser doesn't follow, unfortunately, and doesn't mention
in
> > > its docs. The deserialization code for std::vector calls reserve()
with
> > > the length field that is read in.
> >
> > Yes, and that's a bug, if you ask me.
> >
> > Jarl.
>
> Yes, it is a bug.
>
> So how would you fix the Boost serialization library?

Easy.

template <typename T, typename Archive>
void deserialize_vector(std::vector<T> &target, Archive &ar)
{
    typedef std::vector<T> vector_t;
    typename vector_t::size_type size;
    ar >> size;

    target.clear();
    while(size--) {
        T t;
        ar >> t;
        target.push_back(t);
    }
}


Yes, it may have to reallocate log2(size) times, but I think that's an
acceptable price to pay for security.

> There is a reason why the old "C" functions 'gets()' and 'sprintf()'
> are deprecated.

Yes. It's because you cannot stop them from overflowing the buffer you
allocated. Which has nothing to do with the problem at hand.

Sebastian Redl

co...@mailvault.com

unread,
Jun 27, 2007, 1:58:11 AM6/27/07
to
On Jun 26, 9:19 am, jlind...@hotmail.com wrote:
> > A rule which B.Ser doesn't follow, unfortunately, and doesn't mention in
> > its docs. The deserialization code for std::vector calls reserve() with
> > the length field that is read in.
>
> Yes, and that's a bug, if you ask me.
>

I don't think it's a bug if everything occurs behind a
firewall. The simpler approach makes sense in some
cases.

The following is in B.Ser terms but you can
answer from your approach. If you have

short a = 1;
short b = 2;
short c = 3;

oa << a << b << c;

Will that lead to 3 packets -- each with a header and
a body being sent? If so it is a lot of overhead.

I'm of the opinion it should be more like
msgs.Send(a, b, c); // and one packet needed.

That is written in terms of what I usually advocate,
but we don't currently support what you've been describing
either. We're thinking about it and leaning towards it
with an acknowledgement of you. We could do it without
waiting for compilers to support variadic templates.
My guess is it will be at least 3 or 4 years before a
handful of compilers officially do.

Brian Wood
Ebenezer Enterprises
www.webebenezer.net

jlin...@hotmail.com

unread,
Jun 27, 2007, 1:52:19 AM6/27/07
to

> > Yes, it is a bug.
>
> > So how would you fix the Boost serialization library?
>
> Easy.
>
> template <typename T, typename Archive>
> void deserialize_vector(std::vector<T> &target, Archive &ar)
> {
> typedef std::vector<T> vector_t;
> typename vector_t::size_type size;
> ar >> size;
>
> target.clear();
> while(size--) {
> T t;
> ar >> t;
> target.push_back(t);
> }
>
> }
>
> Yes, it may have to reallocate log2(size) times, but I think that's an
> acceptable price to pay for security.
>

Definitely.

You can even avoid reallocating, if the serialization code can query
the archive for the number of remaining bytes. If the serialization
code for std::vector<T> knows the minimum serialized length of a T,
then it can determine the maximum number of T's that could be
remaining in the archive, and it can safely reserve() space for them.
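
I.e. something like this (ar.remaining() and min_serialized_size<T>()
are assumed to be provided by the framework):

#include <cstddef>
#include <vector>

// Hypothetical trait: the smallest number of bytes a serialized T
// can occupy in the archive.
template <typename T> std::size_t min_serialized_size();

template <typename T, typename Archive>
void deserialize_vector(std::vector<T> &target, Archive &ar)
{
    typename std::vector<T>::size_type size;
    ar >> size;

    // The archive cannot possibly contain more T's than this:
    std::size_t max_possible = ar.remaining() / min_serialized_size<T>();

    target.clear();
    target.reserve(size < max_possible ? size : max_possible);
    while (size--) {
        T t;
        ar >> t;
        target.push_back(t);
    }
}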

Jarl.

Jeff Koftinoff

unread,
Jun 27, 2007, 1:51:49 AM6/27/07
to
On Jun 26, 3:27 pm, Sebastian Redl <e0226...@stud3.tuwien.ac.at>
wrote:

> On Tue, 26 Jun 2007, Jeff Koftinoff wrote:
>
> > So how would you fix the Boost serialization library?
>
> Easy.
>
> template <typename T, typename Archive>
> void deserialize_vector(std::vector<T> &target, Archive &ar)
> {
> typedef std::vector<T> vector_t;
> typename vector_t::size_type size;
> ar >> size;
>
> target.clear();
> while(size--) {
> T t;
> ar >> t;
> target.push_back(t);
> }
>
> }
>
> Yes, it may have to reallocate log2(size) times, but I think that's an
> acceptable price to pay for security.

The problem is that this does not solve the DoS. It is a good example
of an unvalidated security fix!

Try it out on your own computer, which has virtual memory, and flood
it, either over a network or via a file, with a value claiming 2
billion items.

It is still a DoS when virtual memory gets exhausted on most operating
systems.

See my article http://opensource.jdkoftinoff.com/jdks/trac/wiki/alloctests

>
> > There is a reason why the old "C" functions 'gets()' and 'sprintf()'
> > are deprecated.
>
> Yes. It's because you cannot stop them from overflowing the buffer you
> allocated. Which has nothing to do with the problem at hand.
>

My point is that gets() and sprintf() are deprecated because they are
dangerous. So is the default design of boost/serialization/vector.hpp
and boost/serialization/string.cpp.

Why have library functions where the default design is dangerous?

Both the default boost vector deserialize code and your example code
are missing something like:

if( size > max_size )
    throw std::range_error( "Serialization count out of range" );

or something similar.

The problem is, where does max_size come from?

The nicest place max_size can come from is the application code.
Ideally this would come through a function in Archive. Or perhaps a
traits class that the application specifies?
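
For instance (serialization_limit<> is just an invented name for such
a traits class, and stock is the type from the asio example):

#include <cstddef>
#include <stdexcept>
#include <vector>

class stock;    // the type from the asio example

// Primary template: by default, no cap on element counts.
template <typename T>
struct serialization_limit
{
    static const std::size_t max_count = static_cast<std::size_t>(-1);
};

// The application caps a specific instantiation that it knows about:
template <>
struct serialization_limit<std::vector<stock> >
{
    static const std::size_t max_count = 10000;
};

// The library consults the trait before accepting the count:
template <typename T, typename Archive>
void deserialize_vector(std::vector<T> &target, Archive &ar)
{
    typename std::vector<T>::size_type size;
    ar >> size;
    if (size > serialization_limit<std::vector<T> >::max_count)
        throw std::range_error("Serialization count out of range");
    // ... then proceed element by element, as above ...
}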

--jeffk++
www.jdkoftinoff.com

co...@mailvault.com

unread,
Jun 27, 2007, 9:19:49 AM6/27/07
to
On Jun 26, 4:27 pm, Sebastian Redl <e0226...@stud3.tuwien.ac.at>
wrote:

> On Tue, 26 Jun 2007, Jeff Koftinoff wrote:
> > So how would you fix the Boost serialization library?
>
> Easy.
>
> template <typename T, typename Archive>
> void deserialize_vector(std::vector<T> &target, Archive &ar)
> {
> typedef std::vector<T> vector_t;
> typename vector_t::size_type size;
> ar >> size;
>
> target.clear();
> while(size--) {
> T t;
> ar >> t;
> target.push_back(t);
> }
>
> }
>
> Yes, it may have to reallocate log2(size) times, but I think that's an
> acceptable price to pay for security.
>

I think that could be improved a little bit. Just because
you have a maximum message size doesn't mean that someone
can't tamper/increase the value that will be stored in the
size variable. The approach Jarl uses might run out of
data in this situation, but not before placing too much
data in the container. It might be better to keep track
of how much data has been used (say it's a struct and the
first member was a string and the second a vector) and
subtract that from the total size of the message. That
value could be used to figure out an upper bound on the
size of the container. If the value in size is greater
than the upper bound, you have a problem. It would still
be possible to be fooled at this point, but I like it
as you have a chance at finding out sooner when something
is out of whack.

The deserialization function wouldn't have a hard-coded
value for the max size, but it could take an argument that specifies
that value. The approach could still be fooled,
but it would take more thought/info to do so.

Brian

jlin...@hotmail.com

unread,
Jun 27, 2007, 11:18:19 AM6/27/07
to

> > Easy.
>
> > template <typename T, typename Archive>
> > void deserialize_vector(std::vector<T> &target, Archive &ar)
> > {
> > typedef std::vector<T> vector_t;
> > typename vector_t::size_type size;
> > ar >> size;
>
> > target.clear();
> > while(size--) {
> > T t;
> > ar >> t;
> > target.push_back(t);
> > }
>
> > }
>
> > Yes, it may have to reallocate log2(size) times, but I think that's an
> > acceptable price to pay for security.
>
> The problem is that this does not solve the DoS. It is a good example
> of an unvalidated security fix!

What it does is move the responsibility for avoiding DOS up to the
application layer, which is precisely where it belongs. If you
are deserializing from unbounded amounts of data, then you are abusing
the serialization framework, and there's nothing the serialization
framework can, or should, do about that.

>
> Try it out on your own computer, which has virtual memory, and flood
> it, either over a network or via a file, with a value claiming 2
> billion items.
>
> It is still a DoS when virtual memory gets exhausted on most operating
> systems.

And that DOS is solely the responsibility of the application. It has
nothing to do with the serialization framework.

>
> See my article http://opensource.jdkoftinoff.com/jdks/trac/wiki/alloctests


>
>
>
> > > There is a reason why the old "C" functions 'gets()' and 'sprintf()'
> > > are deprecated.
>
> > Yes. It's because you cannot stop them from overflowing the buffer you
> > allocated. Which has nothing to do with the problem at hand.
>
> My point is that gets() and sprintf() are deprecated because they are
> dangerous. So is the default design of boost/serialization/vector.hpp
> and boost/serialization/string.cpp.

The only thing that is dangerous here is the naive idea of
deserializing data straight out of a socket. That is nothing less than
abuse of the serialization framework, and the consequences need to be
borne by the abuser, and not shunted onto the serialization framework.

Jarl.

jlin...@hotmail.com

unread,
Jun 27, 2007, 11:17:55 AM6/27/07
to

> > template <typename T, typename Archive>
> > void deserialize_vector(std::vector<T> &target, Archive &ar)
> > {
> > typedef std::vector<T> vector_t;
> > typename vector_t::size_type size;
> > ar >> size;
>
> > target.clear();
> > while(size--) {
> > T t;
> > ar >> t;
> > target.push_back(t);
> > }
>
> > }
>
> > Yes, it may have to reallocate log2(size) times, but I think that's an
> > acceptable price to pay for security.
>
> The problem is that this does not solve the DoS. It is a good example
> of an unvalidated security fix!
>

You're missing the point. What Sebastian's code does is move the
responsibility for avoiding DOS up to the application layer, which is
precisely where it belongs.


>


> My point is that gets() and sprintf() are deprecated because they are
> dangerous. So is the default design of boost/serialization/vector.hpp
> and boost/serialization/string.cpp.
>

The only thing that is dangerous here is the naive notion of
deserializing directly from a socket, or any other indefinite-EOF data
stream, for that matter.

That's quite simply abuse of the serialization framework, and the
consequences need to be blamed on the abuser, not the serialization
framework.

Jarl.

Jeff Koftinoff

unread,
Jun 27, 2007, 2:38:16 PM6/27/07
to
On Jun 27, 8:17 am, jlind...@hotmail.com wrote:
> > > template <typename T, typename Archive>
> > > void deserialize_vector(std::vector<T> &target, Archive &ar)
> > > {
> > > typedef std::vector<T> vector_t;
> > > typename vector_t::size_type size;
> > > ar >> size;
>
> > > target.clear();
> > > while(size--) {
> > > T t;
> > > ar >> t;
> > > target.push_back(t);
> > > }
>
> > > }
>
> > > Yes, it may have to reallocate log2(size) times, but I think that's an
> > > acceptable price to pay for security.
>
> > The problem is that this does not solve the DoS. It is a good example
> > of an unvalidated security fix!
>
> You're missing the point. What Sebastian's code does is move the
> responsibility for avoiding DoS up to the application layer, which
> is precisely where it belongs.
>

How exactly does the application limit the number of push_back's that
deserialize_vector performs? It can't! This code IS the DoS.

>
>
> > My point is that gets() and sprintf() are deprecated because they are
> > dangerous. So is the default design of boost/serialization/vector.hpp
> and boost/serialization/string.hpp.
>
> The only thing that is dangerous here is the naive notion of
> deserializing directly from a socket, or any other indefinite-EOF data
> stream, for that matter.
>

So you are saying that the boost::asio designers are naive???

> That's quite simply abuse of the serialization framework, and the
> consequences need to be blamed on the abuser, not the serialization
> framework.
>

Everything you just said can be said for users of 'gets()'.

* Don't use gets() on untrusted data, or any other indefinite-EOF data
stream

See the parallels? gets() is broken by design and should not be used
by anyone. So is the above deserialize_vector() function. Relying on
bad_alloc being thrown is not good practice. Requiring serialization
to be used only with 'trusted connections' or 'trusted files' is
"troublematic".

--jeffk++
www.jdkoftinoff.com

Sebastian Redl

unread,
Jun 27, 2007, 2:41:03 PM6/27/07
to

On Tue, 26 Jun 2007 co...@mailvault.com wrote:

> I don't think its a bug if everything occurs behind a
> firewall. The simpler approach makes sense in some
> cases.

What do you mean by firewall? A firewall that simply locks out untrusted
data? Or a firewall that actually inspects the sent data, knows that it is
serialized data and checks it for sanity?

> short a = 1;
> short b = 2;
> short c = 3;
>
> oa << a << b << c;
>
> Will that lead to 3 packets -- each with a header and
> a body being sent? If so it is a lot of overhead.

No. Why should it? It's sent as one packet announcing that it is 6 bytes
large.
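
(A sketch of what that might look like on the wire, assuming a simple
length-prefixed format. send_all() is an assumed helper that loops on
send(); neither name is taken from any particular library, and byte
order is ignored for brevity.)

#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>

void send_all(int fd, const void *buf, std::size_t n); // assumed helper around send()

void send_three_shorts(int fd, short a, short b, short c)
{
    // Serialize into a memory buffer first; the socket sees one
    // header and one body, not three separate packets.
    std::ostringstream buffer;
    buffer.write(reinterpret_cast<const char *>(&a), sizeof a); // 2 bytes
    buffer.write(reinterpret_cast<const char *>(&b), sizeof b); // 2 bytes
    buffer.write(reinterpret_cast<const char *>(&c), sizeof c); // 2 bytes

    std::string body = buffer.str();  // 6 bytes total
    std::uint32_t length = static_cast<std::uint32_t>(body.size());
    send_all(fd, &length, sizeof length); // single header for the packet
    send_all(fd, body.data(), body.size());
}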

Sebastian Redl

Sebastian Redl

unread,
Jun 27, 2007, 3:12:08 PM6/27/07
to

On Wed, 27 Jun 2007 co...@mailvault.com wrote:

> I think that could be improved a little bit.

Probably. It was just a demonstration that the issue is trivial to fix.

> Just because
> you have a maximum message size doesn't mean that someone
> can't tamper/increase the value that will be stored in the
> size variable. The approach Jarl uses might run out of
> data in this situation, but not before placing too much
> data in the container.

It can never place more data than twice the allowed maximum (twice
because of vector's allocation strategy), and if size was tampered
with, it will indeed run out of data. Then it will throw and the
memory will be deallocated again. The connection dies, but no DoS is
possible.

> It might be better to keep track
> of how much data has been used (say its a struct and the
> first member was a string and the second a vector) and
> subtract that from the total size of the message. That
> value could be used to figure out an upper bound on the
> size of the container. If the value in size is greater
> than the upper bound, you have a problem. It would still
> be possible to be fooled at this point, but I like it
> as you have a chance at finding out sooner when something
> is out of whack.

Yes, this is certainly an improvement, but it requires a change to the
interface of the serialization library.
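
(A sketch of that bound, assuming the archive were extended to report
how many bytes of the message remain. bytes_remaining() is the
hypothetical interface change under discussion, not an existing
Boost.Serialization call.)

#include <cstddef>
#include <stdexcept>

// Reads the declared element count and rejects it if the remaining
// payload could not possibly hold that many elements.
template <typename T, typename Archive>
std::size_t read_checked_size(Archive &ar)
{
    std::size_t size = 0;
    ar >> size;
    // Each element occupies at least sizeof(T) bytes on the wire,
    // so the remaining bytes cap any honest element count.
    if (size > ar.bytes_remaining() / sizeof(T))
        throw std::runtime_error("declared size exceeds remaining data");
    return size;
}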

Sebastian Redl

jlin...@hotmail.com

unread,
Jun 27, 2007, 9:53:01 PM6/27/07
to
>
> How exactly does the application limit the number of push_back's that
> deserialize_vector performs? It can't! this code IS the DoS.
>
>

Sigh. Here we go again.

By deserializing from a buffer of data of fixed and known length, it
is *impossible* to wind up in an indefinite push_back() loop. The
maximum number of push_back()'s you could end up doing is directly
proportional to the number of bytes in the buffer you are
deserializing from.

And who controls the number of bytes in the buffer you deserialize
from? Surprise, surprise, it's the *application* .


> > The only thing that is dangerous here is the naive notion of
> > deserializing directly from a socket, or any other indefinite-EOF data
> > stream, for that matter.
>
> So you are saying that the boost::asio designers are naive???
>

You really need to reread my posts. You, and the Hot Rabbit, still
seem to think that it is acceptable to serialize your C++ objects
straight into a socket with no enveloping message structure. The rest
of us, including the Boost.Asio author, are under no such illusions.


>
> See the parallels? gets() is broken by design. It should not be used
> by anyone. So is the above deserialize_vector() function. Relying on
> bad_alloc to be thrown is not good practice. Requiring serialization
> to only be used by 'trusted connections' or 'trusted files' is
> "troublematic".

Who is relying on bad_alloc? Have you read my posts at all? The point
is that some kind of exception will be thrown when the deserialization
code has consumed all the data in the buffer. Hence you are safe and
sound as long as you only pass in bounded, definite-EOF streams to the
deserialization code.
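
(A minimal sketch of such a bounded, definite-EOF source. This is an
illustrative class, not any particular library's archive: every read is
checked against the end of the buffer, so a tampered count simply
exhausts the buffer and throws.)

#include <cstddef>
#include <cstring>
#include <stdexcept>

class bounded_reader {
    const char *pos_;
    const char *end_;
public:
    bounded_reader(const char *data, std::size_t n)
        : pos_(data), end_(data + n) {}

    void read(void *out, std::size_t n)
    {
        if (static_cast<std::size_t>(end_ - pos_) < n)
            throw std::runtime_error("archive exhausted"); // definite EOF
        std::memcpy(out, pos_, n);
        pos_ += n;
    }
};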

Why is this so incredibly difficult to understand? How many times does
it have to be repeated?

Jarl.

co...@mailvault.com

unread,
Jun 28, 2007, 8:27:45 AM6/28/07
to
On Jun 25, 1:24 pm, Le Chaud Lapin <jaibudu...@gmail.com> wrote:
>
> Naturally, it is optimal in many cases that an object be
> serialized from an archive by construction only, not by assign-after-
> construct. Some objects have heavy-weight default-construction, and
> if one uses this scheme to deserialize say, a 1-million-element
> list<Heavyweight_Class_With_Massive_Constructor>, the performance
> penalty will be interesting indeed.
>

I've thought about this some as well, and I like the term
"stream constructor" here. Recently I've been thinking that
if a derived object is being received and an error
occurs late in the process, it makes sense to attempt
to salvage what you can.

class B {...};

class I : public B {...};

class D : public I {...};

B* b = new D(stream_identifier_here);

If D's constructor throws an exception, the
standard says the sub-objects are destructed.
Since that is how things have been set up over the
years, it can't easily be changed, but it might be
helpful if there were a way to indicate to the
compiler that a constructor is a stream constructor
so that instead of giving up, it could return an I.
The main reason I think this way is that the sender,
network, and receiver have already put in a lot of
work by the time it fails.
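
(The language as it stands will destroy the sub-objects, so the compiler
cannot do this, but a rough approximation of the salvage idea is possible
by hand. This sketch assumes the B/I/D hierarchy above, with I and D each
taking a stream in their constructors, and a stream that supports
seeking; everything here is illustrative.)

#include <exception>
#include <istream>

B *construct_from_stream(std::istream &s)
{
    std::istream::pos_type mark = s.tellg();
    try {
        return new D(s);   // attempt the most derived type
    } catch (const std::exception &) {
        s.clear();         // clear any failure state so we can seek
        s.seekg(mark);
        return new I(s);   // fall back: salvage the intermediate type
    }
}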

Brian
Ebenezer Enterprises

Jeff Koftinoff

unread,
Jun 28, 2007, 8:27:40 AM6/28/07
to
On Jun 27, 6:53 pm, jlind...@hotmail.com wrote:
>
> By deserializing from a buffer of data of fixed and known length, it
> is *impossible* to wind up in an indefinite push_back() loop. The
> maximum number of push_back()'s you could end up doing is directly
> proportional to the number of bytes in the buffer you are
> deserializing from.
>
> And who controls the number of bytes in the buffer you deserialize
> from? Surprise, surprise, it's the *application* .
>


> > > The only thing that is dangerous here is the naive notion of
> > > deserializing directly from a socket, or any other indefinite-EOF data
> > > stream, for that matter.
>
> > So you are saying that the boost::asio designers are naive???
>
> You really need to reread my posts. You, and the Hot Rabbit, still
> seem to think that it is acceptable to serialize your C++ objects
> straight into a socket with no enveloping message structure. The rest
> of us, including the Boost.Asio author, are under no such illusions.
>

You haven't read the boost asio examples then. The serialization
example clearly does exactly what you are saying it does not do.

http://asio.sourceforge.net/boost_asio_0_3_7/libs/asio/doc/examples/a00263.html

asio is designed to work with boost::serialization over sockets
without an enveloping message structure.

Do you see that it is perfectly reasonable to need to deserialize a
file?

Whether or not the file is received by a socket does not matter.

The problem is that the typical deserialization code has a DoS
embedded in it.

>
>
> > See the parallels? gets() is broken by design. It should not be used
> > by anyone. So is the above deserialize_vector() function. Relying on
> > bad_alloc to be thrown is not good practice. Requiring serialization
> > to only be used by 'trusted connections' or 'trusted files' is
> > "troublematic".
>
> Who is relying on bad_alloc? Have you read my posts at all? The point
> is that some kind of exception will be thrown when the deserialization
> code has consumed all the data in the buffer. Hence you are safe and
> sound as long as you only pass in bounded, definite-EOF streams to the
> deserialization code.

and gets() is safe as long as you pass bounded definite-EOF streams to
it. It is still a problem, and it is asking for trouble. Typically a
library function is deserializing an object that contains more than
just a vector - and different parts of the object may have different
limits needing to be enforced!

Buffering the whole message first is poor design, and it costs double
the memory and CPU usage necessary! A deserializer can do everything
required: be safe, parse on an as-needed basis without copying data,
and remain nicely controllable by the application code (see the sketch
below).
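
(A sketch of "different limits for different parts", reusing the
deserialize_vector_limited idea from earlier in the thread.
deserialize_string_limited and the specific caps are invented purely
for illustration.)

#include <string>
#include <vector>

struct Person {
    std::string name;        // field limit: 256 bytes
    std::vector<int> scores; // field limit: 4096 entries
};

template <typename Archive>
void deserialize(Person &p, Archive &ar)
{
    deserialize_string_limited(p.name, ar, 256);    // per-field caps,
    deserialize_vector_limited(p.scores, ar, 4096); // enforced up front
}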

>
> Why is this so incredibly difficult to understand? How many times does
> it have to be repeated?
>

Yes! I will read your posts and you read mine.

--jeffk++
www.jdkoftinoff.com
