A protocol message was rejected because it was too big ???


ksamdev

Mar 6, 2011, 12:19:23 PM
to prot...@googlegroups.com
Hi,

I generate a huge number of messages of the same type and save them to a file one by one. Each message is generated and then written on the fly, so I keep only one message in memory at a time, never a large array of messages. Everything works fine. The largest message written is about 2K (serialized string size).

Then I read these messages back from the file one by one and use them. Again, I keep only one message in memory at a time. Everything works fine if I have, say, ~10e4 messages.

Once the number of saved messages is increased to something like 10e6, I get warnings from ProtoBuf like:

libprotobuf WARNING google/protobuf/io/coded_stream.cc:478] Reading dangerously large protocol message.  If the message turns out to be larger than 67108864 bytes, parsing will be halted for security reasons.  To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.

and then:

libprotobuf ERROR google/protobuf/io/coded_stream.cc:147] A protocol message was rejected because it was too big (more than 67108864 bytes).  To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.

What might be wrong?


Here is my code (it is very short and simple):


Write executable: http://goo.gl/SH41R
Writer (Output Wrapper): http://goo.gl/Fr0Rf

Read executable: http://goo.gl/UpC5i
Reader (Input Wrapper): http://goo.gl/zAeuU

The errors/warnings start if one changes 1e4 to 1e6 at: http://goo.gl/1IBZS

Thanks.

Evan Jones

Mar 6, 2011, 5:04:58 PM
to prot...@googlegroups.com
On Mar 6, 2011, at 12:19 , ksamdev wrote:
> libprotobuf ERROR google/protobuf/io/coded_stream.cc:147] A protocol
> message was rejected because it was too big (more than 67108864
> bytes). To increase the limit (or to disable these warnings), see
> CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/
> coded_stream.h.

Protocol buffers limit the parsed size to 64 MB by default. You have
generated a very large message. You either need to set the limit
larger, or split your message into multiple messages. See:

http://code.google.com/apis/protocolbuffers/docs/techniques.html#large-data

http://code.google.com/apis/protocolbuffers/docs/reference/cpp/google.protobuf.io.coded_stream.html#CodedInputStream.SetTotalBytesLimit.details

Hope this helps,

Evan

--
http://evanjones.ca/

ksamdev

Mar 6, 2011, 6:29:05 PM
to prot...@googlegroups.com
How come? I explicitly track the largest message written to the file with: http://goo.gl/SAKlU

Here is an example of output I get:

[1 ProtoBuf git.hist]$ ./bin/write data.pb && echo "---===---" && ./bin/read data.pb
Saved: 100040 events
Largest message size writte: 1815 bytes
---===---
File has: 100040 events
libprotobuf WARNING google/protobuf/io/coded_stream.cc:478] Reading dangerously large protocol message.  If the message turns out to be larger than 67108864 bytes, parsing will be halted for security reasons.  To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
libprotobuf ERROR google/protobuf/io/coded_stream.cc:147] A protocol message was rejected because it was too big (more than 67108864 bytes).  To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
libprotobuf ERROR google/protobuf/io/coded_stream.cc:147] A protocol message was rejected because it was too big (more than 67108864 bytes).  To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
Read: 86209 events
Largest message read: 1815 bytes
[1 ProtoBuf git.hist]$

As you can see, the largest message is only 1815 bytes (!). But because of the above error I cannot read the rest of the messages.

It does not make sense.

ksamdev

Mar 6, 2011, 6:45:48 PM
to prot...@googlegroups.com
I think I found the source of the problem: CodedInputStream keeps an internal counter of how many bytes have been read so far through the same object.

In my case, there are a lot of small messages saved in the same file. I do not read them all at once, so the large-message limit should not really apply to me. I am safe.

So, the problem can be easily solved by calling:

CodedInputStream input_stream(...);
input_stream.SetTotalBytesLimit(1e9, 9e8);

My use-case is really about storing an extremely large number (up to 1e9) of small messages, ~10K each.

Evan Jones

Mar 7, 2011, 11:28:01 AM
to prot...@googlegroups.com
On Mar 6, 2011, at 18:45 , ksamdev wrote:
> I think I found the source of the problem. The problem is that
> CodedInputStream has internal counter of how many bytes are read so
> far with the same object.

Ah, right. With the C++ API, the intention is that you will not reuse
the CodedInputStream, and instead it will be created and destroyed for
each message. It is very cheap to allocate / destroy if it is a local
variable.

In your case, you should do something like change your ::write method
to do:

CodedOutputStream out(_raw_out.get());
out.WriteVarint32(event.ByteSize());
event.SerializeWithCachedSizes(&out);


This will also save the extra copy that your code currently has. Hope this helps,

ksamdev

Mar 7, 2011, 1:03:13 PM
to prot...@googlegroups.com
Hmm, thanks for the advice. It may work fine. Nevertheless, in that case I would have to skip the previously read messages every time a new CodedInputStream is created.

In fact, I faced a different problem recently. It turns out I can write arbitrarily long files, even 7GB. No problems.

Unfortunately, reading stops working after 2^31 bytes have been read. Is there a way around this?

Evan Jones

Mar 7, 2011, 1:48:22 PM
to prot...@googlegroups.com
On Mar 7, 2011, at 13:03 , ksamdev wrote:
> Hmm, thanks for the advice. It may work fine. Nevertheless, I have
> to skip previously read messages in this case every time
> CodedInputStream is read.

Not true: Creating a CodedInputStream does not change the position in
the underlying stream. Your code can easily look like:


while (still more messages to read) {
  CodedInputStream in(&input_stream);
  in.Read*  // e.g. ReadVarint32 for the length prefix
  ...
  msg.ParseFromCodedStream(&in);
}


This creates and destroys the CodedInputStream for each message, which
is efficient.


> Unfortunately, reading does not work out after 2^31 bytes are read.
> Is there a way around?

You will need to destroy and re-create the CodedInputStream object. If
you don't want to do it for each message, you need to at least do it
occasionally.

Evan

--
http://evanjones.ca/
