ProtoBuf and Multi-Threads


ksamdev

Mar 14, 2011, 1:08:49 PM
to prot...@googlegroups.com
Hi,

I have a large set of files, each containing many messages of the same type. My code reads messages in sequence from these files, one after another. I've measured the running time (with the terminal "time" command) and got something like:

READING
=======

Read ProtoBuf
Processed events: 50000000

real 7m2.146s
user 5m25.545s
sys 0m31.959s

Then I adjusted the code to read the files in threads (8 threads on an 8-core machine). The reading procedure is independent and placed in a separate class, so each thread is really independent of the others. Nevertheless, the time measurement is:

READING (MULTITHREADS)
=====================

Thread read 6000000 events
Thread read 6000000 events
Thread read 6000000 events
Thread read 6000000 events
Thread read 6000000 events
Thread read 6000000 events
Thread read 7000000 events
Thread read 7000000 events

real 5m3.808s
user 5m42.301s
sys 0m35.221s

As you can see, the "user" time is essentially unchanged, and the "real" time improved far less than the 8x I'd expect.

So, it seems that some internal locking is happening somewhere. I only use locks between the threads and the class that creates and manages them, and those locks are taken only when a thread finishes reading its file(s).

Does ProtoBuf use some sort of generic static/singleton functions/objects to de-serialize messages, which would lock when accessed from different threads? If so, is there a way to suppress this and get truly independent message parsing?

thanks.

P.S. My code can be browsed on GitHub: http://goo.gl/DXCCF . The reading of messages is done by: http://goo.gl/OsHV9
The code uses the ROOT framework (root.cern.ch) if one wants to compile it.

Henner Zeller

Mar 14, 2011, 1:16:14 PM
to prot...@googlegroups.com, ksamdev

No, there is no locking between threads being done. And Kenton would
certainly be opposed to Singletons :)

But did you check whether you are running into file I/O limits? How big are your
files? 6 million protocol buffers in 300 seconds is 50 microseconds per message;
either that is very slow, or you have huge data. Are you doing other things
besides parsing in this benchmark?


ksamdev

Mar 14, 2011, 1:34:19 PM
to prot...@googlegroups.com, ksamdev
Thanks for a quick reply.

Honestly, I fill a set of histograms for each event. I've added this feature only recently and have a version of the code without histograms.

Here is the same performance measurement without histograms:

READING
=======

Read ProtoBuf
Processed events: 1000000

real 0m2.510s
user 0m2.105s
sys 0m0.298s
---===---

READING (MULTITHREADS)
=====================

process files
init threads
start threads
run threads
Thread read 1000000 events

real 0m2.358s
user 0m2.085s
sys 0m0.236s

Again, the same situation.

My file is 384MB. I've already tested the use case with files above 1GB. It turns out that ProtoBuf has an "int" limitation on file size. Anyway, I am way below that limit. The messages are pretty short (~400B).

ksamdev

Mar 14, 2011, 5:53:34 PM
to prot...@googlegroups.com, ksamdev
I've added a synchronized cout wrapper and fixed the use of the C "float" functions. Eventually the code started working as expected; for example, on an 8-core computer the performance measurements are:

Generate 20 files with 100000 events in each

WRITING
=======

Generate ProtoBuf

real 0m15.608s
user 0m6.582s
sys 0m1.383s

READING
=======

Read ProtoBuf
Processed events: 2000000

real 0m6.992s
user 0m6.393s
sys 0m0.534s

READING (MULTITHREADS)
=====================

8: data_1.pb
7: data_10.pb
6: data_11.pb
5: data_12.pb
4: data_13.pb
3: data_14.pb
2: data_15.pb
1: data_16.pb
7: data_17.pb
5: data_18.pb
4: data_19.pb
6: data_2.pb
8: data_20.pb
1: data_3.pb
3: data_4.pb
5: data_5.pb
7: data_6.pb
6: data_7.pb
8: data_8.pb
1: data_9.pb
Thread read 200000 events
Thread read 100000 events
Thread read 200000 events
Thread read 300000 events
Thread read 300000 events
Thread read 300000 events
Thread read 300000 events
Thread read 300000 events

real 0m1.527s
user 0m7.877s
sys 0m0.432s

So, reading is ~4.6 times faster in the multi-threaded case.

ksamdev

Mar 14, 2011, 6:27:22 PM
to prot...@googlegroups.com, ksamdev
btw, ProtoBuf is really fast and easy to use. I like it.

Henner Zeller

Mar 14, 2011, 7:01:33 PM
to prot...@googlegroups.com, ksamdev

Wait, where do you get the 1GB limit? There is no real limit on the
file size (in fact, the protocol buffer library does not do anything
with files directly). (And: don't try to load everything into memory at
once as one gigantic repeated message; that will give you a lot of
malloc overhead.)

Another thing that could be limiting the throughput in your library
is memory allocation: the system libraries are often not very good at
handling threads; I usually use tcmalloc (
http://code.google.com/p/google-perftools/ ), which handles that
case pretty well.

Also: if you read a lot of messages, it is a good idea to Clear() and
re-use a message instead of allocating a new one (protocol buffers
attempt to re-use memory internally).

-h

ksamdev

Mar 15, 2011, 9:05:22 AM
to prot...@googlegroups.com, ksamdev
I like the interest in the topic. 

I've put 1GB to emphasize that the use case is safe. In fact, I save messages in the file in the following way:

XYXYXYXYXY.....

where X is the size of the message and Y is the message itself. Each message is read in a loop into the same object, overwriting the previous one. Clearly, I do not read the whole file (N GB) into memory at once.

Now, with this technique, I can generate files larger than 2^31 bytes (~2 GB). The file is successfully written. Consider the case of a 5 GB file. Unfortunately, whenever I start reading this 5 GB file, ProtoBuf fails after 2^31 bytes are read. Of course, I have to push the limit of read bytes with:

CodedInputStream::SetTotalBytesLimit(int, int)

Pay attention to the argument type: int. I suppose ProtoBuf uses a bytes-read counter or some internal file read position pointer that is also an int and therefore fails whenever the reading procedure passes the 2^31 threshold.

Thanks for the link to perftools. As you mentioned, I reuse the message in my code, so there is no overhead there.

I guess the problem was in the way I measured execution time. My command looked like:

time executable args && echo "-----" && time executable args

So I cut it into 3 pieces, and the time shown on the screen started to make sense:

time executable args
echo ------
time executable args

Henner Zeller

Mar 15, 2011, 12:48:33 PM
to prot...@googlegroups.com, ksamdev
On Tue, Mar 15, 2011 at 06:05, ksamdev <samvel.k...@gmail.com> wrote:
> I like the interest in the topic.
> I've put 1GB to emphasize that the use case is safe. In fact, I save
> messages in file in next way:
> XYXYXYXYXY.....
> where X is the size of the message and Y is the message itself. Each message
> is read in the loop and overwritten. Clearly, I do not read the whole file
> (N GB's) into memory at once.
> Now, with this technique, I can generate files with size larger than 2^31 (~
> 2GB). The file is successfully written. Consider the case with 5 GB file.
> Unfortunately, whenever I start reading this 5 GB's file, ProtoBuf fails
> after 2^31 bytes are read. Of course, I have to push the limit of read bytes
> with:
>
> CodedInputStream::SetTotalBytesLimit(int, int)
> Pay attention at the arguments type: int . I suppose ProtoBuf uses bytes
> read counter or some internal file read position pointer that is also int
> and therefore fails whenever reading procedure passes the 2^31 threshold.

You should just create a new CodedInputStream on the stack for each
message, that way you don't run into this limit and can read files as
large as you want.
(CodedInputStream is cheap to create, so it shouldn't influence your
performance numbers).

Samvel Khalatyan

Mar 15, 2011, 1:09:50 PM
to Henner Zeller, prot...@googlegroups.com
Thanks for the clarification. Does it mean that my code should look like the example below?

fstream in("10_GB_file.pb", ios::in | ios::binary);

bool continue_reading = true;
while(continue_reading)
{
  ::google::protobuf::io::IstreamInputStream raw_in(&in);
  ::google::protobuf::io::CodedInputStream coded_in(&raw_in);

  // Read the message and use it if the read was successful

  if (! /* are there more messages left? */)
    continue_reading = false;
}

Henner Zeller

Mar 15, 2011, 1:16:33 PM
to Samvel Khalatyan, prot...@googlegroups.com
On Tue, Mar 15, 2011 at 10:09, Samvel Khalatyan
<samvel.k...@gmail.com> wrote:
> Thanks for the clarification. Does it mean that my code should look like the
> example below?
> fstream in("10_GB_file.pb", ios::in | ios::binary);
> bool continue_reading = true
> while(continue_reading)
> {
>   ::google::protobuf::io::IstreamInputStream raw_in(&in);

The ::google::protobuf::io::IstreamInputStream raw_in can probably be
moved out of the loop. I haven't looked at the code closely, but I think
it is not limited (except for an int64 offset, which is big enough for
now...).

Just try it.

>   ::google::protobuf::io::CodedInputStream coded_in(&raw_in);

... and this one should be in the loop, yes.
