No, there is no locking done between threads. And Kenton would
certainly be opposed to singletons :)
But did you check whether you are running into file I/O limits? How
big are your files? 6 million protocol buffers in 300 seconds is 50
microseconds per message; that is very slow, unless your messages are
huge. Are you doing anything besides parsing in this benchmark?
> If so, is there a way to suppress this and get truly
> independent messages parsing?
> thanks.
> P.S. My code can be browsed on github: http://goo.gl/DXCCF . The reading of
> messages is done by: http://goo.gl/OsHV9
> The code uses ROOT framework (root.cern.ch) if one wants to compile it.
Wait, where do you get the 1GB limit? There is no real limit on the
file size (in fact, the protocol buffer library does not do anything
with files directly). (And: don't try to load everything into memory
at once as one gigantic repeated message; that will give you a lot of
malloc overhead.)
Another thing that could be limiting your throughput is memory
allocation: the system malloc is often not very good at handling
multithreaded allocation. I usually use tcmalloc (
http://code.google.com/p/google-perftools/ ), which handles that
case pretty well.
Also: if you read a lot of messages, it is a good idea to Clear() and
reuse a single message object instead of allocating a new one for each
read (protocol buffers try to reuse their internally allocated memory).
-h
You should just create a new CodedInputStream on the stack for each
message; that way you don't run into this limit and can read files as
large as you want.
(CodedInputStream is cheap to create, so it shouldn't influence your
performance numbers.)
> Thanks for the link to perftools. Like you mentioned, I reuse the message in
> my code. Therefore there is no overhead.
> I guess, the problem was in the way I measured execution time. My command
> looked like:
> time executable args && echo "-----" && time executable args
> So, I've cut it into 3 pieces, and the time shown on the screen
> started to make sense:
> time executable args
> echo ------
> time executable args
>
The ::google::protobuf::io::IstreamInputStream raw_in can probably be
moved out of the loop. I haven't looked at the code closely, but I
think it is not limited (except by an int64 offset, which is big
enough for now...).
Just try it.
> ::google::protobuf::io::CodedInputStream coded_in(&raw_in);
... and this one should be in the loop, yes.