On Mon, Aug 24, 2009 at 3:29 PM, Kenton Varda<ken...@google.com> wrote:
> Generally the most efficient way to serialize a message to stdout is:
> message.SerializeToFileDescriptor(STDOUT_FILENO);
> (If your system doesn't define STDOUT_FILENO, just use the number 1.)
> If you normally use C++'s cout, you might want to write to that instead:
> message.SerializeToOstream(std::cout);
Does the protobuf library buffer output to the file descriptor, or does it
rely on OS-level buffering? Given a file descriptor, I guess it uses write()
calls rather than fwrite().
I am opening stdout in binary mode, changing the buffer size (setvbuf)
and writing to that. If I give SerializeToFileDescriptor the file
descriptor of this new FILE* object, I guess it won't use my buffer
(I know fwrite uses write, but does write care about the buffer of the
FILE* object?).
> For small messages, it may be slightly faster to serialize to a string and
> then write that. But the difference there would be small, and if it matters
> to you we should probably just fix the protobuf library to do this
> optimization automatically...
I should point out that my messages will be in the kilobyte range and
definitely less than a megabyte.
You mention serializing to a string. However, I also see a method,
SerializeToArray. What is the difference?
To avoid repeated malloc/free calls, I intend to keep one global array
(resized if required), serialize into that array while keeping track of
the bytes written, and then write the array out to the stream.
Since my app is not multithreaded, I don't have to worry about multiple
threads writing to that single array.
However, if SerializeToFileDescriptor is still better than this
approach, there is no need for it.
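The single reusable array described above might look roughly like this. `ReusableBuffer` is a hypothetical name, and the sketch assumes a single thread, as stated. With a real message one would likely size it with `ByteSize()` (`ByteSizeLong()` in newer protobuf releases) and fill it with `SerializeToArray(ptr, size)`; those calls appear only in comments so the block stays self-contained.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: one buffer reused across messages, so that once it has
// grown to the largest message size, serializing does no further malloc/free.
// Not thread-safe; intended for a single-threaded program.
class ReusableBuffer {
 public:
  // Returns at least `size` writable bytes. Storage is reallocated only when
  // `size` exceeds the largest size requested so far; it never shrinks.
  // With protobuf, roughly:
  //   int n = msg.ByteSize();
  //   msg.SerializeToArray(buf.reserve(n), n);
  std::uint8_t* reserve(std::size_t size) {
    if (storage_.size() < size) storage_.resize(size);
    used_ = size;
    return storage_.data();
  }

  const std::uint8_t* data() const { return storage_.data(); }
  std::size_t used() const { return used_; }  // bytes in the current message
  std::size_t allocated() const { return storage_.size(); }

 private:
  std::vector<std::uint8_t> storage_;
  std::size_t used_ = 0;
};
```

After serializing, `buf.data()` and `buf.used()` are what get written to the stream.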
> All of these methods require that you write the size first if you intend to
> write multiple messages to the stream.
Yes, I will be writing the length first.
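Kenton's note only says to write the size first; how the size is encoded is up to the writer and reader. One common choice is a base-128 varint, the same encoding protobuf uses for integers on the wire (and what `CodedOutputStream::WriteVarint32` writes). A minimal sketch of the framing and of reading the prefix back; the function names are hypothetical, and a fixed 4-byte length would work just as well if both sides agree on it.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Appends `value` as a base-128 varint: 7 bits per byte, least-significant
// group first, high bit set on every byte except the last.
void append_varint32(std::string& out, std::uint32_t value) {
  while (value >= 0x80) {
    out.push_back(static_cast<char>((value & 0x7F) | 0x80));
    value >>= 7;
  }
  out.push_back(static_cast<char>(value));
}

// Frames one serialized message: length prefix, then the payload bytes.
void append_framed(std::string& out, const std::string& payload) {
  append_varint32(out, static_cast<std::uint32_t>(payload.size()));
  out += payload;
}

// Decodes a varint starting at `pos` and advances `pos` past it.
// Returns false on truncated input or a prefix longer than 5 bytes.
bool read_varint32(const std::string& in, std::size_t& pos,
                   std::uint32_t& value) {
  value = 0;
  for (int shift = 0; shift < 35; shift += 7) {
    if (pos >= in.size()) return false;
    std::uint8_t byte = static_cast<std::uint8_t>(in[pos++]);
    value |= static_cast<std::uint32_t>(byte & 0x7F) << shift;
    if ((byte & 0x80) == 0) return true;
  }
  return false;
}
```

The reader calls `read_varint32` to get the length, then takes exactly that many bytes as the next message.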
I should point out that I haven't had much experience with write and
fwrite, so my understanding might be incomplete.
Many thanks for the advice.
Regards
Saptarshi
I tried a typical case (for me): creating an R runif(N) object (once),
serializing it using protobufs, writing it out, and repeating this M
times.
For N of, say, 125, SerializeToFileDescriptor is better; for larger N
(2000, about 15 KB), SerializeToString is better. However, I did notice
about a 10% improvement (not a very rigorous experiment) for the FD
method over the String method when writing tiny messages (~1 KB)
10 million (= M) times.
Surprisingly, the output to array is much slower than the other two.
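A rough timing harness along the lines of the experiment above, for anyone wanting to repeat it. It is a sketch under several assumptions: a POSIX system, output to some descriptor (for example one opened on /dev/null), and a hypothetical stand-in serializer, `fake_serialize`, that just copies the raw doubles instead of doing protobuf encoding; the real `SerializeToString` call would be swapped in for actual measurements. It produces no numbers of its own and omits the length prefix.

```cpp
#include <chrono>
#include <cstddef>
#include <cstring>
#include <random>
#include <string>
#include <vector>
#include <unistd.h>

// Hypothetical stand-in for message.SerializeToString(&out): copies the raw
// doubles. This is NOT protobuf encoding; replace it when measuring.
void fake_serialize(const std::vector<double>& values, std::string& out) {
  out.resize(values.size() * sizeof(double));
  if (!out.empty()) std::memcpy(&out[0], values.data(), out.size());
}

struct RunResult {
  std::size_t bytes_written;
  double seconds;  // wall-clock time of the serialize-and-write loop, -1 on error
};

// Builds one runif(N)-like vector once, then serializes and write()s it M times.
RunResult time_serialize_then_write(int fd, std::size_t n, std::size_t m) {
  std::mt19937_64 rng(12345);
  std::uniform_real_distribution<double> unif(0.0, 1.0);  // like R's runif(n)
  std::vector<double> values(n);
  for (double& x : values) x = unif(rng);

  std::string out;  // reused across iterations
  std::size_t total = 0;
  const auto start = std::chrono::steady_clock::now();
  for (std::size_t i = 0; i < m; ++i) {
    fake_serialize(values, out);
    const char* p = out.data();
    std::size_t left = out.size();
    while (left > 0) {  // write() may accept fewer bytes than asked
      ssize_t w = ::write(fd, p, left);
      if (w <= 0) return {total, -1.0};  // error: report what got through
      p += w;
      left -= static_cast<std::size_t>(w);
      total += static_cast<std::size_t>(w);
    }
  }
  const std::chrono::duration<double> elapsed =
      std::chrono::steady_clock::now() - start;
  return {total, elapsed.count()};
}
```

Running the same loop with each serialization method against the same descriptor, with identical N and M, keeps the comparison fair.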
Thanks for your input, it was really helpful.
Regards
Saptarshi
Hello,
Thanks very much for the answers. I did perform some tests, and your
statements hold true (the differences are marginal, however):
for small messages (~7 KB), the FileDescriptor method is faster than
SerializeToString; for larger messages, the latter is faster.
Surprisingly, the output to array is much slower than the other two.