optimize_for option default

Kenton Varda

Mar 5, 2009, 6:39:40 PM
to Protocol Buffers
Hi all,

As you know if you've read the docs carefully, when using C++ or Java protocol buffers, for best performance you need to add a line to your .proto files:

  option optimize_for = SPEED;

Otherwise, by default, the compiler optimizes for code size.  Optimizing for code size results in generated code that is around a half to a third of the size, but runs an order of magnitude slower.  We make this the default because inside Google we've found that code bloat from lots of generated code is a serious problem, and we find that we only actually care about speed for a small minority of protocols.  However, now I'm worried that users who care about speed may be missing the documentation mentioning this attribute, and may as a result think protocol buffers are slower than they are.  Code size tends to be a problem only for users who have lots and lots of protocols, whereas most users have only a small number.

So, tell me:  What should the default be?  I think we can pretty safely change the default to SPEED in the next version, but if we do, it will be less safe to change the default back to CODE_SIZE in the future.

Dave Bailey

Mar 6, 2009, 2:37:29 AM
to Protocol Buffers
+1 for SPEED.

-dave

Jon Skeet <skeet@pobox.com>

Mar 6, 2009, 3:30:55 AM
to Protocol Buffers
On Mar 5, 11:39 pm, Kenton Varda <ken...@google.com> wrote:
> As you know if you've read the docs carefully, when using C++ or Java
> protocol buffers, for best performance you need to add a line to your .proto
> files:
>
>   option optimize_for = SPEED;

<snip commentary>

I think there are three issues here:

1) Yes, it's really easy to miss that. Shortly after PBs were released
I saw a blog post showing how "slow" PBs are - and then I pointed out
the optimize_for option...
2) It's a pain to have to use a whole different .proto file just to
specify this option. While I believe many options *should* be in
the .proto file (particularly where they might affect individual
fields etc) I think this would make sense to have as a compiler/
generator flag (it could be in either place, for situations where the
two are split). For instance, you may have a memory-limited client
where speed doesn't matter, and a memory-rich server processing
gazillions of these things - they should be able to use the
same .proto file.
3) Backward compatibility.

I suspect we could really do with the compiler working in four
different modes:

1) Default to SPEED when otherwise undefined; obey proto file
otherwise
2) Default to CODE_SIZE when otherwise undefined; obey proto file
otherwise (current mode)
3) Generate code using SPEED regardless of proto file
4) Generate code using CODE_SIZE regardless of proto file
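
To make the four modes concrete, here is a minimal sketch in Python of how a compiler could pick the effective setting. None of this is protoc's actual code or flag set: the names CompilerMode, Optimize and resolve_optimize_for are hypothetical and exist only for illustration.

```python
from enum import Enum
from typing import Optional


class Optimize(Enum):
    """Values of the optimize_for option discussed in this thread."""
    SPEED = "SPEED"
    CODE_SIZE = "CODE_SIZE"


class CompilerMode(Enum):
    """Hypothetical compiler modes, numbered as in the list above."""
    DEFAULT_SPEED = 1      # use SPEED unless the .proto file says otherwise
    DEFAULT_CODE_SIZE = 2  # use CODE_SIZE unless the .proto file says otherwise
    FORCE_SPEED = 3        # ignore the .proto file, always SPEED
    FORCE_CODE_SIZE = 4    # ignore the .proto file, always CODE_SIZE


def resolve_optimize_for(mode: CompilerMode,
                         file_option: Optional[Optimize]) -> Optimize:
    """Return the setting the code generator should use.

    file_option is None when the .proto file does not set optimize_for.
    """
    # Modes 3 and 4 override whatever the file says.
    if mode is CompilerMode.FORCE_SPEED:
        return Optimize.SPEED
    if mode is CompilerMode.FORCE_CODE_SIZE:
        return Optimize.CODE_SIZE
    # Modes 1 and 2 obey the file when it sets the option.
    if file_option is not None:
        return file_option
    # Otherwise fall back to the mode's default.
    if mode is CompilerMode.DEFAULT_SPEED:
        return Optimize.SPEED
    return Optimize.CODE_SIZE
```

With this shape, the backward compatibility argument amounts to passing mode 2 explicitly, while a memory-limited client and a speed-hungry server can share one .proto file by choosing modes 4 and 3 respectively.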

I think we should at least be able to specify "I want the old
behaviour" on the command line just because that makes the backward
compatibility story easy: "use this argument and it's all as it was" -
but I'd be happy for the actual default to be changed.

(Evil thought: make the default a build-time setting for the compiler
itself, so if you want to build protoc with the old behaviour you can.
Almost certainly not a good idea, but it's amusing to think of the
number of places this *could* be set...)

Jon

aepensky

Mar 6, 2009, 9:22:58 AM
to Protocol Buffers
+1 for making it a compiler command-line option.

Pretty much all other IDLs get this wrong to some degree also.
Having annotations or options in the IDL file is nice, but make sure
they are only helping to define the message and the service, not the
implementation.
When I get a service definition from a service author I don't want to
be told how to optimize, or what namespace my generated classes should
go into.
Those things can be different for every client. As it is now, a
client developer would have to mark up the .proto file that s/he
received from the service developer.

- Alex

Jon Skeet <skeet@pobox.com>

Mar 6, 2009, 11:07:58 AM
to Protocol Buffers
Obviously I agree about the optimisation, but why the namespace?
Surely the provider of the proto "owns" which namespace it should be
in, don't they?

Jon

aepensky

Mar 6, 2009, 11:46:44 AM
to Protocol Buffers
On Mar 6, 11:07 am, "Jon Skeet <sk...@pobox.com>" <sk...@pobox.com>
wrote:
Why? It's a wire format. Surely someone could use the proto from a
language which doesn't even support namespaces.


aepensky

Mar 6, 2009, 11:55:19 AM
to Protocol Buffers
Sorry, I realize that wasn't a very clear statement...

What I mean is, if there is an option which does not leave any
"fingerprint" in either the serialized message or the
FileDescriptorSet, so that you can't tell how the option was set by
looking at either of these, then the option is controlling only code
generation and is not affecting the service contract. So it should
not be in the .proto file.

I think that applies to the package statement as well as
optimize_for. Protocol Buffers does not put globally unique
signatures into the messages or descriptors based on your package
declaration. It only affects the code generation.

- Alex

Jon Skeet <skeet@pobox.com>

Mar 6, 2009, 12:46:58 PM
to Protocol Buffers
It's definitely in the descriptor set - because that's what my C#
generator uses!

I agree that it doesn't affect the wire format of the messages
themselves, but I still think a world in which everyone uses the
same package/namespace for the same proto in each language is a saner
one (i.e. all Java users will see one package; all C# users will see
one namespace, etc. There can be differences between languages, but at
least two users of the same language have a common namespace).

It's certainly a personal thing, and again maybe you should be able to
*override* it from the command line, but I think it makes sense to at
least put "default" package/namespace options into the proto file.

Jon

Kenton Varda

Mar 6, 2009, 12:47:53 PM
to Jon Skeet, Protocol Buffers
I agree that there should be a way to specify options on the command-line.  This applies to pretty much *all* options -- optimize_for, java_package, ctype, etc.  It would even be useful to be able to munge package and class names on the command-line, so that you can generate code from the same .proto file using two different sets of options and not have the classes conflict.

I would like to create a general way to do this, but it's a slightly complicated problem.  One should be able to express fairly arbitrary things like "For all string fields, set option foo.".  I want to have an expressive syntax but don't want to go off the deep end in inventing a whole DSL for option munging.
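
As a rough illustration of what a rule like "For all string fields, set option foo." could look like without a full DSL, here is a small Python sketch. It is purely hypothetical: plain dicts stand in for field descriptors, "foo" is just the placeholder option name from the sentence above, and apply_option_rules is an invented helper, not part of protoc or any protobuf library.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical stand-ins: a field is a plain dict, and a rule is a
# (predicate, option name, option value) triple.
Field = Dict[str, Any]
Rule = Tuple[Callable[[Field], bool], str, Any]


def apply_option_rules(fields: List[Field], rules: List[Rule]) -> List[Field]:
    """Return copies of the fields with every matching rule's option set."""
    result = []
    for field in fields:
        # Copy existing options so the input fields are left untouched.
        options = dict(field.get("options", {}))
        for matches, name, value in rules:
            if matches(field):
                options[name] = value
        result.append({**field, "options": options})
    return result


fields = [
    {"name": "title", "type": "string"},
    {"name": "count", "type": "int32"},
]

# "For all string fields, set option foo."
rules = [(lambda f: f["type"] == "string", "foo", True)]

munged = apply_option_rules(fields, rules)
```

Here the predicate is an ordinary function, which keeps the rule format small; a command-line syntax would still need some way to spell such predicates, which is the hard part described above.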

For now I think I'm going to go ahead and change the default for this option in the next release.

danila.ermakov

Apr 24, 2009, 10:04:21 AM
to Protocol Buffers
+1 for SPEED.